
Suggestion: Distribute SubT models through multiple smaller docker layers #700



Closed
AravindaDP opened this issue Nov 11, 2020 · 7 comments
Labels
enhancement New feature or request

Comments

AravindaDP commented Nov 11, 2020

It would be helpful to have multiple smaller layers rather than one gigantic layer when running docker pull over a slow and unreliable connection. (It's not trivial to resume if a layer download breaks in the middle, but it is at least possible to resume from the layers that have already completed, so having multiple smaller layers helps in that situation.)

So I suggest at least breaking the 6 GB+ layer containing all the models into 3 separate layers, one with the resources for each circuit.

It would also be beneficial to clone the subt repo after the model layers (assuming the models change relatively infrequently), so the model layers don't have to be pulled again every time the subt repo changes.
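
Something along these lines (just a sketch; download_circuit_models.bash and the git clone step are placeholders I made up, not the actual Dockerfile contents):

# One RUN per circuit, so each circuit's models end up in their own layer and an
# interrupted pull can resume from the layers that already completed.
RUN ./download_circuit_models.bash tunnel
RUN ./download_circuit_models.bash urban
RUN ./download_circuit_models.bash cave

# Clone the subt repo only after the (hopefully stable) model layers, so a new
# commit invalidates this layer but not the model layers above it.
RUN git clone https://github.com/osrf/subt.git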

@zbynekwinkler

I believe the images are built here:

subt/docker/release.bash

Lines 73 to 84 in ad0c3e1

if [[ "$COMMAND" == "--build" ]] || [[ "$COMMAND" == "--all" ]]; then
echo -e "\e[1;34m## Building cloudsim_sim image\e[0m\n"
./build.bash cloudsim_sim --no-cache
echo -e "\e[1;34m## Building cloudsim_bridge image\e[0m\n"
./build.bash cloudsim_bridge --no-cache
# Only build the subt_sim_entry if building for the public dockerhub registry
if [[ "$repo" != *"private" ]]; then
echo -e "\e[1;34m## Building subt_sim_entry image\e[0m\n"
./build.bash subt_sim_entry --no-cache
fi
fi

Note the --no-cache. I believe we get no caching benefits for subsequent releases.


AravindaDP commented Nov 11, 2020

My knowledge of docker is minimal. I thought layer hashes are based on the content of the layer and the preceding layers (like in a blockchain), so I assumed we'd still get the same set of base layers as long as the base image / ROS package versions stay the same, even with caching disabled. Maybe I'm mistaken.

Anyway, my main point is the gigantic monolithic layer. Even though it's a very painful experience, I'm still fine with having to download a dozen GB as long as it's split into small layers, so I can resume if the connection is lost in the middle.

There is probably also some replication of resources (like shader images etc.) among tiles that could be optimized for resource usage, but I guess resolving that would require major refactoring.


peci1 commented Nov 18, 2020

Nope, the --no-cache only affects building the image: it tells Docker not to use any cached layers during the build. But if you pull the final image to another machine, it will of course use the layer cache (unless you tell it not to). Because Docker's layer caching is keyed on the text of the command after RUN, not on the results or side effects of that command, --no-cache is basically needed to tell Docker to rebuild the whole thing. Without it, the apt update would only be run the first time you build the image, and all subsequent builds would consider it cached and not call it. That's often not what you want.
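
For illustration (this line is made up, not taken from the subt Dockerfiles):

# Docker keys the cache for this layer on the literal RUN text, not on what apt
# actually fetched, so a rebuild without --no-cache reuses the old (possibly
# stale) layer and never runs apt-get update again.
RUN apt-get update && apt-get install -y some-dependency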

The models are actually downloaded before cloning the repo: https://github.com/osrf/subt/blob/ad0c3e1e6cf13236116e8f1bc4f2fc71800835b6/docker/cloudsim_sim/Dockerfile . I think the problem is that because the release script uses --no-cache, each rebuild pulls some updated dependency package, which then invalidates all following layers. To get the caching behavior you're asking for, you'd need to install ign fuel with its minimal dependency set, then download the models, and only then install the rest, and hope that the ign-fuel dependencies don't get updated too often.
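
Roughly like this (the package names, the Fuel cache location and the exact download command are my guesses, not the real Dockerfile contents):

# Install only what's needed to download from Fuel (assumed package names).
RUN apt-get update && apt-get install -y ignition-tools libignition-fuel-tools4-dev

# Download the models early, so this big layer sits above the frequently changing parts.
RUN ign fuel download -v 4 \
    -u "https://fuel.ignitionrobotics.org/1.0/OpenRobotics/collections/SubT Tech Repo"

# Only now install the rest of the simulation stack and clone the repo, so their
# updates invalidate these layers but not the model layer above.
RUN apt-get install -y ros-melodic-desktop-full
RUN git clone https://github.com/osrf/subt.git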

Of course there are even better ways, e.g. building a resource-only image using a multi-stage docker build, which could create a layer that contains only the model files and not the programs needed to download them. That could be a pretty stable and infrequently changing image.
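
A minimal multi-stage sketch (the base image, package names and Fuel cache path are assumptions):

# Stage 1: a throwaway image whose only job is to fetch the models.
FROM ubuntu:bionic AS models
RUN apt-get update && apt-get install -y ignition-tools libignition-fuel-tools4-dev
RUN ign fuel download -v 4 \
    -u "https://fuel.ignitionrobotics.org/1.0/OpenRobotics/collections/SubT Tech Repo"

# Stage 2: the real image copies only the downloaded files; the download tooling
# and its dependencies never end up in the final layers.
FROM ubuntu:bionic
COPY --from=models /root/.ignition/fuel /root/.ignition/fuel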

As a last option, you can always build the docker images yourself using the referenced release script. When building the image, the resources get downloaded one by one, which might be easier on your connection. However, I still think that if the connection drops for more than the few hundred milliseconds it takes to re-negotiate a download, the whole download step will fail anyway and the already downloaded models won't get cached. That's when you can start playing with editing the Dockerfile, or running it without the --no-cache switch.

nkoenig added the enhancement label on Nov 18, 2020
@AravindaDP
Author

@peci1 Yes, I agree with you on the importance of using --no-cache to make sure the build isn't reusing a cached layer from an earlier build that contains older apt packages.

I overlooked the order of the model download and the repo cloning. Yes, I'm aware that the model layer would get a different hash if any of the layers above it changed between builds (OS image, ign and any of their dependencies), which is very likely unless we push it as far up the layer list as possible or use a multi-stage build as you suggested.

I have also noticed another avenue for optimization. A lot of models share the same set of resources, especially texture images. When time permits, I'll check whether we could symlink them to a single copy after download without breaking the simulation. (I'm not suggesting changing the models on the ign fuel server, only consolidating them after downloading them.)
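
A rough sketch of what I'd try (the download script name, the cache path and the use of rdfind are assumptions; whether the simulation tolerates symlinked textures is exactly what needs testing first):

# The dedup has to happen in the same RUN as the download: a later layer cannot
# remove bytes that an earlier layer has already committed to the image.
RUN apt-get update && apt-get install -y rdfind \
 && ./download_models.bash \
 && rdfind -makesymlinks true /root/.ignition/fuel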

This is probably a problem that only I'm complaining about. Maybe I should just restrict local development to a catkin installation and use a cloud-based workflow for building and testing the docker images.


nkoenig commented Dec 17, 2020

The models have been broken out of the source code into a separate docker image. See PR #718.

The layer size for the models has been reduced to 3.82GB.


peci1 commented Jan 12, 2021

I guess #718 could close this issue? I know the OP requested more model layers, but it seems that now only the tech repo gets downloaded, so that should do it. What happened to the other models like niosh that were previously downloaded separately?

@AravindaDP
Author

I also believe the newer refactoring of the docker images fulfills the intended requirement; I just haven't been able to test the latest build yet. Maintainers, feel free to close this at your discretion.
