Suggestion: Distribute SubT models through multiple smaller docker layers #700
I believe the images are built here: Lines 73 to 84 in ad0c3e1

Note the
My knowledge of docker is minimal. I thought layer hashes were based on the content of the layer and of the preceding layers (like in a blockchain), so I assumed we would still get the same set of base layers as long as the base image and ROS package versions stay the same, even with no caching used. Maybe I'm mistaken.

Anyway, my main point is the gigantic monolithic layer. Even though it's a very painful experience, I'm still fine with having to download a dozen GB as long as it's split into small layers, so I can resume if the connection is lost in the middle.

There is probably also replication of resources (like shader images etc.) among the tiles that could be optimized, but I guess resolving that would require major refactoring.
Nope, the models are actually downloaded before cloning the repo: https://github.com/osrf/subt/blob/ad0c3e1e6cf13236116e8f1bc4f2fc71800835b6/docker/cloudsim_sim/Dockerfile . I think the problem is that as the release script uses

Of course there are even better ways - e.g. building a resource-only docker image using a multi-stage docker build, which could create a layer that contains only the model files and not the programs needed to download them. That could be a pretty stable and infrequently changed image.

As a last option, you can always build the docker images yourself using the referenced release script. When building the image, resources get downloaded one by one, which might be easier on your connection. However, I still think that if the connection drops for more than a few hundred milliseconds and needs to re-negotiate a download, the whole download step will fail anyway and the downloaded models won't get cached. That's the point when you can start playing with editing the Dockerfile, or running it without the
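As an illustration of the resource-only, multi-stage idea mentioned above, here is a minimal Dockerfile sketch. The base image, paths, and the placeholder download command are assumptions for illustration only and are not taken from the actual subt release script:

```dockerfile
# Stage 1: contains the tooling needed to fetch the models.
FROM ubuntu:bionic AS model-downloader
RUN apt-get update && apt-get install -y wget ca-certificates \
    && rm -rf /var/lib/apt/lists/*
# Placeholder for however the models are really fetched
# (e.g. the fuel download step in the actual release script).
RUN mkdir -p /models \
    && wget -q -O /tmp/models.tar.gz https://example.org/subt-models.tar.gz \
    && tar -xzf /tmp/models.tar.gz -C /models \
    && rm /tmp/models.tar.gz

# Stage 2: only the model files are copied in; the download tooling
# from stage 1 is not part of the final image.
FROM ubuntu:bionic
COPY --from=model-downloader /models /root/.ignition/fuel
```

Because the final model layer depends only on the copied files, it stays stable as long as the models themselves don't change, regardless of updates to the download tooling.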
@peci1 Yes, I agree with you on the importance of using

I overlooked the order of the model download and the repo cloning. Yes, I'm aware that the layer would have a different hash if any of the layers above it changed between builds (OS image, ign and any dependencies - basically a very high probability), unless we push it as far up the layer list as possible or use a multi-stage build as you suggested.

I have also noticed another avenue of optimization: a lot of models share the same set of resources, especially texture images. When time permits, I'll check whether we could symlink them to one model after download without breaking the simulation. (I'm not suggesting changing the models on the ign fuel server, but consolidating them after downloading.)

Probably this is a problem only I'm complaining about. Maybe I should simply restrict local development to a catkin installation and use an on-cloud development workflow for docker image building and testing.
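A minimal sketch of the consolidation idea, using hard links via rdfind instead of the symlinks suggested above (rdfind is one off-the-shelf tool for this; it is not part of the subt tooling). The base image, the model cache path, and the `download_models.sh` helper are all hypothetical:

```dockerfile
FROM ubuntu:bionic
RUN apt-get update && apt-get install -y rdfind \
    && rm -rf /var/lib/apt/lists/*

# The download and the deduplication must share one RUN (one layer):
# a later layer cannot shrink files an earlier layer already committed.
# download_models.sh is a hypothetical stand-in for the real download step.
COPY download_models.sh /tmp/download_models.sh
RUN /tmp/download_models.sh all \
    && rdfind -makehardlinks true /root/.ignition/fuel
```

Whether this actually helps depends on the duplicated textures being byte-identical across models, which would need to be checked first.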
The models have been broken out into a separate docker image from the source code. See PR #718. The layer size for the models has been reduced to 3.82 GB.
I guess #718 could close this issue? I know the OP requested more model layers, but it seems that now only the tech repo gets downloaded, so that should do it. What happened to the other models, like niosh, that were previously downloaded separately?
I also believe the newer refactoring of the docker images fulfills the intended requirement; I just couldn't test the latest build yet, though. Maintainers, feel free to close this at your discretion if required.
It would be helpful to have multiple smaller layers rather than one gigantic layer when using docker pull over a slow and unreliable connection. (Especially given that it's not trivial to resume if a layer download breaks in the middle. While not trivial, it is at least possible to resume from completed layers, so having multiple smaller layers helps in such a situation.)
So I suggest at least breaking the 6 GB+ layer containing all the models into 3 separate layers, each containing the resources for one circuit.
It would also be beneficial to clone the subt repo after the model layers (assuming the models change relatively infrequently), so the model layers don't have to be pulled again each time the subt repo changes.
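A hedged sketch of what that layering could look like. The base image, the `download_models.sh` helper, the circuit arguments, and the workspace path are hypothetical placeholders; only the ordering of the steps reflects the suggestion above:

```dockerfile
# Assumed base image; the real image would start from the ROS/ign base
# used by the subt release script.
FROM ubuntu:bionic
RUN apt-get update && apt-get install -y git ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# One RUN instruction per circuit => one moderately sized layer per circuit,
# so an interrupted pull can resume from the circuits that already completed.
# download_models.sh and its arguments are hypothetical placeholders.
COPY download_models.sh /tmp/download_models.sh
RUN /tmp/download_models.sh tunnel_circuit
RUN /tmp/download_models.sh urban_circuit
RUN /tmp/download_models.sh cave_circuit

# The frequently changing source comes last, so a new subt commit only
# invalidates (and forces a re-pull of) the layers from here down.
RUN git clone https://github.com/osrf/subt.git /home/developer/subt_ws/src/subt
```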