Turn your computer into lxplus with a single command #20

Open
ibab opened this issue Feb 16, 2016 · 42 comments

@ibab
Member

ibab commented Feb 16, 2016

Date: TBD
Time: TBD
Location: TBD
Vidyo for remote participants (just click on the link to proceed): CERN_OpenScience
Session format: Work-Along


Lesson guide: @ibab, @lukasheinrich
Lesson materials: Coming…
Slides: Coming…


Join the chat at https://gitter.im/CERNStudyGroup/cernstudygroup.github.io
If you haven't done so yet, introduce yourself (#26) and list your project(s) (#24).


Original post:

Using docker, we can easily get access to tools available on cvmfs by running:

docker run -t -i --rm --privileged hepsw/cvmfs-lhcb bash

(Example for lhcb)
This allows you to easily use and work on your collaboration's software on your laptop, workstation or server.

This could also allow you to make workflows that depend on custom physics software reproducible.
(Just include a Dockerfile or script that calls docker in your repo)
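A minimal sketch of the "script that calls docker in your repo" idea. The helper name, the /work mount point and the DOCKER override are assumptions made for illustration; only the image name and the --privileged flag come from the command above.

```shell
#!/bin/sh
# Hypothetical repo-level helper: run one analysis step inside the container.
# DOCKER can be set to "echo" to print the command instead of running it.
DOCKER="${DOCKER:-docker}"
IMAGE="${IMAGE:-hepsw/cvmfs-lhcb}"   # image name taken from the post above

run_in_container() {
    # Mount the current checkout at /work (an assumed path) and run "$@" there.
    # -t -i are left out because workflow steps are usually non-interactive.
    "$DOCKER" run --rm --privileged \
        -v "$PWD":/work -w /work \
        "$IMAGE" "$@"
}

# Usage: run_in_container ./make_plots.sh   (make_plots.sh is a placeholder)
```

Keeping the image name in one variable means the whole workflow can be pinned to a specific tag in a single place.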

Would someone be interested in learning more about this in one of the study groups?

Edit: Changed sbinet to hepsw

@ibab changed the title from "Turn your server into lxplus with a single command" to "Turn your computer into lxplus with a single command" on Feb 16, 2016
@lukasheinrich

Hi,

I've been very interested in getting an lxplus-like environment on top of cern/slc6-base. My start was this:

https://github.com/lukasheinrich/hepsw-docker/tree/master/lxpluslike

image at: https://hub.docker.com/r/lukasheinrich/lxplus-like/

Basically, I tried to snapshot what yum has installed on lxplus and install the same packages. The same steps could be done either on top of hepsw/cvmfs-lhcb (I think @sbinet retired the binet/* images) or on top of plain slc6-base, with the option to mount cvmfs at runtime via

docker run -v /cvmfs:/cvmfs ...

So I would love to see this explored more. Maybe some people have good contacts to LXPLUS people? (@pherterich ? )

@ibab
Member Author

ibab commented Feb 16, 2016

Yeah, maybe we can come up with some default images people might want to use and write some instructions on how they can be used to do interesting things.

docker run -v /cvmfs:/cvmfs

Wouldn't this require you to set up cvmfs on the host?

@lukasheinrich

yes this will require a cvmfs setup on the host. But the advantage is that you don't need to run the container in privileged mode. Also, the docker image itself doesn't need to know anything about cvmfs which keeps the images more lightweight.
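The trade-off between the two approaches could be sketched roughly like this. The function name and the CVMFS_ROOT override are hypothetical, and lhcb.cern.ch is only used as an example repository.

```shell
# Pick docker flags depending on whether the host already has cvmfs mounted.
# CVMFS_ROOT is an assumed override, mainly useful for trying this out.
cvmfs_flags() {
    root="${CVMFS_ROOT:-/cvmfs}"
    if [ -d "$root/lhcb.cern.ch" ]; then
        # Host provides cvmfs: a plain bind mount is enough, no extra privileges.
        echo "-v $root:/cvmfs"
    else
        # No host cvmfs: a cvmfs-enabled image has to mount it itself,
        # which is what needs --privileged.
        echo "--privileged"
    fi
}

# Usage (the image depends on which branch was taken):
#   docker run --rm -t -i $(cvmfs_flags) <image> bash
```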

@lukasheinrich

btw, we use exactly this /cvmfs mounting + custom lightweight docker images for RECAST. On top of the /cvmfs mount, people can also request to have /afs mounted (I'm working on getting /eos to work). Finally, some workflow steps need GRID authentication, so for those we mount a path /recast_auth which includes a script /recast_auth/getmyproxy.sh. This then uses the host's host certificate to retrieve a proxy that has previously been placed on CERN's MyProxy server.

@adavidzh

I have been fiddling with docker-machine and docker exactly for this purpose: "lxlaptop". Up to now I was getting stuck in finding a reasonable Dockerfile or image to start from.

@lukasheinrich

I think it's a good idea to start from cern/slc6-base and install libraries on top to make it more similar to lxplus like in the example I posted.

@ibab
Member Author

ibab commented Feb 16, 2016

👍 on starting with cern/slc6-base.

@lukasheinrich: Do you mean this RECAST? http://recast.perimeterinstitute.ca/
Looks interesting!

@adavidzh

I just found https://github.com/hepsw/docks
Thanks for getting me on the right track :)

@lukasheinrich

@ibab yes that's the right website but it's the old version and we (@cranmer, me, and a couple of other people) have been making good progress on docker-based workflows for analysis (i.e. you can have separate docker images for each workflow step). If you have LHCb Analysis Code, it would be great to try this out!

@adavidzh I also have a similar repository as @sbinet which I would like to merge into the hepsw organization that @sbinet setup, but I haven't gotten to it yet. Check this out:

https://github.com/lukasheinrich/hepsw-docker

@ibab
Member Author

ibab commented Feb 16, 2016

@lukasheinrich: Sounds good! Some people at LHCb are experimenting with automated analysis workflows (like producing all plots starting from some initial dataset).
We could try to plug our code into your system.

@lukasheinrich

that would be great. Can you put me in touch with them?

@ibab
Member Author

ibab commented Feb 16, 2016

Do you have access to the lhcb-collaborative-working@cern.ch egroup?
That's the mailing list we use for that (and other stuff).

@lukasheinrich

no, it seems closed to LHCb members only. By the way, I have been in touch with @anaderi on including LHCb into RECAST. Maybe we can work from there? I would also be happy to present a short overview of what we already have in our infrastructure to the group (in case you have regular meetings)

@seneubert

@lukasheinrich maybe you can discuss on one of the next collaborative-working meetings?

@lukasheinrich

that would be great. I'll include you in an email thread that we have had for a while, maybe we can discuss this further there.

@adavidzh

Looks to me that LHCb is having a lot of good fun.
When I mention Docker, all I get is shrugs around here.

@lukasheinrich

hey @adavidzh, somewhat similar to ATLAS, but we're getting there. We are in touch with a couple of CMS people as well (Ken Bloom, Mike Hildreth..) so maybe we can get something going. It seems like I will be giving a talk next Monday for the LHCb group, maybe you can get one-time access as well?

PS: i'll include you in the same email thread as the others

@betatim

betatim commented Feb 17, 2016

Getting back to docker for analysis: we should have a lesson on this! Showing people how easy it is to get going and have a lxlaptop. One thing I am undecided on is whether you should make use of cvmfs or not. I think it depends a lot on what you want to use the container for. For quick play or day-to-day use it is probably fine, but for reproducible analysis relying on cvmfs is not that useful, as it seems cvmfs is not immutable (which is what you need). Is there a tool to create a container that takes stuff from cvmfs during build time, and then somehow freezes it?

@adavidzh

Well, in CMS whatever is in cvmfs can also be installed standalone (I got a working Dockerfile to do that last night, but it's still coming in at 16 GB).

@sbinet

sbinet commented Feb 17, 2016

for CVMFS-free images, I have provided these 2 as a proof of concept for LHCb:

so, the basic one with just Gaudi installed (from RPMs) and a more involved one, DaVinci which is the analysis flavoured framework of LHCb.

this is detailed in a CHEP paper: https://inspirehep.net/record/1413180?ln=en

just to give an idea on the sizes of these images:

hepsw/lhcb-gaudi   v26r1   3.911 GB
hepsw/lhcb-davinci v36r5   7.790 GB

and for the CVMFS-based ones:

hepsw/cvmfs-base 20150331 629.4 MB
hepsw/cvmfs-lhcb 20150331 629.4 MB

(hepsw/docks has also CVMFS base images for Alice, Atlas and CMS)

AFAIK, there is no tool to extract the meaningful set of files under a CVMFS mount point for a given workload as files are read/downloaded on a JIT basis...
I mean, no tool, except for a bare find /cvmfs/experiment -print, which would work IFF the software is installed such as /cvmfs/experiment/app/version with everything needed under that directory (otherwise, you'd just curl the whole CVMFS content...)
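For the self-contained case described above, freezing one release directory could look roughly like this. It is only a sketch: it uses tar rather than find -print, the function name is made up, and it assumes everything the release needs really does live under that one directory.

```shell
# Freeze one self-contained release directory from cvmfs into a tarball.
# Reading every file forces cvmfs to download it, so the archive is complete
# only if nothing outside the directory is needed at runtime.
snapshot_release() {
    src="$1"    # e.g. /cvmfs/experiment/app/version (placeholder from above)
    out="$2"    # tarball to write
    tar -C "$(dirname "$src")" -czf "$out" "$(basename "$src")"
}

# Usage: snapshot_release /cvmfs/experiment/app/version version.tar.gz
```

The tarball can then be added to an image at build time, so the container no longer depends on whatever cvmfs serves later.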

@lukasheinrich

So my experience in ATLAS is that we can get away with reasonable image sizes, at least for our analysis software, by installing everything within Docker without /cvmfs access. This is easier when built purely on top of ROOT or using a bare-bones Gaudi/Athena based release. For running actual reconstruction type jobs, I only tried by mounting /cvmfs at runtime. That works well, but @betatim is right that it creates mutability. One reference I found for cvmfs versioning is here:

https://indico.cern.ch/event/444264/session/0/contribution/0/4/attachments/1211574/1787889/JBGG-UseOfCernVM.pdf

see slide 7. It seems like CVMFS might have something like a commit number, but I'm not sure how that works.

On a Mac, mounting /cvmfs is a bit trickier. The docker-machine runs with user permissions, while the /cvmfs mountpoint is root-owned, and the two do not work well together. To get around this, I create a user-owned mount point in my home directory, manually mount cvmfs on that, and then bind it with docker run -v $HOME/cvmfs:/cvmfs, which works well.

Cheers,
Lukas

@adavidzh

For CMS, by chaining the RPM installation in a single RUN command, I am now at 12.6 GB. This is starting FROM kreczko/puppet-builder because I actually used kreczko/cmssw-standalone by @kreczko as the skeleton.
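The single-RUN trick works because every RUN instruction commits its own layer, so files removed in a later instruction still count towards the image size. A rough sketch of the pattern follows, using yum and cern/slc6-base as stand-ins (the CMS setup above installs from the CMS repo and starts from a different base image); the function name and package names are placeholders.

```shell
# Write a Dockerfile that installs and cleans up in ONE RUN instruction,
# so the yum cache never ends up in a committed layer.
write_dockerfile() {
    out="$1"; shift
    cat > "$out" <<EOF
FROM cern/slc6-base
RUN yum install -y $* && \\
    yum clean all
EOF
}

# Usage: write_dockerfile Dockerfile libX11 openssl
#        docker build -t my-lean-image .   (image tag is a placeholder)
```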

Is there a better (leaner, closer to lxplus, etc) base image?

@lukasheinrich

@adavidzh feel free to try lukasheinrich/lxplus-like

this is on top of slc6-base and has most yum packages installed that LXPLUS has.

Though this is certainly not lean :-/

@kreczko

kreczko commented Feb 17, 2016

@adavidzh indeed, the containers I was playing around with are not optimal, as their size can get up to 19 GB. However, if you need CMSSW on a computer that does not have network all the time, kreczko/cmssw-standalone can be useful.

That said, hepsw/cvmfs-cms is much better for most cases.

@adavidzh

@lukasheinrich: starting from 11 GB is not promising ;) (the CMS software comes in at ~6 GB).
@kreczko: kreczko/cmssw-standalone has "only" the problem of not building in Docker Hub (I suppose because of size).

@sbinet

sbinet commented Feb 17, 2016

isn't http://docker.cern.ch/howtopr supposed to tackle this size-on-the-hub issue?

@kreczko

kreczko commented Feb 17, 2016

I did not know CERN had their own registry. That is good news.

@lukasheinrich

I guess there is a somewhat irreducible compromise that we have to make. If you want to have something LXPLUS-like, meaning being able to get stuff from cvmfs and expect it to work, you somehow depend on having the system libraries installed that this cvmfs software depends on (think: getting a ROOT release, which depends on libX11 etc. being installed). Of course it would be a bit nicer if we could get meaningful images that only have a subset of those installed to keep the image size down. But then this will only work with the subset of cvmfs software that the image was tailored to.
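One way to find out which system libraries a given image is still missing for a piece of cvmfs software is to ask the dynamic linker. This is only a sketch: the function names are made up, ldd is Linux-specific, and the path in the usage line is a placeholder.

```shell
# Keep only the library names that ldd reports as "not found".
filter_missing() {
    awk '/not found/ { print $1 }'
}

# List shared libraries a binary needs but the current image does not provide.
missing_libs() {
    ldd "$1" 2>/dev/null | filter_missing
}

# Usage inside a candidate image:
#   missing_libs /cvmfs/<repo>/<path-to>/root.exe
```

The resulting list gives a starting point for deciding which packages a slimmer image actually has to install.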

@lukasheinrich

@kreczko yes, they are working on this. They are also preparing a Google Container Engine-like service that you will be able to control with Docker Swarm and Kubernetes. But this is still in an early development phase.

@adavidzh

@sbinet docker.cern.ch is really cool. Thanks!
@lukasheinrich the problem is indeed that in CMS, every single package is taken from the CMS repo, not from SLC. So in a sense, the leaner the starting image, the "better".

@lukasheinrich

@adavidzh interesting, so CMS maintains its own list of low-level system libraries it needs (stuff I had to deal with e.g. include various openssl libs etc)?

@lukasheinrich

Hi everyone,

I just gave a short presentation to the ATLAS analysis software group, and got some useful input / suggestions.

One thing that has traditionally been hard to do from e.g. a Mac or Windows is getting access to the Grid. One solution is of course to have cvmfs installed, bind this to docker (analogously to the discussion above), and set up the Grid middleware from there.

But to have a nicer encapsulation, I managed to get a minimal image on top of cern/slc6-base that has everything shipped natively.

The example for now is ATLAS (panda is the ATLAS job submission interface), but the first couple of steps in the Dockerfile should be generalizable to other experiments

https://github.com/lukasheinrich/asg-docker/blob/master/preasg-base/Dockerfile

with that I can nicely submit Grid jobs from a Mac (just bind-mount a directory that has your certificate and key in it)
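The certificate bind-mount could be sketched like this. The function name, the ~/.globus default, the /root/.globus target inside the container and the image name in the usage line are all assumptions, not taken from the Dockerfile linked above.

```shell
# Emit the bind-mount flag for a Grid credential directory, after checking
# that it holds the conventional usercert.pem / userkey.pem pair.
grid_cert_flags() {
    dir="${1:-$HOME/.globus}"
    for f in usercert.pem userkey.pem; do
        [ -f "$dir/$f" ] || { echo "missing $dir/$f" >&2; return 1; }
    done
    # Mounted read-only; /root/.globus inside the container is an assumption.
    echo "-v $dir:/root/.globus:ro"
}

# Usage (image name is a placeholder):
#   docker run --rm -t -i $(grid_cert_flags) my-grid-image bash
```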

asciicast

Maybe people from the other expts could try getting their Grid layers to work?

Cheers,
Lukas

@RaoOfPhysics
Member

Sooooo, what's the situation with this lesson?

@ibab
Member Author

ibab commented Apr 22, 2016

Do you already have someone for May 14th?
I'll be at CERN during that time.
Maybe @lukasheinrich or someone else wants to cooperate on the lesson?

@RaoOfPhysics
Member

Nothing scheduled. Also, I think the Friday is 13 May. :)

@ibab
Member Author

ibab commented Apr 22, 2016

Yes, Friday the 13th 😨

@RaoOfPhysics
Member

Sooooo, shall we schedule this? :D

@ibab
Member Author

ibab commented Apr 22, 2016

Let's wait a bit to see if someone wants to help.
I also need to come up with some good ideas on what to show during the session.

@RaoOfPhysics
Member

Ok. :)

@lukasheinrich

hi @ibab,

yes I would definitely be interested in helping out. should we try to skype or something in the next week or so?

Cheers,
Lukas

@ibab
Member Author

ibab commented Apr 25, 2016

Yes, next week is good!

@RaoOfPhysics
Member

Hey @ibab, @lukasheinrich: Just confirming that we're on for this Friday. Please let me know! Thanks. :)
