Tool for setting up singularity overlays with miniconda - official NYU Greene docs
...because nobody likes doing it (until now)
✨ Here's what you could look like ✨
*(demo video: singuconda.webm)*
Running singuconda will give you a magic little `./sing` 🧚🏾‍♀️ command in your current directory that:

- autodetects GPUs and will automagically add the `--nv` flag
- remembers the path to your overlay and sif images so all you have to do is `./sing`
- automatically sources your `env` file for you (the one from the tutorial)
- also creates a `singrw` script that mounts the overlay in read-write mode (`./sing` mounts read-only so you can have multiple scripts using it)
- has full support for both interactive shells (`./sing`) and scripts (`./sing <<< "type -P python"`), which will run the command and exit. This is what's used in sbatch files!
- accepts additional arguments so you can do `./sing -o /scratch/work/public/ml-datasets/coco/coco-2017.sqf:ro` to mount additional overlays (for example)
The `~/singuconda` script itself:
- has autocomplete for all of the overlays and sif files
- automatically installs miniconda and lets you optionally pick a python version
```bash
ssh greene  # or whatever your environment is
curl -L https://github.com/beasteers/singuconda/raw/main/singuconda --output ~/singuconda
chmod +x ~/singuconda
```
The `singuconda` command should always be run from the directory where you want your overlay and `sing` script to live. But once they're created, the `sing` script can be run from anywhere.
```bash
# cd to your projects directory
mkdir myproject
cd myproject

# make magic!
~/singuconda
```
The script will create some helper scripts for you:

- `./sing` - run the singularity container in read-only mode. Use this to run many containers at once.
- `./singrw` - run the singularity container in read-write mode. Use this to install packages.
Those commands above will create interactive sessions. If you want to run a script/commands in singularity (e.g. in an sbatch file), you can do this:
```bash
echo 'python script.py' | ./sing

./sing <<< 'python script.py'

./sing <<EOF
python script.py
EOF

./sing <<< "
python script.py
"
```
Any arguments you provide will be passed to the singularity command.
```bash
# e.g. mount squashfs files
./sing -o path/to/dataset.sqf <<< "
python train.py
"
```
If you do this while you're inside a git repository, you may want to ignore the generated files. Here are some `.gitignore` rules to filter them out:
```
# the overlay file
*.ext3
# singuconda: start scripts
sing
singrw
# the singularity container associated with the overlay
.*.sifpath
```
You can customize behavior using environment variables. Set these in your `~/.bashrc`:
```bash
# in case you prefer to scream: SING_CMD="aagh"
export SING_CMD="sing"

# not everyone is at NYU
export SING_OVERLAY_DIR="/scratch/work/public/overlay-fs-ext3"
export SING_SIF_DIR="/scratch/work/public/singularity"

# personal preferences
export SING_DEFAULT_OVERLAY="overlay-5GB-200K.ext3.gz"
export SING_DEFAULT_SIF="cuda11.0-cudnn8-devel-ubuntu18.04.sif"

~/singuconda
```

```bash
# if you have multiple overlays in the same directory
SING_NAME=other ./sing
```
To delete the sing environment, just do:

```bash
rm *.ext3 sing singrw .*.sifpath
```
singuconda does allow creating multiple overlays in the same directory. When you use singuconda to set up a second overlay in the same directory, it will overwrite the `sing` command to point to your newer overlay. If you want to use your first overlay, you can override it with `SING_NAME=my-first-sing ./sing` (assuming your overlay is called `my-first-sing.ext3`).
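For example (a sketch - the overlay names here are just whatever you picked during setup):

```bash
# hypothetical: two overlays set up in the same project directory
SING_NAME=my-first-sing ./sing <<< "type -P python"   # use the older overlay
./sing <<< "type -P python"                           # defaults to the newest one
```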
Yep, that's gonna happen if you do that! But don't fret.
You need to rename the hidden file that records which SIF file to use:
```bash
OLD_SING_NAME=my-first-sing
NEW_SING_NAME=better-name
mv ".${OLD_SING_NAME}.sifpath" ".${NEW_SING_NAME}.sifpath"
```
Then you'll also want to edit `./sing` and `./singrw` to point to your new overlay name. Change

```bash
SING_NAME="${SING_NAME:-my-first-sing}"
```

to

```bash
SING_NAME="${SING_NAME:-better-name}"
```
Just run `~/singuconda` again! It'll ask you if you want to configure an existing one or create a new one.
Well, that's a bummer! But I've done that too. Unfortunately there's not a super convenient way to resize it, but fortunately it's very easy to just start over (which is what I always do). If you need to, I suppose you could try creating a new overlay, then mounting both overlays and copying between them, but I'm not sure how to mount the second overlay to a different directory (because afaik right now they'd both mount to `/ext3`).
```bash
./singrw -o my-too-small-overlay.ext3  # uh oh! collision? I should test this lol
```
This is just a common singularity error: the overlay can't be mounted in write mode while another process is using it.
```
FATAL: while loading overlay images: failed to open overlay image ./overlay.ext3: while locking ext3 partition from /scratch/bs3639/ego2023/InstructBLIP_PEFT/blip.ext3: can't open /scratch/bs3639/ego2023/InstructBLIP_PEFT/blip.ext3 for writing, currently in use by another process
```
So you have to find which one of your processes is still running (background screen, tmux, sbatch, ..) and either wait for them to finish, or kill the processes.
```bash
ps -fu $USER | grep tmux
```
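If tmux isn't the culprit, it could also be a lingering screen session, a running sbatch job, or a stray singularity process. Something like this can help narrow it down (just a suggestion, adjust as needed):

```bash
# look for multiplexers or singularity processes still holding the overlay
ps -fu $USER | grep -E '[t]mux|[s]creen|[s]ingularity'
# and check for jobs that might still have it mounted
squeue --me
```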
One time, I spent an hour trying to hunt down the process and I swear I couldn't find it, so I just:
```bash
# move it out of the way
mv overlay.ext3 overlay1.ext3
# and make a copy
cp overlay1.ext3 overlay.ext3
# now the lock is on overlay1.ext3 :)
```
It will go through a series of prompts. What happens:

- pick an overlay file
- pick a sif file
- install miniconda (optionally picking a specific python version)
- add the startup environment script (`/ext3/env`)
- a menu to install packages in the container
- create shortcut script(s) for running the container
Then you're all done!
You can re-run it if you want to change anything (sif file, python version, installs).
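A quick sanity check after setup (the exact python path depends on your install, but it should live under `/ext3`):

```bash
# should print the python inside your overlay, not the system one
./sing <<< "type -P python; python --version"
```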
This was built for NYU Greene's environment, but it should apply elsewhere too!
```bash
env GOOS=linux GOARCH=amd64 go build .
```
Here's an example sbatch script, so we have something to copy and paste from ;)
```bash
#!/bin/bash
#SBATCH -c 8
#SBATCH --mem 8GB
#SBATCH --time 8:00:00
#SBATCH --gres gpu:1
#SBATCH --job-name=myjob
#SBATCH --output logs/job.%J.out
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<YOUR_USERID>@nyu.edu

../sing << EOF
python blah.py ...
EOF
```
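Then submit it like any other sbatch script (the filename here is just an example):

```bash
mkdir -p logs            # slurm won't create the --output directory for you
sbatch myjob.sbatch
tail -f logs/job.*.out   # watch the output
```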
And for jupyter:
```bash
#!/bin/bash
#SBATCH -c 8
#SBATCH --mem 24GB
#SBATCH --time 8:00:00
#SBATCH --gres gpu:1
#SBATCH --job-name=jupyter
#SBATCH --output logs/jupyter.out

port=$(shuf -i 10000-65500 -n 1)
/usr/bin/ssh -N -f -R $port:localhost:$port log-1
/usr/bin/ssh -N -f -R $port:localhost:$port log-2
/usr/bin/ssh -N -f -R $port:localhost:$port log-3

echo "To access:"
echo "ssh -L $port:localhost:$port $USER@greene.hpc.nyu.edu"
echo "ssh -L $port:localhost:$port greene"

./singrw << EOF
python -m ipykernel install --name sing --user
jupyter lab --no-browser --port $port
EOF
```
Remember that you have to open a new ssh session and forward the port. Check `logs/jupyter.out` for the port number.
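For example, something like this (run the grep on Greene, then run the printed command on your laptop; the port is random per job):

```bash
# on greene: find the forwarding command the job printed
grep 'ssh -L' logs/jupyter.out
# on your laptop: run one of the printed lines, e.g.
# ssh -L 54321:localhost:54321 greene
```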
Link your other storage locations into your home directory so they're easy to get to:

```bash
ln -s /scratch/$USER ~/scratch
ln -s /vast/$USER ~/vast
ln -s /archive/$USER ~/archive
```
For your `~/.bashrc`:
```bash
# convenience commands for watching squeue
export SQUEUEFMT='%.18i %.9P %.32j %.8u %.8T %.10M %.9l %.6D %R'
alias msq='squeue --me -o "$SQUEUEFMT"'
alias wsq='watch -n 2 "squeue --me -o \"$SQUEUEFMT\""'
alias wnv='watch -n 0.1 nvidia-smi'

# lets me know when my bashrc is sourced
[[ $- == *i* ]] && echo 'hi bea :)'
```
If you manage to get this fully working, please post how you did it here! #7
Set up your ssh config on your local computer like this: `vim ~/.ssh/config`
```
Host sing
    User YOUR-NETID              # CHANGE
    HostName cs022               # YOU WILL HAVE TO CHANGE ME EVERY TIME YOU SUBMIT A JOB
    RemoteCommand /path/to/sing  # CHANGE
    RequestTTY yes               # needed for sing to work
    ProxyCommand ssh greene nc %h %p 2> /dev/null

Host greene
    HostName greene.hpc.nyu.edu
    User YOUR-NETID
    ServerAliveInterval 120
    ForwardAgent yes             # for git push over ssh

# greene changes their signature for some reason?? So you have to do this to avoid errors
Host greene sing
    StrictHostKeyChecking no
    UserKnownHostsFile=/dev/null
```
You can test this by setting HostName to `log-1` and doing `ssh sing`. If all is successful, you should go straight into singularity (remember: no running code on the login node).
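Once you're in, a quick way to confirm you landed inside the container rather than on the login node (assuming miniconda went into the overlay as usual):

```bash
# run these in the shell that `ssh sing` drops you into
echo $SINGULARITY_CONTAINER   # should point at the sif you picked
type -P python                # should resolve under /ext3, not /usr/bin
```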
In VSCode, open settings (CMD-",") and enable "Remote.SSH: Enable Remote Command".
```bash
# I don't need nothin fancy
srun -c 1 -t 8:0:0 --mem 8GB sleep infinity

# OR

# gimme gpu pls
srun -c 12 -t 6:0:0 --mem 64GB --gres gpu:1 sleep infinity
```
Get the node's name, e.g. `cs022`:

```
# lets see what node I got
$ squeue --me
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          43351615        cs     bash   bs3639  R    1:11:40      1 cs022
```
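If you'd rather not eyeball the table, squeue can also print just the node name (purely a convenience):

```bash
# print only the node(s) of your running jobs
squeue --me --states=RUNNING -h -o "%N"
```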
Then on your local computer: `vim ~/.ssh/config`

```
Host sing
    HostName cs022  # UPDATE
```
- CMD-Shift-P to open the command palette.
- Type "Connect to Host"
- Select host "sing"
Or if you're already in a remote window (job died, you submitted another), just run "Reload Window" instead.
You can do `ssh sing` and end up in a singularity container just fine, but VSCode uses `ssh -T` and just times out when connecting.