BYOL-pytorch

An implementation of BYOL with DistributedDataParallel (1 GPU : 1 process) in pytorch.
This allows scaling to arbitrarily large batch sizes; for example, a batch size of 4096 is possible using 64 GPUs, each with a per-GPU batch size of 64 at a resolution of 224x224x3 in FP32 (see below for FP16 support).
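
The 1 GPU : 1 process setup referenced above is the standard torch.distributed + DistributedDataParallel recipe. Below is a minimal, hypothetical sketch of that pattern (it is not the code in main.py; the model and batch sizes are placeholders), showing how each process drives a single GPU and how the global batch is split across replicas.

# Minimal sketch of the 1 GPU : 1 process DistributedDataParallel pattern
# (illustrative only -- not the actual code in main.py).
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def run_replica(rank: int, world_size: int,
                master: str = "127.0.0.1", port: int = 29301) -> None:
    # Every replica joins the same process group via the master address/port.
    dist.init_process_group(backend="nccl",
                            init_method=f"tcp://{master}:{port}",
                            rank=rank, world_size=world_size)

    # One process drives exactly one GPU.
    device = torch.device("cuda", rank % torch.cuda.device_count())
    torch.cuda.set_device(device)

    model = torch.nn.Linear(2048, 256).to(device)   # stand-in for the BYOL networks
    model = DDP(model, device_ids=[device.index])

    # A global batch of 4096 split over 64 replicas is 64 samples per GPU.
    per_gpu_batch = 4096 // world_size
    x = torch.randn(per_gpu_batch, 2048, device=device)
    loss = model(x).pow(2).mean()
    loss.backward()                                  # gradients are all-reduced by DDP

    dist.destroy_process_group()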

Usage Single GPU

NOTE0: this will not produce SOTA results, but it is useful for debugging. The authors use a batch size of 4096+ for SOTA.
NOTE1: Set up your GitHub SSH keys; if the git clone below fails with an authentication error, this is the most likely cause.

> git clone --recursive git+ssh://git@github.com/jramapuram/BYOL.git
# DATADIR is the location of imagenet or anything that works with imagefolder.
> ./docker/run.sh "python main.py --data-dir=$DATADIR \  
                                  --batch-size=64 \  
                                  --num-replicas=1 \  
                                  --epochs=100" 0  # add --debug-step to do a single minibatch

The bash script docker/run.sh pulls the appropriate docker container.
If you want to set up your own environment, use:

  • environment.yml (conda) in addition to
  • requirements.txt (pip)

or just take a look at the Dockerfile in docker/Dockerfile.

Usage SLURM

Set things up according to the SLURM bash script, then:

> cd slurm && sbatch run.sh

Usage custom cluster / AWS, etc

  1. Start each replica worker pointing to the master using --distributed-master=.
  2. Set the total number of replicas appropriately using --num-replicas=.
  3. Set each node to have a unique --distributed-rank= ranging from [0, num_replicas).
  4. Ensure network connectivity between workers. You will get NCCL errors if there are resolution problems here.
  5. Profit.

For example, with a 2 node setup run the following on the master node:

python main.py \
     --epochs=100 \
     --data-dir=<YOUR_DATA_DIR> \
     --batch-size=128 \                   # divides into 64 per node
     --convert-to-sync-bn \
     --visdom-url=http://MY_VISDOM_URL \  # optional, not providing uses tensorboard
     --visdom-port=8097 \                 # optional, not providing uses tensorboard
     --num-replicas=2 \                   # specifies total available nodes, 2 in this example     
     --distributed-master=127.0.0.1 \
     --distributed-port=29301 \
     --distributed-rank=0 \               # rank-0 is the master
     --uid=byolv00_0

and the following on the child node:

export MASTER=<IP_ADDR_OF_MASTER_ABOVE>
python main.py \
     --epochs=100 \
     --data-dir=<YOUR_DATA_DIR> \
     --batch-size=128 \                   # divides into 64 per node
     --convert-to-sync-bn \
     --visdom-url=http://MY_VISDOM_URL \  # optional, not providing uses tensorboard
     --visdom-port=8097 \                 # optional, not providing uses tensorboard
     --num-replicas=2 \                   # specifies total available nodes, 2 in this example
     --distributed-master=$MASTER \
     --distributed-port=29301 \
     --distributed-rank=1 \               # rank-1 is this child, increment for extra nodes
     --uid=byolv00_0

Setup data

Grab ImageNet, do the standard pre-processing, and use --data-dir=${DATA_DIR}. Note: this BYOL implementation expects two pytorch ImageFolder locations, train and test, rather than the standard val split.
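
For reference, here is a minimal sketch of the expected on-disk layout and how a standard torchvision ImageFolder would read it (the directory names are assumptions based on the note above, not a prescription from this repo):

# Sketch of the expected ImageFolder layout (directory names assumed from the note above):
#   ${DATA_DIR}/train/<class_name>/*.JPEG
#   ${DATA_DIR}/test/<class_name>/*.JPEG   # i.e. the usual "val" split renamed to "test"
import os
from torchvision import datasets, transforms

data_dir = os.environ.get("DATADIR", "/path/to/imagenet")
to_tensor = transforms.Compose([transforms.Resize(256),
                                transforms.CenterCrop(224),
                                transforms.ToTensor()])

train_set = datasets.ImageFolder(os.path.join(data_dir, "train"), transform=to_tensor)
test_set = datasets.ImageFolder(os.path.join(data_dir, "test"), transform=to_tensor)
print(len(train_set), len(test_set))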

FP16 support

If you have GPUs that work well with FP16, you can try the --half flag.
This allows faster training with larger batch sizes (~95 per GPU with 12 GB of GPU memory).
If training doesn't work well, try changing the AMP optimization level.
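
The AMP optimization level is presumably an NVIDIA apex opt_level (O0-O3); a hypothetical sketch of switching it is below (this assumes apex is the AMP backend behind --half, and the model/optimizer are placeholders rather than the repo's objects).

# Hypothetical sketch of changing the apex AMP optimization level
# (assumes NVIDIA apex; model/optimizer below are placeholders).
import torch
from apex import amp

model = torch.nn.Linear(2048, 256).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# "O1" patches torch functions for mixed precision; "O2" casts the model to FP16
# while keeping FP32 master weights. Swap the level here if training is unstable.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

x = torch.randn(64, 2048).cuda()
loss = model(x).pow(2).mean()
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()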

IO bound / Slow data processing?

Try increasing --workers-per-replica for data loading, or place your dataset on a drive with higher IOPS.
Optionally, you can also try the Nvidia DALI image-loading backend by specifying --task=dali_multi_augment_image_folder. However, the DALI path is missing the grayscale and gaussian blur augmentations, so model performance might be degraded.
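
For reference, the grayscale and gaussian-blur augmentations that the DALI path lacks correspond to standard torchvision transforms. Here is a rough sketch of that part of a BYOL-style augmentation pipeline; the parameters follow common BYOL defaults and are assumptions, not necessarily this repo's exact values.

# Rough sketch of the augmentations missing from the DALI path (torchvision equivalents).
# Probabilities/parameters are typical BYOL-style defaults, not necessarily this repo's values.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomApply([transforms.ColorJitter(0.4, 0.4, 0.2, 0.1)], p=0.8),
    transforms.RandomGrayscale(p=0.2),                          # missing in the DALI path
    transforms.RandomApply(                                     # missing in the DALI path
        [transforms.GaussianBlur(kernel_size=23, sigma=(0.1, 2.0))], p=1.0),
    transforms.ToTensor(),
])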

Visualize results

This implementation supports tensorboard and visdom.
Omitting the --visdom-url and --visdom-port args defaults to tensorboard (which stores logs in ./runs).

Citation

Cite the original authors for their great work:

@article{DBLP:journals/corr/abs-2006-07733,
  author    = {Jean{-}Bastien Grill and
               Florian Strub and
               Florent Altch{\'{e}} and
               Corentin Tallec and
               Pierre H. Richemond and
               Elena Buchatskaya and
               Carl Doersch and
               Bernardo {\'{A}}vila Pires and
               Zhaohan Daniel Guo and
               Mohammad Gheshlaghi Azar and
               Bilal Piot and
               Koray Kavukcuoglu and
               R{\'{e}}mi Munos and
               Michal Valko},
  title     = {Bootstrap Your Own Latent: {A} New Approach to Self-Supervised Learning},
  journal   = {CoRR},
  volume    = {abs/2006.07733},
  year      = {2020},
  url       = {https://arxiv.org/abs/2006.07733},
  archivePrefix = {arXiv},
  eprint    = {2006.07733},
  timestamp = {Wed, 17 Jun 2020 14:28:54 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2006-07733.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Like this replication? Buy me a beer.
