🔥 Our paper "Sparse Mixture-of-Experts are Domain Generalizable Learners" has been accepted to ICLR 2023 as an oral presentation.
🔥 The GMoE-S/16 model currently ranks first on multiple DG datasets without extra pre-training data. (GMoE-S/16 is initialized from DeiT-S/16, which was pretrained only on ImageNet-1K 2012.)
Wondering why GMoE performs so well? 🤯 Let's investigate the generalization ability of the model architecture itself and see the great potential of the sparse Mixture-of-Experts (MoE) architecture.
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
python3 -m pip uninstall tutel -y
python3 -m pip install --user --upgrade git+https://github.com/microsoft/tutel@main
pip3 install -r requirements.txt
python3 -m domainbed.scripts.download \
--data_dir=./domainbed/data
Environment details used in the paper for the main experiments on an NVIDIA V100 GPU.
Environment:
Python: 3.9.12
PyTorch: 1.12.0+cu116
Torchvision: 0.13.0+cu116
CUDA: 11.6
CUDNN: 8302
NumPy: 1.19.5
PIL: 9.2.0
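If you want to confirm that your local setup matches the versions above, a minimal sanity check such as the following can help. It is not part of the repo, just an illustrative snippet using standard version attributes:

```python
# Minimal environment check (illustrative, not part of the repo):
# print the versions listed above so you can compare against your installation.
import sys
import torch
import torchvision
import numpy
import PIL

print("Python:", sys.version.split()[0])                  # expect 3.9.12
print("PyTorch:", torch.__version__)                      # expect 1.12.0+cu116
print("Torchvision:", torchvision.__version__)            # expect 0.13.0+cu116
print("CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)                # expect 11.6
print("cuDNN version:", torch.backends.cudnn.version())   # expect 8302
print("NumPy:", numpy.__version__)                        # expect 1.19.5
print("PIL:", PIL.__version__)                            # expect 9.2.0
```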
Train a model:
python3 -m domainbed.scripts.train \
    --data_dir=./domainbed/data/OfficeHome/ \
    --algorithm GMOE \
    --dataset OfficeHome \
    --test_env 2
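To evaluate all target domains, you can simply repeat the command above for every held-out environment. The hypothetical driver below (not part of the repo) reuses only the flags shown above; OfficeHome has four domains (Art, Clipart, Product, Real-World), indexed 0-3:

```python
# Hypothetical helper (not part of the repo): leave-one-domain-out training on
# OfficeHome by invoking the training script once per held-out test_env.
import subprocess

for test_env in range(4):  # OfficeHome domains: Art, Clipart, Product, Real-World
    subprocess.run(
        [
            "python3", "-m", "domainbed.scripts.train",
            "--data_dir=./domainbed/data/OfficeHome/",
            "--algorithm", "GMOE",
            "--dataset", "OfficeHome",
            "--test_env", str(test_env),
        ],
        check=True,
    )
```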
We put the hparams for each dataset in `./domainbed/hparams_registry.py`. Basically, you just need to choose `--algorithm` and `--dataset`; the optimal hparams will be loaded accordingly.
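If you want to inspect which hparams will be loaded for a given run, the registry can be queried directly. The sketch below assumes this fork keeps DomainBed's `default_hparams(algorithm, dataset)` helper:

```python
# Sketch assuming the DomainBed-style registry API: look up the hparams that
# the training script would load for a given (algorithm, dataset) pair.
from domainbed import hparams_registry

hparams = hparams_registry.default_hparams("GMOE", "OfficeHome")
for name, value in sorted(hparams.items()):
    print(f"{name}: {value}")
```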
This source code is released under the MIT license, included here.
The MoE module is built on Tutel MoE.
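For reference, constructing a sparse MoE layer with Tutel looks roughly like the sketch below. Parameter names follow the Tutel README; the dimensions and expert count are illustrative assumptions, not a description of GMoE's internals:

```python
# Rough sketch of a Tutel top-2 MoE layer (parameter names per the Tutel README;
# illustrative only, not the GMoE implementation itself).
import torch
from tutel import moe as tutel_moe

moe_layer = tutel_moe.moe_layer(
    gate_type={'type': 'top', 'k': 2},       # top-2 sparse routing
    model_dim=384,                            # e.g. DeiT-S/16 embedding dim
    experts={
        'type': 'ffn',
        'count_per_node': 6,                  # number of local experts (illustrative)
        'hidden_size_per_expert': 1536,       # 4x model_dim, as in a standard ViT MLP
        'activation_fn': lambda x: torch.nn.functional.gelu(x),
    },
)

x = torch.randn(8, 196, 384)                  # (batch, tokens, model_dim)
y = moe_layer(x)                              # same shape as x, routed through experts
```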