C++/MPI proxies for distributed training of deep neural networks, including ResNet-50, ResNet-152, BERT-large, CosmoFlow, GPT-2, GPT-3, etc. These proxies cover data parallelism, operator parallelism, pipeline parallelism, MoE, and hybrid parallelism, which combines the aforementioned parallel modes.
Compile and run, for example the GPT-2 proxy on 32 processes:
mpicxx gpt2_large.cpp -o gpt2
mpirun -n 32 ./gpt2
Set the number of Transformer layers and the number of pipeline stages with command-line arguments:
mpirun -n 32 ./gpt2 64 8
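As a rough sketch of how such a configuration could be divided among processes, the helper below maps a rank to a pipeline stage and a data-parallel replica. It assumes that the two arguments are the layer count followed by the stage count, that layers are split evenly across stages, that processes beyond one pipeline form data-parallel replicas, and that consecutive ranks hold consecutive stages. The struct `RankPlacement` and the function `place_rank` are hypothetical names for illustration; the actual layout used by `gpt2_large.cpp` may differ.

```cpp
#include <cassert>

// Hypothetical description of where one MPI rank sits in a hybrid
// pipeline-parallel + data-parallel layout.
struct RankPlacement {
    int stage;        // pipeline stage index, 0 .. num_stages - 1
    int replica;      // data-parallel replica index, 0 .. world_size / num_stages - 1
    int first_layer;  // index of the first Transformer layer held by this stage
    int num_layers;   // number of Transformer layers held by this stage
};

// Map a rank to its placement.
// Assumptions (not taken from the repository): world_size and total_layers
// are both divisible by num_stages, and consecutive ranks hold consecutive
// stages of the same replica.
RankPlacement place_rank(int rank, int world_size, int total_layers, int num_stages) {
    assert(num_stages > 0);
    assert(world_size % num_stages == 0);
    assert(total_layers % num_stages == 0);
    assert(rank >= 0 && rank < world_size);

    const int layers_per_stage = total_layers / num_stages;

    RankPlacement p;
    p.stage = rank % num_stages;
    p.replica = rank / num_stages;
    p.first_layer = p.stage * layers_per_stage;
    p.num_layers = layers_per_stage;
    return p;
}
```

Under these assumptions, 32 processes with 64 layers and 8 stages give 4 data-parallel replicas of an 8-stage pipeline, with 8 layers per stage. For example, `place_rank(13, 32, 64, 8)` yields stage 5 of replica 1, holding layers 40 through 47.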