Repository preparation
git clone https://github.com/boat1603/SuperAI_LLM_FineTune.git
cd ./SuperAI_LLM_FineTune
ml Mamba
conda create -p ./env python=3.10.0 -y
conda activate ./env
pip install -e .
ml Apptainer
apptainer build ./llm-finetune.sif docker://boat1603/llm-finetune:latest
sbatch submit_multinode.sh
for Apptainer
sbatch submit_multinode_apptainer.sh
Note:
- Change training config via
./smultinode.sh
or./smultinode_apptainer.sh
(for apptainer). - When using Deepspeed training Scheduler will follow the Deepspeed config.
- You can setup training spec in
./submit_multinode.sh
orsubmit_multinode_apptainer.sh
following our guideline.
sbatch ./submit_zero_to_fp32.sh