
Missing file and a question regarding DPO training #2

Open
SachinVashisth opened this issue Jan 17, 2025 · 2 comments

Comments

@SachinVashisth

Hi

Missing File
I am trying to run Step 2.3, i.e., sampling pseudo-labels from the DPO model, using the commands:

ARGS='+data.split="train" eval.mode="sampling" eval.sampling.max_seed=3'
torchrun --nproc_per_node 2 greedy_decode.py --config-name=dpo-1 $ARGS
python3 eval_sampling.py --config-name=dpo-1 $ARGS
python3 utils/make_rft_data.py --config-name=dpo-1

But the file greedy_decode.py seems to be missing. Could you please provide it?

Regarding DPO training
In the paper, it is mentioned that training was done on a single NVIDIA A40 GPU.
I am currently working on a remote server with two NVIDIA A40 GPUs (48 GB of CUDA memory each).

But when I ran the commands given in Step 2.2 (Train SFT model with DPO objective), I received an out-of-memory error. I was only able to train after changing the following variables in dpo-1.yaml:

per_device_train_batch_size: 3
per_device_eval_batch_size: 2

eval:
  per_device_eval_batch_size: 48

However, I want to note that the versions of the trl and transformers libraries specified in the requirements file did not work for me. These versions worked instead:

trl==0.13.0
transformers==4.46.0
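For reproducibility, the working pins above could be recorded in a requirements fragment like the following (a sketch only; the repo's actual requirements file pins other versions and additional packages):

```
# versions reported to work on 2x A40 in this thread;
# these differ from the pins in the repo's requirements file
trl==0.13.0
transformers==4.46.0
```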
@TianduoWang
Owner

Hi,

Thanks for your information!

Regarding the missing file: greedy_decode.py is from a previous version and is now deprecated. You can use generate.py to generate the pseudo-labels instead.

Regarding DPO training: I don't think we state in the paper that training is performed on a single A40 GPU. We actually use a server with 8 A40 GPUs for training. A single A40 GPU is only mentioned in Figure 4, which illustrates the inference speedup.

@Gank0078

Hi, I have a similar question. I am running the code on 4 A6000 GPUs (48 GB of memory each) and would like to know how to change the config files so that I can train the model.
So far, I have tried reducing per_device_train_batch_size in sft-0.yaml and setting num_processes in fsdp.yaml to 4, but I am still encountering out-of-memory errors. Are there any other parameters that should be adjusted to reduce memory usage during training?
Additionally, as far as I know, the model seems to have been fully fine-tuned. I am curious why methods like LoRA were not used for fine-tuning instead. Thank you!
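On the memory question above: besides the per-device batch size, a common knob is gradient accumulation, which keeps the effective (global) batch size fixed while shrinking the per-device batch, trading memory for extra accumulation steps. A minimal arithmetic sketch (the parameter names mirror common Hugging Face Trainer options; the actual keys in this repo's configs may differ, and the concrete numbers below are hypothetical):

```python
# Effective batch size = per-device batch x number of GPUs x accumulation steps.
# To cut per-GPU memory without changing the optimization dynamics, lower
# per_device_train_batch_size and raise gradient_accumulation_steps so that
# their product stays constant.

def effective_batch_size(per_device_train_batch_size: int,
                         num_gpus: int,
                         gradient_accumulation_steps: int) -> int:
    return per_device_train_batch_size * num_gpus * gradient_accumulation_steps

# Hypothetical original setting: batch 8 per GPU, 4 GPUs, no accumulation.
original = effective_batch_size(8, 4, 1)   # 32

# Memory-saving setting: batch 2 per GPU, 4 GPUs, accumulate over 4 steps.
reduced = effective_batch_size(2, 4, 4)    # 32

print(original, reduced)  # same effective batch size, lower per-GPU memory
```

Enabling gradient checkpointing (if the training config exposes it) is another standard way to reduce activation memory at the cost of extra compute.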
