
Multi-GPU Training with DataParallel Results in RuntimeError #1

Open
ronigold opened this issue Sep 21, 2023 · 1 comment
Description
I am trying to perform multi-GPU training using the DataParallel wrapper from PyTorch. When I try to run the fit method, I encounter a RuntimeError saying that the parameters and buffers must be on the same device.

Here's a snippet of the code that I am using:

import torch
from torch import nn

# Initialize learner and model
learn = Learner(...)
learn.model = ...

# Attempt to use DataParallel
model = nn.DataParallel(learn.model, device_ids=[1, 2, 3])
learn.model = model

# Update DataLoader device
learn.dls.device = torch.device("cuda:1")

# Clear cache
torch.cuda.empty_cache()

# Start training
learn.fit(1)

Error Message
The error message I receive is:

RuntimeError: module must have its parameters and buffers on device cuda:1 (device_ids[0]) but found one of them on device: cuda:3

Environment
PyTorch version: (e.g., 1.9.0)
Library version: (e.g., 0.2.0)
CUDA/cuDNN version: (e.g., CUDA 11.8, cuDNN 8.2.1)
GPU models and configuration: (e.g., 4x Tesla T4)
Operating System: (e.g., Ubuntu 18.04)

Additional Context
I've tried to set both the model and the DataLoader to the same device, but without success. It seems like the model parameters and the DataLoader batches end up on different devices during training, which causes the error.

Would appreciate any guidance on how to resolve this issue or if it's something that needs to be addressed in the library.
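
For reference, here is a minimal sketch of the setup I would expect to satisfy DataParallel's requirement that the module's parameters and buffers already live on device_ids[0] (cuda:1 here) before wrapping. It reuses the Learner and dls names from the snippet above; the move-before-wrap step is my assumption about the cause, not a confirmed fix:

import torch
from torch import nn

primary = torch.device("cuda:1")

# Move the whole model to device_ids[0] *before* wrapping it.
learn.model = learn.model.to(primary)

# Wrap with DataParallel; replicas go on cuda:1-3, outputs gathered on cuda:1.
learn.model = nn.DataParallel(
    learn.model,
    device_ids=[1, 2, 3],
    output_device=1,
)

# Keep the DataLoader batches on the same primary device.
learn.dls.device = primary

learn.fit(1)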


ronigold commented Sep 21, 2023

Update:

I was able to run training as shown in the notebook, not on multiple GPUs but on a single GPU with 16 GB of memory, by adding quantization to the model:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization config
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model_id = 'meta-llama/Llama-2-7b-hf'

llama_base = AutoModelForCausalLM.from_pretrained(
    model_id,
    low_cpu_mem_usage=True,
    torch_dtype=torch.bfloat16,
    use_cache=False,
    token=TOKEN,  # Add your token here
    quantization_config=nf4_config,
)

I've looked into the base code a bit, but I'd like to make sure: when I call the fit method, does a plain training loop run behind the scenes, i.e. not DeepSpeed- or LoRA-based?
Because it's quite surprising that I was able to train on a single GPU just by adding the quantization.
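
For context on why the 4-bit load fits, a rough back-of-the-envelope estimate (assuming the ~7B parameter count of Llama-2-7b-hf; exact overheads depend on sequence length, activations, and which parameters actually receive gradients and optimizer state):

# Ballpark memory estimate for the Llama-2-7B weights (assumed ~7e9 parameters).
# These are rough numbers for the weights only, not measurements from the library.
n_params = 7e9

bf16_gb = n_params * 2 / 1024**3    # bf16: 2 bytes per weight   -> ~13 GB
nf4_gb  = n_params * 0.5 / 1024**3  # NF4: ~0.5 bytes per weight -> ~3.3 GB

print(f"bf16 weights: ~{bf16_gb:.1f} GB")
print(f"NF4 weights:  ~{nf4_gb:.1f} GB")

# Full-precision fine-tuning would additionally need gradients and optimizer
# state for every trainable parameter, which is why it rarely fits in 16 GB,
# while a 4-bit base model leaves far more headroom on a single GPU.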
