
[BUG] DS Inference Bloom OOM / get_sd_loader_json() missing 1 argument #2222

Closed
oborchers opened this issue Aug 16, 2022 · 7 comments
Labels: bug (Something isn't working), inference


oborchers commented Aug 16, 2022

Describe the bug

When loading the BLOOM model, the loader tries to allocate too much memory on a single GPU, so the script OOMs.

I think this is also related to: #2169

Hardware

5x A100 80GB + 512 GB RAM (should be enough, as the same setup works with plain accelerate).

To Reproduce

Copied and mildly adapted from

Run with:

deepspeed --num_gpus 5 script.py

(DeepSpeed is built from the main branch.)

Case 1:

# imports as in the adapted script; this snippet runs inside a class method,
# so self.config and ds_config are defined elsewhere
import torch
import deepspeed
import torch.distributed as dist
from transformers import AutoModelForCausalLM
from transformers.deepspeed import HfDeepSpeedConfig

dtype = torch.float16
dschf = HfDeepSpeedConfig(ds_config)

with deepspeed.OnDevice(dtype=dtype, device='meta'):
    model = AutoModelForCausalLM.from_config(self.config, torch_dtype=torch.bfloat16)
model = model.eval()

ds_engine = deepspeed.initialize(model=model, config_params=ds_config)[0]
ds_engine.module.eval()
model = ds_engine.module

dist.barrier()

model = deepspeed.init_inference(model,
                                 mp_size=1,
                                 dtype=dtype,
                                 checkpoint='/workspace/bloom-176B.json',
                                 replace_with_kernel_inject=True
                                 )
self.model = model.module

This results in:

RuntimeError: CUDA out of memory. Tried to allocate 1.53 GiB (GPU 2; 79.17 GiB total capacity; 76.57 GiB already allocated; 932.00 MiB free; 77.14 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Note that this is CUDA device 2 of 5!
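
For context, the checkpoint descriptor passed above as checkpoint='/workspace/bloom-176B.json' follows the {"type", "checkpoints", "version"} format used by the BigScience inference script; a minimal sketch of how it can be generated (the shard directory /workspace/bloom is a placeholder):

import json
from pathlib import Path

def write_checkpoints_json(model_path, out_file):
    # DS-Inference expects a JSON descriptor listing every checkpoint shard
    shards = sorted(str(p) for p in Path(model_path).glob("*.bin"))
    with open(out_file, "w", encoding="utf-8") as f:
        json.dump({"type": "BLOOM", "checkpoints": shards, "version": 1.0}, f)

write_checkpoints_json("/workspace/bloom", "/workspace/bloom-176B.json")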

Case 2:

Based on #2169

injection_policy={transformers.models.bloom.modeling_bloom.BloomBlock: ('self_attention.dense', 'mlp.dense_4h_to_h')}
replace_with_kernel_inject=False

results in:

    model.load()
  File "/workspace/bloom.py", line 106, in load
    injection_policy={transformers.models.bloom.modeling_bloom.BloomBlock: ('self_attention.dense', 'mlp.dense_4h_to_h')}
  File "/opt/conda/lib/python3.7/site-packages/deepspeed/__init__.py", line 312, in init_inference
    save_mp_checkpoint_path)
  File "/opt/conda/lib/python3.7/site-packages/deepspeed/inference/engine.py", line 104, in __init__
    self._load_checkpoint(self.checkpoint)
  File "/opt/conda/lib/python3.7/site-packages/deepspeed/inference/engine.py", line 394, in _load_checkpoint
    sd_loader = SDLoaderFactory.get_sd_loader_json(load_dir)
TypeError: get_sd_loader_json() missing 1 required positional argument: 'checkpoint_engine'
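
(For completeness, bloom.py line 106 is the init_inference call; reconstructed roughly, with everything else as in case 1:)

model = deepspeed.init_inference(model,
                                 mp_size=1,
                                 dtype=dtype,
                                 checkpoint='/workspace/bloom-176B.json',
                                 injection_policy={transformers.models.bloom.modeling_bloom.BloomBlock: ('self_attention.dense', 'mlp.dense_4h_to_h')},
                                 replace_with_kernel_inject=False
                                 )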

Tagging @RezaYazdaniAminabadi because I know of his active involvement in this one.

oborchers added the bug label on Aug 16, 2022
@felix-schneider

For case 2: commit 556f005 changed get_sd_loader_json so that it now requires a checkpoint_engine argument, but the call site in inference/engine.py was not updated to pass one. I didn't dive deep into the code, but the argument does seem to be required. Maybe one of the authors of that commit @RezaYazdaniAminabadi @jeffra @tjruwase can help with this.
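
For reference, a sketch of what the failing call presumably needs now (untested; assuming the default torch checkpoint engine is the right choice here):

# deepspeed/inference/engine.py, in _load_checkpoint (sketch, untested)
from deepspeed.runtime.checkpoint_engine.torch_checkpoint_engine import TorchCheckpointEngine

sd_loader = SDLoaderFactory.get_sd_loader_json(load_dir, TorchCheckpointEngine())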

@oborchers (Author)

I was able to retest on 8x A100. Same issue, unfortunately. It works just fine with accelerate, though. I will try to rebuild from scratch and test again. The failure occurs when launching with deepspeed x.py, not with python x.py.

@oborchers (Author)

The error seems to have been on my side for case 1. I was able to run https://github.com/bigscience-workshop/Megatron-DeepSpeed/blob/main/scripts/inference/bloom-ds-inference.py

@RezaYazdaniAminabadi (Contributor)

Hi @oborchers,
Thanks for the update; happy to see that case 1 is resolved. Unfortunately, case 2 does not work for this model right now. We will try to bring this support back after fixing some issues with checkpoint loading.
Thanks,
Reza


asafkar commented Feb 1, 2023

Hi @RezaYazdaniAminabadi,
do you have an ETA for case 2?
As far as I understand, without such a fix there is currently no way to use DS inference without kernel injection, right?

@RezaYazdaniAminabadi (Contributor)

Hi @asafkar,
I am afraid it is not easy to do that without kernels, since it requires some changes on the BLOOM attention side to deal with the alibi tensor. May I ask why this feature is important to you and why you cannot use the kernel-injection method?
Thanks,
Reza


asafkar commented Feb 2, 2023

Hi @RezaYazdaniAminabadi ,
I need this feature in order to run on non-GPU accelerators (mainly to use the TP/PP capabilities).
Currently, this does not work for CPUs.

Currently, it seems that the engine's _load_checkpoint function does not work in this case; perhaps it could be skipped by instead calling module_inject's load_model_with_checkpoint later on (i.e., by passing the checkpoint down to the _apply_injection_policy function), roughly as sketched below.
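
Roughly what I have in mind (pseudocode only; the checkpoint keyword on _apply_injection_policy is hypothetical):

# inside InferenceEngine.__init__ (pseudocode, untested)
if checkpoint is not None and not replace_with_kernel_inject:
    # skip self._load_checkpoint(checkpoint), which currently fails here, and
    # instead thread the checkpoint through the injection path, so that
    # module_inject's load_model_with_checkpoint loads weights per replaced module
    self._apply_injection_policy(injection_policy, checkpoint=checkpoint)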

Thanks
