Device map feature for maestro models -qwen_2.5, florence_2 & paligemma_2 #179

AmazingK2k3 · 2025-03-01T15:00:10Z

Description

As discussed in this issue https://github.com/roboflow/maestro/issues/176, this PR implements the device map feature for loading all 3 models. No change in dependencies is required.

The 'device' hyperparameter was replaced by 'device_map' to stay consistent with Hugging Face and avoid confusion. For the Florence 2 model, the device map does not accept a dict input, e.g. {"": "cuda:0"}, and 'auto' assigns the model to an available device using the existing parse_device_spec() function.
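The Florence 2 behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `resolve_florence_device` and its `pick_default` parameter are hypothetical names, and `pick_default` only stands in for whatever parse_device_spec() does when asked for an automatic choice.

```python
from typing import Callable, Union


def resolve_florence_device(
    device_map: Union[str, dict, None],
    pick_default: Callable[[], str] = lambda: "cpu",
) -> str:
    """Hypothetical helper: reduce a device_map value to one device string."""
    # Per the description above, dict-style maps are not accepted for Florence 2.
    if isinstance(device_map, dict):
        raise ValueError(
            "Florence 2 does not accept a dict device_map; "
            "pass a string such as 'cuda:0' or 'auto'."
        )
    # None and 'auto' both defer to the automatic choice (an assumption for None).
    if device_map is None or device_map == "auto":
        return pick_default()
    # Any other string is taken as an explicit device name.
    return device_map
```

Rejecting the dict early keeps the error close to the call site instead of surfacing later during loading.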

For Qwen 2.5 and PaliGemma 2, the device map is passed directly to from_pretrained when loading the model, with the default set to 'auto'.

The docstring for the load_model() function for all 3 model checkpoints was updated to reflect the changes.

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)

Testing

Tested loading each model in a cloud environment with the device map set to 'auto', 'cuda', and 'cpu'. All cases passed.

I have read the CLA Document and I sign the CLA.


CLAassistant · 2025-03-01T15:00:16Z

All committers have signed the CLA.

SkalskiP · 2025-03-06T17:00:32Z

Hi @AmazingK2k3 👋🏻 thank you so much for your PR. Could you please explain why you decided to drop the device argument? I'm looking at the #176 issue and if I remember correctly, we wanted to keep the device argument and add device_map allowing for:

Load on CPU

processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device="cpu"
)

Load on MPS

processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device="mps"
)

Load on single GPU machine

processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device="cuda:0"
)

Load model on all GPUs

processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device_map="auto"
)

Load model on specific subset of GPUs

processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device_map={"": "cuda:0"}
)

I think device_map alone won't give us the same level of flexibility.
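If both arguments were kept, as in the examples above, the loader would need some rule for combining them. The following is a minimal sketch of one such rule, not something agreed in this thread: it assumes that passing both arguments is treated as an error and that leaving both unset falls back to "auto". `resolve_placement` is a hypothetical name.

```python
from typing import Dict, Optional, Union

DeviceMap = Union[str, Dict[str, str]]


def resolve_placement(
    device: Optional[str] = None,
    device_map: Optional[DeviceMap] = None,
) -> DeviceMap:
    """Hypothetical helper: pick the single value to hand to the model loader."""
    # Assumption: the two arguments are mutually exclusive.
    if device is not None and device_map is not None:
        raise ValueError("Pass either device or device_map, not both.")
    # A plain device wins when it is the only argument given.
    if device is not None:
        return device
    # Otherwise use the explicit map, or fall back to "auto" (assumed default).
    return device_map if device_map is not None else "auto"
```

With this rule, `resolve_placement(device="cpu")` and `resolve_placement(device_map="auto")` never conflict, because only one argument is ever in play.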

AmazingK2k3 · 2025-03-06T18:30:37Z

Hey @SkalskiP, the main reason I dropped the device argument completely is that I felt having two arguments that handle devices, device and device_map, might confuse the user loading the model. For example, if we have both device and device_map and the user is loading the Qwen model, they could set device to cpu but leave device_map as None or auto:

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id_or_path,
    revision=revision,
    trust_remote_code=True,
    device_map=device_map if device_map else "auto",
    torch_dtype=torch.bfloat16,
    cache_dir=cache_dir,
)
model.to(device)

This will ultimately load the model across GPUs even if a specific device is requested, as stated in issue #176.
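The fallback in the snippet above can be isolated in a few lines. This is only an illustration of the `device_map if device_map else "auto"` expression; `planned_placement` is a hypothetical name and no model is loaded.

```python
from typing import Tuple, Union


def planned_placement(
    device: str,
    device_map: Union[str, dict, None],
) -> Tuple[Union[str, dict], str]:
    """Return (value passed as device_map at load time, value passed to model.to)."""
    # Same fallback as in the snippet: an unset device_map becomes "auto".
    load_map = device_map if device_map else "auto"
    return load_map, device


# The user asked for CPU, but the model is still loaded with device_map="auto".
print(planned_placement("cpu", None))  # ('auto', 'cpu')
```

The requested device only comes into play after loading, which is the mismatch described above.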

I felt it would be much simpler to have a single argument dealing with devices. device_map is commonly used in the transformers library as well, and it directly accepts the same values that device does (cpu, mps, cuda:0) and loads the model accordingly. If it is left as None, the model is loaded with device_map set to auto.

Load on CPU

processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device_map="cpu"
)

Load on MPS

processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device_map="mps"
)

Load on single GPU machine

processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device_map="cuda:0"
)

Load model on all GPUs

processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device_map="auto"
)

Load model on a specific subset of GPUs

processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device_map={"": "cuda:0"}  # Not applicable to Florence 2
)

Just one argument device_map for all cases!
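To make the equivalence between the string and dict forms concrete, here is a rough sketch of how the different device_map values could be normalized. It is an illustration only and not the transformers implementation: `normalize_device_map` is a hypothetical name, and both the list of strategy names and the None fallback are assumptions.

```python
from typing import Dict, Union

# Placement strategy names that are passed through untouched (assumed set).
STRATEGIES = {"auto", "balanced", "balanced_low_0", "sequential"}


def normalize_device_map(
    device_map: Union[str, Dict[str, str], None],
) -> Union[str, Dict[str, str]]:
    """Hypothetical helper: map every accepted form to a strategy name or a dict."""
    # None falls back to "auto", matching the default described above.
    if device_map is None:
        return "auto"
    # Dicts are already explicit module-to-device maps.
    if isinstance(device_map, dict):
        return device_map
    # Strategy names are left for the loader to interpret.
    if device_map in STRATEGIES:
        return device_map
    # A plain device string means "put the whole model on this device".
    return {"": device_map}
```

Under these assumptions, `device_map="cuda:0"` and `device_map={"": "cuda:0"}` end up describing the same placement.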

Let me know if this is okay or if there is a better way to go about it.

Commit - Device map feature for maestro models -qwen_2.5, florence_2 …

ac25e38

…& paligemma_2

fix(pre_commit): 🎨 auto format pre-commit hooks

1a42148

AmazingK2k3 changed the title ~~Commit - Device map feature for maestro models -qwen_2.5, florence_2 …~~ Device map feature for maestro models -qwen_2.5, florence_2 & paligemma_2 Mar 3, 2025
