Overview
Currently, the NVIDIA device plugin registers GPUs as a single generic resource (e.g., nvidia.com/gpu), which makes it difficult to target specific GPU models in heterogeneous clusters. In environments where nodes may have different GPU models (e.g., a mix of RTX 3090s and RTX 3070s), users often want to request a particular model regardless of the node, or even request multiple GPUs of a specific model (for example, 3× 3070s).
Problem Statement
With the current implementation, Kubernetes cannot differentiate between GPU models during scheduling. This limitation forces users to rely on node labels and manual scheduling, which does not scale well for workloads that require:
A specific GPU model (e.g., RTX 3090)
A precise count of a specific GPU model (e.g., 3× RTX 3070)
Proposed Solution
Enhance the NVIDIA device plugin to advertise GPUs as extended resources based on their model instead of just as a generic GPU resource. For instance:
An RTX 3090 could be advertised as nvidia.com/gpu-3090
An RTX 3070 could be advertised as nvidia.com/gpu-3070
This change would involve:
Modifying the Device Enumeration:
Update the ListAndWatch functionality to detect each GPU's model (and potentially other relevant attributes such as memory capacity) and register it under a resource name that reflects that model; a minimal sketch of such a mapping appears after this list.
Updating the Allocation Logic:
Modify the Allocate method to interpret the requested extended resource (e.g., nvidia.com/gpu-3090) and set the appropriate environment variables (such as NVIDIA_VISIBLE_DEVICES) so that only the intended GPUs are exposed to the container; see the second sketch after this list.
Ensuring Flexibility:
Allow users to request a count of a specific model (e.g., specifying nvidia.com/gpu-3070: 3 in the pod spec) to support workloads requiring multiple GPUs of the same type.
Maintaining Backward Compatibility:
Consider introducing a feature flag or opt-in configuration so that existing deployments using the generic nvidia.com/gpu resource remain unaffected unless users explicitly enable model-specific scheduling; a hypothetical flag is sketched after this list as well.
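To make the enumeration change concrete, here is a minimal Go sketch of how a per-model resource name could be derived from the product name reported by the driver, and how devices could be grouped under those names. The helper names (modelResourceName, groupByResource) and the device literals are hypothetical; in the actual plugin the UUIDs and product names would come from NVML during ListAndWatch, and naming rules would need more care than this.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// modelNumber matches a trailing model designation such as "3090" or "3070".
var modelNumber = regexp.MustCompile(`^[0-9]{3,4}$`)

// modelResourceName is a hypothetical helper: it derives an extended-resource
// name from the product name reported by the driver, e.g.
// "NVIDIA GeForce RTX 3090" -> "nvidia.com/gpu-3090".
func modelResourceName(productName string) string {
	fields := strings.Fields(productName)
	if n := len(fields); n > 0 && modelNumber.MatchString(fields[n-1]) {
		return "nvidia.com/gpu-" + fields[n-1]
	}
	// Fall back to a slug of the whole product name for models without a
	// trailing number (e.g. "Tesla T4").
	slug := strings.ToLower(strings.Join(fields, "-"))
	return "nvidia.com/gpu-" + strings.TrimPrefix(slug, "nvidia-")
}

// groupByResource buckets device UUIDs under their per-model resource name so
// that one device-plugin endpoint could be registered per model.
func groupByResource(products map[string]string) map[string][]string {
	grouped := map[string][]string{}
	for uuid, product := range products {
		name := modelResourceName(product)
		grouped[name] = append(grouped[name], uuid)
	}
	return grouped
}

func main() {
	// In the real plugin these UUIDs and product names would be read from
	// NVML during enumeration; the literals below are stand-ins.
	devices := map[string]string{
		"GPU-aaaa": "NVIDIA GeForce RTX 3090",
		"GPU-bbbb": "NVIDIA GeForce RTX 3070",
		"GPU-cccc": "NVIDIA GeForce RTX 3070",
	}
	for resource, uuids := range groupByResource(devices) {
		fmt.Printf("%s -> %d device(s) %v\n", resource, len(uuids), uuids)
	}
}
```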
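The allocation side can stay simple. The sketch below uses simplified stand-in types rather than the real structs from k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1: because each model would be registered as its own extended resource, the kubelet only ever hands that endpoint UUIDs from the matching pool, and the response just exposes them to the container via NVIDIA_VISIBLE_DEVICES.

```go
package main

import (
	"fmt"
	"strings"
)

// Simplified stand-ins for the device plugin API's allocate request/response;
// the real types live in k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1.
type containerAllocateRequest struct {
	DeviceIDs []string // UUIDs chosen by the kubelet from this resource's pool
}

type containerAllocateResponse struct {
	Envs map[string]string
}

// allocate exposes exactly the allocated UUIDs to the container. With
// per-model resources, no extra filtering by model is needed here, since the
// kubelet already picked the UUIDs from the right pool.
func allocate(req containerAllocateRequest) containerAllocateResponse {
	return containerAllocateResponse{
		Envs: map[string]string{
			"NVIDIA_VISIBLE_DEVICES": strings.Join(req.DeviceIDs, ","),
		},
	}
}

func main() {
	// A request for nvidia.com/gpu-3070: 3 would arrive with three UUIDs from
	// the node's 3070 pool (illustrative values).
	resp := allocate(containerAllocateRequest{
		DeviceIDs: []string{"GPU-bbbb", "GPU-cccc", "GPU-dddd"},
	})
	fmt.Println(resp.Envs["NVIDIA_VISIBLE_DEVICES"])
}
```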
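For the opt-in behavior, something as small as a boolean flag could work. The flag name below is purely hypothetical and is not an existing option of the plugin; it only illustrates the idea of keeping the generic nvidia.com/gpu behavior as the default.

```go
package main

import (
	"flag"
	"fmt"
)

func main() {
	// Hypothetical opt-in switch; the plugin's real configuration surface may
	// differ. When false, GPUs keep being advertised as nvidia.com/gpu.
	perModel := flag.Bool("per-model-resources", false,
		"advertise each GPU model as its own extended resource (e.g. nvidia.com/gpu-3090)")
	flag.Parse()
	fmt.Println("per-model resources enabled:", *perModel)
}
```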
Benefits
Granular Scheduling:
Allows precise control over GPU model selection across nodes, enabling workloads to target the appropriate hardware automatically.
Optimized Resource Utilization:
Users can request exactly the hardware they need, avoiding suboptimal GPU allocation in mixed environments.
Scalability:
Simplifies scheduling in heterogeneous clusters, reducing the reliance on manual node labeling and ad-hoc scheduling strategies.
User Flexibility:
Facilitates both single GPU model requests (e.g., one 3090) and multiple GPU requests (e.g., three 3070s) in a straightforward manner.
Example Use Case
A pod can specify the following in its resource limits:
```yaml
resources:
  limits:
    nvidia.com/gpu-3090: 1
```
or
```yaml
resources:
  limits:
    nvidia.com/gpu-3070: 3
```
This ensures that the pod is scheduled onto a node that has the requested GPU model(s) in the requested quantity, without the user having to pin the pod to particular nodes via labels.