
[Doc]: Clarify QLoRA (Quantized Model + LoRA) Support in Documentation #13179

Closed
AlexanderZhk opened this issue Feb 12, 2025 · 8 comments
Labels: documentation (Improvements or additions to documentation)

AlexanderZhk commented Feb 12, 2025

📚 The doc issue

Two parts of the documentation appear to contradict each other, especially at first glance.

Here, it is explicitly stated that LoRA inference with a quantized model is not supported:

##### LORA and quantization
Both are not supported yet! Make sure to open an issue and we'll work on this together with the `transformers` team!

However, here, an example is provided for running offline inference with a quantized model and a LoRA adapter:

This example shows how to use LoRA with different quantization techniques
for offline inference.

To resolve this confusion, it would be very helpful to clarify the following points directly (please correct me if I am mistaken):

  1. QLoRA is supported, but only for offline inference. This means you cannot dynamically load LoRA adapters after loading the quantized base model (a minimal sketch follows this list).
  2. QLoRA is not supported with the OpenAI-compatible server, even for a single LoRA-base model pair.
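
For concreteness, here is a minimal offline sketch of what I mean in point 1. The model name, adapter path, and adapter name are placeholders, and I am assuming the `quantization`, `enable_lora`, and `max_lora_rank` arguments behave as described in the vLLM docs (they may differ between versions):

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Placeholders: any quantized base model (AWQ/GPTQ/bitsandbytes) plus a LoRA
# adapter trained against that base model.
llm = LLM(
    model="path/or/hub-id-of-quantized-base-model",
    quantization="awq",   # match whatever scheme the checkpoint uses
    enable_lora=True,
    max_lora_rank=64,
)

sampling_params = SamplingParams(temperature=0.0, max_tokens=64)

# The adapter is supplied per request as LoRARequest(name, id, path).
outputs = llm.generate(
    ["Give me a short introduction to large language models."],
    sampling_params,
    lora_request=LoRARequest("my_adapter", 1, "path/to/lora-adapter"),
)

for output in outputs:
    print(output.outputs[0].text)
```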

Edit:

It's easy to miss on the docs site that ##### LORA and quantization is a subsection of ### Transformers fallback; that's why I was confused.

### Transformers fallback

#### Supported features
##### LORA and quantization

AlexanderZhk added the documentation label Feb 12, 2025

jeejeelee (Collaborator) commented Feb 13, 2025

I think this means that the Transformers fallback doesn't support these two features. For models integrated with vLLM, we support QLoRA.

BTW, after #13166 landed, I think the Transformers fallback can support LoRA directly. cc @Isotr0py @hmellor

AlexanderZhk (Author)

> For models integrated with vLLM, we support QLoRA.

It would be great if you could point me to a more specific example; my understanding of vLLM/transformers isn't too deep.

Take qwen2, for example: it is integrated (if I understand correctly) in vllm/model_executor/models/qwen2.py.
However, running a quantized qwen2 model with vllm serve is not supported.
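
For reference, the kind of invocation I am talking about looks roughly like this (model and adapter paths are placeholders; the flag names reflect my reading of the vLLM CLI docs and may differ between versions):

```bash
vllm serve path/or/hub-id-of-quantized-qwen2-model \
    --quantization awq \
    --enable-lora \
    --lora-modules my_adapter=path/to/lora-adapter
```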

jeejeelee (Collaborator)

Could you please provide more detailed information, such as logs and errors?

AlexanderZhk (Author)

> Could you please provide more detailed information, such as logs and errors?

That's partly why I created the issue: it does load, but why does the documentation state otherwise? Did it just not get updated? Are there issues we need to be aware of when running QLoRA currently?

[screenshot attached]

hmellor (Member) commented Feb 14, 2025

The documentation does not state otherwise.

The documentation explicitly states that quantisation and LoRA are not compatible together with the Transformers fallback.

AlexanderZhk (Author)

I see now, thanks for clarifying. It's easy to miss on the docs site that ##### LORA and quantization is a subsection of ### Transformers fallback:

### Transformers fallback

#### Supported features
##### LORA and quantization

hmellor (Member) commented Feb 15, 2025

Ok, we should make that clearer. Thank you for the feedback!

hmellor (Member) commented Feb 17, 2025

The documentation change in #12960 should help with this.

hmellor closed this as completed Feb 17, 2025