AWQ is not working #1240

Open
4 tasks
endomorphosis opened this issue Aug 11, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@endomorphosis

System Info

Transformers fails with the following error when trying to use AWQ with TGI, the neural compressor engine, or Optimum Habana:
ValueError: AWQ is only available on GPU
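The error above comes from a hard CUDA check in Transformers' AWQ loading path. A minimal sketch of the kind of validation that produces it (the function name here is illustrative, not the actual Transformers API): on a Gaudi node `torch.cuda.is_available()` is False even though an HPU accelerator is present, so the check fails.

```python
# Hedged sketch: mirrors the style of device check that raises
# "ValueError: AWQ is only available on GPU" when loading an AWQ
# checkpoint with Transformers. The function name is hypothetical.

def validate_awq_environment(cuda_available: bool) -> None:
    """Refuse to load AWQ weights unless a CUDA device is visible.

    On Gaudi/HPU systems, CUDA is not available, so a check like this
    rejects the hardware even though an accelerator exists.
    """
    if not cuda_available:
        raise ValueError("AWQ is only available on GPU")


# Simulate the Gaudi case: no CUDA device visible.
try:
    validate_awq_environment(cuda_available=False)
except ValueError as e:
    print(e)  # AWQ is only available on GPU
```

This is why the failure occurs regardless of whether the AWQ kernels could in principle run on the device: the gate is on CUDA availability, not on accelerator capability.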

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

.

Expected behavior

.

@endomorphosis endomorphosis added the bug Something isn't working label Aug 11, 2024
@regisss
Collaborator

regisss commented Aug 19, 2024

Is it supposed to work on Gaudi?

@endomorphosis
Author

The primary goal is to get llama405b on a single gaudi node

I had originally read that Hugging Face TGI was supposed to support AWQ, but I was unable to use any of the quantization methods provided by Hugging Face quants, including GPTQ, uint4, etc.; the reports are just spread across different issues.

@regisss
Collaborator

regisss commented Aug 20, 2024

I think GPTQ should work on Gaudi, no?

@endomorphosis
Author

No. Generating quantized models with the Intel Neural Compressor does not work, https://huggingface.co/hugging-quants/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4 does not work on tgi_gaudi, and fp8 with INC does not work on a single node either.
