AWQ is not working #1240

Open
4 tasks
endomorphosis opened this issue Aug 11, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@endomorphosis

System Info

Transformers fails with the following error when trying to use AWQ with TGI, the neural compressor engine, or Optimum Habana:
ValueError: AWQ is only available on GPU
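The error above comes from a hard CUDA check in Transformers' AWQ loading path. A minimal sketch of the kind of validation that produces it (the function name here is illustrative, not the actual Transformers API): on a Gaudi node `torch.cuda.is_available()` is False even though an HPU accelerator is present, so the check fails.

```python
# Hedged sketch: mirrors the style of device check that raises
# "ValueError: AWQ is only available on GPU" when loading an AWQ
# checkpoint with Transformers. The function name is hypothetical.

def validate_awq_environment(cuda_available: bool) -> None:
    """Refuse to load AWQ weights unless a CUDA device is visible.

    On Gaudi/HPU systems, CUDA is not available, so a check like this
    rejects the hardware even though an accelerator exists.
    """
    if not cuda_available:
        raise ValueError("AWQ is only available on GPU")


# Simulate the Gaudi case: no CUDA device visible.
try:
    validate_awq_environment(cuda_available=False)
except ValueError as e:
    print(e)  # AWQ is only available on GPU
```

This is why the failure occurs regardless of whether the AWQ kernels could in principle run on the device: the gate is on CUDA availability, not on accelerator capability.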

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

.

Expected behavior

.

@endomorphosis endomorphosis added the bug Something isn't working label Aug 11, 2024
@regisss
Collaborator

regisss commented Aug 19, 2024

Is it supposed to work on Gaudi?

@endomorphosis
Author

The primary goal is to get llama405b on a single gaudi node

I had originally read that Hugging Face TGI was supposed to support AWQ, but I was unable to use any of the quantization methods provided by Hugging Face quants, including GPTQ, uint4, etc.; the reports are just spread across different issues.

@regisss
Collaborator

regisss commented Aug 20, 2024

I think GPTQ should work on Gaudi, no?

@endomorphosis
Author

No. Generating quantized models with the Intel Neural Compressor does not work, https://huggingface.co/hugging-quants/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4 does not work on tgi_gaudi, and fp8 with INC does not work on a single node either.
