
support for llama3 in autoquant #67

Open
CrispStrobe opened this issue Apr 28, 2024 · 3 comments

@CrispStrobe

... would need vocab_type bpe; see here for an illustration:
https://colab.research.google.com/drive/1q1hTxLZOCRf9n0KdxSSu3tD0EI5QufrV?usp=sharing
(I also made a few adaptations for faster running in my use case.)
Thank you and keep up the great work!!
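
For illustration, here is a minimal sketch of the conversion call I mean, assuming a local llama.cpp checkout; the model directory and output file names are hypothetical placeholders:

```python
# Minimal sketch, assuming a local llama.cpp checkout.
# MODEL_DIR and OUT_FILE are hypothetical placeholders.
import subprocess

MODEL_DIR = "models/llama-3-merged"
OUT_FILE = "llama-3-merged.fp16.gguf"

# llama3 ships a BPE tokenizer, so convert.py needs --vocab-type bpe
# instead of the default sentencepiece vocab.
subprocess.run(
    [
        "python", "llama.cpp/convert.py", MODEL_DIR,
        "--outfile", OUT_FILE,
        "--outtype", "f16",
        "--vocab-type", "bpe",
    ],
    check=True,
)
```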

@CrispStrobe changed the title from "support for llama3" to "support for llama3 in autoquant" on Apr 28, 2024
@CrispStrobe
Author

In the meantime, there is also a fix for the pre-tokenizer. I have included it in this Kaggle notebook; of course you can adapt it if you wish.

@mlabonne
Owner

Sorry for the slow response, thanks a lot for opening this issue. I saw a lot of comments about issues with the tokenization in GGUF, so I don't know if it's the right time to update AutoQuant.

I like your improvements in the first notebook. Do you think I should transfer them or should I wait until the situation is fixed?

@CrispStrobe
Author

Indeed, it might be better to wait with regard to the pre-tokenizer. I am not completely sure I understood the procedure for new models like, say, llama3 merges, but my current understanding is illustrated by this updated Kaggle script.
There is now also a problem with older models: some models, like phi2, need convert-hf-to-gguf.py and not convert.py, and after the new pre-tokenizer fix, some of these will not easily work anymore.
I wonder why the script does not simply fall back on a default in such cases; my workaround is to just use an older version for those models.
So at the moment we have at least three cases afaik (a rough sketch of this dispatch follows the list):

  • old models like phi2 ==> older convert-hf-to-gguf.py
  • new bpe models like llama3 ==> newer convert-hf-to-gguf.py with complicated pre-tokenizer handling
  • others ==> convert.py
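
To make the split concrete, here is a rough sketch of the fallback logic I am imagining; the architecture names, script paths, and vocab-size heuristic are my assumptions for illustration, not AutoQuant's actual code:

```python
# Rough sketch of the three-way dispatch described above. The architecture
# sets, script paths, and vocab-size heuristic are illustrative assumptions.
import json
from pathlib import Path

LEGACY_ARCHS = {"PhiForCausalLM"}  # e.g. phi2: needs the older HF converter

def pick_converter(model_dir: str) -> list[str]:
    """Return the conversion command for a given HF model directory."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    arch = config["architectures"][0]

    if arch in LEGACY_ARCHS:
        # case 1: old models like phi2 -> pinned, older convert-hf-to-gguf.py
        return ["python", "llama.cpp-old/convert-hf-to-gguf.py", model_dir]
    if arch == "LlamaForCausalLM" and config.get("vocab_size", 0) > 100_000:
        # case 2: new BPE models like llama3 (large BPE vocab) -> newer
        # convert-hf-to-gguf.py with its pre-tokenizer handling
        return ["python", "llama.cpp/convert-hf-to-gguf.py", model_dir]
    # case 3: everything else -> plain convert.py
    return ["python", "llama.cpp/convert.py", model_dir]
```

In practice one would probably want an explicit override flag rather than the vocab-size heuristic, but this shows the shape of the fallback I mean.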
