
Fix llama conversion, improve parameter conversion #94

Merged
jlamypoirier merged 3 commits into main from fix_improve_conversion on Dec 17, 2024

Conversation

jlamypoirier (Collaborator) commented Dec 17, 2024

✨ Description

Fixes: #93

#55 added support for rope scaling, but it requires the value to be a dict. However, the value can also be None, in which case #55 broke conversion.
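For context, a minimal sketch of the two shapes the Hugging Face config field can take (the values below are illustrative, not taken from this repo):

```python
# Hypothetical excerpts of an HF llama config: `rope_scaling` may be a dict of
# interdependent entries, or simply None (plain rotary embeddings), which is the
# case #55 did not handle.
config_with_scaling = {
    "rope_theta": 10000.0,
    "rope_scaling": {"rope_type": "linear", "factor": 2.0},
}
config_without_scaling = {
    "rope_theta": 10000.0,
    "rope_scaling": None,  # conversion crashed here before this PR
}
```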

This could not be solved in an easy, non-hacky way, because there is a variable set of entries on the HF side, which may depend on each other. I handled this by making a converter for the entire rope scaling dict on the HF side, but this has to be mapped to multiple values on the Fast-LLM side, which is also a problem. (Mapping to the whole rotary config dict wouldn't work because of rope theta.)

So the solution is to adapt parameter converters to take a variable number of parameters on each side. The rope scaling is the first use case for this, but I expect more in the future. It's also the same thing we do for weight converters (and for exactly the same reason), so it's a natural next step. This makes converter definitions a bit more complicated, since I had to enforce 2d arrays of tuples to make things less confusing and error-prone, but I think it's worth it in the long run.
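As a rough sketch of what the generalized interface enables (class, method, and parameter names here are illustrative, not the exact Fast-LLM API), a single converter can consume several Fast-LLM parameters and produce the single `rope_scaling` entry on the HF side, and vice versa, while rope theta stays in its own one-to-one converter:

```python
class RopeScalingParamConverter:
    # Illustrative only: a converter now declares one nested-key path per parameter
    # on each side (hence the 2d tuples), and maps variable-length value tuples in
    # both directions.
    fast_llm_names = (("transformer", "rotary", "type"), ("transformer", "rotary", "scale_factor"))
    export_names = (("rope_scaling",),)

    def export_params(self, fast_llm_values):
        rotary_type, scale_factor = fast_llm_values
        if rotary_type == "default":
            return (None,)  # Plain rope: HF expects rope_scaling=None.
        return ({"rope_type": rotary_type, "factor": scale_factor},)

    def import_params(self, export_values):
        (rope_scaling,) = export_values
        if rope_scaling is None:
            return ("default", 1.0)
        return (rope_scaling["rope_type"], rope_scaling.get("factor", 1.0))
```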

Added the DEFAULT tag, which gets replaced with the default value during config validation. So far it's used to avoid the ad-hoc None handling in the rope config, but I'm sure it will have more uses elsewhere.
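A minimal sketch of the idea behind the DEFAULT tag, as a sentinel resolved at validation time (names and the dataclass below are illustrative, not the actual Fast-LLM implementation):

```python
import dataclasses


class _DefaultTag:
    """Sentinel meaning "use the field's declared default", resolved during validation."""

    def __repr__(self) -> str:
        return "DEFAULT"


DEFAULT = _DefaultTag()


@dataclasses.dataclass
class RotaryConfig:
    type: str = "default"
    theta: float = 10000.0

    def validate(self) -> None:
        # Replace any DEFAULT sentinel with the field's declared default value.
        for field in dataclasses.fields(self):
            if getattr(self, field.name) is DEFAULT:
                setattr(self, field.name, field.default)


config = RotaryConfig(type=DEFAULT)  # e.g. produced by a converter that saw rope_scaling=None
config.validate()
assert config.type == "default"
```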

Added a plain llama model in the tests; it's the same as mistral but uses the llama converter.

Note that this will force some small changes in unmerged converters (only #5 and #84).

🔍 Type of change

Select all that apply:

  • 🐛 Bug fix (non-breaking change that addresses a specific issue)
  • 🚀 New feature (non-breaking change that adds functionality)
  • ⚠️ Breaking change (a change that could affect existing functionality)
  • 📈 Performance improvement/optimization (improves speed, memory usage, or efficiency)
  • 🛠️ Code refactor (non-functional changes that improve code readability, structure, etc.)
  • 📦 Dependency bump (updates dependencies, including Dockerfile or package changes)
  • 📝 Documentation change (updates documentation, including new content or typo fixes)
  • 🔧 Infrastructure/Build change (affects build process, CI/CD, or dependencies)

jlamypoirier marked this pull request as ready for review December 17, 2024 05:48
jlamypoirier changed the title from Fix improve conversion to Fix llama conversion, improve parameter conversion on Dec 17, 2024
jlamypoirier merged commit d8f3390 into main on Dec 17, 2024
4 checks passed
jlamypoirier deleted the fix_improve_conversion branch December 17, 2024 21:11
jlamypoirier mentioned this pull request Dec 17, 2024
Development

Successfully merging this pull request may close these issues.

[bug] Crash when loading pretrained model