Fix llama conversion, improve parameter conversion #94
✨ Description
Fixes: #93
#55 added support for rope scaling, but requires the value to be a dict. However, the value can also be `None`, in which case #55 broke conversion.

This could not be solved in an easy, non-hacky way, because there is a variable set of entries on the HF side, which may depend on each other. I handled this by making a converter for the entire rope scaling dict on the HF side, but this has to be mapped to multiple values on the Fast-LLM side, which is also a problem. (Mapping to the whole `rotary` config dict wouldn't work because of rope theta.)

So the solution is to adapt parameter converters to take a variable number of parameters on each side; see the sketch below. Rope scaling is the first use case for this, but I expect more in the future. It's also the same thing we do for weight converters (and for exactly the same reason), so it's a natural next step. This makes converter definitions a bit more complicated, since I had to enforce 2d arrays of tuples to make things less confusing and error-prone, but I think it's worth it in the long run.
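To make the variable-arity design concrete, here is a minimal sketch of what such a converter could look like. All names here (`ParamConverter`, `fast_llm_names`, `export_names`, the rotary field names) are hypothetical illustrations based on the description above, not the actual Fast-LLM API:

```python
import dataclasses


@dataclasses.dataclass
class ParamConverter:
    # Each side is a tuple of config-key paths, so one converter may read or
    # write several parameters on either side (hypothetical names).
    fast_llm_names: tuple[tuple[str, ...], ...]
    export_names: tuple[tuple[str, ...], ...]

    def export_params(self, fast_llm_values: tuple) -> tuple:
        # Identity mapping by default; subclasses override.
        return fast_llm_values

    def import_params(self, export_values: tuple) -> tuple:
        return export_values


class RopeScalingParamConverter(ParamConverter):
    """Map several rotary fields to/from a single HF `rope_scaling` entry."""

    def export_params(self, fast_llm_values: tuple) -> tuple:
        scaling_type, factor = fast_llm_values
        if scaling_type == "default":
            # HF uses `rope_scaling=None` for the unscaled case.
            return (None,)
        return ({"type": scaling_type, "factor": factor},)

    def import_params(self, export_values: tuple) -> tuple:
        (rope_scaling,) = export_values
        if rope_scaling is None:
            return ("default", 1.0)
        return (rope_scaling["type"], rope_scaling["factor"])


# Usage: two Fast-LLM parameters on one side, one HF parameter on the other.
converter = RopeScalingParamConverter(
    fast_llm_names=(("rotary", "type"), ("rotary", "scale_factor")),
    export_names=(("rope_scaling",),),
)
assert converter.export_params(("default", 1.0)) == (None,)
assert converter.import_params((None,)) == ("default", 1.0)
```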
Added the `DEFAULT` tag, which gets replaced with the field's default value during config validation. So far it's used to avoid the ad-hoc `None` handling in the rope config, but I'm sure it will have more uses elsewhere; a sketch of the idea follows below.

Added a plain llama model in the tests; it's the same as mistral but uses the llama converter.
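To illustrate the `DEFAULT` tag mentioned above, here is a minimal sketch of the mechanism, with hypothetical names (`validate` and the field names are assumptions, not the actual Fast-LLM implementation): a sentinel value that config validation swaps for the field's declared default, so converters don't need ad-hoc `None` handling:

```python
# Unique sentinel tag; hypothetical stand-in for Fast-LLM's DEFAULT.
DEFAULT = object()


def validate(config: dict, defaults: dict) -> dict:
    """Replace every DEFAULT tag with the field's declared default value."""
    return {
        key: defaults[key] if value is DEFAULT else value
        for key, value in config.items()
    }


# Usage: when HF `rope_scaling` is None, a converter can import the scale
# factor as DEFAULT and let validation fill in the configured default,
# instead of special-casing None itself.
config = validate(
    {"rotary_type": "default", "scale_factor": DEFAULT},
    defaults={"rotary_type": "none", "scale_factor": 1.0},
)
assert config["scale_factor"] == 1.0
```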
Note that this will force some small changes in unmerged converters (only #5 and #84).
🔍 Type of change
Select all that apply: