
KeyError model[0] did not exist in tensor? #446

Open
FrozzDay opened this issue Oct 27, 2024 · 3 comments

Comments

@FrozzDay
I am performing a mega merge with LLaMA 3.2 3B, using both the base model and fine-tuned/instruction-tuned variants, with the DARE linear method. The first merge completed successfully, but I hit an error when attempting the second one. The error message:

Traceback (most recent call last):
  File "/usr/local/bin/mergekit-mega", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/mergekit/options.py", line 82, in wrapper
    f(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/mergekit/scripts/megamerge.py", line 187, in main
    merge(m, merge_options, force, out_path)
  File "/usr/local/lib/python3.10/site-packages/mergekit/scripts/megamerge.py", line 81, in merge
    run_merge(
  File "/usr/local/lib/python3.10/site-packages/mergekit/merge.py", line 96, in run_merge
    for _task, value in exec.run(quiet=options.quiet):
  File "/usr/local/lib/python3.10/site-packages/mergekit/graph.py", line 197, in run
    res = task.execute(**arguments)
  File "/usr/local/lib/python3.10/site-packages/mergekit/tokenizer/embed.py", line 54, in execute
    embed_size = tensors[models[0]].shape[1]
  File "/usr/local/lib/python3.10/site-packages/mergekit/tokenizer/embed.py", line 54, in execute
    embed_size = tensors[models[0]].shape[1]
KeyError: ModelReference(model=ModelPath(path='unsloth/Llama-3.2-3B', revision=None), lora=None, override_architecture=None)

The config is something like this:

models:
  - model: unsloth/Llama-3.2-3B
  - model: model-1
    parameters:
      weight: 1
  - model: model-2
    parameters:
      weight: 1
merge_method: dare_linear
base_model: unsloth/Llama-3.2-3B
tokenizer_source: model-1
parameters:
dtype: float32
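
For context on the failing line: the traceback shows embed.py looking up the base model in a dict of gathered tensors (tensors[models[0]]), and the KeyError means nothing was collected under that key for the base model. Below is a minimal illustrative sketch of the failure shape, not mergekit's actual code (model names come from the config above; shapes are placeholders):

import torch

# Illustrative only: gathered tensors are keyed per model, and the base model
# is simply absent from the dict for this weight, so the plain lookup raises.
tensors = {
    "model-1": torch.zeros(128256, 3072),
    "model-2": torch.zeros(128256, 3072),
}
models = ["unsloth/Llama-3.2-3B", "model-1", "model-2"]  # base model first

embed_size = tensors[models[0]].shape[1]  # KeyError: 'unsloth/Llama-3.2-3B'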
@David-AU-github

David-AU-github commented Nov 12, 2024

Confirming the exact same error: mergekit cannot find the "base_model", including when the path is local (absolute) on Windows.

The odd part is that some merge configs work fine with no issue, whereas others fail for the reasons below. And of the merges I ran in late September 2024, some now fail while others are fine.

Example: Llama 3 models merge fine, no issue.
Gemma models now break as noted below... but not all of them.

This works fine:

models:
  - model: G:/9B/gemma-2-9b-it-abliterated
    parameters:
      weight: .4
merge_method: dare_ties
base_model: G:/9B/gemma2-gutenberg-9B
tokenizer_source: union
dtype: bfloat16

BUT THIS DIES:

models:
  - model: G:/9B/Gemma-2-Ataraxy-9B
    parameters:
      weight: [1,1,.75,.5,.25,.25,.05,.01]
  - model: G:/9B/Gemma-2-9B-It-SPPO-Iter3
    parameters:
      weight: [1,1,.75,.5,.25,.25,.05,.01]
  - model: G:/9B/gemma-2-Ifable-9B
    parameters:
      weight: [1,1,.75,.5,.25,.25,.05,.01]
merge_method: dare_ties
base_model: E:/Gemma-Dark-Writer3-mega-ab
dtype: bfloat16

But the exact same setup (3 models plus a base model, dare_ties) works fine for a Llama 3/3.1 merge.

Other Gemma merges of the same type (dare_ties, 3 models + base model) that did work in September 2024 now crash.

Even if I change the line
"base_model: E:/Gemma-Dark-Writer3-mega-ab"
it still dies, no matter what. (If I put in a bad location, I get the normal "not found" error, so the path itself is being resolved.)

Please advise.

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Program Files\Python312\Scripts\mergekit-yaml.exe\__main__.py", line 7, in <module>
  File "C:\Program Files\Python312\Lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\site-packages\click\core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\mergekit3\mergekit\mergekit\options.py", line 82, in wrapper
    f(*args, **kwargs)
  File "F:\mergekit3\mergekit\mergekit\scripts\run_yaml.py", line 47, in main
    run_merge(
  File "F:\mergekit3\mergekit\mergekit\merge.py", line 96, in run_merge
    for _task, value in exec.run(quiet=options.quiet):
  File "F:\mergekit3\mergekit\mergekit\graph.py", line 197, in run
    res = task.execute(**arguments)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\mergekit3\mergekit\mergekit\merge_methods\generalized_task_arithmetic.py", line 126, in execute
    tvs, base = get_task_vectors(
                ^^^^^^^^^^^^^^^^^
  File "F:\mergekit3\mergekit\mergekit\merge_methods\generalized_task_arithmetic.py", line 201, in get_task_vectors
    base = tensors[base_model]
           ~~~~~~~^^^^^^^^^^^^
KeyError: ModelReference(model=ModelPath(path='G:/9B/gemma2-gutenberg-9B', revision=None), lora=None, override_architecture=None)

@cg123
Collaborator

cg123 commented Nov 29, 2024

@FrozzDay @David-AU-github
I'm betting that in all the cases where this happens, your base model has tied weights, but one or more of the fine-tuned versions has a separate lm_head (whether from being trained that way or from being produced by an older version of mergekit that always output one).

If you're able, could you try this merge on a commit from before #429 (if it's Llama) or #406 (if it's Gemma)? I'm working on more robust handling for cases like this but it'd be great to get confirmation that the issue you're experiencing is what I have in mind. Thanks!
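
One way to check this diagnosis is to list the tensor names each checkpoint actually ships and see which ones carry a separate lm_head.weight. A minimal sketch, assuming local safetensors checkpoints (the directory paths are examples taken from this thread):

import glob
from safetensors import safe_open

def tensor_names(model_dir: str) -> set:
    """Collect tensor names across all safetensors shards in a model directory."""
    names = set()
    for shard in glob.glob(f"{model_dir}/*.safetensors"):
        with safe_open(shard, framework="pt") as f:
            names.update(f.keys())
    return names

for model_dir in ["G:/9B/gemma2-gutenberg-9B", "G:/9B/Gemma-2-Ataraxy-9B"]:
    names = tensor_names(model_dir)
    print(model_dir, "separate lm_head:", "lm_head.weight" in names)

A base model with tied weights should show no lm_head.weight, while a fine-tune with an untied head will.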

@David-AU-github

@cg123 Thank you so much.
