
Result of merging 2 Gemma2 9B models gains 1B parameters somehow #385

Closed
jim-plus opened this issue Jul 28, 2024 · 6 comments · Fixed by #406

Comments

@jim-plus

Resulting model weights and SLERP merge formula here:
https://huggingface.co/grimjim/Gemma2-Nephilim-v3-9B

An exl2 quant of the above works, but where did the extra 1B parameters come from?

@ALucek

ALucek commented Aug 8, 2024

https://huggingface.co/AdamLucek/gemma2-2b-it-chinese-german

Also found this happening with a model_stock merge of Gemma2 2B.

@jim-plus
Author

jim-plus commented Aug 8, 2024

In the case of the 9B model, the fault appears to reside in the first safetensors shard. There's a spurious lm_head.weight tensor that should be removed both from that shard and from model.safetensors.index.json; after that, the model size is what it should be.
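
For reference, a minimal sketch of the index.json side of that fix (not from the thread; the path is illustrative, adjust for your merge output):

import json

# Illustrative path to the merged model's index file; adjust as needed.
index_path = "path/to/your/model.safetensors.index.json"

with open(index_path) as f:
    index = json.load(f)

# Drop the spurious entry from the weight map, if present.
removed = index["weight_map"].pop("lm_head.weight", None) is not None

# Note: metadata["total_size"] will still include the duplicate tensor's bytes;
# decrease it by vocab_size * hidden_size * bytes-per-element if you want the
# reported size to match exactly.

with open(index_path, "w") as f:
    json.dump(index, f, indent=2)

print("Removed lm_head.weight from weight_map" if removed
      else "lm_head.weight not found in weight_map")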

@ALucek

ALucek commented Aug 8, 2024

Beat me to it; the same thing is happening here with lm_head.weight for the 2B model.

Looks like it's likely something related to handling the tokenizer source.

@h-lunah

h-lunah commented Aug 9, 2024

And how can the duplicate lm_head.weight be removed, so I can merge uncensored models for maximum uncensorship?

@ALucek

ALucek commented Aug 9, 2024

@piotr25691 Remove the entry for it from your index.json using any text editor, and then for the model itself you can edit the shard directly with the safetensors package. Here's a simplified script that will do it for you:

from safetensors import safe_open
from safetensors.torch import save_file

# Path to your SafeTensors file
input_file = "path/to/your/model-00001-of-00002.safetensors"
output_file = "path/to/your/fixed-model-00001-of-00002.safetensors"

# Load the SafeTensors file
tensors = {}
with safe_open(input_file, framework="pt", device="cpu") as f:
    for key in f.keys():
        if key != "lm_head.weight":
            tensors[key] = f.get_tensor(key)

# Save the modified tensors (keep the "pt" format tag so transformers can load the shard)
save_file(tensors, output_file, metadata={"format": "pt"})

print(f"SafeTensors file without lm_head saved to {output_file}")

# Optionally, verify the removal
with safe_open(output_file, framework="pt", device="cpu") as f:
    if "lm_head.weight" not in f.keys():
        print("lm_head.weight successfully removed")
    else:
        print("Warning: lm_head.weight still present")

@jukofyork
Contributor

It's because the (transpose of?) lm_head is used as embedding weights too:

ggml-org/llama.cpp#9065

IIRC, the command-r models also reuse the lm_head like this.
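
A quick way to see the tying jukofyork describes (illustrative, assumes access to the gated google/gemma-2-9b-it repo): Gemma2 configs set tie_word_embeddings=True, so transformers reuses the input embedding matrix as the output head, and no separate lm_head.weight should appear in the checkpoint.

from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("google/gemma-2-9b-it")
print(config.tie_word_embeddings)  # True: output head shares the embedding weights

model = AutoModelForCausalLM.from_pretrained("google/gemma-2-9b-it")
# With tying, the head and the embedding are the same Parameter object,
# which is why a serialized lm_head.weight is redundant (and inflates size).
print(model.lm_head.weight is model.model.embed_tokens.weight)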
