Result of merging 2 Gemma2 9B models gains 1B parameters somehow #385
Comments
Also found this to happen with model_stock and Gemma2 2B: https://huggingface.co/AdamLucek/gemma2-2b-it-chinese-german
In the case of the 9B, the fault appears to reside in the first safetensors chunk: there's a spurious lm_head.weight tensor that should be removed from it, as well as from model.safetensors.index.json, and after that the model size is what it should be.
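A quick way to confirm the diagnosis on your own merge, assuming the usual HF sharded layout (paths here are placeholders):

```python
import json
from safetensors import safe_open

# Placeholder paths; point these at your merged model directory
index_path = "path/to/merged-model/model.safetensors.index.json"
shard_path = "path/to/merged-model/model-00001-of-00002.safetensors"

# The index maps tensor names to shard files; a healthy Gemma2 index
# has no lm_head.weight entry at all, since the weights are tied
with open(index_path) as f:
    index = json.load(f)
print("in index:", index["weight_map"].get("lm_head.weight"))

# Confirm the tensor is also physically present in the first shard
with safe_open(shard_path, framework="pt", device="cpu") as f:
    print("in shard:", "lm_head.weight" in f.keys())
```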
Beat me to it; the same thing is happening here with lm_head.weight for the 2B model. Looks like it's likely something related to handling the tokenizer source.
And how can the duplicate lm_head.weight be removed?
@piotr25691 Remove the entry for it from your index.json with whatever code editor, and for the model itself you can edit the file directly with the safetensors package. Here's a simplified script that will do it for you:

```python
from safetensors import safe_open
from safetensors.torch import save_file

# Paths to your SafeTensors files
input_file = "path/to/your/model-00001-of-00002.safetensors"
output_file = "path/to/your/fixed-model-00001-of-00002.safetensors"

# Load every tensor except the spurious lm_head.weight
tensors = {}
with safe_open(input_file, framework="pt", device="cpu") as f:
    for key in f.keys():
        if key != "lm_head.weight":
            tensors[key] = f.get_tensor(key)

# Save the modified tensors
save_file(tensors, output_file)
print(f"SafeTensors file without lm_head saved to {output_file}")

# Optionally, verify the removal
with safe_open(output_file, framework="pt", device="cpu") as f:
    if "lm_head.weight" not in f.keys():
        print("lm_head.weight successfully removed")
    else:
        print("Warning: lm_head.weight still present")
```
It's because the lm_head.weight is the (transpose of?) embedding matrix. IIRC, Gemma2 ties lm_head to model.embed_tokens, so the reference checkpoints don't serialize lm_head.weight at all.
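If that's the cause, the serialized lm_head.weight should be bit-identical to the embedding matrix, and its element count alone explains the roughly 1B extra parameters. A sketch to check (shard path is a placeholder, and this assumes both tensors landed in the same shard, as reported above):

```python
import torch
from safetensors import safe_open

# Placeholder path to the shard holding both tensors
shard_path = "path/to/merged-model/model-00001-of-00002.safetensors"

with safe_open(shard_path, framework="pt", device="cpu") as f:
    lm_head = f.get_tensor("lm_head.weight")
    embed = f.get_tensor("model.embed_tokens.weight")

# HF checkpoints store both as [vocab, hidden], so no transpose is
# needed for the comparison; tied weights should match exactly
print("identical:", torch.equal(lm_head, embed))

# The duplicate alone accounts for the size jump:
# 256000 (vocab) * 3584 (hidden, Gemma2 9B) = 917,504,000 ~ 0.9B params
print(f"{lm_head.numel():,} duplicated parameters")
```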
Resulting model weights and SLERP merge formula here:
https://huggingface.co/grimjim/Gemma2-Nephilim-v3-9B
An exl2 quant of the above works, but where did the extra 1B parameters come from?