Add date and commit hash to gguf metadata #2728
base: master
Conversation
I think it's a good idea to add this metadata, but …
@slaren fair enough. Open to suggestions for some other way of generating a commit hash. Unless you just mean adding some extra safety to how we're generating it now, and leaving it as 'N/A' when git isn't set up.
That sounds like a reasonable approach to me. I'd suggest just not adding the field, rather than adding it with a value that doesn't convey any information, if fetching the hash fails. I'm also not sure it should be a general field. For example, suppose I write a different tool to convert models. Do I put my commit hash in that field? If I do, the field kind of becomes worthless, because you have no way to differentiate the commit hash of the official repo from that of some other tool. Maybe instead of a commit hash it should just be a fairly free-form field.
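A minimal sketch of that skip-on-failure approach in Python; the key name and the helper are illustrative, not taken from the PR:

```python
import subprocess
from typing import Optional

def get_commit_hash() -> Optional[str]:
    """Return the short commit hash, or None if git is unavailable
    or we are not running inside a git checkout."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return None

def add_commit_hash(writer) -> None:
    # `writer` is assumed to be a gguf.GGUFWriter; the key name
    # "general.commit_hash" is illustrative, not the PR's actual key.
    commit_hash = get_commit_hash()
    if commit_hash is not None:
        # Skip the field entirely rather than writing "N/A".
        writer.add_string("general.commit_hash", commit_hash)
```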
I'm preparing the gguf writing utility as a pip-installable package. Maybe adding the package version makes more sense?
It just occurred to me that this is going to make validating models with SHA256 (or whatever) a lot harder. The actual data for the model could be 100% the same, but if two models were generated one second apart they'd have different hashes. Previously you could use something like a SHA256 to verify the content of a model; if stuff like the date/build is in it, then you can only verify that it's a copy of one specific generated file. Cases where that might matter: developers verifying that their changes don't functionally change the result, troubleshooting user issues, etc.
@KerfuffleV2 what if we added a utility script for generating this hash? It could load the .gguf file and pick out the fields/data that make sense to include, rather than taking a hash of the whole file.
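For illustration, such a utility might look roughly like this, assuming the `GGUFReader` from the gguf-py package (its exact API may differ, and the skipped key names are hypothetical):

```python
import hashlib
import sys

from gguf import GGUFReader

# Volatile metadata that should not affect the content hash;
# these key names are hypothetical.
SKIP_KEYS = {"general.file_creation_date", "general.commit_hash"}

def content_hash(path: str) -> str:
    reader = GGUFReader(path)
    h = hashlib.sha256()
    # Hash the stable metadata fields in a deterministic order...
    for name, field in sorted(reader.fields.items()):
        if name in SKIP_KEYS:
            continue
        h.update(name.encode("utf-8"))
        for part in field.parts:
            h.update(part.tobytes())
    # ...then the raw tensor data.
    for tensor in reader.tensors:
        h.update(tensor.name.encode("utf-8"))
        h.update(tensor.data.tobytes())
    return h.hexdigest()

if __name__ == "__main__":
    print(content_hash(sys.argv[1]))
```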
Well, it's better than nothing. :) I think there's still a disadvantage though, because you have to have people generate this specific, special type of "hash". HF includes a SHA256 with files published there, but you won't be able to use that, or look there, to find the hash that has to be used for comparing GGUF files.
@monatis hmm, but in theory this gguf utility could stay the same while the model contents change due to differences in …
@KerfuffleV2 I'm a total noob in this space, but are people already publishing .gguf files on HF? It looks like HF has a bunch of cool stuff, like being able to run inference directly on their site. Do we already have a ggml integration on HF? If not, maybe there is a hook for defining this model hash there.
HF's stuff doesn't really support GGML/GGUF at all. The hosted inference only works with models that can run via Transformers. I'd be surprised if they ever allow stuff like hosted inference of GGUF models.
@KerfuffleV2 I briefly skimmed this and it kind of sounded like it was possible: https://huggingface.co/docs/hub/models-adding-libraries#integrate-your-library-with-the-hub
Actually, why not include the hash of the tensors as metadata? That way a standalone tool can validate the file itself.
Since a possible (yet artificial) chain of gguf files could be: first convert, then quantize, maybe quantize again, then apply a LoRA, then another, then requantize. Edit: and then the history of a LoRA, or a merge (avg) of two or more models, would make it a tree history... oh no, I think I rediscovered git.
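A sketch of how per-tensor hashes could be recorded at write time, assuming gguf-py's `GGUFWriter`; the `hash.sha256.<name>` key scheme is made up for illustration:

```python
import hashlib

import numpy as np
from gguf import GGUFWriter

def add_tensor_with_hash(writer: GGUFWriter, name: str, data: np.ndarray) -> None:
    # Store a SHA256 of the tensor's raw bytes alongside the tensor,
    # so a standalone tool can re-hash the data and compare, without
    # caring about volatile metadata like dates or commit hashes.
    digest = hashlib.sha256(data.tobytes()).hexdigest()
    writer.add_string(f"hash.sha256.{name}", digest)
    writer.add_tensor(name, data)
```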
I guess I might have been too pessimistic there. I'm almost positive there's currently no interface for hosted inference with GGML/GGUF though.
Ah yes. Maybe a hash of the conversion script file itself? :) Its advantage over a commit hash is that the output file will stay the same unless the conversion script is updated, and the model file can still be validated by SHA256.
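That could be as simple as hashing the running script; a small sketch (the key name is hypothetical):

```python
import hashlib
import sys

def script_hash() -> str:
    # Hash the conversion script that is currently running, so the
    # recorded value only changes when the script itself changes.
    with open(sys.argv[0], "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# e.g. writer.add_string("general.conversion_script_sha256", script_hash())
```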
Some info about HF services / products here, to be on the same page: …
I think this is one way: …
What's everyone's take on this idea? Is it still a good idea these days?
There was some discussion in #2707 about including some extra metadata in the .gguf files to make it easier to figure out when and how the file was generated.
This PR adds the file creation date and the current commit hash to the GGUF metadata, running `git rev-parse --short HEAD` to get the latter. We are also printing these fields alongside `general.name` etc.
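A rough sketch of the writing side, assuming gguf-py's `GGUFWriter`; the key name and timestamp format are illustrative, not necessarily what the PR uses:

```python
from datetime import datetime, timezone

from gguf import GGUFWriter

writer = GGUFWriter("model.gguf", arch="llama")
# Record the creation date as an ISO-8601 UTC timestamp; the key
# name "general.file_creation_date" is illustrative.
writer.add_string("general.file_creation_date",
                  datetime.now(timezone.utc).isoformat())
```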
To test the change, I ran the conversion, then cracked open the .gguf file in Notepad++ and saw the corresponding values:

and saw that the values were logged when running inference:
