
convert.py: Support models which are stored in a single pytorch_model.bin #1469


Merged

Conversation

TheBloke (Contributor) commented May 15, 2023

Currently convert.py will error out when a PyTorch model is released as a single .bin file. This one-line PR fixes that.

Before:

root@1c6b80974469:~/AutoGPTQ# ll /workspace/models/ehartford_WizardLM-13B-Uncensored/
total 25429958
drwxrwxrwx  2 root root     3002424 May 15 20:58 ./
drwxrwxrwx 21 root root     3038961 May 15 20:58 ../
-rw-rw-rw-  1 root root         933 May 15 20:58 README.md
-rw-rw-rw-  1 root root          21 May 15 20:58 added_tokens.json
-rw-rw-rw-  1 root root         554 May 15 20:58 config.json
-rw-rw-rw-  1 root root         137 May 15 20:58 generation_config.json
-rw-rw-rw-  1 root root         293 May 15 20:58 huggingface-metadata.txt
-rw-rw-rw-  1 root root 26031885999 May 15 21:04 pytorch_model.bin
-rw-rw-rw-  1 root root          96 May 15 20:58 special_tokens_map.json
-rw-rw-rw-  1 root root     1842847 May 15 20:58 tokenizer.json
-rw-rw-rw-  1 root root      499723 May 15 20:58 tokenizer.model
-rw-rw-rw-  1 root root         727 May 15 20:58 tokenizer_config.json

root@1c6b80974469:~/llama.cpp# python convert.py --outtype f16 --outfile /workspace/wizardlm-13B-uncensored/ggml/wizardLM-13B-Uncensored.fp16.bin /workspace/models/ehartford_WizardLM-13B-Uncensored
Traceback (most recent call last):
  File "/root/llama.cpp/convert.py", line 1169, in <module>
    main()
  File "/root/llama.cpp/convert.py", line 1149, in main
    model_plus = load_some_model(args.model)
  File "/root/llama.cpp/convert.py", line 1066, in load_some_model
    raise Exception(f"Can't find model in directory {path}")
Exception: Can't find model in directory /workspace/models/ehartford_WizardLM-13B-Uncensored

With this PR:

root@1c6b80974469:~/llama.cpp# python convert.py --outtype f16 --outfile /workspace/wizardlm-13B-uncensored/ggml/wizardLM-13B-Uncensored.fp16.bin /workspace/models/ehartford_WizardLM-13B-Uncensored
Building llama.cpp
Making unquantised GGML at /workspace/wizardlm-13B-uncensored/ggml//wizardLM-13B-Uncensored.fp16.ggml.bin
Loading model file /workspace/models/ehartford_WizardLM-13B-Uncensored/pytorch_model.bin
Loading vocab file /workspace/models/ehartford_WizardLM-13B-Uncensored/tokenizer.model
Writing vocab...
[  1/363] Writing tensor tok_embeddings.weight                  | size  32001 x   5120  | type UnquantizedDataType(name='F16')
...
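For reference, the failure in the "Before" run comes from the directory scan in load_some_model(), which only globbed for consolidated or sharded checkpoint names; the one-line change adds the single-file name to the candidate patterns. A rough sketch of that logic (simplified and hypothetical; the helper name and exact patterns are illustrative, not the real convert.py code):

    # Simplified sketch of the directory scan; not the actual convert.py implementation.
    from pathlib import Path

    def find_model_files(path: Path) -> list[Path]:
        globs = [
            "consolidated.00.pth",            # original LLaMA checkpoints
            "pytorch_model-00001-of-*.bin",   # sharded Hugging Face checkpoints
            "pytorch_model.bin",              # single-file checkpoint (the pattern this PR adds)
        ]
        files = [f for g in globs for f in path.glob(g)]
        if not files:
            raise Exception(f"Can't find model in directory {path}")
        return files

With the extra pattern, a directory that contains only pytorch_model.bin (as in the listing above) resolves to that file instead of raising.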

Also, unrelated: this PR removes line 124, a duplicated, misspelled line that was doing nothing.

FNsi (Contributor) commented May 16, 2023

f'layers.{i}.atttention_norm.weight',

Not related to that PR, but I think that's a typo in line 124; it seems useless.

slaren (Member) commented May 16, 2023

> Not related to that PR, but I think that's a typo in line 124; it seems useless.

Line 120 has the correct name, so it still works.
But yeah, to be clear, this should be fixed.
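For illustration (a simplified sketch, not the exact convert.py structure): judging from the quoted line, the entry sits in a list of expected tensor names, so a misspelled duplicate is inert. The correctly spelled name is still present and matches the real tensors, while the typo'd name simply never matches anything.

    # Hypothetical sketch; the real list in convert.py is longer and built differently.
    def layer_tensor_names(i: int) -> list[str]:
        return [
            f'layers.{i}.attention_norm.weight',   # correct spelling (line 120): real tensors match this
            f'layers.{i}.atttention_norm.weight',  # misspelled duplicate (line 124): never matches anything
            f'layers.{i}.ffn_norm.weight',
        ]

    names = set(layer_tensor_names(0))
    print('layers.0.attention_norm.weight' in names)   # True -> conversion still works
    # The misspelled entry is only dead weight, which is why removing it is safe.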

TheBloke (Contributor, Author)

Fixed!

slaren merged commit 2b26469 into ggml-org:master on May 16, 2023
ggerganov pushed a commit to JohannesGaessler/llama.cpp that referenced this pull request on May 20, 2023:
convert.py: Support models which are stored in a single pytorch_model.bin (ggml-org#1469)

* Support models in a single pytorch_model.bin

* Remove spurious line with typo
ggerganov added a commit that referenced this pull request on May 20, 2023:
…broadcasting for ggml_mul (#1483)

* Broadcasting for ggml_mul

* CUDA kernel for ggml_mul, norms in VRAM

* GPU weights not in RAM, direct loading with cuFile

* fixup! GPU weights not in RAM, direct loading with cuFile

* fixup! GPU weights not in RAM, direct loading with cuFile

* define default model path once, sync path with readme (#1366)

* ~7% faster Q5_1 AVX2 code (#1477)

* convert.py: Support models which are stored in a single pytorch_model.bin (#1469)

* Support models in a single pytorch_model.bin

* Remove spurious line with typo

* benchmark-matmul: Print the average of the test results (#1490)

* Remove unused n_parts parameter (#1509)

* Fixes #1511 lambda issue for w64devkit (mingw) (#1513)

* Fix for w64devkit and mingw

* make kv_f16 the default for api users (#1517)

* minor : fix compile warnings

* readme : adds WizardLM to the list of supported models (#1485)

* main : make reverse prompt option act as a stop token in non-interactive mode (#1032)

* Make reverse prompt option act as a stop token in non-interactive scenarios

* Making requested review changes

* Update gpt_params_parse and fix a merge error

* Revert "Update gpt_params_parse and fix a merge error"

This reverts commit 2bb2ff1.

* Update gpt_params_parse and fix a merge error take 2

* examples : add persistent chat (#1495)

* examples : add persistent chat

* examples : fix whitespace

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* tests : add missing header

* ggml : use F16 instead of F32 in Q4_0, Q4_1, Q8_0 (#1508)

* ggml : use F16 instead of F32 in Q4_0, Q4_1 and Q8_0

* llama : bump LLAMA_FILE_VERSION to 3

* cuda : update Q4 and Q8 dequantize kernels

* ggml : fix AVX dot products

* readme : update performance table + hot topics

* ggml : fix scalar implementation of Q4_1 dot

* llama : fix compile warnings in llama_set_state_data()

* llama : fix name shadowing and C4146 (#1526)

* Fix name shadowing and C4146

* Fix if macros not using defined when required

* Update llama-util.h

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update llama-util.h

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Code style

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Fix for mingw (#1462)

* llama : add llama_init_backend() API (close #1527)

* feature : add blis and other BLAS implementation support (#1502)

* feature: add blis support

* feature: allow all BLA_VENDOR to be assigned in cmake arguments. align with whisper.cpp pr 927

* fix: version detection for BLA_SIZEOF_INTEGER, recover min version of cmake

* Fix typo in INTEGER

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Revert "feature : add blis and other BLAS implementation support (#1502)"

This reverts commit 07e9ace.

* GPU weights not in RAM, direct loading with cuFile

* llama : code style fixes + progress print fix

* ggml : ggml_mul better broadcast support

* cmake : workarounds for cufile when CMake version < 3.25

* gg rebase fixup

* Loop in llama.cpp, fixed progress callback

* Attempt clang-tidy fix

* llama : fix vram size computation

* Add forgotten fclose()

---------

Co-authored-by: András Salamon <ott2@users.noreply.github.com>
Co-authored-by: Ilya Kurdyukov <59548320+ilyakurdyukov@users.noreply.github.com>
Co-authored-by: Tom Jobbins <784313+TheBloke@users.noreply.github.com>
Co-authored-by: rankaiyx <rankaiyx@rankaiyx.com>
Co-authored-by: Stephan Walter <stephan@walter.name>
Co-authored-by: DannyDaemonic <DannyDaemonic@gmail.com>
Co-authored-by: Erik Scholz <Green-Sky@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: David Kennedy <dakennedyd@gmail.com>
Co-authored-by: Jason McCartney <jmac@theroot.org>
Co-authored-by: Evan Jones <evan.q.jones@gmail.com>
Co-authored-by: Maxime <672982+maximegmd@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Zenix <zenixls2@gmail.com>