model: llama3.2 #26

Closed
2 of 3 tasks
Tracked by #21
freelerobot opened this issue Sep 26, 2024 · 7 comments
@freelerobot (Contributor) commented Sep 26, 2024

Model Requests

https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct
and the 11B variant

Which formats?

  • GGUF (llama.cpp)
  • TensorRT (TensorRT-LLM)
  • ONNX (Onnx Runtime)
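Before wiring a requested GGUF build into the registry, a quick sanity check is to confirm the downloaded file really is GGUF. A minimal sketch (the GGUF format begins with the ASCII magic `GGUF`; the stand-in file below is only for demonstration):

```python
import os
import struct
import tempfile

def looks_like_gguf(path: str) -> bool:
    """Return True if the file starts with the GGUF magic bytes."""
    with open(path, "rb") as f:
        magic = f.read(4)
    return magic == b"GGUF"

# Demo with a stand-in file; a real check would point at the downloaded model.
with tempfile.NamedTemporaryFile(delete=False, suffix=".gguf") as tmp:
    tmp.write(b"GGUF" + struct.pack("<I", 3))  # magic + a version field
    fake = tmp.name

print(looks_like_gguf(fake))  # → True
os.unlink(fake)
```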
@freelerobot (Contributor, Author)

already supported

@freelerobot freelerobot reopened this Sep 26, 2024
@freelerobot freelerobot changed the title model: llama 3.2 bug: model: llama 3.2 Sep 26, 2024
@freelerobot (Contributor, Author)

Model responds with garbage:

❯ cortex-nightly run https://huggingface.co/hugging-quants/Llama-3.2-3B-Instruct-Q8_0-GGUF/blob/main/llama-3.2-3b-instruct-q8_0.gguf
Validating download items, please wait..
Start downloading: llama-3.2-3b-instruct-q8_0.gguf
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1138  100  1138    0     0   3942      0 --:--:-- --:--:-- --:--:--  3951
100 3263M  100 3263M    0     0  20.6M      0  0:02:37  0:02:37 --:--:-- 25.3M
Model Llama-3.2-3B-Instruct-Q8_0-GGUF downloaded successfully!
Starting server ...
Host: 127.0.0.1 Port: 3928
Server started
Model loaded!
Inorder to exit, type `exit()`
> hiya
+"(($>'9%A69G,87>6.#1C3F+0.>.B$;(,2032,+?FF'3**5"*&>69FCGHF((>*F252+'HB2%C<#!;'39,GE&#>?+'97F+,4GE?"1H+%+-?3A3*,+#C070+F:*)2*?260)B?"F)DB+F')63A6+;G$=F$H)6&/4E4"7"6.(&31+B-0A(*#;!&1C2//G-65%*=,5.2D>A6=B$2=<D417(%74'!?,2>FG/.)9&178;73=D<6?''10F/;C%')(*GC!-0!60=D1HA0AG(4E;8*.>0&-*H4E)3"462965,48!&=7H*E+E9=(9A6#.3""5##7HG8#A;81$F<B%1.;73*0#,7GC&9:HH.6(%G+"-"D>72)5#C"E'6:&=C,F19&3=./(06'F$A'F-.CB8/>DF5A*!2"!5?&)$#/.D)E.2&E"6>C/-:CB!0BE5F!8H)9H><?D%3-
,>?AG'=)+757+H'EE=F#$G#6+55)*73=?&^C
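Garbage output like the above can come from a wrong prompt template, but a truncated or corrupted download is also worth ruling out first. A minimal sketch for comparing a local file's SHA-256 against the checksum shown on the Hugging Face file page (the path and expected digest in the usage comment are placeholders, not values from this thread):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file in 1 MiB chunks and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()

# Usage (hypothetical):
# local = sha256_of("llama-3.2-3b-instruct-q8_0.gguf")
# assert local == "<sha256 listed on the Hugging Face file page>"
```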

@freelerobot freelerobot added the type: bug Something isn't working label Sep 26, 2024
@dan-menlo dan-menlo changed the title bug: model: llama 3.2 model: llama3.2 Sep 26, 2024
@dan-menlo dan-menlo removed the type: bug Something isn't working label Sep 26, 2024
@nguyenhoangthuan99 (Contributor)

Hi @0xSage, can you share the cortex.log, cortex-cli.log, and ~/cortexcpp-nightly/models/huggingface.co/hugging-quants/Llama-3.2-3B-Instruct-Q8_0-GGUF/llama-3.2-3b-instruct-q8_0.yml files?

I tested on my machine and it works well; I just needed to update ctx_len to fit within 16 GB of VRAM.

(screenshot omitted)
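The ctx_len-versus-VRAM trade-off above can be roughed out with a KV-cache estimate. A minimal sketch, assuming Llama 3.2 3B dimensions (28 layers, 8 KV heads, head dim 128, fp16 cache) — these figures are assumptions, not confirmed in this thread, and exclude the model weights themselves:

```python
def kv_cache_bytes(
    ctx_len: int,
    n_layers: int = 28,      # assumed for Llama 3.2 3B
    n_kv_heads: int = 8,     # assumed (grouped-query attention)
    head_dim: int = 128,     # assumed
    bytes_per_elt: int = 2,  # fp16 cache
) -> int:
    # Two tensors (K and V) per layer, each ctx_len x n_kv_heads x head_dim.
    return 2 * n_layers * ctx_len * n_kv_heads * head_dim * bytes_per_elt

print(kv_cache_bytes(4096) / 2**20)  # → 448.0 (MiB at ctx_len 4096)
```

At these assumed dimensions the KV cache grows linearly with ctx_len, which is why trimming ctx_len is the usual first lever when a load does not fit.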

@nguyenhoangthuan99 (Contributor)

Hi @gabrielle-ong, this is the model.yml for Llama 3.2:

# BEGIN GENERAL GGUF METADATA
id: Llama-3.2-3B-Instruct # Model ID, unique across models (author / quantization)
model: Llama-3.2-3B-Instruct # Model ID used to construct requests - should be unique across models (author / quantization)
name: Llama-3.2-3B-Instruct # metadata.general.name
version: 2 # metadata.version

# END GENERAL GGUF METADATA

# BEGIN INFERENCE PARAMETERS
# BEGIN REQUIRED
stop:                # tokenizer.ggml.eos_token_id
  - <|eot_id|>
# END REQUIRED

# BEGIN OPTIONAL
stream: true # Default true?
top_p: 0.9 # Ranges: 0 to 1
temperature: 0.7 # Ranges: 0 to 1
frequency_penalty: 0 # Ranges: 0 to 1
presence_penalty: 0 # Ranges: 0 to 1
max_tokens: 4096 # Should default to the context length
seed: -1
dynatemp_range: 0
dynatemp_exponent: 1
top_k: 40
min_p: 0.05
tfs_z: 1
typ_p: 1
repeat_last_n: 64
repeat_penalty: 1
mirostat: false
mirostat_tau: 5
mirostat_eta: 0.100000001
penalize_nl: false
ignore_eos: false
n_probs: 0
min_keep: 0
# END OPTIONAL
# END INFERENCE PARAMETERS

# BEGIN MODEL LOAD PARAMETERS
# BEGIN REQUIRED
engine: cortex.llamacpp # engine to run model
prompt_template: "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_message}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
# END REQUIRED

# BEGIN OPTIONAL
ctx_len: 4096 # llama.context_length | 0 or undefined = loaded from model
ngl: 29 # Undefined = loaded from model
# END OPTIONAL
# END MODEL LOAD PARAMETERS
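A lightweight check that a model.yml like the one above carries its required fields can catch misconfigured conversions early. A minimal sketch — the required-key list is inferred from the BEGIN REQUIRED markers in the file above; cortex's actual validation may differ:

```python
REQUIRED_KEYS = ("id:", "model:", "stop:", "engine:", "prompt_template:")

def missing_keys(model_yml_text: str) -> list:
    """Return required top-level keys absent from a model.yml snippet."""
    # Strip trailing comments, then look for lines starting with each key.
    lines = [ln.split("#", 1)[0].strip() for ln in model_yml_text.splitlines()]
    return [k for k in REQUIRED_KEYS if not any(ln.startswith(k) for ln in lines)]

sample = (
    "id: Llama-3.2-3B-Instruct\n"
    "model: Llama-3.2-3B-Instruct\n"
    "stop:\n"
    "  - <|eot_id|>\n"
    "engine: cortex.llamacpp\n"
    'prompt_template: "..."\n'
)
print(missing_keys(sample))  # → []
```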

@gabrielle-ong gabrielle-ong self-assigned this Sep 27, 2024
@gabrielle-ong

Model conversion - Llama3.2 - Complete

Checklist (LlamaCpp only)

Out of scope

  • Onnx
  • Tensorrt-llm

Screenshots: HF, Mac, Windows, Ubuntu (images omitted)

@gabrielle-ong

Note: @0xSage 11b is a vision model, which we don't support with the current model.yml (I checked with Alex).
https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct

@gabrielle-ong gabrielle-ong closed this as completed by moving to Done in Models Sep 27, 2024
@dan-menlo

> Note: @0xSage 11b is a vision model, which we don't support with the current model.yml (I checked with Alex). https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct

Please make sure this is tracked in the Model Kanban - we should set it as a goal to support all the models.

@dan-menlo dan-menlo transferred this issue from menloresearch/cortex.cpp Sep 29, 2024