model: llama3.2 #26

Closed
2 of 3 tasks
Tracked by #21
freelerobot opened this issue Sep 26, 2024 · 7 comments
@freelerobot (Contributor) commented Sep 26, 2024

Model Requests

https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct
and the 11B variant

Which formats?

  • GGUF (llama.cpp)
  • TensorRT (TensorRT-LLM)
  • ONNX (Onnx Runtime)
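Before wiring a requested GGUF build into the registry, a quick sanity check is to confirm the downloaded file really is GGUF. A minimal sketch (the GGUF format begins with the ASCII magic `GGUF`; the stand-in file below is only for demonstration):

```python
import os
import struct
import tempfile

def looks_like_gguf(path: str) -> bool:
    """Return True if the file starts with the GGUF magic bytes."""
    with open(path, "rb") as f:
        magic = f.read(4)
    return magic == b"GGUF"

# Demo with a stand-in file; a real check would point at the downloaded model.
with tempfile.NamedTemporaryFile(delete=False, suffix=".gguf") as tmp:
    tmp.write(b"GGUF" + struct.pack("<I", 3))  # magic + a version field
    fake = tmp.name

print(looks_like_gguf(fake))  # → True
os.unlink(fake)
```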
@freelerobot (Contributor, Author)

already supported

@freelerobot freelerobot reopened this Sep 26, 2024
@freelerobot freelerobot changed the title model: llama 3.2 bug: model: llama 3.2 Sep 26, 2024
@freelerobot (Contributor, Author)

Model responds with garbage:

❯ cortex-nightly run https://huggingface.co/hugging-quants/Llama-3.2-3B-Instruct-Q8_0-GGUF/blob/main/llama-3.2-3b-instruct-q8_0.gguf
Validating download items, please wait..
Start downloading: llama-3.2-3b-instruct-q8_0.gguf
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1138  100  1138    0     0   3942      0 --:--:-- --:--:-- --:--:--  3951
100 3263M  100 3263M    0     0  20.6M      0  0:02:37  0:02:37 --:--:-- 25.3M
Model Llama-3.2-3B-Instruct-Q8_0-GGUF downloaded successfully!
Starting server ...
Host: 127.0.0.1 Port: 3928
Server started
Model loaded!
Inorder to exit, type `exit()`
> hiya
+"(($>'9%A69G,87>6.#1C3F+0.>.B$;(,2032,+?FF'3**5"*&>69FCGHF((>*F252+'HB2%C<#!;'39,GE&#>?+'97F+,4GE?"1H+%+-?3A3*,+#C070+F:*)2*?260)B?"F)DB+F')63A6+;G$=F$H)6&/4E4"7"6.(&31+B-0A(*#;!&1C2//G-65%*=,5.2D>A6=B$2=<D417(%74'!?,2>FG/.)9&178;73=D<6?''10F/;C%')(*GC!-0!60=D1HA0AG(4E;8*.>0&-*H4E)3"462965,48!&=7H*E+E9=(9A6#.3""5##7HG8#A;81$F<B%1.;73*0#,7GC&9:HH.6(%G+"-"D>72)5#C"E'6:&=C,F19&3=./(06'F$A'F-.CB8/>DF5A*!2"!5?&)$#/.D)E.2&E"6>C/-:CB!0BE5F!8H)9H><?D%3-
,>?AG'=)+757+H'EE=F#$G#6+55)*73=?&^C
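Garbage output like the above can come from a wrong prompt template, but a truncated or corrupted download is also worth ruling out first. A minimal sketch for comparing a local file's SHA-256 against the checksum shown on the Hugging Face file page (the path and expected digest in the usage comment are placeholders, not values from this thread):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file in 1 MiB chunks and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()

# Usage (hypothetical):
# local = sha256_of("llama-3.2-3b-instruct-q8_0.gguf")
# assert local == "<sha256 listed on the Hugging Face file page>"
```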

@freelerobot freelerobot added the type: bug Something isn't working label Sep 26, 2024
@dan-menlo dan-menlo changed the title bug: model: llama 3.2 model: llama3.2 Sep 26, 2024
@dan-menlo dan-menlo removed the type: bug Something isn't working label Sep 26, 2024
@nguyenhoangthuan99 (Contributor)

Hi @0xSage, can you share the cortex.log, cortex-cli.log, and ~/cortexcpp-nightly/models/huggingface.co/hugging-quants/Llama-3.2-3B-Instruct-Q8_0-GGUF/llama-3.2-3b-instruct-q8_0.yml files?

I tested on my machine and it works well; I just needed to update ctx_len to fit within 16 GB of VRAM.

(screenshot omitted)
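The ctx_len-versus-VRAM trade-off above can be roughed out with a KV-cache estimate. A minimal sketch, assuming Llama 3.2 3B dimensions (28 layers, 8 KV heads, head dim 128, fp16 cache) — these figures are assumptions, not confirmed in this thread, and exclude the model weights themselves:

```python
def kv_cache_bytes(
    ctx_len: int,
    n_layers: int = 28,      # assumed for Llama 3.2 3B
    n_kv_heads: int = 8,     # assumed (grouped-query attention)
    head_dim: int = 128,     # assumed
    bytes_per_elt: int = 2,  # fp16 cache
) -> int:
    # Two tensors (K and V) per layer, each ctx_len x n_kv_heads x head_dim.
    return 2 * n_layers * ctx_len * n_kv_heads * head_dim * bytes_per_elt

print(kv_cache_bytes(4096) / 2**20)  # → 448.0 (MiB at ctx_len 4096)
```

At these assumed dimensions the KV cache grows linearly with ctx_len, which is why trimming ctx_len is the usual first lever when a load does not fit.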

@nguyenhoangthuan99 (Contributor)

Hi @gabrielle-ong, this is the model.yml for Llama 3.2:

# BEGIN GENERAL GGUF METADATA
id: Llama-3.2-3B-Instruct # Model ID, unique across models (author / quantization)
model: Llama-3.2-3B-Instruct # Model ID used to construct requests - should be unique across models (author / quantization)
name: Llama-3.2-3B-Instruct # metadata.general.name
version: 2 # metadata.version

# END GENERAL GGUF METADATA

# BEGIN INFERENCE PARAMETERS
# BEGIN REQUIRED
stop:                # tokenizer.ggml.eos_token_id
  - <|eot_id|>
# END REQUIRED

# BEGIN OPTIONAL
stream: true # Default true?
top_p: 0.9 # Ranges: 0 to 1
temperature: 0.7 # Ranges: 0 to 1
frequency_penalty: 0 # Ranges: 0 to 1
presence_penalty: 0 # Ranges: 0 to 1
max_tokens: 4096 # Should default to the context length
seed: -1
dynatemp_range: 0
dynatemp_exponent: 1
top_k: 40
min_p: 0.05
tfs_z: 1
typ_p: 1
repeat_last_n: 64
repeat_penalty: 1
mirostat: false
mirostat_tau: 5
mirostat_eta: 0.100000001
penalize_nl: false
ignore_eos: false
n_probs: 0
min_keep: 0
# END OPTIONAL
# END INFERENCE PARAMETERS

# BEGIN MODEL LOAD PARAMETERS
# BEGIN REQUIRED
engine: cortex.llamacpp # engine to run model
prompt_template: "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_message}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
# END REQUIRED

# BEGIN OPTIONAL
ctx_len: 4096 # llama.context_length | 0 or undefined = loaded from model
ngl: 29 # Undefined = loaded from model
# END OPTIONAL
# END MODEL LOAD PARAMETERS
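A lightweight check that a model.yml like the one above carries its required fields can catch misconfigured conversions early. A minimal sketch — the required-key list is inferred from the BEGIN REQUIRED markers in the file above; cortex's actual validation may differ:

```python
REQUIRED_KEYS = ("id:", "model:", "stop:", "engine:", "prompt_template:")

def missing_keys(model_yml_text: str) -> list:
    """Return required top-level keys absent from a model.yml snippet."""
    # Strip trailing comments, then look for lines starting with each key.
    lines = [ln.split("#", 1)[0].strip() for ln in model_yml_text.splitlines()]
    return [k for k in REQUIRED_KEYS if not any(ln.startswith(k) for ln in lines)]

sample = (
    "id: Llama-3.2-3B-Instruct\n"
    "model: Llama-3.2-3B-Instruct\n"
    "stop:\n"
    "  - <|eot_id|>\n"
    "engine: cortex.llamacpp\n"
    'prompt_template: "..."\n'
)
print(missing_keys(sample))  # → []
```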

@gabrielle-ong gabrielle-ong self-assigned this Sep 27, 2024
@gabrielle-ong

Model conversion - Llama3.2 - Complete

Checklist (LlamaCpp only)

Out of scope

  • Onnx
  • Tensorrt-llm

Screenshots: HF, Mac, Windows, Ubuntu (images omitted)

@gabrielle-ong

Note: @0xSage 11b is a vision model, which we don't support with the current model.yml (I checked with Alex).
https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct

@gabrielle-ong gabrielle-ong closed this as completed by moving to Done in Models Sep 27, 2024
@dan-menlo

> Note: @0xSage 11b is a vision model, which we don't support with the current model.yml (I checked with Alex). https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct

Please make sure this is tracked in the Model Kanban - we should set it as a goal to support all the models.

@dan-menlo dan-menlo transferred this issue from menloresearch/cortex.cpp Sep 29, 2024