`docs/development/HOWTO-add-model.md` (+5 −5)
````diff
@@ -28,7 +28,7 @@ The required steps to implement for an HF model are:
 ```python
 @Model.register("MyModelForCausalLM")
 class MyModel(Model):
-    model_arch = gguf.MODEL_ARCH.GROK
+    model_arch = gguf.MODEL_ARCH.MYMODEL
 ```
 
 2. Define the layout of the GGUF tensors in [constants.py](/gguf-py/gguf/constants.py)
````
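For step 2, the layout is declared in the `MODEL_ARCH` enum and the `MODEL_TENSORS` table of `constants.py`. A minimal sketch of the kind of additions involved, assuming a LLaMA-like layer stack (`MYMODEL` and the exact tensor list are placeholders; mirror the closest existing architecture):

```python
# Sketch of additions to gguf-py/gguf/constants.py; MYMODEL and the
# exact tensor list are illustrative, not a definitive layout.

class MODEL_ARCH(IntEnum):
    # ... existing architectures ...
    MYMODEL = auto()

MODEL_ARCH_NAMES: dict[MODEL_ARCH, str] = {
    # ... existing entries ...
    MODEL_ARCH.MYMODEL: "mymodel",  # architecture name recorded in GGUF metadata
}

MODEL_TENSORS: dict[MODEL_ARCH, list[MODEL_TENSOR]] = {
    # ... existing entries ...
    MODEL_ARCH.MYMODEL: [           # tensors this architecture is allowed to use
        MODEL_TENSOR.TOKEN_EMBD,
        MODEL_TENSOR.OUTPUT_NORM,
        MODEL_TENSOR.OUTPUT,
        MODEL_TENSOR.ATTN_NORM,
        MODEL_TENSOR.ATTN_Q,
        MODEL_TENSOR.ATTN_K,
        MODEL_TENSOR.ATTN_V,
        MODEL_TENSOR.ATTN_OUT,
        MODEL_TENSOR.FFN_NORM,
        MODEL_TENSOR.FFN_GATE,
        MODEL_TENSOR.FFN_DOWN,
        MODEL_TENSOR.FFN_UP,
    ],
}
```

The base tensor names (e.g. `token_embd`) come from the shared `TENSOR_NAMES` table, and the `.weight`/`.bias` suffixes from the NOTE below are appended when the tensors are written.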
```diff
@@ -79,14 +79,14 @@ Depending on the model configuration, tokenizer, code and tensors layout, you wi
 - `Model#set_vocab`
 - `Model#write_tensors`
 
-NOTE: Tensor names must end with the `.weight` suffix; that is the convention, and several tools like `quantize` expect this to precede the weights.
+NOTE: Tensor names must end with the `.weight` or `.bias` suffixes; that is the convention, and several tools like `quantize` expect this to precede the weights.
 
 ### 2. Define the model architecture in `llama.cpp`
 
 The model params and tensors layout must be defined in `llama.cpp`:
 1. Define a new `llm_arch`
 2. Define the tensors layout in `LLM_TENSOR_NAMES`
-3. Add any nonstandard metadata in `llm_load_hparams`
+3. Add any non-standard metadata in `llm_load_hparams`
 4. Create the tensors for inference in `llm_load_tensors`
 5. If the model has a RoPE operation, add the rope type in `llama_rope_type`
 
```
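On the conversion-script side, the `Model#set_vocab` and `Model#write_tensors` hooks flagged in the hunk above are ordinary method overrides on the class registered in step 1. A minimal sketch, assuming the tokenizer is a stock SentencePiece model (the `_set_vocab_sentencepiece` helper is assumed to be provided by the base `Model` class in `convert-hf-to-gguf.py`; everything else here is illustrative):

```python
# Sketch only: MyModel is the illustrative class from step 1; check the
# base Model class in convert-hf-to-gguf.py for the helpers it provides.
@Model.register("MyModelForCausalLM")
class MyModel(Model):
    model_arch = gguf.MODEL_ARCH.MYMODEL

    def set_vocab(self):
        # A standard tokenizer can reuse one of the stock vocab loaders;
        # only implement custom logic for non-standard tokenizers.
        self._set_vocab_sentencepiece()
```

Overriding `Model#write_tensors` works the same way and is only needed when the default HF-to-GGUF tensor name mapping does not cover the checkpoint.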
```diff
@@ -96,9 +96,9 @@ NOTE: The dimensions in `ggml` are typically in the reverse order of the `pytorc
 
 This is the funniest part: you have to provide the inference graph implementation of the new model architecture in `llama_build_graph`.
 
-Have a look at an existing implementation like `build_llama`, `build_dbrx` or `build_bert`.
+Have a look at existing implementations like `build_llama`, `build_dbrx` or `build_bert`.
 
-When implementing a new graph, please note that the underlying `ggml` backends might not support them all; support for missing backend operations can be added in another PR.
+Some `ggml` backends do not support all operations. Backend implementations can be added in a separate PR.
 
 Note: to debug the inference graph, you can use [llama-eval-callback](/examples/eval-callback/).
```