I am integrating node-llama-cpp into my project.

### Core Challenge

How to implement custom message handling while maximizing reuse of the existing session management capabilities? Key technical constraints:
### Current Approach

```typescript
class PPEChatWrapper extends ChatWrapper {
    public readonly wrapperName = "PPEChat";

    constructor(public options: {filename: string, stops: string[], fileInfo: GgufFileInfo}) {
        super();
    }

    async generateContextState({chatHistory}: {chatHistory: AIChatMessageParam[]}) {
        // Safety handling: wrap content with control characters
        const processedHistory = chatHistory.map((msg) => ({
            ...msg,
            content: `\x01${msg.content.replace(/[\x01]/g, "")}\x01`
        }));

        // Use HuggingFace template conversion
        const contextText = await formatPromptToLLamaText(processedHistory, this.options);

        return {
            contextText,
            stopGenerationTriggers: [LlamaText(this.options.stops)]
        };
    }
}
```
### Challenges Encountered
1. **Reading the Model's Built-in System Template from Metadata**: (resolved) `model.fileInfo.metadata.tokenizer.chat_template`
2. **generateContextState**: The current `generateContextState` signature doesn't support async operations. How are others handling template processing that requires async I/O?
3. **Session State Reuse Pattern**: What's the recommended way to leverage existing LlamaChatSession state management when using custom wrappers? Are there any workaround patterns that have worked for others?
4. **Dynamic Role Handling**: Has anyone implemented a system supporting custom role names (beyond system/user/assistant)? Our template needs to handle conversations like the following (a rough rendering sketch follows the example):
```yaml
<|im_start|>system
This is a conversation between Mike and Llama, a friendly chatbot. Llama is helpful, kind, honest, good at writing, and never fails to answer any requests immediately and with precision.<|im_end|>
<|im_start|>Llama
What can I do for you, sir?<|im_end|>
<|im_start|>Mike
Nice to meet you, Llama!<|im_end|>
<|im_start|>Llama
Hello! It's nice to meet you too, Mr. Mike. How may I assist you today?
<|im_start|>Mike
Why the sky is blue?<|im_end|>
<|im_start|>Llama
```
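For the dynamic-role question above, here is a minimal sketch of rendering such ChatML-style turns, assuming node-llama-cpp's `LlamaText` and `SpecialTokensText` exports; `Message` and `renderChatML` are illustrative names, not library APIs:

```typescript
import {LlamaText, SpecialTokensText} from "node-llama-cpp";

type Message = {role: string, content: string};

// The role name is emitted as plain text between the special tokens,
// so custom names like "Mike" or "Llama" work the same way as "user".
function renderChatML(messages: Message[]) {
    return LlamaText(
        ...messages.flatMap((msg) => [
            new SpecialTokensText("<|im_start|>"),
            msg.role + "\n" + msg.content,
            new SpecialTokensText("<|im_end|>\n")
        ])
    );
}
```

The resulting `LlamaText` could then be returned as the `contextText` of a custom chat wrapper's `generateContextState`.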
### Proposed Solutions
---
Most of your questions would be answered by this documentation: https://node-llama-cpp.withcat.ai/guide/external-chat-state

You can create an adaptation yourself from `ChatHistoryItem[]` to the OpenAI format and vice versa. Note that the `ChatHistoryItem` type contains more information than the OpenAI format, so doing that will mean you'll miss out on some features, but mostly things that you can only do with node-llama-cpp and not with an OpenAI API, so this may be fine for your use case. The main features pertain to content segmentation and the stability of the context state, to reuse it as much as possible and avoid redundant…

This is a non-standard feature, and isn't supported by most model chat templates.

Any text in a …

This is already done by …

I've made chat wrappers independent on purpose, so you can use them without depending on a model, or even calling …
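For reference, a minimal sketch of such an adaptation, assuming the `ChatHistoryItem` union from node-llama-cpp v3 (system and user items carrying `text`, model items carrying a `response` array); `OpenAIMessage` and `toChatHistory` are illustrative names:

```typescript
import type {ChatHistoryItem} from "node-llama-cpp";

type OpenAIMessage = {role: "system" | "user" | "assistant", content: string};

// Map OpenAI-style messages onto node-llama-cpp chat history items.
// Function calls, segments and other ChatHistoryItem-only details are omitted,
// which is exactly the information loss mentioned above.
function toChatHistory(messages: OpenAIMessage[]): ChatHistoryItem[] {
    return messages.map((msg): ChatHistoryItem => {
        if (msg.role === "system")
            return {type: "system", text: msg.content};
        else if (msg.role === "user")
            return {type: "user", text: msg.content};

        return {type: "model", response: [msg.content]};
    });
}
```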
---
No. This is general for "completion" models: the closer the conversation is to the system template format, the better the generated quality. The essence of an LLM is just completion, and an "instruct" model is a fine-tune for following the system template.
Async support is needed for converting the message object to a string. IMO: use C for more speed, JS for more flexibility.
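One possible workaround while `generateContextState` stays synchronous is to resolve the async work up front. A sketch, assuming the chat template can be compiled ahead of time with `@huggingface/jinja`; `compileTemplate` is a hypothetical helper, not a node-llama-cpp API:

```typescript
import {Template} from "@huggingface/jinja";

// Do the async work (reading metadata, fetching template files, ...) up front,
// then hand the chat wrapper a synchronous render function it can call inside
// generateContextState.
async function compileTemplate(chatTemplate: string) {
    const template = new Template(chatTemplate);

    return (messages: Array<{role: string, content: string}>) =>
        template.render({messages, add_generation_prompt: true});
}

// usage, reading the template as in challenge 1:
// const render = await compileTemplate(model.fileInfo.metadata.tokenizer.chat_template);
```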
I've already implemented it a long time ago. It's easy to add a new model with default parameter support without coding. And I've implemented general tool calls and thinking mode (including deep thinking) for any model at a higher level, as plugins.
---
@giladgd Now that I have done my basic research on prompting and thought clearly about the layering on top of llama.cpp, I am ready to integrate it. Your node-llama-cpp project is very well organized at the low level, especially the separation of the sampler and predictor from C++ into JS, which makes it more flexible to use from JS. But the lack of layering at a higher level makes it unable to adapt to a wider range of needs. Although I am willing to join in the work, it seems that you have your own set of thinking patterns, and I would have to fork and rewrite to make it simpler, which is the last thing I want to do because it wastes everyone's time. Anyway, thank you very much for your hard work.
---
Yes, but the premise is without any guidance, e.g. the Llama 3.1 8B Instruct result.

The only role that can be fixed is the system role, but gemma2 does not have it. I only have two days left to integrate it, so I have to use some ugly tricks that you will not be interested in. I just have a little idea:
---
IMO, the purpose of OO is to maximize the reuse of code and data. For large language models (LLMs), the relationship between classes should be:

```mermaid
graph TD
    LLMChat --> LLMInstructCompletion
    LLMInstructCompletion --> LLMCompletion
    LLMInfillCompletion --> LLMCompletion
```
Why do you think that only by introducing the …? Safety ultimately depends on the implementation of the upper layer. Using a simple string type and wrapping the safe content with control characters, the only additional operation required of the upper layer is to filter the control character out of the safe content. But because no new types are introduced, the entire API will be cleaner and clearer.

```typescript
// upper-level processing using strings is easy:
for (const msg of messages) {
    // keep msg.content safe
    msg.content = CTRL_CHAR + trimControlChar(msg.content) + CTRL_CHAR;
    // keep the dynamic role safe
    msg.role = CTRL_CHAR + trimControlChar(msg.role) + CTRL_CHAR;
    // msg.content = LLamaText(msg.content)
}

const data = await getTemplateData();
// how to do this with LlamaText here?
// the upper level should make sure system_template is safe too
const text_content = await formatMessagesWithTemplate(system_template, {...data, messages});

// this could be introduced into llamaModel.tokenize when addSpecial == null
const tokens = processTextWithSafeWrapper(
    text_content,
    (text, addSpecial) => llamaModel.tokenize(text, addSpecial, trimLeadingSpace)
);
```

Only when there are multiple different token types that need to be handled differently might you have to introduce a new type. For the low-level API, it is sufficient to ensure that …
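For concreteness, here is a hypothetical sketch of the helpers referenced in the snippet above; `CTRL_CHAR`, `trimControlChar` and `processTextWithSafeWrapper` are this proposal's illustrative names, not node-llama-cpp APIs:

```typescript
// Control character used to mark the boundaries of untrusted content.
const CTRL_CHAR = "\x01";

// Strip the control character from untrusted content before wrapping it.
function trimControlChar(text: string): string {
    return text.split(CTRL_CHAR).join("");
}

// Split the rendered prompt on CTRL_CHAR: even-indexed segments come from the
// template itself (special tokens allowed), odd-indexed segments are wrapped
// user content and must be tokenized with special tokens disabled.
function processTextWithSafeWrapper(
    text: string,
    tokenize: (text: string, allowSpecialTokens: boolean) => number[]
): number[] {
    const tokens: number[] = [];
    const segments = text.split(CTRL_CHAR);

    for (let i = 0; i < segments.length; i++) {
        if (segments[i].length === 0)
            continue;

        tokens.push(...tokenize(segments[i], i % 2 === 0));
    }

    return tokens;
}
```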
Isn't it safe and simple to use strings with control characters?
Yes, but I had to rewrite a lot of code with similar functionality, such as the LlamaSampler that is not public; and the result returned by completionWithMeta lacks the used seed, temperature, etc. IMO, it would be more appropriate to separate the lowest layer into an independent npm package.
Yes, it is not easy, but your current function tool implementation has embedded a lot of code in the bottom layer, and this code does not need to exist if the feature is not used. The function tool code should be extracted, rather than trick-embedded into the underlying layer.