feat: Cortex supports Function Calling #295
Would be great to have this for the TensorRT backend as well, though I do not know if it supports function calling.
I am scoping this to Llama 3.1 function calling, as per the discussion here: #1151 (comment)
@nguyenhoangthuan99 I realize I did not scope this story well for Sprint 21.

Edit: Found the comment menloresearch/models#16

User story: Developer should be able to use Cortex with llama3.2 to do Function Calling (with an API similar to OpenAI's).

Can you help me with the following:
@nguyenhoangthuan99 Found your comment on Llama 3.1 function calling, and pasting it here: "Here is a Python script that runs function calling."
cc @dan-homebrew @0xSage
Function Calling Feature Support

OpenAI's Function Calling

OpenAI supports function calling as described in their documentation. The process involves three main steps:

Step 1: API Request

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

tools = [
{
"type": "function",
"function": {
"name": "get_delivery_date",
"description": "Get the delivery date for a customer's order.",
"parameters": {
"type": "object",
"properties": {
"order_id": {
"type": "string",
"description": "The customer's order ID."
}
},
"required": ["order_id"]
}
}
}
]
messages = [
{"role": "system", "content": "You are a helpful customer support assistant. Use the supplied tools to assist the user."},
{"role": "user", "content": "Hi, can you tell me the delivery date for my order?"},
{"role": "assistant", "content": "Hi there! I can help with that. Can you please provide your order ID?"},
{"role": "user", "content": "i think it is order_12345"}
]
response = client.chat.completions.create(
model='gpt-4o',
messages=messages,
tools=tools
)
```

Step 2: Processing the Response

If successful, the response will include a `tool_calls` field:

```python
import json  # the function arguments arrive as a JSON-encoded string

tool_call = response.choices[0].message.tool_calls[0]
arguments = json.loads(tool_call.function.arguments)  # SDK returns objects, not dicts
order_id = arguments.get('order_id')
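# `get_delivery_date` is the application's own lookup, not an OpenAI SDK
# call. A hypothetical stub so the snippet runs end to end:
from datetime import datetime, timedelta

def get_delivery_date(order_id: str) -> datetime:
    # Pretend every order arrives in three days.
    return datetime.now() + timedelta(days=3)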
delivery_date = get_delivery_date(order_id)
```

Step 3: Sending Function Result Back

```python
function_call_result_message = {
    "role": "tool",
    "content": json.dumps({
        "order_id": order_id,
        "delivery_date": delivery_date.strftime('%Y-%m-%d %H:%M:%S')
    }),
    "tool_call_id": response.choices[0].message.tool_calls[0].id
}
completion_payload = {
"model": "gpt-4o",
"messages": [
# ... previous messages ...
        response.choices[0].message,
function_call_result_message
]
}
final_response = client.chat.completions.create(**completion_payload)
```

Cortex Function Calling Support

Cortex can support function calling with LLaMA 3.1 and 3.2 models. Key changes include: …
These changes will allow Cortex to mimic OpenAI's function calling capabilities, providing a similar workflow for developers using Cortex-based models. This work can be done on the cortex.cpp side, because the function calling template is identical across engines (onnx, tensorrt-llm, llama.cpp). To support as many models as possible, I think we can make Cortex support function calling dynamically by providing a default function calling config for a few model architectures (llama, ...), as in the sketch below.
This allows us to add support for more model architectures in the future and to run with many engines. Note: function calling should only be supported through the API.
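A rough sketch of what such a per-architecture default config could look like (all field names here are hypothetical, not an actual cortex.cpp schema):

```python
# Hypothetical sketch of per-architecture function-calling defaults.
# None of these field names come from cortex.cpp; they only illustrate
# the idea of keying a default config by model architecture.
DEFAULT_FUNCTION_CALLING_CONFIGS = {
    "llama": {
        # System-prompt fragment that teaches the model to emit tool calls;
        # {tools} is filled with the JSON tool definitions at request time.
        "tool_prompt_template": (
            "You have access to the following functions:\n{tools}\n"
            "To call a function, reply with a JSON object "
            '{{"name": <function-name>, "arguments": <args>}}.'
        ),
        # Markers the parser looks for when extracting a tool call.
        "tool_call_start": "<tool_call>",
        "tool_call_end": "</tool_call>",
    },
    # More architectures (mistral, qwen, ...) can be added later, and the
    # same table can serve every engine (llama.cpp, onnx, tensorrt-llm).
}

def get_function_calling_config(architecture: str):
    # Returns None for architectures without a known default template.
    return DEFAULT_FUNCTION_CALLING_CONFIGS.get(architecture)
```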
Reopening issue and moving function calling to v1.0.2 for more extensive QA.
Function calling with OpenAI compatible

In this tutorial, I use the ….

Step by step with function calling

1. Start server and run model.
2. Create a Python script (a sketch follows below).
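A minimal sketch of such a script, assuming a local Cortex server exposing an OpenAI-compatible endpoint (the base URL, port, and model id below are assumptions, not confirmed values):

```python
# Minimal sketch: function calling against a local OpenAI-compatible
# server. The base_url, port, and model id are assumptions.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:39281/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "get_delivery_date",
        "description": "Get the delivery date for a customer's order.",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {
                    "type": "string",
                    "description": "The customer's order ID.",
                },
            },
            "required": ["order_id"],
        },
    },
}]

response = client.chat.completions.create(
    model="llama3.1:8b-gguf",  # assumed model id
    messages=[
        {"role": "system",
         "content": "Use the supplied tools to assist the user."},
        {"role": "user",
         "content": "What is the delivery date for order_12345?"},
    ],
    tools=tools,
)

# If the model chose to call the tool, the arguments arrive as a JSON string.
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print(tool_calls[0].function.name,
          json.loads(tool_calls[0].function.arguments))
```

From here, the same three-step loop as the OpenAI example above applies: run the function, append a `role: "tool"` message with the result, and call the endpoint again.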
Response format

The Response Format feature in OpenAI is fundamentally a prompt-engineering challenge. While its goal is to use system prompts to generate JSON output matching a specific schema, popular open-source models like Llama 3.1 and Mistral Nemo struggle to consistently generate exact JSON output that matches the requirements.
The response format parsed by OpenAI before sending to the server is quite complex for the …; a sketch of such a payload follows.
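For context, a minimal sketch of the kind of request being discussed, using OpenAI's json_schema response-format style (the schema content is illustrative only):

```python
# Sketch of a structured-output request in OpenAI's json_schema style;
# the schema below is illustrative, not tied to this issue.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "When will order_12345 arrive?"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "delivery_info",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "order_id": {"type": "string"},
                    "delivery_date": {"type": "string"},
                },
                "required": ["order_id", "delivery_date"],
                "additionalProperties": False,
            },
        },
    },
)
```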
The response for this request by …:
For the Function Calling feature, while the response format serves as a guide for writing correct function templates, this currently works reliably only with GPT models, not with open-source models. Given these limitations, I suggest we create a separate ticket for the Response Format feature and update the Function Calling implementation later, when open-source models can handle these requirements more effectively. Besides, the response format may still be in beta, because we have to use ….

cc @gabrielle-ong @dan-homebrew
Tried with ….
Marking as complete, thanks @nguyenhoangthuan99 and @TC117!
Goal

Questions

Do we need `model.yaml` to have any specific template?

Original post
Problem
AFAICS, the current implementation does not have OpenAI Function Calling support. This would be a fantastic, powerful, and much-needed feature.
Success Criteria
Any OAI client can be used with Nitro, even (and especially) those that use OAI Function Calling.
Reference:
https://platform.openai.com/docs/guides/function-calling
https://platform.openai.com/docs/api-reference/chat/create#chat-create-tools
Linked Issues
Bugs