
feat: Cortex supports Function Calling #295

Closed · 1 task done · Tracked by #1151
ChristianWeyer opened this issue Dec 21, 2023 · 14 comments · Fixed by #1472, #1503 or #1572
Labels: needs pm (Needs product level decisions), type: feature request (A new feature)

Comments

ChristianWeyer commented Dec 21, 2023

Goal

Questions

  • Is our implementation of Function Calling consistent across all models?
    • e.g. do we need model.yaml to have any specific template?

Original post

Problem
AFAICS, the current implementation does not include OpenAI Function Calling support. This would be a fantastic, powerful, and much-needed feature.

Success Criteria
Any OAI client can be used with Nitro, even (and especially) those that use OAI Function Calling.

Reference:
https://platform.openai.com/docs/guides/function-calling
https://platform.openai.com/docs/api-reference/chat/create#chat-create-tools
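
To make the success criterion concrete, here is a minimal sketch of an unmodified OpenAI client pointed at a local Nitro server; the base URL/port and model id are assumptions taken from defaults mentioned later in this thread:

from openai import OpenAI

# Hedged sketch: an off-the-shelf OpenAI client talking to a local Nitro
# server. The base_url/port and model id are assumptions, not guarantees.
client = OpenAI(base_url="http://localhost:3928/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="meta-llama3.1-8b-instruct",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)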


@hiro-v hiro-v self-assigned this Feb 20, 2024
@hiro-v hiro-v added the P3: nice to have Nice to have feature label Feb 20, 2024
@hiro-v hiro-v added the good first issue Good for newcomers label Mar 22, 2024
@hiro-v hiro-v removed their assignment Mar 22, 2024
fiery-prometheus commented:

Would be great to have this for the TensorRT backend as well, though I do not know if it supports it.

@Van-QA Van-QA added this to Menlo May 28, 2024
@louis-menlo louis-menlo self-assigned this Jun 6, 2024
@freelerobot freelerobot added P1: important Important feature / fix and removed P3: nice to have Nice to have feature labels Jun 11, 2024
@Van-QA Van-QA added this to the Cortex OAI API milestone Jun 17, 2024
@louis-menlo louis-menlo removed their assignment Aug 19, 2024
@Van-QA Van-QA removed this from the Cortex OAI API milestone Aug 28, 2024
@imtuyethan imtuyethan moved this to Icebox in Menlo Sep 2, 2024
@imtuyethan imtuyethan removed P1: important Important feature / fix good first issue Good for newcomers labels Sep 2, 2024
@dan-menlo dan-menlo changed the title feat: Support for OpenAI Function Calling (for full drop-in replacement) feat: Support Function Calling in llama3.1 Sep 10, 2024
dan-menlo (Contributor) commented Sep 10, 2024

I am scoping this to llama3.1 function calling, as per discussion here #1151 (comment)

dan-menlo (Contributor) commented Sep 26, 2024

@nguyenhoangthuan99 I realize I did not scope this story well for Sprint 21:

Edit: Found the comment menloresearch/models#16

User story: Developer should be able to use Cortex with llama3.2, to do Function Calling (with an API similar to OpenAI)

Can you help me with the following:

  • Add links to the existing Github gist you had for llama3.1 Function Calling support
  • Help me think through how we should scope this - is there a way we can "mimic" the OpenAI API structure (e.g. pass the function definitions into the context window)? A rough sketch of that idea follows below.
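
As an illustration only (a hypothetical sketch, not Cortex's actual template), "passing the function definitions into the context window" could mean rendering the OpenAI-style tools array into the model's system prompt:

import json

# Hypothetical sketch: "mimic" the OpenAI API by rendering the tools array
# into the system prompt of a model trained for function calling.
# The prompt wording is illustrative, not Cortex's real template.
def render_tools_into_system_prompt(base_prompt: str, tools: list) -> str:
    lines = [base_prompt, "", "You have access to the following functions:"]
    for tool in tools:
        fn = tool["function"]
        lines.append(f"Use the function '{fn['name']}' to: {fn['description']}")
        lines.append(json.dumps(fn, indent=2))
    return "\n".join(lines)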

dan-menlo (Contributor) commented Sep 26, 2024

@nguyenhoangthuan99 Found your comment on Llama3.1 function calling, and pasting it here:

menloresearch/models#16

Here is the Python script that runs function calling. GitHub blocks .py file uploads, so I cannot attach it and am pasting it inline.

import requests, json

ENDPOINT = "https://litellm.jan.ai/v1/chat/completions" # "http://localhost:3928/v1/chat/completions" #
MODEL = "alan-gift" # "meta-llama3.1-8b-instruct" #

grammar = """
root   ::= object
value  ::= object | array | string | number | ("true" | "false" | "null") ws

object ::=
  "{" ws (
            string ":" ws value
    ("," ws string ":" ws value)*
  )? "}" ws

array  ::=
  "[" ws (
            value
    ("," ws value)*
  )? "]" ws

string ::=
  "\"" (
    [^"\\\x7F\x00-\x1F] |
    "\\" (["\\bfnrt] | "u" [0-9a-fA-F]{4}) # escapes
  )* "\"" ws

number ::= ("-"? ([0-9] | [1-9] [0-9]{0,15})) ("." [0-9]+)? ([eE] [-+]? [0-9] [1-9]{0,15})? ws

# Optional space: by convention, applied in this grammar after literal chars when allowed
ws ::= | " " | "\n" [ \t]{0,20}
"""

system_prompt = """
Environment: ipython
Tools: brave_search, wolfram_alpha
Cutting Knowledge Date: December 2023
Today Date: 20 September 2024

# Tool Instructions
- Always execute python code in messages that you share.
- When looking for real time information use relevant functions if available else fallback to brave_search

You have access to the following CUSTOM functions:

Use the function 'spotify_trending_songs' to: Get top trending songs on Spotify
{
  "name": "spotify_trending_songs",
  "description": "Get top trending songs on Spotify",
  "parameters": {
    "n": {
      "param_type": "int",
      "description": "Number of trending songs to get",
      "required": true
    }
  }
}

Use the function 'get_current_conditions' to: Get the current weather conditions for a specific location
{
    "type": "function",
    "function": {
    "name": "get_current_conditions",
    "description": "Get the current weather conditions for a specific location",
    "parameters": {
        "type": "object",
        "properties": {
        "location": {
            "type": "string",
            "description": "The city and state, e.g., San Francisco, CA"
        },
        "unit": {
            "type": "string",
            "enum": ["Celsius", "Fahrenheit"],
            "description": "The temperature unit to use. Infer this from the user's location."
        }
        },
        "required": ["location", "unit"]
    }
    }
}

If you choose to call a CUSTOM function ONLY reply in the following format:
<{start_tag}={function_name}>{parameters}{end_tag}
where

start_tag => `<function`
parameters => a JSON dict with the function argument name as key and function argument value as value.
end_tag => `</function>`

Here is an example,
<function=example_function_name>{"example_name": "example_value"}</function>

Reminder:
- Function calls MUST follow the specified format
- Required parameters MUST be specified
- Only call one function at a time
- Put the entire function call reply on one line
- Always add your sources when using search results to answer the user query
- If you cannot find the correct parameters for a function, ask the user to provide them.
- No explanation is needed when calling a function.

You are a helpful assistant.
"""
user_prompt = "Who is US president in 2024"
system = {"role":"system","content":system_prompt}
user = {"role":"user","content":user_prompt}

messages = [system,user]
body = {
    "model": MODEL,
    "messages": messages,
    "top_p":0.9,
    "top_k":40,
    "temperature":0.6,
    "stop" : ["</s>","<|eot_id|>"],
    "grammar":grammar,
}

result = requests.post(ENDPOINT, json=body,headers={'content-type': 'application/json'}).json()
print(json.dumps(result,indent=4))
assistant = result["choices"][0]["message"]
users2 = {"role":"user","content":"Maybe CA"}
# ipython = {"role":"ipython",""}
messages = [system,user,assistant,users2]

body = {
    "model": MODEL,
    "messages": messages,
    "top_p":0.9,
    "temperature":0.6,
    "stop" : ["</s>","<|eot_id|>","<|eom_id|>"],
    "grammar":grammar,
}
result = requests.post(ENDPOINT, json=body,headers={'content-type': 'application/json'}).json()
print(json.dumps(result,indent=4))

cc @dan-homebrew @0xSage

nguyenhoangthuan99 (Contributor) commented Sep 27, 2024

Function Calling Feature Support

OpenAI's Function Calling

OpenAI supports function calling as described in their documentation. The process involves three main steps:

  1. Request to OpenAI API with function definitions
  2. Processing the API response and executing the function
  3. Sending the function result back to the API

Step 1: API Request

import json

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_delivery_date",
            "description": "Get the delivery date for a customer's order.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The customer's order ID."
                    }
                },
                "required": ["order_id"]
            }
        }
    }
]

messages = [
    {"role": "system", "content": "You are a helpful customer support assistant. Use the supplied tools to assist the user."},
    {"role": "user", "content": "Hi, can you tell me the delivery date for my order?"},
    {"role": "assistant", "content": "Hi there! I can help with that. Can you please provide your order ID?"},
    {"role": "user", "content": "i think it is order_12345"}
]

response = client.chat.completions.create(
    model='gpt-4o',
    messages=messages,
    tools=tools
)

Step 2: Processing the Response

If successful, the response will include a tool_calls field:

tool_call = response.choices[0].message.tool_calls[0]
arguments = json.loads(tool_call.function.arguments)
order_id = arguments.get('order_id')
delivery_date = get_delivery_date(order_id)  # your own lookup logic

Step 3: Sending Function Result Back

function_call_result_message = {
    "role": "tool",
    "content": json.dumps({
        "order_id": order_id,
        "delivery_date": delivery_date.strftime('%Y-%m-%d %H:%M:%S')
    }),
    "tool_call_id": response.choices[0].message.tool_calls[0].id
}

completion_payload = {
    "model": "gpt-4o",
    "messages": [
        # ... previous messages ...
        response.choices[0].message,
        function_call_result_message
    ]
}

final_response = client.chat.completions.create(**completion_payload)

Cortex Function Calling Support

Cortex can support function calling with LLaMA 3.1 and 3.2 models. Key changes include:

  1. Injecting a default system prompt for function calling, or allowing it to be set manually.
  2. Parsing the response content to identify tool calls (see the sketch below).
  3. Handling tool role messages and tool_calls in message content.
  4. Managing edge cases like multiple or parallel function calls.

These changes will allow Cortex to mimic OpenAI's function calling capabilities, providing a similar workflow for developers using Cortex-based models. This work can be done on the cortex.cpp side, because the function calling template is identical across engines (onnx, tensorrt-llm, llama.cpp).
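
As a rough sketch of point 2 (an assumption, not the final implementation), the llama3.1-style <function=name>{...}</function> syntax from the system prompt above could be parsed out of the raw completion like this:

import re

# Hedged sketch: extract llama3.1-style "<function=name>{...}</function>"
# tool calls from raw model output and map them to OpenAI-style entries.
TOOL_CALL_RE = re.compile(r"<function=(\w+)>(\{.*?\})</function>", re.DOTALL)

def extract_tool_calls(content: str) -> list:
    return [
        {
            "type": "function",
            # arguments stays a JSON string, as in the OpenAI response shape
            "function": {"name": name, "arguments": args},
        }
        for name, args in TOOL_CALL_RE.findall(content)
    ]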

In order to support as many models as possible, I think we can make Cortex support function calling dynamically by providing a default function calling config for a few model architectures (llama, ...), for example:

// file: function_calling_config/llama3.1.h

// The default system_prompt differs across models.
constexpr const char* system_prompt = "this is default system prompt llama3.1";
// The tool role name differs between models; llama3.1 uses `ipython`.
constexpr const char* tool_role_name = "ipython";

// file: function_calling_config/mistral_nemo.h

constexpr const char* system_prompt = "this is default system prompt mistral nemo";
constexpr const char* tool_role_name = "tool";

This allows us to add support for more model architectures in the future and to run with many engines.
When users send function calling requests to cortex.cpp, we will let them decide which type of model they want to use, or set it manually themselves.
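
A minimal sketch of that dispatch idea (the prompt strings and fallback behavior are assumptions, not the shipped design):

# Hypothetical per-architecture function-calling configs, mirroring the
# C++ headers above. Prompt contents and the lookup rule are assumptions.
FUNCTION_CALLING_CONFIGS = {
    "llama3.1": {
        "system_prompt": "this is default system prompt llama3.1",
        "tool_role_name": "ipython",
    },
    "mistral-nemo": {
        "system_prompt": "this is default system prompt mistral nemo",
        "tool_role_name": "tool",
    },
}

def config_for(model_id: str, override: str = "") -> dict:
    # Users can pick the config explicitly; otherwise infer it from the model id.
    if override:
        return FUNCTION_CALLING_CONFIGS[override]
    for family, cfg in FUNCTION_CALLING_CONFIGS.items():
        if family in model_id:
            return cfg
    raise ValueError(f"no function calling config for {model_id}")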

Note: function calling should only be supported through the API.

@dan-menlo dan-menlo moved this from Icebox to Triage in Menlo Sep 29, 2024
gabrielle-ong (Contributor) commented Oct 21, 2024

Reopening this issue and moving function calling to v1.0.2 for more extensive QA.
We also need to implement the full spec as per OpenAI.

nguyenhoangthuan99 (Contributor) commented Oct 29, 2024

OpenAI-compatible function calling

In this tutorial, I use mistral-nemo:12b-gguf-q4-km for testing with cortex.cpp. All steps reproduce the original OpenAI instructions: https://platform.openai.com/docs/guides/function-calling

Step by step with function calling

1. Start the server and run the model:

cortex run mistral-nemo:12b-gguf-q4-km

2. Create a Python script function_calling.py with this content:

import json
from datetime import datetime

from openai import OpenAI
from pydantic import BaseModel
ENDPOINT = "http://localhost:39281/v1"
MODEL = "mistral-nemo:12b-gguf-q4-km"
client = OpenAI(
    base_url=ENDPOINT,
    api_key="not-needed"
)

This step creates an OpenAI client in Python, pointed at the local cortex.cpp server.

3. Create a chat completion with tool calling:

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_delivery_date",

                "strict": True,
            "description": "Get the delivery date for a customer's order. Call this whenever you need to know the delivery date, for example when a customer asks 'Where is my package'",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The customer's order ID.",
                    },
                },
                "required": ["order_id"],
                "additionalProperties": False,
            },
        }
    }
]
completion_payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful customer support assistant. Use the supplied tools to assist the user."},
        {"role": "user", "content": "Hi, can you tell me the delivery date for my order?"},
    ]
}
response = client.chat.completions.create(
    top_p=0.9,
    temperature=0.6,
    model=MODEL,
    messages=completion_payload["messages"],
    tools=tools,
)
print(response)

Because the order_id was not provided, the model will ask for it:

ChatCompletion(
   id='1lblzWtLw9h5HG0GjYYi',
   choices=[
       Choice(
           finish_reason=None,
           index=0,
           logprobs=None,
           message=ChatCompletionMessage(
               content='Of course! Please provide your order ID so I can look it up.',
               refusal=None,
               role='assistant',
               audio=None,
               function_call=None,
               tool_calls=None
           )
       )
   ],
   created=1730204306,
   model='_',
   object='chat.completion',
   service_tier=None,
   system_fingerprint='_',
   usage=CompletionUsage(
       completion_tokens=15,
       prompt_tokens=449,
       total_tokens=464,
       completion_tokens_details=None,
       prompt_tokens_details=None
   )
)

4. Add a new user message providing the order ID:

completion_payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful customer support assistant. Use the supplied tools to assist the user."},
        {"role": "user", "content": "Hi, can you tell me the delivery date for my order?"},
        {"role": "assistant", "content": "Sure! Could you please provide your order ID so I can look up the delivery date for you?"},
        {"role": "user", "content": "i think it is order_12345"},
    ]
}

response = client.chat.completions.create(
    top_p=0.9,
    temperature=0.6,
    model=MODEL,
    messages=completion_payload["messages"],
    tools=tools
)

The model's response will be:

ChatCompletion(
   id='zUnHwEPCambJtrvWOAQy',
   choices=[
       Choice(
           finish_reason='tool_calls',
           index=0,
           logprobs=None,
           message=ChatCompletionMessage(
               content='',
               refusal=None,
               role='assistant',
               audio=None,
               function_call=None,
               tool_calls=[
                   ChatCompletionMessageToolCall(
                       id=None,
                       function=Function(
                           arguments='{"order_id": "order_12345"}',
                           name='get_delivery_date'
                       ),
                       type='function'
                   )
               ]
           )
       )
   ],
   created=1730204559,
   model='_',
   object='chat.completion',
   service_tier=None,
   system_fingerprint='_',
   usage=CompletionUsage(
       completion_tokens=23,
       prompt_tokens=483,
       total_tokens=506,
       completion_tokens_details=None,
       prompt_tokens_details=None
   )
)

It returns the correct function with its arguments.

5. Push the function result into the conversation and ask the model to answer the user:

order_id = "order_12345"
delivery_date = datetime.now()

# Simulate the tool call response
response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "tool_calls": [
                    {
                        "id": "call_62136354",
                        "type": "function",
                        "function": {
                            "arguments": "{'order_id': 'order_12345'}",
                            "name": "get_delivery_date"
                        }
                    }
                ]
            }
        }
    ]
}

# Create a message containing the result of the function call
function_call_result_message = {
    "role": "tool",
    "content": json.dumps({
        "order_id": order_id,
        "delivery_date": delivery_date.strftime('%Y-%m-%d %H:%M:%S')
    }),
    "tool_call_id": response['choices'][0]['message']['tool_calls'][0]['id']
}

# Prepare the chat completion call payload
completion_payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful customer support assistant. Use the supplied tools to assist the user."},
        {"role": "user", "content": "Hi, can you tell me the delivery date for my order?"},
        {"role": "assistant", "content": "Sure! Could you please provide your order ID so I can look up the delivery date for you?"},
        {"role": "user", "content": "i think it is order_12345"},
        response["choices"][0]["message"],
        function_call_result_message
    ]
}

client = OpenAI(
    base_url=ENDPOINT,
    api_key="not-needed"
)

response = client.chat.completions.create(
    top_p=0.9,
    temperature=0.6,
    model=MODEL,
    messages=completion_payload["messages"],
    tools=tools,
)
print(response)

The response will include content generated from the function result; in a real application the delivery date would come from a database query or similar:

ChatCompletion(
   id='l1xdCuKVMYBSC5tEDlAn',
   choices=[
       Choice(
           finish_reason=None,
           index=0,
           logprobs=None,
           message=ChatCompletionMessage(
               content="Your order with ID 'order_12345' is scheduled to be delivered on October 29, 2024. Is there anything else I can help you with?",
               refusal=None,
               role='assistant',
               audio=None,
               function_call=None,
               tool_calls=None
           )
       )
   ],
   created=1730205470,
   model='_',
   object='chat.completion',
   service_tier=None,
   system_fingerprint='_',
   usage=CompletionUsage(
       completion_tokens=40,
       prompt_tokens=568,
       total_tokens=608,
       completion_tokens_details=None,
       prompt_tokens_details=None
   )
)

Handling parallel function calling

cortex.cpp supports parallel function calling by default:

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_delivery_date",

                "strict": True,
            "description": "Get the delivery date for a customer's order. Call this whenever you need to know the delivery date, for example when a customer asks 'Where is my package'",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The customer's order ID.",
                    },
                },
                "required": ["order_id"],
                "additionalProperties": False,
            },
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_current_conditions",
            "description": "Get the current weather conditions for a specific location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g., San Francisco, CA"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["Celsius", "Fahrenheit"],
                        "description": "The temperature unit to use. Infer this from the user's location."
                    }
                },
                "required": ["location", "unit"]
            }
        }
    }
]

messages = [
    {"role": "user", "content": "Hi, can you tell me the delivery date for my order order_12345 and check the weather condition in LA?"}
]
response = client.chat.completions.create(
    top_p=0.9,
    temperature=0.6,
    model=MODEL,
    messages= messages, 
    tools=tools
)
print(response)

It calls the two functions in parallel:

ChatCompletion(
    id='5ot3qux399DojubnBFrG',
    choices=[
        Choice(
            finish_reason='tool_calls',
            index=0,
            logprobs=None,
            message=ChatCompletionMessage(
                content='',
                refusal=None,
                role='assistant',
                audio=None,
                function_call=None,
                tool_calls=[
                    ChatCompletionMessageToolCall(
                        id=None,
                        function=Function(
                            arguments='{"order_id": "order_12345"}',
                            name='get_delivery_date'
                        ),
                        type='function'
                    ),
                    ChatCompletionMessageToolCall(
                        id=None,
                        function=Function(
                            arguments='{"location": "LA", "unit": "Fahrenheit"}',
                            name='get_current_conditions'
                        ),
                        type='function'
                    )
                ]
            )
        )
    ],
    created=1730205975,
    model='_',
    object='chat.completion',
    service_tier=None,
    system_fingerprint='_',
    usage=CompletionUsage(
        completion_tokens=47,
        prompt_tokens=568,
        total_tokens=615,
        completion_tokens_details=None,
        prompt_tokens_details=None
    )
)
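
One way to consume such a parallel response (a hedged sketch: get_delivery_date and get_current_conditions are assumed to be implemented locally, and note that cortex currently returns tool call ids of None):

import json

available_functions = {
    "get_delivery_date": get_delivery_date,          # assumed local implementation
    "get_current_conditions": get_current_conditions,  # assumed local implementation
}

assistant_message = response.choices[0].message
messages.append(assistant_message)

# Execute every tool call and append one "tool" message per call.
for call in assistant_message.tool_calls:
    result = available_functions[call.function.name](**json.loads(call.function.arguments))
    messages.append({
        "role": "tool",
        "content": json.dumps(result),
        "tool_call_id": call.id,  # may be None with cortex today
    })

final = client.chat.completions.create(model=MODEL, messages=messages, tools=tools)
print(final.choices[0].message.content)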

Configuring function calling behavior using the tool_choice parameter

Users can set tool_choice="none" to disable function calling even if tools are provided:

response = client.chat.completions.create(
    top_p=0.9,
    temperature=0.6,
    model=MODEL,
    messages=messages,
    tools=tools,
    tool_choice="none"
)

Users can also force the model to call a specific tool by naming it; in this example, get_current_conditions:

response = client.chat.completions.create(
    top_p=0.9,
    temperature=0.6,
    model=MODEL,
    messages= [{"role": "user", "content": "Hi, can you tell me the delivery date for my order order_12345 and check the weather condition in LA?"}],
    tools=tools,
    tool_choice= {"type": "function", "function": {"name": "get_current_conditions"}})

Users can also constrain a function parameter with an enum field to make the model's output more accurate:

{
    "name": "pick_tshirt_size",
    "description": "Call this if the user specifies which size t-shirt they want",
    "parameters": {
        "type": "object",
        "properties": {
            "size": {
                "type": "string",
                "enum": ["s", "m", "l"],
                "description": "The size of the t-shirt that the user would like to order"
            }
        },
        "required": ["size"],
        "additionalProperties": false
    }
}

(*) Note that the accuracy of function calling heavily depends on the quality of the model. For small models like 8B or 12B, function calling should only be used for simple cases.

nguyenhoangthuan99 (Contributor) commented Oct 29, 2024

Response format

The Response Format feature in OpenAI is fundamentally a prompt engineering challenge. While its goal is to use system prompts to generate JSON output matching a specific schema, popular open-source models like Llama 3.1 and Mistral Nemo struggle to consistently generate exact JSON output that matches the requirements.
For example, consider this request created using the OpenAI library:

from typing import List

from pydantic import BaseModel

class Step(BaseModel):
    explanation: str
    output: str


class MathReasoning(BaseModel):
    steps: List[Step]
    final_answer: str

    
completion_payload = {
    "messages": [
        {"role": "system", "content": f"You are a helpful math tutor. Guide the user through the solution step by step.\n"},
        {"role": "user", "content": "how can I solve 8x + 7 = -23"}
    ]
}

response = client.beta.chat.completions.parse(
    top_p=0.9,
    temperature=0.6,
    model=MODEL,
    messages= completion_payload["messages"],
    response_format=MathReasoning
)

The response_format payload that the OpenAI library builds before sending to the server is quite complex for the MathReasoning schema. Unlike GPT models, Llama 3.1 and Mistral Nemo cannot reliably generate responses that can be parsed as shown in the OpenAI tutorial, possibly because these models were not trained on similar structured output tasks.

"response_format" : 
        {
                "json_schema" : 
                {
                        "name" : "MathReasoning",
                        "schema" : 
                        {
                                "$defs" : 
                                {
                                        "Step" : 
                                        {
                                                "additionalProperties" : false,
                                                "properties" : 
                                                {
                                                        "explanation" : 
                                                        {
                                                                "title" : "Explanation",
                                                                "type" : "string"
                                                        },
                                                        "output" : 
                                                        {
                                                                "title" : "Output",
                                                                "type" : "string"
                                                        }
                                                },
                                                "required" : 
                                                [
                                                        "explanation",
                                                        "output"
                                                ],
                                                "title" : "Step",
                                                "type" : "object"
                                        }
                                },
                                "additionalProperties" : false,
                                "properties" : 
                                {
                                        "final_answer" : 
                                        {
                                                "title" : "Final Answer",
                                                "type" : "string"
                                        },
                                        "steps" : 
                                        {
                                                "items" : 
                                                {
                                                        "$ref" : "#/$defs/Step"
                                                },
                                                "title" : "Steps",
                                                "type" : "array"
                                        }
                                },
                                "required" : 
                                [
                                        "steps",
                                        "final_answer"
                                ],
                                "title" : "MathReasoning",
                                "type" : "object"
                        },
                        "strict" : true
                },
                "type" : "json_schema"
        }

The response to this request from mistral-nemo and llama3.1 cannot be parsed the way the original OpenAI tutorial shows. Llama 3.1 and Mistral Nemo were likely not trained on this kind of data, so they fail to handle this case.

Response: {
        "choices" : 
        [
                {
                        "finish_reason" : null,
                        "index" : 0,
                        "message" : 
                        {
                                "content" : "Here's a step-by-step guide to solving the equation 8x + 7 = -23:\n\n```json\n{\n  \"name\": \"MathReasoning\",\n  \"schema\": {\n    \"$defs\": {\n      \"Step\": {\n        \"additionalProperties\": false,\n        \"properties\": {\n          \"explanation\": {\"title\": \"Explanation\", \"type\": \"string\"},\n          \"output\": {\"title\": \"Output\", \"type\": \"string\"}\n        },\n        \"required\": [\"explanation\", \"output\"],\n        \"title\": \"Step\",\n        \"type\": \"object\"\n      }\n    },\n    \"additionalProperties\": false,\n    \"properties\": {\n      \"final_answer\": {\"title\": \"Final Answer\", \"type\": \"string\"},\n      \"steps\": {\n        \"items\": {\"$ref\": \"#/$defs/Step\"},\n        \"title\": \"Steps\",\n        \"type\": \"array\"\n      }\n    },\n    \"required\": [\"steps\", \"final_answer\"],\n    \"title\": \"MathReasoning\",\n    \"type\": \"object\"\n  },\n  \"strict\": true\n}\n```\n\n1. **Subtract 7 from both sides** to isolate the term with x:\n\n   - Explanation: To get rid of the +7 on the left side, we add -7 to both sides of the equation.\n   - Output: `8x + 7 - 7 = -23 - 7`\n\n   This simplifies to:\n   ```\n   8x = -30\n   ```\n\n2. **Divide both sides by 8** to solve for x:\n\n   - Explanation: To get rid of the 8 on the left side, we multiply both sides of the equation by the reciprocal of 8, which is 1/8.\n   - Output: `8x / 8 = -30 / 8`\n\n   This simplifies to:\n   ```\n   x = -3.75\n   ```\n\nSo, the final answer is:\n\n- Final Answer: `x = -3.75`",
                                "role" : "assistant"
                        }
                }
        ],

For the Function Calling feature, the response format serves as a guide for writing correct function templates, but this currently works reliably only with GPT models, not with open-source models. Given these limitations, I suggest we create a separate ticket for the Response Format feature and update the Function Calling implementation later, when open-source models can handle these requirements more effectively. Besides, response format may still be in beta, since we have to use client.beta.chat.completions.parse to create the chat completion instead of client.chat.completions.create.
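
As a possible stopgap until then (my assumption, not a tested recommendation), the llama.cpp grammar parameter used earlier in this thread could constrain output to syntactically valid JSON without relying on the model honoring response_format:

import requests, json

# Hedged sketch: reuse the JSON GBNF grammar posted earlier in this thread
# to force well-formed JSON output, instead of OpenAI's response_format.
body = {
    "model": MODEL,
    "messages": completion_payload["messages"],
    "grammar": grammar,  # the GBNF grammar from the llama3.1 script above
}
result = requests.post(ENDPOINT + "/chat/completions", json=body).json()
print(json.dumps(result, indent=2))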

cc @gabrielle-ong @dan-homebrew

@nguyenhoangthuan99 nguyenhoangthuan99 moved this from In Progress to In Review in Menlo Nov 1, 2024
@github-project-automation github-project-automation bot moved this from In Review to Review + QA in Menlo Nov 5, 2024
@gabrielle-ong gabrielle-ong modified the milestones: v1.0.2, v1.0.3 Nov 12, 2024
TC117 commented Nov 13, 2024

Tested with llama3.2:3b-gguf-q8-0:

  • Order ID not given:
ChatCompletion(
    id="zNT5M5ExpwViKSd8DCmC",
    choices=[
        Choice(
            finish_reason="tool_calls",
            index=0,
            logprobs=None,
            message=ChatCompletionMessage(
                content="",
                refusal=None,
                role="assistant",
                audio=None,
                function_call=None,
                tool_calls=[
                    ChatCompletionMessageToolCall(
                        id=None,
                        function=Function(
                            arguments='{"order_id":"your_order_id_here"}',
                            name="get_delivery_date",
                        ),
                        type="function",
                    )
                ],
            ),
        )
    ],
    created=1731469921,
    model="_",
    object="chat.completion",
    service_tier=None,
    system_fingerprint="_",
    usage=CompletionUsage(
        completion_tokens=18,
        prompt_tokens=443,
        total_tokens=461,
        completion_tokens_details=None,
        prompt_tokens_details=None,
    ),
)
  • Order ID given:
ChatCompletion(
    id="UavHsxm5CEf6Ote5gzBS",
    choices=[
        Choice(
            finish_reason="tool_calls",
            index=0,
            logprobs=None,
            message=ChatCompletionMessage(
                content="",
                refusal=None,
                role="assistant",
                audio=None,
                function_call=None,
                tool_calls=[
                    ChatCompletionMessageToolCall(
                        id=None,
                        function=Function(
                            arguments='{"order_id": "order_12345"}',
                            name="get_delivery_date",
                        ),
                        type="function",
                    )
                ],
            ),
        )
    ],
    created=1731470092,
    model="_",
    object="chat.completion",
    service_tier=None,
    system_fingerprint="_",
    usage=CompletionUsage(
        completion_tokens=19,
        prompt_tokens=481,
        total_tokens=500,
        completion_tokens_details=None,
        prompt_tokens_details=None,
    ),
)

TC117 commented Nov 13, 2024

  • Pushed the response to the conversation and asked the model to answer the user:
ChatCompletion(
    id="41JQyqlgdg6eO6tygCgV",
    choices=[
        Choice(
            finish_reason=None,
            index=0,
            logprobs=None,
            message=ChatCompletionMessage(
                content="The delivery date for your order (order_12345) is 2024-11-13.",
                refusal=None,
                role="assistant",
                audio=None,
                function_call=None,
                tool_calls=None,
            ),
        )
    ],
    created=1731470305,
    model="_",
    object="chat.completion",
    service_tier=None,
    system_fingerprint="_",
    usage=CompletionUsage(
        completion_tokens=21,
        prompt_tokens=540,
        total_tokens=561,
        completion_tokens_details=None,
        prompt_tokens_details=None,
    ),
)
ChatCompletion(
    id="HKSD6hVz2jiHkkS5X1fl",
    choices=[
        Choice(
            finish_reason=None,
            index=0,
            logprobs=None,
            message=ChatCompletionMessage(
                content="The delivery date for your order (order_12345) is November 13, 2024.",
                refusal=None,
                role="assistant",
                audio=None,
                function_call=None,
                tool_calls=None,
            ),
        )
    ],
    created=1731470305,
    model="_",
    object="chat.completion",
    service_tier=None,
    system_fingerprint="_",
    usage=CompletionUsage(
        completion_tokens=21,
        prompt_tokens=540,
        total_tokens=561,
        completion_tokens_details=None,
        prompt_tokens_details=None,
    ),
)
  • Parallel request
ChatCompletion(
    id="LHH39nYPjBLIGnoQu0Q9",
    choices=[
        Choice(
            finish_reason="tool_calls",
            index=0,
            logprobs=None,
            message=ChatCompletionMessage(
                content="",
                refusal=None,
                role="assistant",
                audio=None,
                function_call=None,
                tool_calls=[
                    ChatCompletionMessageToolCall(
                        id=None,
                        function=Function(
                            arguments='{"order_id": "order_12345"}',
                            name="get_delivery_date",
                        ),
                        type="function",
                    ),
                    ChatCompletionMessageToolCall(
                        id=None,
                        function=Function(
                            arguments='{"location": "Los Angeles, CA", "unit": "Celsius"}',
                            name="get_current_conditions",
                        ),
                        type="function",
                    ),
                ],
            ),
        )
    ],
    created=1731470542,
    model="_",
    object="chat.completion",
    service_tier=None,
    system_fingerprint="_",
    usage=CompletionUsage(
        completion_tokens=42,
        prompt_tokens=549,
        total_tokens=591,
        completion_tokens_details=None,
        prompt_tokens_details=None,
    ),
)

@gabrielle-ong gabrielle-ong removed this from the v1.0.3 milestone Nov 14, 2024
gabrielle-ong (Contributor) commented:
Marking as complete, thanks @nguyenhoangthuan99 and @TC117!

@gabrielle-ong gabrielle-ong moved this from Review + QA to Completed in Menlo Nov 14, 2024
@gabrielle-ong gabrielle-ong modified the milestones: v1.0.4, v1.0.3 Nov 18, 2024
@TC117 TC117 mentioned this issue Dec 19, 2024