
Fix resolve chat completion URL #2540

Merged 2 commits on Sep 16, 2024
Conversation

@Wauplin (Contributor) commented Sep 13, 2024

Fix #2484.

With the introduction of base_url in #2384 and a refactor in #2410, some bugs were introduced in the logic that resolves the chat completion URL. This PR fixes this definitively with a dedicated _resolve_chat_completion_url method. I added a bunch of parametrized tests to check every possible case. It is now possible to:

  • pass a model id as InferenceClient(model=...)
  • pass a model id as InferenceClient().chat_completion(model=...)
  • pass a url as InferenceClient(model=...), InferenceClient(base_url=...) or InferenceClient().chat_completion(model=...)
    • the URL can have a trailing / or not
    • the URL can end with /v1 or not (for OpenAI compatibility)
    • the URL can end with /v1/chat/completions or not
    • the URL can be a local TGI instance or an Inference Endpoint url
  • pass a url to InferenceClient and a model_id in chat_completion (already the case before)

With all of these use cases tested and working, I think this is fixed for good :)
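For reference, the normalization rules listed above can be sketched roughly as follows. This is a simplified, hypothetical stand-in for the actual `_resolve_chat_completion_url` method, not its real implementation; in particular, the serverless Inference API URL pattern used for plain model ids is an assumption for illustration:

```python
def resolve_chat_completion_url(model_or_url: str) -> str:
    """Hypothetical sketch of the URL resolution described above."""
    # A plain model id (no scheme) is routed through the serverless
    # Inference API (URL pattern assumed here for illustration).
    if not model_or_url.startswith(("http://", "https://")):
        return (
            "https://api-inference.huggingface.co/models/"
            f"{model_or_url}/v1/chat/completions"
        )

    url = model_or_url.rstrip("/")          # trailing slash or not
    if url.endswith("/chat/completions"):   # already a full completions URL
        return url
    if url.endswith("/v1"):                 # OpenAI-compatible base URL
        return f"{url}/chat/completions"
    return f"{url}/v1/chat/completions"     # bare TGI / Inference Endpoint URL
```

Under these rules, `http://localhost:8080`, `http://localhost:8080/v1/`, and `http://localhost:8080/v1/chat/completions` all resolve to the same route.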


Two minor unrelated changes:

  1. added some `typing: ignore` comments to fix CI (after #2532, which implemented https://github.com/huggingface/huggingface_hub/issues/2516)
  2. a better error message in src/huggingface_hub/utils/_http.py (fixing an oversight in Refacto error parsing (HfHubHttpError) #2474)

Failing test is unrelated.

@Wauplin (Contributor, Author) commented Sep 13, 2024

@MoritzLaurer FYI, I've made a full end-to-end example going from Inference Endpoints to InferenceClient, and it works well:

```python
from huggingface_hub import get_inference_endpoint

endpoint = get_inference_endpoint("smollm-360m-instruct-hgl")  # need to start an IE and copy its name
client = endpoint.resume().wait().client

output = client.chat_completion(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Count to 10"},
    ]
)

print(output.choices[0].message.content)
```

@hanouticelina (Contributor) left a comment:

Looks good to me! thanks @Wauplin! Nice test coverage 😄

@Wauplin (Contributor, Author) commented Sep 16, 2024

Thanks for the review! Failing tests are unrelated so I'll merge this one :)

@Wauplin Wauplin merged commit a49ca75 into main Sep 16, 2024
15 of 19 checks passed
@Wauplin Wauplin deleted the 2484-fix-passing-chat-completion-url branch September 16, 2024 07:41
Successfully merging this pull request may close the following issue:

Passing a HF endpoint URL to client.chat_completion() doesn't seem to work anymore