
Add support for Grammar/Tools + TGI-based specs in InferenceClient #2237

Merged: 26 commits, Apr 25, 2024

Conversation

Wauplin (Contributor) commented Apr 19, 2024

Sorry for the huge PR 😬 It goes hand-in-hand with huggingface/huggingface.js#629 (and to a lesser extent huggingface/text-generation-inference#1798). Most of the changes to review are hopefully documentation / auto-generated stuff.

What's in this PR?

  • Inference types for text_generation and chat_completion have been updated based on TGI specs (see associated PR Generate specs from TGI openapi.json huggingface.js#629)
    • 💔 there are breaking changes for people importing the return types. Otherwise, there shouldn't be breaking changes in the usage of return values.
  • Add support for grammar in text_generation task
  • Add support for tools in chat_completion task
  • Added support for more parameters in these two tasks; not all parameters supported by TGI are handled yet.
  • When a model is not served by TGI, we now parse the error message to detect which parameters are not supported. This is more robust than manually maintaining a list of incompatible parameters. See MODEL_KWARGS_NOT_USED_REGEX.
  • In text_generation, only send non-None parameters in the payload.
  • Added a script check_inference_input_params.py that checks task parameters are correctly used and documented, to be consistent with the generated types. For now, raises an error when not consistent but doesn't provide an auto-fix (see TODOs in script).
    • for now, only chat_completion/text_generation are covered, and not everything is checked
  • Adapted generate_async_inference_client.py to generate KO package reference correctly.
  • Fixed a bunch of tests, mainly related to the new type names
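The grammar parameter and the "only send non-None parameters" behavior described above can be sketched in plain Python. This is a minimal illustration, not the actual huggingface_hub internals: `build_payload` and the exact payload shape are assumptions based on the TGI request format.

```python
# Sketch (not the real library code): build a text_generation payload with a
# JSON grammar, dropping None-valued parameters so only explicitly set
# options are sent to the server.
from typing import Any, Dict, Optional


def build_payload(prompt: str, **parameters: Optional[Any]) -> Dict[str, Any]:
    # Keep only parameters the caller actually set (non-None).
    filtered = {k: v for k, v in parameters.items() if v is not None}
    return {"inputs": prompt, "parameters": filtered}


# A grammar constraining output to a small JSON schema (shape follows the
# TGI "grammar" parameter: a type plus a value holding the schema).
grammar = {
    "type": "json",
    "value": {
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}

payload = build_payload(
    "Where is the Eiffel Tower?",
    grammar=grammar,
    max_new_tokens=100,
    temperature=None,  # unset, so it is dropped from the payload
)
print(sorted(payload["parameters"]))  # -> ['grammar', 'max_new_tokens']
```

Filtering at payload-construction time means server-side defaults apply whenever the caller does not pass a value, rather than the client overriding them with explicit nulls.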

What to review

Mostly:

  • src/huggingface_hub/inference/_client.py => lots of docs updates (not interesting) + a few code tweaks (to review)
  • src/huggingface_hub/inference/_common.py => only small tweaks
  • tests/test_inference_client.py => to check "it works"
  • tests/test_inference_text_generation.py => to check "it works"
  • utils/check_inference_input_params.py => new script to check input parameters

The rest is either auto-generated stuff or low-level scripts.
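The error-parsing fallback mentioned in the summary (MODEL_KWARGS_NOT_USED_REGEX) can be sketched roughly as follows. The exact pattern and error text here are assumptions; the point is extracting unused parameter names from the server's error message instead of hard-coding a list of incompatible parameters per model.

```python
import re
from typing import List

# Assumed shape of the pattern; the real regex in huggingface_hub may differ.
MODEL_KWARGS_NOT_USED_REGEX = re.compile(
    r"The following `model_kwargs` are not used by the model: \[(.*?)\]"
)


def unused_parameters(error_message: str) -> List[str]:
    """Return parameter names the server reported as unused, if any."""
    match = MODEL_KWARGS_NOT_USED_REGEX.search(error_message)
    if match is None:
        return []
    return [name.strip(" '\"") for name in match.group(1).split(",")]


msg = (
    "The following `model_kwargs` are not used by the model: "
    "['watermark', 'stop'] (note: typos in the generate arguments "
    "will also show up in this list)"
)
print(unused_parameters(msg))  # -> ['watermark', 'stop']
```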

Generated docs:

What's not in this PR?

=> will be handled in a follow-up PR

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Wauplin marked this pull request as ready for review April 23, 2024 10:23
@Wauplin requested a review from LysandreJik April 23, 2024 13:56
@Wauplin changed the title from [wip] TGI-based specs + adapt to TGI-based specs + adapt code + add support for Grammar/Tools Apr 23, 2024
@Wauplin changed the title from TGI-based specs + adapt code + add support for Grammar/Tools to Add support for Grammar/Tools + TGI-based specs in InferenceClient Apr 23, 2024
LysandreJik (Member) left a comment

Given the quantity of auto-generated code I couldn't do a perfect review, but I haven't seen anything shocking in the PR. I'm of the opinion of "ship it and eventually fix if issues arise", the code looks clean enough :)

Wauplin (Contributor, Author) commented Apr 25, 2024

Thanks for the review @LysandreJik! Agreed about "ship it and eventually fix if issues arise" 😄

@Wauplin merged commit ff6c1f9 into main Apr 25, 2024
16 checks passed
@Wauplin deleted the follow-tgi-specs-for-chat-completion branch April 25, 2024 09:34
Wauplin added a commit to huggingface/huggingface.js that referenced this pull request Apr 30, 2024
From an idea mentioned by @OlivierDehaene and @drbh in
#579 and [slack
thread](https://huggingface.slack.com/archives/C05CFK1HM0T/p1711360022125399)
(internal).

This PR adds a script `inference-tgi-import.ts` to generate the
`text-generation` and `chat_completion` specs from the auto-generated
TGI
[specifications](https://huggingface.github.io/text-generation-inference/).
The goal is to keep @huggingface/tasks in sync with TGI improvements and
therefore have a consistent "single source of truth". The converted
specs are then compatible with our tooling to generate JS/Python code.

This PR changes quite a lot of naming in the generated JS/Python types.
Luckily, I don't think they were used in the JS ecosystem yet, so it's
better to do this now rather than later.

I also opened huggingface/huggingface_hub#2237
to include these changes in `huggingface_hub`.


**TODO:**
- [ ] fix lint errors. How should we handle parsed JSON while avoiding
`any`?
- [ ] CI workflow to open a PR each time TGI is updated? => can be done
in a future PR

---------

Co-authored-by: SBrandeis <simon@huggingface.co>
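The core idea of the import script (pulling named component schemas out of TGI's openapi.json before converting them into task specs) can be sketched in Python. The inline spec and the `extract_schemas` helper are illustrative stand-ins, not the actual TypeScript implementation in `inference-tgi-import.ts`.

```python
# A tiny stand-in for TGI's openapi.json; the real file is published with the
# text-generation-inference documentation and would be fetched over HTTP.
openapi = {
    "components": {
        "schemas": {
            "GenerateRequest": {
                "type": "object",
                "properties": {"inputs": {"type": "string"}},
            },
            "ChatRequest": {
                "type": "object",
                "properties": {"messages": {"type": "array"}},
            },
        }
    }
}


def extract_schemas(spec: dict) -> dict:
    """Return the named component schemas: the raw material a converter
    would turn into @huggingface/tasks spec files."""
    return spec.get("components", {}).get("schemas", {})


schemas = extract_schemas(openapi)
print(sorted(schemas))  # -> ['ChatRequest', 'GenerateRequest']
```

Keeping the conversion mechanical like this is what makes a "single source of truth" feasible: regenerating the specs whenever TGI's OpenAPI document changes, rather than editing them by hand.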