
ONNX support for T5 and Marian #14029

Merged
merged 14 commits into from
Dec 7, 2023

Conversation

vankov
Copy link
Contributor

@vankov vankov commented Oct 17, 2023

This PR introduces an optimized version of the T5 and Marian transformer annotators

Description

The PR contains the following changes:

  1. Several small optimizations of the TensorFlow implementation of T5.
  2. Caching support for the TensorFlow implementation of T5.
  3. ONNX implementation of T5
  4. The missing beam search parameters have been added to the Marian Transformer (topP, temperature, noRepeatNgramSize, repetitionPenalty).
  5. ONNX implementation of Marian

NOTE: I've added caching support for TF as I already had the code and it was easy to add. However, the ONNX version is always faster, so there is no reason to export new models (or re-export existing ones) to TF, with or without caching. Existing TF models can be run by the optimized annotator, but they can't benefit from caching unless they are re-exported. Let me know if you think there is no point in having the TF caching functionality and I will remove it (it is about 10-20 lines of code).

The code is restructured so that the Tensorflow and the ONNX specific code is in separate classes and general logic is shared.

Both ONNX implementations use caching.
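The caching referred to above is decoder key/value caching: at each generation step, the keys and values for previously generated tokens are reused instead of being recomputed over the whole prefix. A minimal NumPy sketch of the idea (all names here are illustrative, not the PR's actual classes):

```python
import numpy as np

def attend(q, K, V):
    # Scaled dot-product attention for a single query vector q (shape (d,))
    # against keys K (shape (n, d)) and values V (shape (n, d)).
    scores = K @ q / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

class CachedDecoderAttention:
    """Illustrative key/value cache: each new token's key/value row is
    appended to the cache, so step t costs O(t) instead of recomputing
    attention inputs for the entire prefix from scratch."""
    def __init__(self, d):
        self.K = np.zeros((0, d))
        self.V = np.zeros((0, d))

    def step(self, k_new, v_new, q):
        self.K = np.vstack([self.K, k_new])
        self.V = np.vstack([self.V, v_new])
        return attend(q, self.K, self.V)
```

The cached result at any step matches attention computed over the full uncached prefix; the saving is that earlier keys/values are never recomputed, which is why caching needs models exported with cache inputs/outputs (hence the re-export requirement above).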

Notebooks for exporting T5 models from HuggingFace:

TF:
https://colab.research.google.com/drive/1hQR9OgVG0cWbcem05Fm0Bo3cWVJNP261?usp=sharing

ONNX:
https://colab.research.google.com/drive/1l9KDgpYnbnqKSVjImKtmy2pi_7HcXGx9?usp=sharing

Notebook for exporting ONNX Marian models from HuggingFace:

https://colab.research.google.com/drive/1Cf0gBZuGMe--OYGftL1G3DNDj26I5_1I?usp=sharing

A couple of new params are added to the T5 annotator:

maxNewTokens: the maximum number of tokens to generate (default is 512)
stopAtEos: whether to stop generating when the EOS token is encountered (default is True)

The generation continues until one of the following conditions is met:

  1. The EOS token is produced and stopAtEos is True
  2. maxTextOutputLength is reached
  3. maxNewTokens is reached
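The three stopping conditions above can be sketched as a single check (a hypothetical helper for illustration, not the PR's actual Scala code):

```python
def should_stop(generated_ids, eos_id, stop_at_eos,
                max_text_output_length, max_new_tokens, prompt_length):
    """Return True when generation should halt, per the three conditions."""
    new_tokens = len(generated_ids) - prompt_length
    # 1. The EOS token is produced and stopAtEos is True
    if stop_at_eos and generated_ids and generated_ids[-1] == eos_id:
        return True
    # 2. maxTextOutputLength is reached (total output length cap)
    if len(generated_ids) >= max_text_output_length:
        return True
    # 3. maxNewTokens is reached (cap on newly generated tokens only)
    if new_tokens >= max_new_tokens:
        return True
    return False
```

Note the distinction the two length params draw: maxTextOutputLength bounds the total output, while maxNewTokens bounds only the tokens generated beyond the prompt.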

T5 transformer: There is a new param which is used only internally: useCache. It should only be set when exporting the model (see the TF notebook).

Motivation and Context

T5 and Marian performance can be significantly improved using ONNX or TF with caching.

How Has This Been Tested?

I've tested both existing and new models in Python and Scala.

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • Code improvements with no or little impact
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • [x] My code follows the code style of this project.
  • [x] My change requires a change to the documentation.
  • [x] I have updated the documentation accordingly.
  • [x] I have read the CONTRIBUTING page.
  • [x] I have added tests to cover my changes.
  • [x] All new and existing tests passed.

@vankov vankov requested a review from maziyarpanahi October 17, 2023 14:19
@maziyarpanahi maziyarpanahi self-assigned this Oct 17, 2023
@maziyarpanahi maziyarpanahi added new-feature Introducing a new feature new model DON'T MERGE Do not merge this PR labels Oct 17, 2023
@vankov vankov changed the title T5 optimization and ONNX support ONNX support for T5 and Marian Oct 25, 2023
@maziyarpanahi maziyarpanahi changed the base branch from master to release/520-release-candidate December 7, 2023 15:46
@maziyarpanahi maziyarpanahi merged commit e000610 into release/520-release-candidate Dec 7, 2023