This PR introduces optimized versions of the T5 and Marian transformer annotators
Description
The PR contains the following changes:
- The code is restructured so that the TensorFlow-specific and the ONNX-specific code live in separate classes and the general logic is shared (a rough sketch follows below).
- Both ONNX implementations use caching.

NOTE: I've added caching support for TF as I already had the code and it was easy to add. However, the ONNX version is always faster, and it makes no sense to export new models (or re-export existing ones) to TF, with or without caching. The existing TF models can be run by the optimized annotator, but they can't benefit from caching (they would need to be re-exported). Let me know if you think there is no point in having the TF caching functionality and I will remove it (it is about 10-20 lines of code).
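As a rough illustration of the split (all names here are hypothetical; the actual classes in the PR may differ), the shared generation loop stays framework-agnostic while each backend only supplies a single decoder step:

```scala
// Hypothetical sketch of the restructuring: the greedy-decoding loop is shared,
// and each backend (TF or ONNX) only implements one decoder step. A backend may
// keep a key/value cache between decodeStep calls to avoid recomputation.
trait Seq2SeqBackend {
  def decodeStep(generatedSoFar: Seq[Int]): Array[Float] // logits for the next token
}

object SharedGeneration {
  def generate(backend: Seq2SeqBackend,
               maxNewTokens: Int,
               stopAtEos: Boolean,
               eosId: Int): Seq[Int] = {
    var tokens = Seq.empty[Int]
    var done = false
    while (!done && tokens.length < maxNewTokens) {
      val logits = backend.decodeStep(tokens)
      val next = logits.indices.maxBy(logits(_)) // greedy pick
      tokens = tokens :+ next
      done = stopAtEos && next == eosId // optional early stop at EOS
    }
    tokens
  }
}
```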
Notebooks for exporting T5 models from HuggingFace:
- TF: https://colab.research.google.com/drive/1hQR9OgVG0cWbcem05Fm0Bo3cWVJNP261?usp=sharing
- ONNX: https://colab.research.google.com/drive/1l9KDgpYnbnqKSVjImKtmy2pi_7HcXGx9?usp=sharing

Notebook for exporting ONNX Marian models from HuggingFace:
- https://colab.research.google.com/drive/1Cf0gBZuGMe--OYGftL1G3DNDj26I5_1I?usp=sharing
A couple of new params are added to the T5 annotator:
- maxNewTokens: the maximum number of tokens to be generated (default is 512)
- stopAtEos: whether to stop generating when the EOS token is encountered (default is True)

The generation continues until one of the following conditions is met: the number of generated tokens reaches maxNewTokens, or the EOS token is encountered (when stopAtEos is set to True). See the example below.
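A minimal Scala sketch of configuring the new params (the setter names follow the usual Spark NLP conventions; "t5_small" is just an example model name):

```scala
import com.johnsnowlabs.nlp.annotators.seq2seq.T5Transformer

val t5 = T5Transformer
  .pretrained("t5_small")  // example model name; any T5 model works
  .setTask("summarize:")
  .setInputCols("documents")
  .setOutputCol("summaries")
  .setMaxNewTokens(256)    // stop after at most 256 generated tokens
  .setStopAtEos(true)      // also stop as soon as the EOS token is produced
```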
T5 transformer: there is a new param which is used only internally: useCache. It should only be set when exporting the model (see the TF notebook above).
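As a hedged sketch of where that call belongs (the paths are just examples, and an active SparkSession `spark` is assumed; loadSavedModel is the usual Spark NLP entry point for importing exported models):

```scala
// Hypothetical import-time flow (see the TF export notebook): useCache is set once,
// when importing the freshly exported saved model, never at inference time.
val t5Cached = T5Transformer
  .loadSavedModel("/tmp/exported_t5_tf", spark) // example path to the exported model
  .setUseCache(true)

t5Cached.write.overwrite().save("/tmp/t5_tf_cached_spark_nlp")
```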
Motivation and Context
T5 and Marian performance can be significantly improved by using ONNX, or TF with caching.
How Has This Been Tested?
I've tested both existing and new models in Python and Scala.
Screenshots (if appropriate):
Types of changes
Checklist: