[Research] Speed up evaluation for XTREME-S #16785

Merged
merged 3 commits into from
Apr 27, 2022

anton-l
Member

@anton-l anton-l commented Apr 14, 2022

What does this PR do?

This adds a couple of improvements to the evaluation parts of the XTREME-S script:

  • fix the bug where filtering by language was redundantly repeated for parallel workers
  • use preprocess_logits_for_metrics to convert the logits into pred_ids before they are concatenated, which avoids OOMs
  • add the --language_group parameter to train on the FLEURS dataset in batches of languages (Western/Eastern European languages, South Asian languages, etc.)
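
To illustrate the second point: during evaluation the Trainer concatenates whatever each step returns, so keeping full logits means storing one score per vocabulary entry for every frame, while greedy decoding only needs one token id per frame. The following is a minimal sketch, not the script's actual code. It uses plain Python lists instead of tensors so it runs without dependencies; `argmax_last_dim` and `accumulated_values` are hypothetical helpers, and only the `(logits, labels)` signature mirrors the `preprocess_logits_for_metrics` hook. With real tensors the same reduction would presumably be a single `torch.argmax(logits, dim=-1)`.

```python
def argmax_last_dim(logits):
    """Collapse [batch][time][vocab] scores into [batch][time] token ids."""
    return [
        [max(range(len(frame)), key=frame.__getitem__) for frame in sequence]
        for sequence in logits
    ]


def preprocess_logits_for_metrics(logits, labels):
    # Same (logits, labels) signature as the Trainer hook; labels are unused here.
    return argmax_last_dim(logits)


def accumulated_values(num_examples, num_frames, vocab_size=None):
    """Count the values kept in memory until the metrics are computed.

    With vocab_size set, full logits are stored; with None, only pred_ids.
    """
    per_frame = vocab_size if vocab_size is not None else 1
    return num_examples * num_frames * per_frame


# Toy batch: 1 utterance, 3 frames, vocabulary of 4 tokens.
toy_logits = [[[0.1, 2.0, -1.0, 0.3], [1.5, 0.2, 0.1, 0.0], [0.0, 0.0, 0.9, 0.8]]]
pred_ids = preprocess_logits_for_metrics(toy_logits, labels=None)
print(pred_ids)

# Illustrative sizes only (not measured on XTREME-S): 1000 utterances, 500 frames, 32 tokens.
print(accumulated_values(1000, 500, vocab_size=32), "values as logits")
print(accumulated_values(1000, 500), "values as pred_ids")
```

The saving scales with the vocabulary size, so it matters most for long evaluation sets and larger character or subword vocabularies.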

Misc:

  • add --ctc_zero_infinity to handle the noisy FLEURS transcriptions
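
For context on why this helps: a CTC loss becomes infinite when an utterance has fewer frames than its transcription needs, which can happen when a transcription does not match the audio. The `zero_infinity` option of PyTorch's `CTCLoss` zeroes such losses and their gradients instead of letting them propagate. The sketch below only mimics that behaviour under stated assumptions: `min_frames_for_ctc` and `toy_ctc_loss` are hypothetical helpers, and the finite value returned is a placeholder, not a real CTC loss.

```python
import math


def min_frames_for_ctc(target_ids):
    """Fewest frames that can emit target_ids under CTC.

    Each label needs one frame, and every pair of adjacent identical
    labels needs an extra blank frame between them.
    """
    repeats = sum(1 for a, b in zip(target_ids, target_ids[1:]) if a == b)
    return len(target_ids) + repeats


def toy_ctc_loss(num_frames, target_ids, zero_infinity=False):
    """Stand-in for a CTC loss: infinite when no alignment exists."""
    if num_frames < min_frames_for_ctc(target_ids):
        # zero_infinity replaces the infinite loss with zero so that one
        # bad sample does not turn the whole batch loss into inf/NaN.
        return 0.0 if zero_infinity else math.inf
    return 1.0  # placeholder finite value, not an actual CTC computation


# Target "3 3 5" needs 4 frames: 3, blank, 3, 5.
target = [3, 3, 5]
print(min_frames_for_ctc(target))
print(toy_ctc_loss(2, target))
print(toy_ctc_loss(2, target, zero_infinity=True))
```

Note that zeroing the loss silently drops the affected samples from the gradient, so it works around noisy data rather than repairing it.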

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Apr 14, 2022

The documentation is not available anymore as the PR was closed or merged.

@anton-l anton-l changed the title from "[WIP][Research] Speed up evaluation for XTREME-S" to "[Research] Speed up evaluation for XTREME-S" Apr 26, 2022
@anton-l anton-l marked this pull request as ready for review April 26, 2022 15:45
@anton-l
Member Author

anton-l commented Apr 26, 2022

@patrickvonplaten these are ready to merge now I think

Also cc @sanchit-gandhi: the fixes should make your life much easier if you decide to do a run of multilingual translation :)

@anton-l anton-l merged commit a4a88fa into huggingface:main Apr 27, 2022
chamidullinr pushed a commit to chamidullinr/transformers that referenced this pull request Apr 28, 2022
* Avoid repeated per-lang filtering

* Language groups and logits preprocessing

* Style
@anton-l anton-l deleted the faster-xtreme-s-eval branch April 28, 2022 09:39
elusenji pushed a commit to elusenji/transformers that referenced this pull request Jun 12, 2022
3 participants