Fixes issue #535: fix hex-encoded 1-char tokens in ASR output. #550

Merged (1 commit), Jan 26, 2024

Conversation

Contributor

@KarelVesely84 KarelVesely84 commented Jan 26, 2024

  • This avoids output such as [' K', '<0x64>', '<0x79>', 'ť', ' a', '<0x75>', 'to', 'bu', '<0x73>', '<0x75>', ... ] when using a regular model with 500 BPE units.
  • 1-char output tokens in the printable ASCII range [ 0x20 (space) .. 0x7E (tilde) ] are no longer rewritten to their hex form.
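
The rule in the second bullet can be sketched as follows. This is only an illustration in Python, not the code changed by this PR: the helper name `render_token`, the use of `bytes` as the token type, and the uppercase `<0xNN>` formatting are all assumptions made for the example.

```python
# Hypothetical sketch of the rule described above; not the actual sherpa-onnx code.

PRINTABLE_LO = 0x20  # space
PRINTABLE_HI = 0x7E  # tilde


def render_token(token: bytes) -> str:
    """Return a displayable form of one output token.

    A single-byte token is rewritten to '<0xNN>' only when the byte lies
    outside the printable ASCII range; everything else is decoded as UTF-8.
    """
    if len(token) == 1 and not (PRINTABLE_LO <= token[0] <= PRINTABLE_HI):
        # Assumed formatting: two uppercase hex digits.
        return f"<0x{token[0]:02X}>"
    return token.decode("utf-8", errors="replace")


# Tokens corresponding to the example in the description.
tokens = [b" K", b"d", b"y", "ť".encode("utf-8"), b" a", b"u", b"to", b"bu", b"s", b"u"]
rendered = [render_token(t) for t in tokens]
print(rendered)
```

With this rule, `d`, `y`, `u` and `s` stay readable instead of appearing as `<0x64>`, `<0x79>`, `<0x75>` and `<0x73>`, while a lone byte that is not printable ASCII (for example one half of a multi-byte UTF-8 character) is still shown in hex form.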

@csukuangfj
Collaborator

Thank you for your first-time contribution!

@csukuangfj csukuangfj merged commit 3f2a17e into k2-fsa:master Jan 26, 2024
170 of 181 checks passed
XiaYucca pushed a commit to XiaYucca/sherpa-onnx that referenced this pull request Jan 9, 2025
…a#550)
