This repository collects papers, code, and tools for Speech-to-text Translation and Speech-to-speech Translation. Pull requests are welcome.
- GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators. ACL 2024 [Paper] [Codes]
- UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units. ACL 2023 [Paper]
- Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation. Arxiv 2023.08 [Paper] [Demo] [Codes]
- Seamless: Multilingual Expressive and Streaming Speech Translation. Arxiv 2023. [Paper] [Codes]
- StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation. Interspeech 2023 [Paper] [Demo]
- Analyzing Speech Unit Selection for Textless Speech-to-Speech Translation. Arxiv 2024.07 [Paper]
- AudioPaLM: A Large Language Model That Can Speak and Listen. Arxiv 2023.06 [Paper]
- Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data? ACL 2024 [Paper] [Demo] [Codes]
- Enhancing expressivity transfer in textless speech-to-speech translation. ASRU 2023 [Paper]
- Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation. Interspeech 2023 [Paper]
- MSLM-S2ST: A Multitask Speech Language Model for Textless Speech-to-Speech Translation with Speaker Style Preservation. Arxiv 2024.04 [Paper]
- PolyVoice: Language Models for Speech to Speech Translation. ICLR 2024 [Paper]
- SEAMLESSEXPRESSIVELM: Speech Language Model for Expressive Speech-to-Speech Translation with Chain-of-Thought. Arxiv 2024.05 [Paper]
- Speech-to-Speech Translation For A Real-world Unwritten Language. ACL 2023 [Paper]
- Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer. ACL SRW 2024 [Paper]
- SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations. Arxiv 2022.11 [Paper]
- StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning. ACL 2024 [Paper]
- Textless Acoustic Model with Self-Supervised Distillation for Noise-Robust Expressive Speech-to-Speech Translation. ACL 2024 [Paper]