Stars
InspireMusic: A Unified Framework for Music, Song, Audio Generation.
AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio a…
Community list of startups working with AI in audio and music technology
[INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark
A list of tools, papers and code related to Fake Audio Detection.
LLMs interview notes and answers: this repository mainly collects interview questions and reference answers for large language model (LLM) algorithm engineers
Speech Human Evaluation Estimation Toolkit (SHEET)
LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models
Real-time Speech-Text Foundation Model Toolkit (wip)
[EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers
Text-to-Music Generation with Rectified Flow Transformers
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.
SpeechGPT Series: Speech Large Language Models
Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization
Implementation of Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt (NAACL'24).
Localized watermarking for AI-generated speech audio, with state-of-the-art robustness and a very fast detector
ChatTTS 2K Speaker Stability Score 🥇: stability scores for 2,000 speaker voices, categorized by gender and age 👧, with online audio preview 🔈