Audio AI Agent

This section tracks the latest audio AI agents, covering speech, music, sound effects, and related areas.

2023

Date (DD.MM)	Source	Description	Paper	Code	Trained Model
06.12	JAMMIN-GPT	JAMMIN-GPT: Text-based Improvisation using LLMs in Ableton Live	arXiv	GitHub	-
19.11	M2UGen	M2UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models	arXiv	-	-
14.11	Qwen-Audio	Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models	arXiv	GitHub	-
02.11	FLAP	FLAP: Fast Language-Audio Pre-training	arXiv	-	-
29.10	JEN-1 Composer	JEN-1 Composer: A Unified Framework for High-Fidelity Multi-Track Music Generation	arXiv	-	-
20.10	SALMONN	SALMONN: Towards Generic Hearing Abilities for Large Language Models	arXiv	GitHub	Hugging Face
19.10	Loop Copilot	Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing	arXiv	-	-
18.10	MusicAgent	MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models	arXiv	GitHub	-
11.10	LLark	LLark: A Multimodal Foundation Model for Music	arXiv	GitHub	-
01.10	UniAudio	UniAudio: An Audio Foundation Model Toward Universal Audio Generation	arXiv	GitHub	-
18.09	Dynamic-SUPERB	Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech	arXiv	GitHub	-