Audio AI Agent

This page tracks the latest audio AI agents, covering speech, music, sound effects, and more.

2023

| Date (DD.MM) | Source | Description | Paper | Code | Trained Model |
| --- | --- | --- | --- | --- | --- |
| 06.12 | JAMMIN-GPT | JAMMIN-GPT: Text-based Improvisation using LLMs in Ableton Live | arXiv | GitHub | - |
| 19.11 | M2UGen | M2UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models | arXiv | - | - |
| 14.11 | Qwen-Audio | Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models | arXiv | GitHub | - |
| 02.11 | FLAP | FLAP: Fast Language-Audio Pre-training | arXiv | - | - |
| 29.10 | JEN-1 Composer | JEN-1 Composer: A Unified Framework for High-Fidelity Multi-Track Music Generation | arXiv | - | - |
| 20.10 | SALMONN | SALMONN: Towards Generic Hearing Abilities for Large Language Models | arXiv | GitHub | Hugging Face |
| 19.10 | Loop Copilot | Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing | arXiv | - | - |
| 18.10 | MusicAgent | MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models | arXiv | GitHub | - |
| 11.10 | LLark | LLark: A Multimodal Foundation Model for Music | arXiv | GitHub | - |
| 01.10 | UniAudio | UniAudio: An Audio Foundation Model Toward Universal Audio Generation | arXiv | GitHub | - |
| 18.09 | Dynamic-SUPERB | Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech | arXiv | GitHub | - |