xzm2004260
  • Xiamen


PyTorch implementation of MusicLLM.

Python 3 Updated Jan 16, 2025

An emotion-controllable speech synthesis model, based on VITS, that requires no emotion annotations.

Jupyter Notebook 1,352 168 Updated Mar 30, 2023

InspireMusic: A Unified Framework for Music, Song, Audio Generation.

Python 311 27 Updated Dec 27, 2024

Balanced Error Rate for Speaker Diarization

Python 28 3 Updated Feb 28, 2023

AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio a…

597 45 Updated Jan 15, 2025

Community list of startups working with AI in audio and music technology

1,603 142 Updated Aug 9, 2024

[INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark

Python 189 8 Updated Jun 17, 2024

A list of tools, papers and code related to Fake Audio Detection.

52 Updated Jan 20, 2025
Python 153 13 Updated Nov 29, 2024

LLMs interview notes and answers: this repository mainly records interview questions and reference answers for large language model (LLM) algorithm engineer roles.

418 110 Updated Oct 16, 2023

LLMs interview notes and answers: this repository mainly records interview questions and reference answers for large language model (LLM) algorithm engineer roles.

1,182 290 Updated Dec 14, 2023

Speech Human Evaluation Estimation Toolkit (SHEET)

Python 49 6 Updated Nov 13, 2024

LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models

Python 23 1 Updated Aug 11, 2024

Real-time Speech-Text Foundation Model Toolkit (work in progress)

Python 126 11 Updated Oct 14, 2024

Local realtime voice AI

Python 2,170 117 Updated Jan 17, 2025

[EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers

Jupyter Notebook 99 4 Updated Nov 28, 2024

Hakka input method scheme (Gaofeng Township, Guangxi)

1 1 Updated May 12, 2021
Python 6 Updated Sep 16, 2024

Text-to-Music Generation with Rectified Flow Transformers

Python 1,656 128 Updated Dec 10, 2024

Cantonese orthography

13 6 Updated Jul 22, 2020

An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Python 3,092 270 Updated Nov 5, 2024
Python 3 Updated Aug 8, 2024

A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.

Python 225 27 Updated Dec 23, 2024

SpeechGPT Series: Speech Large Language Models

Python 1,326 89 Updated Jul 22, 2024

Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization

Python 165 11 Updated Jul 12, 2024
Python 15 3 Updated Jan 19, 2025

Implementation of Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt (NAACL'24).

Python 96 13 Updated Jan 17, 2025

Localized watermarking for AI-generated speech audio, with state-of-the-art robustness and a very fast detector

Python 508 65 Updated Oct 26, 2024

ChatTTS 2K Speaker Stability Scores 🥇 + Categorized by Gender and Age 👧 + Online Audio Preview 🔈

Python 585 32 Updated Jul 2, 2024