Finding the most similar timbre (tone color) in a large collection of audio.

cronrpc/Audio-Speaker-Needle-In-Haystack

Audio Needle In Haystack

Project Introduction

This project searches a large collection of audio for clips whose speaker timbre closely matches a target clip. The premise is that, given enough audio data, a close match is likely to exist.

Project Example: HuggingFace Space

Algorithm for matching audio: speech_eres2netv2_sv_zh-cn_16k-common
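
As an illustration of how embedding-based matching can work, here is a minimal sketch that ranks clips by cosine similarity between speaker embeddings. It assumes each clip has already been converted to an embedding vector by a speaker-verification model; the function names, the toy 3-dimensional vectors, and the choice of cosine similarity are assumptions for illustration, not the project's confirmed implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat as no similarity
    return dot / (norm_a * norm_b)

def top_k_matches(target, haystack, k=3):
    """Return the k (name, score) pairs most similar to target, best first."""
    scored = [(name, cosine_similarity(target, emb)) for name, emb in haystack.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy embeddings (hypothetical); real speaker embeddings have many more dimensions.
haystack = {
    "a.wav": [1.0, 0.0, 0.0],
    "b.wav": [0.9, 0.1, 0.0],
    "c.wav": [0.0, 1.0, 0.0],
}
target = [1.0, 0.05, 0.0]
matches = top_k_matches(target, haystack, k=2)
```

With these toy vectors, "a.wav" ranks first and "b.wav" second, because their directions are closest to the target's.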

[Image: web UI page]

Installation Dependencies

Install ffmpeg and libsox-dev:

sudo apt install ffmpeg libsox-dev

Install Python dependencies:

conda create -n 3ds python=3.9
conda activate 3ds

cd Audio-Speaker-Needle-In-Haystack
pip install -r requirements.txt

If you need to generate WAV files with ChatTTS, follow the dependency installation instructions in the 2noise/ChatTTS project.

Pre-generated ChatTTS WAV files are available at Audio_speaker_needle_in_haystack.

User Guide

Download the pre-generated ChatTTS audio files, then launch the web UI:

python download_audios.py
python webui_speaker_needle_in_haystack.py

Alternatively, create the "audios" folder, place your own WAV files in it, and then launch the web UI:

mkdir audios
python webui_speaker_needle_in_haystack.py
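
Before launching the web UI, it can help to confirm which files in the folder would count as WAV files. The sketch below is a hypothetical helper, not part of the project: it assumes a flat "audios" folder (no subfolders) and matches the ".wav" extension case-insensitively. How the actual script scans the folder is not stated here.

```python
from pathlib import Path

def list_wav_files(folder="audios"):
    """Return the WAV files directly inside folder, sorted by path.

    Hypothetical helper: subfolders are not searched, and a missing
    folder yields an empty list instead of an error.
    """
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() == ".wav"
    )

wav_files = list_wav_files("audios")
print(f"Found {len(wav_files)} WAV file(s)")
```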

To list the available options, or to reduce the batch size when GPU memory is limited, use the following commands:

python webui_speaker_needle_in_haystack.py --help
python webui_speaker_needle_in_haystack.py --batch_size=1
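
To show why a smaller batch size lowers peak memory use, here is a minimal sketch of batch-wise processing. Only the `--batch_size` flag name comes from the commands above; the `batched` helper, the default value of 4, and the file names are assumptions for illustration, and the script's actual internals may differ.

```python
import argparse

def batched(items, batch_size):
    """Yield consecutive slices of items, each with at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

parser = argparse.ArgumentParser()
# The default of 4 is hypothetical; run the script with --help for the real one.
parser.add_argument("--batch_size", type=int, default=4)
args = parser.parse_args(["--batch_size=1"])

files = ["a.wav", "b.wav", "c.wav"]  # hypothetical file names
batches = list(batched(files, args.batch_size))
# Only one batch is processed at a time, so a batch size of 1 keeps a
# single clip in memory per step, at the cost of more steps.
```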

To generate audio with ChatTTS and save it in the "audios" folder, clone ChatTTS into the project directory and run the generation script. See generate_audios_chattts.py for the available parameters:

cd Audio-Speaker-Needle-In-Haystack
git clone https://github.com/2noise/ChatTTS.git
python generate_audios_chattts.py
