Official implementation of Self-Remixing, an unsupervised sound separation framework. Self-Remixing works not only when fine-tuning pre-trained models but also when training from scratch, as shown in our papers.
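At a high level, Self-Remixing (like RemixIT) follows a teacher-student remixing loop: a teacher separates the observed mixtures, the separated sources are shuffled across the batch and summed into pseudo-mixtures, and a student separates those. In Self-Remixing, the student's outputs are then un-shuffled and remixed so that the loss is computed against the original observed mixtures. Below is a rough numpy sketch under simplifying assumptions (`teacher_sep`/`student_sep` are placeholders rather than this repo's API, student outputs are assumed aligned with the shuffled sources, and a plain MSE stands in for the paper's loss):

```python
import numpy as np


def self_remixing_step(teacher_sep, student_sep, mixtures, rng):
    """One (simplified) Self-Remixing step.

    teacher_sep, student_sep: callables mapping (B, T) mixtures to
        (B, N, T) separated sources -- placeholders for the teacher
        and student networks.
    mixtures: (B, T) observed mixtures.
    Returns a mixture-reconstruction MSE (plain L2 here, for
    illustration only).
    """
    batch = mixtures.shape[0]
    # 1) teacher separates the observed mixtures
    t_src = teacher_sep(mixtures)                      # (B, N, T)
    n_src = t_src.shape[1]
    # 2) shuffle each source slot across the batch and sum
    #    into pseudo-mixtures
    perms = np.stack([rng.permutation(batch) for _ in range(n_src)], axis=1)
    shuffled = np.stack([t_src[perms[:, n], n] for n in range(n_src)], axis=1)
    pseudo_mix = shuffled.sum(axis=1)                  # (B, T)
    # 3) student separates the pseudo-mixtures
    s_src = student_sep(pseudo_mix)                    # (B, N, T)
    # 4) un-shuffle the student's estimates (assumes the n-th output
    #    corresponds to the n-th shuffled source) and remix them to
    #    reconstruct the original mixtures
    unshuffled = np.empty_like(s_src)
    for n in range(n_src):
        unshuffled[perms[:, n], n] = s_src[:, n]
    remixed = unshuffled.sum(axis=1)                   # (B, T)
    # 5) loss against the *observed* mixtures -- computing the loss in
    #    the mixture domain is what distinguishes Self-Remixing from
    #    RemixIT, which matches the teacher's sources directly
    return float(np.mean((mixtures - remixed) ** 2))
```

With oracle separators the reconstruction is exact, so the loss is zero; in training, the teacher is typically an averaged or frozen copy of the student.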
This repository supports several single-channel sound separation methods:
- Mixture invariant training (MixIT) from the Asteroid toolkit
- Efficient MixIT (unofficial implementation)
- MixIT with source sparsity loss (unofficial implementation)
- RemixIT (unofficial implementation)
- Self-Remixing
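Of these, MixIT is the conceptual starting point: two mixtures are summed, the model separates the sum, and the loss is the best mixture-reconstruction error over all ways of assigning the estimated sources back to the two mixtures. A minimal brute-force sketch (the function name and plain MSE are ours for illustration; training in this repo uses Asteroid's wrapper):

```python
import itertools

import numpy as np


def mixit_loss(est_sources: np.ndarray, mixtures: np.ndarray) -> float:
    """Brute-force MixIT loss for two input mixtures.

    est_sources: (M, T) separated sources from the model
    mixtures:    (2, T) the two mixtures whose sum was separated
    Returns the minimum mixture-reconstruction MSE over all 2^M binary
    mixing matrices that assign each source to exactly one mixture.
    """
    n_src = est_sources.shape[0]
    best = np.inf
    for assignment in itertools.product([0, 1], repeat=n_src):
        a = np.asarray(assignment)
        # remix the estimated sources according to this assignment
        remix0 = est_sources[a == 0].sum(axis=0)
        remix1 = est_sources[a == 1].sum(axis=0)
        mse = np.mean((mixtures[0] - remix0) ** 2) \
            + np.mean((mixtures[1] - remix1) ** 2)
        best = min(best, mse)
    return float(best)
```

The search over assignments grows as 2^M, which is fine for the small number of output sources used in practice; `losses/mixit_wrapper.py` (from Asteroid) implements the batched torch version used here.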
This repository supports training on several public datasets.
Clone this repo
git clone https://github.com/kohei0209/self-remixing
Create an Anaconda environment
# change directory
cd self-remixing
# create the environment and activate it
conda env create -f environment.yaml
conda activate selfremixing
Once the environment is created, training can be run as follows:
python train.py /path/to/config /path/to/dataset
Currently, this repository supports training with the following public datasets:
- SMS-WSJ
- Free universal sound separation (FUSS)
- (To do) Libri2mix
- (To do) WSJ-mix used in Self-Remixing paper
Config files for each dataset and each algorithm are provided in configs/"dataset_name"/"algorithm_name".
For example, the command to run Self-Remixing training from scratch on SMS-WSJ is:
python train.py configs/smswsj/selfremixing/selfremixing_tfgridnet_cbs+cs_mrl1.yaml /path/to/smswsj
Note that we use Weights & Biases (wandb) for logging. Change the entity on line 368 of train.py to your own user name.
When using SMS-WSJ, evaluation can be done as follows. Speech metrics are evaluated first, and then WER is computed with Whisper Large v2.
run_tests_wsj.sh /path/to/model_directory /path/to/smswsj
When using FUSS, evaluation can be done as follows:
run_tests_fuss.sh /path/to/model_directory /path/to/fuss
- Support Libri2Mix and WSJ-mix
- Support DDP
2024 Kohei Saijo, Waseda University.
All of this code, except for the code from ESPnet, is released under the MIT License. This repository includes code from ESPnet, released under the Apache 2.0 License, and code from the Asteroid toolkit, released under the MIT License.
- models/conformer.py from ESPnet
- models/tfgridnetv2.py from ESPnet
- my_torch_utils/stft.py from ESPnet
- losses/mixit_wrapper.py from Asteroid
- losses/pit_wrapper.py from Asteroid
- datasets/fuss_dataset.py from Asteroid
- datasets/librimix_dataset.py from Asteroid
@inproceedings{saijo23_self,
author={Saijo, Kohei and Ogawa, Tetsuji},
booktitle={ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
title={Self-Remixing: Unsupervised Speech Separation via Separation and Remixing},
year={2023},
pages={1-5},
doi={10.1109/ICASSP49357.2023.10095596}
}
@inproceedings{saijo23_interspeech,
author={Kohei Saijo and Tetsuji Ogawa},
title={{Remixing-based Unsupervised Source Separation from Scratch}},
year=2023,
booktitle={Proc. INTERSPEECH 2023},
pages={1678--1682},
doi={10.21437/Interspeech.2023-1389}
}