This repository is dedicated to the "reborn-uasr" project, an initiative that enhances Unsupervised Automatic Speech Recognition (ASR) by training the segmenter with Reinforcement Learning (RL).
The simplest way to access the REBORN models is through Hugging Face. We have wrapped our models, including the PCA dimension-reduction matrix, the REBORN segmenter, and the REBORN generator, into Hugging Face-supported form. We have also uploaded the corresponding datasets to Hugging Face (LibriSpeech 100 hours, Multilingual LibriSpeech across 6 languages). For a quick start, please check out our demo on Google Colab.
To replicate the REBORN end-to-end unsupervised phoneme recognition result, one would need:
- The upstream model (wav2vec 2.0) as the feature extractor.
- The REBORN model (including the PCA dimension reduction matrix, the segmenter, and the generator).
- The corresponding dataset.
Since all of the components are available on Hugging Face, users can follow our demo on Google Colab and reproduce the results across different datasets by simply replacing the model and dataset card names. For convenience, we summarize all the available pairings of card names below:
| Description | upstream_model_card | reborn_model_card | dataset_card | dataset_name | split |
|---|---|---|---|---|---|
| LibriSpeech 100 hour @ iter2-stage1 | facebook/wav2vec2-large-lv60 | andybi7676/reborn-uasr_ls100h_iter2-stage1 | andybi7676/reborn-uasr_librispeech-no-silence-100hr | | {train.clean.100, dev.clean, dev.other, test.clean, test.other, dev.clean.small} |
| LibriSpeech 100 hour @ iter5-stage1 | facebook/wav2vec2-large-lv60 | andybi7676/reborn-uasr_ls100h_iter5-stage1 | andybi7676/reborn-uasr_librispeech-no-silence-100hr | | {train.clean.100, dev.clean, dev.other, test.clean, test.other, dev.clean.small} |
| Multilingual LibriSpeech 100 hour German @ iter2-stage1 | facebook/wav2vec2-large-xlsr-53 | andybi7676/reborn-uasr_mls-de_iter2-stage1 | andybi7676/reborn-uasr_multilingual-librispeech-no-silence-100hr | german | {train.100hr, dev, test, dev.small} |
| Multilingual LibriSpeech 100 hour Dutch @ iter2-stage1 | facebook/wav2vec2-large-xlsr-53 | andybi7676/reborn-uasr_mls-nl_iter2-stage1 | andybi7676/reborn-uasr_multilingual-librispeech-no-silence-100hr | dutch | {train.100hr, dev, test, dev.small} |
| Multilingual LibriSpeech 100 hour French @ iter2-stage1 | facebook/wav2vec2-large-xlsr-53 | andybi7676/reborn-uasr_mls-fr_iter2-stage1 | andybi7676/reborn-uasr_multilingual-librispeech-no-silence-100hr | french | {train.100hr, dev, test, dev.small} |
| Multilingual LibriSpeech 100 hour Spanish @ iter2-stage1 | facebook/wav2vec2-large-xlsr-53 | andybi7676/reborn-uasr_mls-es_iter2-stage1 | andybi7676/reborn-uasr_multilingual-librispeech-no-silence-100hr | spanish | {train.100hr, dev, test, dev.small} |
| Multilingual LibriSpeech 100 hour Italian @ iter2-stage1 | facebook/wav2vec2-large-xlsr-53 | andybi7676/reborn-uasr_mls-it_iter2-stage1 | andybi7676/reborn-uasr_multilingual-librispeech-no-silence-100hr | italian | {train.100hr, dev, test, dev.small} |
| Multilingual LibriSpeech 100 hour Portuguese @ iter2-stage1 | facebook/wav2vec2-large-xlsr-53 | andybi7676/reborn-uasr_mls-pt_iter2-stage1 | andybi7676/reborn-uasr_multilingual-librispeech-no-silence-100hr | portuguese | {train.100hr, dev, test, dev.small} |
By replacing the card names, users can directly try out our pre-trained REBORN models with little effort.
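For reference, a quick-start sketch for one pairing from the table might look like the following. The exact loading interface is defined by the custom model code on the Hugging Face Hub, so treat this as an assumption-laden outline (we assume the standard `trust_remote_code` custom-code pattern) and refer to the Colab demo for the authoritative usage:

```python
# Sketch: loading the LibriSpeech 100h iter2-stage1 pairing from the table above.
# Assumption: the REBORN checkpoints follow the standard Hub custom-code pattern.
from datasets import load_dataset
from transformers import AutoModel, Wav2Vec2Model

upstream_model_card = "facebook/wav2vec2-large-lv60"
reborn_model_card = "andybi7676/reborn-uasr_ls100h_iter2-stage1"
dataset_card = "andybi7676/reborn-uasr_librispeech-no-silence-100hr"

upstream = Wav2Vec2Model.from_pretrained(upstream_model_card)  # feature extractor
reborn = AutoModel.from_pretrained(reborn_model_card, trust_remote_code=True)
data = load_dataset(dataset_card, split="dev.clean.small")     # small split for a quick test
```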
If you want to set up the environment and train the REBORN model on your own, please follow the instructions below to satisfy the requirements.
We provide a pre-built Docker image on Docker Hub. The image contains all the dependencies for training REBORN, and it is probably the simplest way to set up the whole environment if you are familiar with Docker. Run the following command to pull the image and start a container:
```bash
docker run -it --rm --gpus all andybi7676/reborn-uasr:latest
```
Note that this is just an example of using the image in interactive mode with all the GPUs on your machine. Feel free to use it in your own way. If the GPUs are not available inside the container, please verify that nvidia-docker (the NVIDIA Container Toolkit) is installed.
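Once inside the container, a quick check like the one below (assuming the image ships with PyTorch, which the training code requires) confirms that the GPUs are visible:

```python
# Run inside the container: check that PyTorch can see the GPUs.
import torch

print(torch.cuda.is_available(), torch.cuda.device_count())
```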
In this section, we give step-by-step instructions for building the REBORN environment. If you are using the reborn-uasr Docker image, you can skip this section.
We have attached the fairseq version we use in the folder `reborn-uasr/fairseq`. Installing it from our repo ensures there are no version mismatches that could lead to unexpected errors:
```bash
git clone https://github.com/andybi7676/reborn-uasr.git
cd reborn-uasr/fairseq
pip install -e .
```
Please follow the instructions in the official kenlm repo. Make sure that the Python bindings are also installed (`pip install https://github.com/kpu/kenlm/archive/master.zip`).
```bash
cd /your/path/to/reborn-uasr
pip install -r requirements.txt
```
Modify and run `path.sh` to export fairseq and reborn-uasr to `PYTHONPATH`:
- Modify `/path/to/fairseq` to export the correct fairseq path into the environment.
- Run `source path.sh` to append `fairseq` and `reborn-uasr` to `PYTHONPATH`. The result should look like the following:

```
(base) username@desktop:/your/path/to/reborn-uasr$ source path.sh
Added /your/path/to/fairseq to PYTHONPATH
Appended /your/path/to/reborn-uasr to PYTHONPATH
=======================================================================================
FAIRSEQ_ROOT: /your/path/to/fairseq
REBORN_WORK_DIR: /your/path/to/reborn-uasr
PYTHONPATH: /your/path/to/fairseq:/your/path/to/reborn-uasr
Please make sure that FAIRSEQ_ROOT and REBORN_WORK_DIR are in PYTHONPATH
During each runtime, please make sure to run `source path.sh` to set up the environment.
=======================================================================================
Testing the required import functionality...
SUCCESS
```
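As an extra sanity check after sourcing the script, you can confirm from Python that the paths are visible (a minimal sketch mirroring the script's own import test, assuming `path.sh` exports `FAIRSEQ_ROOT` and `REBORN_WORK_DIR` as environment variables):

```python
# Verify that path.sh exposed FAIRSEQ_ROOT and REBORN_WORK_DIR to Python.
import os
import sys

for var in ("FAIRSEQ_ROOT", "REBORN_WORK_DIR"):
    root = os.environ.get(var)
    print(f"{var}={root} | on sys.path: {root in sys.path}")

import fairseq  # should import without error after `source path.sh`
print("SUCCESS")
```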
TBA
TBA
In this section, we introduce how to train your own REBORN model from scratch. Before diving into training, we recommend going through the Prerequisites section and making sure that all the requirements are satisfied.
We divide the training process into the following three main stages: wav2vec-U initialization, segmenter training, and generator (phoneme prediction model) training.
In this step, we initialize the CNN-based segmenter via behavior cloning: the segmenter is trained on pseudo-boundaries derived from a wav2vec-U model. This pretraining provides a solid starting point before we move on to reinforcement learning.
```bash
bash rl/cnn_segmenter/_pretrain.sh
```
Expected Output: `cnn_segmenter.pt` in the specified `output_dir`.
Important Arguments to Adjust:
- `reborn_dir`: Root directory of the `reborn-uasr` codebase.
- `output_dir`: Directory where pretraining results and checkpoints will be stored.
- `audio_dir`: Directory containing features and boundary files. Example structure:

```
audio_dir
├── CLUS128
│   ├── train.bds
│   └── valid.bds
├── precompute_pca512
├── train.npy
└── valid.npy
```
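For intuition, behavior cloning here amounts to frame-level supervised training of the CNN segmenter on the wav2vec-U pseudo-boundaries. The following is only a conceptual sketch, not the repo's actual training loop; the architecture, shapes, and hyperparameters are illustrative:

```python
import torch
import torch.nn as nn

class CnnSegmenter(nn.Module):
    """Toy stand-in for the CNN boundary predictor."""
    def __init__(self, feat_dim=512, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(feat_dim, hidden, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.Conv1d(hidden, 2, kernel_size=1),  # boundary vs. non-boundary logits
        )

    def forward(self, feats):                     # feats: (batch, time, feat_dim)
        return self.net(feats.transpose(1, 2)).transpose(1, 2)

segmenter = CnnSegmenter()
feats = torch.randn(4, 100, 512)                  # PCA-reduced wav2vec 2.0 features
pseudo = torch.randint(0, 2, (4, 100))            # wav2vec-U pseudo-boundary labels
logits = segmenter(feats)
loss = nn.functional.cross_entropy(logits.reshape(-1, 2), pseudo.reshape(-1))
loss.backward()                                   # one behavior-cloning step
```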
After pretraining, we refine the segmenter using reinforcement learning. The RL step optimizes the segmenter by considering language model perplexity, phoneme-level token error rates, and length ratio constraints, thereby improving segmentation quality.
```bash
bash rl/cnn_segmenter/_train.sh
```
Expected Output: Multiple RL-updated checkpoints, for example `rl_agent_segmenter_best.pt`.
Important Arguments to Adjust:
- `reborn_dir`, `output_dir`: As in pretraining, ensure these are set correctly.
- `audio_dir`: Move the wav2vec-U logit-segmented phoneme results to:

```
audio_dir
├── precompute_pca512
│   ├── train.npy
│   ├── train.w2vu_logit_segmented_units
│   ├── valid.npy
│   └── valid.w2vu_logit_segmented_units
```

- `kenlm_fpath`: Path to the KenLM language model.
- `Pretrain_segmenter_path`: Path to the pretrained segmenter checkpoint from the Behavior Cloning step.
- `Pretrain_wav2vecu_path`: Path to the wav2vec-U checkpoint used for feature extraction/logit generation.
- Adjust `coef_ter`, `coef_len`, and `lr` in `_train.sh` to tune performance.
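To make the reward terms concrete: the RL objective combines the three signals mentioned above (language model perplexity, phoneme-level token error rate, and a length-ratio constraint). The exact formulation lives in the training scripts; the function below is only an illustrative combination showing how weights like `coef_ter` and `coef_len` trade the terms off:

```python
# Illustrative reward: lower perplexity/TER and a length ratio near the target
# are rewarded. This is a sketch, not the exact equation used by the repo.
def segment_reward(ppl: float, ter: float, len_ratio: float,
                   coef_ter: float = 0.2, coef_len: float = 0.2,
                   target_ratio: float = 1.0) -> float:
    return -ppl - coef_ter * ter - coef_len * abs(len_ratio - target_ratio)

print(segment_reward(ppl=35.2, ter=0.18, len_ratio=0.95))
```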
Use the `rl/utils/_evaluate.sh` script to evaluate your trained segmenter on the development and test splits. This script generates phoneme sequences and compares them against ground-truth references.
Key Arguments:
- `reborn_dir`, `output_dir`: Ensure these match your setup.
- `generator_ckpt`: Path to the wav2vec-U generator model checkpoint.
- `feats_dir`: Directory containing the PCA-reduced features used during evaluation.
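If you want to spot-check the evaluation output yourself, the phoneme error rate can be computed from a hypothesis file and a reference file with one utterance per line. A small helper, assuming the `editdistance` package and hypothetical file names:

```python
# Compute phoneme error rate (PER) between hypothesis and reference transcripts.
import editdistance  # pip install editdistance

def phoneme_error_rate(hyp_path: str, ref_path: str) -> float:
    errors, total = 0, 0
    with open(hyp_path) as hyps, open(ref_path) as refs:
        for hyp, ref in zip(hyps, refs):
            hyp_tokens, ref_tokens = hyp.split(), ref.split()
            errors += editdistance.eval(hyp_tokens, ref_tokens)
            total += len(ref_tokens)
    return errors / total

print(phoneme_error_rate("dev.hyp.phn", "dev.ref.phn"))  # hypothetical file names
```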
Please cite this work as:
```bibtex
@article{tseng2024reborn,
  title={REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR},
  author={Tseng, Liang-Hsuan and Hu, En-Pei and Chiang, Cheng-Han and Tseng, Yuan and Lee, Hung-yi and Lee, Lin-shan and Sun, Shao-Hua},
  journal={arXiv preprint arXiv:2402.03988},
  year={2024}
}
```