Official PyTorch implementation of our paper:
Can CLIP Help Sound Source Localization?
Sooyoung Park*, Arda Senocak*, Joon Son Chung (* Equal Contribution)
WACV 2024
This repo is the PyTorch implementation of Audio-Grounded Contrastive Learning (ACL). The code is kept simple and easy to follow.
Parts of the code are based on AudioToken, BEATs, and TCL.
- Python = 3.10.8
- PyTorch = 1.13.0
- transformers = 4.25.1
$ conda install -c nvidia cudatoolkit=11.7
$ conda install -c conda-forge cudnn
$ conda install python=3.10
$ pip install torch==1.13.0+cu117 torchvision==0.14.0+cu117 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu117
$ pip install tensorboard
$ pip install transformers==4.25.1
$ pip install opencv-python
$ pip install tqdm
$ pip install scikit-learn
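As a quick sanity check of the environment, the following minimal snippet confirms the installed versions and CUDA visibility (the versions in the comments are the ones pinned above):

import torch
import torchvision
import transformers

print(torch.__version__)          # expected: 1.13.0+cu117
print(torchvision.__version__)    # expected: 0.14.0+cu117
print(transformers.__version__)   # expected: 4.25.1
print(torch.cuda.is_available())  # should print True on a CUDA 11.7 machine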
Important Note: All audio samples must be converted to 16 kHz. For detailed instructions, refer to the README in each dataset-specific directory.
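If your clips are not yet at 16 kHz, here is a minimal resampling sketch using the torchaudio installed above; the file names are placeholders, and the dataset-specific READMEs remain the authoritative instructions:

import torchaudio
import torchaudio.functional as F

waveform, sr = torchaudio.load("clip.wav")  # placeholder input path
if sr != 16000:
    # resample whatever the source rate is down (or up) to 16 kHz
    waveform = F.resample(waveform, orig_freq=sr, new_freq=16000)
torchaudio.save("clip_16k.wav", waveform, 16000)  # placeholder output path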
- Dataset
Download the pretrained audio backbone into the pretrain folder:
- BEATs: https://github.com/microsoft/unilm/tree/master/beats
- BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt
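Once downloaded, the checkpoint can be loaded following the usage shown in the upstream BEATs README; a minimal sketch, where the BEATs import path and the checkpoint location are assumptions to adjust to this repo's layout:

import torch
from BEATs import BEATs, BEATsConfig  # import path depends on where the BEATs code lives

# load the fine-tuned checkpoint and rebuild the model from its stored config
checkpoint = torch.load("pretrain/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt")
cfg = BEATsConfig(checkpoint["cfg"])
beats = BEATs(cfg)
beats.load_state_dict(checkpoint["model"])
beats.eval()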
- Ensure that you check the .sh files and set
$ export CUDA_VISIBLE_DEVICES="**"
according to your hardware setup.
- Make sure that --model_name corresponds to the configuration file located at ./config/model/{--model_name}.yaml (see the config-resolution sketch after the commands below).
- Model files (.pth) will be saved in the directory {--save_path}/Train_record/{--model_name}_{--exp_name}/.
- Review the configuration settings in ./config/train/{--train_config}.yaml to ensure they match your training requirements.
- Choose one of the following methods to initiate training:
$ sh SingleGPU_Experiment.sh   # For single-GPU setup
$ sh Distributed_Experiment.sh # For multi-GPU setup (DDP)
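For reference, a minimal sketch of the naming convention above, i.e., how --model_name selects a config file and where checkpoints end up; all three values are hypothetical placeholders, and the repo's actual argument handling may differ:

import os
import yaml

model_name = "ACL"           # hypothetical value of --model_name
exp_name = "demo"            # hypothetical value of --exp_name
save_path = "./checkpoints"  # hypothetical value of --save_path

# --model_name must match a file at ./config/model/{--model_name}.yaml
with open(f"./config/model/{model_name}.yaml") as f:
    model_cfg = yaml.safe_load(f)

# checkpoints are written under {--save_path}/Train_record/{--model_name}_{--exp_name}/
ckpt_dir = os.path.join(save_path, "Train_record", f"{model_name}_{exp_name}")
print(ckpt_dir)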
- Before testing, please review the .sh file and set
$ export CUDA_VISIBLE_DEVICES="**"
according to your hardware configuration.
- Ensure that the --model_name parameter corresponds to the configuration file located at ./config/model/{--model_name}.yaml.
- Model files (.pth) located at {--save_path}/{--model_name}_{--exp_name}/Param_{--epochs}.pth will be used for testing.
- The --epochs parameter accepts either a single integer or a list of integers (e.g., 1 2 3).
- If --epochs is left unspecified (null), the default model file {--save_path}/Train_record/{--model_name}_{--exp_name}/Param_best.pth will be used for testing (see the checkpoint-resolution sketch after the command below).
$ sh Test_PTModels.sh
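The checkpoint selection described above can be summarized by the following sketch (a hypothetical helper, not code from this repo; all names are placeholders):

import os

def resolve_checkpoints(save_path, model_name, exp_name, epochs=None):
    run = f"{model_name}_{exp_name}"
    if epochs is None:  # --epochs unspecified (null): fall back to the best checkpoint
        return [os.path.join(save_path, "Train_record", run, "Param_best.pth")]
    if isinstance(epochs, int):  # a single integer is also accepted
        epochs = [epochs]
    return [os.path.join(save_path, run, f"Param_{e}.pth") for e in epochs]

print(resolve_checkpoints("./checkpoints", "ACL", "demo", epochs=[1, 2, 3]))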
Important Note: After downloading the Param_best.pth file, move it to the directory {--save_path}/{--model_name}_{--exp_name}/ before use.
- VGG-Sound 144k trained model: [Link]
- This model was trained using a 2-GPU setup.
- Performance varies across seeds; the reported numbers are the best results, and the provided .pth link corresponds to the checkpoint that achieved them.
If you use this project, please cite it as:
@article{park2023clip,
    title={Can CLIP Help Sound Source Localization?},
    author={Sooyoung Park and Arda Senocak and Joon Son Chung},
    journal={arXiv preprint arXiv:2311.04066},
    year={2023},
}