Name		Name	Last commit message	Last commit date
parent directory ..
tokenizer_bert		tokenizer_bert
tokenizer_gpt2		tokenizer_gpt2
LICENSE		LICENSE
README.md		README.md
captions.txt		captions.txt
input.wav		input.wav
msclap.py		msclap.py

README.md

Microsoft CLAP

Input

audio file

Audio file in wav format to use as the model's input. Default file name is input.wav (source: https://freesound.org/people/InspectorJ/sounds/456440/)

text file

A text file containing sentences separated by new lines. Default file name is captions.txt

Output

Cosine similarities

Cosine similarity between the input audio and the sentences in the text file.

Usage

Internet connection is required when running the script for the first time, as the model files will be automatically downloaded.

Running the script will compute the cosine similarities between the audio and the captions, using audio and language encoder models train by contrastive training.

You can switch the versions of the encoder model's weight (2022 or 2023) by specifying the version using the argument -v or --version. For more information on arguments, try running python3 msclap.py --help

$ python3 msclap.py -t captions.txt -a input.wav -v 2023
 INFO arg_utils.py (13) : Start!
 INFO arg_utils.py (158) : env_id updated to 0
 INFO arg_utils.py (163) : env_id: 0
 INFO arg_utils.py (166) : CPU
 INFO msclap.py (167) : input_text: ['Dog barking.', 'Birds whistling.', 'Car passing by.', 'Wind blowing.', 'Water flowing.', 'People talking.']
 INFO msclap.py (170) : inference has started...
Similarity: 
    Birds whistling.: 0.41247469186782837
       Wind blowing.: 0.2643369734287262
      Water flowing.: 0.23884761333465576
     Car passing by.: 0.22803542017936707
     People talking.: 0.17387858033180237
        Dog barking.: 0.11309497803449631
 INFO msclap.py (192) : Script finished successfully.

Reference

CLAP

Framework

Pytorch

Model Format

ONNX opset=11

Netron

caption_model_2023.onnx.prototxt

audio_model_2023.onnx.prototxt

caption_model_2022.onnx.prototxt

audio_model_2022.onnx.prototxt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

msclap

msclap

README.md

Microsoft CLAP

Input

Output

Usage

Reference

Framework

Model Format

Netron

Files

msclap

Directory actions

More options

Directory actions

More options

Latest commit

History

msclap

Folders and files

parent directory

README.md

Microsoft CLAP

Input

Output

Usage

Reference

Framework

Model Format

Netron