Implementation and extension of the paper "Visually Indicated Sounds" by Owens et al., which proposes the task of predicting what sound an object makes when struck as a way of studying physical interactions within a visual scene.
The authors present an algorithm that synthesizes sound from silent videos of people hitting and scratching objects with a drumstick. This algorithm uses a recurrent neural network to predict sound features from videos and then produces a waveform from these features with an example-based synthesis procedure. The authors show that the sounds predicted by their model are realistic enough to fool participants in a “real or fake” psychophysical experiment and that they convey significant information about material properties and physical interactions.
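A minimal sketch of that video-to-sound-feature pipeline is shown below, assuming PyTorch and a pretrained ResNet-18 as the frame encoder (the paper uses an AlexNet-style CNN). The class name `SoundFeaturePredictor`, the 42-band output size, and the omission of the example-based waveform retrieval step are illustrative choices, not the repository's actual code.

```python
# Minimal sketch of the video -> sound-feature regression (assumed names, not the repo's classes).
import torch
import torch.nn as nn
import torchvision.models as models


class SoundFeaturePredictor(nn.Module):
    """CNN frame encoder followed by an LSTM that regresses cochleagram subband envelopes."""

    def __init__(self, num_bands: int = 42, hidden_size: int = 256):
        super().__init__()
        # Frame encoder: a pretrained ResNet-18 with its classifier removed.
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])  # -> (B*T, 512, 1, 1)
        self.rnn = nn.LSTM(input_size=512, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_bands)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (B, T, 3, H, W) -> predicted sound features: (B, T, num_bands)
        b, t, c, h, w = frames.shape
        feats = self.encoder(frames.view(b * t, c, h, w)).flatten(1)   # (B*T, 512)
        feats = feats.view(b, t, -1)                                   # (B, T, 512)
        out, _ = self.rnn(feats)                                       # (B, T, hidden)
        return self.head(out)                                          # (B, T, num_bands)


# Usage: predict sound features for a batch of 2 clips, 15 frames each, at 224x224.
model = SoundFeaturePredictor()
features = model(torch.randn(2, 15, 3, 224, 224))
print(features.shape)  # torch.Size([2, 15, 42])
```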
All implementation code can be found inside `/src/experiments`, where each experiment is contained within its own directory.
- `PaperModel`: The model architecture used in the paper (see the pipeline sketch above).
- `BiLSTMModel`: A modification of the paper's architecture that replaces the LSTM with a bidirectional LSTM (sketched after this list).
- `VMAEModel`: Uses a modern transformer-based architecture for feature extraction (see the feature-extraction sketch below).
- `LatentVMAEModel`: Replaces cochleagrams with a learned latent-space representation of the waveforms, produced by an autoencoder and fed into the `VMAEModel` (see the autoencoder sketch below).
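The `BiLSTMModel` change amounts to making the recurrent layer bidirectional. A minimal standalone sketch of that swap, with sizes matching the illustrative example above rather than the repository's actual code:

```python
# Sketch of the BiLSTM variant: the recurrent layer becomes bidirectional, so the
# regression head must accept 2 * hidden_size features (one per direction).
import torch
import torch.nn as nn

hidden_size, num_bands = 256, 42  # illustrative sizes, matching the sketch above

bi_rnn = nn.LSTM(input_size=512, hidden_size=hidden_size,
                 batch_first=True, bidirectional=True)
head = nn.Linear(2 * hidden_size, num_bands)

frame_features = torch.randn(2, 15, 512)   # (batch, frames, CNN feature dim)
out, _ = bi_rnn(frame_features)            # (2, 15, 2 * hidden_size)
sound_features = head(out)                 # (2, 15, num_bands)
print(sound_features.shape)                # torch.Size([2, 15, 42])
```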
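For the `VMAEModel`, assuming VMAE refers to VideoMAE, one plausible way to obtain transformer features is the Hugging Face `transformers` implementation with the public `MCG-NJU/videomae-base` checkpoint. The sketch below covers only the feature-extraction step and is not the repository's exact code.

```python
# Sketch: extracting clip features with a pretrained VideoMAE backbone (assumed setup).
import numpy as np
import torch
from transformers import VideoMAEImageProcessor, VideoMAEModel

processor = VideoMAEImageProcessor.from_pretrained("MCG-NJU/videomae-base")
backbone = VideoMAEModel.from_pretrained("MCG-NJU/videomae-base")

# VideoMAE expects 16 RGB frames; random stand-ins are used here in place of real video frames.
frames = [np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8) for _ in range(16)]
inputs = processor(frames, return_tensors="pt")

with torch.no_grad():
    outputs = backbone(**inputs)

# Patch-token features that a downstream head (e.g. the sound-feature regressor) can consume.
print(outputs.last_hidden_state.shape)  # (1, number_of_patch_tokens, hidden_size)
```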
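For the `LatentVMAEModel`, the learned latent representation could come from something like a 1-D convolutional autoencoder over the raw waveform; the architecture, sizes, and names below are illustrative assumptions, not the repository's implementation. The encoder's latent sequence would then replace the cochleagram as the regression target, while the decoder maps predicted latents back to a waveform.

```python
# Sketch of a waveform autoencoder whose latent sequence replaces cochleagram features (assumed architecture).
import torch
import torch.nn as nn


class WaveformAutoEncoder(nn.Module):
    """1-D convolutional autoencoder: waveform -> latent sequence -> reconstructed waveform."""

    def __init__(self, latent_dim: int = 64):
        super().__init__()
        # Each stride-4 conv downsamples the waveform; three of them give a 64x shorter latent sequence.
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=9, stride=4, padding=4), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=9, stride=4, padding=4), nn.ReLU(),
            nn.Conv1d(64, latent_dim, kernel_size=9, stride=4, padding=4),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(latent_dim, 64, kernel_size=8, stride=4, padding=2), nn.ReLU(),
            nn.ConvTranspose1d(64, 32, kernel_size=8, stride=4, padding=2), nn.ReLU(),
            nn.ConvTranspose1d(32, 1, kernel_size=8, stride=4, padding=2),
        )

    def forward(self, wave: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        latent = self.encoder(wave)   # (B, latent_dim, T/64): target for the video model
        recon = self.decoder(latent)  # (B, 1, T): reconstruction used to train the autoencoder
        return latent, recon


# Usage: a batch of two 1-second mono clips at 16 kHz.
ae = WaveformAutoEncoder()
latent, recon = ae(torch.randn(2, 1, 16000))
print(latent.shape, recon.shape)  # torch.Size([2, 64, 250]) torch.Size([2, 1, 16000])
```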