A sound event detection (SED) demo for sounds that you can make around your computer: whistling, clapping, breathing, knocking and more.

xseq/whistle-n-clap

whistle-n-clap

This is a sound event detection (SED) demo for sounds that you can make around your computer: clapping, breathing, knocking and more. A public domain audio data set is preprocessed for training. The trained SED model runs inference on recorded sounds using audio features (in this case, a log-mel spectrogram).
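To make the feature step concrete, here is a minimal NumPy-only sketch of a log-mel spectrogram. It is not the project's actual preprocessing code: the function names and every parameter value (16 kHz sample rate, 512-point FFT, hop of 256, 40 mel bands) are assumptions chosen for illustration.

```python
import numpy as np

def hz_to_mel(hz):
    # Common HTK-style mel formula.
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(sample_rate, n_fft, n_mels):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    # n_mels + 2 edge points, evenly spaced on the mel scale up to Nyquist.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i:i + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def log_mel_spectrogram(signal, sample_rate=16000, n_fft=512, hop=256, n_mels=40):
    """Return log-mel energies, shape (n_frames, n_mels).

    All default values are illustrative assumptions, not the repo's settings.
    The signal must contain at least n_fft samples.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2        # power spectrum per frame
    mel_energy = power @ mel_filterbank(sample_rate, n_fft, n_mels).T
    return np.log(mel_energy + 1e-10)                       # small offset avoids log(0)

# Usage: one second of a 1 kHz tone.
t = np.arange(16000) / 16000.0
features = log_mel_spectrogram(np.sin(2 * np.pi * 1000.0 * t))
```

The resulting 2-D array (time frames by mel bands) is the kind of image-like input a CNN can consume.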

=== Usage ===

'inference.py' is the main entry point. It loads a trained model, streams audio, and runs inference every half second. 'training.py' trains (and retrains) the SED model. Run '/util/wav_to_npz.py' first to generate the local NumPy arrays.
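The "inference every half second" loop can be pictured as a sliding window over the incoming audio. The sketch below is an assumption about how such a loop might be structured, not the contents of 'inference.py': `sliding_windows`, `run_inference`, the one-second window length, and the `classify` callable are all hypothetical names and values.

```python
from collections import deque

def sliding_windows(chunks, sample_rate=16000, window_s=1.0, hop_s=0.5):
    """Yield the newest `window_s` seconds of samples every `hop_s` seconds.

    `chunks` is any iterable of sample sequences, e.g. blocks from an audio
    callback. Window and hop lengths are illustrative assumptions.
    """
    window_len = int(sample_rate * window_s)
    hop_len = int(sample_rate * hop_s)
    buffer = deque(maxlen=window_len)   # keeps only the newest window_len samples
    since_last = 0                      # samples received since the last yield
    for chunk in chunks:
        for sample in chunk:
            buffer.append(sample)
            since_last += 1
            if len(buffer) == window_len and since_last >= hop_len:
                since_last = 0
                yield list(buffer)

def run_inference(chunks, classify, **kwargs):
    """Apply `classify` (a hypothetical model wrapper) to each window."""
    return [classify(window) for window in sliding_windows(chunks, **kwargs)]

# Usage with a toy 10 Hz "stream" of 25 samples delivered in blocks of 4.
toy_chunks = [list(range(i, min(i + 4, 25))) for i in range(0, 25, 4)]
toy_windows = list(sliding_windows(toy_chunks, sample_rate=10))
toy_labels = run_inference(toy_chunks, classify=lambda w: max(w), sample_rate=10)
```

Consecutive windows overlap by half their length here, so a short sound that straddles a window boundary still appears whole in at least one window. A real-time version would likely replace the per-sample loop with array operations.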

=== Data Set ===

This project uses a subset of the FSD50K data set: https://annotator.freesound.org/fsd/release/FSD50K/

Note: I started the project with the name "whistle-n-clap" without realizing that the data set does not contain whistling sounds. So, sorry, whistling cannot be detected.

=== Model ===

The model is a simple CNN implementation based on this article: https://zhuanlan.zhihu.com/p/52298361

=== Reference ===

Fonseca, E., Favory, X., Pons, J., Font, F., & Serra, X. (2020). FSD50K: an Open Dataset of Human-Labeled Sound Events. arXiv.

Purwins, H., Li, B., Virtanen, T., Schlüter, J., Chang, S. Y., & Sainath, T. (2019). Deep learning for audio signal processing. IEEE Journal of Selected Topics in Signal Processing, 13(2), 206-219.
