whistle-n-clap

This is a sound event detection (SED) demo for sounds that you can make around your computer: clapping, breathing, knocking, and more. An open audio data set is preprocessed for training. The trained SED model runs inference on recorded sounds using audio features (in this case, a log-mel spectrogram).
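To make the feature step concrete, here is a minimal NumPy-only sketch of a log-mel spectrogram. It is not the project's own preprocessing code: the function names, the 16 kHz sample rate, the 512-sample window, the 256-sample hop, and the 40 mel bands are all assumptions chosen for illustration.

```python
import numpy as np

def hz_to_mel(hz):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale from 0 Hz to sr/2."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=40):
    """Return an array of shape (n_frames, n_mels) in decibels."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_fft//2 + 1)
    mel_power = power @ mel_filterbank(sr, n_fft, n_mels).T
    return 10.0 * np.log10(mel_power + 1e-10)             # small offset avoids log(0)

# Usage: one second of a 1 kHz test tone (synthetic input, not project data)
tone = np.sin(2 * np.pi * 1000 * np.arange(16000) / 16000)
features = log_mel_spectrogram(tone)
print(features.shape)
```

The resulting 2-D array (time frames by mel bands) is the kind of image-like input that a CNN can consume.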

=== Usage ===

'inference.py' is the main entry point. It loads a trained model, streams audio, and runs inference every half second. 'training.py' trains (or retrains) the SED model. Run '/util/wav_to_npz.py' first to generate the local NumPy arrays.
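The half-second inference cadence can be sketched as a sliding buffer that is refreshed with incoming audio and handed to the model every 0.5 s. This is only an illustration, not the code in 'inference.py': the 16 kHz sample rate, the one-second analysis window, `stream_inference`, and the energy-based `dummy_model` are hypothetical stand-ins, and the audio "stream" is simulated with an in-memory array instead of a microphone.

```python
import numpy as np

SAMPLE_RATE = 16000     # assumed sample rate
HOP_SECONDS = 0.5       # from the README: one inference every half second
WINDOW_SECONDS = 1.0    # assumed length of audio the model sees each time

def dummy_model(window):
    """Hypothetical placeholder classifier: thresholds the RMS energy."""
    rms = float(np.sqrt(np.mean(window ** 2)))
    return "sound" if rms > 0.1 else "silence"

def stream_inference(chunks, model, sr=SAMPLE_RATE):
    """Yield one prediction per HOP_SECONDS of audio received."""
    hop = int(HOP_SECONDS * sr)
    win = int(WINDOW_SECONDS * sr)
    buffer = np.zeros(win, dtype=np.float32)   # starts as silence
    pending = 0                                # samples received since last inference
    for chunk in chunks:
        chunk = np.asarray(chunk, dtype=np.float32)
        # Split the chunk so that every hop boundary triggers exactly one inference.
        while len(chunk) > 0:
            take = min(hop - pending, len(chunk))
            piece, chunk = chunk[:take], chunk[take:]
            buffer = np.concatenate([buffer[take:], piece])  # drop oldest, append newest
            pending += take
            if pending == hop:
                pending = 0
                yield model(buffer)

# Usage: one second of silence followed by one second of a 440 Hz tone,
# delivered in 1024-sample chunks as an audio callback might do.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
audio = np.concatenate([np.zeros(SAMPLE_RATE), 0.5 * np.sin(2 * np.pi * 440 * t)])
chunks = [audio[i : i + 1024] for i in range(0, len(audio), 1024)]
predictions = list(stream_inference(chunks, dummy_model))
print(predictions)
```

Overlapping windows (a one-second window advanced by half a second) mean a short event such as a clap is seen whole in at least one window even if it straddles a hop boundary.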

=== Data Set ===

This project uses a subset of the FSD50K data set: https://annotator.freesound.org/fsd/release/FSD50K/
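Before training, audio clips have to be decoded into arrays. The sketch below shows one way such a WAV-to-NPZ conversion might look using only the standard library and NumPy. It is not the project's conversion script: `wav_to_array`, `wavs_to_npz`, and the key names in the archive are hypothetical, and the code assumes 16-bit PCM input.

```python
import wave
import numpy as np

def wav_to_array(path):
    """Read a 16-bit PCM WAV file into float32 mono samples in [-1, 1)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        sample_rate = wf.getframerate()
        n_channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0
    if n_channels > 1:
        # Average interleaved channels down to mono.
        samples = samples.reshape(-1, n_channels).mean(axis=1)
    return samples, sample_rate

def wavs_to_npz(wav_paths, labels, out_path):
    """Store each clip under its own key, plus label and sample-rate arrays."""
    arrays, rates = {}, []
    for i, path in enumerate(wav_paths):
        arrays[f"clip_{i}"], rate = wav_to_array(path)
        rates.append(rate)
    np.savez_compressed(out_path, labels=np.array(labels),
                        sample_rates=np.array(rates), **arrays)
```

Storing each clip under a separate key keeps variable-length clips in one archive without padding or pickled object arrays.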

Note: I started the project with the name "whistle-n-clap" without realizing that the data set does not contain whistling sounds. So sorry, no whistling can be detected.

=== Model ===

The model is a simple CNN implementation based on this article: https://zhuanlan.zhihu.com/p/52298361
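For readers unfamiliar with how a CNN turns a spectrogram into class scores, here is a minimal forward-pass sketch in plain NumPy: one convolution layer, ReLU, global max pooling, and a dense softmax layer. It does not reproduce the architecture from the linked article or this project's model; the layer sizes, the random weights, and the three-class output are assumptions made purely for illustration.

```python
import numpy as np

def conv2d(x, kernels):
    """Valid 2-D cross-correlation.

    x: (H, W) input; kernels: (K, kh, kw) -> output (K, H-kh+1, W-kw+1).
    """
    _, kh, kw = kernels.shape
    patches = np.lib.stride_tricks.sliding_window_view(x, (kh, kw))
    return np.einsum("ijab,kab->kij", patches, kernels)

def softmax(z):
    e = np.exp(z - z.max())   # subtract the max for numerical stability
    return e / e.sum()

def tiny_cnn_forward(log_mel, kernels, dense_w, dense_b):
    """Map a (frames, mel_bands) feature array to class probabilities."""
    feature_maps = np.maximum(conv2d(log_mel, kernels), 0.0)   # ReLU
    pooled = feature_maps.max(axis=(1, 2))                     # global max pool -> (K,)
    return softmax(dense_w @ pooled + dense_b)

# Usage with random (untrained) weights and a random stand-in for a log-mel input
rng = np.random.default_rng(0)
log_mel = rng.standard_normal((61, 40))
kernels = rng.standard_normal((8, 3, 3)) * 0.1    # 8 filters of size 3x3 (assumed)
dense_w = rng.standard_normal((3, 8)) * 0.1       # 3 hypothetical classes
dense_b = np.zeros(3)
probs = tiny_cnn_forward(log_mel, kernels, dense_w, dense_b)
print(probs.shape)
```

Global pooling makes the output size independent of the number of input frames, which is convenient when clips vary in length; a real model would stack several such layers and learn the weights during training.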

=== Reference ===

Eduardo Fonseca, Xavier Favory, Jordi Pons, Frederic Font, Xavier Serra. "FSD50K: an Open Dataset of Human-Labeled Sound Events", arXiv 2020.

Purwins, H., Li, B., Virtanen, T., Schlüter, J., Chang, S. Y., & Sainath, T. (2019). Deep learning for audio signal processing. IEEE Journal of Selected Topics in Signal Processing, 13(2), 206-219.
