Does mood_happy-msd-musicnn work in real time? #1459

Open
feibuguocanghai opened this issue Jan 17, 2025 · 1 comment

@feibuguocanghai

Is it possible to use mood_happy-msd-musicnn to make accurate inferences on a 3-second audio clip?

With the following code:

```python
from essentia.standard import MonoLoader, TensorflowPredictMusiCNN, TensorflowPredict2D

# Load the audio as mono, resampled to 16 kHz
audio = MonoLoader(filename="audio.wav", sampleRate=16000, resampleQuality=4)()

# Compute MusiCNN embeddings (one embedding per input patch)
embedding_model = TensorflowPredictMusiCNN(graphFilename="msd-musicnn-1.pb", output="model/dense/BiasAdd")
embeddings = embedding_model(audio)

# Apply the mood_happy classifier (one row of class probabilities per patch)
model = TensorflowPredict2D(graphFilename="mood_happy-msd-musicnn-1.pb", output="model/Softmax")
predictions = model(embeddings)
```

I read the C++ code of the TensorflowPredictMusiCNN algorithm and found that its input signal requirements are frameSize=512, hopSize=256, and sampleRate=16000.

Then I found that the input shape required by mood_happy-msd-musicnn-1.pb is batch size × 187 × 96.
From this, I calculated that the minimum audio duration required by the model is about 3 seconds, but I am not sure whether the calculation is correct.
In addition, can 3 seconds of audio produce an accurate prediction?
Do I need to average the results of several windows to make them more accurate?
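The "about 3 seconds" estimate can be checked with a short calculation using the framing parameters quoted above (frameSize=512, hopSize=256, 16 kHz) and 187 frames per patch. This is a minimal sketch that assumes frames are cut without any padding; the framing inside Essentia may pad the signal, so the exact minimum can differ slightly.

```python
# Framing parameters quoted above for TensorflowPredictMusiCNN's input
SAMPLE_RATE = 16000  # Hz
FRAME_SIZE = 512     # samples per frame
HOP_SIZE = 256       # samples between consecutive frame starts
PATCH_FRAMES = 187   # frames in one 187 x 96 model input patch


def patch_duration_seconds(n_frames: int = PATCH_FRAMES) -> float:
    """Seconds of audio spanned by n_frames consecutive frames (no padding assumed)."""
    # The last frame starts (n_frames - 1) hops after the first one
    # and itself covers FRAME_SIZE samples.
    n_samples = (n_frames - 1) * HOP_SIZE + FRAME_SIZE
    return n_samples / SAMPLE_RATE


print(f"{patch_duration_seconds():.3f} s")  # -> 3.008 s
```

One patch therefore needs 48,128 samples, just over 3 seconds. Counting 187 full hops instead (187 × 256 / 16000) gives 2.992 s, so either way the answer is about 3 seconds.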

feibuguocanghai changed the title from "how to run BeatTrackerDegara in c++ standard mode?" to "Does mood_happy-msd-musicnn work in real time?" on Jan 17, 2025.
@palonso (Contributor) commented Jan 22, 2025

Hi @feibuguocanghai, you are right: this model operates on 3-second windows.
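To see how a longer clip is split into these 3-second windows, the sketch below lists the start time of each full window, so each prediction row can be matched to a position in the audio. It assumes the framing quoted in the question and a hop of 93 frames between windows, which is believed to be the default `patchHopSize` of TensorflowPredictMusiCNN; check the algorithm's documentation before relying on it. `count_frames` and `patch_start_times` are hypothetical helpers, and padding and the handling of a trailing partial window are ignored, so Essentia's own counts may differ slightly.

```python
# Hypothetical helpers; constants follow the framing quoted in the question.
SAMPLE_RATE = 16000  # Hz
FRAME_SIZE = 512     # samples per frame
HOP_SIZE = 256       # samples between frame starts
PATCH_FRAMES = 187   # frames per ~3-second patch
PATCH_HOP = 93       # frames between patch starts (ASSUMED default patchHopSize)


def count_frames(n_samples: int) -> int:
    """Number of full frames in a signal, ignoring any padding."""
    if n_samples < FRAME_SIZE:
        return 0
    return (n_samples - FRAME_SIZE) // HOP_SIZE + 1


def patch_start_times(n_samples: int) -> list[float]:
    """Start time in seconds of every full patch; a trailing partial patch is dropped."""
    n_frames = count_frames(n_samples)
    if n_frames < PATCH_FRAMES:
        return []
    n_patches = (n_frames - PATCH_FRAMES) // PATCH_HOP + 1
    return [i * PATCH_HOP * HOP_SIZE / SAMPLE_RATE for i in range(n_patches)]


starts = patch_start_times(10 * SAMPLE_RATE)  # a 10-second clip
print(len(starts), [round(t, 3) for t in starts])  # -> 5 [0.0, 1.488, 2.976, 4.464, 5.952]
```

Under these assumptions, consecutive windows overlap by about half, so a new prediction becomes available roughly every 1.5 seconds.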

Individual estimates from single 3-second windows are expected to be noisy. To obtain more accurate results, you can average the predictions over time.
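For an offline file, averaging over time can be as simple as taking the mean of the per-window probabilities. This is a minimal sketch that assumes the predictions come as an array of shape (n_windows, n_classes), as the classifier in the question returns one row per window; `average_predictions` is a hypothetical helper name and the sample values are illustrative only.

```python
import numpy as np


def average_predictions(predictions) -> np.ndarray:
    """Average per-window class probabilities into one clip-level estimate.

    `predictions` is assumed to have shape (n_windows, n_classes).
    """
    predictions = np.asarray(predictions, dtype=float)
    if predictions.ndim != 2 or predictions.shape[0] == 0:
        raise ValueError("expected a non-empty (n_windows, n_classes) array")
    # Mean over the time (window) axis, keeping one value per class
    return predictions.mean(axis=0)


# Illustrative values only: three noisy windows for a two-class model
windows = np.array([[0.9, 0.1], [0.4, 0.6], [0.8, 0.2]])
print(average_predictions(windows))
```

With the code from the question, `average_predictions(predictions)` would give one probability per class for the whole file. Which column corresponds to which label should be checked against the metadata distributed with the model.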

If you want a real-time system, you could use something like a moving average over the most recent predictions. Have a look at our tutorial on real-time usage of the models.
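For the real-time case, a moving average can be sketched as a fixed-length buffer of the latest predictions. `MovingAveragePredictor` and `window_count` are hypothetical names; the sketch only covers the smoothing step and assumes one new prediction vector arrives per new window. How audio is streamed into the model is not shown here.

```python
from collections import deque

import numpy as np


class MovingAveragePredictor:
    """Smooth a stream of per-window predictions with a fixed-length moving average."""

    def __init__(self, window_count: int = 5):
        if window_count < 1:
            raise ValueError("window_count must be at least 1")
        # Oldest predictions are discarded automatically once the buffer is full
        self._history = deque(maxlen=window_count)

    def update(self, prediction) -> np.ndarray:
        """Add the newest prediction and return the mean of the buffered ones."""
        self._history.append(np.asarray(prediction, dtype=float))
        return np.stack(list(self._history)).mean(axis=0)


smoother = MovingAveragePredictor(window_count=3)
for p in ([0.9, 0.1], [0.4, 0.6], [0.8, 0.2], [0.2, 0.8]):
    print(smoother.update(p))
```

A longer buffer gives a more stable output but reacts more slowly when the music changes; an exponential moving average is a common alternative that needs no buffer.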
