Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Configurable low_freq high_freq, dithering #664

Merged
merged 4 commits into from
Mar 22, 2024

Conversation

KarelVesely84
Copy link
Contributor

add support of non-default config of FBANK features

  • i built telephone speech ASR system with 56-dim fbanks with low_freq=80, high_freq=3600
  • i have a signal with hard-zeros in it, so dithering would be valuable

(i noticed that dithering is not yet in https://github.com/csukuangfj/kaldi-native-fbank)
(i think of porting it there from kaldi, just the constant will be 1/(2^15)-times smaller)

@KarelVesely84 KarelVesely84 changed the title [WIP] Configurable low freq high freq [WIP] Configurable low_freq high_freq, dithering Mar 13, 2024
@csukuangfj
Copy link
Collaborator

Thanks!

@csukuangfj
Copy link
Collaborator

(i noticed that dithering is not yet in https://github.com/csukuangfj/kaldi-native-fbank

Would you like to add it?

@KarelVesely84
Copy link
Contributor Author

KarelVesely84 commented Mar 13, 2024

yes, PR is here: csukuangfj/kaldi-native-fbank#40

@KarelVesely84
Copy link
Contributor Author

KarelVesely84 commented Mar 13, 2024

is there a way to install kaldi-native-fbank for sherpa-onnx from a customized branch ?
(ideally existing installation in current conda env, or in a local build directory)

i have seen the cmake download of kaldi-native-fbank, .../tags/v1.18.7.tar.gz in cmake/kaldi-native-fbank.cmake,
but i dunno how it is wired inside the overall cmake build...

@csukuangfj
Copy link
Collaborator

is there a way to install kaldi-native-fbank for sherpa-onnx from a customized branch

Yes.

i have seen the cmake download of kaldi-native-fbank, .../tags/v1.18.7.tar.gz in cmake/kaldi-native-fbank.cmake,
but i dunno how it is wired inside the overall cmake build...

You only need to change the URL to point to a specific commit.

In your specific case, your repo is
https://github.com/KarelVesely84/kaldi-native-fbank
and the commit you want is
KarelVesely84/kaldi-native-fbank@926fa47

Note that the commit id is 926fa4787977bcd8157c7e7a0cfee5a4597a92e4, so you need to replace

set(kaldi_native_fbank_URL "https://github.com/csukuangfj/kaldi-native-fbank/archive/refs/tags/v1.18.7.tar.gz")

with

 set(kaldi_native_fbank_URL  "https://github.com/KarelVesely84/kaldi-native-fbank/archive/926fa4787977bcd8157c7e7a0cfee5a4597a92e4.zip") 

By the way, you need to change the HASH

set(kaldi_native_fbank_HASH "SHA256=e78fd9d481d83d7d6d1be0012752e6531cb614e030558a3491e3c033cb8e0e4e")

You can leave the hash empty by using

 set(kaldi_native_fbank_HASH) 

@KarelVesely84 KarelVesely84 force-pushed the configurable_low-freq_high-freq branch from 364ec69 to 4f04fb8 Compare March 14, 2024 15:13
@KarelVesely84
Copy link
Contributor Author

okay, i'll configure it with the updated kaldi-native-fbank, so the unit-tests are already with the newer version

@KarelVesely84
Copy link
Contributor Author

KarelVesely84 commented Mar 14, 2024

Otherwise, the 80-3600Hz FBANK ASR appeared to be quite sensitive to the dithering constant:

|   | System                          | Eval        | Comment                    |
|---|---------------------------------|-------------|----------------------------|
| D | small (24M), 80-3600Hz, no-aug  | 77.9        | feat-dim56, dither 0.0     |
|   |                                 | 76.9        | feat-dim56, dither 0.00003 |
|   |                                 | 74.0        | feat-dim56, dither 0.0001  |
|   |                                 | 67.2        | feat-dim56, dither 0.0003  |
|   |                                 | 63.3        | feat-dim56, dither 0.001   |
|   |                                 | 63.3        | feat-dim56, dither 0.003   |
|   |                                 | 70.0        | feat-dim56, dither 0.01    |

There was a problem with "too many deletions", which dithering mitigated.

But still, there seems to be another problem too...

  • Maybe missing empty segments in ASR training data.
  • Maybe audio channel mismatch.
  • Maybe other inconsistency of train/sherpa features...

But this should not block this PR from being merged.
The hight_freq, low_freq, non 80-dim features and dithering works...

K.

@KarelVesely84 KarelVesely84 force-pushed the configurable_low-freq_high-freq branch from bf88d1b to 678bb71 Compare March 18, 2024 09:57
@KarelVesely84
Copy link
Contributor Author

with the update of kaldi-native-fbank, there is a problem in the android build:

[  9%] Building CXX object _deps/kaldi_native_fbank-build/kaldi-native-fbank/csrc/CMakeFiles/kaldi-native-fbank-core.dir/online-feature.cc.o
/home/runner/work/sherpa-onnx/sherpa-onnx/build-android-arm64-v8a/_deps/kaldi_native_fbank-src/kaldi-native-fbank/csrc/kaldi-math.cc:19:26: error: no member named 'mutex' in namespace 'std'

But there are mutexes elsewhere in the sherpa-onnx, and the kaldi-native-fbank is compiled in C++14 standard.
Is this a big problem ? Do you have an idea how to fix it ?

@csukuangfj
Copy link
Collaborator

csukuangfj commented Mar 19, 2024

I see. kaldi-math.cc lacks of the header

#include <mutex>

You can change it locally in your build directory and see it if fixes the issue.

@csukuangfj
Copy link
Collaborator

with the update of kaldi-native-fbank, there is a problem in the android build:

[  9%] Building CXX object _deps/kaldi_native_fbank-build/kaldi-native-fbank/csrc/CMakeFiles/kaldi-native-fbank-core.dir/online-feature.cc.o
/home/runner/work/sherpa-onnx/sherpa-onnx/build-android-arm64-v8a/_deps/kaldi_native_fbank-src/kaldi-native-fbank/csrc/kaldi-math.cc:19:26: error: no member named 'mutex' in namespace 'std'

But there are mutexes elsewhere in the sherpa-onnx, and the kaldi-native-fbank is compiled in C++14 standard. Is this a big problem ? Do you have an idea how to fix it ?

I have fixed it in csukuangfj/kaldi-native-fbank#43

Please use kaldi-native-fbank v1.19.1

@KarelVesely84 KarelVesely84 force-pushed the configurable_low-freq_high-freq branch from 97429cd to 5afda1e Compare March 20, 2024 17:29
…and python API

- I am buliding ASR system for telephone speech, in this scenario it is
  good to limit the bandwidth to 80-3600Hz.
- The default remains the 20-7600Hz.
- we use less features 80->56
- we pass `feature_dim` to `OnlineZipformer2TransducerModel::GetEncoderInitStates()` via new method `OnlineTransducerModel::SetFeatureDim(.)`
- the value 19 corresponding to 80-dim features was previously hard-coded, in this PR it is imported from the `FeatureExtractorConfig`
- currently, dithering is not yet implemented in https://github.com/csukuangfj/kaldi-native-fbank
- i can port it there from kaldi
@KarelVesely84 KarelVesely84 force-pushed the configurable_low-freq_high-freq branch from 5afda1e to a940e56 Compare March 20, 2024 17:30
@KarelVesely84 KarelVesely84 changed the title [WIP] Configurable low_freq high_freq, dithering Configurable low_freq high_freq, dithering Mar 20, 2024
Copy link
Collaborator

@csukuangfj csukuangfj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks good to me. Left some minor comments.

set(kaldi_native_fbank_URL "https://github.com/csukuangfj/kaldi-native-fbank/archive/refs/tags/v1.18.7.tar.gz")
set(kaldi_native_fbank_URL2 "https://huggingface.co/csukuangfj/sherpa-onnx-cmake-deps/resolve/main/kaldi-native-fbank-1.18.7.tar.gz")
set(kaldi_native_fbank_HASH "SHA256=e78fd9d481d83d7d6d1be0012752e6531cb614e030558a3491e3c033cb8e0e4e")
set(kaldi_native_fbank_URL "https://github.com/csukuangfj/kaldi-native-fbank/archive/refs/tags/v1.19.1.tar.gz")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also change lines 15-19 in this file.

@@ -101,6 +105,7 @@ class OnlineZipformer2TransducerModel : public OnlineTransducerModel {

int32_t context_size_ = 0;
int32_t vocab_size_ = 0;
int32_t feature_dim_ = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you change the default value to 80?

@csukuangfj
Copy link
Collaborator

Thank you! I am merging it.

@csukuangfj csukuangfj merged commit eaec4c8 into k2-fsa:master Mar 22, 2024
218 of 227 checks passed
XiaYucca pushed a commit to XiaYucca/sherpa-onnx that referenced this pull request Jan 9, 2025
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants