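Hi, I spot-checked a small portion of the WenetSpeech audio and transcript files and found that many of the labels are wrong:
Y0000000768_10jLYDtPEpg_S00000.wav
Original: 中国工商银行在国账市场上
Corrected: 中国工商银行在国际市场上
Y0000000768_10jLYDtPEpg_S00004.wav
Original: 我们整个的银行体系已经从技术角皮续产了
Corrected: 我们整个的银行体系从技术角度已经续产了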
Note: the audio files above are the pre-segmented clips, each named by its sid.
How should we handle cases like this? Screening everything manually would be far too costly.
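One way to put a number on a spot check like this is the character error rate (CER) between the released transcript and the hand-corrected one: the character-level edit distance divided by the number of reference characters. The sketch below applies it to the two examples in the issue. The helper names `edit_distance` and `corpus_cer` are hypothetical and are not part of the WenetSpeech tooling.

```python
from typing import Iterable, Tuple


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            )
        prev = curr
    return prev[-1]


def corpus_cer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Pooled CER over (corrected_reference, released_label) pairs."""
    pairs = list(pairs)
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total = sum(len(ref) for ref, _ in pairs)
    if total == 0:
        raise ValueError("no reference characters")
    return errors / total


# (corrected transcript, released label) for the two segments reported above
pairs = [
    ("中国工商银行在国际市场上", "中国工商银行在国账市场上"),
    ("我们整个的银行体系从技术角度已经续产了", "我们整个的银行体系已经从技术角皮续产了"),
]
print(f"CER = {corpus_cer(pairs):.3f}")
```

The pooled figure divides total edits by total reference characters rather than averaging per-segment rates, so long segments weigh more. Two segments are far too few to estimate a corpus-level rate; the same function can be run over a larger random sample.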
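Based on your spot check, roughly what proportion of the labels are wrong? The data was labeled automatically, so it has some inherent error rate. We used an automated algorithm to keep only the high-confidence segments, but a few errors always slip through.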
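The confidence-based filtering described in the reply can be approximated as a threshold on a per-segment score, followed by a small random sample for manual review instead of screening everything. This is a minimal sketch, not the maintainers' algorithm: it assumes each segment record is a dict with `sid`, `text`, and `confidence` keys (check the field names against the metadata you actually have), the 0.95 threshold is only an illustrative default, and `filter_by_confidence` and `sample_for_review` are hypothetical helpers.

```python
import random
from typing import Iterable, List, Mapping, Tuple


def filter_by_confidence(
    segments: Iterable[Mapping], threshold: float = 0.95
) -> Tuple[List[Mapping], List[Mapping]]:
    """Split segments into (kept, rejected) by a confidence threshold.

    Assumption: every segment has a numeric "confidence" key in [0, 1].
    """
    kept: List[Mapping] = []
    rejected: List[Mapping] = []
    for seg in segments:
        (kept if seg["confidence"] >= threshold else rejected).append(seg)
    return kept, rejected


def sample_for_review(
    segments: Iterable[Mapping], k: int, seed: int = 0
) -> List[Mapping]:
    """Draw up to k segments uniformly at random for a manual spot check."""
    pool = list(segments)
    rng = random.Random(seed)  # fixed seed so the review set is reproducible
    return rng.sample(pool, min(k, len(pool)))
```

Comparing the CER of reviewed samples from the kept and rejected sets gives a rough sense of whether a stricter threshold would be worth the data it discards.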