Labeling errors in the data #48

Open
xyx361100238 opened this issue Mar 29, 2023 · 1 comment

Comments

@xyx361100238

xyx361100238 commented Mar 29, 2023

Hello, I spot-checked a small portion of the WenetSpeech audio and transcription files and found that many of the transcriptions are wrong:
Y0000000768_10jLYDtPEpg_S00000.wav
Original: 中国工商银行在国账市场上
Corrected: 中国工商银行在国际市场上
Y0000000768_10jLYDtPEpg_S00004.wav
Original: 我们整个的银行体系已经从技术角皮续产了
Corrected: 我们整个的银行体系从技术角度已经续产了

Note: the audio above refers to the already-segmented files, each named by its sid.

How should we handle cases like this? Screening the labels manually is too costly.
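To put a number on spot-check results like the two pairs above, one option is the character error rate (CER): the character-level Levenshtein distance between the original label and the corrected transcript, divided by the length of the corrected transcript. This is a minimal sketch that assumes the manual correction is the reference; `edit_distance` and `corpus_cer` are hypothetical helper names, not part of WenetSpeech or WeNet tooling.

```python
# Minimal sketch (hypothetical helpers): character error rate between an
# original label and a manually corrected transcript, via Levenshtein distance.

def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,              # delete a reference character
                curr[j - 1] + 1,          # insert a hypothesis character
                prev[j - 1] + (r != h),   # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def corpus_cer(pairs):
    """pairs: (corrected_reference, original_label) tuples.
    Returns total edits divided by total reference characters."""
    pairs = list(pairs)
    edits = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    chars = sum(len(ref) for ref, _ in pairs)
    return edits / chars


# The two spot-checked utterances from this issue: (corrected, original).
spot_check = [
    ("中国工商银行在国际市场上", "中国工商银行在国账市场上"),
    ("我们整个的银行体系从技术角度已经续产了", "我们整个的银行体系已经从技术角皮续产了"),
]
for ref, hyp in spot_check:
    print(edit_distance(ref, hyp), len(ref))  # prints "1 12", then "5 19"
print(round(corpus_cer(spot_check), 4))       # prints 0.1935
```

On these two pairs this gives 1 edit out of 12 characters and 5 edits out of 19 (in the second label 已经 is in the wrong position and 度 appears as 皮), a pooled CER of 6/31, about 0.19. Two utterances are far too few to generalize from.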

@robin1001
Contributor

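From your spot check, roughly what proportion of the labels are wrong? The data was labeled automatically, so it inherently has some error rate. We used an automated algorithm to select the high-confidence segments, but some errors always slip through.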
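One way to answer the error-ratio question is to report the fraction of spot-checked utterances whose label contains at least one error. Because spot-check samples are small, an interval is more informative than a single number. The sketch below uses the Wilson score interval, a standard choice for binomial proportions; that choice is an assumption here, not something from this thread, and `label_error_rate` is a hypothetical name.

```python
import math


def label_error_rate(num_wrong: int, num_checked: int, z: float = 1.96):
    """Hypothetical helper: point estimate and Wilson score interval
    (about 95% for z=1.96) for the fraction of checked utterances whose
    label contains at least one error."""
    if num_checked <= 0 or not 0 <= num_wrong <= num_checked:
        raise ValueError("need num_checked > 0 and 0 <= num_wrong <= num_checked")
    n = num_checked
    p = num_wrong / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return p, max(0.0, center - half), min(1.0, center + half)
```

Call it with the number of mislabeled utterances and the total number checked. As a purely hypothetical illustration of scale, 10 errors in 100 checked utterances would give an interval of roughly 0.055 to 0.174.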
