Chinese person-name recognition hit rate is somewhat low #418

Closed
cicido opened this issue Mar 6, 2017 · 1 comment

cicido commented Mar 6, 2017

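I found a 1.9 GB Sohu corpus online, about 1.1 million articles in total. I ran word segmentation over it and counted the words under each part-of-speech tag. I found that many of the entries tagged as person names (nr) are segmented incorrectly. The head of the results is as follows: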
都是 nr 50596 20170301
车内 nr 39683 20170301
都有 nr 18472 20170301
来看 nr 12741 20170301
都会 nr 11236 20170301
全车 nr 9696 20170301
刘翔 nr 8981 20170301
胡锦涛 nr 8511 20170301
才是 nr 8322 20170301
都将 nr 8162 20170301
都讯 nr 8150 20170301
苏宁 nr 8106 20170301
才能 nr 7878 20170301
都能 nr 7545 20170301
占比 nr 7511 20170301
都要 nr 7469 20170301
文中所 nr 7412 20170301
全系 nr 7187 20170301
景海鹏 nr 7184 20170301
房企 nr 7176 20170301
曾在 nr 6558 20170301
高出 nr 6360 20170301
孙杨 nr 6077 20170301
刘旺 nr 5987 20170301
安南 nr 5821 20170301
唯冠 nr 5678 20170301
令人 nr 5602 20170301
赛扬 nr 5477 20170301
才会 nr 5340 20170301
小微 nr 5263 20170301
杨幂 nr 5164 20170301
刚需 nr 5145 20170301
宣传 nr 4587 20170301
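
The counting step described above can be sketched roughly as follows. This is a minimal illustration, not the script used to produce the list: it assumes the segmenter's output is already available as (word, tag) pairs, and the helper names `count_by_tag` and `format_head` are hypothetical. The fourth column of the list above is left out because the issue does not say what it represents.

```python
from collections import Counter


def count_by_tag(tagged_tokens, tag="nr"):
    """Count how often each word occurs with the given POS tag."""
    counts = Counter()
    for word, pos in tagged_tokens:
        if pos == tag:
            counts[word] += 1
    return counts


def format_head(counts, tag="nr", n=5):
    """Return the n most frequent words as 'word tag count' lines."""
    return [f"{word} {tag} {count}" for word, count in counts.most_common(n)]


# Hypothetical segmenter output: (word, tag) pairs.
# 都是 (roughly "are all") is a common phrase, not a name, so seeing it
# under nr is the kind of error reported in this issue.
sample = [
    ("都是", "nr"), ("刘翔", "nr"), ("都是", "nr"), ("比赛", "n"),
    ("刘翔", "nr"), ("都是", "nr"), ("都是", "d"),
]

counts = count_by_tag(sample)
lines = format_head(counts)

if __name__ == "__main__":
    for line in lines:
        print(line)
```

Sorting by frequency puts high-volume false positives at the top of the list, which is why common words mis-tagged as nr dominate the head of the output.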

hankcs (Owner) commented Mar 9, 2017

#407
