
Loading a user dictionary has no effect, and entities are not recognized #45

Open
ShuGao0810 opened this issue Sep 29, 2018 · 3 comments
ShuGao0810 commented Sep 29, 2018

Hi, while using foolnltk I found that loading a user dictionary has no effect, and I'm not sure what is causing it. Details below:
Environment: win10 + python3.6

fool.analysis('阿里收购饿了么')
Returns: ([[('阿里', 'nz'), ('收购', 'v'), ('饿', 'v'), ('了', 'y'), ('么', 'y')]], [[(0, 3, 'company', '阿里')]])

User dictionary format:
饿了么 10

fool.load_userdict(path)
fool.analysis('阿里收购饿了么')
Returns: ([[('阿里', 'nz'), ('收购', 'v'), ('饿', 'v'), ('了', 'y'), ('么', 'y')]], [[(0, 3, 'company', '阿里')]])

Loading the user dictionary seems to have no effect? During segmentation "饿了么" is still split apart, and entity recognition doesn't pick it up either.

@rockyzhengwu
Owner

@ShuGao0810 Thanks for the feedback. At the moment the user dictionary does take effect during segmentation, but analysis does not support it yet. I'll fix that later.

@xrzlizheng

How can I load a jieba-format dictionary?
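One possible approach, sketched below: jieba dictionary lines have the shape `word freq [pos]` (frequency and POS tag are optional), while the foolnltk dictionary shown above uses `word weight`, so a small conversion step could bridge the two. The function name `convert_jieba_dict` and the default weight of 10 are illustrative assumptions, not part of either library's API.

```python
def convert_jieba_dict(jieba_lines):
    """Convert jieba-format dictionary lines ("word freq [pos]")
    into foolnltk-format lines ("word weight").

    Keeps the word and its frequency (reused as the weight) and
    drops the POS tag. Names and defaults here are illustrative.
    """
    fool_lines = []
    for line in jieba_lines:
        parts = line.strip().split()
        if not parts:
            continue  # skip blank lines
        word = parts[0]
        # jieba allows the frequency to be omitted; fall back to a weight of 10
        weight = parts[1] if len(parts) > 1 and parts[1].isdigit() else "10"
        fool_lines.append(f"{word} {weight}")
    return fool_lines

# A jieba entry like "饿了么 10 nz" becomes "饿了么 10"
print(convert_jieba_dict(["饿了么 10 nz", "阿里巴巴 3", "拉勾 nz"]))
# → ['饿了么 10', '阿里巴巴 3', '拉勾 10']
```

The converted lines could then be written to a file and passed to fool.load_userdict as usual.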

@yu45020

yu45020 commented Dec 14, 2018

@ShuGao0810
A workaround that may help: modify __init__.py, mirroring in ner what cut already does.

(Changing it like this doesn't quite seem to work, though ><)

def ner(text, ignore=False):
    text = _check_input(text, ignore)
    if not text:
        return [[]]
    res = LEXICAL_ANALYSER.ner(text)
-    return res
+    new_words = []
+    if _DICTIONARY.sizes != 0:
+        for sent, words in zip(text, res):
+            words = _mearge_user_words(sent, words)
+            new_words.append(words)
+    else:
+        new_words = res
+    return new_words


def analysis(text, ignore=False):
    text = _check_input(text, ignore)
    if not text:
        return [[]], [[]]
-    res = LEXICAL_ANALYSER.analysis(text)
-    return res
+    word_inf = pos_cut(text)
+    ners = ner(text)
+    return word_inf, ners
a = ['阿里收购饿了么']
fool.load_userdict('foolnltk_userdict.txt')
# fool.delete_userdict()
print(fool.cut(a))
[['阿里', '收购', '饿了么']]

print(fool.analysis(a))
([[('阿里', 'nz'), ('收购', 'v'), ('饿了么', 'nz')]], [['阿里收购', '饿了么']])

@rockyzhengwu
This is probably a typo in __init__.py:

_mearge_user_words -- should be --> _merge_user_words
