Traditional Chinese word segmentation #115

guoweihua1982 · 2018-05-28T05:31:04Z

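It looks like word segmentation of Traditional Chinese text is not well supported at the moment.
Am I doing something wrong, or is segmentation currently supported only for Simplified Chinese?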

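For example, in the cases below, place names and other words written in Simplified characters are segmented correctly, but the same text written in Traditional characters is split into individual characters.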

>>> import pynlpir
>>> pynlpir.open()
>>> s = '十條推薦東京必去路線'
>>> pynlpir.segment(s)
[('十', 'numeral'), ('條', 'noun'), ('推', 'verb'), ('薦', 'noun'), ('東', 'noun'), ('京', 'distinguishing word'), ('必', 'adverb'), ('去路', 'noun'), ('線', 'noun')]

>>> s = '十条推荐东京必去路线'
>>> pynlpir.segment(s)
[('十', 'numeral'), ('条', 'classifier'), ('推荐', 'verb'), ('东京', 'noun'), ('必', 'adverb'), ('去', 'verb'), ('路线', 'noun')]


>>> s = '台湾好好玩'
>>> pynlpir.segment(s)
[('台湾', 'noun'), ('好', 'adjective'), ('好玩', 'adjective')]

>>> s = '台灣好好玩'
>>> pynlpir.segment(s)
[('台', 'distinguishing word'), ('灣', 'noun'), ('好好', 'adverb'), ('玩', 'verb')]
