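It seems that Traditional Chinese text is not segmented well at the moment. Am I doing something wrong, or is only Simplified Chinese segmentation currently supported?
In the examples below, place names and other words are segmented correctly in Simplified Chinese, but the same text in Traditional Chinese is broken up almost entirely into single characters.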
>>> s = '十條推薦東京必去路線'
>>> pynlpir.segment(s)
[('十', 'numeral'), ('條', 'noun'), ('推', 'verb'), ('薦', 'noun'), ('東', 'noun'), ('京', 'distinguishing word'), ('必', 'adverb'), ('去路', 'noun'), ('線', 'noun')]
>>> s = '十条推荐东京必去路线'
>>> pynlpir.segment(s)
[('十', 'numeral'), ('条', 'classifier'), ('推荐', 'verb'), ('东京', 'noun'), ('必', 'adverb'), ('去', 'verb'), ('路线', 'noun')]
>>> s = '台湾好好玩'
>>> pynlpir.segment(s)
[('台湾', 'noun'), ('好', 'adjective'), ('好玩', 'adjective')]
>>> s = '台灣好好玩'
>>> pynlpir.segment(s)
[('台', 'distinguishing word'), ('灣', 'noun'), ('好好', 'adverb'), ('玩', 'verb')]
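To put a rough number on the difference, here is a small sketch that counts how many tokens are a single character long in the first two outputs above. The token lists are copied by hand from that session (tags dropped), and `single_char_ratio` is a hypothetical helper of mine, not part of pynlpir.

```python
# Tokens copied from the first two pynlpir.segment() outputs in the session
# above, with the part-of-speech tags dropped.
traditional = ['十', '條', '推', '薦', '東', '京', '必', '去路', '線']
simplified = ['十', '条', '推荐', '东京', '必', '去', '路线']


def single_char_ratio(tokens):
    """Return the fraction of tokens that are exactly one character long.

    Hypothetical helper for illustration only; it is not a pynlpir function.
    """
    return sum(1 for token in tokens if len(token) == 1) / len(tokens)


print(f"traditional: {single_char_ratio(traditional):.2f}")
print(f"simplified:  {single_char_ratio(simplified):.2f}")
```

A higher ratio for the Traditional input would be consistent with the over-segmentation described above, although a single sentence is of course not a proper measurement.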