Commit

Accumulated several optimizations and adjustments; bump the minor version by 1
hankcs committed Apr 15, 2015
1 parent bc9ac9f commit c82faae
Showing 2 changed files with 4 additions and 2 deletions.
4 changes: 3 additions & 1 deletion README.md
@@ -273,7 +273,7 @@ public class DemoCustomDictionary

// Segmentation with the AhoCorasickDoubleArrayTrie automaton
final char[] charArray = text.toCharArray();
-CoreDictionary.trie.parseText(charArray, new AhoCorasickDoubleArrayTrie.IHit<CoreDictionary.Attribute>()
+CustomDictionary.parseText(charArray, new AhoCorasickDoubleArrayTrie.IHit<CoreDictionary.Attribute>()
{
@Override
public void hit(int begin, int end, CoreDictionary.Attribute value)
@@ -303,6 +303,8 @@ public class DemoCustomDictionary
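- Dictionary format
* Each line represents one word, in the format `[word] [POS A] [frequency of A] [POS B] [frequency of B] ...`. If no part of speech is given, the dictionary's default part of speech is used.
* A dictionary's default part of speech is noun (n). It can be changed in the configuration file, e.g. `全国地名大全.txt ns;`: if the dictionary path is followed by a space and a part-of-speech tag, that tag becomes the default for that dictionary.
+ * Custom dictionaries have lower priority than the core dictionary. If you disagree with this choice, it is open to discussion.
+ * In shortest-path segmentation based on the cascaded hidden Markov model, a word in a custom dictionary is not guaranteed to be segmented out. If you believe a word must always be segmented out, set its frequency higher.
* For more information about user dictionaries, see the **Dictionary Description** chapter.
- Algorithm details
* [Trie Tree Word Segmentation](http://www.hankcs.com/program/java/tire-tree-participle.html)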
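
The dictionary line format described in this hunk can be illustrated with a minimal sketch. This is not HanLP's actual loader: the class `DictionaryLineSketch`, the method `parseAttributes`, and the fallback frequency of 1 for an untagged word are all hypothetical, and the sketch assumes fields are separated by whitespace and that words contain no spaces.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the "[word] [POS A] [freq A] [POS B] [freq B] ..." format.
// It does not use or reproduce any HanLP class.
public class DictionaryLineSketch
{
    /**
     * Parses the part-of-speech/frequency pairs of one dictionary line.
     *
     * @param line       one dictionary line, fields separated by whitespace (assumption)
     * @param defaultPos tag to use when the line carries no tag, e.g. "n"
     * @return tags mapped to frequencies, in the order they appear on the line
     */
    static Map<String, Integer> parseAttributes(String line, String defaultPos)
    {
        String[] fields = line.trim().split("\\s+");
        Map<String, Integer> posToFrequency = new LinkedHashMap<>();
        if (fields.length == 1)
        {
            // No tag given: fall back to the dictionary default.
            // The frequency of 1 is an assumption made for this sketch.
            posToFrequency.put(defaultPos, 1);
            return posToFrequency;
        }
        if (fields.length % 2 == 0)
        {
            // A word followed by complete (tag, frequency) pairs has an odd field count.
            throw new IllegalArgumentException("Tag without frequency in line: " + line);
        }
        for (int i = 1; i < fields.length; i += 2)
        {
            posToFrequency.put(fields[i], Integer.parseInt(fields[i + 1]));
        }
        return posToFrequency;
    }

    public static void main(String[] args)
    {
        // A word with two tags, the first one given a large frequency.
        System.out.println(parseAttributes("HanLP nz 1024 n 1", "n"));
        // A word with no tag: the default tag is used.
        System.out.println(parseAttributes("HanLP", "n"));
    }
}
```

Under this reading, raising a word's frequency as the README suggests means writing a larger number after its tag, as in the first call above.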
2 changes: 1 addition & 1 deletion pom.xml
@@ -4,7 +4,7 @@

<groupId>com.hankcs</groupId>
<artifactId>hanlp</artifactId>
-<version>1.1.2</version>
+<version>1.1.3</version>

<name>HanLP</name>
<url>http://www.hankcs.com/</url>