Stopword dictionary loading issue #530
After analyzing the code, I found that the real problem is in MDAG.java: changing the original IOAdapter.open(dataFile.getAbsolutePath()) to IOAdapter.open(dataFile.getPath()) fixes it.
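A minimal sketch of the difference this fix relies on, using only `java.io.File`: for a relative path, `getPath()` returns the path exactly as it was given, while `getAbsolutePath()` prefixes the current working directory. An adapter that treats its argument as a classpath resource name (like the JarIOAdapter later in this issue) can presumably only match the relative form. `PathDemo` and `describe` are hypothetical names used for illustration.

```java
import java.io.File;

// Hypothetical demo: shows what each File accessor would hand to an IO adapter.
final class PathDemo {
    private PathDemo() {}

    // Returns {getPath(), getAbsolutePath()} for the given relative path.
    static String[] describe(String relative) {
        File f = new File(relative);
        return new String[] { f.getPath(), f.getAbsolutePath() };
    }

    public static void main(String[] args) {
        String[] r = describe("stopwords.txt");
        // Unchanged relative path: usable as a classpath resource name.
        System.out.println("getPath():         " + r[0]);
        // Working directory prepended: no longer a classpath-relative name.
        System.out.println("getAbsolutePath(): " + r[1]);
    }
}
```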
Thanks for the suggestion.
My version is actually 1.3.2; I mistakenly wrote 1.3.0 above.
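Thanks for the suggestion. Constructing an MDAG from a File argument is indeed incompatible with InputStream-based loading. It now reads directly from the InputStream opened by IOAdapter; testing is welcome.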
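As a rough sketch of the general approach (reading dictionary entries from an InputStream rather than a File), the hypothetical `StreamWordLoader` below assumes a UTF-8 file with one word per line. Because it only sees an InputStream, it works the same whether the bytes come from the file system, a jar, or a custom adapter. This is not HanLP's actual MDAG code.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical loader: one word per line, source-agnostic.
final class StreamWordLoader {
    private StreamWordLoader() {}

    static Set<String> load(InputStream in) throws IOException {
        Set<String> words = new LinkedHashSet<>();  // keeps file order, drops duplicates
        // try-with-resources closes the reader and the wrapped stream
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);  // blank lines are skipped
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("the\nof\n\nand\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(load(in));  // prints [the, of, and]
    }
}
```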
I am currently using HanLP 1.3.0. While analyzing CoreStopWordDictionary.java, I found the following dictionary-loading statement:
dictionary = new StopWordDictionary(new File(HanLP.Config.CoreStopWordDictionaryPath));
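The core dictionary, the custom user dictionaries and the other dictionaries are all loaded as follows. Taking the core dictionary (CoreDictionary.java) as an example: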
br = new BufferedReader(new InputStreamReader(IOUtil.newInputStream(path), "UTF-8"));
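That is, they all go through the unified IOUtil interface.
StopWordDictionary, however, uses File directly, which breaks this consistency. Would you consider making CoreStopWordDictionary use the same unified interface?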
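The unified-interface idea can be sketched as follows, assuming a design along the lines the issue describes: every loader calls one static entry point, which consults an optional pluggable adapter and otherwise falls back to the local file system. `StreamOpener`, `UnifiedIO` and `newInputStream` are hypothetical names, not HanLP's actual API.

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for an adapter interface such as IIOAdapter.
interface StreamOpener {
    InputStream open(String path) throws IOException;
}

// Hypothetical unified entry point: only this method decides how a path becomes bytes.
final class UnifiedIO {
    // When null, paths are read from the local file system.
    static volatile StreamOpener adapter = null;

    private UnifiedIO() {}

    static InputStream newInputStream(String path) throws IOException {
        StreamOpener a = adapter;  // read the field once
        return (a == null) ? new FileInputStream(path) : a.open(path);
    }

    public static void main(String[] args) throws IOException {
        // Swap in an in-memory opener; every loader using newInputStream picks it up.
        adapter = p -> new ByteArrayInputStream(("contents of " + p).getBytes(StandardCharsets.UTF_8));
        try (InputStream in = newInputStream("data/dictionary/stopwords.txt")) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```

With this shape, a loader that calls `new File(path)` directly is the one place where a custom adapter (for example, a classpath-based one) is silently bypassed.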
I ask because of the JarIOAdapter.java I wrote myself:
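import java.io.FileNotFoundException;
import java.io.InputStream;

import com.hankcs.hanlp.corpus.io.IIOAdapter;

public class JarIOAdapter implements IIOAdapter
{
    @Override
    public InputStream open(String path) throws FileNotFoundException
    {
        /*
        Loading resources the first way fails in a distributed environment,
        so the second way is used instead.
        */
        //return ClassLoader.getSystemClassLoader().getResourceAsStream(path);
        InputStream in = JarIOAdapter.class.getClassLoader().getResourceAsStream(path);
        // getResourceAsStream returns null (it does not throw) when the resource is missing
        if (in == null)
        {
            throw new FileNotFoundException("Resource not found on classpath: " + path);
        }
        return in;
    }
}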
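The goal is to separate the code from the dictionary data by packaging hanlp.properties and the data directory into a jar of their own. However, because CoreStopWordDictionary.java does not use the unified file-reading interface, the stopword dictionary file cannot be read.
Would the author be interested in splitting the code and the dictionary data into two jar packages? I have this almost finished on my side and can submit the code.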