"Comparison method violates its general contract!" exception when extracting keywords with TextRankKeyword on JDK 7 #11

Closed
a198720 opened this issue May 8, 2015 · 1 comment

a198720 commented May 8, 2015

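Test code:

    import java.nio.file.Paths;
    import java.util.Scanner;
    import com.hankcs.hanlp.summary.TextRankKeyword;

    // Read the GBK-encoded file into a single string, trimming each line
    String src = "data/test.txt";
    Scanner scanner = new Scanner(Paths.get(src), "gbk");
    StringBuilder sb = new StringBuilder();
    while (scanner.hasNextLine()) {
        sb.append(scanner.nextLine().trim());
    }
    // System.out.println(sb.toString());
    scanner.close();
    System.out.println(TextRankKeyword.getKeywordList(sb.toString(), 20));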

Error output:

    java.lang.IllegalArgumentException: Comparison method violates its general contract!
        at java.util.TimSort.mergeLo(Unknown Source)
        at java.util.TimSort.mergeAt(Unknown Source)
        at java.util.TimSort.mergeCollapse(Unknown Source)
        at java.util.TimSort.sort(Unknown Source)
        at java.util.TimSort.sort(Unknown Source)
        at java.util.Arrays.sort(Unknown Source)
        at java.util.Collections.sort(Unknown Source)
        at com.hankcs.hanlp.summary.TextRankKeyword.getKeyword(TextRankKeyword.java:115)
        at com.hankcs.hanlp.summary.TextRankKeyword.getKeywordList(TextRankKeyword.java:47)

After searching online I found:
http://www.tuicool.com/articles/MZreyuv
http://blog.csdn.net/ghsau/article/details/42012365

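It turns out that the sorting algorithm behind Collections.sort changed in JDK 7 (the stack trace shows TimSort), so a comparator now has to handle the case where the two compared objects are equal.
Since the values being compared in TextRankKeyword are Float objects, I looked up Float's compare method (Float implements the Comparable interface). The code: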
    public static int compare(float f1, float f2) {
        if (f1 < f2)
            return -1; // Neither val is NaN, thisVal is smaller
        if (f1 > f2)
            return 1;  // Neither val is NaN, thisVal is larger

        // Cannot use floatToRawIntBits because of possibility of NaNs.
        int thisBits    = Float.floatToIntBits(f1);
        int anotherBits = Float.floatToIntBits(f2);

        return (thisBits == anotherBits ?  0 : // Values are equal
                (thisBits < anotherBits ? -1 : // (-0.0, 0.0) or (!NaN, NaN)
                 1));                          // (0.0, -0.0) or (NaN, !NaN)
    }
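
The comments in this method mention a few edge cases (equal values, signed zeros, NaN). As a rough illustration of how they come out in practice, here is a small sketch; the class name `FloatCompareDemo` and the helper `sign` are hypothetical and exist only for this example. The helper normalizes the result with `Integer.signum`, since only the sign of the return value matters to a sort.

```java
public class FloatCompareDemo {
    // Hypothetical helper: reduce Float.compare's result to -1, 0, or 1.
    static int sign(float a, float b) {
        return Integer.signum(Float.compare(a, b));
    }

    public static void main(String[] args) {
        System.out.println(sign(1.5f, 1.5f));           // 0: equal values
        System.out.println(sign(1.0f, 2.0f));           // -1: first is smaller
        System.out.println(sign(-0.0f, 0.0f));          // -1: -0.0 sorts before 0.0
        System.out.println(sign(Float.NaN, 1.0f));      // 1: NaN sorts after everything
        System.out.println(sign(Float.NaN, Float.NaN)); // 0: NaN equals NaN here
    }
}
```

The point relevant to this issue is the first case: equal inputs yield 0, and swapping the arguments always flips the sign of the result.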

So I changed the author's code from:

    Collections.sort(entryList, new Comparator<Map.Entry<String, Float>>()
    {
        @Override
        public int compare(Map.Entry<String, Float> o1, Map.Entry<String, Float> o2)
        {
            return (o1.getValue() - o2.getValue() > 0 ? -1 : 1);
        }
    });

to:

    Collections.sort(entryList, new Comparator<Map.Entry<String, Float>>()
    {
        @Override
        public int compare(Map.Entry<String, Float> o1, Map.Entry<String, Float> o2)
        {
            // Descending by score, as before; returns 0 when the scores are equal
            return Float.compare(o2.getValue(), o1.getValue());
        }
    });

With this change the exception no longer occurs.

For the author's reference: I suggest implementing every Float comparison in the code this way.
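
To make the failure concrete, here is a minimal sketch of why the original comparator breaks the contract that TimSort checks, and why a `Float.compare`-based one does not. The class name `ComparatorContractDemo`, the helper `isAntisymmetric`, and the sample words and scores are all hypothetical, made up for illustration; only the two comparator bodies come from this thread.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class ComparatorContractDemo {
    // Original comparator from this issue: it never returns 0.
    static final Comparator<Map.Entry<String, Float>> ORIGINAL =
        new Comparator<Map.Entry<String, Float>>() {
            @Override
            public int compare(Map.Entry<String, Float> o1, Map.Entry<String, Float> o2) {
                return (o1.getValue() - o2.getValue() > 0 ? -1 : 1);
            }
        };

    // Float.compare-based comparator, descending by score.
    static final Comparator<Map.Entry<String, Float>> FIXED =
        new Comparator<Map.Entry<String, Float>>() {
            @Override
            public int compare(Map.Entry<String, Float> o1, Map.Entry<String, Float> o2) {
                return Float.compare(o2.getValue(), o1.getValue());
            }
        };

    // Hypothetical check of one contract rule: sgn(compare(a, b)) == -sgn(compare(b, a)).
    static <T> boolean isAntisymmetric(Comparator<T> c, T a, T b) {
        return Integer.signum(c.compare(a, b)) == -Integer.signum(c.compare(b, a));
    }

    public static void main(String[] args) {
        // Two entries with the same score are enough to expose the problem.
        Map.Entry<String, Float> a = new AbstractMap.SimpleEntry<String, Float>("sort", 0.5f);
        Map.Entry<String, Float> b = new AbstractMap.SimpleEntry<String, Float>("merge", 0.5f);
        Map.Entry<String, Float> c = new AbstractMap.SimpleEntry<String, Float>("tim", 0.9f);

        System.out.println(isAntisymmetric(ORIGINAL, a, b)); // false: both directions return 1
        System.out.println(isAntisymmetric(FIXED, a, b));    // true: both directions return 0

        List<Map.Entry<String, Float>> entries = new ArrayList<Map.Entry<String, Float>>();
        entries.add(a);
        entries.add(c);
        entries.add(b);
        Collections.sort(entries, FIXED);
        System.out.println(entries); // [tim=0.9, sort=0.5, merge=0.5]
    }
}
```

With the original comparator, two entries with equal scores each claim to come after the other. TimSort does not always notice such an inconsistency, which may be why the exception only shows up on some inputs.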

hankcs added a commit that referenced this issue May 8, 2015
hankcs (Owner) commented May 8, 2015

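Thanks for pointing this out; it is fixed now.
In fact, I ran into this problem back when I first wrote TextRankKeyword, and the workaround I used at the time was: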

    public TextRankKeyword()
    {
        // jdk bug : Exception in thread "main" java.lang.IllegalArgumentException: Comparison method violates its general contract!
        System.setProperty("java.util.Arrays.useLegacyMergeSort", "true");
    }

However, it apparently did not take effect on your JDK 7.
Anyway, we had the same idea. I have now changed the sort to:
621178e
That should resolve the problem.
