-
-
Notifications
You must be signed in to change notification settings - Fork 795
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
CharsToNameCanonicalizer
: Internal error on SymbolTable.rehash()
with high number of hash collisions
#547
Comments
Thank you very much for reporting this: sounds like nasty (if rare; which just makes it harder to catch without fuzzing) problem. Any possibility of reducing test to a reproduction I could use to verify? I don't doubt there is problem but it'd be great to guard against regressions. |
@cowtowncoder: I've been trying to reproduce it, and I think I have a way that kinda works. I'm running the existing collision test int count = 0; // let's do sanity check
int size = _symbols.length;
for (int i = 0; i < size; ++i) {
String symbol = _symbols[i];
if (symbol != null) {
++count;
}
}
size >>= 1;
for (int i = 0; i < size; ++i) {
Bucket b = _buckets[i];
while (b != null) {
++count;
}
}
assert(count == _size); The assertion fails about 25-50% of the time when I run the test locally. The failures happen when the index passed to I now believe the issue is in I believe the following fixes the issue: @@ -502,7 +502,7 @@ public final class CharsToNameCanonicalizer
if (collLen > MAX_COLL_CHAIN_LENGTH) {
// 23-May-2014, tatu: Instead of throwing an exception right away,
// let's handle in bit smarter way.
- _handleSpillOverflow(bix, newB);
+ _handleSpillOverflow(bix, newB, index);
} else {
_buckets[bix] = newB;
_longestCollisionList = Math.max(collLen, _longestCollisionList);
@@ -511,7 +511,7 @@ public final class CharsToNameCanonicalizer
return newSymbol;
}
- private void _handleSpillOverflow(int bindex, Bucket newBucket)
+ private void _handleSpillOverflow(int bindex, Bucket newBucket, int index)
{
if (_overflows == null) {
_overflows = new BitSet();
@@ -529,12 +529,37 @@ public final class CharsToNameCanonicalizer
}
}
// regardless, if we get this far, clear up the bucket, adjust size appropriately.
- _symbols[bindex + bindex] = newBucket.symbol;
+ _symbols[index] = newBucket.symbol; |
Yes, I think you are right wrt Just need to reproduce the issue, ideally... I think what I can do is to add a new verification/sanity check method only meant for unit tests -- not cleanest thing but seems like acceptable way. I think sanity check is missing iteration btw (so would get into infinite loop if there are more chained entries), but I can patch that. |
Ok, so I was first unable to reproduce this but... actually, no, I am still unable to reproduce. size >>= 1;
for (int i = 0; i < size; ++i) {
Bucket b = _buckets[i];
while (b != null) {
++count;
}
} needs to be size >>= 1;
for (int i = 0; i < size; ++i) {
while (Bucket b = _buckets[i]; b != null; b = b.next) {
++count;
}
} to avoid missing some entries |
Working on this bit more and realizing that this is an incredible rare codepath as it requires multiple bucket overflow for this specific index. As such its reproduction is kind of tricky. |
CharsToNameCanonicalizer
: Internal error on SymbolTable.rehash()
with high number of hash collisions
Fuzzing jackson-core led to the following exception during CharsToNameCanonicalizer's rehashing:
Rehasing leads to additional entries in the hash table. This is probably due from
_size
not maching the actual number of entries in the hash table. I unfortunately cannot easily reproduce the issue since it is the results of parsing millions of inputs in the same process. Starting another fuzzing run may or may not trigger it.Looking at the code, I think this mismatch could be stemming from
_handleSpillOverflow
. The function reduces the_size
bynewBucket.length
, even though it keeps one item out of that bucket in the table. I believe that line should be_size -= (newBucket.length - 1);
. The code is unfamiliar to me, so it's very possibly I got that wrong.jackson-core was compiled from source (commit c8fe42d in 2.10). jackson-bind and jackson-annotations were downloaded from maven (2.10.0.pr1).
The text was updated successfully, but these errors were encountered: