Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Hyperloglog Example Not working #69

Closed
dbinnersley opened this issue Apr 2, 2024 · 3 comments
Closed

Hyperloglog Example Not working #69

dbinnersley opened this issue Apr 2, 2024 · 3 comments

Comments

@dbinnersley
Copy link

The example hyperloglog that is provided in the documentation does not work as expected. When counting the sketch when initialized without any updates, I get 71.36002532672464, when I update the sketch with alice, the count returns the same number. Is there something that I'm missing?

@timd73
Copy link

timd73 commented Jun 10, 2024

The HyperLogLog implementation seems totally wrong.
I tried the following test:

const { HyperLogLog } = require("bloom-filters");

for (let factor = 1; factor < 10; factor++) {
  const hll = new HyperLogLog(128);
  const nbDistinct = 100 * factor;
  for (let i = 0; i < 10e4; i++) {
    hll.update(`${i % nbDistinct}`);
  }
  console.log(nbDistinct, hll.count(), hll.accuracy());
}

The results:

100 104.48808480699336 0.09192388155425117
200 111.58038795620163 0.09192388155425117
300 116.24992796444349 0.09192388155425117
400 118.20574826550012 0.09192388155425117
500 119.23934383856889 0.09192388155425117
600 119.40069613063731 0.09192388155425117
700 119.79164234860218 0.09192388155425117
800 119.90655039881668 0.09192388155425117
900 119.93531186449161 0.09192388155425117

The estimates are completely wrong (except for 100 and that seems like it might be just chance).

Something is simply not right here.

@folkvir
Copy link
Collaborator

folkvir commented Jun 11, 2024

@folkvir folkvir closed this as completed Jun 11, 2024
@jakes-space
Copy link

jakes-space commented Feb 1, 2025

Re-running the test that timd73 posted using v3.0.4, I get:

100 43.6698569475542 0.09192388155425117
200 99.13508096177891 0.09192388155425117
300 390.2011954189235 0.09192388155425117
400 449.0102358719439 0.09192388155425117
500 518.5484212217428 0.09192388155425117
600 583.7264113198547 0.09192388155425117
700 686.1205010813935 0.09192388155425117
800 794.0347756179166 0.09192388155425117
900 903.0211124925557 0.09192388155425117

Works well for large cardinality; misses the mark by many estimated sigma below 500.

# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

No branches or pull requests

4 participants