HllSketch performance improvement for strings

@AlexanderSaydakov AlexanderSaydakov released this 30 Jun 22:45
· 159 commits to master since this release
  • HLL DataToSketchUDAF: Input strings are now converted to char[] before being passed to HllSketch. This is substantially faster than passing strings because it avoids the UTF-8 conversion. Warning: strings are effectively hashed with a different hash function. Sketches built from the same strings by this version and by earlier versions will therefore appear to have no items in common, so unions of such sketches produce incorrect results. We recommend upgrading to this version and, if any sketches built from string inputs have been stored, recomputing them from the raw data.
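
To see why the two versions are incompatible, the following minimal sketch compares the bytes a hash function would receive for the same text in each case. It uses only the Java standard library and does not call HllSketch. The class name, the helper names, and the little-endian byte order are illustrative assumptions, not the library's internals.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: shows that one string yields different byte
// sequences depending on whether it is UTF-8 encoded or taken as raw chars.
public class StringVsCharArrayBytes {

    // Bytes seen by a hash if the string is first encoded as UTF-8.
    static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Bytes seen by a hash if the 16-bit chars are used directly.
    // The byte order chosen here is an assumption for illustration only.
    static byte[] charBytes(String s) {
        char[] chars = s.toCharArray();
        ByteBuffer buf = ByteBuffer.allocate(chars.length * 2)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (char c : chars) {
            buf.putChar(c);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        String item = "abc";
        byte[] asUtf8 = utf8Bytes(item);
        byte[] asChars = charBytes(item);
        System.out.println("UTF-8 bytes: " + Arrays.toString(asUtf8));
        System.out.println("char bytes:  " + Arrays.toString(asChars));
        System.out.println("identical:   " + Arrays.equals(asUtf8, asChars));
    }
}
```

Because the byte sequences differ, any hash computed over them differs as well, so the same string fed to an old and a new sketch is counted as two distinct items.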