[SYSTEMDS-418] Performance lineage tracing w/ probing (better hashing)
This patch fixes an interesting performance bug caused by the recursive hash computation of lineage items. Due to repeated operation sequences (from loop iterations) and integer overflows during the hash computation, there were systematically repeating hash sequences within one lineage DAG. This in turn led to less pruning power in recursive equals computations and to collisions in the lineage cache, which caused even more recursive equals comparisons.

The fix is simple: we now handle such overflows during hash aggregation (e.g., hash(int,int)) by switching on demand from an int to a long hash function. On the following test scenario

  for(i in 1:1000) X = ((X + X) * 2 - X) / 3

the runtime dropped from 162s to 0.244s. Even with 10K iterations, the runtime is only 1.1s, which suggests that the previous super-linear behavior has been eliminated.
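A minimal Java sketch of the idea, assuming a simple 31-based hash combiner. The names LineageHashSketch, combineInt, combineRobust, Node, equalsRec and buildLoop are hypothetical and not SystemDS APIs, and folding the high 32 bits of the long result into the low 32 bits is only one assumed way to "use a long hash on overflow". The sketch contrasts wrap-around int aggregation with an overflow-aware variant, caches each node's hash at construction time, and shows how a hash mismatch prunes the recursive equals check.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not the SystemDS implementation: all names are illustrative.
public class LineageHashSketch {

    // Plain 31-based combine in int arithmetic; silently wraps around on overflow.
    static int combineInt(int h1, int h2) {
        return 31 * (31 + h1) + h2;
    }

    // Overflow-aware combine (assumed variant): compute in long and, only if the
    // result does not fit into an int, fold the high 32 bits into the low 32 bits.
    static int combineRobust(int h1, int h2) {
        long tmp = 31L * (31L + h1) + h2;
        if (tmp >= Integer.MIN_VALUE && tmp <= Integer.MAX_VALUE)
            return (int) tmp;
        return (int) (tmp ^ (tmp >>> 32));
    }

    // Minimal lineage-like DAG node; its hash is computed once from the inputs' cached hashes.
    static final class Node {
        private static int nextId = 0;
        final int id = nextId++;
        final String opcode;
        final Node[] inputs;
        private final int hash;

        Node(String opcode, Node... inputs) {
            this.opcode = opcode;
            this.inputs = inputs;
            int h = opcode.hashCode();
            for (Node in : inputs)
                h = combineRobust(h, in.hash);
            this.hash = h;
        }

        @Override
        public int hashCode() { return hash; }

        @Override
        public boolean equals(Object o) {
            return o instanceof Node && equalsRec(this, (Node) o, new HashSet<>());
        }
    }

    // Recursive structural equality: differing hashes prune a subtree immediately,
    // and node pairs already confirmed equal are skipped to keep DAG comparisons linear.
    static boolean equalsRec(Node a, Node b, Set<Long> matched) {
        if (a == b) return true;
        if (a.hash != b.hash) return false;
        long key = ((long) a.id << 32) | (b.id & 0xffffffffL);
        if (matched.contains(key)) return true;
        if (!a.opcode.equals(b.opcode) || a.inputs.length != b.inputs.length)
            return false;
        for (int i = 0; i < a.inputs.length; i++)
            if (!equalsRec(a.inputs[i], b.inputs[i], matched))
                return false;
        matched.add(key);
        return true;
    }

    // Lineage of the test loop: for(i in 1:n) X = ((X + X) * 2 - X) / 3
    static Node buildLoop(int n) {
        Node x = new Node("X");
        Node two = new Node("2");
        Node three = new Node("3");
        for (int i = 0; i < n; i++)
            x = new Node("/", new Node("-", new Node("*", new Node("+", x, x), two), x), three);
        return x;
    }

    public static void main(String[] args) {
        Node a = buildLoop(1000);
        Node b = buildLoop(1000);
        System.out.println(a.hashCode() == b.hashCode()); // same structure, same hash
        System.out.println(a.equals(b));                  // structural equality
    }
}
```

Running main on two independently built 1000-iteration DAGs prints true twice. Without overflow both combiners agree (e.g., combineInt(1, 2) and combineRobust(1, 2) both yield 994); they only differ once the intermediate value exceeds the int range. The matched-pair set that keeps the comparison linear on shared inputs such as X is an assumption of this sketch, not a statement about how SystemDS memoizes lineage comparisons.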