[BUG] test_hash_groupby_approx_percentile_long_repeated_keys failed intermittently #4060
Comments
@andygrove can you help take a look? thanks~ |
@pxLi How can I confirm that the cudf jar in this run included rapidsai/cudf#9537 which should be the fix for this issue? |
I just tested with Databricks 7.3 and I cannot reproduce the issue.
|
The commit info of the cudf jar can be simply fetched. We also see failures today, but it passed in rapids_databricks_nightly-dev-github today, so the test result looks non-deterministic and we may not always reproduce the error here. I checked the cudf jar in these failed builds; it is based on 3280be2, which should include #9537. |
The issue appears to be that the last bucket in the t-digest data is corrupted (intermittently). Here are the values for the last 10 buckets in the t-digest when the test is passing: mean
weight
Here are the last 10 values when the test is failing. The last entry for both mean
weight
|
If I'm not mistaken, the last few entries in the
Aren't these sorted in increasing value of mean? |
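For anyone comparing a passing and a failing run, here is a minimal sketch of a check for the kind of corruption described above. It assumes the buckets are expected to be ordered by non-decreasing mean and to carry positive weights, as the question above suggests. The function name `find_first_bad_bucket` and the sample values are hypothetical; this is not part of cudf or spark-rapids, and the numbers are not taken from the failing build.

```python
import math
from typing import Optional, Sequence


def find_first_bad_bucket(
    means: Sequence[float], weights: Sequence[float]
) -> Optional[int]:
    """Return the index of the first bucket that breaks the assumed
    invariants (non-decreasing mean, positive non-NaN weight), or None
    if every bucket looks sane. Hypothetical helper for debugging only."""
    if len(means) != len(weights):
        raise ValueError("means and weights must have the same length")
    for i, (m, w) in enumerate(zip(means, weights)):
        # A NaN mean/weight or a non-positive weight is treated as corruption.
        if math.isnan(m) or math.isnan(w) or w <= 0:
            return i
        # Means are assumed to be sorted in non-decreasing order.
        if i > 0 and m < means[i - 1]:
            return i
    return None


# Made-up sample data, for illustration only.
good_means, good_weights = [1.0, 2.0, 3.0], [1.0, 1.0, 1.0]
bad_means, bad_weights = [1.0, 2.0, 0.0], [1.0, 1.0, 0.0]

print(find_first_bad_bucket(good_means, good_weights))  # None
print(find_first_bad_bucket(bad_means, bad_weights))    # 2
```

Running this over the dumped `mean` and `weight` columns of a failing run would point at the first bucket that violates the assumed ordering, which makes it easier to confirm whether only the last entry is affected.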
|
Relates to #4060. Skip some of the tests that intermittently fail in 21.12 to make sure they don't affect CI and release.
Signed-off-by: Thomas Graves <tgraves@nvidia.com>
`detail::segmented_gather()` inadvertently uses `cuda_default_stream` in some parts of its implementation, while using the user-specified stream in others. This applies to the calls to `copy_range_in_place()`, `allocate_like()`, and `make_lists_column()`.
~This might produce race conditions, which might explain NVIDIA/spark-rapids/issues/4060. It's a rare failure that's quite hard to reproduce.~ This might lead to over-synchronization, though bad output is unlikely.
The commit here should sort this out by switching to the `detail` APIs corresponding to the calls above.
Authors:
- MithunR (https://github.com/mythrocks)
Approvers:
- Mike Wilson (https://github.com/hyperbolic2346)
- Nghia Truong (https://github.com/ttnghia)
- Karthikeyan (https://github.com/karthikeyann)
URL: #9679
Signed-off-by: Andy Grove <andygrove@nvidia.com>
Describe the bug
related to #3770
rapids_databricks_nightly-dev-github build ID 212