add tests for scan_by_key in-place execution #1696

gevtushenko · 2022-05-19T09:56:20Z

Our scan by key implementation doesn't allow keys/result aliasing. This is caused by non-synchronized reading of d_keys_in[tile_base - 1] in the following code. Therefore, aliasing of keys and results introduces a data race. The following code can be used as a reproducer:

#include <thrust/scan.h>
#include <thrust/device_vector.h>

int main() {
  const std::size_t n = 64 * 1024 * 1024;
  thrust::device_vector<int> keys(n, 1);
  thrust::device_vector<int> values(n, 1);
  thrust::device_vector<int> output_storage(n);

  // thrust::device_vector<int> &output = values; // alias is OK
  thrust::device_vector<int> &output = keys; // alias is NOT OK

  thrust::exclusive_scan_by_key(keys.begin(), keys.end(), values.begin(), output.begin());

  if (output[n - 1] != n - 1) {
    std::cerr << "wrong result: " << output[n - 1] << " != " << n << std::endl;
  }
}

We also don't have tests for this use case. I can fix the code to guarantee this behaviour (just like in adjacent difference), but it'll slow down non-in-place execution in some cases. My suggestion is to have a quick fix of documentation.

alliepiper · 2022-05-19T15:46:39Z

Summarizing some offline discussions and testing:

This has been broken for years, and is not a regression introduced by the recent changes to the scan_by_key implementation.
The common case of overlapping values_in/values_out is fine, the only issue is overlapping keys_in/values_out, which is a somewhat odd usecase.
Based on similar situations, fixing this is expected to introduce ~15% slowdown.

@senior-zero will be investigating this a bit more before we make a decision here.

gevtushenko · 2022-05-20T08:36:17Z

I've managed to fuse some kernels to reduce slowdown up to 6% on small to moderate problem sizes. The performance regression disappears at about 2^20 items for std::uint{16,32,64}_t and at about 2^22 for std::uint8_t.

alliepiper · 2022-05-20T16:04:32Z

6% for small inputs and no impact on larger inputs seems worthwhile to restore the documented guarantees. IMO we should update the implementation.

gevtushenko · 2022-05-24T19:38:23Z

@allisonvacanti I've bumped CUB version to one that incorporates updated scan by key. I'd like to re purpose this PR to update tests in thrust. Please, review the changes.

gevtushenko · 2022-05-24T19:38:40Z

run tests

alliepiper

Let's also add tests to the thrust/testing/scan_by_key.*.cu files to make sure the non-CUDA backends test this, too.

gevtushenko · 2022-06-27T08:04:45Z

run tests

miscco · 2022-07-27T09:39:41Z

testing/cuda/scan_by_key.cu

+  // in-place scans: keys/values aliasing
+  thrust::inclusive_scan_by_key(h_keys.begin(), h_keys.end(), h_vals.begin(), h_output.begin());
+  inclusive_scan_by_key_kernel<<<1,1>>>(exec, d_keys.begin(), d_keys.end(), d_vals.begin(), d_keys.begin());
+  {
+    cudaError_t const err = cudaDeviceSynchronize();
+    ASSERT_EQUAL(cudaSuccess, err);
+  }
+  ASSERT_EQUAL(d_keys, h_output);


Nitpick, no change requested: I would prefer if we put the whole test in its own block rather than only the check, as that helps readability a lot

Suggested change

// in-place scans: keys/values aliasing

thrust::inclusive_scan_by_key(h_keys.begin(), h_keys.end(), h_vals.begin(), h_output.begin());

inclusive_scan_by_key_kernel<<<1,1>>>(exec, d_keys.begin(), d_keys.end(), d_vals.begin(), d_keys.begin());

{

cudaError_t const err = cudaDeviceSynchronize();

ASSERT_EQUAL(cudaSuccess, err);

}

ASSERT_EQUAL(d_keys, h_output);

{ // in-place scans: keys/values aliasing

thrust::inclusive_scan_by_key(h_keys.begin(), h_keys.end(), h_vals.begin(), h_output.begin());

inclusive_scan_by_key_kernel<<<1,1>>>(exec, d_keys.begin(), d_keys.end(), d_vals.begin(), d_keys.begin());

cudaError_t const err = cudaDeviceSynchronize();

ASSERT_EQUAL(cudaSuccess, err);

ASSERT_EQUAL(d_keys, h_output);

}

gevtushenko requested review from alliepiper and ericniebler May 19, 2022 09:56

alliepiper added this to the 2.0.0 milestone May 19, 2022

gevtushenko force-pushed the fix-main/github/scan_by_key_docs branch from b5126d6 to aa55ec2 Compare May 24, 2022 19:35

gevtushenko removed only: docs Documentation changes only. Doesn't need code CI. area: docs Related to documentation. labels May 24, 2022

alliepiper suggested changes Jun 24, 2022

View reviewed changes

alliepiper added P1: should have Necessary, but not critical. and removed P0: must have Absolutely necessary. Critical issue, major blocker, etc. release: breaking change Include in "Breaking Changes" section of release notes. labels Jun 24, 2022

Fix scan by key docs for in-place execution

380870d

gevtushenko force-pushed the fix-main/github/scan_by_key_docs branch from aa55ec2 to 380870d Compare June 27, 2022 08:04

gevtushenko requested a review from alliepiper June 27, 2022 08:04

alliepiper self-assigned this Jul 25, 2022

alliepiper approved these changes Jul 25, 2022

View reviewed changes

alliepiper changed the title ~~Fix scan by key docs for in-place execution~~ add tests for scan_by_key in-place execution Jul 25, 2022

alliepiper removed their assignment Jul 25, 2022

gevtushenko merged commit a7aeeb6 into NVIDIA:main Jul 26, 2022

miscco reviewed Jul 27, 2022

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add tests for scan_by_key in-place execution #1696

add tests for scan_by_key in-place execution #1696

gevtushenko commented May 19, 2022

alliepiper commented May 19, 2022

gevtushenko commented May 20, 2022

alliepiper commented May 20, 2022

gevtushenko commented May 24, 2022

gevtushenko commented May 24, 2022

alliepiper left a comment

gevtushenko commented Jun 27, 2022

miscco Jul 27, 2022

add tests for scan_by_key in-place execution #1696

add tests for scan_by_key in-place execution #1696

Conversation

gevtushenko commented May 19, 2022

alliepiper commented May 19, 2022

gevtushenko commented May 20, 2022

alliepiper commented May 20, 2022

gevtushenko commented May 24, 2022

gevtushenko commented May 24, 2022

alliepiper left a comment

Choose a reason for hiding this comment

gevtushenko commented Jun 27, 2022

miscco Jul 27, 2022

Choose a reason for hiding this comment