Skip to content
This repository was archived by the owner on Mar 21, 2024. It is now read-only.

add tests for scan_by_key in-place execution #1696

Merged

Conversation

gevtushenko
Copy link
Collaborator

Our scan by key implementation doesn't allow keys/result aliasing. This is caused by non-synchronized reading of d_keys_in[tile_base - 1] in the following code. Therefore, aliasing of keys and results introduces a data race. The following code can be used as a reproducer:

#include <thrust/scan.h>
#include <thrust/device_vector.h>

int main() {
  const std::size_t n = 64 * 1024 * 1024;
  thrust::device_vector<int> keys(n, 1);
  thrust::device_vector<int> values(n, 1);
  thrust::device_vector<int> output_storage(n);

  // thrust::device_vector<int> &output = values; // alias is OK
  thrust::device_vector<int> &output = keys; // alias is NOT OK

  thrust::exclusive_scan_by_key(keys.begin(), keys.end(), values.begin(), output.begin());

  if (output[n - 1] != n - 1) {
    std::cerr << "wrong result: " << output[n - 1] << " != " << n << std::endl;
  }
}

We also don't have tests for this use case. I can fix the code to guarantee this behaviour (just like in adjacent difference), but it'll slow down non-in-place execution in some cases. My suggestion is to have a quick fix of documentation.

@alliepiper alliepiper added type: bug: functional Does not work as intended. only: docs Documentation changes only. Doesn't need code CI. P0: must have Absolutely necessary. Critical issue, major blocker, etc. release: breaking change Include in "Breaking Changes" section of release notes. area: docs Related to documentation. labels May 19, 2022
@alliepiper alliepiper added this to the 2.0.0 milestone May 19, 2022
@alliepiper
Copy link
Collaborator

Summarizing some offline discussions and testing:

  • This has been broken for years, and is not a regression introduced by the recent changes to the scan_by_key implementation.
  • The common case of overlapping values_in/values_out is fine, the only issue is overlapping keys_in/values_out, which is a somewhat odd usecase.
  • Based on similar situations, fixing this is expected to introduce ~15% slowdown.

@senior-zero will be investigating this a bit more before we make a decision here.

@gevtushenko
Copy link
Collaborator Author

I've managed to fuse some kernels to reduce slowdown up to 6% on small to moderate problem sizes. The performance regression disappears at about 2^20 items for std::uint{16,32,64}_t and at about 2^22 for std::uint8_t.

@alliepiper
Copy link
Collaborator

6% for small inputs and no impact on larger inputs seems worthwhile to restore the documented guarantees. IMO we should update the implementation.

@gevtushenko gevtushenko force-pushed the fix-main/github/scan_by_key_docs branch from b5126d6 to aa55ec2 Compare May 24, 2022 19:35
@gevtushenko
Copy link
Collaborator Author

@allisonvacanti I've bumped CUB version to one that incorporates updated scan by key. I'd like to re purpose this PR to update tests in thrust. Please, review the changes.

@gevtushenko
Copy link
Collaborator Author

run tests

@gevtushenko gevtushenko removed only: docs Documentation changes only. Doesn't need code CI. area: docs Related to documentation. labels May 24, 2022
Copy link
Collaborator

@alliepiper alliepiper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also add tests to the thrust/testing/scan_by_key.*.cu files to make sure the non-CUDA backends test this, too.

@alliepiper alliepiper added P1: should have Necessary, but not critical. and removed P0: must have Absolutely necessary. Critical issue, major blocker, etc. release: breaking change Include in "Breaking Changes" section of release notes. labels Jun 24, 2022
@gevtushenko gevtushenko force-pushed the fix-main/github/scan_by_key_docs branch from aa55ec2 to 380870d Compare June 27, 2022 08:04
@gevtushenko gevtushenko requested a review from alliepiper June 27, 2022 08:04
@gevtushenko
Copy link
Collaborator Author

run tests

@alliepiper alliepiper self-assigned this Jul 25, 2022
@alliepiper alliepiper changed the title Fix scan by key docs for in-place execution add tests for scan_by_key in-place execution Jul 25, 2022
@alliepiper alliepiper removed their assignment Jul 25, 2022
@gevtushenko gevtushenko merged commit a7aeeb6 into NVIDIA:main Jul 26, 2022
Comment on lines +102 to +109
// in-place scans: keys/values aliasing
thrust::inclusive_scan_by_key(h_keys.begin(), h_keys.end(), h_vals.begin(), h_output.begin());
inclusive_scan_by_key_kernel<<<1,1>>>(exec, d_keys.begin(), d_keys.end(), d_vals.begin(), d_keys.begin());
{
cudaError_t const err = cudaDeviceSynchronize();
ASSERT_EQUAL(cudaSuccess, err);
}
ASSERT_EQUAL(d_keys, h_output);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick, no change requested: I would prefer if we put the whole test in its own block rather than only the check, as that helps readability a lot

Suggested change
// in-place scans: keys/values aliasing
thrust::inclusive_scan_by_key(h_keys.begin(), h_keys.end(), h_vals.begin(), h_output.begin());
inclusive_scan_by_key_kernel<<<1,1>>>(exec, d_keys.begin(), d_keys.end(), d_vals.begin(), d_keys.begin());
{
cudaError_t const err = cudaDeviceSynchronize();
ASSERT_EQUAL(cudaSuccess, err);
}
ASSERT_EQUAL(d_keys, h_output);
{ // in-place scans: keys/values aliasing
thrust::inclusive_scan_by_key(h_keys.begin(), h_keys.end(), h_vals.begin(), h_output.begin());
inclusive_scan_by_key_kernel<<<1,1>>>(exec, d_keys.begin(), d_keys.end(), d_vals.begin(), d_keys.begin());
cudaError_t const err = cudaDeviceSynchronize();
ASSERT_EQUAL(cudaSuccess, err);
ASSERT_EQUAL(d_keys, h_output);
}

# for free to subscribe to this conversation on GitHub. Already have an account? #.
Labels
P1: should have Necessary, but not critical. type: bug: functional Does not work as intended.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants