Add nvtext substring deduplication API #18104

Draft: wants to merge 20 commits into base branch-25.04

davidwendt
Contributor

Description

Adds a new nvtext substring deduplication API:

std::unique_ptr<cudf::column> nvtext::substring_deduplicate(
  cudf::strings_column_view const& input,
  cudf::size_type min_width,
  rmm::cuda_stream_view stream,
  rmm::device_async_resource_ref mr);

This finds and returns any duplicate substrings of at least min_width bytes within the input column.
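
To make the intended semantics concrete, here is a minimal host-side sketch of what "duplicate substrings of at least min_width bytes" could mean. It is not the implementation in this PR and does not use libcudf. The helper name find_duplicate_substrings is hypothetical, and the choices to treat each row separately (no matches spanning row boundaries), to report only the longest shared prefix of adjacent sorted suffixes, and to return a sorted, de-duplicated list are all assumptions, since the description does not specify the algorithm or the exact layout of the returned column.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical CPU reference, NOT part of cudf or nvtext.
// Assumes min_width > 0 and that rows are plain byte strings.
std::vector<std::string> find_duplicate_substrings(std::vector<std::string> const& rows,
                                                   std::size_t min_width)
{
  // Collect every suffix of every row that is at least min_width bytes long.
  std::vector<std::string_view> suffixes;
  for (auto const& row : rows) {
    std::string_view const sv{row};
    for (std::size_t pos = 0; pos + min_width <= sv.size(); ++pos) {
      suffixes.push_back(sv.substr(pos));
    }
  }

  // Sorting places suffixes that share a prefix next to each other.
  std::sort(suffixes.begin(), suffixes.end());

  // A shared prefix of two adjacent suffixes is a substring that occurs twice.
  std::set<std::string> found;
  for (std::size_t i = 1; i < suffixes.size(); ++i) {
    auto const a = suffixes[i - 1];
    auto const b = suffixes[i];
    auto const ends   = std::mismatch(a.begin(), a.end(), b.begin(), b.end());
    auto const common = static_cast<std::size_t>(ends.first - a.begin());
    if (common >= min_width) { found.emplace(a.substr(0, common)); }
  }
  return std::vector<std::string>(found.begin(), found.end());
}
```

For the single row "banana" with min_width = 2, this sketch reports "ana" and "na". The shorter repeat "an" is not listed separately because it is a prefix of "ana". Whether the proposed API reports such shorter repeats is not stated in the description.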

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@davidwendt added the feature request, 2 - In Progress, libcudf, strings, and non-breaking labels on Feb 26, 2025
@davidwendt self-assigned this on Feb 26, 2025

copy-pr-bot bot commented Feb 26, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions bot added the Python, CMake, and pylibcudf labels on Feb 26, 2025
@davidwendt
Contributor Author

/ok to test


Labels

  • 2 - In Progress: Currently a work in progress
  • CMake: CMake build issue
  • feature request: New feature or request
  • libcudf: Affects libcudf (C++/CUDA) code.
  • non-breaking: Non-breaking change
  • pylibcudf: Issues specific to the pylibcudf package
  • Python: Affects Python cuDF API.
  • strings: strings issues (C++ and Python)