Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Reimplement check for non-regexp strings using RegexParser #4681

Merged
merged 6 commits into from
Feb 4, 2022

Conversation

andygrove
Copy link
Contributor

This PR reimplements the check used in GpuStringSplit to determine if a string is a regular expression or not. The previous approach had limitations and now that we have a regular expression parser we can improve the check so that there are fewer false positives.

This change is needed to support the work for #3542

@andygrove andygrove added the task Work required that improves the product but is not user facing label Feb 2, 2022
@andygrove andygrove added this to the Jan 31 - Feb 11 milestone Feb 2, 2022
@andygrove andygrove self-assigned this Feb 2, 2022
…ParserSuite.scala

Co-authored-by: Nghia Truong <ttnghia@users.noreply.github.com>
Signed-off-by: Andy Grove <andygrove@nvidia.com>
}

test("detect non-regexp strings") {
val strings = Seq("\\.", "A", ",", "\t", ":", "")
Copy link
Collaborator

@revans2 revans2 Feb 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks wrong. For example if I did a split with "\\." as the separator on "a.b.c" I would expect an output of ["a", "b", "c"]. But I cannot take "\\." and pass it into a non-regexp string split, because it will try to look for a "\\" character in front of the '.'. Am I wrong/missing something? I think we could still support it, but we would have to transpile the regular expression into a constant string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree, this isn't what we want here. Moving this to draft for now and will rethink this approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated to treat any string containing escaped characters as a regular expression.

@andygrove andygrove marked this pull request as draft February 2, 2022 20:06
@andygrove andygrove changed the title Reimplement check for non-regexp strings using RegexParser WIP: Reimplement check for non-regexp strings using RegexParser Feb 2, 2022
@andygrove andygrove changed the title WIP: Reimplement check for non-regexp strings using RegexParser Reimplement check for non-regexp strings using RegexParser Feb 2, 2022
@andygrove andygrove marked this pull request as ready for review February 2, 2022 22:28
Copy link
Collaborator

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a follow on issue to do the transpile that I suggested. From a performance and memory standpoint NOT doing a regular expression is going to be much faster than trying to do it.

@andygrove
Copy link
Contributor Author

Can we have a follow on issue to do the transpile that I suggested. From a performance and memory standpoint NOT doing a regular expression is going to be much faster than trying to do it.

I've filed #4685 for this

@andygrove
Copy link
Contributor Author

build

@andygrove andygrove merged commit ba0ec9f into NVIDIA:branch-22.04 Feb 4, 2022
@andygrove andygrove deleted the is-it-regexp branch February 4, 2022 19:27
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
task Work required that improves the product but is not user facing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants