Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Add support for POSIX predefined character classes #5692

Merged

Conversation

anthony-chang
Copy link
Contributor

Closes #4413

This PR adds support for \p{...} for these character classes defined here under "POSIX character classes (US-ASCII only)"

\p{Lower} | A lower-case alphabetic character: [a-z]
\p{Upper} | An upper-case alphabetic character:[A-Z]
\p{ASCII} | All ASCII:[\x00-\x7F]
\p{Alpha} | An alphabetic character:[\p{Lower}\p{Upper}]
\p{Digit} | A decimal digit: [0-9]
\p{Alnum} | An alphanumeric character:[\p{Alpha}\p{Digit}]
\p{Punct} | Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{\|}~
\p{Graph} | A visible character: [\p{Alnum}\p{Punct}]
\p{Print} | A printable character: [\p{Graph}\x20]
\p{Blank} | A space or a tab: [ \t]
\p{Cntrl} | A control character: [\x00-\x1F\x7F]
\p{XDigit} | A hexadecimal digit: [0-9a-fA-F]
\p{Space} | A whitespace character: [ \t\n\x0B\f\r]

Signed-off-by: Anthony Chang antchang@nvidia.com

Signed-off-by: Anthony Chang <antchang@nvidia.com>
@anthony-chang
Copy link
Contributor Author

build

@anthony-chang anthony-chang self-assigned this May 31, 2022
@anthony-chang anthony-chang requested a review from andygrove May 31, 2022 00:07
@sameerz
Copy link
Collaborator

sameerz commented May 31, 2022

@anthony-chang @andygrove We are in burndown for 22.06. Is this meant to target 22.06, or should it be retargeted towards 22.08?

@sameerz sameerz added the feature request New feature or request label May 31, 2022
@sameerz sameerz added this to the May 23 - Jun 3 milestone May 31, 2022
@andygrove
Copy link
Contributor

@anthony-chang @andygrove We are in burndown for 22.06. Is this meant to target 22.06, or should it be retargeted towards 22.08?

Good point. @anthony-chang can you retarget for 22.08

Copy link
Contributor

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please retarget for 22.08

@anthony-chang anthony-chang changed the base branch from branch-22.06 to branch-22.08 May 31, 2022 17:59
Signed-off-by: Anthony Chang <antchang@nvidia.com>
@anthony-chang
Copy link
Contributor Author

This is blocked until the fix in #5610 is merged

@andygrove
Copy link
Contributor

This is blocked until the fix in #5610 is merged

@anthony-chang this has now been merged

@andygrove andygrove dismissed their stale review June 3, 2022 16:17

PR has been retargeted to 22.08 now

…six-character-classes

Signed-off-by: Anthony Chang <antchang@nvidia.com>
…six-character-classes

Signed-off-by: Anthony Chang <antchang@nvidia.com>
@anthony-chang anthony-chang force-pushed the posix-character-classes branch from 6be0c4a to 7a2a5a1 Compare June 6, 2022 14:53
@anthony-chang
Copy link
Contributor Author

build

…six-character-classes

Signed-off-by: Anthony Chang <antchang@nvidia.com>
doTranspileTest("\\p{Alnum}", "[a-zA-Z0-9]")
doTranspileTest("\\p{Punct}", "[!\"#$%&'()*+,\\-./:;<=>?@\\^_`{|}~\\[\\]]")
doTranspileTest("\\p{Print}", "[a-zA-Z0-9!\"#$%&'()*+,\\-./:;<=>?@\\^_`{|}~\\[\\]\u0020]")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intentional that only a subset are tested here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this one is mostly a sanity check since the \p{Print} transpilation recursively uses the definition of the 4 \p{} classes above it.
I tested the full set in the integration tests.

Signed-off-by: Anthony Chang <antchang@nvidia.com>
andygrove
andygrove previously approved these changes Jun 6, 2022
@anthony-chang
Copy link
Contributor Author

build

Signed-off-by: Anthony Chang <antchang@nvidia.com>
@anthony-chang
Copy link
Contributor Author

build

@anthony-chang
Copy link
Contributor Author

I've removed _ from the Scala fuzz tests until a fix for rapidsai/cudf#11062 is made.

NVnavkumar
NVnavkumar previously approved these changes Jun 7, 2022
Signed-off-by: Anthony Chang <antchang@nvidia.com>
@anthony-chang
Copy link
Contributor Author

build

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
feature request New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Add support for POSIX characters in regular expressions
4 participants