Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Regexp: Only transpile once per expression rather than once per batch #4153

Merged
merged 7 commits into from
Nov 30, 2021

Conversation

andygrove
Copy link
Contributor

The main motivation for this PR was to implement a performance optimization so that regular expressions get transpiled just once per expression (twice if you count the tagging) rather than being transpiled for each invocation (per batch).

While refactoring, I found some inconsistencies and unreachable code paths related to null patterns and empty patterns which are also fixed in this PR along with additional tests.

Signed-off-by: Andy Grove <andygrove@nvidia.com>
…used code paths

Signed-off-by: Andy Grove <andygrove@nvidia.com>
Signed-off-by: Andy Grove <andygrove@nvidia.com>
@andygrove andygrove added the performance A performance related task/issue label Nov 18, 2021
@andygrove andygrove added this to the Nov 15 - Nov 26 milestone Nov 18, 2021
@andygrove andygrove self-assigned this Nov 18, 2021
searchExpr.getValue match {
case null =>
// Return original string if search string is null
strExpr.getBase.asStrings()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code path was unreachable

Signed-off-by: Andy Grove <andygrove@nvidia.com>
…n-null literal string

Signed-off-by: Andy Grove <andygrove@nvidia.com>
Signed-off-by: Andy Grove <andygrove@nvidia.com>
Signed-off-by: Andy Grove <andygrove@nvidia.com>
@jlowe
Copy link
Contributor

jlowe commented Nov 29, 2021

build

@andygrove
Copy link
Contributor Author

I'm not sure why the bulid failed. The last entry in the log was:

2021-11-29T18:10:22.4493599Z [2021-11-29T18:09:46.702Z] [gw3]^[[36m [ 81%] ^[[0m^[[33mSKIPPED^[[0m ../../src/main/python/repart_test.py::test_union_struct_missing_children[(Struct(not_null)(['child0', Struct(['child0', Struct(['child0', Struct(['child0', Struct(['child0', Decimal(7,2)])])])])]), Struct(not_null)(['child1', Integer]))]
2021-11-29T18:10:22.4494720Z script returned exit code 120
2021-11-29T18:10:22.4535049Z Cleaning up orphan processes

@andygrove
Copy link
Contributor Author

build

@jlowe
Copy link
Contributor

jlowe commented Nov 30, 2021

I'm not sure why the bulid failed.

This was likely caused by the recent CI storage space exhaustion. Should be fixed now.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
performance A performance related task/issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants