-
Notifications
You must be signed in to change notification settings - Fork 933
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
JSON single quote normalization API #14729
Conversation
As discussed with @shrshi offline, we would ideally like to see a new |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, just one question
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great 👍
Just a few minor things that I must have missed in your quote normalization test PR before. Otherwise this looks pretty good to go from my side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great; just a few more things that can be cleaned up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CMake approval
@@ -1,5 +1,5 @@ | |||
/* | |||
* Copyright (c) 2023-2024, NVIDIA CORPORATION. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did the copyrights drop the year 2023?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a new file in this PR implementing the normalization FST, so I think only year 2024 is included in the copyright.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the java side this looks good to me.
/merge |
The goal of this PR is to address [10004](rapidsai#10004) by supporting parsing of JSON files containing single quotes for field/value strings. This is a follow-up work to the POC [PR 14545](rapidsai#14545) Authors: - Shruti Shivakumar (https://github.com/shrshi) Approvers: - Andy Grove (https://github.com/andygrove) - Vyas Ramasubramani (https://github.com/vyasr) - Vukasin Milovanovic (https://github.com/vuule) - Elias Stehle (https://github.com/elstehle) - Robert (Bobby) Evans (https://github.com/revans2) URL: rapidsai#14729
This PR provides a proof-of-concept for the usage of FST in removing unquoted spaces and tabs in JSON strings. This is a useful feature in the cases where we want to cast a hierarchical JSON object to a string, and overcomes the challenge of processing mixed types using Spark. [#14865](#14865) The FST assumes that the single quotes in the input data have already been normalized (possibly using [`normalize_single_quotes`](#14729)). Authors: - Shruti Shivakumar (https://github.com/shrshi) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - Elias Stehle (https://github.com/elstehle) - Bradley Dice (https://github.com/bdice) - Karthikeyan (https://github.com/karthikeyann) URL: #14931
Description
The goal of this PR is to address 10004 by supporting parsing of JSON files containing single quotes for field/value strings. This is a follow-up work to the POC PR 14545
Checklist