-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
getJsonObject number normalization #1897
Merged
thirtiseven
merged 2 commits into
NVIDIA:get-json-object-feature
from
thirtiseven:get-json-object-number-normalization-inplace
Mar 26, 2024
Merged
getJsonObject number normalization #1897
thirtiseven
merged 2 commits into
NVIDIA:get-json-object-feature
from
thirtiseven:get-json-object-number-normalization-inplace
Mar 26, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
res-life
reviewed
Mar 26, 2024
src/main/cpp/src/get_json_object.hpp
Outdated
: output(nullptr), output_len(0), hide_outer_array_tokens(_hide_outer_array_tokens) | ||
{ | ||
} | ||
CUDF_HOST_DEVICE CUDF_HOST_DEVICE json_generator<>& operator=(const json_generator<>& other) | ||
__device__ __device__ json_generator<>& operator=(const json_generator<>& other) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two device
Add several JNI cases will be good. |
Yes, -0000000 is not a valid JSON number. |
This was referenced Mar 26, 2024
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
thirtiseven
added a commit
that referenced
this pull request
Mar 27, 2024
* get-json-object: Add JSON parser and parser utility (#1836) * Add Json Parser; Add Json Parser utility; Define internal interfaces; Copy get-json-obj CUDA code from cuDF; Signed-off-by: Chong Gao <res_life@163.com> * Code format --------- Signed-off-by: Chong Gao <res_life@163.com> Co-authored-by: Chong Gao <res_life@163.com> * get-json-object: match current field name (#1857) Signed-off-by: Chong Gao <res_life@163.com> Co-authored-by: Chong Gao <res_life@163.com> * get-json-object: add utility write_escaped_text for JSON generator (#1863) Signed-off-by: Chong Gao <res_life@163.com> Co-authored-by: Chong Gao <res_life@163.com> * Add JNI for GetJsonObject (#1862) * Add JNI for GetJsonObject Signed-off-by: Haoyang Li <haoyangl@nvidia.com> * clean up Signed-off-by: Haoyang Li <haoyangl@nvidia.com> * Parse json path in plugin Signed-off-by: Haoyang Li <haoyangl@nvidia.com> * Apply suggestions from code review Co-authored-by: Nghia Truong <7416935+ttnghia@users.noreply.github.com> * Use table_view Signed-off-by: Haoyang Li <haoyangl@nvidia.com> * Update java Signed-off-by: Haoyang Li <haoyangl@nvidia.com> * Apply suggestions from code review Co-authored-by: Nghia Truong <7416935+ttnghia@users.noreply.github.com> * clean up Signed-off-by: Haoyang Li <haoyangl@nvidia.com> * use matched enum for type Signed-off-by: Haoyang Li <haoyangl@nvidia.com> * clean up Signed-off-by: Haoyang Li <haoyangl@nvidia.com> * upmerge Signed-off-by: Haoyang Li <haoyangl@nvidia.com> * format Signed-off-by: Haoyang Li <haoyangl@nvidia.com> --------- Signed-off-by: Haoyang Li <haoyangl@nvidia.com> Co-authored-by: Nghia Truong <7416935+ttnghia@users.noreply.github.com> * get-json-object: main flow (#1868) Signed-off-by: Chong Gao <res_life@163.com> Co-authored-by: Chong Gao <res_life@163.com> * Optimize memory usage in match_current_field_name (#1889) * Optimize match_current_field_name using less memory Signed-off-by: Chong Gao <res_life@163.com> * Convert a function to device code * Add a JNI test case * Add JNI test case * Change nesting depth to 4 * Change nesting depth to 8 to fix test Signed-off-by: Haoyang Li <haoyangl@nvidia.com> * remove clang format change Signed-off-by: Haoyang Li <haoyangl@nvidia.com> --------- Signed-off-by: Chong Gao <res_life@163.com> Signed-off-by: Haoyang Li <haoyangl@nvidia.com> Co-authored-by: Chong Gao <res_life@163.com> * get-json-object: Recursive to iterative (#1890) * Change recursive to iterative Signed-off-by: Chong Gao <res_life@163.com> --------- Signed-off-by: Chong Gao <res_life@163.com> Co-authored-by: Chong Gao <res_life@163.com> * Fix bug * Format * Use uppercase for path_instruction_type Signed-off-by: Haoyang Li <haoyangl@nvidia.com> * Add test cases from Baidu * Fix escape char error; add test case * getJsonObject number normalization (#1897) * Support number normalization Signed-off-by: Haoyang Li <haoyangl@nvidia.com> * delete cpp test and add a java test case Signed-off-by: Haoyang Li <haoyangl@nvidia.com> --------- Signed-off-by: Haoyang Li <haoyangl@nvidia.com> * Add test case * Fix a escape/unescape size bug Signed-off-by: Haoyang Li <haoyangl@nvidia.com> * Fix bug: handle leading zeros for number; Refactor * Apply suggestions from code review Co-authored-by: Nghia Truong <7416935+ttnghia@users.noreply.github.com> * Address comments Signed-off-by: Haoyang Li <haoyangl@nvidia.com> * fix java test Signed-off-by: Haoyang Li <haoyangl@nvidia.com> * Add test cases; Fix a bug * follow up escape/unescape bug fix Signed-off-by: Haoyang Li <haoyangl@nvidia.com> * Minor refactor * Add a case; Fix bug --------- Signed-off-by: Chong Gao <res_life@163.com> Signed-off-by: Haoyang Li <haoyangl@nvidia.com> Co-authored-by: Chong Gao <res_life@163.com> Co-authored-by: Haoyang Li <haoyangl@nvidia.com> Co-authored-by: Nghia Truong <7416935+ttnghia@users.noreply.github.com>
# for free
to join this conversation on GitHub.
Already have an account?
# to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Closes #1831
This PR supports number normalization in getJsonObject.
In getJsonObject in Spark, a float number is converted to a double when parsing and then converted to a string when returning. This PR uses
stod
in cudf andftos_converter
in jni to simulate this behavior. There may be some compatibility issues inftos_converter
because it is based on ryu, but it is acceptable because the difference is very minor. We use this solution to support double to string in spark-rapids as well.For int number, the only special case I know is "-0" is normalized to "0". Note that "-0000000" is invalid as data.
We need to merge it into the feature branch after removing the cpu tests, because this pr calls device-only functions.