Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

getJsonObject number normalization #1897

Conversation

thirtiseven
Copy link
Collaborator

Closes #1831

This PR supports number normalization in getJsonObject.

In getJsonObject in Spark, a float number is converted to a double when parsing and then converted to a string when returning. This PR uses stod in cudf and ftos_converter in jni to simulate this behavior. There may be some compatibility issues in ftos_converter because it is based on ryu, but it is acceptable because the difference is very minor. We use this solution to support double to string in spark-rapids as well.

For int number, the only special case I know is "-0" is normalized to "0". Note that "-0000000" is invalid as data.

We need to merge it into the feature branch after removing the cpu tests, because this pr calls device-only functions.

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
: output(nullptr), output_len(0), hide_outer_array_tokens(_hide_outer_array_tokens)
{
}
CUDF_HOST_DEVICE CUDF_HOST_DEVICE json_generator<>& operator=(const json_generator<>& other)
__device__ __device__ json_generator<>& operator=(const json_generator<>& other)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two device

@res-life
Copy link
Collaborator

Add several JNI cases will be good.

@res-life res-life requested a review from ttnghia March 26, 2024 06:40
@res-life
Copy link
Collaborator

Yes, -0000000 is not a valid JSON number.

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@thirtiseven thirtiseven merged commit 2ae268a into NVIDIA:get-json-object-feature Mar 26, 2024
2 checks passed
thirtiseven added a commit that referenced this pull request Mar 27, 2024
* get-json-object:  Add JSON parser and parser utility (#1836)

* Add Json Parser;
Add Json Parser utility;
Define internal interfaces;
Copy get-json-obj CUDA code from cuDF;

Signed-off-by: Chong Gao <res_life@163.com>

* Code format

---------

Signed-off-by: Chong Gao <res_life@163.com>
Co-authored-by: Chong Gao <res_life@163.com>

* get-json-object: match current field name (#1857)

Signed-off-by: Chong Gao <res_life@163.com>
Co-authored-by: Chong Gao <res_life@163.com>

* get-json-object: add utility write_escaped_text for JSON generator (#1863)

Signed-off-by: Chong Gao <res_life@163.com>
Co-authored-by: Chong Gao <res_life@163.com>

* Add JNI for GetJsonObject (#1862)

* Add JNI for GetJsonObject

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

* clean up

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

* Parse json path in plugin

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

* Apply suggestions from code review

Co-authored-by: Nghia Truong <7416935+ttnghia@users.noreply.github.com>

* Use table_view

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

* Update java

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

* Apply suggestions from code review

Co-authored-by: Nghia Truong <7416935+ttnghia@users.noreply.github.com>

* clean up

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

* use matched enum for type

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

* clean up

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

* upmerge

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

* format

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

---------

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Co-authored-by: Nghia Truong <7416935+ttnghia@users.noreply.github.com>

* get-json-object: main flow (#1868)

Signed-off-by: Chong Gao <res_life@163.com>
Co-authored-by: Chong Gao <res_life@163.com>

* Optimize memory usage in match_current_field_name (#1889)

* Optimize match_current_field_name using less memory

Signed-off-by: Chong Gao <res_life@163.com>

* Convert a function to device code

* Add a JNI test case

* Add JNI test case

* Change nesting depth to 4

* Change nesting depth to 8 to fix test

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

* remove clang format change

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

---------

Signed-off-by: Chong Gao <res_life@163.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Co-authored-by: Chong Gao <res_life@163.com>

* get-json-object: Recursive to iterative (#1890)

* Change recursive to iterative

Signed-off-by: Chong Gao <res_life@163.com>

---------

Signed-off-by: Chong Gao <res_life@163.com>
Co-authored-by: Chong Gao <res_life@163.com>

* Fix bug

* Format

* Use uppercase for path_instruction_type

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

* Add test cases from Baidu

* Fix escape char error; add test case

* getJsonObject number normalization (#1897)

* Support number normalization

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

* delete cpp test and add a java test case

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

---------

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

* Add test case

* Fix a escape/unescape size bug

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

* Fix bug: handle leading zeros for number; Refactor

* Apply suggestions from code review

Co-authored-by: Nghia Truong <7416935+ttnghia@users.noreply.github.com>

* Address comments

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

* fix java test

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

* Add test cases; Fix a bug

* follow up escape/unescape bug fix

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

* Minor refactor

* Add a case; Fix bug

---------

Signed-off-by: Chong Gao <res_life@163.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Co-authored-by: Chong Gao <res_life@163.com>
Co-authored-by: Haoyang Li <haoyangl@nvidia.com>
Co-authored-by: Nghia Truong <7416935+ttnghia@users.noreply.github.com>
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants