
Improve JSON and CSV parsing of integer values #4790

Merged: @andygrove merged 11 commits into NVIDIA:branch-22.04 on Feb 17, 2022


@andygrove (Contributor) commented Feb 15, 2022

Closes #126, #1986, and #4762.

Changes in this PR:

  • Updates JSON and CSV reading of integer values: cuDF is now asked to read these values as strings, which are then cast to the requested integer type with Spark-compatible semantics.
  • Removes the redundant CSV configs for enabling reading of boolean, integer, and floating-point values.
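The read-as-strings-then-cast approach can be modeled on a single value. The following is a simplified, hypothetical Scala sketch (`IntegerCastSketch`, `IntTarget`, and `castToInteger` are names invented here, not plugin or cuDF APIs): it trims the input, accepts an optional sign followed by digits, and returns `None` (standing in for SQL null) for malformed or out-of-range values, roughly mirroring Spark's non-ANSI behavior of producing null instead of failing. Spark's actual cast rules cover more cases than this sketch models.

```scala
object IntegerCastSketch {
  // Hypothetical descriptor for a target integer type and its inclusive range.
  final case class IntTarget(name: String, min: Long, max: Long)

  val ByteT: IntTarget  = IntTarget("byte", Byte.MinValue, Byte.MaxValue)
  val ShortT: IntTarget = IntTarget("short", Short.MinValue, Short.MaxValue)
  val IntT: IntTarget   = IntTarget("int", Int.MinValue, Int.MaxValue)
  val LongT: IntTarget  = IntTarget("long", Long.MinValue, Long.MaxValue)

  // Cast a string value (as read by the parser) to the target type.
  // None models a null result: null input, malformed text, or overflow.
  def castToInteger(raw: String, target: IntTarget): Option[Long] =
    Option(raw)
      .map(_.trim)                               // ignore surrounding whitespace
      .filter(_.matches("[+-]?[0-9]+"))          // optional sign, then digits only
      .map(s => BigInt(s))                       // BigInt cannot overflow while parsing
      .filter(v => v >= BigInt(target.min) && v <= BigInt(target.max))
      .map(_.toLong)                             // safe: value is within range
}
```

Parsing through `BigInt` keeps overflow detection simple: the value is range-checked against the target type before it is narrowed to `Long`.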

I filed a follow-on issue #4793 for handling JSON with strings containing integers.

Status

  • Implementation and updated JSON & CSV tests
  • Remove redundant CSV configs
  • Fix regression in Mortgage tests
  • File follow-on issues

Signed-off-by: Andy Grove <andygrove@nvidia.com>
@andygrove changed the title from "WIP: Improve JSON and CSV parsing of integer values" to "Improve JSON and CSV parsing of integer values" Feb 15, 2022
@andygrove marked this pull request as ready for review February 15, 2022 22:45
@revans2 previously approved these changes Feb 16, 2022
@andygrove (Contributor Author)

build

@andygrove (Contributor Author)

There were test failures in test_json_input_meta. I am investigating.

@andygrove (Contributor Author)

The changes in this PR exposed a bug: the code assumed that GpuTextBasedPartitionReader#readToTable returns a table with the read schema projection already applied, but the JSON implementation did not apply it. This is now fixed.
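The assumption described above can be enforced with an explicit projection step after parsing. This Scala sketch uses a hypothetical `SimpleTable` type (not cuDF's `Table` or the plugin's reader classes) and selects columns by name in read-schema order. Filling columns that are absent from the parsed data with nulls is an additional assumption of the sketch, and name matching here is case-sensitive.

```scala
object ReadSchemaProjectionSketch {
  // Hypothetical stand-in for a columnar table: ordered (name, values) pairs.
  final case class SimpleTable(columns: Vector[(String, Vector[Option[String]])]) {
    def names: Vector[String] = columns.map(_._1)
    def numRows: Int = columns.headOption.map(_._2.length).getOrElse(0)
  }

  // Return a table whose columns match readSchema exactly, in that order.
  // Extra parsed columns are dropped; missing ones become all-null columns.
  def projectToReadSchema(table: SimpleTable, readSchema: Seq[String]): SimpleTable = {
    val byName = table.columns.toMap
    val rows = table.numRows
    SimpleTable(readSchema.toVector.map { name =>
      name -> byName.getOrElse(name, Vector.fill[Option[String]](rows)(None))
    })
  }
}
```

With this step in place, downstream code such as the string-to-integer casting can address columns by read-schema position without depending on the order in which the parser emitted them.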

@andygrove (Contributor Author)

build

@andygrove merged commit 1db5070 into NVIDIA:branch-22.04 Feb 17, 2022
@andygrove deleted the json-integer branch February 17, 2022 22:24
@sameerz added the bug (Something isn't working) label Feb 21, 2022