Improve compatibility when reading timestamps from JSON and CSV sources #4938
Signed-off-by: Andy Grove <andygrove@nvidia.com>
I think the code is okay, but it is really complicated and makes a lot of assumptions that only a very specific set of formats is allowed. I keep thinking that there might be a simpler way to make it more data driven, with lookup tables instead of transpiling everything. But then I see that we convert the format both to regular expressions and to the cuDF format, and I just don't know whether the lookup table would actually be smaller or less error prone.
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuTextBasedPartitionReader.scala
// fix timestamps that have milliseconds but no microseconds
// example ".296" => ".296000"
val placeholder = "@@@"
Can you add a comment about why @@@ is an okay sequence to use here and will never interfere with a real timestamp?
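For illustration only, here is a minimal plain-Scala sketch of how a trailing marker could be used to pad millisecond fractions out to microseconds. It is not the plugin's cuDF code: `MillisPadding`, `padMillis`, and the regular expression are hypothetical names and choices, and the claim that `@@@` cannot occur in the input rests on the assumption that the timestamp text contains only digits, `-`, `:`, `.`, spaces, and similar separators.

```scala
import java.util.regex.Pattern

object MillisPadding {
  // Hypothetical marker. Assumed safe because "@" is not a character
  // that any of the supported timestamp formats can produce.
  val Placeholder = "@@@"

  // Matches a dot, exactly three digits, and then the end-of-string marker.
  private val millisOnly =
    ("(\\.[0-9]{3})" + Pattern.quote(Placeholder)).r

  def padMillis(ts: String): String = {
    // 1. Mark the end of the string.
    val marked = ts + Placeholder
    // 2. If exactly three fractional digits precede the marker,
    //    swap the marker for "000" (".296@@@" becomes ".296000").
    val padded = millisOnly.replaceAllIn(marked, m => m.group(1) + "000")
    // 3. Drop the marker from values that did not need padding.
    padded.stripSuffix(Placeholder)
  }
}
```

On the JVM an end-of-input anchor (`$`) would do the same job without a marker; the sketch only mirrors the placeholder idea so that the safety assumption is visible in one place.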
@revans2 could you re-approve this one, please? I had to upmerge since your last approval.
Closes #4863 and closes #123
Improves timestamp support in JSON and CSV to match Spark, by reading from cuDF as strings and then converting to timestamps in the plugin.
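As a rough sketch of the "read as strings, then convert" idea, the following plain-Scala example parses a string value into microseconds since the epoch and yields `None` for anything that does not match. It runs on the JVM with `java.time` rather than on the GPU, and the object name, the single supported format, and the choice of UTC are all assumptions made for illustration, not the plugin's behavior.

```scala
import java.time.{LocalDateTime, ZoneOffset}
import java.time.format.{DateTimeFormatter, DateTimeFormatterBuilder}
import java.time.temporal.ChronoField
import scala.util.Try

object StringToTimestamp {
  // Assumed format: "uuuu-MM-dd HH:mm:ss" with an optional fraction
  // of up to six digits (microsecond precision).
  private val formatter: DateTimeFormatter = new DateTimeFormatterBuilder()
    .appendPattern("uuuu-MM-dd HH:mm:ss")
    .appendFraction(ChronoField.NANO_OF_SECOND, 0, 6, true)
    .toFormatter()

  // Returns microseconds since the epoch (interpreting the value as UTC),
  // or None when the string cannot be parsed.
  def toMicros(raw: String): Option[Long] =
    Try(LocalDateTime.parse(raw, formatter)).toOption.map { ldt =>
      val instant = ldt.toInstant(ZoneOffset.UTC)
      instant.getEpochSecond * 1000000L + instant.getNano / 1000L
    }
}
```

Keeping the raw value as a string until this step means invalid input can be turned into a missing value instead of failing the whole read.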
There is one follow-on issue: