[FEA] Explore ways to not use HadoopFileLinesReader for CSV parsing #6
Labels: feature request (New feature or request), P1 (Nice to have for release), performance (A performance related task/issue), SQL (part of the SQL/Dataframe plugin)
Comments
revans2 added the labels feature request, ? - Needs Triage (Need team to review and classify), SQL, and performance on May 28, 2020
sameerz changed the title from "[FEA] explore ways not use HadoopFileLinesReader for CSV parseing" to "[FEA] Explore ways to not use HadoopFileLinesReader for CSV parsing" on Oct 13, 2020
I filed rapidsai/cudf#6572 in cudf to try and support this.
wjxiz1992 pushed a commit to wjxiz1992/spark-rapids that referenced this issue on Oct 29, 2020: Update scala app version to 0.2.2
gerashegalov pushed a commit to gerashegalov/spark-rapids that referenced this issue on Nov 18, 2022: …tampNTZEnabled Fix errors caused by 340+ not working on DB
wjxiz1992 referenced this issue in nvliyuan/yuali-spark-rapids on Apr 26, 2024:
* A hacky approach for regexpr rewrite
* Use contains instead for that case
* add config to switch
* Rewrite some rlike expression to StartsWith/EndsWith/Contains
* clean up
* wip
* wip
* add tests and config
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
sperlingxx pushed a commit to sperlingxx/spark-rapids that referenced this issue on May 16, 2024: Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Is your feature request related to a problem? Please describe.
When parsing CSV, the CPU currently reads through the data using the HadoopFileLinesReader and replaces the line endings. From a performance standpoint, it would be much better to do a block copy of most of the data and skip the line-ending translation entirely. This would require the cudf CSV reader to support line endings of '\r', '\n', or '\r\n'. This is not a simple task, but it could reduce CPU utilization significantly.
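To make the requirement concrete, here is a minimal CPU-side sketch in Python of what "support '\r', '\n', or '\r\n'" means for a reader that receives the raw block untouched. It is not cudf code and not how HadoopFileLinesReader or the cudf reader are implemented; the function name `line_spans` is hypothetical. It scans the bytes as-is and treats each of the three endings as a single terminator, returning offsets instead of rewriting the buffer. It assumes the whole block is in memory and ignores CSV quoting (a quoted field may legally contain a line break). A chunked reader would also have to handle a '\r' at the end of one block followed by '\n' at the start of the next.

```python
def line_spans(buf: bytes) -> list[tuple[int, int]]:
    """Return (start, end) byte offsets of each line in buf (hypothetical helper).

    '\r', '\n', and '\r\n' are each treated as one line terminator, so the
    buffer can be used as-is instead of being rewritten to normalize endings.
    Quoted fields containing line breaks are NOT handled in this sketch.
    """
    spans = []
    start = 0
    i = 0
    n = len(buf)
    while i < n:
        b = buf[i]
        if b == 0x0A or b == 0x0D:  # '\n' or '\r' ends the current line
            spans.append((start, i))
            # Treat "\r\n" as a single terminator rather than two.
            if b == 0x0D and i + 1 < n and buf[i + 1] == 0x0A:
                i += 1
            start = i + 1
        i += 1
    if start < n:  # final line with no trailing terminator
        spans.append((start, n))
    return spans
```

For example, `[data[s:e] for s, e in line_spans(data)]` with `data = b"a,1\nb,2\r\nc,3\rd,4"` yields `[b"a,1", b"b,2", b"c,3", b"d,4"]` without copying or modifying `data` up front. Returning offsets rather than new strings mirrors the idea in the issue: the bulk data is moved once, and the parser, not a CPU pre-pass, deals with the mixed line endings.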