Speed up copying decimal column from parquet buffer to GPU buffer #4872

Merged (3 commits) on Mar 7, 2022

sperlingxx
Collaborator

Signed-off-by: sperlingxx lovedreamf@gmail.com

Closes #4784

Adds a specialized columnar-copy path for decimal columns backed by WritableColumnVector. The new implementation copies the unscaled values directly, which avoids the round trip of decoding each value into a Decimal and encoding it again.
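
The difference between the two copy strategies can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: the class and method names are hypothetical, plain `long[]` arrays stand in for the Parquet-side vector and the host buffer, and it assumes the unscaled values fit in 64-bit longs. `java.math.BigDecimal` stands in for the per-row Decimal object.

```java
import java.math.BigDecimal;
import java.util.Arrays;

// Hypothetical illustration only; none of these names come from the PR.
final class DecimalCopySketch {

    // Round-trip path: build a decimal object for every row, then
    // extract its unscaled value again before writing it out.
    static long[] copyViaDecimal(long[] unscaled, int scale) {
        long[] out = new long[unscaled.length];
        for (int i = 0; i < unscaled.length; i++) {
            BigDecimal d = BigDecimal.valueOf(unscaled[i], scale); // decode
            out[i] = d.unscaledValue().longValueExact();           // re-encode
        }
        return out;
    }

    // Direct path: the source already holds unscaled longs, so they
    // can be copied in bulk with no per-row object allocation.
    static long[] copyUnscaledDirect(long[] unscaled) {
        long[] out = new long[unscaled.length];
        System.arraycopy(unscaled, 0, out, 0, unscaled.length);
        return out;
    }

    public static void main(String[] args) {
        // Three decimals with scale 2, stored as unscaled longs: 123.45, -6.78, 0.00
        long[] column = {12345L, -678L, 0L};
        long[] slow = copyViaDecimal(column, 2);
        long[] fast = copyUnscaledDirect(column);
        System.out.println(Arrays.equals(slow, fast));
    }
}
```

Both paths produce the same unscaled values. The direct path skips the per-row object creation, which is the kind of overhead the description refers to as the encoding/decoding round trip.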

@sperlingxx
Collaborator Author

build

@sperlingxx
Collaborator Author

build

Contributor

@jlowe jlowe left a comment

Are there any microbenchmark numbers showing the performance effects of this change?

jlowe
jlowe previously approved these changes Feb 28, 2022
import ai.rapids.cudf.HostColumnVectorCore;
import org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector;
import org.apache.spark.sql.execution.datasources.orc.OrcAtomicColumnVector;
import org.apache.spark.sql.execution.datasources.orc.OrcColumnVector;
Collaborator

nit: I don't think the ORC column vector imports are used; they should probably be removed.

Collaborator Author

Yes, I fixed it.

@sameerz sameerz added the performance A performance related task/issue label Feb 28, 2022
@sameerz sameerz added this to the Feb 28 - Mar 18 milestone Feb 28, 2022
@firestarman
Collaborator

build

@sameerz
Collaborator

sameerz commented Mar 4, 2022

build

@sperlingxx
Collaborator Author

build

@sperlingxx sperlingxx merged commit f22795d into NVIDIA:branch-22.04 Mar 7, 2022
@sperlingxx sperlingxx deleted the columnar_copy_decimal branch March 7, 2022 10:13
Linked issue: [FEA] Improve copying decimal data from CPU columnar data