[FEA] Implement merged 'mega' kernel to parse leaf-level columns in JSON reader #16965

Open
shrshi opened this issue Oct 1, 2024 · 0 comments
Labels
feature request New feature or request


Is your feature request related to a problem? Please describe.
In the JSON reader, type inference and parsing of leaf-level columns launch separate kernels for each column.

We can improve performance by gathering the offsets for all columns into contiguous memory and then parsing them in a single kernel.
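A minimal host-side sketch of the gathering step, in plain C++ rather than CUDA so it can be run anywhere. It assumes each leaf column already has a list of byte offsets into the JSON buffer. `GatheredOffsets`, `gather_offsets`, and `column_of` are hypothetical names for illustration, not part of the existing reader. The per-column lists are concatenated into one array, and an exclusive scan of the column sizes lets a single kernel map any flat index back to its column.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical layout: every column's offsets stored back to back, plus the
// scan of per-column counts that marks where each column begins.
struct GatheredOffsets {
    std::vector<std::size_t> offsets;        // all columns' offsets, contiguous
    std::vector<std::size_t> column_starts;  // size = num_columns + 1
};

GatheredOffsets gather_offsets(const std::vector<std::vector<std::size_t>>& per_column) {
    GatheredOffsets g;
    // Exclusive scan of column sizes: column c owns [starts[c], starts[c + 1]).
    g.column_starts.assign(per_column.size() + 1, 0);
    for (std::size_t c = 0; c < per_column.size(); ++c) {
        g.column_starts[c + 1] = g.column_starts[c] + per_column[c].size();
    }
    g.offsets.reserve(g.column_starts.back());
    for (const auto& col : per_column) {
        g.offsets.insert(g.offsets.end(), col.begin(), col.end());
    }
    return g;
}

// What one thread of a merged kernel would do first: find the column that owns
// its flat index. Binary search picks the last start <= flat_idx, which also
// skips empty columns. Precondition: flat_idx < g.offsets.size().
std::size_t column_of(const GatheredOffsets& g, std::size_t flat_idx) {
    auto it = std::upper_bound(g.column_starts.begin(), g.column_starts.end(), flat_idx);
    return static_cast<std::size_t>(it - g.column_starts.begin()) - 1;
}
```

With this layout a single launch over `offsets.size()` work items can replace the per-column launches. Each work item looks up its column, and from that the column's target type, before parsing.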

Describe the solution you'd like
Partitioning strategies to consider:

  • One thread per offset for parsing.
  • One warp per column (32 offsets per warp). Consecutive warps will most likely access nearby memory and will probably benefit from coalescing.
  • A fixed number of characters per thread. This needs more careful thought about how to distribute work depending on column type.
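To make the first two strategies concrete, here is a small host-side C++ model of how work would be assigned. It is a sketch under assumptions: "32 offsets/warp" is read as a warp stepping through its column 32 offsets at a time, and all function names are hypothetical. The third strategy is left out because its work distribution depends on column type.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

constexpr std::size_t kWarpSize = 32;

// Strategy 1: one thread per offset. The launch needs one thread for every
// offset across all columns.
std::size_t threads_for_thread_per_offset(const std::vector<std::size_t>& counts) {
    std::size_t total = 0;
    for (std::size_t n : counts) total += n;
    return total;
}

// Strategy 2: one warp per column. The warp covers the column in strided
// passes of 32 offsets, so a column with n offsets needs ceil(n / 32) passes.
std::size_t passes_for_warp_per_column(std::size_t n) {
    return (n + kWarpSize - 1) / kWarpSize;
}

// Indices within a column of n offsets that a given lane (0..31) handles
// under strategy 2: lane, lane + 32, lane + 64, ...
std::vector<std::size_t> indices_for_lane(std::size_t lane, std::size_t n) {
    std::vector<std::size_t> out;
    for (std::size_t i = lane; i < n; i += kWarpSize) out.push_back(i);
    return out;
}

// Under strategy 2 the longest column sets the pass count of the slowest
// warp, which is one way to see the load imbalance across uneven columns.
std::size_t max_passes(const std::vector<std::size_t>& counts) {
    std::size_t m = 0;
    for (std::size_t n : counts) m = std::max(m, passes_for_warp_per_column(n));
    return m;
}
```

For columns with 100, 3, and 64 offsets, strategy 1 uses 167 threads doing one item each, while under strategy 2 the warps need 4, 1, and 2 passes respectively. The warp-per-column mapping keeps neighbouring lanes on neighbouring offsets, but short columns leave most lanes idle.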