Speed up Parquet utf8 validation #6667


Closed
Dandandan opened this issue Oct 31, 2024 · 1 comment · Fixed by #6668
Labels: enhancement (any new improvement worthy of an entry in the changelog), parquet (changes to the parquet crate), performance

Comments

@Dandandan
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
UTF-8 validation shows up in profiles when reading Parquet.

Describe the solution you'd like

We could use https://docs.rs/simdutf8/latest/simdutf8/ to speed up validation of utf8.
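As a rough illustration of what that swap could look like (a minimal sketch, not the parquet crate's actual code): `simdutf8::basic::from_utf8` takes a `&[u8]` and returns a `Result<&str, _>`, so it can stand in for `std::str::from_utf8`. The `validate_utf8` helper and the `simdutf8` Cargo feature name below are hypothetical, chosen only for this example. Without the feature enabled, the sketch falls back to the standard library.

```rust
/// Validate `bytes` as UTF-8 and borrow them as a `&str`.
///
/// Hypothetical helper: compiled only when an assumed `simdutf8` Cargo
/// feature (with the `simdutf8` crate as a dependency) is enabled.
#[cfg(feature = "simdutf8")]
pub fn validate_utf8(bytes: &[u8]) -> Result<&str, String> {
    // SIMD-accelerated validation; the `basic` flavor reports only
    // success or failure, not the position of the invalid byte.
    simdutf8::basic::from_utf8(bytes).map_err(|e| e.to_string())
}

/// Fallback using the standard library's validator.
#[cfg(not(feature = "simdutf8"))]
pub fn validate_utf8(bytes: &[u8]) -> Result<&str, String> {
    std::str::from_utf8(bytes).map_err(|e| e.to_string())
}
```

Keeping both paths behind one function means callers stay unchanged, and the SIMD path can be benchmarked against the standard one by toggling the feature.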

Describe alternatives you've considered

Additional context

@alamb
Contributor

alamb commented Nov 7, 2024

Here is another idea: #6701
