
Document when the ParquetRecordBatchReader will re-read metadata #5887


Merged
merged 1 commit into apache:master from alamb/clarify_metadata_loading
Jun 15, 2024


alamb
Contributor

@alamb alamb commented Jun 14, 2024

Which issue does this PR close?

N/A

Rationale for this change

While working on apache/datafusion#10701 and discussing it with @marsupialtail, I finally figured out how to use the page index with cached parquet metadata without extra object store requests.

However, I think it is quite tricky and subtle, so it would be good to document this in the parquet crate itself so I don't have to go spelunking in the future.

What changes are included in this PR?

Document when ArrowReaderMetadata::load makes object store requests if the page index is enabled.

Are there any user-facing changes?

Just docs

@github-actions github-actions bot added the parquet Changes to the parquet crate label Jun 14, 2024
/// construct multiple separate readers, for example, to distribute readers
/// across multiple threads
///
/// 2. Using a cached copy of the [`ParquetMetadata`] rather than reading it
Contributor Author


I also added the second, important use case: not re-reading the metadata on subsequent reads.

@marsupialtail

Cheaply cloneable is good :-)

@alamb alamb merged commit c191294 into apache:master Jun 15, 2024
16 checks passed
@alamb
Contributor Author

alamb commented Jun 15, 2024

Thank you for the reviews

@alamb alamb deleted the alamb/clarify_metadata_loading branch June 17, 2024 18:12

3 participants