-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Minor: Update min_statistics
and max_statistics
to be helpers, update docs
#10866
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb
Thanks LGTM.
I'm currently thinking about the following, which might be a refactor to consider once we support Data Pages as well?
In order to have less code duplication, we could probably introduce an enum and match in a single get_statistic
function (or macro like we already have). WDYT?
enum StatisticPrefix {
Min,
Max,
// ... and the rest (e.g NullCount, RowCount)
}
enum StatisticType<I> {
RowGroup{iter: I, prefix: StatisticPrefix},
DataPage{iter: I, prefix: StatisticPrefix},
// ... do we have more to support?
}
// This would be a macro like we already have in place
fn get_statistics<I: Iterator>(data_type: &DataType, statistic_type: StatisticType<I>) -> Result<ArrayRef>{
match data_type {
DataType::Boolean => {
match statistic_type {
StatisticType::RowGroup{iter, prefix} => {
// create boolean array from iter
// using prefix to create e.g. `MinBooleanRowGroupStatsIterator::new(iter)`
},
// do the same for StatisticType::DataPage
_ => unimplemented!()
}
}
// support all the other data_types
_ => unimplemented!()
}
}
Then in the StatisticsConverter
we could...
// impl StatisticConverter
// ...
fn row_group_mins<I>(&self, metadatas: I) -> Result<ArrayRef>
where
I: IntoIterator<Item = &'a RowGroupMetaData>,
{
let data_type = self.arrow_field.data_type();
let Some(parquet_index) = self.parquet_index else {
return Ok(self.make_null_array(data_type, metadatas));
};
let iter = metadatas
.into_iter()
.map(|x| x.column(parquet_index).statistics());
let statistic_type = StatisticType::RowGroup{iter, prefix: StatisticPrefix::Min};
get_statistics(data_type, statistic_type);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm thanks @alamb
Which issue does this PR close?
Part of #10453
Inspired by #10852
Rationale for this change
The new
StatisticsConverter
API is ready, so we can remove the oldmin_statistics
andmax_statistics
from "crate" level APIsWhat changes are included in this PR?
min_statistics
andmax_statistics
APIs module privateAre these changes tested?
Are there any user-facing changes?