Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Change Accumulator::evaluate and Accumulator::state to take &mut self #8934

Closed
alamb opened this issue Jan 21, 2024 · 0 comments · Fixed by #8925
Closed

Change Accumulator::evaluate and Accumulator::state to take &mut self #8934

alamb opened this issue Jan 21, 2024 · 0 comments · Fixed by #8925
Labels
enhancement New feature or request

Comments

@alamb
Copy link
Contributor

alamb commented Jan 21, 2024

Is your feature request related to a problem or challenge?

As part of #8849, I wanted to build up the output state (the distinct string values) during accumulation and then generate output directly (without a copy)

However, I couldn't implement a zero copy algorithm because evaluate and merge take a &self not a &mut self (well I worked around it with a Mutex 🤮 ).

The need to copy intermediate state likely doesn't matter for most Accumulators as they only emit a single scalar value where the cost of copying is pretty low. However, for ones that emit significant internal state (like count DISTINCT) using the same internal state can save a lot of copying (again, see #8849 for an example)

Also, the actual Accumulator instances are never used after a call to evaluate/state, so the state of the accumulator after this call is never used again.

Describe the solution you'd like

I would like to change Accumulator::evaluate and Accumulator::state to take &mut self

This is also consistent with GroupsAccumulator which takes mut for its state and evaluate functions:

pub trait GroupsAccumulator: Send {
...
    fn evaluate(
        &mut self,
        emit_to: EmitTo
    ) -> Result<Arc<dyn Array>, DataFusionError>;
    fn state(
        &mut self,
        emit_to: EmitTo
    ) -> Result<Vec<Arc<dyn Array>>, DataFusionError>;
...

Describe alternatives you've considered

No response

Additional context

No response

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant