Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

make sure aggregation uses fast path if a single column is used #2423

Closed
bkamins opened this issue Sep 11, 2020 · 4 comments
Closed

make sure aggregation uses fast path if a single column is used #2423

bkamins opened this issue Sep 11, 2020 · 4 comments
Labels
non-breaking The proposed change is not breaking performance
Milestone

Comments

@bkamins
Copy link
Member

bkamins commented Sep 11, 2020

I will fix it. Currently :x => sum uses a fast path, but [:x] => sum does not.

@bkamins bkamins added this to the 1.0 milestone Sep 11, 2020
@bkamins bkamins added the non-breaking The proposed change is not breaking label Sep 11, 2020
@bkamins bkamins mentioned this issue Mar 4, 2021
19 tasks
@sa-
Copy link
Contributor

sa- commented Mar 25, 2021

Looking into this. Pasting some of your Slack messages here in case the messages get deleted:

fast path is implemented here: https://github.com/JuliaData/DataFrames.jl/blob/main/src/groupeddataframe/fastaggregates.jl

10:30
In the fast path you do not have to calculate grouping views. You can just pre-allocate the target container (whose eltype you know upfront) and just iterate groups and perform reduction.

@bkamins
Copy link
Member Author

bkamins commented Mar 25, 2021

The key thing to do here is to rewrite code like:

@chain df begin
    groupby(:a)
    combine([:b] => sum)
end

to:

@chain df begin
    groupby(:a)
    combine(:b => sum)
end

before executing it.

We should check in the syntax src => fun and src => fun => dst if src is Union{AbstractVector{<:AbstractString}}, AbstractVector{Symbol}, AbstractVector{<:Union{Signed, Unsigned}} of length 1 and if this is so its first element should be fetched with only before further processing.

Or even better - check that src after preprocessing the column list using a standard mechanism of src preprocessing is a single element vector (it then should be Vector{Int} as this is what Index should ensure).

@bkamins bkamins modified the milestones: 1.0, 1.x Mar 25, 2021
sa- added a commit to sa-/DataFrames.jl that referenced this issue Apr 10, 2021
@sa-
Copy link
Contributor

sa- commented Apr 14, 2021

Can this be closed now?

@bkamins
Copy link
Member Author

bkamins commented Apr 14, 2021

yes

@bkamins bkamins closed this as completed Apr 14, 2021
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
non-breaking The proposed change is not breaking performance
Projects
None yet
Development

No branches or pull requests

2 participants