Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Store first value in Dict directly in innerjoin #29

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

non-Jedi
Copy link

@non-Jedi non-Jedi commented Aug 28, 2020

I saw some of the discussion in
JuliaData/DataFrames.jl#2340 and got
curious about what was possible.

This avoids allocating a Vector for the case where l does not have
multiple indices with the same value. For the smoke-test benchmark in
JuliaData/DataFrames.jl#2340 (comment),
this reduces allocations by half and overall runtime by 10%.

Most of the allocations still come from this
line

which it's much less clear how to reduce allocations in. I'm not sure
how much JuliaLang/julia#24909 affects
performance in this case. One option would be to heuristically
estimate the size of out based on the size of l and r and call
sizehint! on it; this didn't seem to help in my testing.

I realize I'm only optimizing a single method of innerjoin, but I'm
not super familiar with this field nor with the inner workings of this
package, so I leave it to you to decide if this is a worthwhile
complication and if it's relevant elsewhere in the package.

This avoids allocating a Vector for the case where l does not have
multiple indices with the same value. For the smoke-test benchmark in
<JuliaData/DataFrames.jl#2340 (comment)>,
this reduces allocations by half and overall runtime by 10%.
@andyferris
Copy link
Member

Ah thanks.

Another strategy I tried a while back was to first assume the join keys are distinct and then bail to a more general implementation when that's not the case. It would be awesome to compare the performance vesus this.

(Sorry @non-Jedi I haven't been keeping track of PRs...)

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants