DES: How much perf penalty will we accept to get rid of libreduction? #40263
The referenced benchmark is:

import numpy as np
from pandas import DataFrame

N = 10 ** 4
labels = np.random.randint(0, 2000, size=N)
labels2 = np.random.randint(0, 3, size=N)
df = DataFrame(
    {
        "key": labels,
        "key2": labels2,
        "value1": np.random.randn(N),
        "value2": ["foo", "bar", "baz", "qux"] * (N // 4),
    }
)
df.groupby("key").apply(lambda x: 1)

Running this with current master, I get:
When disabling the usage of libreduction fast_apply using this patch:

--- a/pandas/core/groupby/ops.py
+++ b/pandas/core/groupby/ops.py
@@ -390,6 +390,7 @@ class BaseGrouper:
# for now -> relies on BlockManager internals
pass
elif (
+ False and
com.get_callable_name(f) not in base.plotting_methods
and isinstance(splitter, FrameSplitter)
and axis == 0

I get the following timing:
So a 3-4x slowdown. However, the applied function is not doing anything useful (it just returns a constant), so this benchmark is basically only measuring the overhead. Whether a 3-4x slowdown in the overhead is significant in a real use case depends on how much time the overhead takes relative to the applied function. So take a slightly more complex example: calculating the mean of one of the columns (which is still a relatively simple/fast function, I think). With master, this gives:
With libreduction disabled, I get:
So still slower, but no longer a 3-4x slowdown. (It's probably useful to see if those numbers are similar on different machines.)
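For anyone who wants to reproduce the comparison locally, here is a rough timing sketch of the two cases discussed above. The exact function used for the "mean of one of the columns" case is not shown in the thread, so the `x["value1"].mean()` lambda is an assumption, as is the `time_apply` helper (a hypothetical name, not part of pandas or its benchmark suite). Absolute numbers will vary by machine and pandas version, and newer pandas releases may emit a deprecation warning about applying over the grouping column.

```python
import timeit

import numpy as np
import pandas as pd

# Same shape of data as the referenced benchmark (seeded for repeatability).
N = 10 ** 4
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "key": rng.integers(0, 2000, size=N),
        "key2": rng.integers(0, 3, size=N),
        "value1": rng.standard_normal(N),
        "value2": ["foo", "bar", "baz", "qux"] * (N // 4),
    }
)


def time_apply(func, number=1, repeat=3):
    """Return the best per-call wall time (seconds) of groupby("key").apply(func).

    Hypothetical helper for this sketch only.
    """
    timer = timeit.Timer(lambda: df.groupby("key").apply(func))
    return min(timer.repeat(repeat=repeat, number=number)) / number


# Overhead-only case: the applied function does no real work.
t_const = time_apply(lambda x: 1)

# Slightly heavier case (assumed form of the "mean of one column" example).
t_mean = time_apply(lambda x: x["value1"].mean())

print(f"constant: {t_const * 1e3:.1f} ms per call")
print(f"mean:     {t_mean * 1e3:.1f} ms per call")
```

Running this once on master and once with the patch applied gives the two pairs of numbers to compare; the ratio for the constant case approximates the pure overhead slowdown, while the ratio for the mean case is closer to what a real workload would see.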
@jbrockmendel did #42992 close this?
Only half of it
Closed by #43189
xref some discussion #40171 (comment)
libreduction and the associated callers are a disproportionate maintenance headache [citation needed]. It would be nice to be able to rip it out and just have one path for those methods, but that would entail a non-trivial performance hit. Recently, though, we've managed to optimize the pure-Python path a bit, and I'm optimistic we can shave off some more of the difference.
The question: how much of a perf penalty are we willing to accept in order to remove libreduction?
Copying from #40171 (comment)