DES: How much perf penalty will we accept to get rid of libreduction? #40263

jbrockmendel · 2021-03-06T03:35:31Z

libreduction and the associated callers are a disproportionate maintenance headache [citation needed]. It would be nice to be able to rip it out and just have one path for those methods, but that would entail a non-trivial performance hit. Recently, though, we've managed to optimize the pure-Python path a bit, and I'm optimistic we can shave off some more of the difference.

The question: how much of a perf penalty are we willing to accept in order to remove libreduction?

Copying from #40171 (comment)

I'll throw out a number: if we could get worst-case down to within about 3x and most-cases within 1.5x, I'd be open to removing the cython paths. They are a disproportionate producer of headaches. (From #36459 (possibly out of date) "entirely disabling the cython path leads to 4 currently-xfailed tests passing")

Besides which, if/when cython3 becomes available, we may have to get rid of these anyway.

jorisvandenbossche · 2021-04-02T18:14:44Z

The referenced benchmark is:

import numpy as np
from pandas import DataFrame

N = 10 ** 4
labels = np.random.randint(0, 2000, size=N)
labels2 = np.random.randint(0, 3, size=N)
df = DataFrame(
    {
        "key": labels,
        "key2": labels2,
        "value1": np.random.randn(N),
        "value2": ["foo", "bar", "baz", "qux"] * (N // 4),
    }
)
# The applied function returns a constant, so this mostly measures apply overhead.
df.groupby("key").apply(lambda x: 1)

Running this with current master, I get:

In [2]: %timeit df.groupby("key").apply(lambda x: 1)
5.45 ms ± 92.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

When disabling the usage of libreduction fast_apply using this patch:

--- a/pandas/core/groupby/ops.py
+++ b/pandas/core/groupby/ops.py
@@ -390,6 +390,7 @@ class BaseGrouper:
             # for now -> relies on BlockManager internals
             pass
         elif (
+            False and
             com.get_callable_name(f) not in base.plotting_methods
             and isinstance(splitter, FrameSplitter)
             and axis == 0

I get the following timing:

In [4]: %timeit df.groupby("key").apply(lambda x: 1)
16.7 ms ± 1.25 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

So a 3-4x slowdown.

However, the applied function is not doing anything useful (just returning a constant), so basically this benchmark is only measuring the overhead. Whether a 3-4x slowdown in the overhead is significant in a real use case depends on how much time that overhead takes relative to the applied function itself.

So let's use a slightly more complex example: calculating the mean of one of the columns (which is still a relatively simple/fast function, I think). With master, this gives:

In [4]: %timeit df.groupby("key").apply(lambda x: x['value1'].mean())
147 ms ± 5.41 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

with libreduction disabled, I get:

In [6]: %timeit df.groupby("key").apply(lambda x: x['value1'].mean())
182 ms ± 3.73 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

So still slower, but no longer a 3-4x slowdown.
For me, this slowdown is within the range of what's acceptable to get rid of libreduction, I think.

(it's probably useful to see if those numbers are similar on different machines)
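To make the comparison concrete, here is a minimal sketch that computes the slowdown ratios from the four mean timings quoted above and sets them against the "about 3x worst-case, 1.5x most-cases" figures from the opening comment. The dictionary layout, the `slowdown` helper, and the constant names are hypothetical and only serve the illustration; treating the thresholds as hard limits is also an assumption, since the original wording is "about".

```python
# Mean %timeit results copied from the comment above, in milliseconds.
timings_ms = {
    "lambda x: 1": {"cython": 5.45, "pure_python": 16.7},
    "lambda x: x['value1'].mean()": {"cython": 147.0, "pure_python": 182.0},
}

# Figures floated in the opening comment. Applying them as strict
# cut-offs is an assumption made for this sketch.
WORST_CASE_LIMIT = 3.0
TYPICAL_LIMIT = 1.5


def slowdown(cython_ms, pure_python_ms):
    """Return how many times slower the pure-Python path is."""
    return pure_python_ms / cython_ms


ratios = {
    name: slowdown(t["cython"], t["pure_python"])
    for name, t in timings_ms.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x slower without libreduction")
```

With these numbers the constant-returning lambda comes out at roughly 3.06x and the mean example at roughly 1.24x, so the overhead-only case sits right around the "about 3x" worst-case figure while the more realistic case is comfortably under 1.5x. The ratios are specific to the machine that produced the timings.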

mroeschke · 2021-08-19T01:45:24Z

@jbrockmendel did #42992 close this?

jbrockmendel · 2021-08-19T04:06:23Z

Only half of it

jbrockmendel · 2021-10-02T03:07:32Z

Closed by #43189

jbrockmendel added the Bug and Needs Triage labels Mar 6, 2021

jbrockmendel mentioned this issue Mar 9, 2021

REF: de-duplicate Block.__init__ #38134

Merged

5 tasks

jbrockmendel mentioned this issue Mar 24, 2021

PERF: cache_readonly for Block properties #40620

Merged

This was referenced Apr 20, 2021

PERF: put BlockManager constructor in cython #40842

Merged

[ArrayManager] Add libreduction frame Slider for ArrayManager #40171

Closed

jbrockmendel mentioned this issue May 11, 2021

Cython 3.0 Checklist #34213

Open

10 tasks

jbrockmendel mentioned this issue Jul 7, 2021

PERF: Try fast/slow paths only once in DataFrameGroupby.transform #42195

Merged

3 tasks

jbrockmendel mentioned this issue Aug 23, 2021

REF: remove libreduction.SeriesBinGrouper #43189

Merged

4 tasks

jbrockmendel closed this as completed Oct 2, 2021
