Range/inequality joins are slow #8393

simonvandel · 2023-12-01T12:51:03Z

Describe the bug

Joins where the ON filter is not an equality but an inequality (<, >, etc.) seem slow, at least compared to DuckDB, which seems like a direct "competitor".

The main difference between the DuckDB and DataFusion plans seems to be that DataFusion uses a NestedLoopJoinExec, while DuckDB uses an IEJoin.

Note that the query could be written better with an ASOF join, but DataFusion does not support that (see issue #318).
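
To make the difference between the two plan shapes concrete, here is a minimal Python sketch. It is neither DuckDB's IEJoin nor DataFusion's NestedLoopJoinExec, only a toy comparison of a pairwise nested loop against an ASOF-style sorted lookup. The function names and sample data are made up for illustration, and the sorted variant assumes the intervals are sorted and do not overlap (in the actual query, adjacent intervals share an endpoint). Both variants use inner-join semantics.

```python
from bisect import bisect_right

def nested_loop_join(intervals, points):
    """Compare every (interval, point) pair: len(intervals) * len(points) checks."""
    return [
        (t, v)
        for (valid_from, valid_to, v) in intervals
        for t in points
        if valid_from <= t <= valid_to
    ]

def sorted_lookup_join(intervals, points):
    """Binary-search each point against the interval start times.

    Assumes intervals are sorted by valid_from and do not overlap,
    so at most one interval can contain a given point.
    """
    starts = [valid_from for (valid_from, _, _) in intervals]
    out = []
    for t in points:
        i = bisect_right(starts, t) - 1  # last interval starting at or before t
        if i >= 0 and t <= intervals[i][1]:
            out.append((t, intervals[i][2]))
    return out

# Hypothetical sample data: (valid_from, valid_to, value)
intervals = [(0, 9, "a"), (10, 19, "b"), (20, 29, "c")]
points = [3, 10, 19, 25, 42]

nested = sorted(nested_loop_join(intervals, points))
lookup = sorted_lookup_join(intervals, points)
```

Both variants return the same matches on this input. The nested loop performs one comparison per pair, while the lookup performs one binary search per point.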

To Reproduce

Create some test data with this SQL (saved as repro-dataset.sql) in DuckDB:

CREATE
OR REPLACE TABLE # AS
SELECT
    t,
    RANDOM() as v
FROM
    range(
        '2022-01-01' :: TIMESTAMP,
        '2023-01-01' :: TIMESTAMP,
        INTERVAL 30 DAY
    ) ts(t);

COPY # to '#.parquet' (format 'parquet');

CREATE
OR REPLACE TABLE timestamps AS
SELECT
    t
FROM
    range(
        '2022-01-01' :: TIMESTAMP,
        '2023-01-01' :: TIMESTAMP,
        INTERVAL 10 SECOND
    ) ts(t);

COPY timestamps to 'timestamps.parquet' (format 'parquet');

$ duckdb < repro-dataset.sql
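
For a sense of scale, the row counts of the two generated tables can be worked out with a few lines of Python. This is a sketch that assumes DuckDB's range() excludes the stop value (a half-open range); the helper name range_len is made up for illustration.

```python
from datetime import datetime, timedelta

def range_len(start, stop, step):
    """Number of values in the half-open range [start, stop) with the given step."""
    # Ceiling division on timedeltas, written as negated floor division.
    return -((start - stop) // step)

start, stop = datetime(2022, 1, 1), datetime(2023, 1, 1)

small_rows = range_len(start, stop, timedelta(days=30))         # 13
timestamp_rows = range_len(start, stop, timedelta(seconds=10))  # 3153600

# A nested loop join evaluates the filter once per pair of rows.
pairs = small_rows * timestamp_rows                             # 40996800
```

Under that assumption, the small table has 13 rows and the timestamps table about 3.15 million, so a pairwise join evaluates the filter roughly 41 million times.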

We will compare the performance of the following query in DuckDB and DataFusion. The query is saved as repro-range-query.sql.

WITH #_state AS (
    SELECT
        t as valid_from,
        COALESCE(
            LEAD(t, 1) OVER (
                ORDER BY
                    t
            ),
            '9999-12-31'
        ) as valid_to,
        v
    FROM
        '#.parquet'
)
SELECT
    t.t,
    p.v
FROM
    #_state p
    LEFT JOIN 'timestamps.parquet' t ON t.t BETWEEN p.valid_from
    AND p.valid_to;
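
As a rough illustration of what the CTE in the query above computes, the following Python sketch mimics LEAD(t, 1) OVER (ORDER BY t) combined with COALESCE. The function name and sample rows are hypothetical, and ISO date strings stand in for timestamps.

```python
def with_validity(rows, open_end="9999-12-31"):
    """Turn (t, v) rows into (valid_from, valid_to, v) rows ordered by t.

    valid_to is the next row's t (like LEAD(t, 1) OVER (ORDER BY t)),
    or open_end for the last row (like the COALESCE fallback).
    """
    ordered = sorted(rows, key=lambda r: r[0])
    out = []
    for i, (t, v) in enumerate(ordered):
        valid_to = ordered[i + 1][0] if i + 1 < len(ordered) else open_end
        out.append((t, valid_to, v))
    return out

# Hypothetical sample rows: (t, v)
rows = [("2022-01-31", 0.7), ("2022-01-01", 0.2), ("2022-03-02", 0.5)]
state = with_validity(rows)

# BETWEEN is inclusive on both ends, so a timestamp equal to a shared
# boundary matches two adjacent intervals.
boundary_matches = [s for s in state if s[0] <= "2022-01-31" <= s[1]]
```

Each row's valid_to equals the next row's valid_from. Since BETWEEN is inclusive on both ends, a timestamp that falls exactly on a boundary matches two rows.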

DuckDB performance:

$ time duckdb < repro-range-query.sql
...
real    0m0.999s
user    0m6.070s
sys     0m3.600s

DataFusion performance:

$ time datafusion-cli -f repro-range-query.sql
...
real    0m8.269s
user    0m6.358s
sys     0m1.907s

Expected behavior

It would be nice if the above query (or something equivalent) were faster in DataFusion.

If someone knows of a better way to express the query, that could also serve as a workaround for me.

Additional context

Machine tested on:
CPU:Ryzen 3900x
OS: Ubuntu 22.04

Versions used:

$ duckdb --version
v0.9.2 3c695d7ba9

$ datafusion-cli --version
datafusion-cli 33.0.0


simonvandel · 2023-12-01T15:03:20Z

I just noticed that what I really want is a RIGHT join. That is, if there is no matching # for a timestamp, the result should be null.

Changing the query to that, DataFusion is much faster. I believe it's because with a RIGHT join, # becomes the outer table (single partition), while timestamps becomes the inner table (unspecified partitioning), which allows for greater parallelism (see https://github.com/apache/arrow-datafusion/blob/e19c669855baa8b78ff86755803944d2ddf65536/datafusion/physical-plan/src/joins/nested_loop_join.rs#L72-L77C4).

But I think the issue should stay open, since the LEFT join is still slow.

alamb · 2023-12-01T20:42:29Z

I think IEJoin is a form of RangeJoin (https://duckdb.org/2022/05/27/iejoin.html) -- I agree it would be neat to make this fast in DataFusion, but I think it is a pretty major project (it typically requires a specialized operator, as described in the DuckDB blog)

alamb · 2023-12-01T20:50:47Z

I started trying to collect a list of various join improvements in #8398

my-vegetable-has-exploded · 2024-03-07T05:43:25Z

I am interested in this ticket. Since it is a pretty major project, I will write a proposal first.

alamb · 2024-03-08T20:17:37Z

Thank you @my-vegetable-has-exploded -- that is a great idea

cc @korowa / @viirya / @metesynnada who have been involved in Join implementations recently and who may be interested as well

korowa · 2024-03-11T19:27:00Z

Disregarding IEJoin -- the time output from the issue description seems to show that both DuckDB and DF spend roughly the same CPU time (user + system), and the only difference is parallelism (shown by real time), which, as @simonvandel noticed, depends on the left/right input and the join type. This makes me think that the 8x slowdown is not related to how the join is performed internally, but is rather caused by the physical optimizer skipping join reordering for NestedLoopJoin.

So, if I'm not mistaken, this issue is mostly about covering NLJoin in join_selection.rs.

UPD: in addition, to make join reordering useful, NLJoin itself also needs to be modified, since it currently chooses the build side based on the logical join type.
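
The timing argument above can be checked against the numbers from the issue description. The sketch below only does arithmetic on the reported `time` output; the "effective parallelism" ratio (CPU time divided by wall-clock time) is a rough indicator introduced here for illustration, not a metric either engine reports.

```python
# Timings (in seconds) copied from the issue description.
timings = {
    "DuckDB": {"real": 0.999, "user": 6.070, "sys": 3.600},
    "DataFusion": {"real": 8.269, "user": 6.358, "sys": 1.907},
}

def cpu_time(t):
    """Total CPU time: user + system."""
    return t["user"] + t["sys"]

def effective_parallelism(t):
    """CPU seconds consumed per wall-clock second (rough indicator only)."""
    return cpu_time(t) / t["real"]

duck_cpu = cpu_time(timings["DuckDB"])                    # about 9.67 s
df_cpu = cpu_time(timings["DataFusion"])                  # about 8.27 s
duck_par = effective_parallelism(timings["DuckDB"])       # about 9.7
df_par = effective_parallelism(timings["DataFusion"])     # about 1.0
slowdown = timings["DataFusion"]["real"] / timings["DuckDB"]["real"]  # about 8.3
```

The two engines consume a similar amount of CPU time (about 9.7 s versus 8.3 s), but DuckDB spreads it over roughly ten cores' worth of work per wall-clock second, while DataFusion's ratio is close to one.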

my-vegetable-has-exploded · 2024-03-20T02:13:01Z

> So, if I'm not mistaken, this issue is mostly about covering NLJoin in join_selection.rs.

I think it is a good idea to improve performance in this scenario, and your PR looks good to me. But I think it is also OK to keep the old parallelism strategy. In my opinion, the old parallelism strategy should work, but the check in enforce_distribution.rs, which looks at the row count, blocks the repartitioning. In this query, the row count of #_state is less than batch_size, and RepartitionExec also only works on one batch at a time.

https://github.com/apache/arrow-datafusion/blob/ad8d552b9f150c3c066b0764e84f72b667a649ff/datafusion/core/src/physical_optimizer/enforce_distribution.rs#L1099-L1106

I think another way may be to write a new enforce_distribution strategy for NestedLoopJoin and CrossJoin. We could compute repartition_beneficial_stats from the left table size multiplied by the right partition size, rather than from just the right partition size (taking a RIGHT JOIN as an example).
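
The idea above can be paraphrased loosely in Python. This is not the code in enforce_distribution.rs: both function names are hypothetical, the row-count comparison is a simplification of the linked check, and the batch size of 8192 assumes DataFusion's default setting is in effect. The row counts assume the half-open ranges from the reproduction script.

```python
BATCH_SIZE = 8192  # assumed: DataFusion's default batch_size

def row_count_check(input_rows, batch_size=BATCH_SIZE):
    """Simplified existing-style check: repartition only if the input
    itself has more rows than one batch."""
    return input_rows > batch_size

def pair_count_check(left_rows, right_partition_rows, batch_size=BATCH_SIZE):
    """Hypothetical proposed check: repartition if the number of row pairs
    the join has to examine exceeds one batch."""
    return left_rows * right_partition_rows > batch_size

small_rows = 13            # 30-day steps over 2022
timestamp_rows = 3153600   # 10-second steps over 2022

by_rows = row_count_check(small_rows)                     # False
by_pairs = pair_count_check(small_rows, timestamp_rows)   # True
```

Under these assumptions, the 13-row input alone never exceeds a batch, so a row-count check sees no benefit in repartitioning, while a pair-count check would.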

korowa · 2024-03-20T18:27:37Z

> the old parallelism strategy should work, but the check in enforce_distribution.rs blocks the repartitioning

I don't think it's the proper way to go -- it'll give some benefit in terms of runtime, but it will be suboptimal in terms of memory utilization and CPU time (as we'll need to perform BuildSideRows * NumberOfPartitions filter evaluations instead of BuildSideRows * 1, where 1 is the number of probe-side input batches)

Dandandan · 2024-04-22T18:47:28Z

I don't think this issue should be closed.

#9676 seems to take care of ordering but I think it doesn't improve range/inequality joins much?

korowa · 2024-04-23T17:59:23Z

My intention was to fix the NLJoin parallelism issue caused by the fixed build-side choice (since a right join instead of a left join had acceptable performance, as was claimed above). At the same time, we also have #318 for a specialized operator implementation, so I assumed #9676 would be enough.

I don't mind keeping it open, though.

my-vegetable-has-exploded · 2024-10-04T16:02:23Z

Could anyone do me a favour here?
#12754 (comment)

simonvandel added the bug Something isn't working label Dec 1, 2023

alamb mentioned this issue Dec 1, 2023

[Epic] A collection of Join Improvements #8398

Open

10 tasks

korowa mentioned this issue Mar 18, 2024

feat: support input reordering for NestedLoopJoinExec #9676

Merged

my-vegetable-has-exploded mentioned this issue Mar 23, 2024

Performance regression on timestemp range join. #9755

Closed

alamb closed this as completed in #9676 Apr 22, 2024

Dandandan reopened this Apr 22, 2024

my-vegetable-has-exploded mentioned this issue Oct 4, 2024

feat: support inner iejoin #12754

Closed
