Minor: Add SMJ to TPCH benchmark usage #10747

Merged
merged 4 commits into from
Jun 1, 2024
comphead
Contributor

Which issue does this PR close?

Closes #10100 .

Rationale for this change

Basically, the fix for #10380 resolved the issue; I'm just also fixing the usage info for SMJ.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@comphead
Contributor Author

I checked that the TPCH benchmarks pass with SMJ on and that the row counts are the same:

RUST_BACKTRACE=1 RESULTS_NAME=smj ./benchmarks/bench.sh run tpch_smj
RUST_BACKTRACE=1 RESULTS_NAME=hj ./benchmarks/bench.sh run tpch
RUST_BACKTRACE=1 RESULTS_NAME=smj10 ./benchmarks/bench.sh run tpch_smj10
RUST_BACKTRACE=1 RESULTS_NAME=hj10 ./benchmarks/bench.sh run tpch10

tpch_mem: TPCH inspired benchmark on Scale Factor (SF) 1 (~1GB), query from memory
tpch10: TPCH inspired benchmark on Scale Factor (SF) 10 (~10GB), single parquet file per table
tpch10: TPCH inspired benchmark on Scale Factor (SF) 10 (~10GB), single parquet file per table, hash join
tpch_smj10: TPCH inspired benchmark on Scale Factor (SF) 10 (~10GB), single parquet file per table, sort merge join
Contributor Author

I'm planning to get rid of tpch_smj* soon and take the join type from user input, so that any benchmark can run with a choice of join type.

Benchmark results

Benchmarks comparing d6ddd23 (main) and 8353d20 (PR)
Comparing d6ddd23 and 8353d20
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃  d6ddd23 ┃  8353d20 ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1     │ 311.23ms │ 314.48ms │    no change │
│ QQuery 2     │  39.75ms │  44.90ms │ 1.13x slower │
│ QQuery 3     │  58.71ms │  59.99ms │    no change │
│ QQuery 4     │  83.26ms │  85.53ms │    no change │
│ QQuery 5     │  97.94ms │ 100.15ms │    no change │
│ QQuery 6     │  15.20ms │  15.67ms │    no change │
│ QQuery 7     │ 215.63ms │ 217.48ms │    no change │
│ QQuery 8     │  40.10ms │  40.95ms │    no change │
│ QQuery 9     │ 117.77ms │ 118.47ms │    no change │
│ QQuery 10    │ 104.43ms │ 101.81ms │    no change │
│ QQuery 11    │  75.79ms │  77.27ms │    no change │
│ QQuery 12    │  60.18ms │  59.87ms │    no change │
│ QQuery 13    │ 112.28ms │ 109.35ms │    no change │
│ QQuery 14    │  18.76ms │  18.58ms │    no change │
│ QQuery 15    │  30.72ms │  30.86ms │    no change │
│ QQuery 16    │  46.01ms │  45.91ms │    no change │
│ QQuery 17    │ 167.60ms │ 164.57ms │    no change │
│ QQuery 18    │ 465.70ms │ 545.63ms │ 1.17x slower │
│ QQuery 19    │  61.25ms │  60.38ms │    no change │
│ QQuery 20    │ 116.80ms │ 120.37ms │    no change │
│ QQuery 21    │ 335.60ms │ 342.52ms │    no change │
│ QQuery 22    │  30.19ms │  30.47ms │    no change │
└──────────────┴──────────┴──────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary      ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (d6ddd23)   │ 2604.90ms │
│ Total Time (8353d20)   │ 2705.21ms │
│ Average Time (d6ddd23) │  118.40ms │
│ Average Time (8353d20) │  122.96ms │
│ Queries Faster         │         0 │
│ Queries Slower         │         2 │
│ Queries with No Change │        20 │
└────────────────────────┴───────────┘
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃  d6ddd23 ┃  8353d20 ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1     │ 453.39ms │ 459.54ms │    no change │
│ QQuery 2     │  55.09ms │  57.11ms │    no change │
│ QQuery 3     │ 142.59ms │ 145.54ms │    no change │
│ QQuery 4     │  88.22ms │  89.39ms │    no change │
│ QQuery 5     │ 200.19ms │ 204.77ms │    no change │
│ QQuery 6     │ 105.65ms │ 105.17ms │    no change │
│ QQuery 7     │ 273.96ms │ 287.56ms │    no change │
│ QQuery 8     │ 182.75ms │ 179.28ms │    no change │
│ QQuery 9     │ 283.66ms │ 295.45ms │    no change │
│ QQuery 10    │ 228.35ms │ 233.43ms │    no change │
│ QQuery 11    │  41.04ms │  41.67ms │    no change │
│ QQuery 12    │ 127.32ms │ 129.31ms │    no change │
│ QQuery 13    │ 177.37ms │ 183.10ms │    no change │
│ QQuery 14    │ 124.31ms │ 124.03ms │    no change │
│ QQuery 15    │ 183.90ms │ 186.47ms │    no change │
│ QQuery 16    │  49.60ms │  49.47ms │    no change │
│ QQuery 17    │ 313.09ms │ 321.38ms │    no change │
│ QQuery 18    │ 447.60ms │ 493.86ms │ 1.10x slower │
│ QQuery 19    │ 226.96ms │ 228.16ms │    no change │
│ QQuery 20    │ 189.06ms │ 195.03ms │    no change │
│ QQuery 21    │ 317.85ms │ 315.83ms │    no change │
│ QQuery 22    │  40.09ms │  40.57ms │    no change │
└──────────────┴──────────┴──────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary      ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (d6ddd23)   │ 4252.06ms │
│ Total Time (8353d20)   │ 4366.11ms │
│ Average Time (d6ddd23) │  193.28ms │
│ Average Time (8353d20) │  198.46ms │
│ Queries Faster         │         0 │
│ Queries Slower         │         1 │
│ Queries with No Change │        21 │
└────────────────────────┴───────────┘
--------------------
Benchmark tpch_sf10.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query        ┃   d6ddd23 ┃   8353d20 ┃    Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 1     │ 4470.71ms │ 4454.97ms │ no change │
│ QQuery 2     │  512.15ms │  491.60ms │ no change │
│ QQuery 3     │ 1709.79ms │ 1718.17ms │ no change │
│ QQuery 4     │  835.04ms │  831.02ms │ no change │
│ QQuery 5     │ 2157.94ms │ 2179.48ms │ no change │
│ QQuery 6     │ 1005.84ms │ 1005.22ms │ no change │
│ QQuery 7     │ 3452.80ms │ 3556.18ms │ no change │
│ QQuery 8     │ 2463.24ms │ 2497.21ms │ no change │
│ QQuery 9     │ 3975.42ms │ 3996.31ms │ no change │
│ QQuery 10    │ 2480.86ms │ 2486.30ms │ no change │
│ QQuery 11    │  343.56ms │  346.09ms │ no change │
│ QQuery 12    │ 1222.34ms │ 1224.75ms │ no change │
│ QQuery 13    │ 2313.42ms │ 2286.39ms │ no change │
│ QQuery 14    │ 1249.23ms │ 1263.20ms │ no change │
│ QQuery 15    │ 1908.59ms │ 1903.24ms │ no change │
│ QQuery 16    │  516.33ms │  509.42ms │ no change │
│ QQuery 17    │ 5413.51ms │ 5443.66ms │ no change │
│ QQuery 18    │ 6777.95ms │ 6896.24ms │ no change │
│ QQuery 19    │ 2243.45ms │ 2267.72ms │ no change │
│ QQuery 20    │ 2615.21ms │ 2579.24ms │ no change │
│ QQuery 21    │ 4479.53ms │ 4403.21ms │ no change │
│ QQuery 22    │  468.13ms │  451.80ms │ no change │
└──────────────┴───────────┴───────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary      ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (d6ddd23)   │ 52615.04ms │
│ Total Time (8353d20)   │ 52791.42ms │
│ Average Time (d6ddd23) │  2391.59ms │
│ Average Time (8353d20) │  2399.61ms │
│ Queries Faster         │          0 │
│ Queries Slower         │          0 │
│ Queries with No Change │         22 │
└────────────────────────┴────────────┘
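
For anyone reading the Change column above: the labels look consistent with a simple ratio check using a tolerance of about 5%. The sketch below is only an illustration of that reading, not the comparison script's actual code; the function name `classify_change` and the `threshold` default are assumptions chosen to match the rows shown.

```python
def classify_change(baseline_ms: float, candidate_ms: float,
                    threshold: float = 0.05) -> str:
    """Label a pair of timings in the style of the Change column.

    Hypothetical helper: the 5% threshold is an assumption inferred
    from the tables above, not taken from the benchmark tooling.
    """
    if baseline_ms <= 0 or candidate_ms <= 0:
        raise ValueError("timings must be positive")

    # Express the difference as a ratio >= 1 in either direction.
    if candidate_ms >= baseline_ms:
        ratio = candidate_ms / baseline_ms
        direction = "slower"
    else:
        ratio = baseline_ms / candidate_ms
        direction = "faster"

    # Differences within the tolerance are treated as noise.
    if ratio - 1.0 <= threshold:
        return "no change"
    return f"{ratio:.2f}x {direction}"


# Rows taken from the tpch_mem_sf1 and tpch_sf1 tables above.
print(classify_change(39.75, 44.90))    # Query 2, tpch_mem_sf1
print(classify_change(465.70, 545.63))  # Query 18, tpch_mem_sf1
print(classify_change(273.96, 287.56))  # Query 7, tpch_sf1
```

Under this reading, the Query 7 difference in tpch_sf1 (about 4.96%) falls just inside the tolerance, which would explain why it is reported as no change.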

Contributor

@alamb alamb left a comment

Thank you @comphead

@alamb alamb merged commit 3777114 into apache:main Jun 1, 2024
23 checks passed
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
* Fix: Sort Merge Join crashes on TPCH Q21

* Fix LeftAnti SMJ join when the join filter is set

* rm dbg

* Add SMJ to TPCH benchmark usage
Successfully merging this pull request may close these issues.

fix Sort Merge Join to pass TPCH tests