Add benchmark for STDDEV and VAR to Clickbench extended #12146

Merged
merged 1 commit into from
Aug 25, 2024

@alamb alamb commented Aug 24, 2024

Which issue does this PR close?

Part of #12094

Rationale for this change

@ejbyfeldt's great PR to improve STDDEV and VAR, #12095, did not improve any existing benchmarks. I think it would be good to add one.

What changes are included in this PR?

Add a query that uses STDDEV and VAR to the "extended" ClickBench queries (which we use to benchmark DataFusion).
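
As a rough illustration of what such a query exercises, the sketch below computes per-group count, mean, standard deviation, and variance in plain Python. It assumes STDDEV and VAR are the sample (n-1) variants, so a group with a single row has no defined value (which would explain a filter like HAVING s IS NOT NULL). The rows and the helper name `summarize` are made up for illustration and do not come from the ClickBench data.

```python
import statistics
from collections import defaultdict


def summarize(rows):
    """Group (network_id, region_id, price) rows and compute aggregates.

    Assumption: sample (n-1) statistics, so groups with fewer than two
    rows get None, mirroring a SQL NULL.
    """
    groups = defaultdict(list)
    for network_id, region_id, price in rows:
        groups[(network_id, region_id)].append(price)

    out = {}
    for key, prices in groups.items():
        has_spread = len(prices) >= 2
        out[key] = {
            "count": len(prices),
            "avg": statistics.fmean(prices),
            "stddev": statistics.stdev(prices) if has_spread else None,
            "var": statistics.variance(prices) if has_spread else None,
        }
    return out


# Hypothetical rows, not taken from the hits table.
rows = [(1, 10, 2.0), (1, 10, 4.0), (1, 10, 6.0), (2, 20, 5.0)]
result = summarize(rows)

# Rough equivalent of: HAVING s IS NOT NULL ORDER BY s DESC LIMIT 10
top = sorted(
    ((key, agg) for key, agg in result.items() if agg["stddev"] is not None),
    key=lambda item: item[1]["stddev"],
    reverse=True,
)[:10]
```

Here the single-row group `(2, 20)` is dropped by the filter, while `(1, 10)` keeps its standard deviation and variance.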

Are these changes tested?

I tested it manually like this:

./bench.sh run clickbench_extended
Q0: SELECT COUNT(DISTINCT "SearchPhrase"), COUNT(DISTINCT "MobilePhone"), COUNT(DISTINCT "MobilePhoneModel") FROM hits;
Query 0 iteration 0 took 788.3 ms and returned 1 rows
Query 0 iteration 1 took 723.0 ms and returned 1 rows
Query 0 iteration 2 took 722.1 ms and returned 1 rows
Query 0 iteration 3 took 741.7 ms and returned 1 rows
Query 0 iteration 4 took 744.6 ms and returned 1 rows
Q1: SELECT COUNT(DISTINCT "HitColor"), COUNT(DISTINCT "BrowserCountry"), COUNT(DISTINCT "BrowserLanguage")  FROM hits;
Query 1 iteration 0 took 285.5 ms and returned 1 rows
Query 1 iteration 1 took 256.2 ms and returned 1 rows
Query 1 iteration 2 took 262.2 ms and returned 1 rows
Query 1 iteration 3 took 266.6 ms and returned 1 rows
Query 1 iteration 4 took 277.7 ms and returned 1 rows
Q2: SELECT "BrowserCountry",  COUNT(DISTINCT "SocialNetwork"), COUNT(DISTINCT "HitColor"), COUNT(DISTINCT "BrowserLanguage"), COUNT(DISTINCT "SocialAction") FROM hits GROUP BY 1 ORDER BY 2 DESC LIMIT 10;
Query 2 iteration 0 took 549.7 ms and returned 10 rows
Query 2 iteration 1 took 540.9 ms and returned 10 rows
Query 2 iteration 2 took 611.5 ms and returned 10 rows
Query 2 iteration 3 took 552.7 ms and returned 10 rows
Query 2 iteration 4 took 566.2 ms and returned 10 rows
Q3: SELECT "SocialSourceNetworkID", "RegionID", COUNT(*), AVG("Age"), AVG("ParamPrice"), STDDEV("ParamPrice") as s, VAR("ParamPrice")  FROM hits GROUP BY "SocialSourceNetworkID", "RegionID" HAVING s IS NOT NULL ORDER BY s DESC LIMIT 10;
Query 3 iteration 0 took 419.4 ms and returned 10 rows
Query 3 iteration 1 took 423.5 ms and returned 10 rows
Query 3 iteration 2 took 410.4 ms and returned 10 rows
Query 3 iteration 3 took 402.7 ms and returned 10 rows
Query 3 iteration 4 took 408.4 ms and returned 10 rows
Done
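
To compare runs more easily, the per-iteration lines can be reduced to a minimum, mean, and maximum per query. The following is a minimal sketch, assuming the output has been captured as text and that every timing line follows the "Query N iteration M took X ms" shape shown above; the function name `summarize_timings` is hypothetical and not part of bench.sh.

```python
import re
from collections import defaultdict
from statistics import fmean

# Matches lines such as "Query 3 iteration 0 took 419.4 ms and returned 10 rows".
LINE_RE = re.compile(r"Query (\d+) iteration (\d+) took ([\d.]+) ms")


def summarize_timings(log_text):
    """Return {query_number: (min_ms, mean_ms, max_ms)} for a benchmark log."""
    timings = defaultdict(list)
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            timings[int(match.group(1))].append(float(match.group(3)))
    return {q: (min(t), fmean(t), max(t)) for q, t in sorted(timings.items())}


# The Q3 timing lines from the run above.
sample_log = """\
Query 3 iteration 0 took 419.4 ms and returned 10 rows
Query 3 iteration 1 took 423.5 ms and returned 10 rows
Query 3 iteration 2 took 410.4 ms and returned 10 rows
Query 3 iteration 3 took 402.7 ms and returned 10 rows
Query 3 iteration 4 took 408.4 ms and returned 10 rows
"""

summary = summarize_timings(sample_log)
for query, (fastest, mean, slowest) in summary.items():
    print(f"Q{query}: min {fastest:.1f} ms, mean {mean:.1f} ms, max {slowest:.1f} ms")
```

Lines that do not match the pattern (the query text and the final "Done") are skipped, so the full captured output can be passed in unchanged.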

Are there any user-facing changes?

No, this is an internal benchmark script.

@alamb alamb marked this pull request as ready for review August 24, 2024 10:26
@alamb alamb changed the title Add query with STDDEV and VAR to Clickbench extended Add benchmark for STDDEV and VAR to Clickbench extended Aug 24, 2024
@alamb alamb added the development-process Related to development process of DataFusion label Aug 24, 2024
alamb commented Aug 25, 2024

Thanks @Dandandan

@alamb alamb merged commit 83d3c5a into apache:main Aug 25, 2024
28 checks passed
@alamb alamb deleted the alamb/extended_bench branch August 25, 2024 10:57