Skip to content

fix q35 #17

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Merged
merged 1 commit into from
Nov 1, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 6 additions & 6 deletions tpcds/queries-spark/q35.sql
Original file line number Diff line number Diff line change
Expand Up @@ -7,19 +7,19 @@ select
cd_marital_status,
cd_dep_count,
count(*) cnt1,
min(cd_dep_count),
max(cd_dep_count),
stddev_samp(cd_dep_count),
stddev_samp(cd_dep_count),
avg(cd_dep_count),
Comment on lines +10 to +12
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The queries in this repo are the official queries as generated by the TPC-DS query generator provided by the TPC. I don't think that we should be changing them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the query generator has a bug in this case, see https://github.com/apache/datafusion/blob/main/datafusion/core/tests/tpc-ds/35.sql as another generated version that correctly substituted the aggregations

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct. I just checked the query description in the TPC-DS specification, and it does not use stdev_sample. I wonder if this was inadvertently checked in after some manual experimentation. Thanks for catching this!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, latest version of the tpc-ds generator also generated this query wrong for me, so I think it is some bug in the param substitution in the dsqgen executable

cd_dep_employed_count,
count(*) cnt2,
min(cd_dep_employed_count),
max(cd_dep_employed_count),
stddev_samp(cd_dep_employed_count),
stddev_samp(cd_dep_employed_count),
avg(cd_dep_employed_count),
cd_dep_college_count,
count(*) cnt3,
min(cd_dep_college_count),
max(cd_dep_college_count),
stddev_samp(cd_dep_college_count),
stddev_samp(cd_dep_college_count)
avg(cd_dep_college_count)
from
customer c,customer_address ca,customer_demographics
where
Expand Down
12 changes: 6 additions & 6 deletions tpcds/queries/q35.sql
Original file line number Diff line number Diff line change
Expand Up @@ -7,19 +7,19 @@ select
cd_marital_status,
cd_dep_count,
count(*) cnt1,
min(cd_dep_count),
max(cd_dep_count),
stddev_samp(cd_dep_count),
stddev_samp(cd_dep_count),
avg(cd_dep_count),
cd_dep_employed_count,
count(*) cnt2,
min(cd_dep_employed_count),
max(cd_dep_employed_count),
stddev_samp(cd_dep_employed_count),
stddev_samp(cd_dep_employed_count),
avg(cd_dep_employed_count),
cd_dep_college_count,
count(*) cnt3,
min(cd_dep_college_count),
max(cd_dep_college_count),
stddev_samp(cd_dep_college_count),
stddev_samp(cd_dep_college_count)
avg(cd_dep_college_count)
from
customer c,customer_address ca,customer_demographics
where
Expand Down