Skip to content

fix: make ntile work in some corner cases #8371

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Merged
merged 4 commits into from
Dec 1, 2023
Merged

Conversation

haohuaijin
Copy link
Contributor

Which issue does this PR close?

Closes #8284

Rationale for this change

fix #8284

What changes are included in this PR?

Add parameter type checking using Signature,
before

❯ select ntile('1') OVER(ORDER BY a) from t1;
Internal error: Cannot convert Utf8("1") to i64.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
❯ select ntile([]) OVER(ORDER BY a) from t1;
Internal error: Cannot convert List([NullArray(0)]) to i64.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

after

❯ select ntile([]) OVER(ORDER BY a) from t1;
Error during planning: No function matches the given name and argument types 'NTILE(List(Field { name: "item", data_type: Null, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }))'. You might need to add explicit type casts.
        Candidate functions:
        NTILE(UInt64/UInt32/UInt16/UInt8/Int64/Int32/Int16/Int8)
❯ select ntile('1') OVER(ORDER BY a) from t1;
Error during planning: No function matches the given name and argument types 'NTILE(Utf8)'. You might need to add explicit type casts.
        Candidate functions:
        NTILE(UInt64/UInt32/UInt16/UInt8/Int64/Int32/Int16/Int8)

make ntile work properly, when parameter greater than table's row number

Are these changes tested?

yes, add more test

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt) labels Nov 30, 2023
@haohuaijin
Copy link
Contributor Author

haohuaijin commented Nov 30, 2023

cc @msirek @alamb , PTAL, I fix all incorrect results find in #8284, and add more tests.

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @haohuaijin we prob can also test partition by null order by null cases

@alamb
Copy link
Contributor

alamb commented Nov 30, 2023

Perhaps @mustafasrepo or @ozankabak has time to review this code as they are more familiar with NTILE

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this code looks good -- thank you @haohuaijin . If you can make the changes requested by @comphead I think this PR will be good to go

@haohuaijin
Copy link
Contributor Author

@comphead I find partition by null and order by null will fail in all windows function(maybe, I'm not test all)

❯ CREATE TABLE t1 (a int) AS VALUES (1), (2), (3);
0 rows in set. Query took 0.018 seconds.

❯ select rank() over (partition by null) from t1;
thread 'tokio-runtime-worker' panicked at /home/hhj/datafusion/datafusion/physical-plan/src/repartition/mod.rs:208:79:
called `Result::unwrap()` on an `Err` value: InvalidArgumentError("must either specify a row count or at least one column")
Execution error: Expects PARTITION BY expression to be ordered

❯ select rank() over (order by null) from t1;
type_coercion
caused by
Internal error: Cannot run range queries on datatype: Null.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

❯ select sum(a) over (order by null) from t1;
type_coercion
caused by
Internal error: Cannot run range queries on datatype: Null.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

❯ select sum(a) over (partition by null) from t1;
Internal error: All partition by columns should have an ordering.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

I file an issue #8386 to track this.

Copy link
Contributor

@mustafasrepo mustafasrepo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank @haohuaijin for this PR. It is rally nice. I added a minor comment (purely stylistic, you can ignore it if you wish). This PR is LGTM!.

Co-authored-by: Mustafa Akur <106137913+mustafasrepo@users.noreply.github.com>
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @haohuaijin and @mustafasrepo

@alamb alamb merged commit eb5aa22 into apache:main Dec 1, 2023
@haohuaijin haohuaijin deleted the fix_ntile branch December 1, 2023 23:33
appletreeisyellow pushed a commit to appletreeisyellow/datafusion that referenced this pull request Dec 14, 2023
* fix: make ntile work in some corner cases

* fix comments

* minor

* Update datafusion/sqllogictest/test_files/window.slt

Co-authored-by: Mustafa Akur <106137913+mustafasrepo@users.noreply.github.com>

---------

Co-authored-by: Mustafa Akur <106137913+mustafasrepo@users.noreply.github.com>
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Miscellaneous ntile function bugs, possible incorrect results
4 participants