Skip to content

feat: Support array_sort(list_sort) #8279

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Merged
merged 50 commits into from
Dec 6, 2023
Merged

feat: Support array_sort(list_sort) #8279

merged 50 commits into from
Dec 6, 2023

Conversation

Asura7969
Copy link
Contributor

Which issue does this PR close?

Closes #109 .

Rationale for this change

Support append_sort(list_sort) function

What changes are included in this PR?

Four array functions now start to support column data:
array_sort
list_sort

Are these changes tested?

Yes
test_array_sort

Are there any user-facing changes?

Yes

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt) labels Nov 20, 2023
Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since DuckDB has array_sort already, and I think there is no good reason that we should not follow theirs. Can you have the similar style like theirs?

https://duckdb.org/docs/sql/functions/nested#sorting-lists

Especially, I think array_sort(list, 'asc') or array_sort(list, 'desc') is more much straightforward then true, false. We need to check one of the result to know whether true is asc or desc. So as null_first

@@ -2669,6 +2698,48 @@ mod tests {
);
}

#[test]
fn test_array_sort() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dont add test here, just add them in slt file

@Asura7969
Copy link
Contributor Author

Since DuckDB has array_sort already, and I think there is no good reason that we should not follow theirs. Can you have the similar style like theirs?

https://duckdb.org/docs/sql/functions/nested#sorting-lists

Especially, I think array_sort(list, 'asc') or array_sort(list, 'desc') is more much straightforward then true, false. We need to check one of the result to know whether true is asc or desc. So as null_first

I'll try

@@ -1044,6 +1044,20 @@ select make_array(['a','b'], null);
----
[[a, b], ]

## array_sort (aliases: `list_sort`)
query ???
select array_sort(make_array(4, 2, 3, 1)), array_sort(make_array(1.0, 4.0, null, 3.0), true, true), array_sort(make_array('a', 'd', 'c', null, 'b'), false, false);
Copy link
Contributor

@jayzhan211 jayzhan211 Nov 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, you can also try column-wise support, pass the test with table, you can refer other array_function that run with table. But it is fine for me to support that in follow on PR

[, 11, 12, 13, 14, 15, 16, 17, 18, 20]
[, 21, 22, 23, 25, 26, 27, 28, 29, 30]
[, 31, 32, 33, 34, 35, 37, 38, 39, 40]
[]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

array_sort(null) should return null, not []

Copy link
Contributor

@jayzhan211 jayzhan211 Nov 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

# Conflicts:
#	datafusion/proto/proto/datafusion.proto
#	datafusion/proto/src/generated/pbjson.rs
#	datafusion/proto/src/generated/prost.rs
@Asura7969
Copy link
Contributor Author

cc @alamb

Copy link
Member

@Weijun-H Weijun-H left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @Asura7969

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Asura7969 (and @jayzhan211 and @Weijun-H for the reviews), and I am really sorry for taking so long to look at this PR

I think all that is needed now is to merge up from main, and resolve some conflicts, and this PR will be ready to merge

Thanks again

# Conflicts:
#	datafusion/expr/src/built_in_function.rs
#	datafusion/proto/proto/datafusion.proto
#	datafusion/proto/src/generated/pbjson.rs
#	datafusion/proto/src/generated/prost.rs
@alamb alamb merged commit d9d8ddd into apache:main Dec 6, 2023
@alamb
Copy link
Contributor

alamb commented Dec 6, 2023

Thanks again 🚀

@Asura7969 Asura7969 deleted the list_sort branch December 7, 2023 00:44
appletreeisyellow pushed a commit to appletreeisyellow/datafusion that referenced this pull request Dec 15, 2023
* Minor: Improve the document format of JoinHashMap

* list sort

* fix: example doc

* fix: ci

* fix: doc error

* fix pb

* like DuckDB function semantics

* fix ci

* fix pb

* fix: doc

* add table test

* fix: not as expected

* fix: return null

* resolve conflicts

* doc

* merge
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for list_sort
4 participants