Skip to content

Implement array_has_all function #6973

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
izveigor opened this issue Jul 15, 2023 · 3 comments · Fixed by #6990
Closed

Implement array_has_all function #6973

izveigor opened this issue Jul 15, 2023 · 3 comments · Fixed by #6990
Labels
enhancement New feature or request

Comments

@izveigor
Copy link
Contributor

izveigor commented Jul 15, 2023

Is your feature request related to a problem or challenge?

Summary

Characteristic Description
Function name: array_has_all
Aliases: list_has_all, @>, <@
Original function?: No
Function Description: Clickhouse: Checks whether one array is a subset of another.
DuckDB: Returns true if all elements of sub-list exist in list.
Sources: Concept ClickHouse DuckDB

Examples:

select array_has_all([1, 2, 3, 4], [3, 4]);
----
true
select array_has_all([1, 2, 3, 4], [5, 4]);
----
false

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

#6972
No response

@jayzhan211
Copy link
Contributor

array_has_all(array_contains) rhs is allowed to pass nested array now, should we only accept 1d array for rhs?

@izveigor
Copy link
Contributor Author

@jayzhan211 We should accept arrays with all possible dimensions.
But I think you mean that should we use the trick used in array_concat (use different dimensions for input arrays). For example, ClickHouse does not support the current DataFusion (array_contains):

SELECT hasAll([[1, 2, 3], [4, 5, 6]], [[1, 2, 3]])

Query id: 4f4aaabd-a56c-4182-96f5-32b53b845fa0

┌─hasAll([[1, 2, 3], [4, 5, 6]], [[1, 2, 3]])─┐
│                                           1 │
└─────────────────────────────────────────────┘

1 row in set. Elapsed: 0.001 sec.

BUT:

SELECT hasAll([[1, 2, 3], [4, 5, 6]], [1, 2, 3])

Query id: 6055996d-8db7-4674-a36b-8d620335ff96


0 rows in set. Elapsed: 0.001 sec. 

Received exception from server (version 23.6.2):
Code: 386. DB::Exception: Received from localhost:9000. DB::Exception: There is no supertype for types Array(UInt8), UInt8 because some of them are Array and some of them are not: While processing hasAll([[1, 2, 3], [4, 5, 6]], [1, 2, 3]). (NO_COMMON_TYPE)

Against, PostgreSQL supports our version:

postgres=# select array[array[array[1, 2, 3], array[4, 5, 6]], array[array[4, 5, 6], array[7, 8, 9]]] @> array[1, 2, 3];
 ?column? 
----------
 t
(1 row)

I really don't know which version we should follow. Therefore, in this matter, I trust your taste.

@jayzhan211
Copy link
Contributor

jayzhan211 commented Jul 17, 2023

I think we can have both.

For the postgres version, which is the current one, my understanding of array_contains(array_has_all) now is that we have array_has_all(list: nested array, sub-list: Vec<?>), And we iterate sub-list to check whether the element exists in the list, any similar to bitwise OR, all similar to bitwise AND, array_has is array_has_all but with element, not sub-list. In this case, it does not make sense to me to support nested list (>1d) for sub-list, since there is not a lot of difference if we limit sub-list to 1d array. For example, array_has_all(list, [[[1,2,2,3],[3,4,4,4]]]) is the same as array_has_all(list, [1,2,3,4]) but MUCH MORE SIMPLE.

The Clickhouse version to me is more like set operations and really does what contains do. We can have array_has_all(list, sub-list) where the sub-list SHOULD accept the nested array, and we check if the list contains the sub-list. In the above example, we should check if [[[1,2,2,3],[3,4,4,4]]] exists in parts of the list.

To combine these two, array_has/array_contains is the basic function that has two arguments (list: nested_array, sub-list: nested_array with dimension <= list). array_has_all is the extended version that is equivalent to array_has(list, sub-list[0]) && array_has(list, sub-list[1]) .. && array_has(list, sub-list[n]), bitwise AND version. Similarly, array_has_any is the bitwise OR version. And, arguments of array_has_all and array_has_any are (list: nested_array, sub-list: Vec<nested_array>).

@izveigor, what do you think about this approach?

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
enhancement New feature or request
Projects
None yet
2 participants