-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Fix comparison of dictionary arrays #1606
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Conversation
assert_batches_eq!(expected, &actual); | ||
|
||
// comparison with another dictionary column | ||
let sql = "SELECT d1 FROM test WHERE d1 = d2"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test fails without the change
98e574d
to
c4db8f3
Compare
@@ -77,7 +77,7 @@ pub(crate) fn coerce_types( | |||
} | |||
|
|||
fn comparison_eq_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<DataType> { | |||
if lhs_type == rhs_type { | |||
if lhs_type == rhs_type && !is_dictionary(lhs_type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the arrow eq
kernels don't support DictionaryArray
<==> DictionaryArray
comparisons yet, so we need to handle this case specially. I will also file a ticket to add native DictionaryArray comparison support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -147,6 +147,10 @@ pub fn temporal_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<Dat | |||
} | |||
} | |||
|
|||
pub(crate) fn is_dictionary(t: &DataType) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The duplication of code between this module and binary_rule.rs
is not good -- I am going to try and consolidate it as a follow on PR cc @liukun4515
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#1607 <-- doesn't remove duplication but it at least consolidates the logic into a single module
@@ -77,7 +77,7 @@ pub(crate) fn coerce_types( | |||
} | |||
|
|||
fn comparison_eq_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<DataType> { | |||
if lhs_type == rhs_type { | |||
if lhs_type == rhs_type && !is_dictionary(lhs_type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add TODO
or some comments for this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will do so -- it is a good point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in e27110f
@@ -90,7 +90,7 @@ fn comparison_order_coercion( | |||
lhs_type: &DataType, | |||
rhs_type: &DataType, | |||
) -> Option<DataType> { | |||
if lhs_type == rhs_type { | |||
if lhs_type == rhs_type && !is_dictionary(lhs_type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's same with above.
@@ -147,6 +147,10 @@ pub fn temporal_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<Dat | |||
} | |||
} | |||
|
|||
pub(crate) fn is_dictionary(t: &DataType) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another point I want to point out is that the arrow-rs has some codes to check data type, and the logic is also in datafusion, for example, https://docs.rs/arrow/7.0.0/arrow/datatypes/enum.DataType.html#method.is_numeric, Maybe we should refactor them and make the codebase clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed as #1613
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks @alamb
Which issue does this PR close?
Closes #1605
See ticket for more details