-
Notifications
You must be signed in to change notification settings - Fork 1.5k
fix: from_plan shouldn't use original schema #6595
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Conversation
LogicalPlan::Projection(Projection { schema, .. }) => { | ||
Ok(LogicalPlan::Projection(Projection::try_new_with_schema( | ||
expr.to_vec(), | ||
Arc::new(inputs[0].clone()), | ||
schema.clone(), | ||
)?)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here exist a bug, we shouldn't use original schema
, because it can be changed.
Original code hidden some BUG.
bd31450
to
0276537
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jackwener -- this PR is definitely heading in the right direction. Fixing the projection schema is definitely exposing a bunch of bugs.
There are a few things that don't look right in the PR that I commented on.
Thanks again for trying to make things better. It is really apprecaited
@@ -384,8 +384,7 @@ impl DFSchema { | |||
let self_fields = self.fields().iter(); | |||
let other_fields = other.fields().iter(); | |||
self_fields.zip(other_fields).all(|(f1, f2)| { | |||
f1.qualifier() == f2.qualifier() | |||
&& f1.name() == f2.name() | |||
f1.qualified_name() == f2.qualified_name() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if it matters, but I believe that qualified_name
creates a new String
where the previous version avoids that allocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About this, I have a new idea on weekend. we may need handle alias schema()
to specify the schema.
Because alias('t1.a')
, field
is qualifier: none, name: t1.a
, we hope field
is qualifier: t1, name: a
.
I will do it in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
related issue #6681
@@ -110,10 +111,10 @@ select array_position(['h', 'e', 'l', 'l', 'o'], 'l', 4), array_position([1, 2, | |||
4 5 2 | |||
|
|||
# array_positions scalar function | |||
query III | |||
query error DataFusion error: SQL error: ParserError\("Expected an SQL statement, found: caused"\) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these errors look not quite right -- I think there is something wrong with sqllogictest --complete
with multi-line errors 🤔
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jackwener this looks like a nice improvement to me
However, it does seem to introduce a regression in some of the array functions recently introduced by @izveigor . The #6596 (comment) comment suggests to me we need to support NULL in array, so it is a known issue but it might be good to get @izveigor 's opinion
@@ -512,15 +512,22 @@ async fn test_regex_expressions() -> Result<()> { | |||
|
|||
#[tokio::test] | |||
async fn test_cast_expressions() -> Result<()> { | |||
test_expression!("CAST('0' AS INT)", "0"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we move it to slt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a good follow on PR
Seems like the hope to was to move it to slt as part of #6210 but wasn't completed 🤔
I created PR about the 'nulls' problem: #6662. So I think it can solve the regression. |
I prepare to merge this PR in tomorrow unless there are other comments I will continue doing more job. |
* revert: from_plan keep same schema Project in apache#6595 * revert: from_plan keep same schema Agg/Window in apache#6820 * revert type coercion * add comment
* revert: from_plan keep same schema Project in apache#6595 * revert: from_plan keep same schema Agg/Window in apache#6820 * revert type coercion * add comment
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
* revert array.slt that changed by #6595 Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * add test for to string Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * first draft Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * cleanup Signed-off-by: jayzhan211 <jayzhan211@gmail.com> --------- Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Which issue does this PR close?
part of #6596.
Closes #6613
Rationale for this change
What changes are included in this PR?
In original code,
project
callfrom_plan
, it will keep original schema even if expression are different.It's wrong! Because different expression will have different schema. So I correct it.
Are these changes tested?
Are there any user-facing changes?