fix: from_plan shouldn't use original schema #6595

Merged
merged 1 commit into from
Jun 15, 2023

Conversation

Member

@jackwener jackwener commented Jun 8, 2023

Which issue does this PR close?

part of #6596.
Closes #6613

Rationale for this change

What changes are included in this PR?

In the original code, when a Projection is rebuilt through from_plan, it keeps the original schema even if the expressions are different.
That is wrong, because different expressions can produce a different schema. This PR corrects that.
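
To illustrate the problem, here is a minimal sketch using toy types. `DataType`, `Expr`, `Schema`, `Projection`, `col`, `cast`, `from_exprs`, and `with_schema` are hypothetical simplifications made up for this example, not DataFusion's actual definitions. Only the idea is taken from this PR: a schema copied from the old projection can disagree with the rewritten expressions, while a schema derived from the expressions cannot.

```rust
// Hypothetical toy types for illustration only; not DataFusion's real API.
#[derive(Clone, Debug, PartialEq)]
enum DataType {
    Int32,
    Int64,
}

#[derive(Clone, Debug, PartialEq)]
enum Expr {
    // Reference to an input column with a known type.
    Column { name: String, data_type: DataType },
    // Cast of an inner expression to another type.
    Cast { expr: Box<Expr>, to: DataType },
}

impl Expr {
    // Output field (name, type) produced by this expression.
    // In this toy model a cast keeps the inner name and changes only the type.
    fn to_field(&self) -> (String, DataType) {
        match self {
            Expr::Column { name, data_type } => (name.clone(), data_type.clone()),
            Expr::Cast { expr, to } => (expr.to_field().0, to.clone()),
        }
    }
}

type Schema = Vec<(String, DataType)>;

#[derive(Debug)]
struct Projection {
    exprs: Vec<Expr>,
    schema: Schema,
}

impl Projection {
    // Derive the schema from the expressions, so the two always agree.
    fn from_exprs(exprs: Vec<Expr>) -> Projection {
        let schema: Schema = exprs.iter().map(Expr::to_field).collect();
        Projection { exprs, schema }
    }

    // Trust a caller-supplied schema; nothing checks it against the expressions.
    fn with_schema(exprs: Vec<Expr>, schema: Schema) -> Projection {
        Projection { exprs, schema }
    }
}

fn col(name: &str, data_type: DataType) -> Expr {
    Expr::Column { name: name.to_string(), data_type }
}

fn cast(expr: Expr, to: DataType) -> Expr {
    Expr::Cast { expr: Box::new(expr), to }
}

fn main() {
    let original = Projection::from_exprs(vec![col("a", DataType::Int32)]);

    // Suppose some rewrite pass wrapped the column in a cast.
    let rewritten = vec![cast(col("a", DataType::Int32), DataType::Int64)];

    // Reusing the old schema: same expressions as `fresh`, but a stale type.
    let stale = Projection::with_schema(rewritten.clone(), original.schema.clone());
    // Recomputing the schema from the new expressions.
    let fresh = Projection::from_exprs(rewritten);

    assert_eq!(stale.exprs, fresh.exprs);
    assert_eq!(stale.schema[0].1, DataType::Int32);
    assert_eq!(fresh.schema[0].1, DataType::Int64);
    println!("stale: {:?}", stale.schema);
    println!("fresh: {:?}", fresh.schema);
}
```

In the sketch, both projections hold identical expressions, yet only the one built with `from_exprs` reports the type the cast actually produces.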

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Jun 8, 2023
@github-actions github-actions bot added core Core DataFusion crate optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) labels Jun 8, 2023
@jackwener jackwener changed the title expression return wrong expression fix: simplify expression sometimes need to convert type. Jun 8, 2023
Comment on lines -733 to -739
LogicalPlan::Projection(Projection { schema, .. }) => {
    Ok(LogicalPlan::Projection(Projection::try_new_with_schema(
        expr.to_vec(),
        Arc::new(inputs[0].clone()),
        schema.clone(),
    )?))
}
Member Author

There is a bug here: we shouldn't reuse the original schema, because it can change.
The original code hid some bugs.

@jackwener jackwener changed the title fix: simplify expression sometimes need to convert type. fix: from_plan shouldn't use original schema & simplify expression need to convert type. Jun 8, 2023
@jackwener jackwener force-pushed the expr branch 2 times, most recently from bd31450 to 0276537 Compare June 10, 2023 06:04
@jackwener jackwener marked this pull request as ready for review June 10, 2023 06:04
Contributor

@alamb alamb left a comment

Thank you @jackwener -- this PR is definitely heading in the right direction. Fixing the projection schema is definitely exposing a bunch of bugs.

There are a few things that don't look right in the PR that I commented on.

Thanks again for trying to make things better. It is really appreciated.

@@ -384,8 +384,7 @@ impl DFSchema {
         let self_fields = self.fields().iter();
         let other_fields = other.fields().iter();
         self_fields.zip(other_fields).all(|(f1, f2)| {
-            f1.qualifier() == f2.qualifier()
-                && f1.name() == f2.name()
+            f1.qualified_name() == f2.qualified_name()
Contributor

I am not sure if it matters, but I believe that qualified_name creates a new String where the previous version avoids that allocation.

Member Author

@jackwener jackwener Jun 11, 2023

About this, I had a new idea over the weekend: we may need to handle aliases in schema() so that it specifies the schema.

For alias('t1.a'), the field is currently qualifier: none, name: t1.a, but we want the field to be qualifier: t1, name: a.

I will do that in the future.
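
A small sketch of the trade-off discussed in this thread. The `Field` struct and both comparison helpers are hypothetical toy code, not DataFusion's actual field type, and the `qualifier.name` rendering is an assumption made for the example. It shows that comparing the rendered name allocates a `String` per call, but treats the aliased field (qualifier: none, name: t1.a) and the column field (qualifier: t1, name: a) as the same, while the part-by-part comparison does not.

```rust
// Hypothetical toy field for illustration only; not DataFusion's real type.
#[derive(Debug, Clone)]
struct Field {
    qualifier: Option<String>,
    name: String,
}

impl Field {
    fn new(qualifier: Option<&str>, name: &str) -> Self {
        Field {
            qualifier: qualifier.map(|q| q.to_string()),
            name: name.to_string(),
        }
    }

    // Renders "qualifier.name" (or just "name"); allocates a new String on every call.
    fn qualified_name(&self) -> String {
        match &self.qualifier {
            Some(q) => format!("{}.{}", q, self.name),
            None => self.name.clone(),
        }
    }
}

// Part-by-part comparison: no allocation, but (None, "t1.a") and
// (Some("t1"), "a") are considered different.
fn same_by_parts(f1: &Field, f2: &Field) -> bool {
    f1.qualifier == f2.qualifier && f1.name == f2.name
}

// Comparison on the rendered name: allocates two Strings, but both
// spellings of "t1.a" compare equal.
fn same_by_qualified_name(f1: &Field, f2: &Field) -> bool {
    f1.qualified_name() == f2.qualified_name()
}

fn main() {
    let aliased = Field::new(None, "t1.a"); // what alias('t1.a') yields today
    let column = Field::new(Some("t1"), "a"); // what the comment would like it to be

    assert!(!same_by_parts(&aliased, &column));
    assert!(same_by_qualified_name(&aliased, &column));
    println!("by parts: {}", same_by_parts(&aliased, &column));
    println!("by qualified name: {}", same_by_qualified_name(&aliased, &column));
}
```

If aliases were later split into qualifier and name as proposed, the part-by-part comparison would presumably be sufficient again and the allocation could be avoided.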

Member Author

related issue #6681

@@ -110,10 +111,10 @@ select array_position(['h', 'e', 'l', 'l', 'o'], 'l', 4), array_position([1, 2,
4 5 2

# array_positions scalar function
query III
query error DataFusion error: SQL error: ParserError\("Expected an SQL statement, found: caused"\)
Contributor

These errors don't look quite right. I think there is something wrong with sqllogictest --complete with multi-line errors 🤔

@jackwener jackwener marked this pull request as draft June 12, 2023 09:07
@jackwener jackwener changed the title fix: from_plan shouldn't use original schema & simplify expression need to convert type. fix: from_plan shouldn't use original schema Jun 13, 2023
@github-actions github-actions bot removed the optimizer Optimizer rules label Jun 13, 2023
@jackwener jackwener marked this pull request as ready for review June 13, 2023 08:05
@jackwener jackwener requested a review from alamb June 13, 2023 08:05
Contributor

@alamb alamb left a comment

Thank you @jackwener this looks like a nice improvement to me

However, it does seem to introduce a regression in some of the array functions recently introduced by @izveigor. The #6596 (comment) comment suggests to me that we need to support NULL in arrays, so it is a known issue, but it might be good to get @izveigor's opinion.

@@ -512,15 +512,22 @@ async fn test_regex_expressions() -> Result<()> {

#[tokio::test]
async fn test_cast_expressions() -> Result<()> {
test_expression!("CAST('0' AS INT)", "0");
Contributor

can we move it to slt?

Contributor

Maybe a good follow on PR

Seems like the hope was to move it to slt as part of #6210, but that wasn't completed 🤔

@izveigor
Contributor

I created a PR for the 'nulls' problem: #6662. I think it can solve the regression.

@jackwener
Member Author

jackwener commented Jun 14, 2023

I plan to merge this PR tomorrow unless there are other comments.

I will continue with more work on this.

@jackwener jackwener merged commit 36123ee into apache:main Jun 15, 2023
@jackwener jackwener deleted the expr branch June 15, 2023 09:58
jackwener added a commit to jackwener/arrow-datafusion that referenced this pull request Jul 3, 2023
jackwener added a commit that referenced this pull request Jul 4, 2023
* revert: from_plan keep same schema Project in #6595

* revert: from_plan keep same schema Agg/Window in #6820

* revert type coercion

* add comment
2010YOUY01 pushed a commit to 2010YOUY01/arrow-datafusion that referenced this pull request Jul 5, 2023
* revert: from_plan keep same schema Project in apache#6595

* revert: from_plan keep same schema Agg/Window in apache#6820

* revert type coercion

* add comment
yukkit pushed a commit to cnosdb/arrow-datafusion that referenced this pull request Jul 10, 2023
* revert: from_plan keep same schema Project in apache#6595

* revert: from_plan keep same schema Agg/Window in apache#6820

* revert type coercion

* add comment
jayzhan211 added a commit to jayzhan211/datafusion that referenced this pull request Jul 13, 2023
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
alamb pushed a commit that referenced this pull request Jul 16, 2023
* revert array.slt that changed by #6595

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* add test for to string

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* first draft

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* cleanup

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

---------

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Labels
core Core DataFusion crate logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt)
Development

Successfully merging this pull request may close these issues.

from_plan shouldn't create projection by using original schema
4 participants