Skip to content

Add Expr::try_as_col, deprecate Expr::try_into_col (speed up optimizer) #10448

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Merged
merged 2 commits into from
May 13, 2024

Conversation

alamb
Copy link
Contributor

@alamb alamb commented May 10, 2024

Which issue does this PR close?

Part of #9637

I ran into this while working on #10444

Rationale for this change

There are places in the code that check if an expr is a column by using Expr::try_into_col() which has two potential performance problems:

  1. It requires cloning the Column (and thus a string) to check if an expr is a Expr::Column
  2. It requires creating and ignoring a DataFusionError to check if an expr is not a Expr::Column

What changes are included in this PR?

  1. Add Expr::try_as_col
  2. Deprecate Expr::try_into_col
  3. Change code to use Expr::try_into_col

Are these changes tested?

By existing tests

Are there any user-facing changes?

A new function / less required copying

I doubt we will see any significant performance changes however it does reduce allocations during planning

@github-actions github-actions bot added logical-expr Logical plan and expressions optimizer Optimizer rules labels May 10, 2024
@alamb alamb marked this pull request as ready for review May 11, 2024 20:03
@alamb
Copy link
Contributor Author

alamb commented May 12, 2024

Thanks @sadboy 🙏

@alamb alamb changed the title Add Expr::try_as_col, deprecate Expr::try_into_col Add Expr::try_as_col, deprecate Expr::try_into_col (speed up optimizer) May 13, 2024
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most places seem to still be cloning it, but I guess baby steps 😅

@alamb
Copy link
Contributor Author

alamb commented May 13, 2024

Most places seem to still be cloning it, but I guess baby steps 😅

Yes indeed -- baby steps. Thank you for the review @tustvold

Another thing that would likely help non trivially would be to introduce a version of Expr::to_columns that does not requiring cloneing each Column. I'll file at ticket about that too

accumu.insert(l.try_into_col()?);
accumu.insert(r.try_into_col()?);
let Some(l) = l.try_as_col().cloned() else {
return internal_err!(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think making the error here (rather than in try_into_col) results in a more specific error message that will be easier to debug if we ever hit it

@alamb alamb merged commit 53de994 into apache:main May 13, 2024
24 checks passed
@alamb alamb deleted the alamb/as_column branch May 13, 2024 13:10
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
logical-expr Logical plan and expressions optimizer Optimizer rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants