Skip to content

RFC: add a field for Expr::Alias #6681

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
jackwener opened this issue Jun 15, 2023 · 3 comments
Closed

RFC: add a field for Expr::Alias #6681

jackwener opened this issue Jun 15, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@jackwener
Copy link
Member

jackwener commented Jun 15, 2023

Is your feature request related to a problem or challenge?

In the current optimization of Datafusion, we often use Expr::Alias instances for purposes:

  1. make the parent plan to correctly access the corresponding column.
  2. ensure that the column name remains unchanged.
  3. prevent changes to the schema.

However, there is a vulnerability in this point 3. When an alias contains a qualifier, the schema cannot remain unchanged.

Here's an example:

Original Expr: column: t.c1 (field: qualifier: t, name: c1)
New Expr: c2 as t.c1 (its field will be: field: qualifier: None, name: t.c1)

Describe the solution you'd like

I try to match Expr::Alias and handle it in to_field().

such as

    fn to_field(&self, input_schema: &DFSchema) -> Result<DFField> {
        match self {
            Expr::Alias(e, alias) if alias.contains('.') => {
                let index = alias.rfind('.').unwrap();
                let qualifier = alias[..index].to_string();
                let name = alias[index + 1..].to_string();
                Ok(DFField::new(
                    Some(qualifier),
                    &name,
                    e.get_type(input_schema)?,
                    e.nullable(input_schema)?,
                ))
            }
            Expr::Column(c) => Ok(DFField::new(
                c.relation.clone(),
                &c.name,
                self.get_type(input_schema)?,
                self.nullable(input_schema)?,
            )),

but I failed, because alias can be complex, we hard to parse it and separate qualifier and name.
such as above code will fail in expr like alias is Cast(t1.a as int)

So, I proposal to add a DFField in Expr::Alias, so we will get DFField of Expr::Alias directly

Describe alternatives you've considered

No response

Additional context

I find it in this PR #6595 (comment)

@alamb
Copy link
Contributor

alamb commented Jun 15, 2023

I think this plan makes sense to me -- though perhaps we should use DFFieldRef to avoid so many copies 🤔

@jackwener
Copy link
Member Author

jackwener commented Jun 16, 2023

though perhaps we should use DFFieldRef to avoid so many copies

Agree with it. The method I said above just explain the idea.

@jonahgao
Copy link
Member

Closed it since it was finished by qualified alias.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants