Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[FEA] UDF-Compiler: Translation of simple predicate UDF should allow predicate pushdown #3985

Closed
ywang529 opened this issue Nov 1, 2021 · 2 comments · Fixed by #5315
Closed
Assignees
Labels
feature request New feature or request

Comments

@ywang529
Copy link

ywang529 commented Nov 1, 2021

What is your question?
Hi, It is a great open-source project, thanks for sharing. We have been evaluating the udf-compiler, it works great as advertised, but we run into a problem when we try to test a filter pushdown. We wrote a simple UDF as below, but the PushedFilters is empty with udf-compiler:

    spark.udf.register("isOlderThan20", (i: Int) => {i>20})
    val udfResult = spark.sql("SELECT * FROM people_with_schema WHERE isOlderThan20(age)")
    
   +- FileScan json [name#0,age#1] Batched: false, DataFilters: [if ((age#1 > 20)) true else false], ..... PushedFilters: [], 

What we expected is to have a physical plan like this:

    val udfResult = spark.sql("SELECT * FROM people_with_schema WHERE age > 20")
   
    !+- Filter (isnotnull(age#1) AND (age#1 > 20))                                                                                                                                                                                                                                                     
    +- FileScan ...PushedFilters: [IsNotNull(age), GreaterThan(age,20)], 

By turning on trace, it looks like that the udf-compiler translated this udf as "if-else", instead of just "age#1 >20":
Filter if ((age#1 > 20)) true else false

Wonder if this is by design? If not, wonder if you can suggest some ideas to us on how to fix this?

Thank you very much!
-Yong Wang

@ywang529 ywang529 added ? - Needs Triage Need team to review and classify question Further information is requested labels Nov 1, 2021
@Salonijain27 Salonijain27 removed the ? - Needs Triage Need team to review and classify label Nov 2, 2021
@jlowe jlowe added feature request New feature or request and removed question Further information is requested labels Nov 3, 2021
@jlowe jlowe changed the title [QST] UDF-Compiler: Predicate pushdown is not working in this case [FEA] UDF-Compiler: Translation of simple predicate UDF should allow predicate pushdown Nov 3, 2021
@jlowe
Copy link
Contributor

jlowe commented Nov 3, 2021

Thanks for the report! I verified with a vanilla Spark session that specifying an expression as an IF statement is sufficient to defeat the predicate pushdown into the load.

Migrating this into a feature request to add support for predicate pushdown for Catalyst operations resulting from the UDF compiler.

@ywang529
Copy link
Author

ywang529 commented Nov 3, 2021

Thank you very much for the help!

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
feature request New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants