
Fix regression from 21.12 where udfs defined in repl no longer worked #5030

Merged
merged 4 commits into NVIDIA:branch-22.04 from udf/fix_repl_issue on Mar 25, 2022

Conversation

abellina
Collaborator

Signed-off-by: Alessandro Bellina abellina@nvidia.com

Closes #5019.

Reverts a localized change to the udf-compiler that was introduced in #3726.

With ShimLoader.loadClass, we couldn't resolve a UDF defined in the REPL, as documented in #5019.
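
For context, a rough sketch of the class-loading distinction at play (illustrative only, not the plugin's actual code): the Scala REPL defines classes for each snippet in its own child class loader, so a lookup pinned to some other loader cannot see them, while a lookup through the loader that defined the class can.

class NotUdf extends (Boolean => Boolean) {
  def apply(x: Boolean): Boolean = !x
}
val myudf = new NotUdf

// In spark-shell, NotUdf only exists in the REPL's class loader, so a lookup
// through a fixed loader fails (pluginLoader is a stand-in, not a real identifier):
//   Class.forName(myudf.getClass.getName, true, pluginLoader)  // ClassNotFoundException

// Asking the loader that actually defined the class succeeds:
val udfClass = Class.forName(myudf.getClass.getName, true, myudf.getClass.getClassLoader)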

Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
@abellina
Collaborator Author

build

@wjxiz1992
Collaborator

Should we add some tests for such a case?

@abellina
Collaborator Author

Should we add some tests for such a case?

I am not sure this is worth it, and honestly I do not know how to replicate the REPL in a test. I guess I could define a lambda in a different class loader, but then again I am not sure if it is worthwhile.
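
For what it's worth, a rough sketch of how "a lambda in a different class loader" could be simulated in a test, assuming a hypothetical pre-built jar at /tmp/not-udf.jar containing a hypothetical com.example.NotUdf that implements Boolean => Boolean; this is just an illustration of the idea, not an existing test:

import java.net.URLClassLoader
import org.apache.spark.sql.functions.udf

// Child loader: com.example.NotUdf is visible here but not to the application
// class loader, which is roughly the situation for classes generated by the REPL.
val jarUrl = new java.io.File("/tmp/not-udf.jar").toURI.toURL
val childLoader = new URLClassLoader(Array(jarUrl), getClass.getClassLoader)

val fn = childLoader
  .loadClass("com.example.NotUdf")
  .getDeclaredConstructor()
  .newInstance()
  .asInstanceOf[Boolean => Boolean]

// The udf-compiler must resolve fn's class through fn.getClass.getClassLoader,
// not a loader of its own choosing, for this UDF to translate.
val u = udf(fn)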

gerashegalov previously approved these changes Mar 25, 2022
Collaborator

@gerashegalov left a comment


LGTM
It would be great if we could also manually verify a non-REPL use of a Scala UDF in a Spark job. We don't have Scala integration tests.

…ion.scala

Co-authored-by: Gera Shegalov <gshegalov@nvidia.com>
@abellina
Collaborator Author

abellina commented Mar 25, 2022

@gerashegalov

It would be great if we could also manually verify a non-REPL use of a Scala UDF in a Spark job. We don't have Scala integration tests.

Sorry it took me a little while. We do have the OpcodeSuite unit tests, which cover all of the available translations (a Spark session is created there). I also created a simple app and submitted it with spark-submit against a standalone cluster:

package com.nvidia.spark.udf

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("Simple Application").getOrCreate()
    import spark.implicits._
    val dataset = List(true, false, true, false).toDS().repartition(1)
    val myudf: Boolean => Boolean = { x => !x }
    val u = udf(myudf)

    // First run: the udf-compiler is not enabled, so the UDF stays opaque.
    var result = dataset.withColumn("new", u('value))
    result.explain(true)
    result.show()

    // Second run: enable the udf-compiler, so the UDF should be translated
    // into Catalyst expressions.
    spark.conf.set("spark.rapids.sql.udfCompiler.enabled", "true")
    result = dataset.withColumn("new", u('value))
    result.explain(true)

    result.show()
    spark.stop()
  }
}

With the UDF compiler disabled:

== Physical Plan ==
*(1) Project [value#1, UDF(value#1) AS new#8]
+- Exchange RoundRobinPartitioning(1), REPARTITION_WITH_NUM, [id=#9]
   +- LocalTableScan [value#1]

With the UDF compiler enabled:

== Physical Plan ==
*(1) Project [value#1, if (NOT value#1) true else false AS new#23]
+- Exchange RoundRobinPartitioning(1), REPARTITION_WITH_NUM, [id=#41]
   +- LocalTableScan [value#1]
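
For reference, a minimal sketch of what an automated check could look like, reusing spark, dataset, and u from the app above; the names and assertion style are illustrative, not the actual OpcodeSuite API:

// Hypothetical assertion: with the udf-compiler enabled, the opaque UDF call
// should be replaced by Catalyst expressions and disappear from the plan.
spark.conf.set("spark.rapids.sql.udfCompiler.enabled", "true")
val compiledPlan = dataset.withColumn("new", u('value))
  .queryExecution.executedPlan.toString
assert(!compiledPlan.contains("UDF("))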

@abellina
Collaborator Author

build

@abellina abellina merged commit 35d777e into NVIDIA:branch-22.04 Mar 25, 2022
@abellina abellina deleted the udf/fix_repl_issue branch March 25, 2022 23:33
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] udf compiler failed to translate UDF in spark-shell
4 participants