
[BUG][AUDIT] SPARK-45652 - SPJ: Handle empty input partitions after dynamic filtering #9743

Closed
tgravescs opened this issue Nov 16, 2023 · 1 comment · Fixed by #9962
Labels: audit_3.4.2, audit_3.5.1, audit_4.0.0, bug

Comments

@tgravescs (Collaborator)

Describe the bug
https://issues.apache.org/jira/browse/SPARK-45652 was part of the storage partitioned join feature. It changes BatchScanExec, for which we have equivalent code in GpuBatchScanExec, so we should pull this change in.

apache/spark@75aed566018
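
For context, the behavior the upstream patch addresses can be sketched in plain Scala with stand-in types (an illustrative outline only, not the actual BatchScanExec code): when runtime filtering removes every input split from a key-grouped scan, the scan should still expose one (possibly empty) group per reported partition, so the RDD layout matches the declared KeyGroupedPartitioning.

// Illustrative sketch only; InputPartition here is a plain stand-in type.
object EmptyPartitionSketch {
  type InputPartition = String                 // stand-in for a DSv2 InputPartition
  type Group = Seq[InputPartition]             // splits grouped by join key

  // numGroups is what the scan already reported via its KeyGroupedPartitioning.
  def groupAfterRuntimeFiltering(
      filtered: Seq[(Int, InputPartition)],    // (group index, surviving split)
      numGroups: Int): Seq[Group] = {
    if (filtered.isEmpty) {
      // The gist of the fix: emit the declared number of (empty) groups instead of
      // collapsing to zero partitions, so the scan output still matches the
      // reported output partitioning.
      Seq.fill(numGroups)(Seq.empty)
    } else {
      (0 until numGroups).map { k => filtered.collect { case (`k`, p) => p } }
    }
  }

  def main(args: Array[String]): Unit = {
    println(groupAfterRuntimeFiltering(Nil, numGroups = 3))
    // List(List(), List(), List())
    println(groupAfterRuntimeFiltering(Seq(0 -> "a", 2 -> "b"), numGroups = 3))
    // Vector(List(a), List(), List(b))
  }
}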

@tgravescs added the bug, ? - Needs Triage, audit_3.4.2, audit_3.5.1, and audit_4.0.0 labels Nov 16, 2023
@sameerz removed the ? - Needs Triage label Nov 21, 2023
@razajafri (Collaborator)

I have tried reproducing this, but I am unable to see it in local mode. Here are the steps I took:

$> spark-homes/spark-3.4.1-bin-hadoop3/bin/spark-shell --conf spark.sql.autoBroadcastJoinThreshold="-1" --conf spark.sql.adaptive.enabled="false" --conf spark.sql.optimizer.dynamicPartitionPruning.enabled="true" --conf spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly="false" --conf spark.sql.optimizer.dynamicPartitionPruning.fallbackFilterRatio="10"

scala> import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.internal.SQLConf
scala> spark.sql("create table test_purchases (item_id long, price float, time timestamp) using parquet partitioned by (item_id)")
scala> spark.sql("create table test_table (id long, name string, price float, arrive_time timestamp) using parquet partitioned by (id)")
scala> spark.sql("insert into test_purchases values((19.5, cast('2020-02-01' as timestamp), 3))")
scala> spark.sql("insert into test_purchases values((11.0, cast('2020-01-01' as timestamp), 2))")
scala> spark.sql("insert into test_purchases values((45.0, cast('2020-01-15' as timestamp), 1))")
scala> spark.sql("insert into test_purchases values((44.0, cast('2020-01-15' as timestamp), 1))")
scala> spark.sql("insert into test_purchases values((42.0, cast('2020-01-01' as timestamp), 1))")
scala> spark.sql("insert into test_table values(('cc', 15.5, cast('2020-02-01' as timestamp), 3))")
scala> spark.sql("insert into test_table values(('bb', 10.5, cast('2020-01-01' as timestamp), 2))")
scala> spark.sql("insert into test_table values(('bb', 10.0, cast('2020-01-01' as timestamp), 2))")
scala> spark.sql("insert into test_table values(('aa', 41.0, cast('2020-01-15' as timestamp), 1))")
scala> spark.sql("insert into test_table values(('aa', 40.0, cast('2020-01-01' as timestamp), 1))")
scala> Seq(true, false).foreach { pushDownValues =>
     | Seq(true, false).foreach { partiallyClustered => {
     | spark.conf.set(SQLConf.V2_BUCKETING_PARTIALLY_CLUSTERED_DISTRIBUTION_ENABLED.key, partiallyClustered)
     | spark.conf.set(SQLConf.V2_BUCKETING_PUSH_PART_VALUES_ENABLED.key, pushDownValues)
     | spark.sql("select /*+ REPARTITION(3)*/ p.price from test_table i, test_purchases p WHERE i.id = p.item_id AND i.price > 50.0").show
     | }
     | }
     | }
+-----+
|price|
+-----+
+-----+

+-----+
|price|
+-----+
+-----+

+-----+
|price|
+-----+
+-----+

+-----+
|price|
+-----+
+-----+

After discussing this with @tgravescs, the next step he suggested was to build Spark 3.4.1 locally and run the unit tests.
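
Before rebuilding Spark, it may also be worth confirming that the repro query actually goes through BatchScanExec at all, since this change only affects the DataSourceV2 scan path. A minimal check, assuming the same spark-shell session and the tables created above:

scala> val plan = spark.sql("select p.price from test_table i, test_purchases p WHERE i.id = p.item_id").queryExecution.executedPlan
scala> plan.toString.contains("BatchScan")   // false would suggest the query is not using a DSv2 BatchScan, so the affected code path is never exercised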
