[Audit][BUG] Ensure GPU handles user specified repartition in Spark 3.4 when AQE is enabled #6678

NVnavkumar · 2022-10-03T16:00:00Z

Describe the bug
Spark updated AQE to respect a user-specified repartition (e.g. using df.repartition(<num>)) in the output when AQE is enabled. Previously, Spark did not have to respect this when AQE was enabled, because AQE optimization by definition could never fully respect this partitioning after a shuffle. Spark now ensures that the number of output partitions matches what the user requested via repartition by adjusting the shuffle partitions afterwards.
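A minimal sketch of the kind of check this audit calls for, assuming a PySpark session is available. The helper names `enable_aqe` and `repartition_is_respected` are hypothetical and are not part of Spark or the plugin; the sketch only compares the requested partition count with the one reported by the final RDD.

```python
def enable_aqe(spark):
    """Turn on Adaptive Query Execution for the given SparkSession."""
    spark.conf.set("spark.sql.adaptive.enabled", "true")


def repartition_is_respected(df, num_partitions):
    """Check whether a user-specified repartition survives to the output.

    Returns a tuple (matches, actual), where `actual` is the partition
    count reported by the repartitioned DataFrame's underlying RDD.
    """
    repartitioned = df.repartition(num_partitions)
    actual = repartitioned.rdd.getNumPartitions()
    return actual == num_partitions, actual
```

With a real session, one would call `enable_aqe(spark)` and then something like `repartition_is_respected(spark.range(0, 1000), 7)`, once on the CPU and once with the GPU plugin enabled, and compare the results. The row count and partition count here are arbitrary illustration values.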

Spark commit - apache/spark@801ca252f4

sameerz · 2022-10-04T20:08:39Z

We need to run tests to validate that we are not impacted.

NVnavkumar · 2022-10-04T20:49:58Z

Somewhat related: this issue was filed for Spark 3.1 a while ago: #1527

NVnavkumar added the labels bug (Something isn't working), ? - Needs Triage (Need team to review and classify), audit_3.4.0 (Audit related tasks for 3.4.0), and Spark 3.4+ (Spark 3.4+ issues) on Oct 3, 2022

sameerz added the label P1 (Nice to have for release) and removed the label ? - Needs Triage (Need team to review and classify) on Oct 4, 2022

NVnavkumar mentioned this issue May 17, 2023

[EPIC] Spark 3.4 Remaining Functionality #8316

Open

27 tasks
