[Audit][BUG] Ensure GPU handles user specified repartition in Spark 3.4 when AQE is enabled #6678

Open
NVnavkumar opened this issue Oct 3, 2022 · 2 comments
Labels
audit_3.4.0 (Audit related tasks for 3.4.0), bug (Something isn't working), P1 (Nice to have for release), Spark 3.4+ (Spark 3.4+ issues)

Comments

@NVnavkumar
Collaborator

Describe the bug
Spark updated AQE to handle a user-specified repartition (e.g., using df.repartition(<num>)) in the output when AQE is enabled. Previously, Spark did not have to respect this when AQE was enabled, because AQE optimization by definition could never fully respect this partitioning after a shuffle. Spark now ensures that the output partitions match what the user requested with repartition by adjusting the shuffle partitions afterwards.

Spark commit - apache/spark@801ca252f4
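
One way to check the behavior described above is to run the same query on CPU and on GPU and compare the output partition counts with the requested number. The following is a minimal PySpark sketch, not the project's actual test. The helper names (aqe_conf, final_partition_count, run_check), the query shape, and the partition count of 7 are all hypothetical. It also assumes that the RAPIDS plugin is already loaded by the launcher (for example through spark.plugins) and that building the RDD causes AQE to finalize the plan. The issue does not say which plan shapes the upstream commit affects, so this covers only the simplest case.

```python
def aqe_conf(use_gpu):
    """Session settings for one run. The keys are existing Spark / RAPIDS
    configs; the values chosen here are illustrative."""
    return {
        "spark.sql.adaptive.enabled": "true",
        "spark.sql.adaptive.coalescePartitions.enabled": "true",
        "spark.rapids.sql.enabled": "true" if use_gpu else "false",
    }


def final_partition_count(spark, num_partitions, rows=100_000):
    """Run an aggregation followed by a user-specified repartition and
    return the number of output partitions."""
    df = (
        spark.range(rows)
        .selectExpr("id % 100 AS k")
        .groupBy("k")          # upstream shuffle that AQE is free to coalesce
        .count()
        .repartition(num_partitions)  # user-specified output partitioning
    )
    # Assumption: building the RDD makes AQE finalize the plan, so this
    # count reflects the partitioning after AQE has been applied.
    return df.rdd.getNumPartitions()


def run_check(num_partitions=7):
    """Hypothetical driver: collect the CPU and GPU partition counts."""
    # Imported lazily so the helpers above can be imported without Spark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("aqe-repartition-check").getOrCreate()
    counts = {}
    for use_gpu in (False, True):
        for key, value in aqe_conf(use_gpu).items():
            spark.conf.set(key, value)
        label = "gpu" if use_gpu else "cpu"
        counts[label] = final_partition_count(spark, num_partitions)
    spark.stop()
    return counts
```

Calling run_check() from a job submitted with the plugin jar returns a dictionary with one count per mode. If both CPU and GPU honor the user-specified repartition, both counts should equal the requested number; a mismatch on the GPU side would indicate that the plugin is impacted.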

NVnavkumar added the bug, ? - Needs Triage, audit_3.4.0, and Spark 3.4+ labels on Oct 3, 2022
sameerz added the P1 label and removed the ? - Needs Triage label on Oct 4, 2022
@sameerz
Collaborator

sameerz commented Oct 4, 2022

We need to run tests to validate that we are not impacted.

@NVnavkumar
Collaborator Author

Sort of related, this issue was filed for Spark 3.1 a while ago: #1527
