[SPARK-32820][SQL] Remove redundant shuffle exchanges inserted by EnsureRequirements #29677

sarutak · 2020-09-08T09:36:59Z

What changes were proposed in this pull request?

This PR changes EnsureRequirements to let it remove redundant ShuffleExchange.
Normally, redundant repartition operations are removed by CollapseRepartition rule but EnsureRequirements can insert another HashPartitioning or RangePartitioning immediately after the repartition, leading adjacent ShuffleExchange will be in the physical plan.
Even if their outputPartitioning are different, those adjacent ShuffleExcnahge are redundant.

An example.

val ordered = spark.range(1, 100).repartitionByRange(10, $"id".desc).orderBy($"id")
ordered.explain(true)

...

== Physical Plan ==
*(2) Sort [id#0L ASC NULLS FIRST], true, 0
+- Exchange rangepartitioning(id#0L ASC NULLS FIRST, 200), true, [id=#15]
   +- Exchange rangepartitioning(id#0L DESC NULLS LAST, 10), false, [id=#14]
      +- *(1) Range (1, 100, step=1, splits=12)

In this case, the lower Exchange for rangepartitioning is redundant.

Another example.

spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 0)
val left = Seq(1,2,3).toDF.repartition(10)
val right = Seq(1,2,3).toDF
val joined = left.join(right, left("value") + 1 === right("value"))
joined.explain(true)

...

== Physical Plan ==
*(3) SortMergeJoin [(value#11 + 1)], [value#1], Inner
:- *(1) Sort [(value#11 + 1) ASC NULLS FIRST], false, 0
:  +- Exchange hashpartitioning((value#11 + 1), 200), true, [id=#41]
:     +- Exchange RoundRobinPartitioning(10), false, [id=#37]
:        +- LocalTableScan [value#11]
+- *(2) Sort [value#1 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(value#1, 200), true, [id=#42]
      +- LocalTableScan [value#1]

In this case, the lower Exchange for RoundRobinPartitioninginserted byEnsureRequirements` for the left side is not necessary.

Why are the changes needed?

To remove unnecessary shuffle.

Does this PR introduce any user-facing change?

Yes. After this change, such redundant ShuffleExchange will be removed.

== Physical Plan ==
*(2) Sort [id#0L ASC NULLS FIRST], true, 0
+- Exchange rangepartitioning(id#0L ASC NULLS FIRST, 200), true, [id=#14]
   +- *(1) Range (1, 100, step=1, splits=12)

== Physical Plan ==
*(3) SortMergeJoin [(value#6 + 1)], [value#1], Inner
:- *(1) Sort [(value#6 + 1) ASC NULLS FIRST], false, 0
:  +- Exchange hashpartitioning((value#6 + 1), 200), true, [id=#16]
:     +- LocalTableScan [value#6]
+- *(2) Sort [value#1 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(value#1, 200), true, [id=#17]
      +- LocalTableScan [value#1]

How was this patch tested?

New tests.

…ges.

maropu · 2020-09-08T12:24:39Z

cc: @c21 @imback82

maropu · 2020-09-08T12:25:40Z

sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala

@@ -52,7 +52,8 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] {
      case (child, distribution) =>
        val numPartitions = distribution.requiredNumPartitions
          .getOrElse(conf.numShufflePartitions)
-        ShuffleExchangeExec(distribution.createPartitioning(numPartitions), child)
+        val newChild = if (child.isInstanceOf[ShuffleExchangeExec]) child.children.head else child


Could you add some comments about why we can remove it?

SparkQA · 2020-09-08T15:09:15Z

Test build #128397 has finished for PR 29677 at commit 1effe75.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-09-08T19:01:10Z

Test build #128407 has finished for PR 29677 at commit 0da9e82.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

c21

Thanks @sarutak for making this change. I have a question whether this optimization should be done on user side or on system side.

EnsureRequirements will add shuffle/sort when it's necessary to add, but will not remove the shuffle/sort added explicitly by users (DISTRIBUTE BY/SORT BY in SQL, repartitionByRange/orderBy in dataframe, etc). Users can choose to remove these repartitionByRange/orderBy in query by themselves to save the shuffle/sort, as they are not necessary to add. E.g. we can have more complicated case if user don't do the right thing: spark.range(1, 100).repartitionByRange(10, $"id".desc).repartitionByRange(10, $"id").orderBy($"id"), should we also handle these cases?

I vaguely remember removing the redundant shuffle exchange/sort explicitly added by users in query, is a won't fix. But I cannot find the old PR now, cc @cloud-fan . Thanks.

sarutak · 2020-09-09T22:42:07Z

@c21 Thanks for the comment.

Users can choose to remove these repartitionByRange/orderBy in query by themselves to save the shuffle/sort, as they are not necessary to add.

Yes, user can choose it but it requires users to understand how Spark and Spark SQL work internally and some distributed computing knowledge beforehand.
Also, if a data processing logic or query is very complex, it will be difficult for users to judge which repartition operations can be removed.
Should Spark hide complexity for users?

E.g. we can have more complicated case if user don't do the right thing: spark.range(1, 100).repartitionByRange(10, $"id".desc).repartitionByRange(10, $"id").orderBy($"id"), should we also handle these cases?

Actually, this case is already handled by CollapseRepartition rule.

sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala

SparkQA · 2020-09-11T07:05:02Z

Test build #128550 has finished for PR 29677 at commit 5680d48.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

sarutak · 2020-09-11T07:10:05Z

retest this please.

SparkQA · 2020-09-11T12:18:53Z

Test build #128557 has finished for PR 29677 at commit 5680d48.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

sarutak · 2020-09-25T02:00:56Z

@c21 @imback82 @maropu @HyukjinKwon
Any other feedback for this change?

HyukjinKwon · 2020-09-28T07:12:44Z

retest this please

HyukjinKwon · 2020-09-28T07:58:04Z

sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala

-        ShuffleExchangeExec(distribution.createPartitioning(numPartitions), child)
+        // Like optimizer.CollapseRepartition removes adjacent repartition operations,
+        // adjacent repartitions performed by shuffle can be also removed.
+        val newChild = if (child.isInstanceOf[ShuffleExchangeExec]) child.children.head else child


This reminds me of #26946. cc @cloud-fan, @maryannxue and @stczwd FYI

To avoid the case @HyukjinKwon pointed out above, it seems we need to check if outputPartitioning is the same for narrowing down this optimization scope.

Btw, in my opinion, to avoid complicating the EnsureRequirements rule more, it would be better to remove these kinds of redundant shuffles in a new rule after EnsureRequirements like #27096.

it seems we need to check if outputPartitioning is the same for narrowing down this optimization scope.

Do you mean we should check whether outputPartitioning of ShuffleExchangeExec to be inserted and the one of the existing ShuffleExchangeExec?
If you mean so, it should already match this condition.

Just removing the existing ShuffleExchange and inserting the new ShuffleExchange whose outputPartitioning satisfies the required Distribution works, doesn't it?

Btw, in my opinion, to avoid complicating the EnsureRequirements rule more, it would be better to remove these kinds of redundant shuffles in a new rule after EnsureRequirements like #27096.

Having a new rule sounds better.

Do you mean we should check whether outputPartitioning of ShuffleExchangeExec to be inserted and the one of the existing ShuffleExchangeExec?

I didn't mean so; have you checked #26946? This PR currently removes shuffles incrrectly;

scala> spark.range(1).selectExpr("id as a").write.saveAsTable("test") scala> sql("SELECT /*+ REPARTITION(5) */ * FROM test ORDER BY a").explain() == Physical Plan == *(2) Sort [a#5L ASC NULLS FIRST], true, 0 +- Exchange rangepartitioning(a#5L ASC NULLS FIRST, 200), true, [id=#53] +- Exchange RoundRobinPartitioning(5), false, [id=#52] <--- !!! Removed? !!! +- *(1) ColumnarToRow +- FileScan parquet default.test[a#5L] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-master/spark-warehouse/test], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<a:bigint>

I have considered such a case but if a shuffle/partitioning performs after its immediate child of another shuffle/partitioning, is the child shuffle/partitioning meaningful?
In the example above, the result is not different regardless of RoundRobinPartitioning removed or not right?

I checked #26946 and I understand that the root cause of that issue was the bug about how to handle hints in the parser so the approach which uses optimization was wrong and incomplete for that issue.

The solutions are similar between this PR and that PR but the issue is different.
This PR just focuses on removing redundant shuffles.

I might misunderstand about what you point out and that PR, please correct me if my understanding is wrong.

Ah, I got your point and it makes sense. Coud you update the PR description? The current one only describes the same output partioning cases.

Yeah, I've updated. Thanks.

SparkQA · 2020-09-28T08:17:37Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33786/

SparkQA · 2020-09-28T08:36:09Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33786/

SparkQA · 2020-09-28T11:37:44Z

Test build #129171 has finished for PR 29677 at commit 5680d48.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

…redundant-shuffleexchange

SparkQA · 2020-10-01T03:57:33Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33914/

SparkQA · 2020-10-01T04:15:41Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33914/

SparkQA · 2020-10-01T08:47:41Z

Test build #129298 has finished for PR 29677 at commit e699cb6.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-10-04T22:47:39Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34003/

SparkQA · 2020-10-04T23:09:01Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34003/

sarutak · 2020-10-04T23:15:14Z

retest this please.

SparkQA · 2020-10-05T00:00:12Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34004/

SparkQA · 2020-10-05T00:17:18Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34004/

SparkQA · 2020-10-05T02:42:46Z

Test build #129396 has finished for PR 29677 at commit d6e8cbe.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class DisableUnnecessaryBucketedScan(conf: SQLConf) extends Rule[SparkPlan]
abstract class JdbcConnectionProvider

tanelk · 2020-10-05T03:18:13Z

sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/PruneShuffle.scala

+case class PruneShuffle() extends Rule[SparkPlan] {
+
+  override def apply(plan: SparkPlan): SparkPlan = plan.transform {
+    case op @ ShuffleExchangeExec(_, child: ShuffleExchangeExec, _) =>
+      op.withNewChildren(Seq(pruneShuffle(child)))
+    case other => other
+  }
+
+  private def pruneShuffle(plan: SparkPlan): SparkPlan = {
+    plan match {
+      case shuffle: ShuffleExchangeExec =>
+        pruneShuffle(shuffle.child)
+      case other => other
+    }
+  }
+}


I'd propose a more concise way:

Suggested change

case class PruneShuffle() extends Rule[SparkPlan] {

override def apply(plan: SparkPlan): SparkPlan = plan.transform {

case op @ ShuffleExchangeExec(_, child: ShuffleExchangeExec, _) =>

op.withNewChildren(Seq(pruneShuffle(child)))

case other => other

}

private def pruneShuffle(plan: SparkPlan): SparkPlan = {

plan match {

case shuffle: ShuffleExchangeExec =>

pruneShuffle(shuffle.child)

case other => other

}

}

}

case class PruneShuffle() extends Rule[SparkPlan] {

override def apply(plan: SparkPlan): SparkPlan = plan transformUp {

case op @ ShuffleExchangeExec(_, ShuffleExchangeExec(_, grandchild, _), _) =>

op.withNewChildren(grandchild :: Nil)

}

}

Hmm, in the current implementation, at most two ShuffleExchangeExec can be consecutive. But if it is more consecutive in the future, transformUp is not efficient.

TBH performance wise we are talking about millisecond level optimization here. I would value readability over micro optimizations.

Similar to what is discussed here, I'd like to avoid unnecessary transformation.

SparkQA · 2020-10-05T03:43:58Z

Test build #129397 has finished for PR 29677 at commit d6e8cbe.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class DisableUnnecessaryBucketedScan(conf: SQLConf) extends Rule[SparkPlan]
abstract class JdbcConnectionProvider

sarutak · 2020-10-28T02:00:12Z

cc: @cloud-fan

SparkQA · 2020-10-28T02:50:26Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34949/

SparkQA · 2020-10-28T03:13:47Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34949/

SparkQA · 2020-10-28T06:35:22Z

Test build #130347 has finished for PR 29677 at commit 95454a2.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

github-actions · 2021-02-06T00:42:55Z

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

Changed EnsureRequirements to let it removes redundant shuffle exchan…

840ff97

…ges.

probot-autolabeler bot added the SQL label Sep 8, 2020

Removed a blank line.

1effe75

maropu reviewed Sep 8, 2020

View reviewed changes

Added comments.

0da9e82

c21 reviewed Sep 8, 2020

View reviewed changes

HyukjinKwon reviewed Sep 10, 2020

View reviewed changes

sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala Outdated Show resolved Hide resolved

sarutak added 2 commits September 11, 2020 13:48

Resolve conflict.

1e3216e

Added JIRA number prefix to the test.

5680d48

HyukjinKwon reviewed Sep 28, 2020

View reviewed changes

sarutak added 3 commits September 29, 2020 14:47

Merge branch 'master' of https://github.com/apache/spark into remove-…

ebdec14

…redundant-shuffleexchange

Added PruneShuffle rule to prune unnecessary shuffles.

23c8b70

Removed unnecessary assertions.

e699cb6

Fix a conflict.

d6e8cbe

tanelk reviewed Oct 5, 2020

View reviewed changes

Resolved conflict.

95454a2

github-actions bot added the Stale label Feb 6, 2021

github-actions bot closed this Feb 7, 2021

[SPARK-32820][SQL] Remove redundant shuffle exchanges inserted by EnsureRequirements #29677

[SPARK-32820][SQL] Remove redundant shuffle exchanges inserted by EnsureRequirements #29677

Conversation

sarutak commented Sep 8, 2020 • edited Loading

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

maropu commented Sep 8, 2020

Choose a reason for hiding this comment

SparkQA commented Sep 8, 2020

SparkQA commented Sep 8, 2020

c21 left a comment

Choose a reason for hiding this comment

sarutak commented Sep 9, 2020 • edited Loading

SparkQA commented Sep 11, 2020

sarutak commented Sep 11, 2020

SparkQA commented Sep 11, 2020

sarutak commented Sep 25, 2020

HyukjinKwon commented Sep 28, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sarutak Sep 29, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sarutak Sep 29, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

SparkQA commented Sep 28, 2020

SparkQA commented Sep 28, 2020

SparkQA commented Sep 28, 2020

SparkQA commented Oct 1, 2020

SparkQA commented Oct 1, 2020

SparkQA commented Oct 1, 2020

SparkQA commented Oct 4, 2020

SparkQA commented Oct 4, 2020

sarutak commented Oct 4, 2020

SparkQA commented Oct 5, 2020

SparkQA commented Oct 5, 2020

SparkQA commented Oct 5, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

SparkQA commented Oct 5, 2020

sarutak commented Oct 28, 2020

SparkQA commented Oct 28, 2020

SparkQA commented Oct 28, 2020

SparkQA commented Oct 28, 2020

github-actions bot commented Feb 6, 2021

sarutak commented Sep 8, 2020 •

edited

Loading

sarutak commented Sep 9, 2020 •

edited

Loading

sarutak Sep 29, 2020 •

edited

Loading

sarutak Sep 29, 2020 •

edited

Loading