Improve performance of Sort for the common single batch use case #10572

revans2 · 2024-03-12T15:10:44Z

This fixes #10570

The performance speed up is not that huge, but it is there.

I ran

spark.time(spark.range(0, 100000000L, 1, 12).selectExpr("id as oc2", "id DIV 10 as oc3", "CAST(id * 10 AS STRING) as oc", "CAST(id % 4 AS STRING) as pc", "id").selectExpr("*", "row_number() over (PARTITION BY pc ORDER BY oc, oc2, oc3) as rn", "max(id) OVER (PARTITION BY pc ORDER BY oc, oc2, oc3 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as m").orderBy(desc("oc")).show())

both with and without this patch, and captured the sort time metric and the total run time of the query for each run.

With this patch, on my desktop, the median run time was 6968 ms, and from the Spark UI the median sort time was 5.5 seconds.
Without this patch the run time was 7051 ms and the sort time was 5.9 seconds. That saves 83 ms of total run time (about 1%, which is not really that huge), but the sort time metric showed about 0.4 seconds of savings, or about 6% less time, which is a lot better.
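For anyone who wants to recheck the percentages, here is a small sketch that recomputes them from the rounded medians quoted above. `SortSavings` and `percentSaved` are names made up for this example. Because the inputs are rounded, the sort time percentage comes out closer to 7% than to the quoted 6%; the unrounded measurements may well give a slightly different figure.

```scala
object SortSavings {
  // Rounded medians quoted in the description above.
  val runWithPatchMs = 6968.0
  val runWithoutPatchMs = 7051.0
  val sortWithPatchSec = 5.5
  val sortWithoutPatchSec = 5.9

  // Relative saving as a percentage of the "without patch" baseline.
  def percentSaved(before: Double, after: Double): Double =
    (before - after) / before * 100.0

  def main(args: Array[String]): Unit = {
    val runSavedMs = runWithoutPatchMs - runWithPatchMs
    val runPct = percentSaved(runWithoutPatchMs, runWithPatchMs)
    val sortPct = percentSaved(sortWithoutPatchSec, sortWithPatchSec)
    println(f"total run time saved: $runSavedMs%.0f ms ($runPct%.1f%%)")
    println(f"sort time saved: $sortPct%.1f%%")
  }
}
```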

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>

revans2 · 2024-03-12T15:10:54Z

build

gerashegalov · 2024-03-12T21:35:57Z

sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuSortExec.scala

+    val spillableIter = iter.flatMap { cb =>
+        // Filter out empty batches and make them spillable
+        if (cb.numRows() > 0) {
+          Some(SpillableColumnarBatch(cb, SpillPriorities.ACTIVE_ON_DECK_PRIORITY))
+        } else {
+          cb.close()
+          None
+        }
+    }


If a single batch is the common case, this does not seem to matter, but if we needed to avoid allocating the intermediate Option, we could use an explicit PartialFunction instance:

Suggested change

-    val spillableIter = iter.flatMap { cb =>
-        // Filter out empty batches and make them spillable
-        if (cb.numRows() > 0) {
-          Some(SpillableColumnarBatch(cb, SpillPriorities.ACTIVE_ON_DECK_PRIORITY))
-        } else {
-          cb.close()
-          None
-        }
-    }
+    val spillableIter = iter.collect {
+      // Filter out empty batches and make them spillable
+      new PartialFunction[ColumnarBatch, SpillableColumnarBatch] {
+        override def isDefinedAt(cb: ColumnarBatch): Boolean = if (cb.numRows() > 0) {
+          true
+        } else {
+          cb.close()
+          false
+        }
+        override def apply(cb: ColumnarBatch): SpillableColumnarBatch =
+          SpillableColumnarBatch(cb, SpillPriorities.ACTIVE_ON_DECK_PRIORITY)
+      }
+    }

I'll keep it in mind, but I'm not sure it matters that much here.
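As a rough illustration of the two shapes discussed in this thread, the sketch below runs both on a stand-in type and checks that they keep the same batches and close the same empty ones. `FakeBatch` and `FilterEmptyBatches` are hypothetical names invented for the example, not part of the plugin, and the wrapping in SpillableColumnarBatch is left out. It says nothing about which version is faster.

```scala
// Hypothetical stand-in for ColumnarBatch; it only tracks a row count and a closed flag.
final class FakeBatch(val numRows: Int) {
  var closed: Boolean = false
  def close(): Unit = { closed = true }
}

object FilterEmptyBatches {
  // Option-based version: allocates a Some for every non-empty batch.
  def viaFlatMap(iter: Iterator[FakeBatch]): Iterator[FakeBatch] =
    iter.flatMap { b =>
      if (b.numRows > 0) {
        Some(b)
      } else {
        b.close()
        None
      }
    }

  // PartialFunction-based version: isDefinedAt closes empty batches as a
  // side effect, so no intermediate Option is created.
  def viaCollect(iter: Iterator[FakeBatch]): Iterator[FakeBatch] =
    iter.collect {
      new PartialFunction[FakeBatch, FakeBatch] {
        override def isDefinedAt(b: FakeBatch): Boolean =
          if (b.numRows > 0) {
            true
          } else {
            b.close()
            false
          }
        override def apply(b: FakeBatch): FakeBatch = b
      }
    }
}
```

One design point to keep in mind: the `collect` version relies on `isDefinedAt` having a side effect, which is unusual for a PartialFunction and only safe as long as it is evaluated once per element.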

gerashegalov · 2024-03-12T21:40:13Z

sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuSortExec.scala

+        if (cb.numRows() > 0) {
+          Some(SpillableColumnarBatch(cb, SpillPriorities.ACTIVE_ON_DECK_PRIORITY))
+        } else {
+          cb.close()


What if this throws?

The exception is thrown when someone calls next on the iterator, and that caller would need to handle it.

https://github.com/scala/scala/blob/1bdf362b0d8344927460116380b67fcaa766435b/src/library/scala/collection/Iterator.scala#L587-L620
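A minimal sketch of that behavior, assuming only the standard library `Iterator`: the function passed to `flatMap` does not run when the pipeline is built, so a failing `close()` surfaces later, when the consumer calls `hasNext` or `next`. `FailingResource` and `LazyIteratorFailure` are hypothetical names used only for this illustration.

```scala
object LazyIteratorFailure {
  // Hypothetical resource whose close() always fails.
  final class FailingResource {
    def close(): Unit = throw new IllegalStateException("close failed")
  }

  // Returns (threw while building the pipeline, threw while consuming it).
  def demo(): (Boolean, Boolean) = {
    val source = Iterator(new FailingResource)

    // Building the pipeline: flatMap is lazy, so the function is not called yet.
    var threwOnBuild = false
    val pipeline: Iterator[Int] =
      try {
        source.flatMap { r =>
          r.close()
          None: Option[Int]
        }
      } catch {
        case _: IllegalStateException =>
          threwOnBuild = true
          Iterator.empty[Int]
      }

    // Consuming the pipeline: this is where the function runs and the exception escapes.
    val threwOnConsume =
      try {
        pipeline.hasNext
        false
      } catch {
        case _: IllegalStateException => true
      }

    (threwOnBuild, threwOnConsume)
  }
}
```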

gerashegalov · 2024-03-12T21:41:33Z

sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuSortExec.scala

+   */
+  private final def firstPassReadBatches(scb: SpillableColumnarBatch): Unit = {
+    splitOneSortedBatch(scb)
+    while(alreadySortedIter.hasNext) {


nit: space

Suggested change

-    while(alreadySortedIter.hasNext) {
+    while (alreadySortedIter.hasNext) {

revans2 · 2024-03-13T13:16:31Z

build

Improve performance of Sort for the common single batch use case

0b7ff9d

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>

sameerz added the performance label (A performance related task/issue) Mar 12, 2024

jlowe reviewed Mar 12, 2024

gerashegalov reviewed Mar 12, 2024

Review Comments

efe57cb

jlowe approved these changes Mar 13, 2024

gerashegalov approved these changes Mar 13, 2024

revans2 merged commit 9105fd7 into NVIDIA:branch-24.04 Mar 13, 2024
43 checks passed

revans2 deleted the sort_better_expr branch March 13, 2024 19:36
