[FEA] Nonblocking API to acquire GpuSemaphore #10261

sperlingxx · 2024-01-24T11:07:33Z

Is your feature request related to a problem? Please describe.

It would be great if GpuSemaphore supported a nonblocking API such as:

def tryAcquireIfNecessary(context: TaskContext): Boolean

A nonblocking API would be very useful for deciding whether it is a good time to offload some work to the host side.
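
To make the intended semantics concrete, here is a minimal sketch. It is not the plugin's actual implementation: it assumes a plain `java.util.concurrent.Semaphore` underneath and identifies tasks by a `Long` ID instead of a `TaskContext`. `SketchSemaphore` and all of its members are hypothetical names used only for illustration.

```scala
import java.util.concurrent.Semaphore
import scala.collection.mutable

// Hypothetical stand-in for GpuSemaphore. Assumes each task calls in from
// one thread at a time; the real class tracks tasks via TaskContext.
final class SketchSemaphore(permits: Int) {
  private val semaphore = new Semaphore(permits)
  private val holders = mutable.Set.empty[Long]

  private def holds(taskId: Long): Boolean =
    holders.synchronized(holders.contains(taskId))

  // Blocking variant: waits for a permit unless the task already holds one.
  def acquireIfNecessary(taskId: Long): Unit = {
    if (!holds(taskId)) {
      semaphore.acquire()
      holders.synchronized { holders += taskId }
    }
  }

  // Nonblocking variant: returns immediately. True means the task holds a
  // permit on return (either newly acquired or already held).
  def tryAcquireIfNecessary(taskId: Long): Boolean = {
    if (holds(taskId)) {
      true
    } else if (semaphore.tryAcquire()) {
      holders.synchronized { holders += taskId }
      true
    } else {
      false
    }
  }

  // Gives the permit back if this task holds one; otherwise a no-op.
  def release(taskId: Long): Unit = {
    val held = holders.synchronized(holders.remove(taskId))
    if (held) semaphore.release()
  }
}
```

With a single permit, a second task calling `tryAcquireIfNecessary` while the first still holds the permit gets `false` back right away instead of waiting, which is the signal a caller could use to pick a host-side code path.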

For instance, we could decode (and decompress) Parquet file buffers on the host side when no GPU resource is available at that moment. (I am working on a PoC of this feature.) The entry point of the demo implementation looks like this:

    val hostSideRead = if (enableHostSideRead) {
      !GpuSemaphore.tryAcquireIfNecessary(TaskContext.get())
    } else {
      GpuSemaphore.acquireIfNecessary(TaskContext.get())
      false
    }
    RmmRapidsRetryIterator.withRetry(hostBuffer, splitBatchSizePolicy) { _ =>
      // The MakeParquetTableProducer will close the input buffer, and that would be bad
      // because we don't want to close it until we know that we are done with it
      hostBuffer.incRefCount()

      val tableReader = if (hostSideRead) {
        new VectorizedParquetGpuProducer(conf, currentTargetBatchSize.toInt,
          hostBuffer, 0, dataSize, metrics,
          dateRebaseMode, timestampRebaseMode, hasInt96Timestamps,
          clippedSchema, readDataSchema)
      } else {
        MakeParquetTableProducer(useChunkedReader, conf, currentTargetBatchSize,
          parseOpts,
          hostBuffer, 0, dataSize, metrics,
          dateRebaseMode, timestampRebaseMode, hasInt96Timestamps,
          isSchemaCaseSensitive, useFieldId, readDataSchema, clippedSchema, files,
          debugDumpPrefix, debugDumpAlways)
      }

      val batchIter = CachedGpuBatchIterator(tableReader, colTypes)

      if (allPartValues.isDefined) {
        val allPartInternalRows = allPartValues.get.map(_._2)
        val rowsPerPartition = allPartValues.get.map(_._1)
        new GpuColumnarBatchWithPartitionValuesIterator(batchIter, allPartInternalRows,
          rowsPerPartition, partitionSchema, maxGpuColumnSizeBytes)
      } else {
        // this is a bit weird, we don't have number of rows when allPartValues isn't
        // filled in so can't use GpuColumnarBatchWithPartitionValuesIterator
        batchIter.flatMap { batch =>
          // we have to add partition values here for this batch, we already verified that
          // its not different for all the blocks in this batch
          BatchWithPartitionDataUtils.addSinglePartitionValueToBatch(batch,
            partedFile.partitionValues, partitionSchema, maxGpuColumnSizeBytes)
        }
      }
    }.flatten

https://github.com/sperlingxx/spark-rapids/tree/cpu_parquet_decomp

sperlingxx added the feature request and ? - Needs Triage labels Jan 24, 2024

sperlingxx assigned sperlingxx, revans2 and winningsix and unassigned sperlingxx Jan 24, 2024

revans2 mentioned this issue Jan 30, 2024

Add tryAcquire to GpuSemaphore #10330

Merged

mattahrens removed the ? - Needs Triage label Jan 30, 2024

revans2 closed this as completed Jan 31, 2024

sameerz added the reliability label and removed the feature request label Apr 10, 2024
