Enable Strings as a supported type for GpuColumnarToRow transitions #5998
Conversation
Signed-off-by: Ahmed Hussein (amahussein) <a@ahussein.me>
fix the type of the dataOffsetTmp
build
Just some nits. It is looking good
sql-plugin/src/main/java/com/nvidia/spark/rapids/JCudfUtil.java
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuRowToColumnarExec.scala
Looking good to me. Nice to see that it isn't a massive change, but it is dense in parts. Sorry for being so pedantic; I was just writing things down as I went, and looking back at the number of comments it appears that I have huge requests for this code, but I really don't feel that way. Thanks for doing this work!
/**
 * Calculates the offset of the variable width section.
 * @return Total bytes used by the fixed width and the validity bytes.
A note about alignment here would be nice. In this case, we are byte aligned at the end of the validity data and we require no alignment, but calling that out would be useful for when we forget and get worried about alignment and look it up in the format spec.
done!
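Since this came up, here is a tiny standalone sketch (the class and method names are hypothetical, based on the row layout described in this thread) of why no extra alignment is needed at the end of the validity data: validity is one bit per column rounded up to whole bytes, so the variable-width section offset is simply the fixed-width bytes plus the validity bytes.

```java
public class VarSectionOffset {
    // Hypothetical helper mirroring the doc comment under review: the
    // variable-width section starts immediately after the fixed-width values
    // and the validity bytes. Validity is one bit per column, rounded up to
    // whole bytes, so the end of the validity data is already byte-aligned
    // and needs no extra padding.
    static int varWidthSectionOffset(int fixedWidthBytes, int numColumns) {
        int validityBytes = (numColumns + 7) / 8; // one bit per column
        return fixedWidthBytes + validityBytes;
    }

    public static void main(String[] args) {
        // e.g. 5 columns occupying 24 fixed-width bytes -> 1 validity byte
        System.out.println(varWidthSectionOffset(24, 5));
    }
}
```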
@@ -111,9 +102,10 @@ class AcceleratedColumnarToRowIterator(
// most 184 double/long values. Spark by default limits codegen to 100 fields
Are these comments still correct?
val cudfColOff = jcudfRowVisitor.getColOffset()
val colLength = jcudfRowVisitor.getColLength()
val ret = colLength match {
  // strings return -15
This seems very magical. Why -15? Should this be some sort of define?
Refactored; this is not needed any more. The util will return the generated code.
sql-plugin/src/main/java/com/nvidia/spark/rapids/JCudfUtil.java
/**
 * A helper class to calculate the columns, validity, and variable width offsets.
 * This helper is used to get an estimate size for the row including variable-sized rows.
- * This helper is used to get an estimate size for the row including variable-sized rows.
+ * This helper is used to get an estimate size for the row including variable-sized data.
for (int i = 0; i < attributes.length; i++) {
  calcColOffset(i);
}
It seems odd to throw away the offsets you're calculating from the return value, but now I see the goal is the side effect in the function of setting varSizeColIndex. I wouldn't have thought much of this had it been called setColumnOffsets, and I would have missed the side effect entirely. I wonder if something should change around here to better name it, indicate the side effect, or not require the offset calculation at all if it has been done before.
done! changed the name to advanceColCursorAndGet
  return byteCursor;
}

private int addVarSizeData(DType rapidsType) {
  int length = getEstimateSizeForVarLengthTypes(rapidsType);
What happens if this estimate is low? Is this used for buffer allocation?
This is handled as described in #5998 (comment).
 * @param ind index of the column
 * @return the offset of the column.
 */
private int calcColOffset(int ind) {
This seems very familiar. Would it make sense to wrap the above class in a caching class or is it different enough?
I was avoiding inheritance to reduce the overhead. That's why I opted to repeat the code rather than using overrides/interfaces.
In the most recent commit, the two inner classes have different functionality:
- JCudfUtil.RowOffsetsCalculator: used to calculate the offsets and estimate the size for the buffer allocation.
- JCudfUtil.RowBuilder: iterates over the columns to generate the code used to copy the column fields.
build
I see the following failures in the Pre-merge:
build
, " long cudfDstAddress = startAddress + dataDstOffset;"
, " long newDataOffset = cudfDstAddress + strSize;"
, " if (newDataOffset > bufferEndAddress) {"
, " throw new java.lang.RuntimeException("
As it is now any time we try to copy a String that does not fit, this will throw an exception. I am not okay with that because there can be many cases where we want to copy more than one batch to the GPU. I am fine with throwing an exception if the first batch cannot fit, but we do need a way to detect that later rows don't fit and just go on.
Thanks Bobby!
I modified the behavior to handle the corner case.
Reasons for the old failure in
The fix: the following changes fix the unit tests.
batchDone = false;
int retryCount = 0;
while (!batchDone) {
  try (HostMemoryBuffer dataBuffer = HostMemoryBuffer.allocate(dataLength);
       HostMemoryBuffer offsetsBuffer =
           HostMemoryBuffer.allocate(((long) numRowsEstimate + 1) * BYTES_PER_OFFSET)) {
    int[] used = fillBatch(dataBuffer, offsetsBuffer);
    int dataOffset = used[0];
    int currentRow = used[1];
    // if we fail to copy at least one row then we need to throw an exception
    if (currentRow == 0 && pending != null) {
      throw new BufferOverflowException();
    }
    batchDone = true;
    if (retryCount > 0) {
      // restore the original dataLength and numRowsEstimate
      // so that we do not continue using outliers.
    }
    // rest of the code to create and copy the CV goes here
    // ...
  } catch (BufferOverflowException ex) { // the buffer cannot fit a single row
    // increase dataLength by 25% (long math to avoid int overflow)
    int newDataLength = (int) Math.min(Integer.MAX_VALUE, ((long) dataLength * 125) / 100);
    if (newDataLength <= dataLength) { // we already reached the limit
      throw new RuntimeException(ex);
    }
    dataLength = newDataLength;
    numRowsEstimate = ...; // calculate the new value
    retryCount++;
  }
}

@revans2 and @hyperbolic2346 let me know if you are fine with the new fix.
build
Why wouldn't we start with a target size of a single page of memory? Do we really need to get it down to the byte if we increase it by 25% each time?
Is this a valid assumption for strings? cudf limits a single column to 2 gigs, but that could be a single row. Further, there could be any number of string columns in a table. This limits a single row to ~50% of the gpu memory available before we are unable to convert it due to the double allocation, etc. If we assume 20 bytes per string and the string is even remotely large, adding 25% each iteration will take a long time to get up to large enough.
Can the failure case return the number of bytes required so we can better tune the allocation?
Would this process have to repeat for each string then? Say we have a row with 10 strings that are each 1 meg. Would we have to call into it 10 times in the best case, where we fail once and are able to resize to a large enough buffer to hold the failed allocation?
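To put numbers behind the growth-rate concern, here is a standalone sketch (the sizes and helper are illustrative, not from the PR) counting how many retries a 25% integer growth step needs versus doubling to grow a 1 KiB buffer to 1 MiB:

```java
import java.util.function.LongUnaryOperator;

public class GrowthSteps {
    // Count how many applications of grow() are needed before size reaches target.
    static int stepsToReach(long start, long target, LongUnaryOperator grow) {
        long size = start;
        int steps = 0;
        while (size < target) {
            size = grow.applyAsLong(size);
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        long start = 1024L;      // hypothetical initial allocation
        long target = 1L << 20;  // 1 MiB
        // +25% per retry, with the same integer arithmetic as the retry loop
        int grow25 = stepsToReach(start, target, s -> (s * 125) / 100);
        // doubling per retry
        int growDouble = stepsToReach(start, target, s -> s << 1);
        System.out.println("25% growth: " + grow25
            + " retries, doubling: " + growDouble + " retries");
    }
}
```

With these inputs the 25% step needs roughly three times as many retries as doubling, which is the slow convergence being pointed out here.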
Getting there for sure. These are just nits and questions.
long newRowSizeEst = dataLength << 1;
newRowSizeEst = Math.min(newRowSizeEst, JCudfUtil.JCUDF_MAX_DATA_BUFFER_LENGTH);
- long newRowSizeEst = dataLength << 1 ;
- newRowSizeEst = Math.min(newRowSizeEst, JCudfUtil.JCUDF_MAX_DATA_BUFFER_LENGTH);
+ long newRowSizeEst = Math.min(dataLength << 1, JCudfUtil.JCUDF_MAX_DATA_BUFFER_LENGTH);
It looks odd to me to write to newRowSizeEst and then immediately assign to it again. This got me looking more at this code, and now I have questions too.
So we have a buffer of size dataLength and it is too small, so we double the buffer size in newRowSizeEst and then set it to the min of that doubled size and a max. Then we compare the new size we want to make the buffer to the old buffer size, and if it is smaller or the same, we bail. The only time this could happen is if dataLength is currently JCudfUtil.JCUDF_MAX_DATA_BUFFER_LENGTH, right? If that is the case, why wouldn't we just check that? This way seems a little convoluted.
if (dataLength == JCudfUtil.JCUDF_MAX_DATA_BUFFER_LENGTH) { // this won't work...
// double buffer size and try again
long newRowSizeEst = Math.min(dataLength << 1, JCudfUtil.JCUDF_MAX_DATA_BUFFER_LENGTH);
Ok sure. Easy fix.
@@ -62,7 +62,8 @@ public final class JCudfUtil {
  /**
   * The maximum buffer size allocated to copy JCudf row.
   */
- public static final long JCUDF_MAX_DATA_BUFFER_LENGTH = Integer.MAX_VALUE;
+ public static final long JCUDF_MAX_DATA_BUFFER_LENGTH =
+     Integer.MAX_VALUE - (JCUDF_ROW_ALIGNMENT - 1);
Why are we using Integer.MAX_VALUE as a limit? It is a limit on a cudf column due to cudf::size_type being a signed int, but is there a reason to limit the row here to an int? I'm not against this, just ensuring it isn't being coupled unnecessarily to cudf::size_type.
During transitions, the rows are 8-byte aligned, so Integer.MAX_VALUE - 7 is 8-byte aligned.
Probably I should rename this to JCUDF_MAX_ROW_SIZE_LENGTH, because the buffer size, which holds multiple rows, can be larger.
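For reference, a small sketch of the alignment arithmetic discussed here (class and method names are illustrative): rounding Integer.MAX_VALUE down to the nearest multiple of the 8-byte row alignment gives Integer.MAX_VALUE - 7.

```java
public class AlignDemo {
    // Rows are 8-byte aligned during transitions, per the discussion above.
    static final long ROW_ALIGNMENT = 8;

    // Round v down to the nearest multiple of a power-of-two alignment.
    static long alignDown(long v, long alignment) {
        return v & ~(alignment - 1);
    }

    public static void main(String[] args) {
        long maxRowSize = alignDown(Integer.MAX_VALUE, ROW_ALIGNMENT);
        // maxRowSize is the largest 8-byte-aligned value not above Integer.MAX_VALUE
        System.out.println(maxRowSize);
    }
}
```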
build
The failure in CI is unrelated and is being reported in #6054
build
Looking good. Most of my comments are about comments, so I think we can declare this very close.
sql-plugin/src/main/java/com/nvidia/spark/rapids/CudfUnsafeRow.java
sql-plugin/src/main/java/com/nvidia/spark/rapids/InternalRowToColumnarBatchIterator.java
sql-plugin/src/main/java/com/nvidia/spark/rapids/JCudfUtil.java
  return attributes.length > 0 && varSizeColIndex != attributes.length;
}

public int getValidityBytesOffset() {
I'm noticing some new functions without comments. I'm not sure of the comment-on-function policy for the Java code, though, so I'm just going to note that I see it. :)
Co-authored-by: Mike Wilson <hyperbolic2346@users.noreply.github.com>
build
Tiniest of nits left, I'm happy with how this has turned out. Thank you for humoring me through so many tiny changes.
Revert "Enable Strings as a supported type for GpuColumnarToRow transitions (NVIDIA#5998)". This reverts commit 452e7ba. Signed-off-by: Jason Lowe <jlowe@nvidia.com>
Signed-off-by: Ahmed Hussein (amahussein) <a@ahussein.me>
fixes #5633, fixes #5634, fixes #5635, fixes #5636
Only fixed-width schemas were supported by the GpuColumnarToRow transitions. This PR adds StringType to the list of supported columns.
Changes:
- Addressing points in rapidsai/cudf#10033 (comment)
- AcceleratedColumnarToRowIterator and how the packMap is constructed
- row_conversion_test.py