
[BUG] Spark 3.3 IT test cache_test.py::test_passing_gpuExpr_as_Expr fails with IllegalArgumentException #4931

Closed
tgravescs opened this issue Mar 10, 2022 · 1 comment · Fixed by #4926
Labels: bug (Something isn't working), P0 (Must have for release)

Comments

@tgravescs (Collaborator)

Describe the bug
Spark 3.3 integration test build fails:

 FAILED ../../src/main/python/cache_test.py::test_passing_gpuExpr_as_Expr[{'spark.sql.inMemoryColumnarStorage.enableVectorizedReader': 'true'}][ALLOW_NON_GPU(CollectLimitExec)]
11:36:44  FAILED ../../src/main/python/cache_test.py::test_passing_gpuExpr_as_Expr[{'spark.sql.inMemoryColumnarStorage.enableVectorizedReader': 'false'}][ALLOW_NON_GPU(CollectLimitExec)]

 Caused by: java.lang.IllegalArgumentException: For input string: "null"
11:36:44  E                   	at scala.collection.immutable.StringLike.parseBoolean(StringLike.scala:330)
11:36:44  E                   	at scala.collection.immutable.StringLike.toBoolean(StringLike.scala:289)
11:36:44  E                   	at scala.collection.immutable.StringLike.toBoolean$(StringLike.scala:289)
11:36:44  E                   	at scala.collection.immutable.StringOps.toBoolean(StringOps.scala:33)
11:36:44  E                   	at org.apache.spark.sql.execution.datasources.parquet.SparkToParquetSchemaConverter.<init>(ParquetSchemaConverter.scala:455)
11:36:44  E                   	at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.init(ParquetWriteSupport.scala:114)
11:36:44  E                   	at com.nvidia.spark.rapids.shims.ParquetOutputFileFormat.getRecordWriter(ParquetCachedBatchSerializer.scala:1505)
11:36:44  E                   	at com.nvidia.spark.rapids.shims.ParquetCachedBatchSerializer$CachedBatchIteratorProducer$InternalRowToCachedBatchIterator.$anonfun$next$1(ParquetCachedBatchSerializer.scala:1247)
11:36:44  E                   	at org.apache.spark.sql.internal.SQLConf$.withExistingConf(SQLConf.scala:158)
11:36:44  E                   	at com.nvidia.spark.rapids.shims.ParquetCachedBatchSerializer$CachedBatchIteratorProducer$InternalRowToCachedBatchIterator.next(ParquetCachedBatchSerializer.scala:1247)
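The trace bottoms out in `scala.collection.immutable.StringLike.parseBoolean`, which accepts only "true" or "false" (case-insensitively) and throws otherwise; so a missing config value that surfaces as the literal string "null" fails exactly as shown. The sketch below mirrors that strict parse in plain Java for illustration — the class and method names are mine, not from Spark or the plugin, and no Spark dependency is needed:

```java
// Illustration of the root cause. Scala's StringLike.toBoolean accepts only
// "true"/"false" (case-insensitive) and throws IllegalArgumentException for
// anything else -- unlike java.lang.Boolean.parseBoolean, which silently
// returns false. strictParseBoolean below mirrors Scala's behavior.
public class NullToBooleanRepro {
    // Mirrors scala.collection.immutable.StringLike.parseBoolean.
    static boolean strictParseBoolean(String s) {
        if ("true".equalsIgnoreCase(s)) return true;
        if ("false".equalsIgnoreCase(s)) return false;
        throw new IllegalArgumentException("For input string: \"" + s + "\"");
    }

    public static void main(String[] args) {
        // A present, well-formed value parses fine.
        System.out.println(strictParseBoolean("true"));
        // A missing config surfacing as the string "null" fails as in the trace.
        try {
            strictParseBoolean("null");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // For input string: "null"
        }
    }
}
```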

@tgravescs tgravescs added bug Something isn't working ? - Needs Triage Need team to review and classify P0 Must have for release labels Mar 10, 2022
@firestarman firestarman self-assigned this Mar 11, 2022
@sameerz sameerz added this to the Feb 28 - Mar 18 milestone Mar 11, 2022
@firestarman (Collaborator)

This is caused by the parquet field ID setting missing from the Configuration used for parquet writing in PCBS (ParquetCachedBatchSerializer).

This was added in PR #4926.

Details are here: https://github.com/NVIDIA/spark-rapids/pull/4926/files#diff-d2170a624b05030a6b93a827792ae5ee35d9d870bab6e86823cc4264f32bee47R1442

@sameerz sameerz removed the ? - Needs Triage Need team to review and classify label Mar 15, 2022
3 participants