
Enable testing parquet with zstd for spark releases 3.2.0 and later #5898

Merged
1 commit merged into NVIDIA:branch-22.08 on Jun 24, 2022

Conversation

jbrennan333
Contributor

Signed-off-by: Jim Brennan jimb@nvidia.com

Closes #5580

Spark releases starting with 3.2.0 include support for zstd compression without requiring any additional jars/libs. Now that we have zstd decompression support in cuDF, we should add zstd to the list of compressors to use in test_parquet_compress_read_round_trip.
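
For illustration, here is a minimal, self-contained pytest sketch of the shape of this change, not the repository's actual test code: the local SparkSession, the _at_least version helper, and the plain row-count assertion are stand-ins for the plugin's own fixtures and its CPU-versus-GPU comparison helpers.

# A rough sketch: build the parquet codec list once and only include zstd
# when the Spark version under test is 3.2.0 or later, since earlier
# releases need extra jars for zstd parquet support.
import pytest
from pyspark.sql import SparkSession

spark = SparkSession.builder.master('local[1]').appName('zstd-sketch').getOrCreate()

def _at_least(version, minimum):
    # Hypothetical helper: compare dotted version strings numerically.
    parse = lambda v: tuple(int(p) for p in v.split('.')[:3])
    return parse(version) >= parse(minimum)

parquet_compress_options = ['none', 'uncompressed', 'snappy', 'gzip']
if _at_least(spark.version, '3.2.0'):
    parquet_compress_options.append('zstd')

@pytest.mark.parametrize('compress', parquet_compress_options)
def test_parquet_compress_read_round_trip(tmp_path, compress):
    # Write with the given codec, read the file back, and check the data survives.
    # The real plugin test instead compares CPU and GPU reads of the same files.
    path = str(tmp_path / compress)
    spark.range(100).selectExpr('id', 'id * 2 AS doubled') \
        .write.option('compression', compress).parquet(path)
    assert spark.read.parquet(path).count() == 100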

Signed-off-by: Jim Brennan <jimb@nvidia.com>
@jbrennan333
Contributor Author

I have verified this with spark-3.1.3, spark-3.2.0, and spark-3.2.1.

@jbrennan333
Contributor Author

build

@jlowe added this to the Jun 20 - Jul 8 milestone on Jun 23, 2022
@jlowe added the test (Only impacts tests) label on Jun 23, 2022
@jbrennan333
Contributor Author

I'm not sure what failed here. @tgravescs should I rerun tests?

@jlowe
Contributor

jlowe commented Jun 24, 2022

It failed in the Scala unit tests:

22/06/23 20:27:33.626 dispatcher-event-loop-1 INFO ExecutorPluginContainer: Exception while shutting down plugin com.nvidia.spark.SQLPlugin.
ai.rapids.cudf.RmmException: Could not shut down RMM there appear to be outstanding allocations
	at ai.rapids.cudf.Rmm.shutdown(Rmm.java:219)
	at ai.rapids.cudf.Rmm.shutdown(Rmm.java:179)
	at com.nvidia.spark.rapids.GpuDeviceManager$.shutdown(GpuDeviceManager.scala:146)
	at com.nvidia.spark.rapids.RapidsExecutorPlugin.shutdown(Plugin.scala:285)
	at org.apache.spark.internal.plugin.ExecutorPluginContainer.$anonfun$shutdown$4(PluginContainer.scala:144)
	at org.apache.spark.internal.plugin.ExecutorPluginContainer.$anonfun$shutdown$4$adapted(PluginContainer.scala:141)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.internal.plugin.ExecutorPluginContainer.shutdown(PluginContainer.scala:141)
	at org.apache.spark.executor.Executor.$anonfun$stop$4(Executor.scala:332)
	at org.apache.spark.executor.Executor.$anonfun$stop$4$adapted(Executor.scala:332)
	at scala.Option.foreach(Option.scala:407)
	at org.apache.spark.executor.Executor.$anonfun$stop$3(Executor.scala:332)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:222)
	at org.apache.spark.executor.Executor.stop(Executor.scala:332)
	at org.apache.spark.scheduler.local.LocalEndpoint$$anonfun$receiveAndReply$1.applyOrElse(LocalSchedulerBackend.scala:83)
	at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103)
	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
	at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
	at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

[...]

22/06/23 20:27:33.710 ScalaTest-main-running-GpuDeviceManagerSuite INFO RapidsExecutorPlugin: RAPIDS Accelerator build: {version=22.08.0-SNAPSHOT, user=, url=https://github.com/NVIDIA/spark-rapids.git, date=2022-06-23T19:54:14Z, revision=64fbd7de41d37016e9a7014732be3d97b7bbeecf, cudf_version=22.08.0-SNAPSHOT, branch=HEAD}
22/06/23 20:27:33.710 ScalaTest-main-running-GpuDeviceManagerSuite INFO RapidsExecutorPlugin: cudf build: {version=22.08.0-SNAPSHOT, user=, url=https://github.com/rapidsai/cudf.git, date=2022-06-23T02:28:56Z, revision=31ad35c583fad22d6b976af7e1990df50efd7bc7, branch=HEAD}
22/06/23 20:27:33.711 ScalaTest-main-running-GpuDeviceManagerSuite INFO RapidsExecutorPlugin: Initializing memory from Executor Plugin
22/06/23 20:27:33.711 ScalaTest-main-running-GpuDeviceManagerSuite ERROR RapidsExecutorPlugin: Exception in the executor plugin, shutting down!
java.lang.IllegalStateException: Cannot initialize memory due to previous shutdown failing
	at com.nvidia.spark.rapids.GpuDeviceManager$.initializeMemory(GpuDeviceManager.scala:327)
	at com.nvidia.spark.rapids.GpuDeviceManager$.initializeGpuAndMemory(GpuDeviceManager.scala:137)
	at com.nvidia.spark.rapids.RapidsExecutorPlugin.init(Plugin.scala:232)
	at org.apache.spark.internal.plugin.ExecutorPluginContainer.$anonfun$executorPlugins$1(PluginContainer.scala:125)
	at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
	at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
	at org.apache.spark.internal.plugin.ExecutorPluginContainer.<init>(PluginContainer.scala:113)
	at org.apache.spark.internal.plugin.PluginContainer$.apply(PluginContainer.scala:211)
	at org.apache.spark.internal.plugin.PluginContainer$.apply(PluginContainer.scala:199)
	at org.apache.spark.executor.Executor.$anonfun$plugins$1(Executor.scala:253)
	at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:222)
	at org.apache.spark.executor.Executor.<init>(Executor.scala:253)
	at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:64)
	at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132)
	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:220)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:579)
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2678)
	at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:942)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:936)
	at com.nvidia.spark.rapids.TestUtils$.withGpuSparkSession(TestUtils.scala:126)
	at com.nvidia.spark.rapids.GpuDeviceManagerSuite.$anonfun$new$3(GpuDeviceManagerSuite.scala:50)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
	at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
	at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
	at org.scalatest.Transformer.apply(Transformer.scala:22)
	at org.scalatest.Transformer.apply(Transformer.scala:20)
	at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
	at org.scalatest.TestSuite.withFixture(TestSuite.scala:196)
	at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195)
	at org.scalatest.FunSuite.withFixture(FunSuite.scala:1560)
	at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184)
	at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196)
	at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
	at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196)
	at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178)
	at com.nvidia.spark.rapids.GpuDeviceManagerSuite.org$scalatest$BeforeAndAfter$$super$runTest(GpuDeviceManagerSuite.scala:26)
	at org.scalatest.BeforeAndAfter.runTest(BeforeAndAfter.scala:203)
	at org.scalatest.BeforeAndAfter.runTest$(BeforeAndAfter.scala:192)
	at com.nvidia.spark.rapids.GpuDeviceManagerSuite.runTest(GpuDeviceManagerSuite.scala:26)
	at org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229)
	at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:396)
	at scala.collection.immutable.List.foreach(List.scala:431)
	at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
	at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:379)
	at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:461)
	at org.scalatest.FunSuiteLike.runTests(FunSuiteLike.scala:229)
	at org.scalatest.FunSuiteLike.runTests$(FunSuiteLike.scala:228)
	at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
	at org.scalatest.Suite.run(Suite.scala:1147)
	at org.scalatest.Suite.run$(Suite.scala:1129)
	at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
	at org.scalatest.FunSuiteLike.$anonfun$run$1(FunSuiteLike.scala:233)
	at org.scalatest.SuperEngine.runImpl(Engine.scala:521)
	at org.scalatest.FunSuiteLike.run(FunSuiteLike.scala:233)
	at org.scalatest.FunSuiteLike.run$(FunSuiteLike.scala:232)
	at com.nvidia.spark.rapids.GpuDeviceManagerSuite.org$scalatest$BeforeAndAfter$$super$run(GpuDeviceManagerSuite.scala:26)
	at org.scalatest.BeforeAndAfter.run(BeforeAndAfter.scala:258)
	at org.scalatest.BeforeAndAfter.run$(BeforeAndAfter.scala:256)
	at com.nvidia.spark.rapids.GpuDeviceManagerSuite.run(GpuDeviceManagerSuite.scala:26)
	at org.scalatest.Suite.callExecuteOnSuite$1(Suite.scala:1210)
	at org.scalatest.Suite.$anonfun$runNestedSuites$1(Suite.scala:1257)
	at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
	at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
	at org.scalatest.Suite.runNestedSuites(Suite.scala:1255)
	at org.scalatest.Suite.runNestedSuites$(Suite.scala:1189)
	at org.scalatest.tools.DiscoverySuite.runNestedSuites(DiscoverySuite.scala:30)
	at org.scalatest.Suite.run(Suite.scala:1144)
	at org.scalatest.Suite.run$(Suite.scala:1129)
	at org.scalatest.tools.DiscoverySuite.run(DiscoverySuite.scala:30)
	at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:45)
	at org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$13(Runner.scala:1346)
	at org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$13$adapted(Runner.scala:1340)
	at scala.collection.immutable.List.foreach(List.scala:431)
	at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:1340)
	at org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24(Runner.scala:1031)
	at org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24$adapted(Runner.scala:1010)
	at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:1506)
	at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1010)
	at org.scalatest.tools.Runner$.main(Runner.scala:827)
	at org.scalatest.tools.Runner.main(Runner.scala)

Looks like there's a unit test that may have a leak in it somewhere. Does not appear to be related to this PR, since it doesn't modify the unit tests.

@jlowe
Contributor

jlowe commented Jun 24, 2022

build

@tgravescs changed the title from "Enable testing zstd for spark releases 3.2.0 and later" to "Enable testing parquet with zstd for spark releases 3.2.0 and later" on Jun 24, 2022
@jbrennan333 merged commit 321f9b9 into NVIDIA:branch-22.08 on Jun 24, 2022
jbrennan333 added a commit to jbrennan333/spark-rapids that referenced this pull request Jun 27, 2022
…A#5898)"

This reverts commit 321f9b9.

Signed-off-by: Jim Brennan <jimb@nvidia.com>
jbrennan333 added a commit that referenced this pull request Jun 27, 2022
This reverts commit 321f9b9.

Signed-off-by: Jim Brennan <jimb@nvidia.com>
Labels
test Only impacts tests
Development

Successfully merging this pull request may close these issues.

[FEA] Enable zstd integration tests for parquet and orc