TiSpark FAQ

shiyuhang0 edited this page Sep 2, 2022 · 7 revisions

FAQ

Q: What are the pros and cons of independent deployment as opposed to a shared resource with an existing Spark / Hadoop cluster?

A: You can use the existing Spark cluster without a separate deployment, but if the existing cluster is busy, TiSpark will not be able to achieve the desired speed.

Q: Can I mix Spark with TiKV?

A: If TiDB and TiKV are overloaded and run critical online tasks, consider deploying TiSpark separately.

Also consider using separate NICs, so that TiSpark does not compete for OLTP network resources and the online business is not affected.

If the online business requirements are not demanding, or the load is small enough, you can deploy TiSpark together with TiKV.

Q: How to use PySpark with TiSpark?

A: Follow TiSpark on PySpark.
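
As a minimal sketch, launching PySpark with TiSpark typically looks like the following. The jar path and PD address are placeholders, not values from this wiki; replace them with your own, and use the TiSpark assembly built for your Spark/Scala version.

```shell
# Launch PySpark with the TiSpark assembly jar on the classpath.
# The jar path and PD address below are placeholders.
pyspark \
  --jars /path/to/tispark-assembly.jar \
  --conf "spark.sql.extensions=org.apache.spark.sql.TiExtensions" \
  --conf "spark.tispark.pd.addresses=127.0.0.1:2379" \
  --conf "spark.sql.catalog.tidb_catalog=org.apache.spark.sql.catalyst.catalog.TiCatalog" \
  --conf "spark.sql.catalog.tidb_catalog.pd.addresses=127.0.0.1:2379"
```

Inside the PySpark shell, spark.sql("use tidb_catalog") then works the same way as in Scala.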

Error

[Error] NoSuchDatabaseException when upgrade to TiSpark 3.x

With TiSpark 3.x, you must specify the catalog.

// 1. With the catalog prefix
spark.sql("select * from tidb_catalog.$database.table")

// 2. Switch to the catalog first
spark.sql("use tidb_catalog")
spark.sql("select * from $database.table")

For details, see https://github.com/pingcap/tispark/wiki/Getting-Started#use-with-spark_catalog

[Error] java.lang.NoSuchMethodError: scala.Function1.$init$(Lscala/Function1;)V

Check the Scala version of your Spark distribution and choose the matching TiSpark version: https://github.com/pingcap/tispark/wiki/Getting-TiSpark#choose-the-version-of-tispark
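
To find the Scala version your Spark distribution was built with, you can check the version banner (assuming spark-submit is on your PATH):

```shell
# Prints the Spark version banner, which includes the Scala build version
# (a line such as "Using Scala version 2.12.x"). Pick the TiSpark artifact
# built for the same Scala line.
spark-submit --version
```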

[Error] Batch scan is not supported / Table does not support reads

This error may occur if you forget to add the following configurations:

spark.sql.extensions  org.apache.spark.sql.TiExtensions
spark.tispark.pd.addresses  ${your_pd_address}
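
These two lines belong in spark-defaults.conf; equivalently, as a sketch, they can be passed per invocation (the PD address below is a placeholder):

```shell
# Same settings as in spark-defaults.conf, passed on the command line.
spark-shell \
  --conf "spark.sql.extensions=org.apache.spark.sql.TiExtensions" \
  --conf "spark.tispark.pd.addresses=127.0.0.1:2379"
```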

Netty OutOfDirectMemoryError

Netty's PoolThreadCache may hold on to unused memory, which can cause the following error.

Caused by: shade.io.netty.handler.codec.DecoderException: shade.io.netty.util.internal.OutOfDirectMemoryError

The following configurations can be used to avoid the error.

--conf "spark.driver.extraJavaOptions=-Dshade.io.netty.allocator.type=unpooled"
--conf "spark.executor.extraJavaOptions=-Dshade.io.netty.allocator.type=unpooled"

Chinese characters are garbled

The following configurations can be used to avoid garbled Chinese characters.

--conf "spark.driver.extraJavaOptions=-Dfile.encoding=UTF-8"
--conf "spark.executor.extraJavaOptions=-Dfile.encoding=UTF-8"

gRPC message exceeds maximum size error

The maximum message size of the gRPC Java library is 2 GB. The following error is thrown if there is a huge region in TiKV whose size exceeds 2 GB.

Caused by: shade.io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED: gRPC message exceeds maximum size 2147483647

Use SHOW TABLE [table_name] REGIONS [WhereClauseOptional] to check whether there is a huge region in TiKV.
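
For example, from the MySQL client (the database and table names here are hypothetical), the APPROXIMATE_SIZE(MB) column of the output shows each region's size:

```shell
# Hypothetical database/table names; replace with your own.
# Look for regions whose APPROXIMATE_SIZE(MB) approaches 2048 MB or more.
mysql -h 127.0.0.1 -P 4000 -u root -D test -e "SHOW TABLE my_table REGIONS"
```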

Others

How to upgrade from Spark 2.1 to Spark 2.3/2.4

For users of Spark 2.1 who wish to upgrade to the latest TiSpark version on Spark 2.3/2.4: download or install Spark 2.3+/2.4+ by following the instructions on the Apache Spark site, and overwrite the old Spark version in $SPARK_HOME.