Getting Started
shiyuhang0 edited this page May 7, 2022
TiSpark is a third-party JAR package for Apache Spark that lets Spark read data from and write data to TiKV.
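The latest version of TiSpark is 2.5.0. You can get the TiSpark JAR from Maven Central. The following table lists the TiDB, TiKV, PD, Spark, and Scala versions that each TiSpark release line supports: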
| TiSpark version | TiDB, TiKV, PD version | Spark version | Scala version |
|---|---|---|---|
| 2.4.x-scala_2.11 | 5.x, 4.x | 2.3.x, 2.4.x | 2.11 |
| 2.4.x-scala_2.12 | 5.x, 4.x | 2.4.x | 2.12 |
| 2.5.x | 5.x, 4.x | 3.0.x, 3.1.x | 2.12 |
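To pick the right JAR from the table above in a build script, the table can be encoded as data. The following is a minimal Scala sketch: the object name `TiSparkCompat` and the matching on the Spark minor version (for example `"2.4"` from `"2.4.8"`) are assumptions for illustration, while the rows mirror the table exactly.

```scala
// Hypothetical helper: encodes the compatibility table as data.
// Each row: TiSpark release line, supported Spark minor versions, Scala binary version.
object TiSparkCompat {
  final case class Row(tispark: String, sparkMinors: Set[String], scala: String)

  val rows: Seq[Row] = Seq(
    Row("2.4.x-scala_2.11", Set("2.3", "2.4"), "2.11"),
    Row("2.4.x-scala_2.12", Set("2.4"), "2.12"),
    Row("2.5.x", Set("3.0", "3.1"), "2.12")
  )

  // "3.1.2" -> "3.1"; a version with fewer than two parts is returned unchanged.
  def minor(version: String): String =
    version.split('.').take(2).mkString(".")

  // Returns the TiSpark line whose row matches both the Spark and the Scala version.
  def choose(sparkVersion: String, scalaBinary: String): Option[String] =
    rows.collectFirst {
      case r if r.sparkMinors.contains(minor(sparkVersion)) && r.scala == scalaBinary =>
        r.tispark
    }
}
```

For example, `TiSparkCompat.choose("3.1.2", "2.12")` gives `Some("2.5.x")`, while Spark 3.2 gives `None` because the table does not list it.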
The following example uses spark-shell. To use TiSpark in spark-shell:
- Add the following configuration to `spark-defaults.conf`:

```
spark.sql.extensions org.apache.spark.sql.TiExtensions
spark.tispark.pd.addresses ${your_pd_address}
spark.sql.catalog.tidb_catalog org.apache.spark.sql.catalyst.catalog.TiCatalog
spark.sql.catalog.tidb_catalog.pd.addresses ${your_pd_address}
```
- Start spark-shell with the `--jars` option:

```shell
spark-shell --jars tispark-assembly-{version}.jar
```
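Two of the four settings take the same PD address, so it can help to generate them from one value. The following minimal Scala sketch, built around a hypothetical `TiSparkSetup` helper, renders the settings either as `spark-defaults.conf` lines or as standard `--conf key=value` arguments for spark-shell. The catalog name `tidb_catalog` comes from the configuration above; treating the PD address as a comma-separated `host:port` list is an assumption.

```scala
// Hypothetical helper that generates the TiSpark settings listed above.
object TiSparkSetup {
  // pdAddresses: PD endpoint(s), assumed to be a comma-separated host:port list.
  def settings(pdAddresses: String, catalog: String = "tidb_catalog"): Seq[(String, String)] = Seq(
    "spark.sql.extensions" -> "org.apache.spark.sql.TiExtensions",
    "spark.tispark.pd.addresses" -> pdAddresses,
    s"spark.sql.catalog.$catalog" -> "org.apache.spark.sql.catalyst.catalog.TiCatalog",
    s"spark.sql.catalog.$catalog.pd.addresses" -> pdAddresses
  )

  // spark-defaults.conf format: one "key value" pair per line, separated by whitespace.
  def defaultsConf(pdAddresses: String): String =
    settings(pdAddresses).map { case (k, v) => s"$k $v" }.mkString("\n")

  // Equivalent spark-shell arguments: --jars <jar> followed by one --conf per setting.
  def shellArgs(pdAddresses: String, jar: String): Seq[String] =
    Seq("--jars", jar) ++ settings(pdAddresses).flatMap { case (k, v) => Seq("--conf", s"$k=$v") }
}
```

For a one-off session, passing `TiSparkSetup.shellArgs(...)` to spark-shell avoids editing `spark-defaults.conf`; the PD address and JAR name you pass in are placeholders for your own values.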
You can then use Spark SQL to read from TiKV:
```scala
spark.sql("use tidb_catalog")
spark.sql("select count(*) from ${database}.${table}").show
```
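The `${database}.${table}` placeholders are filled in by hand. A minimal sketch, using a hypothetical `TiSql` helper, quotes each identifier with backticks (doubling any embedded backtick, which is how Spark SQL escapes it) so that unusual names do not break the query. Qualifying the table with the catalog name, as in `tidb_catalog.db.table`, is assumed here to work as an alternative to `use tidb_catalog` with Spark 3 catalogs.

```scala
// Hypothetical helper for building read queries with safely quoted identifiers.
object TiSql {
  // Quote one Spark SQL identifier: wrap in backticks, doubling embedded backticks.
  def quote(ident: String): String = "`" + ident.replace("`", "``") + "`"

  // Join identifier parts into a qualified name, e.g. catalog.database.table.
  def qualified(parts: String*): String = parts.map(quote).mkString(".")

  // Build the count query from the example above for the given qualified name.
  def countQuery(parts: String*): String = s"select count(*) from ${qualified(parts: _*)}"
}
```

In spark-shell this could be used as `spark.sql(TiSql.countQuery("tidb_catalog", "tpch_test", "customer")).show`, where the database and table names are placeholders.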
You can use the Spark DataSource API to write to TiKV with ACID guarantees (the SQL `INSERT` statement is not supported yet):
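```scala
// Connection settings for TiDB and PD (example values; replace with your own)
val tidbOptions: Map[String, String] = Map(
  "tidb.addr" -> "127.0.0.1",
  "tidb.password" -> "",
  "tidb.port" -> "4000",
  "tidb.user" -> "root",
  "spark.tispark.pd.addresses" -> "127.0.0.1:2379"
)

// Rows to write; `customer` must be resolvable in the current catalog
val customerDF = spark.sql("select * from customer limit 100000")

// Append the rows to tpch_test.cust_test_select through the TiDB data source
customerDF.write
  .format("tidb")
  .option("database", "tpch_test")
  .option("table", "cust_test_select")
  .options(tidbOptions)
  .mode("append")
  .save()
```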
See here for more details.
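The write options mix TiDB connection settings with the PD address. A minimal sketch, assuming a hypothetical `TiDBWriteConfig` case class, builds the same five keys (`tidb.addr`, `tidb.port`, `tidb.user`, `tidb.password`, `spark.tispark.pd.addresses`) from typed fields, so the port stays numeric and is validated before it is rendered as a string. The defaults copy the example values and are placeholders.

```scala
// Hypothetical typed wrapper for the TiDB write options.
final case class TiDBWriteConfig(
    addr: String = "127.0.0.1",
    port: Int = 4000,
    user: String = "root",
    password: String = "",
    pdAddresses: String = "127.0.0.1:2379"
) {
  // Fail fast on an impossible TCP port instead of at write time.
  require(port > 0 && port <= 65535, s"invalid TiDB port: $port")

  // Produces the Map[String, String] passed to .options(...).
  def toOptions: Map[String, String] = Map(
    "tidb.addr" -> addr,
    "tidb.password" -> password,
    "tidb.port" -> port.toString,
    "tidb.user" -> user,
    "spark.tispark.pd.addresses" -> pdAddresses
  )
}
```

The writer call would then take `.options(TiDBWriteConfig(addr = "your-tidb-host").toOptions)`, with the host name as a placeholder.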
You can use Spark SQL to delete from TiKV (supported on the TiSpark master branch):
```scala
spark.sql("use tidb_catalog")
spark.sql("delete from ${database}.${table} where ${condition}").show
```
See here for more details.
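Because a `DELETE` removes data from TiKV, it can help to build the statement in one place. A minimal sketch, using a hypothetical `TiDelete` helper, quotes the database and table names and refuses a blank condition so that an unfiltered delete is never produced by accident. The guard is a design choice of this sketch, not a documented TiSpark rule, and the condition is passed through verbatim, so it should come from trusted code rather than user input.

```scala
// Hypothetical helper for building DELETE statements with a mandatory condition.
object TiDelete {
  // Quote one Spark SQL identifier: wrap in backticks, doubling embedded backticks.
  private def quote(ident: String): String = "`" + ident.replace("`", "``") + "`"

  // Builds "delete from `db`.`table` where <condition>"; throws on a blank condition.
  def statement(database: String, table: String, condition: String): String = {
    require(condition.trim.nonEmpty, "refusing to build a DELETE without a WHERE condition")
    s"delete from ${quote(database)}.${quote(table)} where ${condition.trim}"
  }
}
```

After `spark.sql("use tidb_catalog")`, the statement could be run as `spark.sql(TiDelete.statement("tpch_test", "cust_test_select", "c_custkey > 100"))`, where the table and condition are placeholders.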