A Spark driver (aka an application’s driver process) is a JVM process that hosts the SparkContext for a Spark application. It is the cockpit of job and task execution (using DAGScheduler and TaskScheduler). It also hosts the web UI for the environment.
It splits a Spark application into tasks and schedules them to run on executors.
A driver is where the task scheduler lives and spawns tasks across workers.
A driver coordinates workers and the overall execution of tasks.
Note: The Spark shell is a Spark application and the driver. It creates a SparkContext that is available as sc.
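For example, inside spark-shell the driver is already running and sc is in scope; a minimal sketch of a job whose action makes the driver split the work into tasks and schedule them on executors:

```scala
// In spark-shell the driver hosts sc (the SparkContext).
// The action below makes the driver build a DAG, split it into tasks
// (one per partition), and schedule them on executors.
val rdd = sc.parallelize(1 to 100, numSlices = 4) // 4 partitions => 4 tasks
val sum = rdd.map(_ * 2).reduce(_ + _)
println(sum) // 10100
```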
The driver requires the additional services (besides the common ones like ShuffleManager, MemoryManager, BlockTransferService, BroadcastManager, CacheManager):

- Listener Bus
- MapOutputTrackerMaster with the name MapOutputTracker
- BlockManagerMaster with the name BlockManagerMaster
- MetricsSystem with the name driver
- OutputCommitCoordinator with the endpoint’s name OutputCommitCoordinator
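These driver-side services are registered in the driver’s SparkEnv. A sketch of peeking at a few of them from a running driver, assuming a live spark-shell session (SparkEnv is an internal @DeveloperApi class, so this is for exploration only and field names may differ across Spark versions):

```scala
import org.apache.spark.SparkEnv

// Run inside the driver (e.g. in spark-shell). Internal API: exploration only.
val env = SparkEnv.get
println(env.mapOutputTracker)        // MapOutputTrackerMaster on the driver
println(env.blockManager)            // the driver's BlockManager
println(env.metricsSystem)           // MetricsSystem named "driver"
println(env.outputCommitCoordinator) // OutputCommitCoordinator
```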
Caution: FIXME Diagram of RpcEnv for a driver (and later executors). Perhaps it should be in the notes about RpcEnv?
- High-level control flow of work
- Your Spark application runs as long as the Spark driver; once the driver terminates, so does your Spark application.
- Creates SparkContext, RDDs, and executes transformations and actions (illustrated in the sketch after this list)
- Launches tasks
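A minimal, self-contained sketch of that lifecycle (the object name and local master are illustrative): the Spark application lives exactly as long as the driver’s SparkContext.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// A minimal driver program: the application starts when the SparkContext
// is created and terminates once it is stopped.
object MinimalDriverApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MinimalDriverApp").setMaster("local[2]")
    val sc = new SparkContext(conf) // the driver is now up (web UI included)

    // Transformations are planned on the driver; the action launches tasks.
    val result = sc.parallelize(1 to 10).map(_ * 2).sum()
    println(s"sum = $result") // 110.0

    sc.stop() // once the driver terminates, so does the Spark application
  }
}
```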
The driver’s memory can be set first using spark-submit’s --driver-memory command-line option or the spark.driver.memory property, and falls back to SPARK_DRIVER_MEMORY if not set earlier.
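The fallback chain can be sketched as follows (an illustration only, not Spark’s actual spark-submit code; cmdLineDriverMemory stands in for the --driver-memory option):

```scala
import org.apache.spark.SparkConf

// Illustrative resolution order for the driver's memory setting.
object DriverMemoryResolution {
  def main(args: Array[String]): Unit = {
    val cmdLineDriverMemory: Option[String] = args.headOption // stand-in for --driver-memory

    val driverMemory = cmdLineDriverMemory
      .orElse(new SparkConf().getOption("spark.driver.memory")) // the config property
      .orElse(sys.env.get("SPARK_DRIVER_MEMORY"))               // the environment variable
      .getOrElse("1g")                                          // the default

    println(s"Effective driver memory: $driverMemory")
  }
}
```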
Note: The driver’s memory is printed out to the standard error output in spark-submit’s verbose mode.
The driver’s cores can be set first using spark-submit’s --driver-cores command-line option in cluster deploy mode.
Note: In client deploy mode the driver’s memory corresponds to the memory of the JVM process the Spark application runs on.
Note: The driver’s cores are printed out to the standard error output in spark-submit’s verbose mode.
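Besides spark-submit’s verbose mode, the effective values can be read from a running application’s SparkConf; a small sketch, assuming a live sc (the defaults appear as fallbacks):

```scala
// Run inside the driver (sc is a live SparkContext, e.g. in spark-shell).
val conf = sc.getConf
println(conf.get("spark.driver.memory", "1g")) // falls back to the 1g default
println(conf.get("spark.driver.cores", "1"))   // falls back to the 1-core default
```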
spark.driver.appUIAddress is used only in Spark on YARN. It is set when YarnClientSchedulerBackend starts to run ExecutorLauncher (and register ApplicationMaster for the Spark application).
spark.driver.extraClassPath is an optional setting that is used to…FIXME
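Although the description above is unfinished, spark.driver.extraClassPath is generally used for extra classpath entries prepended to the driver’s JVM classpath; a hedged sketch (the JAR path is hypothetical):

```scala
import org.apache.spark.SparkConf

// Hypothetical path, for illustration only. In client deploy mode the driver
// JVM is already running when user code executes, so in practice this is set
// via spark-submit or spark-defaults.conf rather than programmatically.
val conf = new SparkConf()
  .set("spark.driver.extraClassPath", "/path/to/extra-lib.jar")
```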
spark.driver.cores (default: 1) sets the number of CPU cores assigned to the driver in cluster deploy mode.
Note: When Client is created (for Spark on YARN in cluster mode only), it sets the number of cores for the ApplicationMaster using spark.driver.cores.
Read Driver’s Cores for a closer coverage.
spark.driver.memory (default: 1g) sets the driver’s memory size (in MiBs).
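To see how a size string like 1g translates to MiBs, here is a sketch using JavaUtils.byteStringAsMb, an internal helper from Spark’s network-common module (shown for illustration only):

```scala
import org.apache.spark.network.util.JavaUtils

// How memory strings map to MiBs (internal utility, for illustration).
println(JavaUtils.byteStringAsMb("1g"))   // 1024
println(JavaUtils.byteStringAsMb("512m")) // 512
```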
Read Driver’s Memory for a closer coverage.