Standalone Worker

Standalone Worker (aka standalone slave) is a logical node in a Spark Standalone cluster.

Worker is a ThreadSafeRpcEndpoint that is registered under the RPC endpoint name Worker.

You can have one or many standalone workers in a standalone cluster. They can be started and stopped using management scripts.

Worker is created when…​FIXME

When started, Worker…​FIXME

Table 1. Worker’s Internal Properties (e.g. Registries, Counters and Flags)
Name Description

workDir

Working directory of the executors that the Worker manages

Initialized when Worker is requested to createWorkDir (when the Worker RPC endpoint is requested to start on an RPC environment).

Used when Worker is requested to handleRegisterResponse or onStart (to create a WorkerWebUI), and when it receives a WorkDirCleanup, LaunchExecutor or LaunchDriver message.

receive Method

receive: PartialFunction[Any, Unit]
Note
receive is part of the RpcEndpoint Contract for processing messages.
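
As a rough illustration of the signature above, the following self-contained sketch shows how a PartialFunction[Any, Unit] can dispatch on message types. The message classes are simplified, hypothetical stand-ins that only borrow names mentioned on this page (LaunchExecutor, LaunchDriver, WorkDirCleanup). They are not Spark's actual definitions, and the handler bodies are placeholders that merely record what was received.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical, simplified stand-ins for worker messages (not Spark's real classes)
sealed trait WorkerMessage
final case class LaunchExecutor(appId: String, execId: Int) extends WorkerMessage
final case class LaunchDriver(driverId: String) extends WorkerMessage
case object WorkDirCleanup extends WorkerMessage

object ReceiveSketch {
  // Records handled messages so the dispatch can be inspected
  val handled: ListBuffer[String] = ListBuffer.empty

  // Same shape as receive: defined only for the messages it lists
  val receive: PartialFunction[Any, Unit] = {
    case LaunchExecutor(appId, execId) => handled += s"executor $appId/$execId"
    case LaunchDriver(driverId)        => handled += s"driver $driverId"
    case WorkDirCleanup                => handled += "cleanup"
  }
}
```

Because receive is a partial function, a caller can ask `ReceiveSketch.receive.isDefinedAt(msg)` before applying it, so messages without a matching case can be detected rather than causing a MatchError.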

receive…​FIXME

handleRegisterResponse Internal Method

handleRegisterResponse(msg: RegisterWorkerResponse): Unit

handleRegisterResponse…​FIXME

Note
handleRegisterResponse is used when…​FIXME

Launching Worker Standalone Application — main Method

main(argStrings: Array[String]): Unit

main…​FIXME

Starting RPC Environment And Registering Worker RPC Endpoint — startRpcEnvAndEndpoint Method

startRpcEnvAndEndpoint(
  host: String,
  port: Int,
  webUiPort: Int,
  cores: Int,
  memory: Int,
  masterUrls: Array[String],
  workDir: String,
  workerNumber: Option[Int] = None,
  conf: SparkConf = new SparkConf): RpcEnv

startRpcEnvAndEndpoint…​FIXME

startRpcEnvAndEndpoint creates an RpcEnv for the input host and port.

startRpcEnvAndEndpoint creates a Worker RPC endpoint (for the RPC environment and the input webUiPort, cores, memory, masterUrls, workDir and conf).

startRpcEnvAndEndpoint requests the RpcEnv to register the Worker RPC endpoint under the name Worker.
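
The three steps above can be sketched with a toy model. ToyRpcEnv and ToyWorkerEndpoint are hypothetical classes invented for this illustration. They stand in for Spark's RpcEnv and Worker, which are considerably richer, and the sketch leaves out workerNumber and conf. It only shows the order of operations: create the environment, create the endpoint, register it under the name Worker.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the Worker endpoint, keeping only inputs named in the text
final case class ToyWorkerEndpoint(
    webUiPort: Int,
    cores: Int,
    memory: Int,
    masterUrls: Seq[String],
    workDir: String)

// Hypothetical toy model of an RPC environment bound to a host and port
final class ToyRpcEnv(val host: String, val port: Int) {
  private val endpoints = mutable.Map.empty[String, ToyWorkerEndpoint]

  // Registers an endpoint under the given name
  def setupEndpoint(name: String, endpoint: ToyWorkerEndpoint): Unit =
    endpoints(name) = endpoint

  def endpointNames: Set[String] = endpoints.keySet.toSet
  def lookup(name: String): Option[ToyWorkerEndpoint] = endpoints.get(name)
}

object StartSketch {
  val EndpointName = "Worker"

  def startRpcEnvAndEndpoint(
      host: String,
      port: Int,
      webUiPort: Int,
      cores: Int,
      memory: Int,
      masterUrls: Array[String],
      workDir: String): ToyRpcEnv = {
    // 1. Create the RPC environment for the input host and port
    val rpcEnv = new ToyRpcEnv(host, port)
    // 2. Create the worker endpoint from the remaining inputs
    val worker = ToyWorkerEndpoint(webUiPort, cores, memory, masterUrls.toSeq, workDir)
    // 3. Register the endpoint under the name Worker
    rpcEnv.setupEndpoint(EndpointName, worker)
    rpcEnv
  }
}
```

Returning the environment rather than the endpoint matches the RpcEnv return type in the signature: the caller keeps a handle on the environment and can look the endpoint up by name.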

Note

startRpcEnvAndEndpoint is used when:

  • Worker is launched from a command line

  • LocalSparkCluster is requested to start

Creating Worker Instance

Worker takes the following when created:

  • RpcEnv

  • Port of the administrative web UI

  • Number of cores

  • Amount of memory

  • standalone Master’s RpcAddresses

  • RPC endpoint name

  • Path to the working directory

  • SparkConf

  • SecurityManager

Worker initializes the internal registries and counters.

createWorkDir Internal Method

createWorkDir(): Unit

createWorkDir sets workDir to workDirPath if defined, or to the work subdirectory of sparkHome otherwise.

In the end, createWorkDir creates the workDir directory (including any necessary but nonexistent parent directories).

createWorkDir reports…​FIXME

Note
createWorkDir is used exclusively when the Worker RPC endpoint is requested to start on an RPC environment.
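
The directory selection and creation described for createWorkDir can be sketched in plain Scala as follows. This is a minimal illustration, not Worker's code: WorkDirSketch, resolveWorkDir and the Boolean result are hypothetical, and workDirPath and sparkHome are passed as parameters instead of being read from the Worker instance. How a failure is reported is not covered here, so the sketch simply returns whether the directory exists afterwards.

```scala
import java.io.File

object WorkDirSketch {
  // Picks workDirPath when defined, otherwise the "work" subdirectory of sparkHome
  def resolveWorkDir(workDirPath: Option[String], sparkHome: File): File =
    workDirPath.map(new File(_)).getOrElse(new File(sparkHome, "work"))

  // Creates the directory with any missing parents and
  // returns it together with a flag telling whether it now exists as a directory
  def createWorkDir(workDirPath: Option[String], sparkHome: File): (File, Boolean) = {
    val workDir = resolveWorkDir(workDirPath, sparkHome)
    // mkdirs returns false when the directory already exists,
    // so the outcome is checked with isDirectory instead
    workDir.mkdirs()
    (workDir, workDir.isDirectory)
  }
}
```

Calling `WorkDirSketch.createWorkDir(None, new File("/opt/spark"))` would target /opt/spark/work, while passing `Some("/data/spark-work")` would target that path instead. Both paths are made up for the example.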

onStart Method

onStart(): Unit
Note
onStart is part of the RpcEndpoint Contract to activate an endpoint and start accepting messages.

onStart…​FIXME