Application : User program built on Spark. Consists of a driver program and executors on the cluster.
Application jar : A jar containing the user’s Spark application. In some cases users will want to create an “uber jar” containing their application along with its dependencies. The user’s jar should never include Hadoop or Spark libraries; these will be added at runtime.
Driver program : The process running the main() function of the application and creating the SparkContext.
Cluster manager : An external service for acquiring resources on the cluster (e.g. standalone manager, Mesos, YARN)
Deploy mode : Distinguishes where the driver process runs. In “cluster” mode, the framework launches the driver inside of the cluster. In “client” mode, the submitter launches the driver outside of the cluster.
Worker node : Any node that can run application code in the cluster
Executor : A process launched for an application on a worker node that runs tasks and keeps data in memory or on disk across them. Each application has its own executors.
Task : A unit of work that will be sent to one executor.
Job : A parallel computation consisting of multiple tasks that gets spawned in response to a Spark action (e.g. save, collect); you’ll see this term used in the driver’s logs.
Stage : Each job gets divided into smaller sets of tasks called stages that depend on each other (similar to the map and reduce stages in MapReduce); you’ll see this term used in the driver’s logs.
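These terms are easiest to see in a concrete run. The driver log below comes from submitting a small PySpark script, calc_pi.py, that estimates pi by Monte Carlo sampling in local mode (for example with bin/spark-submit calc_pi.py). The original script isn't reproduced here, so the following is only a minimal sketch of what it plausibly contains; the 8 partitions and the reduce action match the log, but the sample count and the inside() helper are illustrative assumptions:

# calc_pi.py -- a minimal sketch, not the original script: the sample
# count and the inside() helper are assumptions; only the reduce action
# and the 8 partitions are taken from the log below.
import random
from operator import add
from pyspark import SparkContext

def inside(_):
    # Sample a point in the unit square and report whether it falls
    # inside the quarter circle of radius 1.
    x, y = random.random(), random.random()
    return 1 if x * x + y * y < 1.0 else 0

if __name__ == "__main__":
    sc = SparkContext(appName="CalcPi")
    n = 8 * 100000  # total samples, spread over 8 partitions
    count = sc.parallelize(range(n), 8).map(inside).reduce(add)
    print("*****result:pi is :%f*****" % (4.0 * count / n))
    sc.stop()

In the driver output that follows, watch for the glossary terms above: one job, one stage, and eight tasks.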
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/05/16 21:33:56 INFO SparkContext: Running Spark version 1.6.1
16/05/16 21:33:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/05/16 21:33:56 WARN Utils: Your hostname, ustc resolves to a loopback address: 127.0.1.1; using 192.168.102.77 instead (on interface eth0)
16/05/16 21:33:56 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/05/16 21:33:56 INFO SecurityManager: Changing view acls to: yunfeng
16/05/16 21:33:56 INFO SecurityManager: Changing modify acls to: yunfeng
16/05/16 21:33:56 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yunfeng); users with modify permissions: Set(yunfeng)
16/05/16 21:33:56 INFO Utils: Successfully started service 'sparkDriver' on port 53174.
16/05/16 21:33:56 INFO Slf4jLogger: Slf4jLogger started
16/05/16 21:33:56 INFO Remoting: Starting remoting
16/05/16 21:33:56 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.102.77:57025]
16/05/16 21:33:56 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 57025.
16/05/16 21:33:56 INFO SparkEnv: Registering MapOutputTracker
16/05/16 21:33:57 INFO SparkEnv: Registering BlockManagerMaster
16/05/16 21:33:57 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-2ace648a-937b-4a4c-b984-6e4cd06b8273
16/05/16 21:33:57 INFO MemoryStore: MemoryStore started with capacity 511.5 MB
16/05/16 21:33:57 INFO SparkEnv: Registering OutputCommitCoordinator
16/05/16 21:33:57 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/05/16 21:33:57 INFO SparkUI: Started SparkUI at http://192.168.102.77:4040
16/05/16 21:33:57 INFO Utils: Copying /home/yunfeng/Downloads/spark-1.6.1-bin-hadoop2.6/calc_pi.py to /tmp/spark-6cb08b18-143f-42dc-88c3-27786460836b/userFiles-ae0a9fc0-65cf-467e-848b-4f3cf4e6e1c2/calc_pi.py
16/05/16 21:33:57 INFO SparkContext: Added file file:/home/yunfeng/Downloads/spark-1.6.1-bin-hadoop2.6/calc_pi.py at file:/home/yunfeng/Downloads/spark-1.6.1-bin-hadoop2.6/calc_pi.py with timestamp 1463405637243
16/05/16 21:33:57 INFO Executor: Starting executor ID driver on host localhost
16/05/16 21:33:57 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43770.
16/05/16 21:33:57 INFO NettyBlockTransferService: Server created on 43770
16/05/16 21:33:57 INFO BlockManagerMaster: Trying to register BlockManager
16/05/16 21:33:57 INFO BlockManagerMasterEndpoint: Registering block manager localhost:43770 with 511.5 MB RAM, BlockManagerId(driver, localhost, 43770)
16/05/16 21:33:57 INFO BlockManagerMaster: Registered BlockManager
16/05/16 21:33:57 INFO SparkContext: Starting job: reduce at /home/yunfeng/Downloads/spark-1.6.1-bin-hadoop2.6/calc_pi.py:12
16/05/16 21:33:57 INFO DAGScheduler: Got job 0 (reduce at /home/yunfeng/Downloads/spark-1.6.1-bin-hadoop2.6/calc_pi.py:12) with 8 output partitions
16/05/16 21:33:57 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at /home/yunfeng/Downloads/spark-1.6.1-bin-hadoop2.6/calc_pi.py:12)
16/05/16 21:33:57 INFO DAGScheduler: Parents of final stage: List()
16/05/16 21:33:57 INFO DAGScheduler: Missing parents: List()
16/05/16 21:33:57 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[1] at reduce at /home/yunfeng/Downloads/spark-1.6.1-bin-hadoop2.6/calc_pi.py:12), which has no missing parents
16/05/16 21:33:57 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 4.3 KB, free 4.3 KB)
16/05/16 21:33:57 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 2.8 KB, free 7.1 KB)
16/05/16 21:33:57 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:43770 (size: 2.8 KB, free: 511.5 MB)
16/05/16 21:33:57 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
16/05/16 21:33:57 INFO DAGScheduler: Submitting 8 missing tasks from ResultStage 0 (PythonRDD[1] at reduce at /home/yunfeng/Downloads/spark-1.6.1-bin-hadoop2.6/calc_pi.py:12)
16/05/16 21:33:57 INFO TaskSchedulerImpl: Adding task set 0.0 with 8 tasks
16/05/16 21:33:57 WARN TaskSetManager: Stage 0 contains a task of very large size (486 KB). The maximum recommended task size is 100 KB.
16/05/16 21:33:57 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 497894 bytes)
16/05/16 21:33:57 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, partition 1,PROCESS_LOCAL, 629219 bytes)
16/05/16 21:33:57 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, partition 2,PROCESS_LOCAL, 629219 bytes)
16/05/16 21:33:57 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, localhost, partition 3,PROCESS_LOCAL, 629219 bytes)
16/05/16 21:33:57 INFO TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, localhost, partition 4,PROCESS_LOCAL, 629219 bytes)
16/05/16 21:33:57 INFO TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5, localhost, partition 5,PROCESS_LOCAL, 629219 bytes)
16/05/16 21:33:57 INFO TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6, localhost, partition 6,PROCESS_LOCAL, 629219 bytes)
16/05/16 21:33:57 INFO TaskSetManager: Starting task 7.0 in stage 0.0 (TID 7, localhost, partition 7,PROCESS_LOCAL, 632117 bytes)
16/05/16 21:33:57 INFO Executor: Running task 3.0 in stage 0.0 (TID 3)
16/05/16 21:33:57 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
16/05/16 21:33:57 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
16/05/16 21:33:57 INFO Executor: Running task 6.0 in stage 0.0 (TID 6)
16/05/16 21:33:57 INFO Executor: Running task 7.0 in stage 0.0 (TID 7)
16/05/16 21:33:57 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
16/05/16 21:33:57 INFO Executor: Running task 5.0 in stage 0.0 (TID 5)
16/05/16 21:33:57 INFO Executor: Running task 4.0 in stage 0.0 (TID 4)
16/05/16 21:33:57 INFO Executor: Fetching file:/home/yunfeng/Downloads/spark-1.6.1-bin-hadoop2.6/calc_pi.py with timestamp 1463405637243
16/05/16 21:33:57 INFO Utils: /home/yunfeng/Downloads/spark-1.6.1-bin-hadoop2.6/calc_pi.py has been previously copied to /tmp/spark-6cb08b18-143f-42dc-88c3-27786460836b/userFiles-ae0a9fc0-65cf-467e-848b-4f3cf4e6e1c2/calc_pi.py
16/05/16 21:33:58 INFO PythonRunner: Times: total = 340, boot = 226, init = 1, finish = 113
16/05/16 21:33:58 INFO Executor: Finished task 7.0 in stage 0.0 (TID 7). 998 bytes result sent to driver
16/05/16 21:33:58 INFO PythonRunner: Times: total = 353, boot = 222, init = 2, finish = 129
16/05/16 21:33:58 INFO PythonRunner: Times: total = 359, boot = 230, init = 1, finish = 128
16/05/16 21:33:58 INFO PythonRunner: Times: total = 360, boot = 225, init = 3, finish = 132
16/05/16 21:33:58 INFO PythonRunner: Times: total = 358, boot = 224, init = 1, finish = 133
16/05/16 21:33:58 INFO Executor: Finished task 5.0 in stage 0.0 (TID 5). 998 bytes result sent to driver
16/05/16 21:33:58 INFO Executor: Finished task 4.0 in stage 0.0 (TID 4). 998 bytes result sent to driver
16/05/16 21:33:58 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 998 bytes result sent to driver
16/05/16 21:33:58 INFO PythonRunner: Times: total = 373, boot = 248, init = 0, finish = 125
16/05/16 21:33:58 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 998 bytes result sent to driver
16/05/16 21:33:58 INFO Executor: Finished task 2.0 in stage 0.0 (TID 2). 998 bytes result sent to driver
16/05/16 21:33:58 INFO TaskSetManager: Finished task 7.0 in stage 0.0 (TID 7) in 420 ms on localhost (1/8)
16/05/16 21:33:58 INFO TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 427 ms on localhost (2/8)
16/05/16 21:33:58 INFO PythonRunner: Times: total = 385, boot = 245, init = 0, finish = 140
16/05/16 21:33:58 INFO Executor: Finished task 6.0 in stage 0.0 (TID 6). 998 bytes result sent to driver
16/05/16 21:33:58 INFO TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 431 ms on localhost (3/8)
16/05/16 21:33:58 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 439 ms on localhost (4/8)
16/05/16 21:33:58 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 437 ms on localhost (5/8)
16/05/16 21:33:58 INFO TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 430 ms on localhost (6/8)
16/05/16 21:33:58 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 455 ms on localhost (7/8)
16/05/16 21:33:58 INFO PythonRunner: Times: total = 390, boot = 246, init = 1, finish = 143
16/05/16 21:33:58 INFO Executor: Finished task 3.0 in stage 0.0 (TID 3). 998 bytes result sent to driver
16/05/16 21:33:58 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 442 ms on localhost (8/8)
16/05/16 21:33:58 INFO DAGScheduler: ResultStage 0 (reduce at /home/yunfeng/Downloads/spark-1.6.1-bin-hadoop2.6/calc_pi.py:12) finished in 0.467 s
16/05/16 21:33:58 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/05/16 21:33:58 INFO DAGScheduler: Job 0 finished: reduce at /home/yunfeng/Downloads/spark-1.6.1-bin-hadoop2.6/calc_pi.py:12, took 0.569039 s
*****result:pi is :3.140324*****
16/05/16 21:33:58 INFO SparkContext: Invoking stop() from shutdown hook
16/05/16 21:33:58 INFO SparkUI: Stopped Spark web UI at http://192.168.102.77:4040
16/05/16 21:33:58 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/05/16 21:33:58 INFO MemoryStore: MemoryStore cleared
16/05/16 21:33:58 INFO BlockManager: BlockManager stopped
16/05/16 21:33:58 INFO BlockManagerMaster: BlockManagerMaster stopped
16/05/16 21:33:58 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/05/16 21:33:58 INFO SparkContext: Successfully stopped SparkContext
16/05/16 21:33:58 INFO ShutdownHookManager: Shutdown hook called
16/05/16 21:33:58 INFO ShutdownHookManager: Deleting directory /tmp/spark-6cb08b18-143f-42dc-88c3-27786460836b/pyspark-33d22309-ef12-45d6-9862-25ceb8beadac
16/05/16 21:33:58 INFO ShutdownHookManager: Deleting directory /tmp/spark-6cb08b18-143f-42dc-88c3-27786460836b
16/05/16 21:33:58 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
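Reading the log against the glossary: the reduce action triggers Job 0, which the DAGScheduler plans as a single ResultStage 0 with no parent stages (there is no shuffle) and submits as 8 tasks, one per partition. Because this is local mode, a single executor with ID driver runs on localhost and executes all eight tasks; each sends its 998-byte result back to the driver, which prints the estimate and shuts down. The TaskSetManager warning is also instructive: each serialized task is roughly 500-630 KB, far above the recommended 100 KB, most likely because parallelize ships the sampled range itself inside the tasks; generating the random numbers on the executors instead would keep tasks small.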