Deep Dive into Spark Cluster Managers

This post digs into the different cluster management modes in which you can run your Spark application.

Spark applications run as independent sets of processes on a cluster, coordinated by the SparkContext object in your main program, which is called the driver program. Specifically, to run on a cluster, the SparkContext can connect to several types of cluster managers, which allocate resources across applications. Once connected, Spark acquires executors on nodes in the cluster; executors are processes that run computations and store data for your application. Next, it sends your application code (defined by JAR or Python files passed to SparkContext) to the executors. Finally, SparkContext sends tasks to the executors to run.
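Which cluster manager the SparkContext connects to is chosen by the master URL, which you pass with spark-submit's --master option or with SparkConf.setMaster. The sketch below is only an illustration: cluster_manager_for is a hypothetical helper, not part of Spark, and the host names are placeholders. It maps the commonly documented master URL forms to the cluster manager each one selects.

```python
def cluster_manager_for(master: str) -> str:
    """Hypothetical helper: name the cluster manager implied by a Spark master URL."""
    if master == "local" or master.startswith("local["):
        # Driver and executors run in one JVM; no cluster manager is involved.
        return "local (no cluster manager)"
    if master.startswith("spark://"):
        return "Spark Standalone"
    if master == "yarn":
        # The cluster location is read from the Hadoop configuration, not the URL.
        return "Hadoop YARN"
    if master.startswith("mesos://"):
        return "Apache Mesos"
    if master.startswith("k8s://"):
        return "Kubernetes"
    raise ValueError(f"unrecognized master URL: {master!r}")


if __name__ == "__main__":
    # Placeholder hosts; 7077 is the usual Standalone master port.
    examples = [
        "local[*]",
        "spark://master-host:7077",
        "yarn",
        "k8s://https://api-host:6443",
    ]
    for url in examples:
        print(f"{url} -> {cluster_manager_for(url)}")
```

In a real application you would not need such a helper: you hand the URL to Spark and it picks the matching cluster manager. Which managers are available depends on your Spark version and how it was built.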


