What is the purpose of the driver in Spark architecture?

A Spark driver is a process that creates and owns an instance of SparkContext. It runs the main method of your Spark application, and that main method is where the SparkContext instance is created.

  • The driver splits a Spark application into tasks and schedules them to run on executors.
  • The driver hosts the task scheduler, which launches tasks on executors across the worker nodes.
  • The driver coordinates the workers and the overall execution of tasks.
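The responsibilities listed above can be illustrated with a toy model. This is a minimal sketch in plain Python, not Spark's API: `Task`, `Driver`, `split_into_tasks` and `run_job` are hypothetical names, a list slice stands in for a partition, and a thread pool stands in for the executors on worker nodes. In real Spark the driver ships serialized tasks to separate executor processes over the network rather than to local threads.

```python
"""Toy model of a Spark-style driver (NOT the Spark API).

Hypothetical names: Task, Driver, split_into_tasks, run_job.
A thread pool stands in for executors; a list slice stands in for a partition.
"""
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Task:
    """One unit of work: apply `func` to a single partition."""
    partition_id: int
    data: Sequence[int]
    func: Callable[[Sequence[int]], int]

    def run(self) -> int:
        return self.func(self.data)


class Driver:
    """Splits a job into tasks, schedules them, and combines the results."""

    def __init__(self, num_executors: int) -> None:
        self.num_executors = num_executors

    def split_into_tasks(self, data: Sequence[int], num_partitions: int,
                         func: Callable[[Sequence[int]], int]) -> List[Task]:
        # Divide the input into contiguous partitions, one task per partition.
        size = -(-len(data) // num_partitions)  # ceiling division
        return [Task(i, data[i * size:(i + 1) * size], func)
                for i in range(num_partitions)]

    def run_job(self, data: Sequence[int], num_partitions: int,
                func: Callable[[Sequence[int]], int],
                combine: Callable[[List[int]], int]) -> int:
        tasks = self.split_into_tasks(data, num_partitions, func)
        # The pool plays the role of the executors; the driver only
        # schedules tasks and collects their per-partition results.
        with ThreadPoolExecutor(max_workers=self.num_executors) as pool:
            results = list(pool.map(Task.run, tasks))
        return combine(results)


if __name__ == "__main__":
    driver = Driver(num_executors=2)
    # Sum 1..10 as 4 partition-level sums combined on the driver.
    print(driver.run_job(list(range(1, 11)), 4, sum, sum))
```

Running the script prints 55: the driver splits the ten numbers into four partitions, two "executors" compute the partial sums, and the driver combines them. The point of the design is that the driver decides what work exists and where it runs, while the actual computation happens elsewhere.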

What is the Spark architecture?
What is checkpointing?
What is shuffling?
Data is spread across all the nodes of a cluster; how does Spark process this data?
What are wide transformations?