Hadoop Interview questions

Total available count: 27
Subject - Apache
Subsubject - Hadoop

What is checkpointing?

Checkpointing is a process of truncating the RDD(Resilient Distributed Datasets) lineage graph and saving it to a reliable distributed (HDFS) or local file system. RDD checkpointing that saves the actual intermediate Resilient Distributed Datasets(RDD) data to a reliable distributed file system.

You mark an RDD for checkpointing by calling RDD.checkpoint(). The Resilient Distributed Datasets will be saved to a file inside the checkpoint directory and all references to its parent RDDs(Resilient Distributed Datasets) will be removed. This function has to be called before any job has been executed on this RDD.

Next 5 interview question(s)

What is Shuffling?
Data is spread in all the nodes of cluster, how spark tries to process this data?
What is wide Transformations?
What is Narrow Transformations?
How many type of transformations exist?