Hadoop Interview Questions

What is Shuffling?

Shuffling is the process of repartitioning (redistributing) data across partitions. It may move data between JVMs, or even over the network, when the data is redistributed among executors. Shuffling is expensive, so avoid it wherever you can: look for ways to reuse the existing partitioning, and use partial aggregation to reduce the amount of data transferred.
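
To see why partial aggregation reduces the data transferred, here is a minimal plain-Python sketch that simulates a shuffle for a word count. It does not use the Spark API: the helper names (hash_partition, shuffle, combine_locally, total_counts) and the sample data are hypothetical and exist only for this illustration. In Spark itself, reduceByKey performs this kind of map-side combine before the shuffle, while groupByKey sends every record.

```python
from collections import defaultdict


def hash_partition(key, num_partitions):
    # Deterministic toy partitioner (an assumption for this sketch):
    # sum of character codes modulo the number of partitions.
    return sum(ord(c) for c in key) % num_partitions


def shuffle(partitions, num_partitions):
    # Route every (key, value) record to its target partition and
    # count how many records are written to the shuffle.
    targets = [[] for _ in range(num_partitions)]
    records_shuffled = 0
    for partition in partitions:
        for key, value in partition:
            targets[hash_partition(key, num_partitions)].append((key, value))
            records_shuffled += 1
    return targets, records_shuffled


def combine_locally(partition):
    # Partial aggregation: sum the values per key inside one partition,
    # so at most one record per key leaves that partition.
    sums = defaultdict(int)
    for key, value in partition:
        sums[key] += value
    return list(sums.items())


def total_counts(shuffled_partitions):
    # Final aggregation after the shuffle. Each key lives in exactly one
    # target partition, so the per-partition results can simply be merged.
    totals = {}
    for partition in shuffled_partitions:
        totals.update(dict(combine_locally(partition)))
    return totals


# Two input partitions of (word, 1) pairs (made-up sample data).
partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("a", 1), ("b", 1), ("c", 1)],
]

# Without partial aggregation: every record is shuffled.
naive_shuffled, naive_count = shuffle(partitions, 2)

# With partial aggregation: combine inside each partition first.
combined = [combine_locally(p) for p in partitions]
combined_shuffled, combined_count = shuffle(combined, 2)

naive_totals = total_counts(naive_shuffled)
combined_totals = total_counts(combined_shuffled)

print(naive_count, combined_count)  # 8 5
print(combined_totals == naive_totals)  # True
```

Both paths produce the same totals, but the combined path shuffles 5 records instead of 8. The saving grows with the number of repeated keys per partition.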
