Hadoop Interview questions

Subject - Apache
Subsubject - Hadoop

What is Shuffling?

Shuffling is the process of repartitioning (redistributing) data across partitions. When data is redistributed among executors, it may have to move between JVMs or even across the network, which makes a shuffle expensive. Avoid shuffling wherever possible: look for ways to leverage existing partitioning, and use partial aggregation to reduce the amount of data transferred.
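
To make the effect of partial aggregation concrete, here is a minimal sketch in plain Python. It is a simulation, not Hadoop or Spark code: the sample partitions, the `target` partitioner, and the helper names are all assumptions made for illustration. It counts how many records cross the shuffle boundary with and without aggregating inside each partition first.

```python
from collections import Counter, defaultdict

# Hypothetical input: each inner list is one partition of (key, value) records.
partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("b", 1), ("a", 1), ("c", 1)],
]

NUM_REDUCERS = 2  # assumed number of target partitions


def target(key):
    # Deterministic stand-in for a hash partitioner (illustrative only).
    return ord(key[0]) % NUM_REDUCERS


def shuffle(parts):
    """Route every record to its target partition and count records moved."""
    buckets = defaultdict(list)
    moved = 0
    for part in parts:
        for key, value in part:
            buckets[target(key)].append((key, value))
            moved += 1
    return buckets, moved


def reduce_buckets(buckets):
    """Sum values per key after the shuffle."""
    totals = Counter()
    for records in buckets.values():
        for key, value in records:
            totals[key] += value
    return dict(totals)


def partial_aggregate(part):
    """Sum values per key inside one partition, before any data moves."""
    local = Counter()
    for key, value in part:
        local[key] += value
    return list(local.items())


# Without partial aggregation: every input record is shuffled.
naive_buckets, naive_moved = shuffle(partitions)

# With partial aggregation: at most one record per key per partition is shuffled.
combined_buckets, combined_moved = shuffle([partial_aggregate(p) for p in partitions])

naive_totals = reduce_buckets(naive_buckets)
combined_totals = reduce_buckets(combined_buckets)

print(naive_moved, combined_moved)      # records crossing the shuffle in each case
print(naive_totals == combined_totals)  # both approaches give the same totals
```

Both runs produce the same per-key totals, but the pre-aggregated run moves fewer records. This only works because summation is associative and commutative, so combining within a partition first does not change the final result.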

