Hadoop Interview Questions


Total available count: 27
Subject - Apache
Subsubject - Hadoop

Why are both Spark and Hadoop needed?

Spark is often called a cluster computing engine or simply an execution engine. It borrows many concepts from Hadoop MapReduce, and the two work well together: running Spark on YARN with HDFS improves performance and simplifies work distribution across the cluster. HDFS serves as the storage layer for huge volumes of data, while Spark acts as the processing engine (in-memory and generally more efficient data processing).

HDFS: Used as the storage engine for both Spark and Hadoop MapReduce.
YARN: A framework for managing cluster resources using a pluggable scheduler.
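To show how these pieces fit together, here is a minimal sketch of a Spark job submitted to YARN that reads its input from HDFS. The application name, namenode address, and input path are placeholder assumptions for illustration, not values from the original answer.

import org.apache.spark.sql.SparkSession

object HdfsOnYarnSketch {
  def main(args: Array[String]): Unit = {
    // Build a SparkSession that runs on YARN; HDFS settings are picked up
    // from the cluster's core-site.xml / hdfs-site.xml configuration.
    val spark = SparkSession.builder()
      .appName("hdfs-on-yarn-sketch")
      .master("yarn")                 // YARN schedules the executors
      .getOrCreate()

    // Read a text file from HDFS; the namenode host and path are assumptions.
    val lines = spark.read.textFile("hdfs://namenode:8020/data/input.txt")
    println(s"Line count: ${lines.count()}")

    spark.stop()
  }
}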

Beyond MapReduce: With Spark, you can express MapReduce-style jobs as well as use higher-level operators such as filter(), map(), reduceByKey(), and groupByKey().
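To make that concrete, below is a minimal word-count sketch built from those higher-level operators on an RDD. The local master setting and the input path (input.txt) are assumptions made for the example.

import org.apache.spark.sql.SparkSession

object OperatorSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("operator-sketch")
      .master("local[*]")            // local mode, for illustration only
      .getOrCreate()
    val sc = spark.sparkContext

    // A classic word count expressed with Spark's higher-level operators
    // instead of hand-written MapReduce mapper and reducer classes.
    val counts = sc.textFile("input.txt")     // hypothetical input file
      .flatMap(_.split("\\s+"))                // split each line into words
      .filter(_.nonEmpty)                      // filter(): drop empty tokens
      .map(word => (word, 1))                  // map(): pair each word with 1
      .reduceByKey(_ + _)                      // reduceByKey(): sum per word

    counts.take(10).foreach(println)
    spark.stop()
  }
}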




Next interview question

1
Why Spark, even though Hadoop exists?