Hadoop Interview Questions

Total available count: 27
Subject - Apache
Subsubject - Hadoop

How do you define RDD?

A Resilient Distributed Dataset (RDD) is the primary abstraction in Spark: an immutable, partitioned collection of elements that can be operated on in parallel. RDDs provide a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner.

Resilient: Fault-tolerant, so it can recompute missing or damaged partitions on node failure using the RDD lineage graph.
Distributed: Data is partitioned across multiple nodes in a cluster.
Dataset: A collection of partitioned data records.
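The lineage idea above can be illustrated with a minimal sketch. This is plain Python, not the Spark API: the names `MiniRDD`, `compute_partition`, and the cache-dropping step are invented here to show how an RDD can rebuild a lost partition from its lineage instead of replicating the data.

```python
# Conceptual sketch only -- mimics the lineage-based fault tolerance of an
# RDD in plain Python; it is NOT the real Spark API.

class MiniRDD:
    """An immutable, partitioned collection that remembers *how* each
    partition is computed (its lineage) rather than the data itself."""

    def __init__(self, num_partitions, compute_partition):
        self.num_partitions = num_partitions
        self._compute = compute_partition  # lineage: how to build partition i
        self._cache = {}                   # materialized partitions

    def partition(self, i):
        # If a partition is missing (e.g. its node failed), recompute it
        # from the lineage function instead of restoring a replica.
        if i not in self._cache:
            self._cache[i] = self._compute(i)
        return self._cache[i]

    def map(self, fn):
        # Transformations are lazy: return a new RDD whose lineage points
        # back to the parent, so lost partitions remain recomputable.
        parent = self
        return MiniRDD(self.num_partitions,
                       lambda i: [fn(x) for x in parent.partition(i)])

    def collect(self):
        return [x for i in range(self.num_partitions)
                for x in self.partition(i)]


# Base RDD: partition i holds the numbers i*10 .. i*10+9.
base = MiniRDD(3, lambda i: list(range(i * 10, i * 10 + 10)))
squares = base.map(lambda x: x * x)

print(squares.collect()[:5])   # [0, 1, 4, 9, 16]

# Simulate a node failure by dropping a materialized partition...
squares._cache.pop(1, None)
# ...it is transparently recomputed from the lineage graph.
print(squares.partition(1)[0])  # 100
```

In real Spark the lineage graph serves the same purpose: transformations like `map` and `filter` are recorded lazily, and only the partitions lost to a failure are recomputed, rather than the whole dataset.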

Next 4 interview questions

How can you define SparkConf?
How do you define SparkContext?
Why are both Spark and Hadoop needed?
Why use Spark when Hadoop already exists?