Hadoop Interview Questions


Subject - Apache
Subsubject - Hadoop

How do you define RDD?

A Resilient Distributed Dataset (RDD) is the primary abstraction in Spark. It represents an immutable, partitioned collection of elements that can be operated on in parallel. RDDs are a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner.

Resilient: Fault-tolerant, so missing or damaged partitions can be recomputed on node failure with the help of the RDD lineage graph.
Distributed: Data resides on multiple nodes across a cluster.
Dataset: The collection of partitioned data is known as the Dataset.
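
As an illustration, here is a minimal Scala sketch of these ideas. The object name, the application name, and the local[*] master setting are assumptions for a local run, not part of the original answer. It builds a partitioned RDD with parallelize, applies lazy transformations that are recorded in the lineage graph, and triggers a parallel action.

import org.apache.spark.{SparkConf, SparkContext}

object RddExample {
  def main(args: Array[String]): Unit = {
    // Illustrative local configuration; on a real cluster the master URL would differ.
    val conf = new SparkConf().setAppName("RddExample").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // parallelize distributes a local collection into a partitioned RDD (4 partitions here).
    val numbers = sc.parallelize(1 to 100, 4)

    // filter and map are lazy transformations recorded in the lineage graph;
    // sum() is an action that runs the computation in parallel across partitions.
    // A lost partition can be recomputed from this lineage after a node failure.
    val evenSquareSum = numbers
      .filter(_ % 2 == 0)
      .map(n => n.toDouble * n)
      .sum()

    println(s"Sum of squares of even numbers: $evenSquareSum")
    sc.stop()
  }
}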




Next 4 interview question(s)

1. How can you define SparkConf?
2. How do you define SparkContext?
3. Why are both Spark and Hadoop needed?
4. Why use Spark when Hadoop already exists?