A Resilient Distributed Dataset (RDD), the primary abstraction in Spark. It represents an immutable, partitioned collection of elements that can be operated on in parallel. Resilient Distributed Datasets (RDDs) are a distributed memory abstraction that lets programmers perform in-memory computations on large or huge clusters in a fault-tolerant manner.
Resilient: Fault-tolerant and so able to recomputed damaged partitions or missing on node failures with the help of the RDD lineage graph.
Distributed: across clusters.
Dataset: The collection of partitioned data is known as Dataset.