RDD Interface

An RDD (Resilient Distributed Dataset) is the basic abstraction in Spark.


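Internally, each RDD is characterized by five properties:

  1. Dependencies: the list of parent RDDs it is derived from
  2. Partitions: the list of splits that make up the dataset
  3. Compute: a function that computes each partition
  4. Partitioner, for key-value RDDs
  5. List of preferred locations for computing each partition

Properties 4 and 5 are optional.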
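
To make the five properties concrete, here is a minimal single-process sketch in plain Python. It is a toy model, not Spark's actual API: the class names (ToyRDD, ListRDD, MappedRDD) and method names are made up for illustration, and nothing here is distributed.

```python
from abc import ABC, abstractmethod
from typing import Callable, Iterator, List, Optional


class ToyRDD(ABC):
    """Toy model of the five RDD properties (hypothetical, not Spark's API)."""

    @abstractmethod
    def dependencies(self) -> List["ToyRDD"]:
        """1. Parent RDDs this RDD is derived from."""

    @abstractmethod
    def partitions(self) -> List[int]:
        """2. Partition indices that make up this RDD."""

    @abstractmethod
    def compute(self, split: int) -> Iterator:
        """3. Produce the records of one partition."""

    def partitioner(self) -> Optional[Callable[[object], int]]:
        """4. Optional: maps a key to a partition index (key-value RDDs)."""
        return None

    def preferred_locations(self, split: int) -> List[str]:
        """5. Optional: hosts where this partition is cheapest to compute."""
        return []


class ListRDD(ToyRDD):
    """Source RDD: slices an in-memory list into partitions."""

    def __init__(self, data, num_partitions):
        self._slices = [data[i::num_partitions] for i in range(num_partitions)]

    def dependencies(self):
        return []  # a source has no parents

    def partitions(self):
        return list(range(len(self._slices)))

    def compute(self, split):
        return iter(self._slices[split])


class MappedRDD(ToyRDD):
    """Derived RDD: applies fn to every record of its single parent."""

    def __init__(self, parent, fn):
        self._parent, self._fn = parent, fn

    def dependencies(self):
        return [self._parent]

    def partitions(self):
        return self._parent.partitions()  # same partitioning as the parent

    def compute(self, split):
        return (self._fn(x) for x in self._parent.compute(split))


source = ListRDD([1, 2, 3, 4, 5, 6], num_partitions=2)
doubled = MappedRDD(source, lambda x: x * 2)
result = [list(doubled.compute(p)) for p in doubled.partitions()]
print(result)
```

Only the first three methods are abstract here, which mirrors the point that the partitioner and preferred locations are optional and fall back to "none" and "no preference".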

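An RDD achieves resilience through its dependencies: because each RDD records which parent RDDs it was derived from, a lost partition can be recomputed from its parents instead of being restored from a replica.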
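
The idea of recovering through dependencies can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not how Spark is implemented: the Source and Mapped classes are hypothetical, the "cache" is an ordinary dict, and partition loss is simulated by deleting an entry.

```python
class Source:
    """Hypothetical source dataset holding its partitions in memory."""

    def __init__(self, slices):
        self.slices = slices

    def compute(self, split):
        return list(self.slices[split])


class Mapped:
    """Hypothetical derived dataset that remembers its parent (its lineage)."""

    def __init__(self, parent, fn):
        self.parent, self.fn = parent, fn
        self.cache = {}  # computed partitions, keyed by partition index

    def compute(self, split):
        # If the partition is missing, rebuild it from the parent.
        if split not in self.cache:
            self.cache[split] = [self.fn(x) for x in self.parent.compute(split)]
        return self.cache[split]


source = Source([[1, 2], [3, 4]])
squared = Mapped(source, lambda x: x * x)

before = [squared.compute(p) for p in range(2)]

del squared.cache[1]            # simulate losing partition 1
recovered = squared.compute(1)  # recomputed from the parent via the dependency

print(before, recovered)
```

Only the lost partition is rebuilt; partition 0 stays in the cache untouched.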

