When to use RDDs?

1. You want to precisely instruct Spark how to perform a query, i.e., you need control over the low-level operations.

2. You can forgo the code optimization, efficient space utilization, and performance benefits available with DataFrames and Datasets.

3. The data is unstructured, such as media streams or streams of text.

4. You do not need to impose a schema on the data, or to access attributes by name or column, while processing it.

5. You want to manipulate the data with functional programming constructs rather than domain-specific expressions (see the sketch after this list).

6. A third-party package you depend on is written using RDDs.

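For illustration, here is a minimal sketch of points 1, 3, and 5: working on unstructured text with the low-level RDD API and functional constructs (flatMap, map, reduceByKey) instead of DataFrame expressions. This assumes Scala, a local Spark session, and a hypothetical input file path (data/server.log); it is not taken from the original post.

```scala
import org.apache.spark.sql.SparkSession

object RddWordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RddWordCount")
      .master("local[*]")          // assumption: local run, just for illustration
      .getOrCreate()

    val sc = spark.sparkContext

    // Unstructured text: no schema is imposed, each element is simply a line of text.
    val lines = sc.textFile("data/server.log")   // hypothetical path

    // Functional constructs instead of domain-specific expressions
    // such as select() or groupBy() on a DataFrame.
    val wordCounts = lines
      .flatMap(_.split("\\s+"))          // split each line into words
      .filter(_.nonEmpty)                // drop empty tokens
      .map(word => (word.toLowerCase, 1))
      .reduceByKey(_ + _)                // count occurrences per word

    wordCounts.take(10).foreach(println)

    spark.stop()
  }
}
```

Note that every transformation above is an explicit instruction to Spark; nothing here goes through the Catalyst optimizer, which is exactly the trade-off described in points 1 and 2.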
