map vs flatMap in Spark

map:
- Returns a single element per input element, based on the function/custom business logic/algorithm.
- The return type is a single element.
- When the function itself returns a collection, map returns an RDD of those collections (see the example below).
- A map operation only.
- Returns the elements produced by the function as-is.

flatMap:
- Returns zero or more elements per input element, based on the function/custom business logic/algorithm.
- The function returns an iterator of values, but Spark does not return an RDD of iterators; it returns an RDD consisting of the elements from all the iterators.
- Returns an RDD of individual elements.
- Equivalent to a map followed by a flatten.
- Returns the elements of the iterators returned by the function, flattened into a single RDD.

map example:

val list = (1 to 5).toList
// list: List[Int] = List(1, 2, 3, 4, 5)

list.map(_.to(3))
// List[scala.collection.immutable.Range.Inclusive] = List(Range(1, 2, 3), Range(2, 3), Range(3), Range(), Range())

flatMap example:

val list = (1 to 5).toList
// list: List[Int] = List(1, 2, 3, 4, 5)

list.flatMap(_.to(3))
// List[Int] = List(1, 2, 3, 2, 3, 3)
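The "map followed by flatten" equivalence can be sketched with plain Scala collections, which follow the same semantics as Spark's RDD API:

```scala
object FlatMapEquivalence extends App {
  val list = (1 to 5).toList

  // flatMap in one step: each element expands to the range n..3
  // (empty for 4 and 5, since 4.to(3) and 5.to(3) are empty ranges)
  val flat = list.flatMap(_.to(3))

  // the same result via map (one Range per element) followed by flatten
  val mappedThenFlattened = list.map(_.to(3)).flatten

  println(flat)                // List(1, 2, 3, 2, 3, 3)
  println(mappedThenFlattened) // List(1, 2, 3, 2, 3, 3)
  assert(flat == mappedThenFlattened)
}
```

Running this confirms both expressions yield the same flattened List[Int], whereas map alone would leave a List of Ranges.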



Another good example:

val myse = Seq("India", "China")
// myse: Seq[String] = List(India, China)

val myseqrdd = sc.parallelize(myse)
// myseqrdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:26

myseqrdd.flatMap(_.toUpperCase)
// res2: org.apache.spark.rdd.RDD[Char] = MapPartitionsRDD[1] at flatMap at <console>:26

myseqrdd.flatMap(_.toUpperCase).collect
// res3: Array[Char] = Array(I, N, D, I, A, C, H, I, N, A)

myseqrdd.map(_.toUpperCase).collect
// res5: Array[String] = Array(INDIA, CHINA)

Because a String is itself a collection of Chars, flatMap flattens each uppercased name into its individual characters, giving an RDD[Char], while map keeps one String per input element.
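If the goal is words rather than characters, the usual pattern is to have the function return a collection explicitly, e.g. by splitting on whitespace. A plain-Scala sketch (the RDD version behaves the same, assuming a running SparkContext; the sample strings here are illustrative):

```scala
object FlatMapWords extends App {
  val lines = Seq("India China", "USA")

  // map keeps one output element per input line: a collection of Arrays
  val arrays = lines.map(_.split(" "))
  println(arrays.map(_.toList)) // List(List(India, China), List(USA))

  // flatMap flattens the per-line arrays into a single collection of words
  val words = lines.flatMap(_.split(" "))
  println(words)                // List(India, China, USA)
}
```

This is the classic word-count preprocessing step: flatMap(_.split(" ")) turns an RDD of lines into an RDD of words in one transformation.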

