Posts

Showing posts from October, 2022

Out Of Memory in Spark(OOM) - Typical causes and resolutions

   Spark OOM – Can occur at Driver/Executor/node manager. The error that we see is: java.lang.OutOfMermoryError. This will happen when the required memory exceeds the available memory for a particular operation. 1.        Driver OOM - possible causes: a.        Collect For e.g. in the program, we have    val data = inDF.collect()                 Collect operation, collects the data from all executors and send it to the driver. The driver will try to merge into the single object, then there is a possibility that it may be too big to fit into the driver memory. This problem can be solved in two ways: Ø   Setting proper limit using spark.driver.maxResultSize Ø   Repartitioning before saving the result to the output file. §   data.repartition(1).write… §   Repartition uses dedicated executor to collect. b.        Broadcast Join ¨        The table will be materialized at the driver and then broadcasted to the exectuors. c.        Low driver memory configurati