Out Of Memory (OOM) in Spark - Typical Causes and Resolutions
Spark OOM can occur at the driver, at an executor, or at the node manager. The error reported is java.lang.OutOfMemoryError, thrown when the memory an operation requires exceeds the memory available to it.

1. Driver OOM - possible causes:

a. Collect

For example, the program contains:

    val data = inDF.collect()

collect() gathers the data from all executors and sends it to the driver, which then tries to merge it into a single object; that object may be too big to fit into driver memory. This problem can be solved in two ways:

- Set a proper limit using spark.driver.maxResultSize, so an oversized result fails with a clear error instead of crashing the driver.
- Repartition before saving the result to the output file, e.g. data.repartition(1).write... With repartition(1), the data is consolidated on a single executor and written from there, so the driver never has to hold it. (A sketch follows this list.)

b. Broadcast Join

The broadcast table is first materialized at the driver and then broadcast to the executors, so a table that is too large can exhaust driver memory. (See the broadcast example after this list.)

c. Low driver memory configuration
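A minimal sketch of the two fixes for collect-driven driver OOM described above. The input/output paths, the 1g limit, and the session setup are illustrative assumptions, not from the original:

    import org.apache.spark.sql.SparkSession

    object DriverOomCollectSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("driver-oom-collect-sketch")
          // Cap the total size of serialized results sent back to the driver;
          // an oversized collect() then fails fast instead of OOM-ing the driver.
          .config("spark.driver.maxResultSize", "1g") // illustrative value
          .getOrCreate()

        val inDF = spark.read.parquet("/data/input") // hypothetical path

        // Risky: pulls every row into the driver JVM.
        // val data = inDF.collect()

        // Safer: consolidate to one partition on an executor and write from
        // there; the full dataset never passes through the driver.
        inDF.repartition(1).write.parquet("/data/output") // hypothetical path

        spark.stop()
      }
    }

Note that repartition(1) funnels all data through a single executor, so that executor needs enough memory and the write is not parallel; it trades executor pressure for driver safety.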
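And a hedged sketch of keeping broadcast joins within driver memory. The DataFrames, paths, and the 10 MB threshold are hypothetical; spark.sql.autoBroadcastJoinThreshold and the broadcast() hint are standard Spark SQL facilities:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast

    val spark = SparkSession.builder().appName("broadcast-join-sketch").getOrCreate()

    // Only tables below this size are auto-broadcast; keep it small enough
    // that the materialized table fits comfortably in driver memory.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 10L * 1024 * 1024) // 10 MB, illustrative

    // Hypothetical inputs for illustration.
    val factDF = spark.read.parquet("/data/fact")
    val dimDF  = spark.read.parquet("/data/dim")

    // Explicit broadcast hint: dimDF must be small, because it is first
    // materialized at the driver before being shipped to every executor.
    val joined = factDF.join(broadcast(dimDF), Seq("id"))

    // If the driver still OOMs on broadcasts, auto-broadcast can be disabled:
    // spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1L)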