Limitations of Spark
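1. No true real-time processing: Spark Streaming processes data in micro-batches, so it is a near-real-time engine.
2. Expensive, because in-memory computation needs a lot of RAM.
3. Higher latency and lower throughput when compared to Flink.
4. Small file problem, especially with S3. A gzipped file can be uncompressed only when the complete file is present on one core, so a lot of time goes into unzipping files in sequence. Processing the result efficiently then needs heavy shuffling of data.
5. Window criteria are based on time only, not on the number of records.
6. No file management system of its own; Spark relies on external storage such as HDFS or S3.
7. Few algorithms available in the ML library (MLlib).
8. Iterative processing: data is iterated in batches, and each iteration is scheduled and executed separately.
9. Handling back pressure, that is, data building up at the input when it arrives faster than it can be processed.
10. Manual optimization: jobs and datasets need hand tuning, for example of partitioning and caching.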
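
The small file point (item 4) rests on the fact that a gzip stream cannot be split and decompressed piece by piece. A minimal sketch in plain Python (standard library only, not Spark code) shows this. The sample data and the helper name can_decompress are hypothetical, made up for the illustration.

```python
import gzip
import zlib


def can_decompress(chunk: bytes) -> bool:
    """Return True if `chunk` decompresses as a standalone gzip stream."""
    try:
        gzip.decompress(chunk)
        return True
    except (OSError, EOFError, zlib.error):
        # Truncated stream or missing gzip header.
        return False


# Hypothetical sample data standing in for one log file.
payload = b"".join(b"record-%d\n" % i for i in range(10_000))
compressed = gzip.compress(payload)

# Pretend two workers each receive one half of the compressed bytes.
middle = len(compressed) // 2
first_half, second_half = compressed[:middle], compressed[middle:]

whole_ok = can_decompress(compressed)    # complete file on one worker
first_ok = can_decompress(first_half)    # stream is cut off early
second_ok = can_decompress(second_half)  # no gzip header at the start

print(whole_ok, first_ok, second_ok)
```

Only the complete file decompresses, which is why each gzipped file ends up on a single core, and why many small gzipped files turn into a long run of sequential unzip work.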