Advantages of Immutability in distributed systems and in programming


Specific to distributed systems (for example, Spark RDDs):
1.      Performance - immutable data is simple, and the RDD is easy to share across multiple processing elements.
2.      Fault tolerance - an RDD can be re-created on failure.
3.      Caching and in-memory processing - references can be cached because they are not going to change.
4.      Multiple threads can safely access the same partition.
5.      Sharing - safe to share across processes.
6.      Processing is easy - work is easy to parallelize, as there are no conflicts.
7.      Replication - internal state stays consistent even if an exception occurs.
8.      Rules out potential problems caused by updates from multiple threads.
9.      Can live in memory as easily as on disk, which makes it reasonable to move operations that hit disk to use data in memory instead; adding memory is also easier than adding I/O bandwidth.

Immutability gives RDDs significant wins, at the cost of having to copy the data rather than mutate it in place.
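
To make the fault-tolerance and copy-instead-of-mutate points concrete, here is a minimal sketch in plain Python. It is not Spark's API: ToyDataset, its fields, and its map/compute methods are hypothetical names invented for illustration. The idea is that a transformation returns a new dataset that only remembers its parent and the function applied, so a lost result can be rebuilt from that lineage.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class ToyDataset:
    """Hypothetical, simplified stand-in for an RDD (not Spark's API)."""
    source: Tuple[int, ...]                      # immutable base data
    parent: Optional["ToyDataset"] = None        # lineage: the dataset this one came from
    fn: Optional[Callable[[int], int]] = None    # transformation applied to the parent

    def map(self, fn):
        # A transformation never touches existing data; it returns a new dataset.
        return ToyDataset(source=self.source, parent=self, fn=fn)

    def compute(self):
        # Rebuild the contents by walking the lineage back to the base data.
        if self.parent is None:
            return self.source
        return tuple(self.fn(x) for x in self.parent.compute())


base = ToyDataset(source=(1, 2, 3))
doubled = base.map(lambda x: x * 2)
shifted = doubled.map(lambda x: x + 1)

first = shifted.compute()
recomputed = shifted.compute()   # e.g. after a cached copy was lost
```

Because nothing is mutated in place, recomputing gives the same result as the first run, and base is still usable by any other consumer. The price is the one noted above: every step produces new data instead of updating the old.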

General list of reasons to favor immutability in programming:
  • immutable objects are simpler to construct, test, and use
  • truly immutable objects are always thread-safe
  • they help to avoid temporal coupling
  • their usage is side-effect free (no defensive copies)
  • the identity mutability problem is avoided
  • they always have failure atomicity
  • they are much easier to cache
  • they help prevent NULL references, which are a common source of bugs
Note: when performance matters a great deal, e.g. in game programming, it may be necessary to use mutable objects.
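
A few of the points in the list (no in-place changes, failure atomicity, easy caching) can be sketched with a frozen dataclass in Python. The Money class and its fields are hypothetical, chosen only for illustration; dataclass(frozen=True), replace, and FrozenInstanceError come from the standard dataclasses module.

```python
from dataclasses import dataclass, replace, FrozenInstanceError


@dataclass(frozen=True)
class Money:
    """Hypothetical immutable value object."""
    amount: int
    currency: str

    def __post_init__(self):
        # Validation runs once, at construction; an invalid object never exists.
        if self.amount < 0:
            raise ValueError("amount must be non-negative")


price = Money(10, "USD")

# Attempting to mutate a frozen instance raises instead of changing state.
try:
    price.amount = 20
    mutated = True
except FrozenInstanceError:
    mutated = False

# "Changing" a value means building a new object; the original is untouched.
discounted = replace(price, amount=8)

# Failure atomicity: a failed update leaves the original fully intact.
try:
    replace(price, amount=-1)
except ValueError:
    pass

# Frozen dataclasses are hashable by value, so they work as cache keys.
cache = {price: "cached result"}
hit = cache[Money(10, "USD")]
```

The trade-off matches the note above: each update allocates a new object, which is usually negligible but can matter in hot loops.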
