Advantages of Immutability in distributed systems and in programming

August 08, 2018

Specific to distributed systems (using Spark RDDs as the example):

1. Performance - an immutable RDD is simple and cheap to share among multiple processing elements, because no locking or coordination is needed

2. Fault tolerance - a lost partition can be re-created on failure by re-running the transformations that produced it

3. Caching and in-memory processing - references can be cached because the data they point to is never going to change.

4. Concurrency - multiple threads can read the same partition safely

5. Sharing - Safe to share across processes

6. Parallel processing - work is easy to parallelize, as there are no write conflicts

7. Replication - internal state stays consistent even if an exception occurs.

8. Thread safety - rules out the potential problems caused by updates from multiple threads.

9. Storage flexibility - immutable data can live in memory as easily as on disk. This makes it reasonable to move operations that hit disk so that they use data in memory instead, and adding memory is easier than adding I/O bandwidth.
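
The fault-tolerance point can be illustrated with a minimal sketch in plain Python. This is not Spark's implementation: `LineageDataset`, its fields, and the `cache` dictionary are hypothetical names invented for illustration. The idea shown is only that, because the data never changes, a lost partition can be rebuilt by replaying the recorded transformations.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class LineageDataset:
    """Hypothetical, simplified stand-in for an RDD; never mutated after creation."""
    source: Tuple[Tuple[int, ...], ...] = ()   # partitions of the base data
    parent: Optional["LineageDataset"] = None  # dataset this one was derived from
    fn: Optional[Callable[[int], int]] = None  # transformation applied to the parent

    def map(self, fn: Callable[[int], int]) -> "LineageDataset":
        # A transformation returns a new dataset that remembers its parent.
        return LineageDataset(parent=self, fn=fn)

    def compute(self, index: int) -> Tuple[int, ...]:
        # Rebuild one partition by replaying the lineage from the base data.
        if self.parent is None:
            return self.source[index]
        return tuple(self.fn(x) for x in self.parent.compute(index))


base = LineageDataset(source=((1, 2, 3), (4, 5, 6)))
shifted = base.map(lambda x: x * 2).map(lambda x: x + 1)

# Cache both partitions, then simulate losing one on a failed node.
cache = {i: shifted.compute(i) for i in range(2)}
del cache[1]

# The lost partition is re-created from lineage; the result is identical
# because nothing upstream could have been modified in the meantime.
cache[1] = shifted.compute(1)
```

Recomputation is only safe here because `base` and every intermediate dataset are frozen. If any of them could be changed in place, replaying the lineage might produce a different partition than the one that was lost.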

Immutability gives RDDs significant wins, at the cost of having to copy the data rather than mutate it in place.
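
The trade-off can be seen in a small plain-Python sketch, with a tuple standing in for a partition. The names `partition`, `partial_sum`, and `updated` are illustrative assumptions and not part of any Spark API.

```python
from concurrent.futures import ThreadPoolExecutor

# An immutable "partition": safe to hand to any number of threads without locks.
partition = tuple(range(1, 1001))


def partial_sum(bounds):
    # Each worker only reads its own slice of the shared tuple.
    start, end = bounds
    return sum(partition[start:end])


chunks = [(i, i + 250) for i in range(0, 1000, 250)]
with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(partial_sum, chunks))

# An "update" cannot happen in place. It builds a new tuple, and that copy
# is the cost; the original stays intact for anyone still reading it.
updated = partition[:10] + (0,) + partition[11:]
```

The readers need no synchronization, but changing a single element means allocating a whole new sequence.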

General list of reasons to favor immutability in programming:

immutable objects are simpler to construct, test, and use
truly immutable objects are always thread-safe
they help to avoid temporal coupling
their usage is side-effect free (no defensive copies)
the identity mutability problem is avoided (an object's equality and hash cannot change while it is used as a key in a map or set)
they always have failure atomicity
they are much easier to cache
they help prevent NULL references, because every field has to be given a value when the object is constructed
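
Several of these reasons can be shown in one short sketch. It assumes Python's frozen dataclasses as the way to get immutability; the `Point` class and the helper function are made up for illustration.

```python
from dataclasses import dataclass, replace, FrozenInstanceError
from functools import lru_cache


@dataclass(frozen=True)
class Point:
    x: int
    y: int

    def __post_init__(self):
        # Failure atomicity: validation runs once, at construction, so an
        # invalid Point never exists in a half-built state.
        if self.x < 0 or self.y < 0:
            raise ValueError("coordinates must be non-negative")


@lru_cache(maxsize=None)
def squared_distance_from_origin(p: Point) -> int:
    # Easy to cache: the argument can never change after it is used as a key.
    return p.x * p.x + p.y * p.y


start = Point(3, 4)
labels = {start: "start"}  # stable hash, so no identity mutability problem

try:
    start.x = 10           # in-place mutation is rejected
    mutated = True
except FrozenInstanceError:
    mutated = False

moved = replace(start, x=10)  # "changing" a field yields a new object
```

Because `start` cannot change, the dictionary lookup and the cached result stay valid for as long as the object lives, and no defensive copy is needed before passing it to other code.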

Note: when you care a lot about performance, e.g. when programming a game, it may be necessary to use mutable objects.
