27 Mar 2012
tl;dr: Eventual consistency is a liveness property—not a safety property—and is trivially satisfiable by itself. Liveness and safety properties should be taken together.
Safety and liveness are two important kinds of properties that distributed systems provide. Informally, a safety guarantee promises that nothing bad happens, while a liveness guarantee promises that something good eventually happens. Every distributed system makes some form of safety and liveness guarantees, and some make stronger guarantees than others. For example, atomic consistency (linearizability) guarantees that operations appear to take effect instantaneously across the system (safety), but it cannot guarantee that operations will always succeed in the presence of network partitions (liveness, in the form of availability).
Many of today’s distributed systems promise eventual consistency: after some period of time, all participants in the system agree on the same value. This is a useful property: good things will eventually happen without the need for intervention, even in the presence of partitions. However, under our definitions of safety and liveness, eventual consistency only provides liveness guarantees, not safety: Which value is eventually chosen? What values may be returned before participants “eventually” agree?
As recent work from UT Austin points out, it's easy to satisfy liveness without being useful. If all replicas always return the initial state, the system is eventually consistent. If all replicas return the value 42 in response to every request (even if you never wrote 42), the system is eventually consistent. If replicas accept only every thousandth write, the system is eventually consistent. These behaviors are clearly not what we want, yet they satisfy our definition of eventual consistency. Moreover, as the authors explain, accepting more read/write combinations doesn't necessarily translate to stronger consistency. We'd like some notion of convergence that captures both agreement on a common shared state and the exchange of writes.1
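To make the "always return 42" case concrete, here is a minimal Python sketch. The class and function names are hypothetical illustrations, not taken from the UT Austin work. A convergence check (the liveness half of eventual consistency) passes, while a simple safety check, that reads only return values someone actually wrote, fails.

```python
"""Sketch: a replica that trivially satisfies convergence but not safety."""


class ConstantReplica:
    """Hypothetical replica that ignores all writes and always returns 42."""

    def write(self, key, value):
        pass  # writes are silently dropped

    def read(self, key):
        return 42


def converged(replicas, keys):
    """Liveness-style check: all replicas agree on the value of every key."""
    return all(len({r.read(k) for r in replicas}) == 1 for k in keys)


def reads_only_written_values(replicas, history):
    """Safety-style check: every read returns a value some client actually wrote.

    `history` maps each key to the set of values written to it.
    """
    return all(
        r.read(k) in written for r in replicas for k, written in history.items()
    )


replicas = [ConstantReplica() for _ in range(3)]
history = {"x": {1, 2}}
for value in history["x"]:
    replicas[0].write("x", value)

print(converged(replicas, history.keys()))  # True: the replicas agree
print(reads_only_written_values(replicas, history))  # False: 42 was never written
```

Neither check alone tells the whole story: the convergence check is happy with a useless system, which is exactly why the safety side needs to be stated too.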
Today's eventually consistent systems do provide some safety guarantees, even if they don't state them explicitly. For instance, in Riak, Cassandra, and DynamoDB, timestamp ordering is often used to decide which version of a data item to keep. Moreover, these data stores won't return any values you haven't written to them, and replicas will converge to the last written value for each key. In short, many "eventually consistent" stores really offer something like "eventually last-writer-wins, and read-the-last-observed-value in the meantime" consistency. This is both more descriptive and more useful than a vanilla "eventual consistency" guarantee.2
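Here is a minimal sketch of that "eventually last-writer-wins" behavior, assuming each write carries a timestamp plus a writer ID used to break ties. It is an illustration only, not how Riak, Cassandra, or DynamoDB actually implement conflict resolution; `LWWReplica`, `Version`, and the one-way `sync_from` anti-entropy step are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True, order=True)
class Version:
    timestamp: int  # assumed logical or wall-clock timestamp supplied by the writer
    writer_id: str  # tie-breaker so equal timestamps resolve deterministically
    value: object = field(compare=False)  # the payload is ignored when ordering


class LWWReplica:
    """Hypothetical replica that keeps the highest-ordered version of each key."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.store = {}  # key -> Version

    def write(self, key, value, timestamp):
        self._apply(key, Version(timestamp, self.replica_id, value))

    def read(self, key):
        # Safety half: return the latest version observed locally (or None).
        version = self.store.get(key)
        return None if version is None else version.value

    def _apply(self, key, version):
        current = self.store.get(key)
        if current is None or version > current:
            self.store[key] = version

    def sync_from(self, other):
        # Liveness half: one-way anti-entropy, merging every version `other` holds.
        for key, version in other.store.items():
            self._apply(key, version)


a, b = LWWReplica("a"), LWWReplica("b")
a.write("x", "from-a", timestamp=1)
b.write("x", "from-b", timestamp=2)
print(a.read("x"), b.read("x"))  # from-a from-b: diverged, but both values were written

a.sync_from(b)
b.sync_from(a)
print(a.read("x"), b.read("x"))  # from-b from-b: converged on the last writer
```

Before the exchange, each replica returns a value someone actually wrote; after it, both return the write with the highest (timestamp, writer ID) pair. Stating both halves is what makes this guarantee more informative than "eventually consistent."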
It's worth noting that safety without convergence also leads to problems. Read-your-writes, PRAM/monotonic writes, and causal consistency guarantees are trivially achievable using only local storage and no communication: simply keep a local copy of every key you update, and serve every read from that local copy. This implementation never converges. However, it satisfies each of these consistency models because they make safety guarantees but not liveness guarantees. If we add our liveness requirement of convergence, the implementation has to propagate writes between replicas.
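The local-only "implementation" is short enough to write out. In this hypothetical Python sketch, each client keeps its data in its own dictionary and never talks to anyone else:

```python
class LocalOnlyClient:
    """Hypothetical client that never communicates with any other replica."""

    def __init__(self):
        self.local = {}  # this client's private copy of every key it has written

    def write(self, key, value):
        self.local[key] = value  # never propagated anywhere

    def read(self, key):
        return self.local.get(key)  # always served from the local copy


alice, bob = LocalOnlyClient(), LocalOnlyClient()
alice.write("x", 1)
alice.write("x", 2)
print(alice.read("x"))  # 2: Alice always sees her own latest write
print(bob.read("x"))  # None: Bob never sees Alice's writes, however long he waits
```

Alice's reads respect read-your-writes and the order of her own writes, but Bob never observes them, so no amount of waiting brings the two into agreement.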
Next time someone tells you their system is “eventually consistent,” ask them two questions: What versions of a data item can be returned at any time? What version will the system eventually choose to return? And remember: consider safety and liveness properties together. Otherwise, you probably have a trivially satisfiable requirement.