12 Nov 2014
My coauthors and I recently published a paper (that will appear at VLDB 2015) answering one of my longest standing research questions: when does consistency require coordination? It’s well known that many “strong” properties like “ACID” serializability and linearizability are not achievable without coordination, or synchronous communication between concurrent operations. But why is it that we can still implement reliable distributed counters and shopping carts that don’t lose writes, build indexes that are “consistent” with base data, and ensure useful properties like read-your-writes—all without coordination? Why do some operations require coordination while others don’t, and what’s the fundamental difference at play?
In the paper, we present a property, called invariant confluence (I-confluence), that precisely answers this question. I-confluence is necessary and sufficient for safe, coordination-free, available, and convergent execution (think CAP “AP”, without breaking your app). That is, if I-confluence holds, there exists a coordination-free execution strategy that preserves these properties. If it does not hold, no such strategy exists, so don’t waste your time looking for a coordination-free algorithm or concurrency control mechanism that’ll work in all cases.
The intuition behind I-confluence is pretty simple. We capture “consistency” (or safety) using invariants, or declarative correctness criteria about database states (e.g., no user has a negative account balance). Then, given a user’s transactions and a merge procedure used to reconcile divergent states (e.g., last-write-wins, set union, or a convergent datatype), we can check for I-confluence. Simply put, under I-confluence: all local commit decisions must be globally invariant-preserving. If I commit an operation on my local copy of database state, I must make sure that no other concurrent operation would invalidate my commit decision upon merge. If no such state exists, I don’t have to coordinate to commit. That’s it.
“That’s it? That sounds too simple to be true!” you protest.
It’s true, formalizing this property requires some care,1 but the basic idea is pretty simple.2 The key is to specify correctness in terms of invariants rather than reads and writes.3 Without knowledge of what “correctness” means to your app (e.g., the invariants used in I-confluence), the best you can do to preserve correctness under a read/write model is serializability.4 (Hopefully your database offers it.)
“So what?” you ask.
We applied I-confluence to a range of integrity constraints found in database systems, including foreign key, uniqueness, and row-level check constraints as well as some invariants over abstract datatypes.5 The results show that, while coordination can’t always be avoided, in many cases, it can (even when serializability would indicate otherwise). By applying these results to the industry standard TPC-C Benchmark, we achieve a 25-fold improvement over the prior best result (12.7M New-Order transactions per second on 200 servers—I’ll write about the details in another blog post, but, for now, the paper has details). We’ve also been looking at open source applications and applying I-confluence analysis (the subject of another paper and post) and have seen similar wins. Our experience indicates I-confluence works.
Consistency doesn’t always require coordination. In fact, in practice, it seldom does. I’m excited to finally have a piece of work that exactly characterizes this trade-off, backed by some serious experimental data that shows why it matters.
The paper provides actual formalism, which, if you’re a footnote reader, you should consider checking out. In a little more detail, the set of states reachable via invariant-preserving executions must be closed under merge. We also formalize coordination-freedom, availability, and convergence more carefully. If you absolutely love formalism, you can also check out the extended version of the paper on the arXiv. ↩
When I was doing this work, I was convinced someone else had come up with a property like this already. There’s a huge amount of work on semantics-based concurrency control (mostly from the 1980s) and even more on concurrent program analysis. But, for a bunch of reasons, we actually found the basic property we were seeking in the literature on rule-based rewriting systems rather than database concurrency control or classic program analysis. In 2007, some very smart folks working on rule-based rewriting wanted a more expressive way to talk about “correct” rewrites without resorting to a formulation based on determinism, or confluence (see below), so they came up with the idea of using invariants to talk about the validity of intermediate and final steps in the rewriting process. Our use of I-confluence is, if you squint, similar in nature to what they proposed (e.g., treat transactions as rules, and merge as “meet”), and so we adopted the name. I’d love to spend more time digging into that literature and taking greater advantage of rule-based rewriting techniques in I-confluence analysis.
To go a little deeper on the rest of the literature: we have a pretty extensive related work section in the paper, but there were two main problems I ran into in the literature. First, much of the work on semantics-based concurrency control in databases was helpful but wasn’t directly focused on necessary conditions for coordination-free execution. Second, in the program analysis literature, many techniques assume linearizable update to shared state. For example, if you declare pre- and post-conditions (which we don’t), you can often determine the safety of executing each of these operations in a linearizable manner. Not only is this linearizability already a non-starter in a coordination-free environment, but also if you’re in a shared-memory setting, any writes to the same item conflict and can cause terrible problems in many analysis techniques. In contrast, in a distributed setting (or even single-node environment with the ability to rewrite/curry writes to local program variables, then play with them later), writes to the same data item can be reconciled via potentially complex logic (e.g., merge). I’d be shocked if you couldn’t find some encoding of merge and the general I-confluence rules in these program analysis techniques, but given that the rewriting literature made this basically “free”, we stuck with that. ↩
Another (more subtle) benefit along these lines is that I-confluence allows us to reason about arbitrary, abstract implementations and deployments of database systems rather than any particular system implementation or deployment. We effectively “lift” the partitioning arguments typified by Gilbert and Lynch’s proof of the CAP Theorem to the level of arbitrary program logic (rather than read/write logic executed in parallel on different, adversarially partitioned servers). If that’s inscrutable, compare the paper’s I-confluence proofs to those in the HAT paper or those used by Gilbert and Lynch. This higher-level abstraction is, in our experience, more intuitive than reasoning about “partition” behavior—and also applies to single-node systems as well. Basically, instead of having to cook up a scenario where servers become partitioned, forcing “inconsistency,” we can rely on a basic property of application logic instead. ↩
Those of you who live and breathe this stuff (like me) will probably be asking: “what about commutativity?” or “what about monotonicity?”
Commutativity, or, in effect, order insensitivity of operations, is useful and is usually sufficient for safe, concurrent execution. But it’s not always necessary for correctness: for example, reads don’t commute with writes, but, if all we care about is that no one writes the value 0xDEADBEEF, who cares about reads? We provide some more examples in the paper, and so do Clements et al. Bonus: commutativity isn’t always safe. For example, if we we use a commutative counter to ensure we don’t “lose” the effects of increment or decrement operations, there’s no guarantee that our counter will obey the invariant that the counter is non-negative. (You might argue that this wasn’t really a “commutative” operation, at which we could have a fun conversation.)
Monotonicity, or, in effect, “grow-only” program logic, is also useful and is necessary and sufficient to guarantee confluence, or determinism, of outcomes despite nondeterministic ordering of program inputs. This underlies languages like Bloom, Bloom^L, and LVars. Determinism is useful, but it’s not necessary for all forms of correctness. As a classic example, consensus protocols allow any proposed value to be accepted insofar as all processes agree on that value—the outcome is, moreover, nondeterministic due to network delays in many common protocols like Paxos. That is, Paxos is not confluent (nor monotonic), but it’s still useful. In the coordination-free domain, the read operation in our commutativity example above wouldn’t be monotonic (we could do a blocking threshold read, which might work in some cases, but then we’ll have to block). Moreover, if we only decremented a shared counter, the decrement operations would be monotonic, but, if we had the same non-negative invariant, we could end up with a deterministic (negative) but incorrect state. So, in short, confluence and invariant preservation are both useful properties, but we care about the latter in this paper.
The silver lining here is that—despite false positives for coordination (and, depending on your definitions, false negatives)—both commutativity and monotonicity are in some cases easier to analyze. Commutativity, monotonicity, and I-confluence are all undecidable properties. But the first two depend only on the program logic, while I-confluence requires users to also specify invariants. (However, note that these invariants are really only one-per-database—or one conjunction per database—rather than one per operation, as in Hoare-style program analyses.) The open source applications we’ve been studying actually declare these surprisingly often, but there’s still a trade-off between user participation and precision in determining coordination requirements. ↩
If you’re interested in how we actually apply the test, there are details in the paper. In brief, our current approach is to pre-classify invariant-operation pairs (by manual proof—check out the arXiv for a full walk-through; these aren’t too hard) and then use simple static analysis to check pairs in code (I hesitate to really call this “program analysis”).
My current goal is to use I-confluence as a basis for principled systems optimizations like RAMP, the TPC-C results, and some of the open source work I’ve been doing. That said, I see a huge amount of low-hanging fruit in improving and further automating this analysis. ↩
- How To Make Fossils Productive Again (30 Apr 2016)
- You Can Do Research Too (24 Apr 2016)
- Lean Research (20 Feb 2016)
- I Loved Graduate School (01 Jan 2016)
- NSF Graduate Research Fellowship: N=1 Materials for Systems Research (03 Sep 2015)
- Worst-Case Distributed Systems Design (03 Feb 2015)
- Data Integrity and Problems of Scope (20 Oct 2014)
- Linearizability versus Serializability (24 Sep 2014)
- MSR Silicon Valley Systems Projects I Have Loved (19 Sep 2014)
- Understanding Weak Isolation Is a Serious Problem (16 Sep 2014)