When is "ACID" ACID? Rarely.

22 Jan 2013

tl;dr: ACID and NewSQL databases rarely provide true ACID guarantees by default, if they are supported at all. See the table.

Many databases today differentiate themselves from their NoSQL counterparts by claiming to support “100% ACID” transactions or by “guaranteeing strong consistency (ACID).” In reality, few of these databases—including traditional “big iron” systems like Oracle—provide formal ACID guarantees, even when they claim to do so.

The textbook definition of ACID Isolation is serializability (e.g., Architecture of a Database System, Section 6.2), which states that the outcome of executing a set of transactions should be equivalent to some serial execution of those transactions. This means that each transaction gets to operate on the database as if it were running by itself, which ensures database correctness, or consistency. A database with serializability (“I” in ACID) provides arbitrary read/write transactions and guarantees consistency (“C” in ACID), or correctness, of the database. Without serializability, ACID, particularly consistency, is generally[1] not guaranteed.
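To make the definition concrete, here is a minimal sketch that brute-forces every serial order of a small set of transactions and asks whether an observed final state matches any of them. The helper names (`deposit`, `serial_outcomes`, `is_serializable_outcome`) and the `balance` key are hypothetical, chosen only for illustration, and the check compares final states only, which is a simplification of the full definition.

```python
from itertools import permutations

def deposit(amount):
    """Return a toy transaction that adds `amount` to the balance."""
    def txn(db):
        db = dict(db)  # work on a copy so callers keep their own state
        db["balance"] = db["balance"] + amount
        return db
    return txn

def serial_outcomes(initial, txns):
    """Final states reachable by running the transactions one at a time, in every order."""
    outcomes = set()
    for order in permutations(txns):
        db = dict(initial)
        for txn in order:
            db = txn(db)
        outcomes.add(tuple(sorted(db.items())))
    return outcomes

def is_serializable_outcome(initial, txns, observed):
    """True if `observed` equals the result of some serial execution."""
    return tuple(sorted(observed.items())) in serial_outcomes(initial, txns)

initial = {"balance": 100}
txns = [deposit(10), deposit(20)]

# Both deposits applied: matches a serial execution.
ok = is_serializable_outcome(initial, txns, {"balance": 130})

# Lost update: both transactions read 100 before either wrote, so one deposit vanished.
lost = is_serializable_outcome(initial, txns, {"balance": 120})
```

Here `ok` is true and `lost` is false: no serial order of the two deposits ends at 120, so a database that produces that state has not provided serializability.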

Nevertheless, most publicly available databases (often claiming to provide “ACID” transactions) do not provide serializability. I’ve compiled the isolation guarantees provided by 18 popular databases below (sources hyperlinked). Only three of 18 databases provide serializability by default, and only nine provide serializability as an option at all (shaded):

Database                  Default Isolation   Maximum Isolation
Actian Ingres 10.0/10S    S                   S
Aerospike                 RC                  RC
Akiban Persistit          SI                  SI
Clustrix CLX 4100         RR                  ?
Greenplum 4.1             RC                  S
IBM DB2 10 for z/OS       CS                  S
IBM Informix 11.50        Depends             RR
MySQL 5.6                 RR                  S
MemSQL 1b                 RC                  RC
MS SQL Server 2012        RC                  S
NuoDB                     CR                  CR
Oracle 11g                RC                  SI
Oracle Berkeley DB        S                   S
Oracle Berkeley DB JE     RR                  S
Postgres 9.2.2            RC                  S
SAP HANA                  RC                  SI
ScaleDB 1.02              RC                  RC
VoltDB                    S                   S

Legend: RC: read committed, RR: repeatable read, S: serializability, SI: snapshot isolation, CS: cursor stability, CR: consistent read

Instead of providing serializability, many of these databases provide one of several weaker variants,[2] often when marketing material and documentation claim otherwise.[3] There is no fundamental reason why a database shouldn’t support serializability: we have the algorithms, and we’ve made great strides in improving ACID scalability.[4] So why not provide serializability by default, or, at the least, provide serializability as an option at all? One key factor is performance: serializable isolation can limit concurrency; traditional techniques such as two-phase locking are expensive compared to, say, taking short read locks on data items. Additionally, it is impossible to simultaneously achieve high availability and serializability (though most of these database implementations are not highly available anyway, even when providing weaker models). A third reason is that transactions may be less likely to deadlock or abort due to conflicts under weaker isolation. However, these benefits aren’t free: the consistency anomalies that arise from the weak levels shown above are well-understood and quantifiable.
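One such anomaly is write skew, which snapshot isolation permits. The sketch below is a toy in-memory model, not any vendor's implementation: the `SnapshotDB` and `Txn` classes, the `go_off_call` function, and the on-call doctors scenario are all assumptions made for illustration. Each transaction reads from a private snapshot, and a commit aborts only on a write-write conflict (first committer wins).

```python
class SnapshotDB:
    """Toy multi-version store with first-committer-wins on write-write conflicts."""
    def __init__(self, data):
        self.data = dict(data)
        self.version = {key: 0 for key in data}

    def begin(self):
        return Txn(self)

class Txn:
    def __init__(self, db):
        self.db = db
        self.snapshot = dict(db.data)           # reads see the state at begin()
        self.start_versions = dict(db.version)
        self.writes = {}

    def read(self, key):
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Abort only if a key we wrote was changed by another commit since our snapshot.
        for key in self.writes:
            if self.db.version[key] != self.start_versions[key]:
                return False
        for key, value in self.writes.items():
            self.db.data[key] = value
            self.db.version[key] += 1
        return True

def go_off_call(txn, me):
    """Go off call only if the snapshot shows both doctors currently on call."""
    on_call = [d for d in ("alice", "bob") if txn.read(d)]
    if len(on_call) >= 2:
        txn.write(me, False)

# Concurrent execution: both transactions start from the same snapshot.
db = SnapshotDB({"alice": True, "bob": True})
t1, t2 = db.begin(), db.begin()
go_off_call(t1, "alice")
go_off_call(t2, "bob")
committed = (t1.commit(), t2.commit())

# Serial execution for comparison: the second transaction sees the first one's write.
serial = SnapshotDB({"alice": True, "bob": True})
for doctor in ("alice", "bob"):
    t = serial.begin()
    go_off_call(t, doctor)
    t.commit()
```

In the concurrent run both transactions commit, because they wrote different keys, and nobody is left on call. In the serial run one doctor always remains on call, so the concurrent result is equivalent to no serial execution.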

Where’s the silver lining? We can get real ACID in some of our databases (if not by default). And, despite the fact that many other “ACID” databases don’t provide ACID properties—at least according to decades of research and development and formally proven guarantees regarding database correctness (although perhaps marketing has rewritten the books)—we can still reserve travel tickets, use our bank accounts, and fight crime. How? One possibility is that anomalies are rare and the performance benefits of weak isolation outweigh the cost of inconsistencies. Another possibility is that applications are performing their own concurrency control external to the database; database programmers can use commands like SELECT FOR UPDATE, manual LOCK TABLE, and UNIQUE constraints to manually perform their own synchronization. The answer is likely a mix of each, but, stepping back, these strategies should remind you of what’s often done today in NoSQL-style data infrastructure: “good enough” consistency and some hand-rolled, application-specific concurrency control. Perhaps there’s a better question: when is “ACID” NoSQL?
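As a sketch of the application-side synchronization mentioned above, the following uses a UNIQUE constraint so that only one of two competing reservations can succeed. It runs against an in-memory SQLite database through Python's standard `sqlite3` module purely for convenience; SQLite is not one of the databases in the table, and the `reservations` table, its columns, and the `reserve` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reservations (seat TEXT UNIQUE, passenger TEXT)")

def reserve(conn, seat, passenger):
    """Try to claim a seat; the UNIQUE constraint rejects a second claim."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO reservations (seat, passenger) VALUES (?, ?)",
                (seat, passenger),
            )
        return True
    except sqlite3.IntegrityError:
        return False

first = reserve(conn, "12A", "alice")   # succeeds
second = reserve(conn, "12A", "bob")    # rejected by the constraint
```

The constraint does the synchronization that the isolation level does not: the application never has to read the table to decide whether the seat is free, so there is no stale read to act on.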

This is Part One of a two-part series on Transactions and Consistency.
Part Two: recent research on Highly Available Transactions (HATs).

Thanks to Neil Conway, Ali Ghodsi, and Alan Fekete for early feedback on this post.


Footnotes

[1] There’s a considerable amount of research focusing on how to provide ACID consistency without serializability. As an example, we can restrict the types of operations that transactions can perform, as in escrow and read-only transactions and with monotonic logic. We can also consider hypothetical databases that introduce dummy transactions to fill in anomalous behavior in the serial schedule, which would be silly but technically serializable. The systems in question don’t (usually) provide these sorts of “special-case” ACID-compliant transactions as features.

[2] There are a bunch of different weak isolation models to consider, but their definitions often vary depending on where you look. In this table, when necessary, I’ve mapped the stated guarantees back to a known model (e.g., Aerospike); in any event, only the databases marked as such provide serializability. The best vendor documentation will tell you exactly what is implemented, even if the description doesn’t match the name (see Footnote 3). If you like database theory, the best description of these levels I’ve seen, describing both multi-version and lock-based databases, is Atul Adya’s MIT Ph.D. thesis from 1999.

[3] As a detailed example of what can happen, consider Oracle 11g. (Admittedly, I’m picking on Oracle, due mostly to the wealth of available information.) 11g’s strongest isolation level is called “serializable,” while its description matches snapshot isolation. This behavior is well-documented in both the academic literature and by practitioners. For more fun, try to figure out what can happen when you execute distributed transactions.

[4] As an example, check out Michael Stonebraker and Andy Pavlo’s research on the H-Store project (commercialized via VoltDB) or Google’s Spanner. Each of these systems makes trade-offs (e.g., Spanner still uses two-phase locking for read-write transactions, which is expensive over wide-area networks, and doesn’t support transaction-level read-your-write semantics) but is pushing the limits of true ACID scalability.
