Nov 6, 2016 - Effective Serializability for Eventual Consistency

This post is a collection of my notes on Lucas Brutschy et al’s paper “Effective Serializability for Eventual Consistency”. A later version of this paper has been accepted to POPL’17.

Introduction

Serializability is a well-understood criterion for reasoning about concurrent transactions. Enforcing serializability via pessimistic concurrency control techniques, such as S2PL, is very expensive. Papadimitriou shows that checking a transaction history for serializability is NP-complete, hence optimistic concurrency control methods are no good either. However, Papadimitriou also shows that it is possible to construct efficient algorithms to detect violations of stronger variants of serializability. One such well-known variant is conflict-serializability. A history is conflict-serializable iff its dependency graph, built from write-read, write-write, and program-order dependencies, is acyclic. Conflict-serializability, being a syntactic check, can be implemented efficiently by algorithms that detect cycles in dependency graphs. However, conflict-serializability is stronger than serializability in the sense that it is possible for a history to exhibit cycles in its dependency graph, yet be observationally equivalent to a serial schedule. This may happen if the conflicts are not semantically relevant. An example is the following history of three concurrent transactions (T1, T2, and T3):

w1(A), w2(A), w2(B), w1(B), w3(B)

There is a write-write dependency from T1 to T2 w.r.t. A, and another write-write dependency from T2 to T1 w.r.t. B. The cycle in the dependency graph means that the history is not conflict-serializable. However, considering that later writes to an object absorb the effects of earlier writes, the schedule is observationally equivalent to the following serial schedule:

w1(A), w1(B), w2(A), w2(B), w3(B)

Hence, the schedule is effectively serializable, although it is not conflict-serializable. In the context of relational databases, where reads and writes are the only atomic actions, overlooking absorption is the main reason why effectively serializable schedules are classified as non-serializable. But as we raise the level of abstraction to more sophisticated atomic actions, e.g., add or listLength, commutativity also becomes a reason to relax the definition of conflict-serializability. The main contribution of the paper is therefore to relax the definition of conflict-serializability to account for commutativity and absorption. The paper also generalizes the definition of (relaxed) conflict-serializability to make it applicable to histories generated by non-SC stores. Such histories, unlike the totally ordered SC histories shown above, are only partially ordered.

Technical Development

Actions are atomic operations with possible return values, such as x.set(0), x.add(2), x.get():2, etc. Action semantics is given as a prefix-closed set of legal action sequences. For example, x.set(0), x.add(2), x.get():2 is a legal action sequence. The specification of an action is a possibly-infinite set of such action sequences. Two action sequences are equivalent (≡) iff they are legal in exactly the same contexts. Thus x.set(0), y.set(1) and y.set(1), x.set(0) are equivalent because whenever the former appears in a legal action sequence, it can be replaced by the latter while preserving legality, and vice versa. Observe that the definition of legality is cleverly crafted to subsume the notion of observational equivalence; two legal action sequences containing the same set of actions are observationally equivalent. Thus, any transformation done on a sequence while preserving legality results in a sequence that is observationally equivalent.

Commutativity and absorption can now be specified in terms of equivalent action sequences. Two actions u and v are commutative iff uv ≡ vu. Likewise, v absorbs u iff uv ≡ v.
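To make these definitions concrete, here is a small Python sketch (my own, not from the paper) that models a few register actions and checks commutativity and absorption by comparing the states two sequences reach. The paper’s definition quantifies over all contexts; this toy check only compares outcomes from the empty context.

```python
def run(seq):
    """Execute a sequence of (obj, op, arg) actions; return the final state,
    or None if some get's expected return value is wrong (illegal sequence)."""
    state = {}
    for obj, op, arg in seq:
        if op == "set":
            state[obj] = arg
        elif op == "add":
            state[obj] = state.get(obj, 0) + arg
        elif op == "get":
            if state.get(obj, 0) != arg:   # arg is the expected return value
                return None
    return state

def equivalent(s1, s2):
    """Crude stand-in for ≡: both sequences lead to the same state from the
    empty context (the paper's definition quantifies over all contexts)."""
    return run(s1) == run(s2)

u = ("x", "set", 0)
v = ("y", "set", 1)
w = ("x", "set", 5)

print(equivalent([u, v], [v, u]))  # True: updates to different objects commute
print(equivalent([u, w], [w]))     # True: the later set absorbs the earlier one
```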

Since there is no need to consider the order between adjacent commutative actions, we can relax such orders in an action sequence. The resultant partial order is called a trace. We furthermore divide the set of actions A into updates U, for which return values are not relevant, and queries Q, for which only the return values are relevant. The authors assume that updates have no preconditions whatsoever, hence any action sequence containing only updates is legal.

Let us now consider histories. As described above, SC histories are linear action sequences. However, under weaker levels of consistency, histories are only partially ordered, w.r.t. po, the program order, vis, the visibility order, and ar, the arbitration trace. Visibility describes which updates are visible to which queries, and arbitration totally orders non-commutative actions (e.g., via timestamps). Since ar is a trace, commutative actions are not directly related by arbitration, but they may be related transitively via a non-commutative action. As with an SC history, the set of actions A is partitioned into multiple transactions. Thus, a weakly consistent history (or schedule) is the tuple (A, po, vis, ar) (note: we conflate the notions of history and schedule, although the paper doesn’t). For a weakly consistent schedule to be considered well-formed, it must satisfy the following properties:

  1. The combined order (po ∪ vis ∪ ar) must be lower-finite and acyclic.
  2. Let S be the sequence of updates visible to a query q, ordered as per the arbitration order. Then S followed by q must be a legal action sequence.
  3. The set of updates invisible to a query must be finite (the liveness requirement of eventual consistency).

As an example, consider a query q that witnesses updates {u1,u2,u3,u4,v1,v2}. Their collective arbitration trace might look like the following:

[Figure: the arbitration trace of the updates {u1,u2,u3,u4,v1,v2} visible to q]

Solid arrows denote the trace (arbitration) order and dotted arrows denote absorption. The two updates v1 and v2 are not dependencies of q because they do not precede it in the trace order (maybe v1 and v2 are y.set(0) and y.add(1), whereas q is x.get()), and can therefore be “moved past” q. The update u2 is not a dependency of q either, as it gets absorbed by the adjacent update u4. But after that, u1 and u3 become adjacent, and so u1 can get absorbed by u3. What remain are the two dependencies u3 and u4 of q.

The algorithm to find the dependencies of a query q is described below with the help of the above example:

  1. Start with the set of all updates visible to q that directly precede q in the trace order.
    • For q in the above figure, the set is {u3,u4}.
    • Updates v1 and v2 are thus already eliminated.
  2. Let’s call this set D. In each iteration, expand the boundary of D by including updates that precede those in the set in the trace order without being absorbed by them (i.e., updates related to those in the set by a solid arrow, but not a dashed arrow). If an update u is absorbed by another update in the set, remove u from the trace, and relate all updates that precede u to all updates that succeed u.
    • For the current example, the only update preceding the elements of D = {u3,u4} is u2. However, u2 is absorbed by u4, so we remove it from the trace and relate u1 to both u3 and u4.
    • In the next iteration, u1 precedes the elements of D, but it is absorbed by u3, hence it too must be removed.
  3. We iterate until there are no more updates left to be included in D.
    • The final D for the current example is thus {u3,u4}.
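The steps above can be sketched as a small fixpoint computation in Python. The edge structure below (solid arrows in `before`, dashed absorption arrows in `absorbs`) encodes my reading of the example figure, and adjacency is approximated through the current edge set, so this is an illustrative sketch rather than the paper’s actual algorithm.

```python
def dependencies(visible, before, absorbs, precede_q):
    """Compute the dependency set D of a query, starting from the updates
    that directly precede it in the trace order (precede_q)."""
    dep = set(precede_q)
    edges = set(before)            # current trace-order edges (solid arrows)
    changed = True
    while changed:
        changed = False
        for u in sorted(visible - dep):
            if not any((u, d) in edges for d in dep):
                continue           # u does not precede the current set
            if any((u, d) in edges and (u, d) in absorbs for d in dep):
                # u is absorbed: splice it out of the trace, connecting
                # its predecessors to its successors.
                preds = {p for (p, x) in edges if x == u}
                succs = {s for (x, s) in edges if x == u}
                edges = {e for e in edges if u not in e}
                edges |= {(p, s) for p in preds for s in succs}
            else:
                dep.add(u)         # u is a genuine dependency
            changed = True
    return dep

# The running example: q directly sees u3 and u4; u4 absorbs u2; u3 absorbs u1.
visible = {"u1", "u2", "u3", "u4", "v1", "v2"}
before  = {("u1", "u2"), ("u2", "u3"), ("u2", "u4")}
absorbs = {("u2", "u4"), ("u1", "u3")}   # (a, b) means b absorbs a
print(sorted(dependencies(visible, before, absorbs, {"u3", "u4"})))  # ['u3', 'u4']
```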

The set D is the dependency set of q: it contains all the updates that must be ordered before q in any serialization of the original weakly consistent schedule. It is informative to note that if we did not take commutativity and absorption into account, the dependency set would be equal to the visibility set of q.

The notion of dependency has a natural counterpart called anti-dependency. An anti-dependency of a query in a given schedule is an update not visible to the query but such that making it visible would turn it into a dependency.

[Figure: definition of an anti-dependency]

Thus, the anti-dependency set of q contains all the updates that must be ordered after q in any serialization of the original weakly consistent schedule. Computing this set for q is simple: find an update u not visible to q in the original schedule, and make it visible. Now, run the above algorithm to compute the dependency set of q. If the set contains u, then u is an anti-dependency of q; otherwise it is not. Again, it is informative to note that if we do not consider commutativity and absorption, all updates not visible to q are its anti-dependencies.

Once we compute the dependency and anti-dependency sets for all the queries, a serial order can be obtained by topologically ordering the graph induced by these sets. If the graph contains a cycle, then we have a conflict, and the schedule is not serializable.
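Given the dependency and anti-dependency edges, the final check is ordinary topological sorting with cycle detection; a minimal sketch using Kahn’s algorithm (node and edge names are illustrative):

```python
from collections import defaultdict, deque

def serialize(nodes, edges):
    """Return a serial order of the nodes, or None if the graph is cyclic."""
    indeg = {n: 0 for n in nodes}
    succ = defaultdict(list)
    for a, b in edges:             # edge (a, b): a must precede b
        succ[a].append(b)
        indeg[b] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    return order if len(order) == len(nodes) else None

# The w1/w2 history from the introduction: without absorption both
# write-write edges are kept and the graph is cyclic ...
print(serialize({"T1", "T2"}, {("T1", "T2"), ("T2", "T1")}))  # None
# ... with absorption one edge disappears and a serial order exists.
print(serialize({"T1", "T2", "T3"}, {("T1", "T2"), ("T2", "T3")}))  # ['T1', 'T2', 'T3']
```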

Evaluation

The evaluation is notable for its breadth. 33 TouchDevelop applications have been analyzed. TouchDevelop uses the Global Sequence Protocol (GSP) to implement a replication system with prefix consistency (stronger than causal). Nonetheless, even if the store is SC, serializability violations are possible because of dependency violations among transactions (groups of atomic actions). As our experience with RDBMS transactions informs us, serializability violations abound under weak isolation, and requiring serializability for all transactions is overkill. As a rule of thumb, the authors treat SER violations involving query transactions (transactions issued by the display code) as harmless, hence exclude them from the evaluation. Among the remaining transactions, many SER violations have been found. Some examples discussed in the paper:

  1. Tetris code where the high score is locally updated (a mutable integer). This is obviously incorrect. The fix is basically to turn the high-score integer into an op-based data structure.
  2. Event-management code where the app sorts the cloud-backed list of events by removing the events, sorting them locally, and re-inserting them. The fix is to either do this in a SER transaction, or to simply stop doing it and sort locally whenever the event list is queried.
  3. A quiz game that allows a user to overwrite an existing user account due to an incorrect uniqueness check. The fix is to use CAS.
  4. A Tic-Tac-Toe game that uses mutable datatypes all over. The fix is to use a data model that is op-based.

Another round of evaluation is done with TPC-C implementation on top of Riak (an EC store with no strengthening). Some violations:

  1. A DELIVERY transaction that updates a mutable balance. Obviously, one of the updates is lost when two DELIVERY transactions run concurrently. The fix is to use a CRDT counter for balance.
  2. DELIVERY transaction also has a subsequent getBalance operation, so even if the balance is a CRDT, SER violations still exist. However, this violation is ignored since it is harmless.
  3. The perceived foreign-key violation between the ORDERS and NEW_ORDERS tables manifests as an SER violation. The fix proposed is denormalization of data, although enforcing monotonic reads via QUORUM writes would fix the issue.
  4. Two NEW_ORDER transactions assign the same id to an order. The behaviour is not serializable, and the only fix is to use CAS.

Observations and Questions

  1. It is remarkable that testing could uncover uniqueness bugs and foreign-key bugs, since they require concurrent transactions to issue operations on very specific keys and ids (uniqueness, for example, requires two randomly issued NEW_ORDER transactions to insert an order with the exact same id. See comments).
  2. Moreover, uncovering these bugs requires conflicting transactions to overlap in a specific manner. Such concurrent schedules are hard to trigger, aren’t they?
  3. This leads to the question: what is the test harness and testing methodology?
  4. Perceived foreign key violation has been uncovered by analyzing a query transaction. It is for this reason that I don’t feel comfortable excluding query transactions.
  5. With mutable data, lost updates are inevitable. That is, the presence of UPDATE ... SET queries in weakly isolated transactions is itself a sufficient condition to witness SER violations. Four of the eight bugs discussed above fall into this category. The fix proposed in all the cases is basically to turn the mutable data into an op-based RDT. Assuming that we start with such an implementation, how many new bugs would be uncovered? What are the fixes?
  6. It would be interesting to apply effective serializability criterion to replicated versions of stacks and queues. Experience tells us that replicating these data structures in a coordination-free fashion is quite tricky.

Sep 20, 2016 - Dynamo and DynamoDB

In this post, I discuss DeCandia et al’s Dynamo paper, and Amazon’s DynamoDB service based on the paper.

Dynamo

DeCandia et al’s Dynamo is a distributed key-value store remarkable for its entirely decentralized architecture, SLAs that focus on 99.9th-percentile latency, emphasis on never losing writes, and the notorious sloppy quorums. Supporting a decentralized architecture required several innovations, such as anti-entropy protocols like hinted handoff and read repair. Dynamo was originally built as an infrastructure rather than a service. To quote from the paper: “Each service that uses Dynamo runs its own Dynamo instances”, “[Dynamo] is built for a trusted environment”.

Dynamo is a simple key-value store with a flat keyspace. The keyspace is assumed to be quite large, hence partitioned across multiple nodes. The partitioning is done by a consistent hashing algorithm that envisions the keyspace as a ring, and makes a node responsible for an arc (usually an adjacent one) on the ring. Hence, each instance is called a Dynamo ring. For durability and high availability, each key is replicated across multiple nodes (the number of replicas is called its replication factor, which is usually set to 3), often spanning multiple data centers. Thus, a ring spans multiple geographic locations. Each node in the ring maintains ring membership information, capturing that node’s view of the ring. This information is regularly updated via gossip. Any node in the ring accepts reads and writes for any key in the keyspace. If the node is not one of the replicas of the key, it forwards the request to a replica. For this purpose, each node maintains a preference list of nodes for each partition of the keyspace, which is consulted to decide which node should serve a request. Even when all nodes have a consistent view of ring membership, preference lists maintained by different nodes can still differ. For instance, if nodes A and B are both replicas for key k, then the preference list for k at A has A at the top, whereas B’s list has B at the top. In general, the preference list for a partition at a node is ordered so as to minimize the latency of serving requests on a key starting from that node.
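A minimal sketch of the consistent-hashing ring and per-key replica selection described above. The node names, the use of md5, and a replication factor of 3 are my illustrative choices; Dynamo’s real preference lists also account for virtual nodes and request latency.

```python
import hashlib
from bisect import bisect

def h(s):
    """Hash a string onto the ring's keyspace."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, replication=3):
        self.replication = replication
        # Each node owns the arc ending at its hash position.
        self.ring = sorted((h(n), n) for n in nodes)

    def replicas(self, key):
        """The first `replication` distinct nodes clockwise from the key."""
        points = [p for p, _ in self.ring]
        i = bisect(points, h(key)) % len(self.ring)
        prefs = []
        while len(prefs) < self.replication:
            node = self.ring[i][1]
            if node not in prefs:
                prefs.append(node)
            i = (i + 1) % len(self.ring)
        return prefs

ring = Ring(["A", "B", "C", "D", "E"])
print(ring.replicas("user:42"))   # three distinct replica nodes for the key
```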

Like all distributed databases, the number of replicas involved in each read (R) and write (W) is configurable. Under normal circumstances, if R+W > N, where N is the replication factor, we get a system with quorum consistency that, for example, supports the Read-My-Writes (RMW) guarantee. However, Dynamo’s quorums are not traditional (Lamport) quorums. In the event of a network partition, even when none of the replicas for a key are reachable, Dynamo still accepts writes for the key, allowing the reachable nodes to act as makeshift replicas. While this behavior in itself is not surprising, considering that data stores are often designed to be available even if only one node is reachable, Dynamo allows the reachable nodes to organize themselves into a ring, and form quorums on each side of the network partition! Such quorums are called sloppy quorums. Thus, even if the writes and reads of a session are successfully executed with quorum consistency, we still may not get RMW.

Network partitions or otherwise, concurrent updates to a key are possible. Dynamo uses vector clocks to identify the multiple versions of an object corresponding to a key. If the vector clocks of two versions are totally ordered, then conflict resolution is trivial. If they are not, then Dynamo keeps both versions and lets the application handle the conflict. As I shall demonstrate later, keeping multiple versions is particularly important in the case of Dynamo; otherwise it may lose the entire set of writes submitted on one side of a network partition after conflict resolution.
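The version-reconciliation rule can be sketched with dict-based vector clocks: one version supersedes another iff its clock dominates pointwise; otherwise the versions are concurrent and both must be kept. (Dynamo represents clocks as lists of (node, counter) pairs; dicts are an equivalent sketch.)

```python
def descends(a, b):
    """True iff the version with clock `a` causally descends from `b`."""
    return all(a.get(n, 0) >= c for n, c in b.items())

def concurrent(a, b):
    """Neither version descends from the other: a true conflict."""
    return not descends(a, b) and not descends(b, a)

v1 = {"A": 2, "B": 1}
v2 = {"A": 1, "B": 1}   # v1 descends from v2: resolution is trivial, keep v1
v3 = {"A": 1, "B": 2}   # v1 and v3 are concurrent: keep both versions
print(descends(v1, v2), concurrent(v1, v3))  # True True
```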

DynamoDB

Present-day DynamoDB is a multi-tenant hosted service offered by Amazon. The data model is more-or-less flat key-value, with some new additions (source: core components of DynamoDB):

  • While the unique primary key of a data item also used to be its partition key, the primary key can now be defined as a combination of a partition key (also called the hash key) and a sort key (also called the range key). Their combination needs to be unique for each data item.
  • The value needs to be a JSON document with any number of attributes. While the primary-key schema of each data item is fixed in a table, the schema of values is not. However, DynamoDB takes cognizance of the attributes when a JSON document is being stored, allowing secondary indexes to be created and the attributes to be queried.

Some more relevant points from its documentation:

  • Amazon DynamoDB stores three geographically distributed replicas of each table to enable high availability and data durability.
  • Consistency across all copies of data is usually reached within a second.
  • A strongly consistent read returns a result that reflects all writes that received a successful response prior to the read.

It is not clear how strongly consistent reads are implemented in DynamoDB. If a strong read has to return the value of the previous write regardless of the write consistency, then its consistency level has to be ALL. Alternatively, if it only returns the values of previous strong writes, then both reads and writes can be QUORUM (strict quorums; sloppy won’t do). UPDATE: A Stack Overflow user suggests that (a) DynamoDB has only strict quorums (no sloppy quorums and hinted handoffs), (b) all writes in DynamoDB are written to a quorum of replicas, (c) by default, reads are served by a single replica, and (d) strong reads are served by a quorum of replicas. This makes sense.

Apart from regular writes, DynamoDB supports atomic in-place updates of the attributes of a data item. For example, we can update user.name, user.address, and user.telephone for a given user.id in a single update API call to DynamoDB (one round-trip). The update can also be conditional, in which case it is applied if and only if the current value of the data item meets certain conditions. Conditional update is presumably implemented via Paxos, giving it CAS semantics. To help construct in-place conditional update operations, DynamoDB defines a fairly expressive language for condition expressions and update expressions (a primer on reading and writing items using expressions is here). The documentation says that conditional update is idempotent because CAS is idempotent, but DynamoDB’s conditional update is more general than CAS. In general, a conditional update is idempotent only if the update negates its condition.
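The idempotence point can be illustrated locally. The sketch below models a conditional update as “apply the update iff the condition holds”: a CAS-style update that negates its own condition is safe to retry, whereas a guarded decrement is not. The data items and helper functions are hypothetical, not DynamoDB API calls.

```python
def conditional_update(item, condition, update):
    """Apply `update` to `item` iff `condition` holds; report success."""
    if condition(item):
        update(item)
        return True
    return False

# CAS-style claim: the update negates the condition, so retries are no-ops.
account = {"id": 7, "state": "free"}
is_free = lambda it: it["state"] == "free"
claim = lambda it: it.__setitem__("state", "claimed")
print(conditional_update(account, is_free, claim))   # True
print(conditional_update(account, is_free, claim))   # False: idempotent

# A threshold-guarded decrement does NOT negate its condition,
# so re-applying it is not idempotent.
stock = {"qty": 5}
in_stock = lambda it: it["qty"] > 0
dec = lambda it: it.__setitem__("qty", it["qty"] - 1)
conditional_update(stock, in_stock, dec)
conditional_update(stock, in_stock, dec)
print(stock["qty"])                                  # 3, not 4
```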

Through conditional updates, DynamoDB already offers serializable transactions on (multiple attributes of) a single data item. But, this is only the beginning! As it turns out, DynamoDB also implements full-fledged multi-object transactions with variable isolation levels! (more here). DynamoDB currently defines three different isolation levels, without making any reference to the ANSI SQL standard. As described by the documentation:

  • Fully isolated reads are performed by obtaining locks during a transaction, just like writes.
  • Committed reads provide a consistency guarantee similar to eventually consistent reads, and are performed by reading the old copy of the item if a lock is detected.
  • Uncommitted reads (also known as dirty reads) are the cheapest, but are the most dangerous, since they may return data that will later be rolled back.

Roughly, they correspond to ANSI SQL Serializable, Read Committed, and Read Uncommitted isolation levels, respectively. Note that, unlike relational databases, where isolation level is set per-transaction, DynamoDB allows isolation level to be set per-read in a transaction. This is why there is no isolation level corresponding to Repeatable Read. Nonetheless, more analysis is needed to determine the exact guarantees offered by each of these isolation levels.

So, to summarize, DynamoDB offers quorum writes and weak reads by default. An application can request strong reads to get RMW, but they are twice as expensive. An update operation is a quorum write that lets (multiple attributes of) a data item be updated atomically. It consumes one write capacity unit. So does a conditional update, which goes beyond CAS semantics to enable serializable transactions on a single data item. However, conditional updates may often fail, and retries consume more write units, making them expensive. Multi-item transactions with variable isolation levels are possible, and writes from transactions are very expensive. As per the documentation, a write from a transaction consumes roughly 7N+4 write capacity units, where N is the size of the transaction. The cost model for each isolation level is not known, but fully isolated transactions are the most expensive because they come “at the cost of turning reads into relatively expensive writes.”

Sample Applications

  • This blog describes a simple game, where two players advance their positions via conditional updates.
  • Product catalogue case study is described here.
  • Tic-Tac-Toe game developed via conditional updates and multi-key transactions is described in this blog. The example is also discussed in this AWS re:Invent 2013 video. The same video also describes Dropcam’s experience using DynamoDB.
  • Session logouts due to the lack of RMW is described here.
  • Many transaction examples here.

Aug 19, 2016 - Disciplined Inconsistency

Today, in our reading group, we read an interesting paper titled “Disciplined Inconsistency” by Brandon Holt et al from UW CSE. This post is my notes on the paper.

Background

Modern day web-services often trade consistency for availability and performance. However, there exist some data and operations for which stronger forms of consistency are needed. Recognizing this, some data stores, including Quelea, implement stronger consistency guarantees, such as Causal Consistency. However, there are two major problems with the lattice-of-consistency-levels model espoused by Quelea:

  • In real applications, invariants over the data are almost never strict. Applications can tolerate a certain amount of error, especially if it improves performance. For example, Amazon can afford to oversell a discounted item by 2% of the available stock, but no more. Unfortunately, Quelea’s model is insensitive to such error bounds. Quelea expects applications to have exact invariants, such as non-negative balance or monotonicity, and attempts to statically determine appropriate consistency levels that enforce these invariants without fail. In reality, applications execute under low-to-moderate loads during most of their lifetimes, under which weaker consistency levels are more-or-less sufficient to maintain application invariants. Quelea’s consistency model lacks dynamic monitoring, and thus cannot adapt consistency to changing load conditions.
  • The flow of data between operations executing at various consistency levels is not controlled. Consequently, data maintained at EC can make its way into data maintained at SC.

Holt et al propose a system called IPA (Inconsistent, Performance-bound, Approximate storage system) that addresses the above problems via a combination of static type system and runtime enforcement. In the following, I explain various aspects of IPA in greater detail.

IPA’s Programming Model

IPA is modeled as a data structure store, like Redis, Riak, and Hyperdex, and its Scala-based programming framework allows programmers to implement ADTs, such as sets and counters, as distributed data structures. To better understand IPA’s approach, let us distinguish between library writers, who implement ADTs, and app developers, who use ADTs. IPA expects library writers to define various versions of an ADT, each with a different combination of consistency policies for ADT operations. In Scala’s terminology, each combination is a trait of the ADT. For instance, the Counter ADT can be defined with, say, two traits: a WeakOps trait that implements counter reads and increments using Cassandra’s weak (i.e., consistency level ONE) reads and writes, and a StrongOps trait that implements them via Cassandra’s QUORUM reads and writes, or via ONE writes and ALL reads. For more sophisticated ADTs, more combinations of consistency levels for operations can be implemented as traits. App developers use the ADTs defined by library writers by selecting a version with the desired trait. For instance, an app developer may use the Counter ADT with the StrongOps trait to obtain a strongly consistent counter, or with WeakOps to obtain a weakly consistent counter.

From IPA’s perspective, Strong and Weak are examples of static consistency policies. IPA also provides support for library writers to implement traits for dynamic consistency policies, such as LatencyBound and ErrorBound, which admit bounds on the latency of an operation and the error tolerance on its result (resp.). For instance, LatencyBound trait for Counter ADT can be implemented using the rush primitive offered by IPA that executes a read or a write under strongest consistency level possible within the specified time limit. Likewise, ErrorBound trait can be implemented via IPA’s support for escrow reservations. Both these mechanisms will be explained in detail later. An app developer, however, does not need to know about these implementation mechanisms. The developer can choose Counter with LatencyBound(50) trait to obtain a counter whose operations have a 50ms latency bound, or Counter with ErrorBound(10) trait to obtain a counter whose value can be off by at most 10% of the actual count.

IPA thus exposes different levels of abstractions to library writers and app developers. While app developers can think in terms of ADTs and high-level consistency policies, library writers have to face the ugliness of the underlying data store as they reason in terms of reads, writes, and low-level consistency policies, such as ONE, QUORUM etc. While IPA does provide some tool support (via rush and reservations), it does not raise the level of abstraction insofar as library writers are concerned. This is in contrast to Quelea where RDT writers also reason in terms of high-level consistency policies and RDT-specific effects.

Static Type System

The stated purpose of IPA’s consistency type system is to enforce consistency safety. Informally, it “ensures that values from weakly consistent operations cannot flow into stronger consistency operations without explicit endorsement from the programmer”. Thus, the type system helps developers trade consistency for performance in a disciplined manner.

IPA’s implementation defines four consistency types:

  • Consistent[T]: A value of type T resulting from a strongly consistent operation (the StrongOps trait),
  • Inconsistent[T]: A value of type T resulting from a weakly consistent operation (the WeakOps trait),
  • Rushed[T]: A value of type T resulting from an operation executed with a latency bound (the LatencyBound trait), and
  • Interval[T]: A range of values of type T, each of which is equally likely to be the result of an operation executed with an error bound (the ErrorBound trait).

The return type of an ADT operation is either defined by the library writer while implementing the trait, or follows from the use of IPA primitives, such as rush. Thus, more consistency types can be defined by library writers. For example, consider a Facebook-like deployment where an object is only ever stored in a single data center. If an ADT is tailor-made for such deployment, its writer can implement its operations via LOCAL_QUORUM reads and writes, and claim per-object sequential consistency by annotating the return types of operations with a new Consistent[PoSC] type.

A subtype lattice over consistency types can also be defined by relying on Scala’s type system. For instance, here is a lattice defined by the IPA implementation, and presented in the paper:

[Figure: subtype lattice over IPA’s consistency types]

The result of annotating return values with consistency types is that app developers must now explicitly handle non-Consistent[T] values:

  • If a value is Inconsistent[T], the developer must explicitly endorse the value for it to be treated on par with Consistent[T] values.
  • If it is Interval[T], the result of an ErrorBound operation, then the developer has to handle an interval of values.
  • Rushed[T] is implemented as “a sum type with one variant per consistency level available to the implementation of LatencyBound” (It’s not quite clear to me how this is done). Thus, a Rushed[T] value has to be destructed and matched against Consistent[T], or Inconsistent[T], or any other user-defined (static) consistency type before it is used.

Thus, consistency types let applications be sensitive to the consistency level of a value. Alternatively put, they force the application to adopt a disciplined approach when dealing with inconsistent values.
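A rough Python analogue of this discipline (IPA’s real type system is Scala’s; the wrapper classes and the endorse method here are my sketch of the idea, enforced at runtime rather than by a static type checker):

```python
class Consistent:
    """A value obtained from a strongly consistent operation."""
    def __init__(self, value):
        self.value = value

class Inconsistent:
    """A value obtained from a weakly consistent operation."""
    def __init__(self, value):
        self.value = value
    def endorse(self):
        """The programmer explicitly accepts this weak value as consistent."""
        return Consistent(self.value)

def transfer_if_enough(balance, amount):
    # Only Consistent balances may guard this safety-critical check.
    if not isinstance(balance, Consistent):
        raise TypeError("weak read must be endorsed before use")
    return balance.value >= amount

weak = Inconsistent(120)
try:
    transfer_if_enough(weak, 100)        # rejected: forces a deliberate choice
except TypeError as e:
    print(e)
print(transfer_if_enough(weak.endorse(), 100))  # True
```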

Run-time enforcement of consistency

Strong consistency is implemented using QUORUM reads and writes. As we describe in Representation without Taxation paper, such an implementation is likely to be buggy since Cassandra’s LWW conflict resolution drops writes due to inevitable clock drift across servers.

LatencyBound reads are implemented in IPA by issuing parallel read requests at different (Cassandra) consistency levels, and returning the strongest available result at the specified time limit. One drawback of this approach is that it floods the network with read requests, which is undesirable, especially when the system is operating at high load. Therefore, the paper proposes an alternative approach based on monitoring latency. The approach first computes the likely latency (e.g., the 90th-percentile latency) of a read operation for each consistency level based on recent history, and then issues the read at an appropriate level. For this purpose, the system monitors the latencies of reads, grouped by operation and consistency level. “The monitor uses an exponentially decaying reservoir to compute running percentiles weighted toward recent measurements, ensuring that its predictions continually adjust to current conditions”.
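The monitoring-based strategy can be sketched as follows. The decay constant, the consistency levels, and the use of a single decayed running estimate instead of a full reservoir of percentiles are my simplifications of IPA’s monitor.

```python
class LatencyMonitor:
    DECAY = 0.9   # weight toward the most recent measurement

    def __init__(self, levels):
        self.strength = levels                      # strongest level first
        self.est = {lvl: None for lvl in levels}    # running latency estimate (ms)

    def record(self, level, latency_ms):
        old = self.est[level]
        self.est[level] = latency_ms if old is None else \
            self.DECAY * latency_ms + (1 - self.DECAY) * old

    def pick_level(self, bound_ms):
        """Strongest level whose predicted latency fits the bound."""
        for lvl in self.strength:
            e = self.est[lvl]
            if e is not None and e <= bound_ms:
                return lvl
        return self.strength[-1]   # fall back to the weakest level

mon = LatencyMonitor(["ALL", "QUORUM", "ONE"])
for lat in (80, 95, 90):
    mon.record("ALL", lat)
for lat in (40, 60, 45):
    mon.record("QUORUM", lat)
for lat in (10, 12, 9):
    mon.record("ONE", lat)
print(mon.pick_level(50))   # QUORUM fits a 50ms bound; ALL does not
```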

IPA allows ADTs to implement error bounds by making use of escrow and reservations. The paper summarizes the concept well:

The idea is to set aside a pool of permissions to perform certain update operations (we’ll call them reservations or tokens), essentially treating operations as a manageable resource. If we have a counter that should never go below zero, there could be a number of decrement tokens equal to the current value of the counter. When a client wishes to decrement, it must first acquire sufficient tokens before performing the update operation, whereas increments produce new tokens. The insight is that the coordination needed to ensure that there are never too many tokens can be done off the critical path: tokens can be produced lazily if there are enough around already, and most importantly for this work, they can be distributed among replicas. This means that replicas can perform some update operations safely without coordinating with any other replicas.

Consider an increment-only counter X. Let us say X = 100 initially, and the error bound on X is 10%. Consider a case where 3 replicas are independently modifying X. Then each replica can be issued 5 tokens, capturing the fact that each can perform 5 increments before it needs to synchronize in order to not violate the error bound. For instance, if all replicas perform 5 increments each without synchronizing, then each will return 105 as the value of X when the actual value is 115. However, 105 is still within 10% of 115, so this is alright. Thus, error bounds provide applications with some slack in implementing strongly consistent operations by allowing synchronization to be moved off the critical path.
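The example maps directly onto a small sketch: each replica consumes tokens locally, and its reads may lag the true global value by at most the other replicas’ outstanding increments. The numbers follow the example above; the class is my illustration, not IPA’s reservation API.

```python
class EscrowReplica:
    def __init__(self, shared_value, tokens):
        self.base = shared_value   # last synchronized value of X
        self.local = 0             # increments applied locally since sync
        self.tokens = tokens       # escrowed permissions to increment

    def increment(self):
        if self.tokens == 0:
            raise RuntimeError("out of tokens: must synchronize")
        self.tokens -= 1
        self.local += 1

    def read(self):
        return self.base + self.local   # may lag the true global value

# X = 100, 10% error bound, 3 replicas with 5 tokens each.
replicas = [EscrowReplica(100, 5) for _ in range(3)]
for r in replicas:
    for _ in range(5):
        r.increment()

true_value = 100 + sum(r.local for r in replicas)     # 115
print([r.read() for r in replicas], true_value)       # [105, 105, 105] 115
# Every replica's local read (105) stays within 10% of the true value (115).
```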

IPA provides infrastructure support (via a reservations server) for escrow reservations, and exposes its functionality via an API. The library writer is expected to make use of this functionality to implement the ErrorBound consistency policy for their ADTs.

An Aside

The escrow method was proposed in 1986 (and rehashed later), but there seem to be many recent papers on applying variants of escrow in various contexts:

  • Liu et al, Warranties for Faster Strong Consistency, NSDI’14 (paper).
  • Balegas et al, Putting Consistency Back into Eventual Consistency, EuroSys’15 (paper)
  • Balegas et al, Extending Eventually Consistent Cloud Databases for Enforcing Numeric Invariants, SRDS’15 (paper)

I have the intuition that escrow methods would be useful to implement the merge operation on data structures that support non-commutative and non-idempotent operations. Basically, if we allocate tokens among replicas to perform a non-idempotent operation, such as a dequeue operation on a queue, at most n times in total (where n is the size of the queue), then the merge operation becomes simple, because a dequeue only dequeues an item present on all replicas. We also get the read-at-least-once guarantee on the messages in the queue.
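
One way to make this intuition concrete (purely a sketch of my own, with illustrative names): give each replica a disjoint set of per-item dequeue tokens, so a replica only ever dequeues items guaranteed to be present on every replica, and merge reduces to a union of spent tokens.

```python
class QueueReplica:
    """Sketch: a replica of a shared queue with per-item dequeue tokens."""

    def __init__(self, items, my_tokens):
        self.items = list(items)        # initial contents, shared by all
        self.tokens = set(my_tokens)    # indices this replica may dequeue
        self.dequeued = set()           # spent tokens (from any replica)

    def dequeue(self):
        # Only dequeue an item whose token we hold: the item is known to
        # be present on every replica, so a merge cannot resurrect it,
        # and no other replica can dequeue the same item.
        for i, item in enumerate(self.items):
            if i in self.tokens and i not in self.dequeued:
                self.dequeued.add(i)
                return item
        return None                     # out of tokens: must synchronize

    def merge(self, other):
        # Merge reduces to a union of spent tokens.
        self.dequeued |= other.dequeued

    def contents(self):
        return [x for i, x in enumerate(self.items) if i not in self.dequeued]
```

Since token sets are disjoint, two replicas can never dequeue the same item concurrently, and merged replicas converge to the same contents.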

Experiments

The experimental methodology of the paper is interesting, and the results are insightful. Experiments are performed in two different setups: simulated and real. In the simulation experiments, they use multiple Docker containers on the same physical node, connected via Ethernet, to simulate multiple replicas. Network conditions are controlled via Linux's tc netem utility, which introduces packet delay and loss at the OS level. In the real experiments, they deploy IPA on Google Compute Engine with geo-distributed replicas.

Three kinds of experiments were performed: 1. with uniform 5ms network latency between replicas (simulating a local network in a data center under medium-to-high load), 2. with one replica slower than the others (modeling variable network latency and hardware problems), and 3. with real geo-distributed replicas. Some highlights:

  • In the experiments with a single slow replica, a 50ms bound on the 95th percentile latency overloads the slow replica with double the requests (why?), and causes it to exceed the latency bound 5% of the time.
  • In the geo-distributed case, if all operations are SC, the mean latency is 800ms; but if 1% error is tolerated (on the Counter), the latency reduces significantly, to 200ms. The latency difference between the weak consistency policy and the 5% ErrorBound consistency policy is negligible. This experiment demonstrates that it is possible to guarantee error bounds on the data without having to forgo availability and performance.

May 31, 2016 - Extraction in Coq

Extraction erases Props

Extraction in Coq works by erasing Props. For example, consider the following definition of div:

    Definition div (m n : nat)(H : n<>0): nat :=
      NPeano.div m n.

div expects a proof that its second argument is non-zero. Indeed, in Coq, it is impossible for div to divide by zero. However, when this code is extracted to OCaml, the n <> 0 prop is erased (Coq renames our div to div0 to avoid a name clash with NPeano's div):

let div0 m n = div m n

Sumbool

Coq’s sumbool type is another good example demonstrating the proof-erasing nature of extraction. sumbool is defined in the Specif module as follows:

(** [sumbool] is a boolean type equipped with the justification of
    their value *)

Inductive sumbool (A B:Prop) : Set :=
  | left : A -> {A} + {B}
  | right : B -> {A} + {B}
 where "{ A } + { B }" := (sumbool A B) : type_scope.

sumbool is usually the return type of equality decision procedures for various types. For example, string_dec, the string equality decision procedure, has the type:

  forall s1 s2 : string, {s1 = s2} + {s1 <> s2}

Consider a type id defined as:

  Inductive id : Type :=
    T : string -> id.

A decision procedure for id can be constructed from string_dec as follows:

 Theorem eq_id_dec : forall id1 id2 : id, {id1 = id2} + {id1 <> id2}.
  Proof.
     intros id1 id2.
     destruct id1 as [n1]. destruct id2 as [n2].
     destruct (string_dec n1 n2) as [Heq | Hneq].
     Case "n1 = n2".
       left. rewrite Heq. reflexivity.
     Case "n1 <> n2".
       right. intros contra. inversion contra. apply Hneq. apply H0.
  Defined.

Extracting the sumbool and eq_id_dec generates the following:

      type sumbool =
      | Left
      | Right
      (** val eq_id_dec : t -> t -> sumbool **)
      let eq_id_dec id1 id2 =
        string_dec id1 id2

OCaml’s sumbool has no constructor arguments because Coq’s sumbool has only Prop arguments. The advantage of using sumbool instead of bool is that it can be used seamlessly in both proofs and computations. When used in a computation, sumbool simply tells whether a property is true or false; but when used in a proof, sumbool also tells why the property is true or false.

Theorems can also be extracted

Observe that eq_id_dec has been written as a theorem. Theorem can be used to assert the existence of a witness of, not only Props, but also Sets and Types. For example:

  Theorem there_is_a_nat : nat.
  Proof.  apply 0.  Defined.

  Extraction there_is_a_nat.
  (* 
    let there_is_a_nat = O
  *)

Why do we say Defined instead of Qed whenever we are doing extraction? Qed makes a proof/definition opaque, whereas Defined makes it transparent. If we used Qed instead of Defined, extraction would produce:

(** val there_is_a_nat : nat **)

let there_is_a_nat =
  failwith "AXIOM TO BE REALIZED"

Note that theorems can only be extracted if the statement is a Set or a Type, not a Prop. The following two examples should demonstrate this point. Example 1:

  Theorem there_is_plus: forall (n1 n2 : nat), exists (n :nat), n = n1 + n2.
  Proof.  intros.  exists (n1+n2).  reflexivity.
  Defined.

  Check (forall (n1 n2 : nat), exists (n :nat), n = n1 + n2). (* Prop *)

  Extraction there_is_plus.
  (*
      (** val there_is_plus : __ **)

      let there_is_plus =
        __
  *)

Contrast Example 1 with the following Example 2:

  Inductive plus_defined (n1 n2: nat) : Set :=
    | PlusT : forall(n:nat), (n=n1+n2) -> plus_defined n1 n2.

  Extraction plus_defined.
  (*
      type plus_defined =
        nat (* singleton inductive, whose constructor was PlusT *)
  *)

  Theorem there_is_plus_t: forall (n1 n2 : nat), plus_defined n1 n2.
  Proof.  intros.
  apply PlusT with (n:=n1+n2); reflexivity.
  Defined.

  Extraction there_is_plus_t.
  (*
    let there_is_plus_t n1 n2 = plus n1 n2
  *)

Why would anyone want to write a Set or a Type term as a proof of a theorem, rather than as a Definition or a Fixpoint? If the Set or Type term expects Prop witnesses (like sumbool), then it's better to write it as a proof. Sometimes, it may not even be possible to write the term otherwise. For example, here is a failed attempt at defining eq_id_dec as a Definition:

  Inductive id : Type :=
    T : string -> id.

  Theorem eq1 : forall (s1 s2: string), s1=s2 -> (T s1) = (T s2).
  Proof.
    intros.
    subst. reflexivity.
  Qed.

  Theorem neq1 : forall (s1 s2 : string), s1<>s2 -> (T s1) <> (T s2).
  Proof.
  unfold not.
  intros.
  apply H; inversion H0. subst; reflexivity.
  Qed.

  Definition id_eq (id1 id2 : id) : ({id1 = id2}+{id1 <> id2}) :=
    match (id1,id2) with
    | (T s1,T s2) => match string_dec s1 s2 with
                     | left A => left (eq1 s1 s2 A)
                     | right B => right (neq1 s1 s2 B)
                     end
    end.

The approach fails because the term left (eq1 s1 s2 A), which has the type {T s1 = T s2} + {?66}, fails to type check against the required type {id1 = id2} + {id1 <> id2}. The problem is that while pattern matching inside a definition, Coq does not populate the context with equalities between scrutinees and patterns. Although we know that id1 = T s1 and id2 = T s2, we have no way of obtaining these facts from the context. Recall that we did not face this problem when proving eq_id_dec. This is why we sometimes prefer writing Set or Type terms as proofs.


Dec 5, 2015 - Notes - Terry's Session Guarantees


This post is the compilation of my notes on Terry et al’s PDIS’94 paper: Session Guarantees for Weakly Consistent Replicated Data.

System Model

From what I understand, the paper is the first to describe the abstract system model of a weakly consistent replicated database that now serves as a frame of reference for describing KV stores (e.g., Cassandra and Riak) and mobile computational nodes. The abstract system model is very similar to Quelea’s system model described in Sec. 2 of the PLDI paper. Like in Quelea, where the state of an application at a replica \(r\) is the set of application-specific effects present at that replica (i.e., the \({\sf vis}\) set at \(r\)), the application state at a server \(S\) in Terry’s model is the collection of app-specific Writes at \(S\) (denoted \(DB(S)\)). Following is a more detailed comparison:

  • A Write in Terry’s model is an abstraction of the creation, update, or deletion of concrete data items (collectively called CUD operations). Hence, multiple Write operations need not commute. The \({\sf WriteOrder}\) therefore assumes significance. A Write in Quelea is an abstraction of a single atomic effect. Quelea’s model in general encourages commutative Writes, and the model does not have a built-in notion of a \({\sf WriteOrder}\). Application state in Quelea is a set of effects, and the result of a Read operation is obtained by folding over this set. Nonetheless, Quelea’s model does not preclude CUD operations; it is possible to write applications with Read, Write and Delete operations, such as registers, in Quelea. However, applications have to implement their own mechanisms (e.g., tagging effects with local timestamps) and conflict resolution semantics (e.g., Last-Writer-Wins) to ensure convergence.
  • Both models rely on total propagation and consistent ordering to ensure eventual consistency. Total propagation can be achieved via anti-entropy mechanisms such as hinted handoff and read repair (more about these in this paper). In Quelea’s implementation, this is taken care of by Cassandra. No further assumptions about total propagation are made. Consistent ordering of non-commutative writes is essential for convergence. Both models rely on application semantics to consistently order writes at all replicas. However, while Terry’s abstract model takes cognisance of the write order via the binary \({\sf WriteOrder}\) predicate, Quelea’s model (intriguingly) does not.
  • A Write in Terry’s model is an abstraction of multiple CUD operations, which appear to take place atomically. Hence, Terry’s Writes are in fact transactions with RC isolation level.

Session Guarantees

In Terry’s and Quelea’s perspective, a session is an abstraction of a sequence of Read and Write operations performed during the execution of an application. Note that operations from the same client session can be dispatched to different replicas; hence the result of a read operation in a session may not be consistent with the write operations performed previously in the same session. This could be very annoying to the end user. The purpose of the paper is to equip the abstract system model described above with additional guarantees that collectively ensure that the results of operations performed in a session will be consistent with the model of a single centralized server, possibly being read and updated concurrently by multiple clients.

Terry et al propose four session guarantees for their model, and, due to the close correspondence between the two models, they can also be (nearly) stated as specifications in Quelea’s spec language. Below, I present both versions (note: what Terry calls \({\sf RelevantWrites}\), Quelea calls \({\sf Visibility}\)):

  • Read Your Writes: If Read \(R\) follows Write \(W\) in a session and \(R\) is performed at server \(S\) at time \(t\), then \(W\) is included in \(DB(S,t)\).
    • \(\forall a,b.\, {\sf soo}(a,b) \Rightarrow {\sf vis}(a,b)\)
  • Monotonic Reads: If Read \(R_1\) occurs before \(R_2\) in a session and \(R_1\) accesses server \(S_1\) at time \(t_1\) and \(R_2\) accesses server \(S_2\) at time \(t_2\), then \({\sf RelevantWrites}(S_1,t_1,R_1)\) is a subset of \(DB(S_2,t_2)\).
    • \(\forall a,b,c.\, {\sf vis}(a,b) ~\wedge~ {\sf soo}(b,c) \Rightarrow {\sf vis}(a,c)\)
  • Writes Follow Reads: If Read \(R_1\) precedes Write \(W_2\) in a session and \(R_1\) is performed at server \(S_1\) at time \(t_1\), then, for any server \(S_2\), if \(W_2\) is in \(DB(S_2)\) then any \(W_1\) in \({\sf RelevantWrites}(S_1,t_1,R_1)\) is also in \(DB(S_2)\) and \({\sf WriteOrder}(W_1,W_2)\).
    • \(\forall a,b,c,d.\, {\sf vis}(a,b) ~\wedge~ {\sf soo}(b,c) ~\wedge~ {\sf vis}(c,d) \Rightarrow {\sf vis}(a,d) ~\wedge~ ?{\sf WriteOrder}(a,c) \).
  • Monotonic Writes: If Write \(W_1\) precedes Write \(W_2\) in a session, then, for any server \(S_2\), if \(W_2\) is in \(DB(S_2)\) then \(W_1\) is also in \(DB(S_2)\) and \({\sf WriteOrder}(W_1,W_2)\).
    • \(\forall a,b,c.\, {\sf soo}(a,b) ~\wedge~ {\sf vis}(b,c) \Rightarrow {\sf vis}(a,c) ~\wedge~ ?{\sf WriteOrder}(a,b)\).
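For finite executions, such specs are directly checkable if we represent session order (\({\sf soo}\)) and visibility (\({\sf vis}\)) as sets of event pairs. A sketch of checkers for the first two guarantees (illustrative Python, not Quelea's actual spec language or checker):

```python
# Events are opaque labels; soo and vis are sets of (a, b) pairs,
# read as soo(a, b) and vis(a, b) respectively.

def read_your_writes(soo, vis):
    # forall a, b:  soo(a, b) => vis(a, b)
    return all(p in vis for p in soo)

def monotonic_reads(soo, vis):
    # forall a, b, c:  vis(a, b) and soo(b, c) => vis(a, c)
    return all((a, c) in vis
               for (a, b) in vis
               for (b2, c) in soo if b == b2)
```

For example, if write w is visible to read r1 and r2 follows r1 in the same session, Monotonic Reads holds only when w is also visible to r2.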

Observe that Terry’s \({\sf WriteOrder}\) becomes \(?{\sf WriteOrder}\) in Quelea. The punctuation explicitly denotes the fact that Quelea is unaware of \({\sf WriteOrder}\). To make matters worse, Quelea does not expose the \({\sf happens-before}\) order to application semantics. Consequently, a straightforward port of a Bayou (Terry’s system) application that requires the WFR or MW session guarantee onto Quelea does not preserve application semantics.

Example: Alice tweets “The chicken laid an egg”. Bob sees Alice’s tweet and tweets “A chicken hatched out of the egg”. If GetTweets is tagged with the WFR spec in Quelea, then Cheryl, who sees Bob’s tweet, is also guaranteed to see Alice’s tweet. However, she cannot figure out the causal order among the tweets, making her wonder what came first: Alice’s chicken, or Bob’s egg.

How do we overcome the handicap described above? Some thoughts:

  1. The state of an application that admits non-commutative operations is not an unordered set of effects, but a set of effects partially ordered by the \({\sf happens-before}\) relation. Accordingly, Read operations in Quelea must reduce over a graph representing the poset, rather than a list representing a set.
  2. Is there a way to explicitly track causality by tagging effects with vector clocks? Since each component of the vector must represent the monotonically increasing clock local to a replica, and there is currently no way to know the current replica id in Quelea, this approach seems implausible at first. However, getCurrentReplicaId would be a simple extension to Quelea, so supporting this approach is easy. Unfortunately, explicitly tracking vector clocks is tedious and complicates application logic. Hence, the first approach may be preferable.
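
For the second approach, the vector clock machinery itself is standard. A sketch (assuming the hypothetical getCurrentReplicaId supplies the replica ids used as vector components):

```python
def vc_increment(vc, replica_id):
    """Return a copy of vector clock vc ticked at replica_id."""
    vc = dict(vc)
    vc[replica_id] = vc.get(replica_id, 0) + 1
    return vc

def vc_leq(a, b):
    # a <= b iff every component of a is dominated by b's.
    return all(n <= b.get(r, 0) for r, n in a.items())

def happens_before(a, b):
    # Strict causal order: a <= b componentwise, and a != b.
    return vc_leq(a, b) and a != b
```

In the chicken-and-egg example, Alice's tweet (issued at some replica r0) would carry {'r0': 1}, and Bob's tweet, issued at r1 after reading Alice's, would carry {'r0': 1, 'r1': 1}, letting Cheryl order the two; two tweets with incomparable clocks are genuinely concurrent.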

Oct 5, 2015 - Effing Package Management (FPM)

If you are ever into making Debian packages to distribute your software, check out this great package management tool called fpm by @jordansissel. FPM is a no-nonsense package manager that lets you create packages by simply specifying dependencies, and source and destination paths for your binaries, libraries and includes.

Example

Let us say you want to make a Debian package to distribute the first version of your hello-dev application, which provides a binary (hello), a dynamic library (libhello.so) and an include file (hello.h), all of which you want installed in the usual locations under /usr. Also, let’s assume that your application requires gcc version 4.6.3 or above. Creating a Debian package for your application is as easy as:

 fpm -s dir -t deb -n "hello-dev" -v 1.0 --description "Hello development tools" -d 'gcc > 4.6.3' ./hello=/usr/bin/hello ./libhello.so=/usr/lib/libhello.so ./hello.h=/usr/include/hello.h ./LICENSE=/usr/share/doc/hello-dev/copyright

Sep 30, 2015 - Understanding Transactions in Rails

In my previous post, I noted that Rails encourages application developers to rely on feral mechanisms, such as validations and associations, to ensure application integrity. In this post, I first explore various feral mechanisms in Rails, and how they are being used by some sample applications. Next, I throw some light on how Rails actually enforces these feral mechanisms.

ORM and Active Record

Quoting from Ruby on Rails guide:

Object Relational Mapping: Object-Relational Mapping, commonly referred to as its abbreviation ORM, is a technique that connects the rich objects of an application to tables in a relational database management system. Using ORM, the properties and relationships of the objects in an application can be easily stored and retrieved from a database without writing SQL statements directly and with less overall database access code.

Active Record as an ORM Framework: ActiveRecord is Rails’s implementation of ORM. Active Record gives us several mechanisms, the most important being the ability to:

  • Represent models and their data.
  • Represent associations between these models.
  • Represent inheritance hierarchies through related models.
  • Validate models before they get persisted to the database.
  • Perform database operations in an object-oriented fashion.

Active Record Models: A class that extends ActiveRecord is an Active Record model, or simply a model. Each model has a corresponding table in the underlying database. For instance, defining the Product model:

class Product < ActiveRecord::Base 
end

will create a table with the following schema:

CREATE TABLE products (
  id int(11) NOT NULL auto_increment,
  name varchar(255),
  PRIMARY KEY  (id)
);

CRUD: To write a record to the database, we first create an Active Record object (note: arg: denotes an argument named arg; think of Ruby methods as functions accepting named arguments):

product = Product.create(name: "Milk")

or

product = Product.new do |p|
  p.name = "Milk"
end

To save, ActiveRecord offers save method:

product.save

Reading can be done via multiple querying methods. Example:

all_prods = Product.all
milk = Product.where(name: "Milk").order(id: :asc)

The Active Record querying guide has more details.

Updates can be done via a read-modify-save sequence:

milks = Product.where(name: "Milk")
milks.each do |m| 
  m.name = "Whole Milk"
  m.save
end

Note: Active Record uses SQL INSERT to save new records, and SQL UPDATE for updates. It uses the new_record? flag to keep track of new Active Record objects that are not yet saved to the database (source: Active Record Callbacks guide).

Deletion is done using destroy:

milks = Product.where(name: "Milk")
milks.each do |m| 
  m.destroy
end

The corresponding records will be deleted from the database.

Associations

Associations are useful to keep track of semantic associations between various models. For instance, using a belongs_to association, the developer can let Rails know that every order in an eCommerce application belongs to at most one customer (note: the :name syntax denotes a symbol called name. Meta-level functions accept symbols and generate new code, which is then spliced into the original code (somewhat like MetaOCaml and Template Haskell). Ruby, being a dynamic language, does not treat meta-level functions specially.).

class Order < ActiveRecord::Base
  belongs_to :customer
end 

The result is that the order table will now have a column named customer_id that is expected to refer to the id of the customer table. Order objects will also have a field named customer referring to the Customer object to which the order belongs (note: the field can be nil). When an order object is saved, the id of its customer object is saved to the customer_id field of the corresponding record in the database.

The other end of belongs_to association is usually a has_many association or has_one association. Using a has_many association, the developer can specify that a customer can have one or many orders:

class Customer < ActiveRecord::Base
  has_many :orders
end

Other than adding a field named orders to a customer object, specifying the has_many association has no tangible effect; there is no change in Rails’s interaction with the order and customer tables. However, we can qualify the has_many association with an optional dependent argument set to :destroy, which asks Rails to destroy all orders placed by a customer when the customer is deleted.

class Customer < ActiveRecord::Base
  has_many :orders, dependent: :destroy
end

This behaviour corresponds to the ON DELETE CASCADE behaviour offered by SQL for foreign key dependents.

With a belongs_to association from Order to Customer, and a has_many association from Customer to Order with a dependent: :destroy qualification, we have something close to, but not exactly, a foreign key constraint from the order table to the customer table. To effectively enforce a foreign key constraint, we need a guarantee that an order cannot exist without the corresponding customer existing in the customer table. Currently, this invariant can be violated if, for example, we save an order object referring to a customer object whose corresponding record was already destroyed. The effective foreign key enforcement that prevents this scenario (theoretically, at least) can be achieved using model validations, described in the next section.

It is also possible to specify many-to-many relationships between models in a similar way to how such relationships are defined in relational databases: via an intermediary model. For example, in a microblogging application, a user can follow many users, and can have many users as followers. This many-to-many relationship between users can be specified via a third model called, say, relationships. Each user has_many following relationships and follower relationships. Each relationship belongs_to two users: the user who is being followed, and the user who is following. This scenario is described in the picture below (source: Michael Hartl’s book. Released under the MIT License):

[Figure: associations between the micropost, user, and relationship models]

Observe that there is a transitive has_many relationship between users, which can be made explicit via through: argument to has_many association: a user has_many followers through: relationships, and has_many followed_users through: relationships.

Besides belongs_to, has_many, and has_one, Rails defines a few other associations. They are described here.

Validations

Quoting from Rails guide on validations: Validations are used to ensure that only valid data is saved into your database. For example, it may be important to your application to ensure that every user provides a valid email address and mailing address. Model-level validations are the best way to ensure that only valid data is saved into your database.

Validations are primarily declared using the validates method. For example, to ensure that the user provides an email address during new account creation (i.e., the email field is not nil), and to ensure that there are no existing accounts associated with this email, we can perform presence and uniqueness validations, respectively.

class Account < ActiveRecord::Base
  validates :email, uniqueness: true, presence: true
end

Rails also provides variants of validates specialized for common validations:

class Account < ActiveRecord::Base
  validates_presence_of :email
  validates_uniqueness_of :email 
end

validates_uniqueness_of is clearly useful to ensure the primary key property. Besides being useful to prevent nil values, validates_presence_of is also useful to enforce foreign key constraints, when used in conjunction with associations. For instance, in the eCommerce example, the foreign key constraint from Order to Customer can be enforced by validating the presence of the customer field.

class Order < ActiveRecord::Base
  belongs_to :customer
  validates_presence_of :customer
end 
class Customer < ActiveRecord::Base
  has_many :orders, dependent: :destroy
end

Another useful validation is validates_associated. You should use this helper when your model has associations with other models that also need to be validated. When you try to save your object, valid? will be called on each of the associated objects. For instance, when the Customer class validates associated orders:

class Customer < ActiveRecord::Base
  has_many :orders, dependent: :destroy
  validates_associated :orders
end

All the orders associated with the customer are validated when the customer object is being saved (in this case, this entails a validation of the presence of the customer object owning the orders, which is trivially true).

validates_associated works with all of the association types. Although it is turned off by default for most associations, it is on by default for the has_many association. So, for the above example, we needn’t explicitly specify validates_associated.

Note: Ruby has a concept of virtual attributes, which, in the context of ActiveRecord are attributes that do not get recorded in the database. Validations can also be defined over virtual attributes.

Custom validations

Validations, such as validates_presence_of, are built into Rails. Sometimes, built-in validations are not sufficient. Rails allows developers to define custom validations for this purpose. Custom validations can be defined as methods used locally, or by implementing Rails’s ActiveModel::Validator interface. Examples of custom validators include Spree’s AvailabilityValidator, which checks whether an eCommerce inventory has sufficient stock available to fulfill an order. More on custom validations here.

When are validations enforced?

Validations are enforced when an Active Record object is being persisted to the database (via save), although this is not entirely accurate:

Validations are typically run before these commands (INSERT/UPDATE) are sent to the database. If any validations fail, the object will be marked as invalid and Active Record will not perform the INSERT or UPDATE operation. This helps to avoid storing an invalid object in the database. You can choose to have specific validations run when an object is created, saved, or updated.

There are many ways to change the state of an object in the database. Some methods will trigger validations, but some will not. This means that it’s possible to save an object in the database in an invalid state if you aren’t careful. - ActiveRecord validations callbacks guide

Validations and any callbacks registered on the state changes of the model object are queued for execution. This queue includes all of the model’s validations, the registered callbacks, and the database operation to be executed. The whole callback chain is wrapped in a transaction. If any validation/callback method returns false or raises an exception, the execution chain is halted and a ROLLBACK is issued (source).

Sample Applications

We will now examine some sample applications to understand how validations and associations are being used.

Microblog

The first is based on a small microblogging application from the Ruby on Rails tutorial by Michael Hartl.

The app defines 3 models - micropost, user, and relationship - with the following associations and validations:

  • Micropost belongs_to a user, and validates user’s presence.
  • A user has an email address; validates its presence and uniqueness.
  • A user has_many follower relationships, and through: those relationships, has_many followers. Follower relationships need to be destroyed if this user is deleted.
  • A user has_many following relationships, and through: those relationships, has_many followed users. Following relationships need to be destroyed if this user is deleted.
  • A relationship jointly belongs_to a follower user and a followed user, and validates their presence.

Following are some interesting operations that the app defines:

Adding a User: Adds a user after validating that the email is present and unique.

addUser(u) = transaction do
  assert(u.email != nil);
  dups := SQL "SELECT * FROM users WHERE email = `u.email` LIMIT 1"
  assert (dups == []);
  SQL "INSERT INTO users VALUES (`freshId()`,`u.email`,`u.name`)";
  followUser(u, u); /* One must follow oneself */

Follow a User: Makes the current user follow another user.

followUser(thisUser,thatUser) = transaction do
  usr1 := SQL "SELECT * FROM users WHERE id = `thisUser.id` LIMIT 1";
  assert (usr1 != []);
  usr2 := SQL "SELECT * FROM users WHERE id = `thatUser.id` LIMIT 1";
  assert (usr2 != []);
  SQL "INSERT INTO Relationships VALUES (`thisUser.id`,`thatUser.id`)";

Unfollow a User: Makes the current user unfollow other user.

unfollowUser(thisUser,thatUser) = transaction do
  SQL "DELETE FROM Relationships WHERE follower_id = `thisUser.id`
                                 AND followed_id = `thatUser.id`";

getFollowers: Returns the list of users following the current user.

/* Implementation 1 : from the original sample app */
getFollowers(thisUser) = 
  SQL "SELECT * FROM users INNER JOIN relationships ON
      users.id = relationships.follower_id WHERE
      relationships.followed_id = `thisUser.id`"

/* Implementation 2: If the data store does not support joins
            (e.g: Cassandra) */
getFollowers(thisUser) = transaction do
  rels := SQL "SELECT * FROM relationships WHERE 
                  followed_id = `thisUser.id`";
  followers := [];
  rels.each |rel| do
    /* The following pattern-match must always pass because:
        1. Relationship model validates presence of follower
           before persisting.
        2. When a user is deleted, all the dependent
           relationships are deleted as well.
        3. User model validates uniqueness when persisting. */
    [follower] = SQL "SELECT * FROM users WHERE 
                    id = `rel.follower_id`";
    followers := follower::followers;
  return followers;

getFollowedUsers: Returns the list of users that the current user is following. Implementation similar to getFollowers described above.

postBlog: Posts a microblog on behalf of the current user. Validates the presence of the current user’s record before persisting the microblog.

postBlog(thisUser,blog) = transaction do
  usr = SQL "SELECT * FROM users WHERE id = `thisUser.id` LIMIT 1"; 
  assert(usr != []);
  SQL "INSERT INTO microposts (content, user_id) 
       VALUES (`blog.content`, `thisUser.id`)";

getTimeLine: Get a list of microposts by a user (the call is: Micropost.includes(:user).from_users_followed_by(user)).

getTimeLine(user) = transaction do
  posts := SQL "SELECT * FROM microposts WHERE 
                  user_id = `user.id`";
  posts.each |post| do
    post.user := (SQL "SELECT * FROM users WHERE 
                  id = `post.user_id` LIMIT 1").first;
  return posts;

The SQL call to the users table never returns an empty collection because:

  • A micropost belongs_to a user, and checks the presence of the user before persisting, and
  • When a user is deleted, the dependent microposts are also deleted.

Therefore, .first on the collection returned by SQL is always valid.

getFeed: Get a list of microposts by users being followed by the current user.

getFeed(user) = transaction do
  followed_ids := SQL "SELECT followed_id FROM relationships
                        WHERE follower_id = `user.id`";
  uids_of_interest := user.id :: followed_ids ;
  posts := SQL "SELECT * FROM microposts WHERE 
                  user_id IN `uids_of_interest`";
  posts.each |post| do
    post.user := (SQL "SELECT * FROM users WHERE 
                  id = `post.user_id` LIMIT 1").first;
  return posts;

deleteUser: Delete the current user. It has to enforce dependent: :destroy on microposts and relationships.

deleteUser(thisUser) = transaction do
  /* First, delete all microposts by this user */
  SQL "DELETE FROM microposts WHERE user_id = `thisUser.id`";
  /* Next, delete all relationships on this user */
  SQL "DELETE FROM relationships WHERE 
    follower_id = `thisUser.id` OR
    followed_id = `thisUser.id`";
  /* Finally, delete the user */
  SQL "DELETE FROM users WHERE id = `thisUser.id`";
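A runnable sketch of the cascading deletes, assuming a SQLite backend and a hypothetical minimal schema. Wrapping the three deletes in one transaction enforces dependent: :destroy atomically: no observer sees a micropost or relationship that refers to a half-deleted user:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE microposts (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE relationships (follower_id INTEGER,
                                followed_id INTEGER);
    INSERT INTO users VALUES (1), (2);
    INSERT INTO microposts VALUES (10, 1);
    INSERT INTO relationships VALUES (1, 2), (2, 1);
""")

def delete_user(conn, user_id):
    with conn:  # one atomic transaction for all three deletes
        conn.execute("DELETE FROM microposts WHERE user_id = ?",
                     (user_id,))
        conn.execute("DELETE FROM relationships WHERE follower_id = ? "
                     "OR followed_id = ?", (user_id, user_id))
        conn.execute("DELETE FROM users WHERE id = ?", (user_id,))

delete_user(conn, 1)
```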

Sep 28, 2015 - Understanding Transactions in Quelea

Quelea is our eventually consistent data store with an associated programming framework intended to simplify programming under eventual consistency. In this post, I describe how various applications written in Quelea employ a combination of highly available and serializable transactions to enforce application integrity. Three applications participate in this survey:

  • BankAccount: A highly available concurrent bank account application.
  • Microblog: A twitter-like microblogging site, modeled after Twissandra. The application allows adding new users, adding and replying to tweets, following, unfollowing and blocking users, and fetching a user’s timeline, userline, followers and following.
  • Rubis: An eBay-like auction site. The application allows users to browse items, bid for items on sale, and pay for items from a wallet modeled after a bank account.

First let me define what I mean by a transaction:

What is a transaction?

For all practical purposes, a transaction can be defined as a set of operations on some data which, when executed, appear to have been executed simultaneously. We say that the set of operations has taken effect atomically. Atomicity implies that the set of effects generated by a transaction T is visible to operations in another transaction T' either in its entirety, or not at all. From the perspective of the other transaction T', an operation op' in T' either sees all the effects of transaction T, or it sees none of them. Note that this is the requirement of the Read Committed (RC) isolation level. Hence, atomicity being a defining property of a transaction means that every transaction automatically experiences the RC isolation level. Note that a set of effects that offers atomicity, but with isolation properties stronger than RC, is also a transaction.

Transactions are used in various applications towards different ends. Below, I describe some of the common purposes served by transactions in some sample applications:

To perform atomic updates

Transactions are primarily used in practice when we want to perform a series of actions on the data that take it through intermediary states in which the integrity of the data may be violated. In order not to expose these intermediary states, we wrap these actions inside a transaction so that, to observers, all actions appear to have been committed atomically. Typically, these actions perform updates on multiple tables or materialized views, or on multiple rows in the same table. Some usecases:

  1. BankAccount:
    • When a user saves money, the withdraw operation on the checking account and the deposit operation on the savings account should appear to have happened atomically. The intermediary state (after withdraw, but before deposit) may violate app-specific integrity constraints (e.g., the user is shown an incorrect total balance, or is incorrectly penalized for insufficient balance).
  2. Microblog (Twitter):
    • When a user (B) unfollows another user (A), B has to be removed from A’s follower list, and A has to be removed from B’s following list. Both updates should appear to have happened atomically. The intermediate state violates the app-specific integrity constraint that follows is a symmetric relation.
    • When a user tweets, the tweet should be added to the table of all tweets, and its id should be added to the userline materialized view against the tweeter’s userId, and to the timeline materialized view against the userIds of all the followers. All insertions should appear to have happened simultaneously. Intermediary states may violate (a) app-specific integrity constraints (e.g., user B sees a tweet by A in his timeline, but doesn’t find the tweet in A’s userline), and (b) referential integrity (happens if the data store can reorder operations, as in the case of EC stores).
    • When a user (A) blocks user (B), B should be forced to unfollow A (the unfollow operation needs to be performed in the same way as above). Furthermore, to prevent B from re-following A, B needs to be added to A’s Blocks list, and A needs to be added to B’s IsBlockedBy list, both in the user table itself. All changes must commit atomically.
  3. Rubis (Auction site):
    • When a user (A) bids for an item (I), the following updates need to happen atomically:
      • The bid information needs to be added to the Bids table against a new bidId.
      • bidId needs to be added against I’s itemId in the ItemBids materialized view.
      • bidId needs to be added against A’s userId in the UserBids materialized view.
      • I’s maxBid needs to be updated against I’s itemId in the Item table.
      The intermediate states may violate (a) app-specific integrity constraints (e.g., the bid appears as the top bid on the item page, but doesn’t appear in the list of bids by the bidder), and (b) referential integrity (happens if the data store can reorder operations, as in the case of EC stores).
    • When a user cancels his bid, all the above insertions need to be deleted atomically. Intermediate states may violate referential integrity (under reordering of operations).
    • When a user (A) offers an item (I) for auction, I needs to be added to the Items table, and its itemId against A’s userId in the UserItems table/materialized view, simultaneously.
    • When the auction concludes, the above insertions need to be deleted atomically. Intermediate states may violate referential integrity (under reordering of operations).
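The Microblog unfollow case above can be sketched as runnable Python, assuming a SQLite backend; the followers/following tables and their columns are hypothetical. Both deletes commit atomically, so no observer sees a state where only one of the two lists has been updated:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE followers (user_id INTEGER, follower_id INTEGER);
    CREATE TABLE following (user_id INTEGER, followed_id INTEGER);
    INSERT INTO followers VALUES (1, 2);  -- B (id 2) follows A (id 1)
    INSERT INTO following VALUES (2, 1);
""")

def unfollow(conn, follower, followed):
    with conn:  # one transaction for both mutations
        conn.execute("DELETE FROM followers WHERE user_id = ? "
                     "AND follower_id = ?", (followed, follower))
        conn.execute("DELETE FROM following WHERE user_id = ? "
                     "AND followed_id = ?", (follower, followed))

unfollow(conn, follower=2, followed=1)
```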

To ensure consistent reads

Atomicity only guarantees that a transaction’s effects, if visible, are visible in their entirety. It does not prevent the effects from becoming visible to only a subset of operations in a different transaction. Therefore, in many cases, atomicity of a write transaction by itself is not sufficient to ensure that all reads witness a consistent version of the data; we need certain isolation guarantees on reads. Applications use transactions to achieve isolation. Usecases:

  1. BankAccount:
    • When a user (A) saves money (an atomic transfer from checking to savings account) and immediately checks the total balance in her accounts by issuing two getBalance operations, one on each of her accounts, she might see an inconsistent balance. This can happen if the first getBalance witnesses the effects of the save transaction but the second does not, or vice versa. To prevent this from happening, both getBalance operations should be wrapped inside a transaction, which needs to be executed in isolation with respect to a consistent snapshot of the database.
  2. Microblog:
    • A read operation on user A’s followers list might tell us that user B follows A, but a read of user B’s following list might return an inconsistent result. This happens if the first read witnessed the followUser transaction whereas the second read did not. This situation can be prevented by wrapping both reads in a transaction and insisting that this transaction be executed in isolation with respect to a consistent version of the database.
    • When retrieving a user (A)’s profile using a username, we perform two reads - one to fetch the user’s uuid from her username, and the other to fetch the profile details indexed by uuid. When user A tries to view her profile immediately after registering, the first read may succeed but the second read may fail due to the absence of the relevant record. This happens if the first read witnesses the effects of the addUser transaction, whereas the second read does not. This situation can also be avoided by wrapping both reads in a transaction and running it under an appropriate isolation level.
  3. Rubis:
    • When displaying the list of all bids on an item, we might encounter an instance of referential integrity violation, although none actually exists in the data. This can happen if a read operation on the ItemBids table reads the latest version of the table, whereas the subsequent read on the Bids table reads an earlier version, which may not contain certain bids. The fix for this case is the same as above.
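The BankAccount case can be sketched as runnable Python, assuming a SQLite backend and a hypothetical accounts schema. Wrapping both SELECTs in one transaction makes them read from a single consistent snapshot, so a concurrent save (checking-to-savings transfer) is either fully visible to both reads or visible to neither:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit
conn.executescript("""
    CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER);
    INSERT INTO accounts VALUES ('checking', 60), ('savings', 40);
""")

def total_balance(conn):
    # explicit BEGIN pins one consistent snapshot for both reads
    conn.execute("BEGIN")
    try:
        checking = conn.execute(
            "SELECT balance FROM accounts WHERE id = 'checking'"
        ).fetchone()[0]
        savings = conn.execute(
            "SELECT balance FROM accounts WHERE id = 'savings'"
        ).fetchone()[0]
    finally:
        conn.execute("COMMIT")
    return checking + savings

print(total_balance(conn))  # 100
```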

To perform consistent (integrity-respecting) updates

Making atomic updates to the data in complete isolation can still leave the data in an invalid state. This happens when multiple such atomic updates succeed independently, but merging the resulting states produces an inconsistent state. Applications use serializable transactions to preempt the possibility of a concurrent conflicting update. Usecases:

  1. BankAccount:
    • A transfer transaction from account A to account B has to be serializable with respect to all other transfer transactions from account A to guarantee the application invariant of non-zero balance.
  2. Microblog:
    • When deactivating the account of a user (A), her userId has to be removed from the following list of all her followers, and subsequently from the user table. All operations should appear to have happened simultaneously, so they have to be wrapped in a transaction. Furthermore, to prevent violation of referential integrity, the transaction has to be serializable with respect to all addFollower transactions adding followers to A.
  3. Rubis:
    • The concludeAuction and cancelBid transactions can both independently succeed, possibly resulting in a state where a bid is simultaneously a winning bid and a canceled bid. To avoid this inconsistency, the cancelBid transaction needs to be serializable with the concludeAuction transaction.
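The BankAccount transfer can be sketched as runnable Python, assuming a SQLite backend and a hypothetical accounts schema; SQLite's BEGIN IMMEDIATE acquires the write lock up front and so stands in for a serializable transfer here. Two concurrent transfers from 'A' cannot both read the old balance and jointly overdraw the account:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit
conn.executescript("""
    CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER);
    INSERT INTO accounts VALUES ('A', 100), ('B', 0);
""")

def transfer(conn, src, dst, amount):
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    try:
        bal = conn.execute("SELECT balance FROM accounts WHERE id = ?",
                           (src,)).fetchone()[0]
        if bal < amount:
            raise ValueError("insufficient balance")
        conn.execute("UPDATE accounts SET balance = balance - ? "
                     "WHERE id = ?", (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? "
                     "WHERE id = ?", (amount, dst))
        conn.execute("COMMIT")
    except BaseException:
        conn.execute("ROLLBACK")  # abort leaves balances untouched
        raise

transfer(conn, 'A', 'B', 70)  # succeeds: A = 30, B = 70
```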

Notes

  • Circular referencing.
  • Let a transaction T contain an SC operation. If a transaction T' requests RR isolation w.r.t T, then T and T' are automatically serializable.

Sep 24, 2015 - Notes - Feral Concurrency Control


This post is a compilation of my notes on Peter Bailis et al’s SIGMOD’15 paper: Feral Concurrency Control: An Empirical Investigation of Modern Application Integrity.

Background

Modern relational DBMSs offer a range of primitives to help the developer ensure application integrity, even in the presence of concurrency: built-in integrity constraints (e.g: UNIQUE, NOT NULL, FOREIGN KEY), triggers, stored procedures to implement application-specific integrity constraints, optimistic/pessimistic locking of rows, serializable transactions etc. However, modern web application frameworks (e.g: Ruby on Rails), which promote ORM-based programming, refrain from using most of the primitives provided by the DB layer, instead choosing to reimplement them in the application layer.

Consider Ruby on Rails, for instance, which is an MVC web framework. Ruby on Rails lets application developers specify their data model (M in MVC) as a set of classes that extend Rails’ native ActiveRecord class. The data model can also contain application integrity constraints in the form of validations and associations. For example, presence and uniqueness are a couple of validations, and belongs_to and has_many are a couple of associations supported natively by Rails. Rails implements object-relational mapping (ORM) by automatically mapping the classes (resp. objects) that constitute the data model (resp. data) to relations (resp. tuples) in the underlying DB, provided that the application code adheres to some pre-defined naming conventions.

However, validations and associations are not mapped to corresponding integrity constraints at the DB level. For instance, the uniqueness validation is not mapped to MySQL’s UNIQUE primitive, but is instead implemented by Rails by performing a SELECT query and making sure that the record being inserted is not already present (all this is done within a transaction). Likewise, instead of mapping the belongs_to association to the FOREIGN KEY constraint native to the DB, Rails implements it by running a SELECT WHERE query to check that the key being referred to is indeed present.

Eschewing native database integrity and concurrency control solutions, Rails has developed a set of primitives for handling application integrity at the application level itself, thereby building, from the perspective of the underlying DB, a feral concurrency control system. The motivation behind this apparent irrationality is to enhance maintainability of the system, and to facilitate testing. Let me elaborate:

  • Maintainability: The set of available integrity and concurrency control mechanisms at the database layer depends on the data model employed by the underlying database, and also, in some cases, on the vendor. For instance, while the relational data model supports foreign key constraints as a means to implement referential integrity, the data models employed by weakly consistent data stores do not. Even among relational data stores, some (e.g: PostgreSQL) support foreign key constraints, while others (e.g: MySQL’s MyISAM and NDB storage engines) do not. Likewise, the CHECK constraint used to enforce domain-specific integrity constraints, such as bal ≥ 0, is supported by PostgreSQL, but not by MySQL. Furthermore, besides standard validations and associations, such as uniqueness and foreign key constraints, Rails allows developers to define application-specific validations and associations, for which the corresponding native primitives may not exist, and even if they exist, it is not clear how to do the mapping. To make matters worse, DB systems usually silently ignore any constraint that they do not support. Therefore, an application that relies on the DB’s native constraints to enforce integrity risks correctness, is not portable, and is consequently difficult to maintain.
  • Testing: The database’s referential integrity constraints “get in the way of” testing by insisting that test data be inserted into, and deleted from, the database in a specific order such that integrity is never violated. In many cases, developers want to simply dump sample data (known to be consistent) into the database and get along with testing their application. Requiring that such data still be inserted/deleted respecting referential integrity might be overkill.

The aforementioned reasons, along with some personal convictions of its creators, have motivated Rails to eschew concurrency and integrity control mechanisms at the database layer, and to view it simply as table storage. This approach has been hugely successful for Rails, as is evident from its large-scale adoption.

Replacing database-native primitives with feral mechanisms may improve the maintainability of the application, but does it really work? Are feral invariants correctly enforced in Rails? Do they work in practice? This paper performs theoretical analysis and empirical studies to answer these questions.

How prevalent are feral mechanisms in practice?

They are quite prevalent. Analyzing a corpus of 67 popular open-source web applications, the authors found that applications, on average, used just 3.8 transactions against 52.4 validations and 92.8 associations (the median could have been a better metric). The overwhelming number of associations relative to transactions indicates that Rails developers use associations to perform referential-integrity-preserving insertions into multiple tables, which would otherwise be performed in a transaction.

Rails Concurrency Control Mechanisms

Implementing validations (e.g: uniqueness) and associations (e.g: belongs_to) in the application layer is trivially safe in a sequential setting. But in the presence of concurrent client requests attempting to modify the database simultaneously, some concurrency control mechanisms are needed to ensure that validations and associations are indeed enforced correctly. Otherwise, data integrity is jeopardized. Towards this end, Rails does the following:

  • When a model’s state is being updated (e.g., during an insertion), Rails runs all the validations on the model sequentially, in a (database-backed) transaction. The rationale is to rely on the ACID guarantees of the transaction to ensure the correctness of validation. For instance, the uniqueness validation (issuing a SELECT query to the database) is performed within a transaction so as to preempt concurrent insertions that may result in duplicates.
  • Associations are validated in a similar way: by enclosing the corresponding validations inside a transaction.

If transactions are serializable as expected, then validations are indeed safe. However, databases do not offer serializable transactions by default (the default isolation level in PostgreSQL is RC; in MySQL (InnoDB), it is RR), and, in some cases, they do not offer serializability at all. Given the possibility of concurrent transactions under weaker isolation levels (e.g., RC ensures that visible transactions are entirely visible; it doesn’t guarantee a total order among transactions), Rails validations may not really ensure the validity of data. Examples:

  • Concurrent uniqueness validations can independently succeed, thus allowing concurrent duplicate insertions.
  • Association (e.g: foreign key) checking in one transaction, and record deletion in a concurrent transaction can independently succeed, thus compromising referential integrity.
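The first anomaly can be replayed deterministically in a single process. The sketch below (Python/sqlite3, with a hypothetical users table left unconstrained, as Rails leaves it) interleaves two Rails-style check-then-insert validations so that both checks precede both inserts, an ordering that weak isolation levels permit:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT)")  # no UNIQUE constraint

def is_unique(email):
    # the feral validation: a SELECT that checks for absence
    return conn.execute("SELECT 1 FROM users WHERE email = ?",
                        (email,)).fetchone() is None

# Both validations pass against the same old state...
ok1 = is_unique("a@x.com")
ok2 = is_unique("a@x.com")

# ...so both clients proceed to insert, creating a duplicate.
for ok in (ok1, ok2):
    if ok:
        conn.execute("INSERT INTO users (email) VALUES (?)",
                     ("a@x.com",))

dupes = conn.execute(
    "SELECT COUNT(*) FROM users WHERE email = 'a@x.com'").fetchone()[0]
print(dupes)  # 2
```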

Rails acknowledges these anomalies. The documentation warns that the uniqueness validation can still occasionally admit duplicates, and that association validation can occasionally result in violation of referential integrity.

How likely are these anomalies in practice?

Not very likely. For instance, in the LinkBench workload capturing Facebook’s MySQL record access pattern, with 64 concurrent clients issuing 100 insert requests per second on 1000 different keys, an average of 10 duplicate records were observed with the uniqueness validation turned on. In a less adversarial production workload, we may get far fewer duplicates, or perhaps none at all. Therefore, it may be argued that validations are “good enough” for many web applications, where correctness is not a top priority.

However, concurrency is not the only source of incorrect application behaviour. Quite often, anomalies might also result from an incorrect implementation of a validation/association in the Rails framework, or from some non-trivial interactions between various application-level validations that Rails developers haven’t foreseen. For instance, in Rails, the delete_all operation, unlike the delete operation, doesn’t create objects for the rows it is deleting, thus failing to perform validations on the data being deleted (note: validations are performed only when a data model object is updated). This could result in the violation of referential integrity, and in HTTP error messages being shown to the user, as in the case of thoughtbot, a popular blogging platform. Thoughtbot has since started relying on PostgreSQL’s foreign key constraints to enforce referential integrity. Similar experiences have prompted a few other Rails users to start a forum called Rails devs for data integrity (ref) that advocates strengthening Rails validations with database-backed checks.

What can be done about the anomalies?

Most databases natively support the UNIQUE constraint, which ensures the absence of duplicates at no extra expense. It is therefore a shame if one is forced to choose an incorrect implementation over a similarly performing correct one merely for software engineering reasons. Can something be done about this?
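As a sketch of the database-native alternative (Python/sqlite3, hypothetical users table): with UNIQUE declared on the column, the database itself rejects the duplicate, regardless of the isolation level the application's transactions run under:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT UNIQUE)")
conn.execute("INSERT INTO users (email) VALUES ('a@x.com')")

rejected = False
try:
    # the second insert of the same email fails atomically
    conn.execute("INSERT INTO users (email) VALUES ('a@x.com')")
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
```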

One solution is to insist that all validations be done inside serializable transactions (i.e., choose serializability instead of the default RC or RR isolation level at the database layer). This trivially ensures the correctness of all feral implementations. Unfortunately, serializability comes at the cost of availability, which is more important in the context of web applications. Moreover, the study finds that 75% of application invariants do not need serializability for correct enforcement. Imposing serializability by default is therefore unjustified.

The paper concludes that there is currently no database-backed solution that “respects and assists with application programmer desires for a clean, idiomatic means of expressing correctness criteria in domain logic”. The authors believe that “there is an opportunity and pressing need to build systems that provide all three criteria: performance, correctness, and programmability.” To domesticate feral mechanisms, the authors argue, application users and framework authors need a new database interface that will enable them to:

  1. Express correctness criteria in the language of their domain model, with minimal friction, while permitting their automatic enforcement. Any solution to domestication must respect ORM application patterns and programming style, including the ability to specify invariants in each framework’s native language. An ideal solution for domestication should provide universal support to applications’ feral invariants with no additional overhead for application developers.
  2. Only pay the price of coordination when necessary. As already discussed, serializable transactions should be used only when needed.
  3. Easily deploy the framework/application across multiple database backends. For example, the interface must allow Rails to talk to a relational store, as well as a key-value store.

Notes

“Rails can’t be trusted to maintain referential integrity, but you know what’s really good at doing that? Our relational database.” - thoughtbot

“Even when both systems are configured to one of the strict levels of transaction locking, the differences between the two implementations are subtle enough that which implementation will work better for a particular application is hard to state definitively.” - PostgreSQL wiki


Jul 31, 2015 - Atomicity vs Isolation

From the perspective of a transaction,

  • Isolation: how I see the effects of other transactions.
  • Atomicity: how other transactions see my effects.