Nov 25, 2019 - Vagaries of Git Merge

Git’s merge algorithm seems to have inexplicable semantics leading to some interesting cases. I describe a couple of examples below. The bottom line is that:

  1. Git merge is inconsistent: Merge result seems to depend not just on what versions were merged, but also how they were merged. Thus two branches that pulled the same set of user commits may end up with different versions.
  2. Git merge is unintuitive: The result of a successful (auto) merge may not necessarily be what we expect.

Note that I distinguish between user commits and merge commits in this post. Assuming that all merges are automerges (i.e., there are no merge conflicts), I expect two branches with same history of user commits to be consistent, i.e., have the same final version.

Example 1

This example shows that Git automerging leads to inconsistent results. Thus if you and your colleague are working on parallel branches with exact same set of user commits, your code may still have vulnerabilities and bugs that your colleague’s code doesn’t. So you have to independently review your code notwithstanding your colleague’s assurances.

To demonstrate this point, let’s start with a file foo.js containing Javascript/JQuery pseudo-code:

/*some computation on data happens here */

  type: "POST",
  url: url,
  data: encrypt(data),
  success: function(response) {
  dataType: "text"

Let’s commit it to master:

git init; git add foo.js; git commit -m "foo.js added"

Now fork two branches, temp1 and temp2, from master:

git checkout -b temp1; git checkout master
git checkout -b temp2; git checkout master

With three branches in place, let us manifest the version control graph shown in the below figure. Vertices in the graph are distinct versions. The diff of each version w.r.t its parent is highlighted. Edges leading to a version are labeled with the Git actions that led to the version. In case of a git commit, only the commit message is shown (in quotes). The actions are numbered, and each action is explained in greater detail below the figure. Each of the three branches is assigned a different color background.

Now take a look at the version graph, and follow the explanation below the figure to understand it.


Actions 1 and 2 fork branches temp1 and temp2 as described previously. Actions 3-9 do the following:

  • Action 3: On the master branch, we might decide to remove encryption as data contains nothing sensitive.
  • Action 4: Your colleague Alice starts development on temp2 by fast-forwarding it to current master (your latest version).
  • Action 5: Independently on temp1 branch, your other colleague Bob decides to add sensitive information (password) to data since it is anyway sent encrypted.
  • Action 6: You pull the latest changes from Bob. Git automerges, and says everything is fine. You don’t do anything.
  • Action 7: On temp1, Bob may decide to commit a version that sends the data twice, for whatever reason.
  • Action 8: Alice pulls Bob’s latest version from temp1 onto temp2. She decides to review the code and sees two AJAX POSTS - first without data encryption and second with encryption. Since password is only sent the second time, she finds nothing wrong with the code, hence signs it off.
  • Action 9: Now you pull Bob’s latest version from temp1. You want to review the code, but learn that Alice has already signed off her code on temp2 which has the same history of user commits as yours (see below). Relying on Alice’s judgment, you sign off your code and deploy in production.
  • Your code now sends passwords unencrypted although Alice’s doesn’t.

You can see the this example on Github with few minor changes: I used branch main instead of master, and sha256 instead of encrypt.

The final commit histories of main and temp2 are shown below. Observe that both branches have same set of user-generated commits (3e3256, 31001b, 73a645, and 8004f4). The merge commits are all result of Git’s automerge.

$git branch                    
* temp2                                             
$git log                       
commit 291329350e5ede0163ebdf9cb31ab0ad207ebb18 (HEAD -> temp2)
Merge: 31001b0 8004f40                              
Author: GitUser <>             
Date:   Mon Nov 25 07:25:45 2019 -0500                          
    Merge branch 'temp1' into temp2                                         
commit 8004f40b718538d53458f2218b52f81c8c7e43a1 (temp1)
Author: GitUser <>                                                                  
Date:   Sun Nov 24 20:06:16 2019 -0500                                                                   
    two ajaxes now                                  
commit 73a645e9803b74dd9eac212ec706eb01fa328c5e                                                          
Author: GitUser <>                                              
Date:   Sun Nov 24 20:03:01 2019 -0500                  
    pwd added                                       
commit 31001b08380cad566d83b14ec1adecbb2836e399                                                          
Author: GitUser <>                                                                  
Date:   Sun Nov 24 19:52:49 2019 -0500              
    removed sha256                                  
commit 3e325647d435895b8b1f526fbde9bd9208a555e7 (master)
Author: GitUser <>             
Date:   Sun Nov 24 19:50:52 2019 -0500              
    adding foo.js 

$git checkout main                                                                  
Switched to branch 'main'                           
$git log                                                                            
commit a52b2517b747d7804fe02ae5bb7425ea57b5908f (HEAD -> main)                                           
Merge: b2c3403 8004f40                                                                                   
Author: GitUser <>             
Date:   Sun Nov 24 20:06:29 2019 -0500              
    Merge branch 'temp1' into main                                                                       
commit 8004f40b718538d53458f2218b52f81c8c7e43a1 (temp1)                                                  
Author: GitUser <>                                                                  
Date:   Sun Nov 24 20:06:16 2019 -0500              
    two ajaxes now                                                                                       
commit b2c3403b4254bc01553871d06526c438c53f3952     
Merge: 31001b0 73a645e                                                                                   
Author: GitUser <>             
Date:   Sun Nov 24 20:03:23 2019 -0500                                                                   
    Merge branch 'temp1' into main                  
commit 73a645e9803b74dd9eac212ec706eb01fa328c5e                                                          
Author: GitUser <>                                                                  
Date:   Sun Nov 24 20:03:01 2019 -0500                                                                   
    pwd added                                          
commit 31001b08380cad566d83b14ec1adecbb2836e399                                                          
Author: GitUser <>             
Date:   Sun Nov 24 19:52:49 2019 -0500              
    removed sha256                                                                                       
commit 3e325647d435895b8b1f526fbde9bd9208a555e7 (master)
Author: GitUser <>
Date:   Sun Nov 24 19:50:52 2019 -0500

    adding foo.js

Example 2

In the previous example, branches main and temp2 obtained the same set of user commits, but with different number of pulls/merges. main pulled from temp1 in two steps – first merging the intermediate version, and then merging the final version, whereas temp2 pulled only once – the final version. The inconsistency shows that the granularity of merges matter. I will now describe another example, which demonstrates that the merge result also depends on the order of merging branches. That is, two branches pull the same set of user commits, but still end up with different versions because they pull the commits in different orders. The example also involves a scenario where automerge succeeds, but result doesn’t agree with our intuition.

Similar to the previous example, let’s start with a file foo.js containing Javascript/JQuery pseudo-code:

/*some computation on data */

  type: "POST",
  data: encrypt(data),
  url: url,
  success: function(response) {
  dataType: "text"

Let’s commit it to master:

git init; git add foo.js; git commit -m "foo.js added"

Fork four branches, a, b, c, and d:

git checkout -b a; git checkout master
git checkout -b b; git checkout master
git checkout -b c; git checkout master
git checkout -b d; git checkout master

On each of the four branches, we make distinct changes to foo.js and commit. The four new commits are visualized in the figure below.


Let us now fork one more branch, temp, from master:

git checkout master; git checkout -b temp

We will now merge branches a, b, c, and d into master and temp in different orders and show that the resultant versions are different.

  • On master, the merge order is a → c → b → d
  • On temp, the merge order is d → c → b → a

Merges on master branch are visualized in the following figure. Each labeled edge represents a merge from the corresponding branch (branch heads are shown in the previous figure). As usual, the diff of each version w.r.t its previous version is highlighted.


As an aside, consider the merge from branch b. I would have expected the merge result to be a sequence of encrypted AJAX, followed by data + pwd logic, and finally unencrypted AJAX. A merge conflict would have been fine too! But the actual result has data + pwd logic twice interspersed by AJAX calls with unencrypted and encrypted data (resp.). This is puzzling.

Let us now focus on the temp branch. The merges on temp are visualized below:


Here again, the result of last merge (from a) is not what I expected. Given that the contents of a are contained inside the the merging temp version in the same order, I expected the merge to be the same as the previous version on temp. The actual result replicates data + pwd logic twice, and I don’t understand why.

Leaving aside the concerns of intuitiveness, one can clearly see that that master and temp have different head versions even as both pulled the exact same versions albeit in different order. This shows that order in which merges are done does matter, which is a bit disconcerting.

This example too is available on Github with minor variations.


The examples above demonstrate that Git merge could lead to inconsistent and unintuitive results. People who understand Git’s merge algorithm can perhaps foresee such behaviors and work their way around them, however that’s not a satisfactory justification for a well-designed system must have predictable behavior based on its interface, not its internals.

PS: If you know a definitive and concise description of Git’s merge algorithm, I’d be grateful if you can drop me a note here in comments or on twitter

Continue reading

Sep 25, 2017 - First-Class Modules and Modular Implicits in OCaml

Note: If you are reading this on, please read it on my blog, instead; it’s much nicer.

I was pleasantly surprised to discover that OCaml has been supporting modules as first-class objects since v3.12 (2011). Intuition suggests that first-class modules should be expressive enough to simulate Haskell-style typeclasses in OCaml. Turns out this is the route taken by Leo White et al to introduce adhoc polymorphism via Modular Implicits to OCaml. White et al’s paper describes their principled approach in detail, and is definitely worth a read. In this blog, I present a series of OCaml examples involving (first-class) modules that gradually build up to modular implicits. The full code (with directions to run) is available here.

First lets recall the basics of modules and module types (signatures) in OCaml. Lets define a module type T1:

  module type T1 = sig
    type a 
    val to_string: a -> string
    val eg: a

Lets also define a couple of modules of type T1:

  module IntT1 : (T1 with type a = int) = struct
    type a = int
    let to_string = string_of_int
    let eg = 0
(* module IntT1 : sig type a = int val to_string : a -> string end *)

  module BoolT1 : (T1 with type a = bool) = struct
    type a = bool
    let to_string = string_of_bool
    let eg = false

Below is a functor that creates a module of type T1.

  module XT1(X:sig 
                type t 
                val to_string: t -> string
                val eg: t
            end) : (T1 with type a = X.t) = struct
    type a = X.t
    let to_string = X.to_string 
    let eg =

The type of XT1 is:

  module XT1 :
    functor (X : sig type t val to_string : t -> string end) ->
        sig type a = X.t val to_string : a -> string end

OCaml has first class modules. A first-class module is created by “packing” a module with a signature. Done using the “module” keyword.

  let i = (module IntT1: T1)
  let b = (module BoolT1: T1)

It is possible to create a list of first-class modules, each encapsulating a different type:

  let mod_list = [i; b]

However, i and b cannot be used as modules themselves; they need to be unpacked first. Unpacking is done using the “val” keyword:

  module I = (val i : T1)
  module B = (val b : T1)

Note that the information that type a in I is int is lost while packing it into a module of type T1. Thus is an abstract value of type I.a, as confirmed by utop:

      utop #;;
      - : I.a = <abstr>

Likewise, is an <abstr> value of type B.a.

Alternatively, we could’ve packed IntT1 into a module of type (T1 with type a = int), but that makes it impossible to create [i;b].

It would have been clear by now that first-class modules are simply variables with existential types. We pack a module to introduce an existential type, and unpack it to eliminate the existential. Like the values of other types, values of existential types are first class. The t2_of_t1 function shown below takes a value of type module T1, which is an existential type, and returns a value of type module T2, which is another existential type:

  module type T2 = sig
    include T1
    val print: a -> unit
    (** val t2_of_t1: (module T1) -> (module T2) *)
  let t2_of_t1 (x:(module T1)) = 
    let module X = (val x : T1) in
    let module Y = struct 
                    include X
                    let print a = print_string @@ to_string a;;
                   end in
    let y = (module Y: T2) in

Unpacking an existential (i.e., conversion from a first-class module to a normal module) is done automatically by OCaml during pattern matches, so t2_of_t1 can be simplified as following:

  let t2_of_t1 (module X: T1) = 
    let module Y = struct 
                    include X
                    let print a = print_string @@ to_string a;;
                   end in
    let y = (module Y: T2) in

We can now construct modules at run-time:

  module I2 = (val (t2_of_t1 i):T2)

This feature lets us choose between various implementations of a signature at run-time, depending on the execution parameters. Another big advantage of first-class modules is that it helps us achieve qualified/bounded/adhoc polymorphism, and lets us obtain Haskell-style typeclasses in ML. For instance, lets consider a SERIALIZABLE signature that characterizes all serializable types:

  module type SERIALIZABLE = sig
    type t
    val to_string: t -> string
    val of_string: string -> t
  (* For convenience *)
  exception ConversionError 

Abstract type t is supposed to be instantiated with any concrete serializable type. Base types such as int, float etc., are obviously serializable:

  module SInt : SERIALIZABLE with type t = int = struct
    type t = int
    let to_string = string_of_int
    let of_string = int_of_string
  module SFloat : SERIALIZABLE with type t = float = struct
    type t = float
    let to_string = string_of_float
    let of_string = float_of_string

Polymorphic types like lists are serializable iff their type parameters are serializable:

  module SList(A: SERIALIZABLE) 
    : (SERIALIZABLE with type t = A.t list) = struct
    type t = A.t list
    let to_string l = "["^(String.concat ";" @@ 
                  A.to_string l)^"]"
    let of_string s = 
      let l = String.length s in
      let _ = if l>=2 && s.[0]='[' && s.[l-1]=']' then ()
              else raise ConversionError in
      let mid = String.sub s 1 (l-2) in
      let tokens = Str.split (Str.regexp ";") mid in
      let join p s = match p with "" -> s | _ -> p^";"^s in
      let els = List.rev @@ fst @@ List.fold_left 
                  (fun (els,p) s  -> 
                      let el = A.of_string @@ join p s in
                    with ConversionError -> (els, join p s)) 
                  ([],"") tokens in

An aside: we implemented a serializable list as a functor above. An alternative approach is to implement it as a function. The second approach is more general than the first because functions can have polymorphic types but functors cannot (in OCaml).

   * val mk_SList :
   *   (module SERIALIZABLE with type t = 'a) ->
   *     (module SERIALIZABLE with type t = 'a list)
  let mk_SList (type a)
               (module A : SERIALIZABLE with type t = a) =
    let module SList = struct
          module S = SList(A)
          include S
        end in
      (module SList : SERIALIZABLE with type t = a list)

We now have multiple to_string and of_string functions from SInt, SFloat, SList, and their compositions. Using first-class modules, we can write to_string and of_string wrappers for these functions that accept module parameters:

  let to_string (type a)
                (module S: SERIALIZABLE with type t = a) = S.to_string
  let of_string (type a)
                (module S: SERIALIZABLE with type t = a) = S.of_string
  let "2" = to_string (module SInt) 2
  let 3 = of_string (module SInt) "3"
  let "[2;3]" = to_string (module SList(SInt)) [2;3];;
  let [2;3] = of_string (module SList(SInt)) "[2;3]";;
  let "[[2;3];[4;5]]" = to_string (module SList(SList(SInt))) [[2;3];[4;5]];;
  let [[2;3];[4;5]] = of_string (module SList(SList(SInt))) "[[2;3];[4;5]]";;

Observe that in the above examples, single to_string/of_string function is being used for various SERIALIZABLE types. This is exactly the behavior that typeclasses offer in Haskell! There is, however, one significant difference. While GHC infers typeclasses automatically, here we have to explicitly pass the module parameters to to_string/of_string functions. This seems redundant, given that such module parameters can be inferred from the context. For example, when the to_string function is applied on [[2;3];[4;5]], the only module parameter that lets this expression typecheck is SList(SList(SInt)), which must be inferable. This is infact the sort of reasoning performed by Modular Implicits (Leo White et al) OCaml extension, which lets us mark module parameters to a function as implicit, and later elide them when applying the function. This is demonstrated below:

We first mark the modules that serve as typeclass instances with the keyword implicit:

implicit module SInt : SERIALIZABLE with type t = int = struct
  type t = int
  let to_string = string_of_int
  let of_string = int_of_string

implicit module SFloat : SERIALIZABLE with type t = float = struct
  type t = float
  let to_string = string_of_float
  let of_string = float_of_string

Functors can also be marked implicit, as long as their arguments are implicit (marked via the curly braces).

implicit module SList{A: SERIALIZABLE} 
  : (SERIALIZABLE with type t = A.t list) = struct
  type t = A.t list
  let to_string l = "["^(String.concat ";" @@ 
                A.to_string l)^"]"
  let of_string s = 
    let l = String.length s in
    let _ = if l>=2 && s.[0]='[' && s.[l-1]=']' then ()
            else raise ConversionError in
    let mid = String.sub s 1 (l-2) in
    let tokens = Str.split (Str.regexp ";") mid in
    let join p s = match p with "" -> s | _ -> p^";"^s in
    let els = List.rev @@ fst @@ List.fold_left 
                (fun (els,p) s  -> 
                    let el = A.of_string @@ join p s in
                  with ConversionError -> (els, join p s)) 
                ([],"") tokens in

Lastly, we mark the module parameters of functions (for which adhoc polymorphism is intended) as implicit (again using curly braces):

let to_string (type a)
              {S: SERIALIZABLE with type t = a} = S.to_string
let of_string (type a)
              {S: SERIALIZABLE with type t = a} = S.of_string

Module parameters can now be elided:

let "2" = to_string 2
let 3 = of_string "3"
let "[2;3]" = to_string [2;3];;
let [2;3] = of_string "[2;3]";;
let "[[2;3];[4;5]]" = to_string [[2;3];[4;5]];;
let [[2;3];[4;5]] = of_string "[[2;3];[4;5]]";;

Observe that to_string and of_string do not take module parameters. The code behaves as if both the functions are part of a typeclass, for which int, float and list instances exist.

So, to conclude, First-class modules + implicits = typeclasses in ML!.

Continue reading

Nov 6, 2016 - Effective Serializability for Eventual Consistency

This post is a collection of my notes on Lucas Brutschy et al’s paper “Effective Serializability for Eventual Consistency”. A later version of this paper has been accepted to POPL’17.


Serializability is a well-understood criterion to reason about concurrent transactions. Enforcing serializability via pessimistic concurrency control techniques, such as S2PL, is very expensive. Papadimitriou shows that checking a transaction history for serializability is NP-complete, hence optimistic concurrency control methods are no good either. However, Papadimitriou also shows that it is possible to construct efficient algorithms to detect violations of stronger variants of serializability. One such well-known variant is conflict-serializability. A history is conflict-serializable iff it has no conflicts among write-read, write-write and program-order dependencies. Conflict serializability, being a syntactic check, can be implemented efficiently by algorithms that detect cycles in the dependency graphs. However, conflict-serializability is stronger than serializability in the sense that it is possible for a history to exhibit conflicts in its dependency graph, yet be observably equivalent to a serial schedule. This may happen if the conflicts are not semantically relevant. An example is the following history of three concurrent transactions (T1, T2, and T3):

w1(A), w2(A), w2(B), w1(B), w3(B)

There is a write-write dependency from T1 to T2 w.r.t A, and another write-write dependency from T2 to T1 w.r.t B. The cycle in the dependency graph means that the history is not serializable. However, considering that later writes to an object absorb the effects of earlier writes, the schedule is observationally equivalent to the following serial schedule:

w1(A), w1(B), w2(A), w2(B), w3(B)

Hence, the schedule is effectively serializable, although it is not conflict-serializable. In the context of relational databases, where reads and writes are the only atomic actions, overlooking absorption is the main reason why serializable schedules are classified as non-serializable. But as we raise the level of abstraction to more sophisticated atomic actions, e.g., add or listLength, commutativity also becomes a reason to relax the definition of conflict-serializability. The main contribution of the paper is therefore to relax the definition of conflict-serializability to account for commutativity and absorption. The paper also generalizes the definition of (relaxed) conflict serializability to be applicable for histories generated by non-SC stores. Such histories, unlike the totally ordered SC histories shown above, are only partially ordered.

Technical Development

Actions are atomic actions with possible return values, such as x.set(0), x.add(2), x.get():2 etc. Action semantics is given as prefix-closed set of legal action sequences. For example, x.set(0), x.add(2), x.get():2 is a legal action sequence. The specification of an action is a possibly-infinite set of such action sequences. Two action sequences are equivalent () iff they are legal in exactly the same contexts. Thus x.set(0), y.set(1) and y.set(1), x.set(0) are equivalent because whenever the former appears in a legal action sequence, it can be replaced by the latter preserving the legality, and vice-versa. Observe that the definition of the legality is cleverly crafted to subsume the notion of observational equivalence; two legal action sequences containing the same set of actions are observationally equivalent. Thus, any transformation done on a sequence while preserving legality results in a sequence that is observationally equivalent.

Commutativity and absorption can now be specified in terms of equivalent action sequences. Two actions u and v are commutative iff uv ≡ vu. Likewise, v absorbs u iff uv ≡ v.

Since there is no need to consider order between adjacent commutative actions, we can relax such orders in an action sequence. The resultant partial order is called a trace. We furthermore divide the set of actions () into updates (), for which return values are not relevant, and queries (), for which only the return values are relevant. Authors assume that updates have not preconditions whatsoever, hence any action sequence containing only the updates is legal.

Let us now consider histories. As described above, SC histories are linear action sequences. However, under weaker levels of consistency, histories are only partially ordered w.r.t , the program order, , the visibility order, and , the arbitration trace. Visibility describes what updates are visible to what queries, and arbitration totally orders non-commutative actions (e.g., via timestamps). Since is a trace, commutative actions are not directly related by arbitration, but they may be related transitively via a non-commutative action. As with an SC history, the set of actions is partitioned into multiple transactions . Thus, a weakly consistent history (or schedule) is the tuple (note: we conflate the notions of history and schedule, although the paper doesn’t). For a weakly consistent schedule to be considered well-formed, it must satisfy the following properties:

  1. must be lower-finite and acyclic.
  2. Let be the action sequence of updates visible to a query order as per the arbitration order. Then, must be a legal action sequence.
  3. The set of updates invisible to a query must be finite (the liveness requirement of eventual consistency).

As an example, consider a query q that witnesses updates {u1,u2,u3,u4,v1,v2}. Their collective arbitration trace might look like the following:


Solid arrows denote and dotted arrows denote absorption. The two updates v1 and v2 are not dependencies of q because they do not precede it in the trace order (maybe v1 and v2 are y.set(0) and y.add(1), whereas q is x.get()), and therefore can be “moved past” q. The update u2 is not a dependency of q either as it gets absorbed by the adjacent update u4. But after that u1 and u3 become adjacent, and so u1 can get absorbed by u3. What remains are the two dependencies u3 and u4 of q.

The algorithm to find dependencies of a query q is described below with help of the above example:

  1. Start with a set of all updates visible to q that directly precede q in .
    • For q in the above figure, the set is {u3,u4}
    • Updates v1 and v2 are thus already eliminated.
  2. Lets call the set as . In each iteration, expand the boundaries of by including updates that precede those in the set in the order without being absorbed by them (i.e., related to those in the set by a solid arrow, but not a dashed arrow). If an update (u) is absorbed by another update in the set, remove u, and relate all updates that precede u (in ) to all updates that succeed u.
    • For the current example, the only update preceding the elements in the set = {u3,u4} is u2. However, u2 is absorbed by u4, so we remove it from the trace and relate u1 to both u3 and u4.
    • In the next iteration, u1 precedes the elements in , but it is absorbed by u3 in , hence it too must be removed.
  3. We iterate until there are no more updates left to be included in .
    • The final for the current example is thus {u3,u4}.

The set is the dependency set of q, and contains all the updates that must be ordered before q in any serialization of the original weakly consistent schedule. It is informative to note that if we do not take commutativity and absorption into account, the dependency set would have been equal to the visibility set of q.

The notion of dependency has a natural counterpart called anti-dependency. An anti-dependency of a query in a given schedule is an update not visible to the query but such that making it visible would turn it into a dependency.


Thus, the set , the anti-dependency set of q, contains all the updates that must be ordered after q in any serialization of the original weakly consistent schedule. Computing the set of q is simple: find an update u not visible to q in the original schedule, and make it visible. Now, run the above algorithm to compute set of q. If contains u, the u is an anti-dependency of q, otherwise not. Again, it is informative to note that if we do not consider commutativity and absorption, all updates not visible to q are its anti-dependencies.

Once we compute and for all the queries, a serial order can be obtained by topologically ordering the graph . If the graph contains a cycle, then we have a conflict, and the schedule is not serializable.


The evaluation is notable for its breadth. 33 TouchDevelop applications have been analyzed. TouchDevelop uses global sequence protocol (gsp) to implement a replication system with prefix consistency (stronger than causal). Nonetheless, even if the store is SC, serializability violations are possible because of dependency violations among transactions (groups of atomic actions). As our experience with RDBMS tranasactions informs, serializability violations are abound under weak isolation, and requiring serializability for all transactions is a giant overkill. As a thumb rule, authors assume that SER violations involving query transactions (transactions issued by the display code) as harmless, hence exclude them from the evaluation. Among the remaining transactions, many SER violations have been found. Some examples discussed in the paper:

  1. Tetris code where high-score is being locally updated (mutable integer). This is obviously incorrect. Fix is to basically turn the high score integer into an op-based data structure.
  2. Event management code where the app sorts the cloud-backed list of events by removing the events, sorting them locally, and re-inserting them. Fix is to either do it in a SER transaction, or to simply stop doing that. Sort locally whenever event list is being queried.
  3. A quiz game that allows a user to overwrite an existing user account due to incorrect uniqueness check. Fix is to use CAS.
  4. A Tic-Tac-Toe game that uses mutable datatypes all over. The fix is to use a data model that is op-based.

Another round of evaluation is done with TPC-C implementation on top of Riak (an EC store with no strengthening). Some violations:

  1. A DELIVERY transaction that updates a mutable balance. Obviously, one of the updates is lost when two DELIVERY transactions run concurrently. The fix is to use a CRDT counter for balance.
  2. DELIVERY transaction also has a subsequent getBalance operation, so even if the balance is a CRDT, SER violations still exist. However, this violation is ignored since it is harmless.
  3. The percieved foreign-key violation between ORDERS and NEW_ORDERS tables manifests as an SER violation. The fix proposed is denormalization of data, although enforcing monotonic reads via QUORUM writes would fix the issue.
  4. Two NEW_ORDER transactions assign same id to a transaction. The behaviour is not serializable, and the only fix is to use CAS.

Observations and Questions

  1. It is remarkable that testing could uncover uniqueness bugs and foreign key bugs since they require concurrent transactions to issue operations on very specific keys and ids (uniqueness for example requires to randomly issued NEW_ORDER transactions to insert an order with exact same id See comments).
  2. Moreover, uncovering these bugs requires conflicting transactions to overlap in a specific manner. Such concurrent schedules are hard to trigger, aren’t they?
  3. This leads to the question: what is the test harness and testing methodology?
  4. Perceived foreign key violation has been uncovered by analyzing a query transaction. It is for this reason that I don’t feel comfortable excluding query transactions.
  5. With mutable data, lost updates are inevitable. That is, the presence of UPDATE ... SET queries in weakly isolated transactions is itself a sufficient condition to witness SER violations. 4 of the 8 bugs discussed above fall into this category. The fix proposed in all the cases is to basically turn the mutable data into an op-based RDT. Assuming that we start we such an implementation, how many new bugs would be uncovered? What are the fixes?
  6. It would be interesting to apply effective serializability criterion to replicated versions of stacks and queues. Experience tells us that replicating these data structures in a coordination-free fashion is quite tricky.
Continue reading

Sep 20, 2016 - Dynamo and DynamoDB

In this post, I discuss DeCandia et al’s Dynamo paper, and Amazon’s DynamoDB service based on the paper.


DeCandia et al’s Dyanamo is a distributed key-value store remarkable for it’s entirely decentralized architecture, SLAs that focus on 99.9th percentile latency, emphasis on never losing writes, and the notorious sloppy quorums. Supporting decentralized architecture requried several innovations, such as anti-entropy protocols like hinted handoff and read repair. Dynamo was originally built as an infrastructure rather than a service. To quote from the paper: “Each service that uses Dynamo runs its own Dyanamo instances”, “[Dyamo] is built for a trusted environment”.

Dynamo is a simple key-value store with flat keyspace. The keyspace is assumed to be quite large, hence partitioned across multiple nodes. The partitioning is done by a consistent hashing algorithm that envisions the keyspace as a ring, and makes a node responsible for an arc (usually an adjacent one) on the ring. Hence, each instance is called a Dynamo ring. For durability and high availability, each key is replicated across multiple nodes (called it’s replication factor, which is usually set to 3), often spanning multiple data centers. Thus, a ring spans multiple geographic locations. Each node in the ring maintains a ring membership information, capturing the current node’s view of the ring. This information is regularly updated via gossip. Any node in the ring accepts reads and writes for any key in the keyspace. If the node is not one of the replicas of the key, it forwards the request to a replica. For this purpose, each node maintains a preference list of nodes for each partition of the keyspace, which is consulted to decide which node should serve a request. Even when all nodes have a consistent view of ring membership, preference lists maintained by different nodes can still be different. For instance, if nodes A and B are both replicas for key k, then preference list for k in A has A at the top, whereas in B’s list has B at the top. In general, preference list for a partition at a node is ordered so as to minimize the latency to serve requests on key k starting from that node.

Like all distributed databases, the durability of each read (R) and write (W) is configurable. Under normal circumstances, if R+W N, where N is the replication factor, we get a system with quorum consistency that, for example, supports Read-My-Writes (RMW) guarantee. However, Dynamo’s quorums are not traditional (Lamport) quorums. In the event of a network partition, even when none of the replicas for a key are reachable, Dynamo still accepts writes for the key, allowing the reachable nodes to act as makeshift replicas. While this behavior in itself is not suprising, considering that data stores are often designed to be available even if one node is reachable, Dynamo allows the reachable nodes to organize themselves into a ring, and form quorums on each side of the network partition! Such quorums are called sloppy quorums. Thus even if writes and reads from a session are successfully executed with quorum consistency, we still may not get RMW.

Network partitions or otherwise, concurrent updates to a key are possible. Dynamo uses vector clocks to identify multiple versions of an object corresponding to a key. If vector clocks of two versions are totally ordered, then conflict resolution is trivial. If they are not, then Dynamo keeps both the versions and lets the application handle conflicts. As I shall demonstrate later, keeping multiple versions is particularly important in case of Dynamo, otherwise it may lose the entire set of writes submitted on one side of a network partition after conflict resolution.


Present day DynamoDB is a multi-tenant hosted service offered by Amazon. The data model is more-or-less flat key-value, with some new additions (souce: core components of DynamoDB):

  • While the unique primary key of a data item also used to be its partition key, the primary key can now be defined as a combination of partition key (also called the hash key) and a sort key (also called the range key). Their combination needs to be unique for a data item.
  • The value needs to be a JSON with any number of attributes. While the primary key schema of each data item is fixed in a table, the schema of values is not. However, DynamoDB takes cognizance of attributes when a JSON is being stored, allowing secondary indexes to be created, and the attributes to be queried.

Some more relevant points from its documentation:

  • Amazon DynamoDB stores three geographically distributed replicas of each table to enable high availability and data durability.
  • Consistency across all copies of data is usually reached within a second.
  • A strongly consistent read returns a result that reflects all writes that received a successful response prior to the read.

It is not clear how strongly consistent read is implemented in DynamoDB. If it has to return the value of previous write regardless of the write consistency, then its consistency level has to be ALL. Alternatively, if it only returns values of previous strong writes, then both read and write can be QUORUM (strict quorum; sloppy won’t do). UPDATE: A stackoverflow user suggests that (a). DynamoDB only has only strict quorums (no sloppy quorums and hinted handoffs), (b). All writes in DynamoDB are written to a quorum of replicas, (c). By default, reads are served by a single replica, and (d). strong reads are served by a quorum of replicas. This makes sense.

Apart from regular writes, DynamoDB supports atomic in-place updates of attributes of a data item. For example, we can update, user.address, and user.telephone for a given in a single update api call to DynamoDB (one round-trip). The update can also be conditional, in which case it is applied if and only if the current value of the data item meets certain conditions. Conditional update is presumably implemented via paxos, giving it CAS semantics. To help construct in-place conditional update operations, DynamoDB defines a fairly expressive language for conditional expressions and update expressions (A primer on reading and writing items using expressions is here). The documentation says that conditional update is idempotent, because CAS is idempotent, but DynamoDB’s conditional update is more general than CAS. In general, conditional update is idempotent only if the update negates the condition.

Through conditional updates, DynamoDB already offers serializable transactions on (multiple attributes of) a single data item. But, this is only the beginning! As it turns out, DynamoDB also implements full-fledged multi-object transactions with variable isolation levels! (more here). DynamoDB currently defines three different isolation levels, without making any reference to the ANSI SQL standard. As described by the documentation:

  • Fully isolatated reads are performed through obtaining locks during a transaction, just like writes.
  • Committed reads are provide a consistency guarantee similar to eventually consistent reads, and are performed by reading the old copy of the item if a lock is detected.
  • Uncommitted reads (also known as dirty reads) are the cheapest, but are the most dangerous, since they may return data that will later be rolled back.

Roughly, they correspond to ANSI SQL Serializable, Read Committed, and Read Uncommitted isolation levels, respectively. Note that, unlike relational databases, where isolation level is set per-transaction, DynamoDB allows isolation level to be set per-read in a transaction. This is why there is no isolation level corresponding to Repeatable Read. Nonetheless, more analysis is needed to determine the exact guarantees offered by each of these isolation levels.

So, to summarize, DynamoDB offers quorum writes and weak reads by default. Application can request strong reads to get RMW, but they are twice as expensive. An update operation is a quorum write that lets (multiple attributes of) a data item to be upated atomically. It consumes one write capacity unit. So does a conditional update, that goes beyond CAS semantics to enable serializable transactions on a single data item. However, conditional updates may often fail, and retries consume more write units, making it expensive. Multi-item transactions with variable isolation levels are possible, and writes from transactions are very expensive. As per the documentation, a write from a transaction consumes roughtly 7N+4 write capacity units, where N is the size of the transaction. The cost model for each isolation level is not known, but fully isolated transactions are most expensive because it comes “at the cost of turning reads into relatively expensive writes.”

Sample Applications

  • This blog describes a simple game, where two players advance their positions via conditional updates.
  • Product catalogue case study is described here.
  • Tic-Tac-Toe game developed via conditional updates and multi-key transactions is described in this blog. The example is also discussed in this AWS re:Invent 2013 video. The same video also describes Dropcam’s experience using DynamoDB. The
  • Session logouts due to the lack of RMW is described here.
  • Manu transaction examples here.
Continue reading

Aug 19, 2016 - Disciplined Inconsistency

Today, in our reading group, we read an interesting paper titled “Disciplined Inconsistency” by Brandon Holt et al from UW CSE. This post is my notes on the paper.


Modern day web-services often trade consistency for availability and performance. However, there exist some data and operations for which stronger forms of consistency are needed. Recognizing this, some data stores, including Quelea, implement stronger consistency guarantees, such as Causal Consistency. However, there are two major problems with the lattice-of-consistency-levels model espoused by Quelea:

  • In real applications, invariants over the data are almost never strict. Applications can tolerate certain amount of error, especially if it improves the performance. For example, Amazon can afford to oversell a discounted item by 2% of the available stock, but no more. Unfortunately, Quelea’s model is insensitive to such error bounds. Quelea expects applications to have exact invariants, such as non-negative balance or monotonicity, and attempts to statically determine appropriate consistency levels that can enforce these invariants without fail. In reality, applications execute under moderate-to-medium loads during most of their lifetimes, under which weaker consistency levels are more-or-less sufficient to maintain application invariants. Quelea’s consistency model lacks dynamic monitoring, thus cannot adapt consistency to the changing load conditions.
  • The flow of data between operations executing at various consistency levels is not controlled. Consequently, data being maintained at EC can make its way into the data being maintained at SC.

Holt et al propose a system called IPA (Inconsistent, Performance-bound, Approximate storage system) that addresses the above problems via a combination of static type system and runtime enforcement. In the following, I explain various aspects of IPA in greater detail.

IPA’s Programming Model

IPA is modeled as a data structure store, like Redis, Riak, and Hyperdex, and its Scala-based programming framework allows programmers to implement ADTs, such as sets and counters, as distributed data structures. To better understand IPA’s approach, let us distinguish between library writers, who implement ADTs, and app developers, who use ADTs. IPA expects library writers to define various versions of an ADT, each with a different combination of consistency policies for ADT operations. In Scala’s terminology, each combination is a trait of the ADT. For instance, Counter ADT can be defined with, say, two traits: a WeakOps trait that implements counter reads and increments using Cassandra’s weak (i.e., consistency level ONE) reads and writes, and a StrongOps trait that implements them via Cassandra’s QUORUM reads and writes, or via ONE writes and ALL reads. For more sophisticated ADTs, more combinations of consistency levels for operations can be implemented as traits. App developers use the ADTs defined by library writers by selecting a version with desired trait. For instance, an app developer may use the Counter ADT with StrongOps trait to obtain a strongly consistent counter, or with WeakOps to obtain a weakly consistent counter.

From IPA’s perspective, Strong and Weak are examples of static consistency policies. IPA also provides support for library writers to implement traits for dynamic consistency policies, such as LatencyBound and ErrorBound, which admit bounds on the latency of an operation and the error tolerance on its result (resp.). For instance, LatencyBound trait for Counter ADT can be implemented using the rush primitive offered by IPA that executes a read or a write under strongest consistency level possible within the specified time limit. Likewise, ErrorBound trait can be implemented via IPA’s support for escrow reservations. Both these mechanisms will be explained in detail later. An app developer, however, does not need to know about these implementation mechanisms. The developer can choose Counter with LatencyBound(50) trait to obtain a counter whose operations have a 50ms latency bound, or Counter with ErrorBound(10) trait to obtain a counter whose value can be off by at most 10% of the actual count.

IPA thus exposes different levels of abstractions to library writers and app developers. While app developers can think in terms of ADTs and high-level consistency policies, library writers have to face the ugliness of the underlying data store as they reason in terms of reads, writes, and low-level consistency policies, such as ONE, QUORUM etc. While IPA does provide some tool support (via rush and reservations), it does not raise the level of abstraction insofar as library writers are concerned. This is in contrast to Quelea where RDT writers also reason in terms of high-level consistency policies and RDT-specific effects.

Static Type System

The stated purpose of IPA’s consistency type system is to enforce consistency safety Informally, it “ensures that values from weakly consistent operations cannot flow into stronger consistency operations without explicit endorsement form the programmer”. Thus, the type system helps developers trade consistency for performance in a disciplined manner.

IPA’s implementation defines four consistency types:

  • Consistent[T]: A value of type T resulting from a strongly consistent operation (the StrongOps trait),
  • Inconsistent[T]: A value of type T resulting from a weakly consistent operation (the WeakOps trait),
  • Rushed[T]: A value of type T resulting from an operation executed with a latency bound (the LatencyBound trait), and
  • Interval[T]: A range of values of type T, each of which is equally likely to be the result of an operation executed with an error bound (the ErrorBound trait).

The return type of an ADT operation is either defined by the library writer while implementing the trait, or follows from the use of IPA primitives, such as rush. Thus, more consistency types can be defined by library writers. For example, consider a Facebook-like deployment where an object is only ever stored in a single data center. If an ADT is tailor-made for such deployment, its writer can implement its operations via LOCAL_QUORUM reads and writes, and claim per-object sequential consistency by annotating the return types of operations with a new Consistent[PoSC] type.

A subtype lattice over consistency types can also be defined by relying on Scala’s type system. For instance, here is a lattice defined by the IPA implementation, and presented in the paper:


The result of annotating return values with consistent types is that app developers must now explicitly handle non-Consistent[T] values:

  • If a value is Inconsistent[T], the developer must explicitly endorse the value for it to be treated on par with Consistent[T] values.
  • If it is Interval[T], the result of an ErrorBound operation, then the developer has to handle an interval of values.
  • Rushed[T] is implemented as “a sum type with one variant per consistency level available to the implementation of LatencyBound” (It’s not quite clear to me how this is done). Thus, a Rushed[T] value has to be destructed and matched against Consistent[T], or Inconsistent[T], or any other user-defined (static) consistency type before it is used.

Thus consistency types let application be sensitive to the consistency level of a value. Alternatively, they force the application to adopt a disciplined approach while dealing with inconsistent values.

Run-time enforcement of consistency

Strong consistency is implemented using QUORUM reads and writes. As we describe in Representation without Taxation paper, such an implementation is likely to be buggy since Cassandra’s LWW conflict resolution drops writes due to inevitable clock drift across servers.

LatencyBound reads are implemented in IPA by issuing parallel read requests at different (Cassandra) consistency levels, and returning the strongest available result at the specified time limit. One drawback with this approach is that it floods network with read requests, which is undesirable esp. when system is operating at high loads. Therefore, the paper proposes an alternative approach that is based on monitoring latency. The approach first computes the likely latency (e.g., 90th percentile latency) of a read operation for each consistency level based on the recent history, and then issues the read at an appropriate level. For this purpose, the system monitors latencies of reads, grouped by operation and consistency level. “The monitor uses an exponentially decaying reservoir to compute running percentiles weighted toward recent measurements, ensuring that its predictions continually adjust to current conditions”.

IPA allows ADTs to implement error bounds making use escrow and reservation. The paper summarizes this concept well:

The idea is to set aside a pool of permissions to perform certain update operations (we’ll call them reservations or tokens), essentially treating operations as a manageable resource. If we have a counter that should never go below zero, there could be a number of decrement tokens equal to the current value of the counter. When a client wishes to decrement, it must first acquire sufficient tokens before performing the update operation, whereas increments produce new tokens. The insight is that the coordination needed to ensure that there are never too many tokens can be done off the critical path: tokens can be produced lazily if there are enough around already, and most importantly for this work, they can be distributed among replicas. This means that replicas can perform some update operations safely without coordinating with any other replicas.

Consider an increment-only counter X. Let us say X = 100 initially, and the error bound on X is 10%. Consider a case when 3 replicas are independently modifying X. Then, replicas can be issued 5 tokens each, capturing the fact that each can perform 5 increments before they need to synchronize in order to not violate error bounds. For instance, if all replicas perform 5 increments each without synchronizing, then each will return 105 as the value of X when the actual value is 115. However, 105 is still within 10% of 115, so this is alright. Thus, error bounds provide applications with some slack in implementing strongly consistent operations by allowing synchronization to be moved off the critical path.

IPA provides infrastructure support (via reservations server) for escrow reservations, and exposes its functionality via an API. The library writer is expected to make use of this functionality to implement ErrorBound consistency policy for his/her ADT.

An Aside

The escrow method was proposed in 1986 (and rehashed later), but there seem to be many recent papers on applying variants of escrow in various contexts:

  • Liu et al, Warranties for Faster Strong Consistency, NSDI’14 (paper).
  • Balegas et al, Putting Consistency Back into Eventual Consistency, EuroSys’15 (paper)
  • Balegas et al, Extending Eventually Consistent Cloud Databases for Enforcing Numeric Invariants, SRDS’15 (paper)

I have the intuition that escrow methods would be useful to implement merge operation on data structures that support non-commutative and non-idempotent operations. Basically, if we allocate n tokens to each replica to perform a non-idempotent operation, such as a dequeue operation on a queue, n times (where n is the size of the queue), then merge operation becomes simple because a dequeue only dequeues an item present on all replicas. We also get the read at least once guarantee on the messages in the queue.


The experimental methodology of the paper is interesting, and results are insightful. Experiments are performed for two different setups: simulation and real. In simulation experiments, they use multiple docker containers on the same physical node connected via the Ethernet to simulate multiple replicas. Network conditions are controlled via Linux’s tc netmem utility that introduces packet delay and loss at the OS level. In real experiments, they deploy IPA on Google’s Compute Engine with geo-distributed replicas.

Three kinds of experiments were performed: 1. With uniform 5ms network latency between replicas (simulating a local network in a data center under medium-to-high load), 2. With one replica being slower than other replicas (modeling variable network latency and hardware problems), and 3. With real geo-distributed replicas. Some highlights:

  • In experiments with a single slow replica, 50ms bound on the 95th percentile latency overloads the slow replica with double the requests (why?), and causes it to exceed the latency bound 5% of times.
  • In geo-distributed case, if all operations are SC, then the mean latency is 800ms, but if 1% error is tolerated (on the Counter), then the latency reduces significantly to 200ms. The latency difference between weak consistency policy and 5% ErrorBound consistency policy is negligible. This experiment demonstrates that it is possible to guarantee error bounds on the data without having to forego availability and performance.
Continue reading

May 31, 2016 - Extraction in Coq

Extraction erases Props

Extraction in Coq works by erasing Props. For example, consider the following definition of div:

    Definition div (m n : nat)(H : n<>0): nat :=
      NPeano.div m n.

div expects a proof that its second argument is non-zero. Indeed, in coq, it is impossible for div to divide by zero. However, when this code is extracted to OCaml, the n <>0 prop is erased (Coq renames our div to div0 to avoid name clash with NPeano’s div):

    let div0 m n = div m n


Coq’s sumbool type is another good example to demonstrate the proof erasing nature of extraction. sumbool is defined in Specif module as following:

(** [sumbool] is a boolean type equipped with the justification of
    their value *)

Inductive sumbool (A B:Prop) : Set :=
  | left : A -> {A} + {B}
  | right : B -> {A} + {B}
 where "{ A } + { B }" := (sumbool A B) : type_scope.

sumbool is usually the return type of equality decision procedures of various types. For example, the string_dec, the string equality function has the type:

forall s1 s2 : string, {s1 = s2} + {s1 <> s2}

Consider a type id defined as:

Inductive id : Type :=
  T : string -> t.

A decision procedure ids can be constructed from string_dec as following:

 Theorem eq_id_dec : forall id1 id2 : id, {id1 = id2} + {id1 <> id2}.
     intros id1 id2.
     destruct id1 as [n1]. destruct id2 as [n2].
     destruct (string_dec n1 n2) as [Heq | Hneq].
     Case "n1 = n2".
       left. rewrite Heq. reflexivity.
     Case "n1 <> n2".
       right. intros contra. inversion contra. apply Hneq. apply H0.

Extracting the sumbool and eq_id_dec generates the following:

      type sumbool =
      | Left
      | Right
      (** val eq_id_dec : t -> t -> sumbool **
      let eq_id_dec id1 id2 =
        string_dec id1 id2

OCaml’s sumbool has no arguments because coq’s sumbool has only Prop arguments. The advantage of using sumbool instead of bool is that it can be used seamlessly in proofs and computations. When used in a computation, sumbool simply tells whether a property is true or false, but when used in a proof, sumbool also tells why a property is true or false.

Theorems can also be extracted

Observe that eq_id_dec has been written as a theorem. Theorem can be used to assert the existence of a witness to, not only Props, but also Sets and Types. For example:

  Theorem there_is_a_nat : nat.
  Proof.  apply 0.  Defined.

  Extraction there_is_a_nat.
    let there_is_a_nat = O

Why do we say Defined instead of Qed whenever we are doing extraction? Qed defines proof/definition as opaque, whereas Defined defines it as transparent. If we used Qed instead of Defined extraction would produce:

(** val there_is_a_nat : nat **)

let there_is_a_nat =

Note that theorems can only be extracted if the statement is either Set or a Type, not Prop. The following two examples should demonstrate this point. Example 1:

  Theorem there_is_plus: forall (n1 n2 : nat), exists (n :nat), n = n1 + n2.
  Proof.  intros.  exists (n1+n2).  reflexivity.

  Check (forall (n1 n2 : nat), exists (n :nat), n = n1 + n2). (* Prop *)

  Extraction there_is_plus.
      (** val there_is_plus : __ **)

      let there_is_plus =

Contrast Example 1 with the following Example 2:

  Inductive plus_defined (n1 n2: nat) : Set :=
    | PlusT : forall(n:nat), (n=n1+n2) -> plus_defined n1 n2.

  Extraction plus_defined.
      type plus_defined =
        nat (* singleton inductive, whose constructor was PlusT *)

  Theorem there_is_plus_t: forall (n1 n2 : nat), plus_defined n1 n2.
  Proof.  intros.
  apply PlusT with (n:=n1+n2); reflexivity.

  Extraction there_is_plus_t.
    let there_is_plus_t n1 n2 = plus n1 n2

Why would anyone want to write a Set or a Type term as a proof of a theorem, rather than a Definition or a Fixpoint? If the Set or Type term expects Prop witnesses (like sumbool), then its better to write it as a proof. Sometimes, it may not even be possible to write the term otherwise. For example, here is a failed attempt at defining eq_id_dec as a Definition:

  Inductive id : Type :=
    T : string -> t.

  Theorem eq1 : forall (s1 s2: string), s1=s2 -> (T s1) = (T s2).
    subst. reflexivity.

  Theorem neq1 : forall (s1 s2 : string), s1<>s2 -> (T s1) <> (T s2).
  unfold not.
  apply H; inversion H0. subst; reflexivity.

  Definition id_eq (id1 id2 : id) : ({id1 = id2}+{id1 <> id2}) :=
    match (id1,id2) with
    | (T s1,T s2) => match string_dec s1 s2 with
                     | left A => left (eq1 s1 s2 A)
                     | right B => right (neq1 s1 s2 B)

The approach fails because the term left (eq1 s1 s2 A), which has the type {T s1 = T s2} + {?66}, fails to type check agains the required type {id1 = id2} + {id1 <> id2}. The problem is that while pattern matching under a definition, coq does not populate the context with equalities between scrutinees and patterns. Although we know that id1 = T s1 and id2 = T s2, we have no way of obtaining it from the context. Recall that we did not face this problem when proving eq_id_dec. This is why we sometimes prefer writing Set or Type terms as proofs.

Continue reading

Dec 5, 2015 - Notes - Terry's Session Guarantees


This post is the compilaton of my notes on Terry et al’s PDIS’94 paper: Session Guarantees for Weakly Consistent Replicated Data.

System Model

From what I understand, the paper is the first to describe the abstract system model of a weakly consistent replicated database that now serves as a frame of reference to describe KV stores (eg., Cassandra and Riak) and mobile computational nodes. The abstract system model is very similar to Quelea’s system model described in Sec.2 of the PLDI paper. Like in Quelea, where the state of an application at a replica \(r\) is the set of application-specific effects present at that replica (i.e., \({\sf vis}\) set at \(r\)), the application state at server \(S\) in Terry’s model the collection of app-specific Write’s at \(S\) (denoted \(DB(S)\)). Following is a more detailed comparison:

  • A Write in Terry’s model is an abstraction of creation, updatation, or deletion of concrete data items (collectively called CUD operations). Hence, multiple Write operations need not commute. The \({\sf WriteOrder}\) therefore assumes significance. A Write in Quelea is an abstraction of a single atomic effect. Quelea’s model in general encourages commutative Writes, and the model does not have a built-in notion of a \({\sf WriteOrder}\). Application state in Quelea is a set of effects, and the result of a Read operation is obtained by folding over this set. Nonetheless, Quelea model does not preempt CUD operations; it is possible to write applications with Read, Write and Delete operations, such as registers, in Quelea. However, applications have to implement their own mechanisms (eg., tagging effects with local timestamps) and conflict resolution semantics (eg., Last-Writer-Wins) to ensure convergence.
  • Both models rely on total propagation and consistent ordering to ensure eventual consistency. Total propagation can be achieved via anti-entropy mechanisms such as hinted-handoff and read-repair (more about these in this paper). In Quelea’s implementation, this is taken care by Cassandra. No further assumptions about total propagation are made. Consistent ordering for non-commutative writes is essential for convergence. Both models rely on application semantics to consistently order writes at all replicas. However, while Terry’s abstract model takes cognisance of the write order via the binary \({\sf WriteOrder}\) predicate, Quelea’s model (intriguingly) does not.
  • A Write in Terry’s model is an abstraction of multiple CUD operations, which appear to take place atomically. Hence, Terry’s Writes are in fact transactions with RC isolation level.

Session Guarantees

In Terry’s and Quelea’s perspective, a session is an abstraction of a sequence of Read and Write operations performed during the execution of an application. Note that operations from the same client session can be dispatched to different replicas, hence the result of a read operation in a session may not be consistent with the write operations performed previously in the same session. This could be very annoying to the end user. The purpose of the paper is to equip the aforedescribed abstract system model with additional guarantees that collectively ensure that the results of operations performed in a session will be consistent with the model of a single centralized server, possibly being read and updated concurrently by multiple clients.

Terry et al propose four session guarantees for their model, and, due to the close correspondence between both models, they can also be (nearly) stated as specifications in Quelea’s spec language. Below, I present both versions (Note: what Terry calls \({\sf RelevantWrites}\), Quelea calls\({\sf Visibility}\)):

  • Read Your Writes: If Read \(R\) follows Write \(W\) in a session and \(R\) is performed at server \(S\) at time \(t\), then \(W\) is included in \(DB(S,t)\).
    • \(\forall a,b.\, {\sf soo}(a,b) \Rightarrow {\sf vis}(a,b)\)
  • Monotonic Reads: If Read \(R_1\) occurs before \(R_2\) in a session and \(R_1\) accesses server \(S_1\) at time \(t_1\) and \(R_2\) accesses server \(S_2\) at time \(t_2\), then \({\sf RelevantWrites(S_1,t_1,R_1)}\) is a subset of \(DB_{}(S_2,t_2)\).
    • \(\forall a,b,c.\, {\sf vis}(a,b) ~\wedge~ {\sf soo}(b,c) \Rightarrow {\sf vis}(a,b)\)
  • Writes Follow Reads: If Read \(R_1\) precedes Write \(W_2\) in a session and \(R_1\) is performed at server \(S_1\) at time \(t_1\), then, for any server \(S_2\), if \(W_2\) is in \(DB(S_2)\) then any \(W_1\) in \({\sf RelevantWrites}(S_1,t_1,R_1)\) is also in \(DB(S_2)\) and \({\sf WriteOrder_{}}(W_1,W_2)\).
    • \(\forall a,b,c,d.\, {\sf vis}(a,b) ~\wedge~ {\sf soo}(b,c) ~\wedge~ {\sf vis}(c,d) \Rightarrow {\sf vis}(a,d) ~\wedge~ ?{\sf WriteOrder}(a,c) \).
  • Monotonic Writes: If Write \(W_1\) precedes Write \(W_2\) in a session, then, for any server \(S_2\), if \(W_2\) in \(DB(S_2)\) then \(W_1\) is also in \(DB(S_2)\) and \({\sf WriteOrder_{}}(W_1,W_2)\).
    • \(\forall a,b,c.\, {\sf soo}(a,b) ~\wedge~ {\sf vis}(b,c) \Rightarrow {\sf vis}(a,c) ~\wedge~ ?{\sf WriteOrder}(a,b)\).

Observe that Terry’s \({\sf WriteOrder}\) becomes \(?{\sf WriteOrder}\) in Quelea. The punctuation is to explicitly denote the fact that Quelea is unaware of \({\sf WriteOrder}\). To make the matters worse, Quelea does not expose \({\sf happens-before}\) order to application semantics. Consequently, a straightforward port of a Bayou (Terry’s system) application, which requires WFR or MW session guarantee, onto Quelea does not preserve application semantics.

Example Alice tweets “The chicken laid an egg”. Bob sees Alice’s tweet and tweets “A chicken hatched out of the egg”. If GetTweets is tagged with WFR spec in Quelea, then Cheryl, who sees Bob’s tweet is also guaranteed to see Alice’s tweet. However, she canot figure out the causal order among the tweets making her wonder what comes first: Alice’s chicken, or Bob’s egg.

How do we overcome the handicap described above? Some thoughts:

  1. The state of an application that admits non-commutative operations is not an unordered set of effects, but a partially-ordered set of effects with respect to the \({\sf happens-before}\) relation. Accordingly, Read operations in Quelea must reduce over a graph representing the poset, but not a list representing a set.
  2. Is there a way to explicitly track causality by tagging effects with vector clocks? Since each component of the vector must represent the monotonically increasing clock local to a replica, and there is no way to know the currentReplicaId in Quelea, this approach seems implausible. However, getCurrentReplicaId is a simple extension to Quelea, hence supporting this approach is easy. Unfortunately, explicitly tracking vector clocks is tedious and complicates application logic. Hence, first approach may be preferable.
Continue reading

Oct 5, 2015 - Effing Package Management (FPM)

If you are ever into making debian packages to distribute your software, check out this great package management tool called fpm by @jordansissel. FPM is no-nonsense package manager that lets you create packages by simply specifying dependencies, and source and destination paths for your binaries, libraries and includes.


Let us say you want to make a debian package to distribute the first version of your hello-dev application, that provides a binary (hello), a dynamic library ( and an include file (hello.h), which you want to be installed in the usual locations under /usr. Also, let’s assume that your application requires gcc version 4.6.3 or above. Creating a debian package for your is as easy as:

 fpm -s dir -t deb -n "hello-dev" -v 1.0 --description "Hello development tools" -d 'gcc > 4.6.3' ./hello=/usr/bin/hello ./ ./hello.h=/usr/include/hello.h ./LICENSE=/usr/share/doc/hello-dev/copyright
Continue reading

Sep 30, 2015 - Understanding Transactions in Rails

In my previous post I have noted that Rails encourages application developers to rely on feral mechanisms, such as validations and associations, to ensure application integrity. In this post, I first explore various feral mechanisms in Rails, and how they are being used by some sample applications. Next, I will throw some light on how Rails actually enforces these feral mechanisms.

ORM and Active Record

Quoting from Ruby on Rails guide:

Object Relational Mapping: Object-Relational Mapping, commonly referred to as its abbreviation ORM, is a technique that connects the rich objects of an application to tables in a relational database management system. Using ORM, the properties and relationships of the objects in an application can be easily stored and retrieved from a database without writing SQL statements directly and with less overall database access code.

Active Record as an ORM Framework: ActiveRecord is Rails’s implementation of ORM. Active Record gives us several mechanisms, the most important being the ability to:

  • Represent models and their data.
  • Represent associations between these models.
  • Represent inheritance hierarchies through related models.
  • Validate models before they get persisted to the database.
  • Perform database operations in an object-oriented fashion.

Active Record Models: A class that extends ActiveRecord is an Active Record model, or simply a model. Each model has a corresponding table in the underlying database. For instance, defining the Product model:

class Product < ActiveRecord::Base 

will create a table with the following schema:

CREATE TABLE products (
  id int(11) NOT NULL auto_increment,
  name varchar(255),

CRUD: To write a record to the the database, we first create an Active Record object (note: arg: denotes an argument named arg. Think of Ruby methods as functions accepting named variants):

product = Product.create(name: "Milk")


product = do |p| = "Milk"

To save, ActiveRecord offers save method:

Reading can be done via multiple querying methods. Example:

all_prods = Product.all
milk = Product.find_by(name: "Milk").order(id: :asc)

Active record querying webpage has more details.

Updatation can be done via read-modify-save sequence:

milks = Product.find_by(name: "Milk")
milks.each do |m| = "Whole Milk"

Note: Active Record uses SQL INSERT for new record saves, and SQL UPDATE for updatations. It uses new_record? flag to keep track of new Active Record objects that are not yet saved into the database (source: Active Record Callbacks guide).

Deletion is done using destroy:

milks = Product.find_by(name: "Milk")
milks.each do |m| 

The corresponding records will be deleted from the database.


Associations are useful to keep track of semantic associations between various models. For instance, using a belongs_to association, the developer can let Rails know that every order in an eCommerce application belongs to utmost one customer (Note: the :name syntax denotes a symbol called name. Meta-level functions accept symbols and generate new code, which would then be spliced into original code (somewhat like MetaOCaml and TemplateHaskell). Ruby, being a dynamic language, does not treat meta-level functions specially.).

class Order < ActiveRecord::Base
  belongs_to :customer

The result is that the order table will now have a column named customer_id that is expected to refer to id of the customer table. Order objects will also have a field named customer referring to the Customer object to which the order belongs to (note: the field can be nil). When an order object is saved, the id of its customer object will be saved to the customer_id field of the corresponding record in the database.

The other end of belongs_to association is usually a has_many association or has_one association. Using a has_many association, the developer can specify that a customer can have one or many orders:

class Customer < ActiveRecord::Base
  has_many :orders

Other than adding a field named orders to a customer object, specifying the has_many association will have no tangible effect; there is no change in Rails’s interaction with order and customer tables. However, we can qualify the has_many association with an optional dependent argument set to :destroy that asks Rails to destroy all orders placed by a customer, when the customer is deleted.

class Customer < ActiveRecord::Base
  has_many :orders, dependent: :destroy

This behaviour corresponds to ON DELETE CASCADE behaviour offered by SQL for foreign key dependents.

With belongs_to association from Order to Customer, and has_many association from Customer to Order, with a dependent: :delete qualification, we have something close to, but not exactly, a foreign key constraint from the order table to the customer table. To effectively enforce a foreign key constraint, we need a guarantee that an order cannot exist without the corresponding customer existing in the customer table. Currently, this invariant can be violated if, for example, we save an order object referring to a customer object, whose corresponding record was already destroyed. The effective foreign key enforcement that prevents this scenario (theoretically, at least) can be achieved using Model Validations, described in the next section.

It is also possible to specify many-to-many relationships between models in a similar way to how such relationships are defined in relational databases: via an intermediary model. For example, in a microblogging application, a user can follow many users, and can have many users as followers. This many-to-many relationship between users can be specified via a third model called, say, relationships. Each user has_many following relationships and follower relationships. Each relationship belongs_to two users: user who is being followed, and the user who is following. This scenario is described in the picture below (source: Michael Hartl’s book. Released under MIT License):


Observe that there is a transitive has_many relationship between users, which can be made explicit via through: argument to has_many association: a user has_many followers through: relationships, and has_many followed_users through: relationships.

Besides belongs_to, has_many, and has_one, Rails defines few other associations. They are described here.


Quoting from Rails guide on validations: Validations are used to ensure that only valid data is saved into your database. For example, it may be important to your application to ensure that every user provides a valid email address and mailing address. Model-level validations are the best way to ensure that only valid data is saved into your database.

Validations are primarily declared using validates method. For example, to ensure that user provides an email address during during a new account creation (i.e., email field is not nil), and to ensure that there are no existing accounts associated with this email, we can perform presence and uniqueness validations, respectively.

class Account < ActiveRecord::Base
  validates :email, uniqueness: true, presence: true

Rails also provides variants of validates specialized for common validations:

class Account < ActiveRecord::Base
  validates_presence_of :email
  validates_uniqueness_of :email 

validates_uniqueness_of is clearly useful to ensure primary key property. Besides being useful to preempt nil values, validates_presence_of is also useful to enforce foreign key constraints, when used in conjunction with associations. For instance, in the eCommerce example, the foreign key constraint from Order to Customer can be enforced by validating the presence of customer field.

class Order < ActiveRecord::Base
  belongs_to :customer
  validates_presence_of :customer
class Customer < ActiveRecord::Base
  has_many :orders, dependent: :destroy

Another useful validation is validates_associated. You should use this helper when your model has associations with other models and they also need to be validated. When you try to save your object, valid? will be called upon each one of the associated objects. For instance, when the Customer class validates associated orders:

class Customer < ActiveRecord::Base
  has_many :orders, dependent: :destroy
  validates_associated :orders

All the orders associated with the customer are validated when the customer object is being save‘d (in this case, this entails a validation for the presence of the customer object owning orders, which is trivially true).

validates_associated works with all of the association types. Although it is by default turned off for most associations, it is by default on for has_many association. So, for the above example we needn’t explicitly specify validates_associated.

Note: Ruby has a concept of virtual attributes, which, in the context of ActiveRecord are attributes that do not get recorded in the database. Validations can also be defined over virtual attributes.

Custom validations

Validations, such as validates_presence_of, are built into Rails. Sometimes, built-in validations are not sufficient. Rails allows developers to define custom validations for this purpose. Custom validations can be defined as methods, which are used once locally, or they can be defined by implementing Rails’s ActiveModel::Validator interface. Examples of custom validators include Spree’s AvailabilityValidator, which checks whether an eCommerce inventory as sufficient stock available to fulfill an order. More on custom validations here.

When are validations enforced?

When an Active Record object is being persisted to the database (save), although this is not entirely accurate:

Validations are typically run before these commands (INSERT/UPDATE) are sent to the database. If any validations fail, the object will be marked as invalid and Active Record will not perform the INSERT or UPDATE operation. This helps to avoid storing an invalid object in the database. You can choose to have specific validations run when an object is created, saved, or updated.

There are many ways to change the state of an object in the database. Some methods will trigger validations, but some will not. This means that it’s possible to save an object in the database in an invalid state if you aren’t careful. - ActiveRecord validations callbacks guide

Validations and any callbacks registered on the state changes of the model object are queued for execution. This queue will include all your model’s validations, the registered callbacks, and the database operation to be executed. The whole callback chain is wrapped in a transaction. If any validation/callback method returns false or raises an exception the execution chain gets halted and a ROLLBACK is issued (source).

Sample Applications

We will now examine some sample applications to understand how validations and associations are being used.


First is based on a small microblogging application from the Ruby on Rails tutorial by Michael Hartl.

The app defines 3 models - micropost, user, and relationship with following associations and validations:

  • Micropost belongs_to a user, and validates user’s presence.
  • A user has an email address; validates its presence and uniqueness.
  • A user has_many follower relationships, and through: those relationships, has_many followers. Follower relationships need to be destroyed if this user is deleted.
  • A user has_many following relationships, and through: those relationships, has_many followed users. Following relationships need to be destroyed if this user is deleted.
  • A relationship jointly belongs_to a follower user and a followed user, and validates their presence.

Following are some interesting operations that the app defines:

Adding a User: Adds a user after validating that the email is present and unique.

addUser(u) = transaction do
  assert( != nil);
  dups := SQL "SELECT * FROM users WHERE email = `` LIMIT 1"
  assert (dups == []);
  SQL "INSERT INTO users VALUES (`freshId()`,``,``)";
  followUser(thisUser, thisUser); /* One must follow oneself */

Follow a User: Makes the current user follow another user.

followUser(thisUser,thatUser) = transaction do
  usr1 := SQL "SELECT * FROM users WHERE id = `` LIMIT 1";
  assert (usr1 != []);
  usr2 := SQL "SELECT * FROM users WHERE id = `` LIMIT 1";
  assert (usr2 != []);
  SQL "INSERT INTO Relationships VALUES (``,``)";

Unfollow a User: Makes the current user unfollow other user.

unfollowUser(thisUser,thatUser) = transaction do
  SQL "DELETE FROM Relationships WHERE follower_id = ``
                                 AND followed_id = ``";

getFollowers: Returns the list of users following the current user.

/* Implementation 1 : from the original sample app */
getFollowers(thisUser) = 
  SQL "SELECT * FROM users INNER JOIN relationships ON = relationships.follower_id WHERE
      relationships.followed_id = ``"

/* Implementation 2: If the data store does not support joins
            (e.g: Cassandra) */
getFollowers(thisUser) = transaction do
  rels := SQL "SELECT * FROM relationships WHERE 
                  followed_id = ``";
  followers := [];
  rels.each |rel| do
    /* The following pattern-match must always pass because:
        1. Relationship model validates presence of follower
           before persisting.
        2. When a user is deleted, all the dependent
           relationships are deleted as well.
        3. User model validates uniqueness when persisiting. */
    [follower] = SQL "SELECT * FROM users WHERE 
                    id = `rel.follower_id`";
    followers := follower::followers;
  return followers;

getFollowedUsers: Returns the list of users that the current user is following. Implementation similar to getFollowers described above.

postBlog: Posts a microblog on behalf of the current user. Validates the presence of current user record before persisiting the microblog.

postBlog(thisUser,blog) = transaction do
  usr = SQL "SELECT * FROM users WHERE id = `` LIMIT 1"; 
  assert(usr != []);
  SQL "INSERT INTO microposts (content, user_id) 
       VALUES (`blog.content`, ``)";

getTimeLine: Get a list of microposts by a user (the call is: Micropost.includes(:user).from_users_followed_by(user)).

getTimeLine(user) = transaction do
  posts := SQL "SELECT * FROM microposts WHERE 
                  user_id = ``";
  posts.each |post| do
    post.user := (SQL "SELECT * FROM users WHERE 
                  id = post.user_id LIMIT 1").first;
  return posts;

The SQL call to users table never returns an empty collection because:

  • A micropost belongs_to a user, and checks the presence of the user before persisting, and
  • When a user is deleted, the dependent microposts are also deleted.

Therefore, .first on the collection returned by SQL is always valid.

getFeed: Get a list of microposts by users being followed by the current user.

getFeed(user) = transaction do
  followed_ids := SQL "SELECT followed_id FROM relationships
                        WHERE follower_id = ``";
  uids_of_interest := :: followed_ids ;
  posts := SQL "SELECT * FROM microposts WHERE 
                  user_id IN `uids_of_interest`";
  posts.each |post| do
    post.user := (SQL "SELECT * FROM users WHERE 
                  id = post.user_id LIMIT 1").first;
  return posts;

deleteUser: Delete the current user. It has to enforce dependent: :destroy on microposts and relationships.

deleteUser(thisUser) = transaction do
  /* First, delete all microposts by this user */
  SQL "DELETE FROM microposts WHERE user_id = ``";
  /* Next, delete all relationships on this user */
  SQL "DELETE FROM relationships WHERE 
    follower_id = `` OR
    followed_id = ``";
  /* Finally, delete the user */
  SQL "DELETE FROM users WHERE id = ``";
Continue reading

Sep 28, 2015 - Understanding Transactions in Quelea

Quelea is our eventually consistent data store with an associated programming framework intended to simplify programming under eventual consistency. In this post, I describe how various applications written in Quelea employ a combination of highly available and serializable transactions to enforce application integrity. Three applications participate in this survey:

  • BankAccount: A highly available concurrent bank account application.
  • Microblog: A twitter-like microblogging site, modeled after Twissandra. The application allows adding new users, adding and replying to tweets, following, unfollowing and blocking users, and fetching a user’s timeline, userline, followers and following.
  • Rubis: An eBay-like auction site. The application allows users to browse items, bid for items on sale, and pay for items from a wallet modeled after a bank account.

First let me define what I mean by a transaction:

What is a transaction?

For all practical purposes, a transaction can be defined as a set of operations on some data, which, when executed, appear to have been executed simultaneously. We say that the set of operations has taken effect atomically. Atomicity implies that the set of effects generated by a transaction T is visible to operations in other transaction T' in its entirety, or it is not visible at all. From the perspective of the other transaction T', an operation op' in T' either sees all the effects of a transaction T, or it sees none of them. Note that this is the requirement of Read Committed (RC) isolation level. Hence, atomicity being a defining property of a transaction means that every transaction automatically experiences RC isolation level. Note that a set of effects that offers atomicity, but with stronger isolation properties than RC is also a transaction.

Transactions are used for in various applications towards different ends. Below, I describe some of the common purposes served by transactions in some sample applications:

To perform atomic updates

Transactions are primarily used in practice when we want to perform a series of actions on the data taking it through some intermediary states, where the integrity of data may be violated. In order to not expose these intermediary states, we want to wrap these actions inside a transaction so that, for observers, it appears as if all actions have been committed atomically. Typically, these actions perform updates to multiple tables or materialized views, or multiple rows in the same table. Some usecases:

  1. BankAccount:
  2. When a user saves money, then withdraw operation on the checking account, and deposit operation on the savings account should appear to have happened atomically. Intermediary state (after withdraw, but before deposit) may violate app-specific integrity constraints (e.g: user shown incorrect total bal, or user incorrectly penalized for insufficient bal etc).
  3. Microblog (Twitter):
  4. When a user (B) unfollows another user (A), then B has to be removed from A’s follower list, and A has to be removed from B’s following list. Both updates should appear to have happened atomically. Intermediate state violates app-specific integrity constraint that follows is a symmetric relation.
  5. When a user tweets, the tweet should be added to the table of all tweets, and its Id should be added to the userline materialized view against tweeter’s userId, and timeline materialized view against the userIds of all the followers. All insertions should appear to have happened simultaneously. Intermediary states may violate (a) app-specific integrity constraints (e.g: user B sees a tweet by A in his timeline, but doesn’t find the tweet in A’s userline), and (b) referential integrity (happens if data store can reorder operations, like in the case of EC stores).
  6. When user (A) blocks user (B), then B should be forced to unfollow A (the unfollow operation needs to be performed in the same way as above). Furthermore, to prevent B from re-following A, B needs to be added to A’s Blocks list, and A needs to be added to B’s IsBlockedBy list, both in the user table itself. All changes must commit atomically.
  7. Rubis (Auction site):
  8. When a user (A) bids for an item (I), then following updates need to happen atomically:
    • The bid information needs to be added to the Bids table against a new bidId.
    • bidId needs to be added against I’s itemId in ItemBids materialized view.
    • bidId needs to be added against A’s userId in UserBids materialized view.
    • I’s maxBid needs to be updated against I’s itemId in the Item table.
      The intermediate states may violate (a) app-specific integrity constraints (e.g: bid appears as top bid on the item page, but doesn’t appear in the list of bids by the bidder), and (b) referential integrity (happens if data store can reorder operations, like in the case of EC stores).
  9. When a user cancels his bid, all the above insertions need to be deleted atomically. Intermediate states may violate referential integrity (under reordering of operations).
  10. When a user (A) offers an item (I) for auction, then I needs to be added to the Items table, and its itemId against A’s userId in the UserItems table/materialized view, simultaneously.
  11. When the auction concludes, the above insertions need to be deleted atomically. Intermediate states may violate referential integrity (under reordering of operations).

To ensure consistent reads

Atomicity only guarantees that a transaction’s effects, if visible, are visible in their entirety. It does not prevent effects from becoming visible to only a subset of operations in a different transaction. Therefore, in many cases, atomicity of a write transaction itself is not sufficient to ensure that all reads witness consistent version of data; we need certain isolation gurantees on reads. Applications use transactions to achieve isolation. Usecases:

  1. BankAccount:
  2. When a user (A) saves money (an atomic transfer from checking to savings account), and immediately checks the total balance in her accounts by issuing two getBalance operations, one on each of her accounts, she might see an inconsistent balance. This can happen if first getBalance witnesses the effects of save transaction, but second does not, or vice versa. To prevent this from happening, both getBalance operations should be wrapped inside a transactions, which needs to be executed in isolation with respect to a consisten snapshot of the database.
  3. Microblog:
  4. A read operation on user A’s followers list might tell us that user B follows A, but a read of user B’s following list might return an inconsistent result. This happens if first read witnessed the followUser transaction whereas second read did not. This situation can be prevented by wrapping both reads in a transaction and insisting that this transaction be executed in isolation with respect to a consistent version of the database.
  5. When retrieving a user(A)’s profile using username, we perform two reads - one to fetch user’s uuid from his username, and other to fetch the profile details indexed by uuid. When the user A tries to view her profile immediately after registering, the first read may succeed but second read may fail due to the absence of relevant record. This happens if the first read witnesses the effects of addUser transactions, whereas the second read does not. This situation can also be avoided by wrapping both reads in a transaction and running it under an appropriate isolation.
  6. Rubis:
  7. When displaying the list of all bids on an item, we might encounter an instance of referential integrity violation, although none actually exists in the data. This can happen if a read operation on ItemBids table reads latest version of the table, whereas the subsequent read on the Bids table reads an earlier version, which may not contain certain bids. Fix for this case is same as above.

To perform consistent (integrity-respecting) updates

Making atomic updates to the data in complete isolation can still leave the data in an invalid state. This happens when multiple such atomic updates succeed in complete isolation, but the merging resultant states results in an inconsistent state. Applications use serializable transactions to preempt the possibility of a concurrent conflicting update. Usecases:

  1. BankAccount
  2. A transfer transaction from account A to account B has to be serializable with respect to all other transfer transactions from account A to gaurantee application invariant of non-zero balance.
  3. Microblog:
  4. When deactivating the account of user (A), the userId has to be removed from the following list of all her followers, and subsequently from the user table. All operations should appear to have happened simultaneously, so they have to be wrapped in a transaction. Furthermore, to prevent violation of referential integrity, the transaction has to be serializable with respect to all addFollower transactions adding followers to A.
  5. Rubis:
  6. concludeAuction and cancelBid transactions both can independently succeed possibly resulting in a state, where a bid is simultaneously a winning bid and a canceled bid. To avoid this inconsistency, cancelBid transaction needs to be serializable with concludeAuction transaction.


  • Circular referencing.
  • Let a transaction T contain an SC operation. If a transaction T' requests RR isolation w.r.t T, then T and T' are automatically serializable.
Continue reading