Jul 25, 2015 - ML Type Inference

One of the most useful features of ML-family languages (OCaml, Standard ML, F#, etc.) is type inference. This post contains the notes I took while trying to understand the foundational principles of ML type inference.

Damas-Milner Algorithm for Type Inference

Let us take a look at the original type inference algorithm proposed by Damas and Milner. Their language is the lambda calculus extended with let expressions:

e ::= x | λx.e | e e | let x=e in e 

The type language is composed of type variables (α), primitive types (ι), and function types:

τ = α | ι | τ → τ
σ = τ | ∀α.σ

A type substitution has the form [τ1/α1, τ2/α2, ..., τn/αn]; it maps type variables (α) to types (τ). Note the distinction between α used as a type and α used as a type variable.
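To make this concrete, here is a rough Python sketch (purely illustrative, not part of the Damas-Milner presentation) of applying a substitution to a type, with types represented as tagged tuples:

    # Types: ("var", a) | ("prim", name) | ("arrow", t1, t2).
    # A substitution is a dict mapping type-variable names to types.
    def apply_subst(subst, t):
        tag = t[0]
        if tag == "var":
            return subst.get(t[1], t)   # replace the variable if the substitution mentions it
        if tag == "prim":
            return t
        if tag == "arrow":
            return ("arrow", apply_subst(subst, t[1]), apply_subst(subst, t[2]))

    # [int/a] applied to (a -> b) gives (int -> b)
    print(apply_subst({"a": ("prim", "int")}, ("arrow", ("var", "a"), ("var", "b"))))
    # ('arrow', ('prim', 'int'), ('var', 'b'))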

We define a relation (≥) that relates a more polymorphic type to a less polymorphic type. For example, ∀α.α→α ≥ ∀β.β list → β list.

Typing judgment is Γ⊢e:σ.

The instantiation rule doubles as a subtyping rule, where subtyping is defined in terms of “more polymorphic”:

Γ⊢e:σ₁   σ₁≥σ₂
--------------
    Γ⊢e:σ₂

Since there is no explicit big-lambda (type abstraction) expression, the generalization rule is important:

Γ⊢e:σ  α∈FV(σ)  α∉FV(Γ)
-----------------------
       Γ⊢e:∀α.σ

Basically, ∀α.σ is the closure of σ with respect to Γ (denoted \(\bar{\Gamma}(\sigma)\)).
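Continuing with the same toy representation (again, just an illustrative Python sketch), the closure \(\bar{\Gamma}(\sigma)\) can be computed by quantifying the type variables that are free in the type but not free in the environment:

    # Types: ("var", a) | ("prim", name) | ("arrow", t1, t2);
    # a type scheme is ("forall", [bound vars], type).
    def free_vars(t):
        tag = t[0]
        if tag == "var":
            return {t[1]}
        if tag == "prim":
            return set()
        if tag == "arrow":
            return free_vars(t[1]) | free_vars(t[2])
        if tag == "forall":                    # bound variables are not free
            return free_vars(t[2]) - set(t[1])

    def free_vars_env(gamma):
        # Free type variables of an environment: union over all its bindings.
        return set().union(*(free_vars(s) for s in gamma.values()))

    def generalize(gamma, t):
        # Closure of t w.r.t. gamma: quantify variables free in t but not in gamma.
        return ("forall", sorted(free_vars(t) - free_vars_env(gamma)), t)

    # With gamma = {x : b} and t = a -> b, only a gets generalized.
    gamma = {"x": ("var", "b")}
    t = ("arrow", ("var", "a"), ("var", "b"))
    print(generalize(gamma, t))
    # ('forall', ['a'], ('arrow', ('var', 'a'), ('var', 'b')))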

Let Polymorphism

There is only one rule that introduces a polymorphic type into the context Γ:

Γ⊢e:σ       Γ,x:σ⊢e':τ 
----------------------
  Γ⊢(let x=e in e'):τ
Continue reading

May 10, 2015 - SAT solving puzzles

A few months ago, Cheryl's Birthday puzzle became an internet phenomenon. If you found the puzzle tricky to solve, we are in the same boat. However, observe that the puzzle only requires us to apply simple logic; not number theory or complex arithmetic, just simple logic. What, then, makes the puzzle tricky?

I attribute its trickiness to the chain of reasoning involved - repeatedly applying simple logic to existing knowledge to make new deductions, until we arrive at the solution. Since the reasoning is quite involved, it would be great if we could enlist the services of a computer to help us in the process. But how can a computer assist us in making deductions? The answer lies in the well-known SAT problem.

Knights and Knaves Puzzle

Consider the following simple instance of the famous Knights and Knaves class of puzzles:

There is an island in which certain inhabitants, called knights
always tell the truth, and others, called knaves always lie. It is
assumed that every inhabitant of this island is either a knight or 
a knave. 

Suppose A says, “Either I am a knave or B is a knight.” 

What are A and B?

The puzzle is easy to solve mentally. Nonetheless, if you have a 2-SAT solver handy, you can also let the solver do the work. Here is how:

Let AisKnight and BisKnight be boolean variables denoting the propositions “A is a Knight” and “B is a Knight”, respectively. Since an inhabitant can be either a Knight or a Knave, but not both, the boolean formulas ¬ AisKnight and ¬ BisKnight denote the propositions “A is a Knave” and “B is a Knave”, respectively.

Now, consider the statement made by A: “Either I am a knave or B is a knight”. Symbolically, it is equivalent to (¬ AisKnight) ∨ BisKnight. Since Knights always speak the truth, and Knaves always lie, the statement is true if A is a Knight, and false if A is a Knave. In other words, the statement holds if and only if A is a Knight. Symbolically:

AisKnight ⇔ (¬ AisKnight) ∨ BisKnight

Observe that the above boolean formula completely describes the constraints imposed by the puzzle, and does so quite succinctly. If the above formula is satisfiable, then the constraints imposed by the puzzle are also satisfiable, which means that the puzzle has a solution. Furthermore, a satisfying assignment to the literals in the formula (i.e., their truth values) gives us the solution to the puzzle. Producing such satisfying assignments is what a 2-SAT solver is for. If we feed the above formula to a 2-SAT solver, we get AisKnight = true and BisKnight = true as the satisfying assignment, telling us that A and B are both knights.

Note that the satisfying assignment only tells us that the puzzle has a solution in which A is a knight. Maybe the puzzle also has a solution in which A is a knave. How do we check for this possibility? Again, using the solver. We explicitly add the constraint that A is a Knave, and ask if our constraints are satisfiable. The resulting boolean formula is shown below:

(AisKnight ⇔ (¬ AisKnight) ∨ BisKnight) ∧ (¬ AisKnight)

A 2-SAT solver cannot find any satisfying assignment for the above formula, indicating that our constraints are unsatisfiable. Hence, A can never be a Knave.
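If you do not have a solver handy, a brute-force enumeration of the truth table makes the same point. Here is a small Python sketch (illustrative only; it is not a real 2-SAT solver) that checks both formulas above:

    from itertools import product

    # A's statement: "Either I am a knave or B is a knight", which must be
    # true exactly when A is a knight.
    def constraint(a_is_knight, b_is_knight):
        statement = (not a_is_knight) or b_is_knight
        return a_is_knight == statement

    # Enumerate all assignments to (AisKnight, BisKnight).
    solutions = [(a, b) for a, b in product([True, False], repeat=2)
                 if constraint(a, b)]
    print(solutions)                                 # [(True, True)]: A and B are both knights

    # Adding the extra constraint "A is a knave" leaves no satisfying assignment.
    print([(a, b) for a, b in solutions if not a])   # []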

Now, let’s take a step back and summarize what we have done. We were given a puzzle that is not difficult, but not obvious either. Instead of doing the reasoning ourselves, we simply described the puzzle as a boolean formula, and let a SAT solver do the reasoning for us. In other words, we outsourced the reasoning to a solver. This facility to outsource some or all of the reasoning to a solver can be quite handy when we are solving logical puzzles that require more complex reasoning, like Cheryl’s birthday puzzle.

Continue reading

May 4, 2015 - Notes - Static Contract Checking for Haskell

I latex’d some notes while reading Dana N. Xu et al’s POPL’09 paper: Static contract checking for Haskell.


Continue reading

Jan 15, 2015 - Notes - A Data-Driven Approach for Algebraic Loop Invariants

This post contains my notes on Rahul Sharma et al’s ESOP’13 paper: A Data-Driven Approach for Algebraic Loop Invariants.

The paper proposes a neat approach of inferring algebraic loop invariants by observing concrete program states, and making use of techniques from linear algebra to gain insights.

The following example is taken from the paper:

assume (x=0 && y=0);
writelog (x,y);
while(nondet()) do
  y := y+1;
  x := x+y;
  writelog (x, y);

For the above example, we are interested in finding the loop invariant of the while loop. Assume we already know that the invariant can be non-linear, but no more complex than degree 2. Let \(f(x,y)\) denote a polynomial of degree less than or equal to 2. The equation \(f(x,y)=0\) is a polynomial equation, or algebraic equation, in the variables x and y. Since x and y are the only two variables in scope at the head of the loop, we know that the loop invariant will be of the form:

\[ f_1(x,y)=0 \;\wedge\; f_2(x,y)=0 \;\wedge\; \dots \;\wedge\; f_k(x,y)=0 \]

where each \(f_i\) is a unique polynomial of degree at most 2. A polynomial of degree 2 is simply a linear combination of monomials of degree at most 2. In this case, we have 6 unique monomials: \(\{1,x,y,x^2,y^2,xy\}\). Therefore, each polynomial will be of the following form:

\[ f(x,y) = w_1 + w_2 x + w_3 y + w_4 x^2 + w_5 y^2 + w_6 xy \]

The invariant inference problem then is to infer the rational constants \(w_1,\ldots,w_6\) such that \(f(x,y)=0\) holds at the loop head.

The paper proposes a data-driven approach to solve this problem. Informally, a loop invariant over-approximates the set of program states that can arise at the loop head. For the above example program, the program state consists of the values of x and y. We instrument the code to observe the program state (i.e., the values of x and y) at the beginning of every iteration of the loop, when the program is executed on test inputs (user-provided or random). We record these observations in a log file.

Next, for every observed program state, we calculate the values of all 6 monomials and record them in a matrix, where a row denotes a program state (i.e., fixed values of x and y), and a column denotes the values of a fixed monomial across the observed program states. For example, with observed states (x,y) = (0,0), (1,1), (3,2), (6,3) and columns ordered as (1, x, y, x^2, y^2, xy), the data matrix is:

\[ \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 3 & 2 & 9 & 4 & 6 \\ 1 & 6 & 3 & 36 & 9 & 18 \end{pmatrix} \]

Now, any valid solution for \(w_1,\ldots,w_6\) (hence for the loop invariant problem) should explain the observed program states. Therefore, for every observed state \((x_j, y_j)\), the following has to hold:

\[ w_1 + w_2 x_j + w_3 y_j + w_4 x_j^2 + w_5 y_j^2 + w_6 x_j y_j = 0 \]

Observe that any solution to the above system gives us an invariant that holds over the observed program states. However, it may not hold over all possible program states. Therefore, solving the above system gives us candidate invariants, which we can then verify for validity. But how do we solve the above system of equations? This is where techniques from linear algebra help.

Let us denote the data matrix by \(A\), and the matrix of coefficients (\(w_1,\ldots,w_6\)) by \(W\). The matrix \(W\) has only one column, and is referred to as a column vector, or simply a vector. The set of all possible solutions for \(W\) (each of which is a vector) is called the null space of \(A\). Note that NullSpace(A) is a set of vectors. The span of a set of vectors \(\{x_1,\ldots,x_n\}\) is the set of all vectors that can be expressed as a linear combination of \(\{x_1,\ldots,x_n\}\); in other words, the span of a set of vectors is the entire vector space represented by those vectors. Observe that the span of NullSpace(A) is NullSpace(A) itself, since every linear combination of vectors in NullSpace(A) is again a vector \(x\) satisfying \(Ax=0\). Hence, NullSpace(A) is a vector space. A basis of a vector space is a set \(B\) of linearly independent vectors such that every other vector in the space can be expressed as a linear combination of vectors in \(B\). For the 3-D Cartesian vector space, the set \(\{(3,0,0), (0,2,0), (0,0,1)\}\) is a basis. The cardinality of a basis is the dimension of the vector space, as is evident from the basis of 3-D Cartesian space.

For the matrix \(A\), let \(B_w\) denote a basis of NullSpace(A). What does the set \(B_w\) represent? Each vector in the set is a unique solution for \(W\) that cannot be expressed as a linear combination of the other solutions. Further, any solution for \(W\) can be expressed as a linear combination of vectors in \(B_w\). Effectively, this means that by calculating a basis of NullSpace(A), we derive all unique algebraic invariants of degree at most 2 that hold on the observed program states. The invariants are unique in the sense that any other invariant can be expressed as a linear combination of the polynomials denoted by the basis set.
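To make the guess step concrete, here is a small Python sketch (using sympy; this is not the paper's implementation) that builds the data matrix from a few observed states of the example loop and computes a basis of its null space:

    from sympy import Matrix

    # Observed (x, y) at the loop head, from running the example loop a few times.
    states = [(0, 0), (1, 1), (3, 2), (6, 3), (10, 4)]

    # Monomials of degree at most 2: 1, x, y, x^2, y^2, xy.
    def monomials(x, y):
        return [1, x, y, x**2, y**2, x*y]

    A = Matrix([monomials(x, y) for x, y in states])

    # Each basis vector of NullSpace(A) is a candidate coefficient vector W with A*W = 0.
    for w in A.nullspace():
        print(w.T)
    # For these observations the basis vector is proportional to [0, -2, 1, 0, 1, 0],
    # i.e., the candidate invariant -2x + y + y^2 = 0, equivalently x = y*(y+1)/2.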

Once candidate invariants are generated by calculating the basis of the null space of \(A\), they are verified via symbolic execution and SMT solvers. A verification failure produces a counterexample, which is simply a valuation of x and y for which the candidate invariant fails. In that case, the loop is rerun with these values of x and y and more observations are recorded, thereby enlarging the data matrix \(A\). Invariants generated by calculating the basis of the null space of the larger data matrix will now account for the counterexample generated by the solver in the previous step. This counterexample-guided iterative refinement (which they call \({\sf Guess-And-Check}\)) is similar in flavour to CEGAR, but there is one fundamental difference. Recall that CEGAR starts with a formula that over-approximates program behaviours, and gradually refines the formula until it is valid, or a bug is found in the program. In contrast, \({\sf Guess-And-Check}\) starts with an under-approximation of program behaviours (represented by the data matrix), and adds more behaviours until we have a sufficient representation of all possible program behaviours. In this sense, \({\sf Guess-And-Check}\) is closer to CEGIS. The only difference is that, while CEGIS relies on an SMT oracle to generate positive samples, \({\sf Guess-And-Check}\) relies on tests.

In practice, the number of possible monomials at the loop head can be large. The paper uses a heuristic to discard monomials that are unlikely to be part of the final invariant: monomials that grow at a rate much larger than the loop counter \(i\) (i.e., at a rate \(\omega(i^d)\)) are most often red herrings.

Continue reading

Nov 30, 2014 - Sequential Consistency and Datarace freedom in Weak Memory Models

A natural view of execution of a multi-threaded program is as follows:

exn = []
while (there is an unfinished thread) {
  t = select an unfinished thread;
  instr = first(t);
  exn := exn ++ [instr];
  t := rest(t);
}
return exn;

Observe that exn preserves the program order of all threads. This is sequential consistency. However, there is still non-determinism on line 3 (the selection of an unfinished thread). When there are shared variables, this non-determinism is the source of data-races. For example, consider the following program (assume that x is initially 0):

[
  [STORE (x,3)],
  [STORE (x,4); PRINT (LOAD x)]
]

Since there is non-determinism, either of the STOREs can be executed first. The possible sequentially consistent (SC) executions are:

1. [STORE(x,3); STORE(x,4); PRINT(LOAD x)]
2. [STORE(x,4); STORE(x,3); PRINT(LOAD x)]
3. [STORE(x,4); PRINT(LOAD x); STORE(x,3)]

Notice that STORE(x,3) conflicts with STORE(x,4) and also with LOAD x: each pair accesses the same memory location, and at least one operation in the pair is a STORE. Further, notice that there exists at least one pair of SC executions where one execution is obtained from the other by reordering an adjacent pair of conflicting operations. Hence the program has a data-race. The above SC executions print 4, 3, and 4, respectively. Therefore, the set of acceptable outputs of a sequentially consistent execution (even in the presence of data-races) is {3,4} (note: the set does not include 0).
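Here is a small Python sketch (illustrative only) that enumerates all SC executions of the example program by interleaving the two threads, and collects the values PRINT can output:

    def interleavings(t1, t2):
        # All merges of t1 and t2 that preserve each thread's program order.
        if not t1:
            yield list(t2)
            return
        if not t2:
            yield list(t1)
            return
        for rest in interleavings(t1[1:], t2):
            yield [t1[0]] + rest
        for rest in interleavings(t1, t2[1:]):
            yield [t2[0]] + rest

    thread1 = [("STORE", 3)]
    thread2 = [("STORE", 4), ("PRINT",)]

    outputs = set()
    for exn in interleavings(thread1, thread2):
        x = 0                          # initially x is 0
        for instr in exn:
            if instr[0] == "STORE":
                x = instr[1]
            else:                      # PRINT (LOAD x)
                outputs.add(x)
    print(outputs)                     # {3, 4}; 0 is never printed under SC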

Let us assume we have locks in our language. Then, an execution of a multi-threaded program can be constructed as follows:

exn = []
while (there is an unfinished thread) {
  do {
    t = select an unfinished thread;
  } while (first(t) = LOCK(c) and 
           exn = exn'++ [LOCK(c)] ++ exn'' where
           UNLOCK(c) notin exn'')
  instr = first(t);
  exn := exn ++ [instr];
  t := rest(t);
}
return exn;

Locks help us control the amount of non-determinism in the program. For example, for the following program:

[
  [LOCK(c); STORE (x,3); UNLOCK(c)],
  [LOCK(c); STORE (x,4); PRINT (LOAD x); UNLOCK(c)]
]

Possible executions are:

1. [LOCK(c); STORE(x,3); UNLOCK(c); 
    LOCK(c); STORE(x,4); PRINT(LOAD x); UNLOCK(c)]
2. [LOCK(c); STORE(x,4); PRINT(LOAD x); UNLOCK(c); 
    LOCK(c); STORE(x,3); UNLOCK(c)]

The only acceptable output of the program now is 4. Further, there are no adjacent conflicting operations in any of the above SC executions. So, the program is data-race free (DRF).

A data-race-free memory model guarantees SC only for data-race-free programs. A program is DRF if and only if it is impossible to construct an SC execution with concurrent conflicting memory accesses, where at least one of them is a STORE. (When do we say two memory accesses are concurrent?)

Continue reading

Sep 23, 2014 - SC vs Linearizability

Sequential consistency requires that all data operations appear to have executed atomically in some sequential order that is consistent with the order seen at every individual process.

If instead of individual data operations, we apply sequential consistency to transactions, the resultant condition is called serializability in database theory.

Linearizability imposes an additional constraint on the order chosen by sequential consistency - the order needs to make sense to an external observer. Suppose an external observer runs two threads A and B: thread A performs action a0, then syncs with thread B, after which B performs b0. A sequentially consistent execution can order b0 before a0, as long as both A and B perceive the same order. On the other hand, such an execution is not valid under linearizability. Linearizability dictates that an operation should appear to take effect instantaneously at some point between its invocation and its completion.

Evidently, linearizability is a stronger constraint than sequential consistency (or serializability). Nevertheless, it is said that linearizability is a local property (i.e., composable), whereas SC is not. (I don’t understand this. If you know what this means, please drop a comment.)

Continue reading

Sep 15, 2014 - Notes - McCarthy's Lisp and Reynolds's Definitional Interpreters

This post is a compilation of my notes on two landmark papers:

  1. The Lisp paper: McCarthy J, Recursive Functions of Symbolic Expressions and Their Execution by Machine, CACM’60
  2. Interpreters paper: Reynolds J C, Definitional Interpreters for Higher-Order Programming Languages, ACM’72

1. The Lisp Paper

This paper by John McCarthy introduces the Lisp programming language and describes its implementation on the IBM 704. Lisp stands out among the programming languages of its day (e.g., Algol 60) as the first high-level programming language that abstracts the underlying machine completely. Instead, Lisp allows symbolic computation by letting programmers define (recursive) functions over symbolic expressions (called S-expressions). An S-expression is either an atomic symbol (hence “symbolic”), or an ordered pair of S-expressions. Reserving an atomic symbol NIL to denote the empty list, we can construct a list of S-expressions:

[s1,s2,s3]

using ordered pairs as:

(s1,(s2,(s3,NIL)))

Observe that the above is also an S-expression; hence, a list of S-expressions is also an S-expression. Since most functions operate on lists of symbols, programming such functions in Lisp is effectively LISt Processing (hence the name “Lisp”).

Lisp defines five primitive S-functions to construct and destruct lists, and to check equality of atomic symbols:

  1. atom
  2. eq
  3. car
  4. cdr
  5. cons

Lisp, as defined in the paper, includes a class of expressions called Meta-expressions (written M-expressions) that let us define and apply S-functions over S-expressions. An M-expression is one of:

  1. A function application. For example, f[2; EMPTY] is an M-expression that denotes the application of function f to the atomic symbols 2 and EMPTY.
  2. A function definition, with grammar fname[x0; …; xn] = expr, and
  3. A conditional expression: [expr1a -> expr1b; …; exprna -> exprnb]

Using M-expressions and the primitive S-functions, we can define a subst function that substitutes an S-expression (x) for an atomic symbol (y) in another expression (z):

subst[x;y;z] = [
  atom[z] -> [
    eq[z;y] -> x;
    T -> z
  ]
  T -> cons [subst[x; y; car[z]]; 
             subst[x; y; cdr[z]]]
]

This example demonstrates that while M-expressions are control structures (functions & conditional expressions), S-expressions represent the data to be manipulated by M-expressions.
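For concreteness, here is a rough Python transcription of subst (illustrative only), representing an S-expression as either an atom (a string) or an ordered pair (a 2-tuple):

    NIL = "NIL"

    def atom(z):
        return isinstance(z, str)

    def cons(a, d):
        return (a, d)

    def car(z):
        return z[0]

    def cdr(z):
        return z[1]

    def subst(x, y, z):
        # Substitute S-expression x for the atomic symbol y everywhere in z.
        if atom(z):
            return x if z == y else z
        return cons(subst(x, y, car(z)), subst(x, y, cdr(z)))

    # Substituting the atom A for B in the list [B, C] = (B, (C, NIL)):
    print(subst("A", "B", ("B", ("C", NIL))))   # ('A', ('C', 'NIL'))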

The definition (semantics) of Lisp is given as an interpreter program written in Lisp itself. McCarthy’s Lisp interpreter, implemented as the mutually recursive eval and apply functions, is often cited as an example of metacircular evaluation - defining the evaluation semantics of the embedded language in terms of the evaluation semantics of the host language, which happens to be the same as the embedded language. Interestingly, McCarthy defines Lisp as a dynamically scoped language (Paul Graham says so, but is it really? Reynolds’s 2nd interpreter, which he says is similar to McCarthy’s, is lexically scoped.).

In order to use the metacircular Lisp interpreter to interpret a Lisp program, the program first has to be converted to data. In other words, all M-expressions need to be translated to S-expressions (note: this means that M-expressions are only syntactic sugar). A simple algorithm for doing so is described in the paper. The algorithm converts all function names and their arguments in the user’s Lisp program (which are meta-variables) to atomic symbols. The interpreter (the eval function, to be specific) maintains an environment which defines bindings for these atomic symbols. However, note that the original program may also contain atomic symbols, and these symbols have no attached interpretation. To distinguish the atomic symbols present in the user program from the atomic symbols introduced while converting M-expressions to S-expressions, the former are quoted.

As mentioned previously, Lisp is the first language that abstracted the underlying machine completely. In order to be able to do that, the run-time should be able to manage scarce machine resources, such as memory. Towards this end, Lisp introduced automated memory reclamation, which is now commonly known as “Garbage Collection”.

2. The Definitional Interpreters Paper

The Lisp paper defines the semantics of Lisp using a metacircular interpreter written in Lisp itself. The definitional interpreters paper by Reynolds exposes subtle aspects of such definitional interpreters. In particular, it shows how the semantics of the embedded language (called the defined language) can depend on the semantics of the host language (called the defining language), such as its treatment of higher-order functions and its order of function application. To demonstrate this point, the paper first constructs a metacircular interpreter for a higher-order (defined) language in another higher-order (defining) language, by simply translating an expression (e.g., a function application) of the defined language into the corresponding expression of the defining language. The paper makes two observations about this interpreter:

  1. The nature of higher-order functions is not clear.
  2. The order of application is not explicit; the order of evaluating function applications in the defined language depends on whether the defining language is call-by-value or call-by-name.

Subsequently, Reynolds proposes two transformations that make the treatment of higher-order functions and the order of function applications explicit in the semantics of the defined language:

  1. Defunctionalization: A function in the defined language is not a function in the defining language. Instead, functions and function closures are represented explicitly as data structures (e.g., as S-expressions in the Lisp interpreter), and the semantics of function application is defined explicitly as a transformation over that data (see the sketch after this list). Defunctionalization effectively lets us write a definitional interpreter for a higher-order language in a first-order language.
  2. CPS transformation: A continuation is a function that explicitly represents the “rest of the computation”. If, at every atomic step of evaluation in a definitional interpreter, we explicitly describe what the “rest of the evaluation” is, then the semantics of the defined language no longer depends on the order of evaluation of the defining language; the order of evaluation is now explicit in the interpreter. Such an interpreter is said to be written in continuation passing style (CPS).
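Here is a small Python sketch of defunctionalization (illustrative only; Python is of course not Reynolds's defining language). Function values of the defined language are represented as explicit closure records, and application is an explicit transformation over that data:

    from dataclasses import dataclass

    @dataclass
    class Var:          # variable reference
        name: str

    @dataclass
    class Num:          # numeric literal
        val: int

    @dataclass
    class Lam:          # lambda abstraction
        param: str
        body: object

    @dataclass
    class App:          # function application
        fn: object
        arg: object

    @dataclass
    class Closure:      # a defined-language function, represented as data
        param: str
        body: object
        env: dict

    def eval_(term, env):
        if isinstance(term, Var):
            return env[term.name]
        if isinstance(term, Num):
            return term.val
        if isinstance(term, Lam):
            return Closure(term.param, term.body, env)   # no host-language closure
        if isinstance(term, App):
            return apply_(eval_(term.fn, env), eval_(term.arg, env))

    def apply_(clo, arg):
        # The meaning of application is spelled out as an operation on closure data.
        return eval_(clo.body, {**clo.env, clo.param: arg})

    # ((fun x -> x) 42) evaluates to 42
    print(eval_(App(Lam("x", Var("x")), Num(42)), {}))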

The paper also describes how imperative features, such as assignments, can be encoded explicitly in a definitional interpreter written in applicative style.

Continue reading

Sep 9, 2014 - CAP Theorem and Related

My intention in writing this note is to understand the relation between the conventional model of distributed systems that is usually taught in a distributed systems course, and the distributed web services hosting replicated datatypes. Fault tolerance is a concern in the former, and it is studied separately from communication link failures. On the other hand, partition tolerance is a concern in the latter, and it appears to mean tolerance to communication failures. Consistency and availability are major concerns in distributed web services, whereas they don’t figure anywhere in the formal study of distributed systems. What is the connection? How can we express the CAP theorem in conventional models?

The System Model

The conventional model of a distributed system is that of a replicated state machine (automaton), where the automaton models a local computation that reacts to external messages. This makes it clear that the focus in the conventional model is on how to perform a replicated computation, rather than on how to maintain replicated data.

[Image: TRLifeCycle]

The above image is taken from Reliable Distributed Systems textbook.

This view is also corroborated by Eric Brewer in his PODC 2000 talk, where he conjectured CAP. Here are some excerpts from his slides:

Inktomi builds distributed systems ... but very little use of
classic DS research.

"Distributed Systems" don't work

Persistent state is hard. Classic DS focuses on computation, not
data. This is WRONG; computation is the easy part.

The semantics of a distributed system that is intended to maintain replicated data is best captured in Burckhardt et al’s POPL’14 paper. Since we are familiar with the model, I am not going to reproduce it here.

In the conventional model, a process is the unit of communication and the unit of failure. It is assumed that there is a one-to-one communication link between every pair of processes in the system. The conventional model allows processes to fail. The widely used model of process failure is crash-stop, where a process crashes, stops, and thereby becomes inaccessible. A communication link may also fail, but this failure shows up as a process failure - the processes at both ends of the link perceive each other as failed. However, other processes may still view them as correct processes, and the conventional model allows for this disparity in processes’ views of which other processes have failed.

Replicated data stores are usually geo-distributed, so there are no one-to-one links between processes. Processes are organized into subnets, which are often connected through a single communication link. Therefore, in the distributed web services model, a communication link is considered a unit of failure. Failure of a link can cause the network to be partitioned into disconnected subnets, and web services are often required to tolerate these partitions. For most purposes, a process failure can also be modeled as network partitioning, by simply assuming that the failed process is in its own partition. In summary, in the distributed web services model, faults occur in communication links, thereby leading to network partitions, and fault tolerance effectively means partition tolerance.

Timing Assumptions

Both the conventional and the current models of a distributed system are asynchronous models - they do not make any assumptions about time bounds on communication delays or on (relative) process speeds. In both models, we rely on logical clocks (e.g., vector clocks) and causal ordering to understand the behaviour of the system with respect to the passage of time.

Consistency & Availability

Ideally, a distributed web service has to be consistent in the sense that it should behave the same as if it were operating on a single atomic data object. Any operation performed on the data object should see the effects of all previous operations. For example, consider a web service for a simple read/write register. The operations allowed are write and read, which write a value to the register and read the current value of the register, respectively. Let us assume that the client performs two successful writes of the values 20 and 40 to the register. Then a subsequent read should return 40, failing which the system is deemed inconsistent. This kind of consistency guarantee is called strong consistency or sequential consistency. Under this guarantee, there must exist a total order on all operations such that each operation looks as if it were completed at a single instant.

Along with being consistent, a distributed web service needs to be available - every request made to the web service should be met with a response, as long as the web service remains accessible on the network. To be available, no non-failing node of the distributed system implementing the web service should wait indefinitely for an event to occur before responding to a client request.

The CAP theorem says that it is impossible to guarantee both (strong) consistency and availability in a system that needs to be partition tolerant. First, let us see why this makes sense intuitively. Consider the read/write register web service described previously. Assume that it is implemented by a distributed system with two geo-distributed nodes, each holding a replica of the register. Consider a client which makes the following requests to the web service, in the order shown:

1. write 20
2. write 40
3. read

This order is called the session order. Let us tag each request with its serial number in session order. Assume that write 20 goes to the first node, which immediately writes 20 to its local register and simultaneously forwards the request to the second node. Now, should the first node wait for an acknowledgement from the second node before responding to the client? Given that we are operating in an asynchronous environment and there is no time bound on the delivery of the acknowledgement, the wait time of the first node could potentially be infinite. This would violate the availability guarantee. Therefore, the node should respond to the client immediately, and use timeouts and retransmissions to eventually propagate the client’s request to the second node.

Now consider a scenario where the second write request (write 40) also goes to the first node, but the read request goes to the second node. Assume that by the time the read request is made to the second node, it has already received the write 20 request forwarded by the first node. Looking at the serial numbers (1 for write 20 and 3 for read), the second node knows that there is a request made by the client before the read that it has not yet received. This missing request could be a write (it is indeed a write in this case). Therefore, if the node has to respond with the correct value for the read request, it has to wait until it receives the missing request, in which case it might have to wait forever, as the network may get partitioned in the meantime. This violates availability. The other option is to respond with the current value of the register without waiting for the missing request to be delivered. In this case, the client reads 20 instead of the expected 40, which means that the system is no longer (strongly) consistent. Therefore, it is impossible for our service to be both available and consistent in the presence of network partitions.
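The scenario can be made concrete with a toy simulation (illustrative only; the Replica/node1/node2 names are made up for this sketch):

    # Two replicas of a read/write register; writes are acknowledged immediately
    # and propagated to the other replica asynchronously.
    class Replica:
        def __init__(self):
            self.value = 0
            self.pending = []            # forwarded writes not yet applied here

    node1, node2 = Replica(), Replica()

    def write(node, other, value):
        node.value = value               # respond immediately (availability) ...
        other.pending.append(value)      # ... and forward the write asynchronously

    def read(node):
        return node.value                # respond with the current local value

    write(node1, node2, 20)              # request 1: write 20, served by node1
    node2.value = node2.pending.pop(0)   # the forwarded "write 20" reaches node2
    write(node1, node2, 40)              # request 2: write 40, served by node1
                                         # (the forwarded "write 40" is still in flight)
    print(read(node2))                   # request 3: read, served by node2 -> 20, not 40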

Consistency & Availability in the Conventional Model

The conventional model accounts for network partitioning through process failures - a faulty process that stops communicating with the rest of the processes in the system effectively leads to network partitioning. Fault tolerance is the ability of a system to behave in a well-defined manner once faults occur. There has been extensive research on fault tolerance in conventional models, but does this research include a study of consistency and availability properties in the presence of faults?

The answer is yes. They were studied under the broad categories of safety and liveness properties, respectively. A safety property of a system is usually expressed as a set of legal system configurations, commonly referred to as an invariant. To be safe, a system has to always remain within the set of safe states defined by the safety property. Consistency is a safety property, as it restricts the set of observable states of the system. For the read/write register example, consistency dictates that the observable state of the register after the first two writes is the value 40. Any other state is inconsistent, or unsafe.

On the other hand, a liveness property claims that some good thing will eventually happen during system execution. Liveness properties are eventuality properties - for example, a traffic signal should eventually allow every car waiting at an intersection to pass through. Availability is a liveness property, as it requires every request to be eventually met with a response. In the literature on fault tolerance, the availability property is more commonly referred to as guaranteed service.

Recall that fault tolerance is the ability of a system to behave in a well-defined manner once faults occur. In his landmark survey paper on fault tolerance in distributed systems, Gärtner identifies four forms of fault tolerance, based on which of the safety (S) and liveness (L) properties hold when faults occur in the system. The four are listed below:

  1. Masking fault tolerance: when both S and L hold in presence of faults
  2. Fail-safe: when only S holds
  3. Non-masking: when only L holds
  4. None (or fault intolerant): when none of them hold.

Gärtner’s paper was published in 1999, before CAP was conjectured, and before eventually consistent data stores proliferated. The paper reflects the kind of applications researchers had in mind when studying distributed systems. These applications are quite different from replicated data stores, which explains the disparity between conventional models and current systems. Below, I reproduce some relevant excerpts from the paper, which are suggestive of this gulf:

  fail-safe fault tolerance and is preferable to non-masking
  whenever safety is much more important than liveness. An example
  is the ground control system of the Ariane 5 space missile project
  [Dega 1996]. The system was masking fault tolerance for a single
  component failure, but was also designed to stop in a safe state
  whenever two successive component failures occurred [Dega 1996].
  For the latter type of faults, the launch of the missile
  (liveness) was less important than the protection of its
  precious cargo and launch site (safety).

  ....

  In effect (of non-masking fault tolerance), the user may
  experience a certain amount of incorrect system behavior (i.e.,
  failures). For example, a calculation result will be wrong or a
  replication variable may not be up to date

  ....

  Research has traditionally focused on forms of fault tolerance
  that continuously ensure safety. This can be attributed to the
  fact that in most fault-tolerance applications, safety is much
  more important than liveness.

  ...

  For a long time nonmasking fault tolerance has been the “ugly
  duckling” in the field, as application scenarios for this type of
  fault tolerance are not readily visible. However, the potential of
  non-masking fault tolerance lies in the fact that it is strictly
  weaker than masking fault tolerance, and can therefore be used in
  cases where masking fault tolerance is too costly to implement or
  even provably impossible.

  ...

  (Talking about self-stabilizing programs, which are pre-cursors of
  eventually consistent programs) Examples show that such programs
  are quite difficult to construct and verify [Theel and Gärtner
  1998]. Also, their nonmasking nature has inhibited them from yet
  becoming practically relevant.

  ...

In summary, the paper says that masking fault tolerance, where both safety and liveness are preserved in the presence of faults, is the "strictest and most costly" form of fault tolerance, and that ensuring such tolerance is a "major area of research". Instead, fail-safe fault tolerance is preferable for most practical applications.

The CAP Theorem

Now that we are aware of Gärtner’s categorization of fault tolerance, we can state the CAP theorem simply as:

It is impossible to have masking fault tolerance in an unreliable
distributed system

In the words of Gilbert and Lynch, who gave the first proof of the theorem:

The CAP Theorem, in this light, is simply one example of the
fundamental fact that you cannot achieve both safety
and liveness in an unreliable distributed system
      - From "Perspectives on the CAP Theorem"

It should therefore be noted that the oft-reproduced formulation of the CAP theorem as “pick any two among Consistency, Availability and Partition Tolerance” is misleading at best. A better formulation is:

A distributed system that is network partition tolerant cannot be
consistent and available at the same time.

So, it is more about picking one of the two, rather than picking two of the three.

Eventual Consistency & Its Impact on Availability

Bailis et al’s paper on Highly Available Transactions: Virtues and Limitations classifies operations on a replicated data store as unavailable, sticky-available, and available. An operation, such as a write to a read/write register, that has no consistency requirements is classified as available. This is expected, as the write can be applied to any replica without waiting for any event, and the client can be informed of the success/failure of the write.

Continue reading

Apr 9, 2014 - Coq Basics

GADTs vs Inductive Datatypes

Consider the following definitions of a nat-indexed vector GADT in OCaml and Haskell.

OCaml:

    (*  vec : * -> * -> *  *)
    (* zero and succ are phantom types encoding type-level naturals *)
    type zero
    type _ succ

    type (_,_) vec =
    | Nil : ('a, zero) vec
    | Cons : 'a * ('a,'b) vec -> ('a, 'b succ) vec

Haskell:

    {-# LANGUAGE DataKinds, GADTs, KindSignatures #-}
    data Nat = Zero | Succ Nat

    data Vec :: * -> Nat -> * where
      VNil :: Vec a 'Zero
      VCons :: a -> Vec a n -> Vec a ('Succ n)

Observe that the quantified type variables in the types of the constructors are the ones generalized by the type-constructor signature. In fact, this is not always the case. For example, consider the GADT for higher-order abstract syntax:

  data Exp :: * -> * where
   -- other constructors elided
    App :: Exp (a -> b) -> Exp a -> Exp b

The type scheme of App contains a type variable (a) that is not generalized by the type-constructor signature.

  App :: forall a b. Exp (a -> b) -> Exp a -> Exp b

The type variable a is treated as an existential type variable when a value of type Exp b is destructed and matched against App.

Therefore, the correct observation is this: all type variables generalized by the signature of the type constructor are also generalized in the type schemes of its value constructors.

Here is the nat-indexed vector inductive datatype in Coq:

Inductive vec : Type -> nat -> Type :=
  | vnil : forall A:Type, vec A O
  | vcons : forall A:Type, forall n:nat, A -> vec A n -> vec A (S n).

The above definition should be read as follows: vec is a Type indexed by a Type and a nat. The only ways to construct values of vec X N, where X could be any Type and N could be any nat, are:

  • For any Type A, vnil A is a value of vec A O.
  • For any Type A and any nat n, vcons A n accepts a value of type A and a value of type vec A n, and produces a value of type vec A (S n).

Notice that it looks quite similar to the Haskell definition, with the type variables made explicit. An empty list in explicitly typed Haskell is VNil Int, whereas in Coq it is vnil int. However, unlike Haskell, which has a term-type-kind hierarchy, Coq has full dependent types with an infinite hierarchy of universes. One consequence of this is that we frequently see arrows mapping members of one universe to members of a different universe. For example, if we run Print vec., this is what we see:

    Inductive vec : Type (* Top.59 *) -> nat -> Type (* max(Set, (Top.61)+1, (Top.62)+1) *) :=
      | vnil : forall A : Type (* Top.61 *), vec A 0
      | vcons : forall (A : Type (* Top.62 *)) (n : nat),
                   A -> vec A n -> vec A (S n)

Evidently, vec maps a member of the Type(59) universe and a member of the Set universe (nat) to a type universe at a higher level. Compare this to the kind of the Vec type constructor in Haskell, * -> Nat -> *, which simply says that when a is a type and n is a type-level Nat, Vec a n is a type (i.e., it has kind *).

In the inductive definition of vec, observe that both constructors construct values of type vec A _. Therefore, we can just as well generalize A at the level of the type constructor, making it a parameter (here, this is a mere syntactic convenience):

    Inductive vec2 (A : Type) : nat -> Type :=
      | vnil2 : vec2 A O
      | vcons2 : forall n:nat, A -> vec2 A n -> vec2 A (S n). 

The above definition should be read as follows: vec2 is a Type parameterized by a Type A and indexed by a nat. vnil2 and vcons2 are the only ways to construct values of vec2 A N, where N could be any nat.

Another consequence of having full dependent types in Coq is that the type of inductive type-constructors can be dependent as well. For eg, here is the inductive definition of sig datatype:

    Inductive sig (T : Type) (P : T -> Prop) : Type :=
        exist : forall x : T, P x -> sig T P.

sig T P is the type of values of type T that satisfy the proposition P. sig is a function of a type and of propositions on that type. Observe that the type of the second argument of the type constructor sig depends on the value of its first argument.

How did we arrive at that type definition? Let us say we want to define a refinement type for values of nat satisfying a predicate P. Effectively, we would like to index nat with a predicate P over natural numbers. Since P is a predicate, P : nat -> Prop. We define a new Type natST (read “nat, such that”) that is indexed by P : nat -> Prop:

    Inductive natST (P : nat -> Prop) : Type := ??

What should its constructors be? There has to be only one constructor which, given a nat (n) and a proof that it satisfies the predicate (i.e., a proof of P n), returns evidence for natST P. In other words, we have evidence for natST P if and only if we have a nat and the evidence that it satisfies the predicate. Destructing a natST P value produces a nat and a proof that it satisfies P.

The type natST helps us easily write a type for the division operation:

    Definition natPos : Type := natST (fun n:nat => n>0).
    Definition div : nat -> natPos -> nat * nat.
    Admitted.
    (* equivalent type of div: *)
    Definition div2 : forall n:nat, n>0 -> nat -> (nat*nat).
    Admitted.

Implicit Arguments

The following command turns on the implicit arguments option in Coq:

    Set Implicit Arguments.

When the mode for automatic declaration of implicit arguments is on, the default is to automatically make implicit only the strict implicit arguments (e.g., the type argument of Cons is strict implicit, whereas that of Nil is contextual implicit).

By default, Coq does not automatically make the contextual implicit arguments implicit. To tell Coq to also infer contextual implicit arguments, use the command

    Set Contextual Implicit.

Setting contextual implicit may still not infer the type argument of Nil. Why? How does Coq know whether we want to partially apply Nil or fully apply it? Setting maximal implicit arguments works, but that would prevent us from writing certain partial function applications. Therefore, we explicitly set the implicit arguments of Nil:

    Implicit Arguments Nil [X].

Notation, Associativity, and Precedence

The level is for precedence. Assume, for instance, that we want conjunction to bind tighter than disjunction. This is expressed by assigning a precedence level to each notation, knowing that a lower level binds tighter than a higher level. Hence the level for disjunction must be higher than the level for conjunction. 100 is the highest level, and there is a special level 200 higher than 100.

Since connectives are the least tightly binding articulation points of a text, it is reasonable to choose levels not so far from the highest level, 100: for example, 85 for disjunction and 80 for conjunction.

The Omega Tactic

Omega is a decision procedure for linear arithmetic (also known as Presburger arithmetic). It can be used to discharge simple arithmetic goals on nats and integers, such as forall x y : nat, x < y -> x + 1 <= y.

Omega can deal with nats by converting them to integers. The following operations on nat are recognized by omega:

  • Predicates, such as =, le, lt, gt, ge
  • Functions, such as +, minus, mult, pred, S, O

Omega needs to be imported:

    Require Import Omega.

apply and eapply tactics

apply H tries to unify the current goal with the conclusion of H. If it succeeds, it generates a sub-goal for each assumption (argument) of H. For instance, in the following context:

     A : Type
    s1 : Ensemble A
    s2 : Ensemble A
    s3 : Ensemble A
    x : A
    H2 : In A s1 x
    ============================
     In A (Union A s1 (Union A s2 s3)) x

using apply Union_introl, where

    Union_introl
     : forall (U : Type) (B C : Ensemble U) (x : U),
       In U B x -> In U (Union U B C) x

unifies U with A, B with s1, C with (Union A s2 s3), and x with x, generating the following sub-goal:

       ....
     ============================
        In A s1 x

which is already an assumption.

The eapply tactic does the same, except that instead of failing due to an inability to instantiate variables, it replaces them with existentials. When you have an existential in your context, destruct it to skolemize it. The ?6201-style variables that appear in the goal when you use eapply are existential variables.

The Auto Tactic

The auto tactic solves goals that are solvable by any combination of

  • intros, and
  • apply.

By default, auto only does apply on local hypotheses. To add to the global hint database of auto:

    Hint Resolve <Lemma-name>.
    Hint Resolve <Constructor of an inductive prop>.
    Hint Constructors A.      (* Hint Resolve for all constructors of A *)
    Hint Unfold <A Definition>.

One can optionally declare a hint database using the command Create HintDb.

To see hints applicable in the current context, and list of all hints, respectively:

    Print Hint.
    Print Hint *.

Using auto tactic with database of hints named sets:

    auto with sets.

You can extend the hint database for a single application of auto with the using keyword. To extend it with Union_introl:

    auto using Union_introl.

To extend with all constructors of Union:

    auto using Union.

More avatars of auto:

  • trivial tactic - a non-recursive auto (search depth 0).
  • autounfold and autorewrite use the Hint Unfold and Hint Rewrite hints in the hint database.

Auto Examples

Completely manual proof for theorem union_is_associative:

    Theorem union_is_associative : forall A : Type, forall s1 s2 s3 : Ensemble A,
                         Union _ (Union _ s1 s2) s3 = Union _ s1 (Union _ s2 s3).
    Proof.
      intros; eapply Extensionality_Ensembles. unfold Same_set.
      unfold Included. split.
        intros; inversion H.
            inversion H0. Check Union_introl. apply Union_introl. assumption.
            apply Union_intror. apply Union_introl. assumption.
            apply Union_intror. apply Union_intror. assumption.
        intros. inversion H.
            apply Union_introl. apply Union_introl. assumption.
            inversion H0. 
              apply Union_introl. apply Union_intror; assumption. 
              apply Union_intror. assumption.
    Qed.

Proof with auto:

    Proof.
      intros. apply Extensionality_Ensembles. unfold Same_set.
      unfold Included. split; intros ? H; inversion H;
        [ inversion H0; auto with sets | auto with sets |
         auto with sets | inversion H0; auto with sets].
    Qed.

Even simpler auto proof:

    Proof.
      intros. apply Extensionality_Ensembles. unfold Same_set.
      unfold Included. split; intros ? H; inversion H; subst;
        try (inversion H0); auto with sets.
    Qed.

The SF automation chapter (sfauto) says this about auto:

Note that proof search tactics never perform any rewriting step (tactics rewrite, subst), nor any case analysis on an arbitrary data structure or predicate (tactics destruct and inversion), nor any proof by induction (tactic induction). So, proof search is really intended to automate the final steps from the various branches of a proof. It is not able to discover the overall structure of a proof.

The Solve Tactical

The solve tactical takes a list of tactics and tries them left to right, stopping at the first tactic that succeeds. If none of the tactics succeed, solve fails.

    solve [inversion H | inversion H1].

The difference between try solve [e1 | e2] and try e1; try e2 is the short-circuit behaviour of solve: solve succeeds only if one of the tactics closes the goal completely, so it never leaves partial progress behind.

The Intuition Tactical

intuition t applies a propositional decision procedure to simplify the goal, and then applies the tactic t to the result.

  • intuition fail succeeds if the goal is a propositional tautology, and fails otherwise. This is equivalent to the tauto tactic.
  • Simply using intuition is equivalent to intuition auto with *.

Notes:

  • To instantiate a universally quantified hypothesis H at a variable n, just write (H n). E.g., destruct (H n).
  • When there are multiple rewritable terms in the goal and it is ambiguous which one rewrite will rewrite, assert the required equation as a separate assertion and prove it (using apply).
  • It is futile to state strong lemmas with a conjunction in the consequent. Not only is applying them difficult, but auto also only tries such lemmas when the current goal is a conjunction at the top level. Instead, define two separate lemmas.


Continue reading