From the perspective of a transaction,

- Isolation: how should I see the effects of other transactions?
- Atomicity: how should other transactions see my effects?

One of the most useful features of ML-family languages (OCaml, Standard ML, F#, etc.) is type inference. This post contains the notes I took while trying to understand the foundational principles of ML type inference.

Let us take a look at the original type inference algorithm proposed by Damas and Milner. Their language is the lambda calculus extended with a `let` expression:

```
e ::= x | λx.e | e e | let x=e in e
```

The type language is composed of monotypes (τ), built from type variables (α) and primitive types (ι), and type schemes (σ):

```
τ ::= α | ι | τ → τ
σ ::= τ | ∀α.σ
```

A type substitution is `[τ1/α1, τ2/α2, ... , τn/αn]`, which substitutes the type τᵢ for the type variable αᵢ; note the distinction between α used as a type and α used as a type variable.

We define a relation (≥) that relates a more polymorphic type to a less polymorphic type. For example, `∀α.α→α ≥ ∀β.β list → β list`.

The typing judgment is `Γ⊢e:σ`.

The instantiation rule is also a subtyping rule, where subtyping is defined in terms of "more polymorphic":

```
Γ⊢e:σ₁ σ₁≥σ₂
--------------
Γ⊢e:σ₂
```

Since there is no explicit big-lambda (type abstraction) expression, the generalization rule is important:

```
Γ⊢e:σ α∈FV(σ) α∉FV(Γ)
-----------------------
Γ⊢e:∀α.σ
```

Basically, ∀α.σ is the closure of σ with respect to Γ (denoted \(\bar{\Gamma}(\sigma)\)).

There is only one rule that introduces a polymorphic type into the context Γ, the `let` rule:

```
Γ⊢e:σ Γ,x:σ⊢e':τ
----------------------
Γ⊢(let x=e in e'):τ
```
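
To see why generalization at `let` matters, here is a small illustrative sketch in Haskell, which uses the same style of let-generalization (the binding names are mine):

```haskell
-- At the let, the inferred type of idf (a -> a) is generalized to the
-- scheme forall a. a -> a, so the two uses can be instantiated differently.
poly :: (Int, Bool)
poly = let idf = \x -> x      -- generalized: idf :: forall a. a -> a
       in (idf 1, idf True)   -- instantiated at Int and at Bool

-- By contrast, a lambda-bound variable is assigned a single monotype, so
-- \f -> (f 1, f True) is rejected by the type checker.

main :: IO ()
main = print poly
```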

A few months ago, Cheryl's birthday puzzle became an internet phenomenon. If you found the puzzle tricky to solve, we are in the same boat. However, if you observe, the puzzle only requires us to apply simple logic; not number theory or complex arithmetic, just simple logic. What, then, makes the puzzle tricky?

I attribute its trickiness to the chain of reasoning involved - repeated application of simple logic to the existing knowledge to make new deductions until we arrive at the solution. Since the reasoning is quite involved, it would be great if we could enlist the services of a computer to help us in the process. But how can a computer assist us in making deductions? The answer lies in the well-known SAT problem.

Consider the following simple instance of the famous Knights and Knaves class of puzzles:

```
There is an island in which certain inhabitants, called knights
always tell the truth, and others, called knaves always lie. It is
assumed that every inhabitant of this island is either a knight or
a knave.
Suppose A says, “Either I am a knave or B is a knight.”
What are A and B?
```

The puzzle is easy to solve mentally. Nonetheless, if you have a 2-SAT solver handy, you can also let the solver do the work. Here is how:

Let `AisKnight` and `BisKnight` be boolean variables denoting the propositions "A is a Knight" and "B is a Knight", respectively. Since an inhabitant can be either a Knight or a Knave but not both, the boolean formulas `¬ AisKnight` and `¬ BisKnight` denote the propositions "A is a Knave" and "B is a Knave", respectively.

Now, consider the statement made by A: "Either I am a knave or B is a knight". Symbolically, it is equivalent to `(¬ AisKnight) ∨ BisKnight`. Since Knights always speak the truth, and Knaves always lie, the statement is true if A is a Knight, and false if A is a Knave. In other words, the statement holds if and only if A is a Knight. Symbolically, this means:

```
AisKnight ⇔ (¬ AisKnight) ∨ BisKnight
```

Observe that the above boolean formula *completely* describes the constraints imposed by the puzzle, and does so quite succinctly. If the above formula is satisfiable, then the constraints imposed by the puzzle are also satisfiable, which means that the puzzle has a solution. Furthermore, the satisfying assignment to the literals in the formula (i.e., their truth values) gives us the solution to the puzzle. Finding such satisfying assignments is exactly what a 2-SAT solver is for. If we feed the above formula to a 2-SAT solver, we get `AisKnight = true` and `BisKnight = true` as the satisfying assignment, telling us that A and B are both knights.

Note that the satisfying assignment only tells us that the puzzle is solved if A is a knight. Maybe the puzzle is also solved if A is a knave. How do we check this possibility? Again, using the solver: we explicitly add the constraint that A is a Knave, and ask whether the constraints are satisfiable. The resulting boolean formula is shown below:

```
(AisKnight ⇔ (¬ AisKnight) ∨ BisKnight) ∧ (¬ AisKnight)
```

A 2-SAT solver cannot find any satisfying assignment to the above formula, indicating that our constraints are unsatisfiable. Hence, A can never be a Knave.
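
For an instance this small, we do not even need a real 2-SAT solver; brute-force enumeration of the two boolean variables makes the same point. Here is a minimal Haskell sketch (the encoding and names are mine):

```haskell
-- A's statement holds iff A is a knight: AisKnight <=> (not AisKnight || BisKnight).
statement :: Bool -> Bool -> Bool
statement aIsKnight bIsKnight = aIsKnight == (not aIsKnight || bIsKnight)

-- Enumerate all assignments and keep the satisfying ones (the "models").
models :: [(Bool, Bool)]
models = [ (a, b) | a <- [False, True], b <- [False, True], statement a b ]

main :: IO ()
main = print models   -- prints [(True,True)]: A and B are both knights
```

Since no model assigns `False` to `AisKnight`, conjoining `¬ AisKnight` indeed makes the formula unsatisfiable.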

Now, let's take a step back and summarize what we have done. We were given a puzzle that is not difficult, but not obvious either. Instead of doing the reasoning ourselves, we simply described the puzzle as a boolean formula, and let a SAT solver do the reasoning for us. In other words, we have outsourced the reasoning to a solver. This facility to outsource some or all of the reasoning to a solver can be quite handy when we are solving logical puzzles that require more complex reasoning, like Cheryl's birthday puzzle.

I latex'd some notes while reading Dana N. Xu et al's POPL'09 paper: Static Contract Checking for Haskell.


This post contains my notes on Rahul Sharma et al’s ESOP’13 paper: A Data-Driven Approach for Algebraic Loop Invariants.

The paper proposes a neat approach of inferring algebraic loop invariants by observing concrete program states, and making use of techniques from linear algebra to gain insights.

Following is an example taken from the paper:

```
assume (x=0 && y=0);
writelog (x, y);
while (nondet()) do
  y := y+1;
  x := x+y;
  writelog (x, y);
```

For the above example, we are interested in finding the loop invariant for the `while` loop. Assume we already know that the invariant can be non-linear, but no more complex than degree 2. Let \(f(x,y)\) denote a polynomial of degree less than or equal to 2. The equation \(f(x,y)=0\) is a polynomial (algebraic) equation in the variables `x` and `y`. Since `x` and `y` are the only two variables in scope at the head of the loop, we know that the loop invariant will be of the form \(f_1(x,y)=0 \,\wedge\, f_2(x,y)=0 \,\wedge\, \dots \,\wedge\, f_k(x,y)=0\), where each \(f_i\) is a unique polynomial of degree at most 2. A polynomial of degree 2 is simply a linear combination of monomials of degree at most 2. In this case, we have 6 unique monomials: \(\{1,x,y,x^2,y^2,xy\}\). Therefore, each polynomial will be of the form \(f(x,y) = w_1 \cdot 1 + w_2 x + w_3 y + w_4 x^2 + w_5 y^2 + w_6 xy\).

The invariant inference problem, then, is to infer the rational constants \(w_1-w_6\) such that \(f(x,y)=0\) is a loop invariant.

The paper proposes a data-driven approach to solve this problem. Informally, a loop invariant over-approximates the set of all program states that are possible at the loop head. For the above example program, the program state consists of the values of `x` and `y`. We instrument the code to observe the program state (i.e., the values of `x` and `y`) at the beginning of every iteration of the loop, when the program is executed over test inputs (user-provided or random). We record these observations in a log file.

Next, for every observed program state, we calculate the values of all 6 monomials and record them in a matrix, where a row denotes a program state (i.e., fixed values for `x` and `y`), and a column denotes the values of a fixed monomial across the observed program states. For instance, with the monomials ordered as \(1, x, y, x^2, y^2, xy\), the observed states \((0,0), (1,1), (3,2), (6,3)\) give the following data matrix:

\[ \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 3 & 2 & 9 & 4 & 6 \\ 1 & 6 & 3 & 36 & 9 & 18 \end{pmatrix} \]

Now, any valid solution for \(w_1-w_6\) (and hence for the loop invariant problem) should explain the observed program states. Therefore, the following has to hold, where \(A\) denotes the data matrix and \(W\) the column vector of coefficients \((w_1, \dots, w_6)\): \(A \, W = 0\).
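
Here is a small Haskell sketch of this check (the observed states, monomial ordering, and the candidate coefficient vector below are illustrative, not taken from the paper):

```haskell
-- Monomials of degree at most 2 over (x, y), in the order 1, x, y, x^2, y^2, xy.
monomials :: (Rational, Rational) -> [Rational]
monomials (x, y) = [1, x, y, x*x, y*y, x*y]

-- The data matrix: one row of monomial values per observed program state.
dataMatrix :: [(Rational, Rational)] -> [[Rational]]
dataMatrix = map monomials

-- A candidate coefficient vector w explains the observations iff A*w = 0.
explains :: [[Rational]] -> [Rational] -> Bool
explains a w = all (\row -> sum (zipWith (*) row w) == 0) a

main :: IO ()
main = do
  let states = [(0,0), (1,1), (3,2), (6,3), (10,4)]  -- (x, y) logged at the loop head
      w      = [0, -2, 1, 0, 1, 0]                   -- candidate: -2x + y + y^2 = 0
  print (explains (dataMatrix states) w)             -- True
```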

Observe that any such solution gives us an invariant that holds over the observed program states. However, it may not hold over *all* possible program states. Therefore, solving this equation gives us *candidate invariants*, which we can then verify for validity. But how do we solve the equation? This is where techniques from linear algebra help.

Recall that \(A\) denotes the data matrix and \(W\) the matrix of coefficients (\(w_1-w_6\)). The matrix \(W\) has only one column, and is referred to as a column vector, or simply a vector. The set of all possible solutions for \(W\) (each a vector) is called the *null space* of \(A\). Note that *NullSpace(A)* is a set of vectors. The *span* of any set of vectors \(\{x_1,\dots,x_n\}\) is the set of all vectors that can be expressed as a linear combination of \(\{x_1,\dots,x_n\}\). In other words, the span of a set of vectors is the entire vector space represented by those vectors. Observe that the span of *NullSpace(A)* is *NullSpace(A)* itself, as every linear combination of vectors in *NullSpace(A)* results in a vector \(x\) that satisfies \(Ax=0\). Hence, *NullSpace(A)* is a vector space. The basis of a vector space is a set \(B\) of linearly independent vectors such that every other vector in the space can be expressed as a linear combination of vectors in \(B\). For the 3-D Cartesian vector space, the set \(\{(3,0,0), (0,2,0), (0,0,1)\}\) can be considered a basis. The cardinality of the basis is the dimension of the vector space, as is evident from the basis set of the 3-D Cartesian space.

For the matrix \(A\), let \(B_w\) denote the basis of *NullSpace(A)*. What does the set \(B_w\) represent? Each vector in the set is a unique solution for \(W\) that cannot be expressed as a linear combination of the other solutions. Further, any solution for \(W\) can be expressed as a linear combination of vectors in \(B_w\). Effectively, this means that **by calculating the basis of NullSpace(A), we derive *all* unique algebraic invariants of degree at most 2 that hold on the observed program states**. The invariants are unique in the sense that any other invariant can be expressed as a linear combination of the polynomials denoted by the basis set.

Once candidate solutions are generated by calculating the basis of the null space of \(A\), they are verified via symbolic execution and SMT solvers. A verification failure results in a counterexample, which is simply a valuation of `x` and `y` for which the candidate invariant fails. In that case, the loop is rerun with these values of `x` and `y` and more observations are recorded, thereby growing the data matrix \(A\). Invariants generated by calculating the basis of the null space of the larger data matrix will now definitely account for the counterexample generated by the solver in the previous step. This counterexample-guided iterative refinement (which they call \({\sf Guess-And-Check}\)) is similar in flavour to CEGAR, but there is one fundamental difference. Recall that CEGAR starts with a formula that over-approximates program behaviours, and gradually refines the formula until it is valid, or a bug is found in the program. In contrast, \({\sf Guess-And-Check}\) starts with an under-approximation of program behaviours (represented by the data matrix), and adds more behaviours until we have a sufficient representation of all possible program behaviours. In this sense, \({\sf Guess-And-Check}\) is more similar to CEGIS. The only difference is that, while CEGIS relies on an SMT oracle to generate positive samples, \({\sf Guess-And-Check}\) relies on tests.

In practice, the number of possible monomials at the loop head could be large. The paper uses a heuristic to discard monomials that are unlikely to be part of the final invariant. The heuristic is that monomials which grow at a rate much larger than the loop counter \(i\) (i.e., at a rate \(\omega(i^d)\)) are most often red herrings.

A natural view of the execution of a multi-threaded program is as follows:

```
exn = []
while (there is an unfinished thread) {
  t = select an unfinished thread;
  instr = first(t);
  exn := exn ++ [instr];
  t := rest(t);
}
return exn;
```

Observe that `exn` preserves the program order of all threads. This is sequential consistency. However, there is still non-determinism on line 3. When there are shared variables, this non-determinism is the source of data-races. For example, consider the following program (assume that `x` is initially `0`):

```
[
  [STORE (x,3)],
  [STORE (x,4); PRINT (LOAD x)]
]
```

Since there is non-determinism, either of the `STORE`s can be executed first. The possible sequentially consistent (SC) executions are:

```
1. [STORE(x,3); STORE(x,4); PRINT(LOAD x)]
2. [STORE(x,4); STORE(x,3); PRINT(LOAD x)]
3. [STORE(x,4); PRINT(LOAD x); STORE(x,3)]
```

Notice that `STORE(x,3)` conflicts with `STORE(x,4)` and also with `LOAD x`, as each pair accesses the same memory location and the `STORE` writes to that location. Further, notice that there exists at least one pair of SC executions where one execution is obtained from the other by just reordering an adjacent pair of conflicting operations. Hence the program has a data-race. The above SC executions print 4, 3 and 4, respectively. Therefore, the set of acceptable outputs in a sequentially consistent execution (even in the presence of data-races) is {3,4} (note: the set does not include 0).
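
The set of SC executions can be computed mechanically by enumerating the interleavings of the threads. A minimal Haskell sketch (the operation names and encoding are mine):

```haskell
data Op = Store Int | Print deriving Show

-- All interleavings of the threads' operations that preserve each thread's
-- program order: exactly the sequentially consistent executions.
interleave :: [[a]] -> [[a]]
interleave threads
  | all null threads = [[]]
  | otherwise =
      [ op : rest
      | (before, op:ops, after) <- picks threads
      , rest <- interleave (before ++ [ops] ++ after) ]
  where
    picks ts = [ (take i ts, ts !! i, drop (i+1) ts)
               | i <- [0 .. length ts - 1], not (null (ts !! i)) ]

-- The value printed by an execution: x starts at 0, PRINT reads the latest store.
printedValue :: [Op] -> Int
printedValue = go 0
  where go x (Store v : rest) = go v rest
        go x (Print   : _)    = x
        go x []               = x

main :: IO ()
main = print $ map printedValue (interleave [[Store 3], [Store 4, Print]])
-- prints [4,3,4]: one value per SC execution, so the set of outputs is {3,4}
```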

Let us assume we have locks in our language. Then, the execution of a multi-threaded program can be constructed as follows:

```
exn = []
while (there is an unfinished thread) {
  do {
    t = select an unfinished thread;
  } while (first(t) = LOCK(c) and
           exn = exn' ++ [LOCK(c)] ++ exn'' where
           UNLOCK(c) notin exn'')
  instr = first(t);
  exn := exn ++ [instr];
  t := rest(t);
}
return exn;
```

Locks help us control the amount of non-determinism in the program. For example, for the following program:

```
[
  [LOCK(c); STORE (x,3); UNLOCK(c)],
  [LOCK(c); STORE (x,4); PRINT (LOAD x); UNLOCK(c)]
]
```

Possible executions are:

```
1. [LOCK(c); STORE(x,3); UNLOCK(c);
LOCK(c); STORE(x,4); PRINT(LOAD x); UNLOCK(c)]
2. [LOCK(c); STORE(x,4); PRINT(LOAD x); UNLOCK(c);
LOCK(c); STORE(x,3); UNLOCK(c)]
```

The only acceptable output of the program now is 4. Further, there are no adjacent conflicting operations in any of the above SC executions. So, the program is DR-free.

A data-race-free (DRF) model guarantees SC only for data-race-free programs. A program is DRF if and only if it is impossible to construct an SC execution with concurrent conflicting memory accesses, where one of them is a `STORE`. (When do we say two memory accesses are concurrent?)

Sequential consistency requires that all data operations appear to have executed atomically in some sequential order that is consistent with the order seen at every individual process.

If instead of individual data operations, we apply sequential consistency to transactions, the resultant condition is called serializability in database theory.

Linearizability imposes more constraints on the order decided by sequential consistency - the order needs to make sense to an external observer. Say an external observer is running two threads `A` and `B`. Suppose thread `A` performs action `a0`, then syncs with thread `B`, following which `B` performs `b0`. A sequentially consistent execution can order `b0` before `a0`, as long as both `A` and `B` perceive the same order. On the other hand, such an execution is not valid under the linearizability condition. Linearizability dictates that an operation should take effect instantaneously before the perceived end of the operation.

Evidently, linearizability is a stronger constraint than sequential consistency (or serializability). Nevertheless, it is said that linearizability is a local property (i.e., composable), while SC is not. (I don't understand this. If you know what this means, please drop a comment.)

This post is a compilation of my notes on two landmark papers:

- The Lisp paper: McCarthy J, Recursive Functions of Symbolic Expressions and Their Execution by Machine, CACM’60
- Interpreters paper: Reynolds J C, Definitional Interpreters for Higher-Order Programming Languages, ACM’72

This paper by John McCarthy introduces the Lisp programming language and describes its implementation on IBM 704 machines. Lisp stands out among the programming languages of its day (eg: Algol 60) as the first high-level programming language that abstracts the underlying machine completely. Instead, Lisp allows symbolic computation by letting programmers define (recursive) functions over symbolic expressions (called S-expressions). An S-expression is either an atomic symbol (hence “symbolic”), or an ordered pair of S-expressions. Reserving an atomic symbol NIL to denote an empty list, we can construct a list of S-expressions:

```
[s1,s2,s3]
```

using ordered pairs as:

```
(s1,(s2,(s3,NIL)))
```

Observe that the above is also an S-expression. Hence, a list of S-expressions is also an S-expression. Since most functions operate on a list of symbols, programming such functions in Lisp is effectively LISt Processing (hence, the name "Lisp").

Lisp defines five primitive S-functions to construct & destruct lists, and to check equality among atomic symbols:

- atom
- eq
- car
- cdr
- cons

Lisp, as defined in the paper, includes a class of expressions called Meta-expressions (written M-expressions) to let us define and apply S-functions over S-expressions. An M-expression is either

- A function application. For eg, f[2; EMPTY] is an M-expression that denotes the application of function f over the S-expressions of atomic symbols 2 and EMPTY.
- A function definition, with grammar *fname[x0, …, xn] = expr*, and
- A conditional expression: *[expr1a -> expr1b; …; exprna -> exprnb]*

Using M-expressions and primitive S-functions, we can define a subst function to substitute an S-expression (x) for an atomic symbol (y) in another expression (z) as:

```
subst[x;y;z] = [
atom[z] -> [
eq[z;y] -> x;
T -> z
]
T -> cons [subst[x; y; car[z]];
subst[x; y; cdr[z]]]
]
```
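
For comparison, here is a near-literal transliteration of the same function into Haskell, with S-expressions as an algebraic data type (the type and names are mine):

```haskell
-- An S-expression is an atomic symbol or an ordered pair of S-expressions.
data SExp = Atom String | Cons SExp SExp deriving (Eq, Show)

-- subst[x;y;z]: substitute x for every occurrence of the atomic symbol y in z.
subst :: SExp -> SExp -> SExp -> SExp
subst x y z@(Atom _) = if z == y then x else z
subst x y (Cons a d) = Cons (subst x y a) (subst x y d)
```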

This example demonstrates that while M-expressions are control structures (functions & conditional expressions), S-expressions represent data to be manipulated by M-expressions.

The definition (semantics) of Lisp is given as an interpreter program written in Lisp itself. McCarthy's Lisp interpreter, implemented as the mutually recursive *eval* and *apply* functions, is often cited as an example of metacircular evaluation - defining the evaluation semantics of an embedded language in terms of the evaluation semantics of the host language, which happens to be the same as the embedded language. Interestingly, McCarthy defines Lisp as a dynamically scoped language (Paul Graham says so, but is it really? Reynolds's 2nd interpreter, which he says is similar to McCarthy's, is lexically scoped.).

In order to use the metacircular Lisp interpreter to interpret a Lisp program, the program first has to be converted to data. In other words, all M-expressions need to be translated to S-expressions (note: this means that M-expressions are only syntactic sugar). A simple algorithm to perform this translation is described in the paper. The algorithm converts all function names and their arguments in the user's Lisp program (which are meta-variables) to atomic symbols. The interpreter (the *eval* function, to be specific) maintains an environment which defines bindings for these atomic symbols. However, note that the original program may also contain atomic symbols, and these symbols have no attached interpretation. To distinguish between atomic symbols that are present in the user program and atomic symbols that are introduced while converting M-expressions to S-expressions, the former set of atomic symbols are quoted.

As mentioned previously, Lisp is the first language that abstracted the underlying machine completely. In order to be able to do that, the run-time should be able to manage scarce machine resources, such as memory. Towards this end, Lisp introduced automated memory reclamation, which is now commonly known as “Garbage Collection”.

The Lisp paper defines the semantics of Lisp using a metacircular interpreter written in Lisp itself. The definitional interpreters paper by Reynolds exposes subtle aspects of such definitional interpreters. In particular, it shows how the semantics of the embedded language (called the defined language) can depend on the semantics of the host language (called the defining language), such as its treatment of higher-order functions and its order of function application. To demonstrate this point, the paper first constructs a metacircular interpreter for a higher-order (defined) language in another higher-order (defining) language, by simply translating an expression (eg: a function application) in the defined language into the corresponding expression in the defining language. The paper makes two observations about this interpreter:

- The nature of higher-order functions is not clear.
- The order of application is not explicit; the order of evaluating function applications in defined language depends on whether the defining language is call-by-value or call-by-name.

Subsequently, Reynolds proposes two transformations to make the treatment of higher-order functions, and order of function applications explicit in the semantics of defined language:

- Defunctionalization: A function in the defined language is not a function in the defining language. Instead, functions and function closures are represented explicitly as data structures (for eg, as S-expressions in the Lisp interpreter), and the semantics of a function application is defined explicitly, as a transformation over data. Defunctionalization effectively lets us write a definitional interpreter for a higher-order language in a first-order language.
- CPS transformation: A continuation is a function that explicitly represents the "rest of the computation". At every atomic step of evaluation in a definitional interpreter, if we explicitly describe what the "rest of the evaluation" is, then the semantics of the defined language no longer depends on the order of evaluation of the defining language; the order of evaluation is now explicit in the interpreter. Such an interpreter is said to be written in continuation passing style (CPS). (Both transformations are illustrated in the sketch below.)
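
Here is a minimal sketch of these two ideas for a tiny lambda-calculus evaluator in Haskell (not Reynolds's exact interpreters; the names are mine). Defined-language functions are represented as `Closure` data rather than host functions, and control is threaded through an explicit continuation, so the defined language is call-by-value regardless of the defining language's evaluation order (the continuations themselves are still host functions here, unlike a fully defunctionalized interpreter):

```haskell
data Term = Var String | Lam String Term | App Term Term

type Env   = [(String, Value)]
data Value = Closure String Term Env   -- a function value, as a data structure

-- Evaluate a term in CPS: the continuation k receives the term's value.
eval :: Term -> Env -> (Value -> r) -> r
eval (Var x)   env k = maybe (error ("unbound " ++ x)) k (lookup x env)
eval (Lam x b) env k = k (Closure x b env)
eval (App f a) env k =
  eval f env $ \(Closure x b env') ->   -- first evaluate the function ...
    eval a env $ \v ->                  -- ... then the argument (order is explicit)
      eval b ((x, v) : env') k          -- ... then the body, in the closure's env

-- ((\x. x) (\y. y)) evaluates to the identity closure.
example :: Value
example = eval (App (Lam "x" (Var "x")) (Lam "y" (Var "y"))) [] id
```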

The paper also describes how imperative features, such as assignments, can be encoded explicitly in a definitional interpreter written in applicative style.

My intention in writing this note is to understand the relation between the conventional model of distributed systems, as usually taught in a distributed systems course, and the distributed web services hosting replicated datatypes. Fault tolerance is a concern in the former, and it is studied separately from communication link failures. On the other hand, partition tolerance is a concern in the latter, and it looks like this means tolerance to communication failures. Consistency and availability are major concerns in distributed web services, whereas they don't figure anywhere in the formal study of distributed systems. What is the connection? How can we express the CAP theorem in conventional models?

The conventional model of a distributed system is that of a replicated state machine (automaton), where the automaton models a local computation that *reacts* to external messages. This makes it clear that the focus in the conventional model is on how to perform a replicated computation rather than on how to maintain replicated data.

(Image from the Reliable Distributed Systems textbook.)

This view is also corroborated by Eric Brewer in his PODC 2000 talk, where he conjectured CAP. Here are some excerpts from his slides:

Inktomi builds distributed systems ... but very little use of classic DS research. "Distributed Systems" don't work. Persistent state is hard. Classic DS focuses on computation, not data. This is WRONG; computation is the easy part.

The semantics of a distributed system that is intended to maintain replicated data is best captured in Burckhardt et al's POPL'14 paper. Since we are familiar with the model, I am not going to reproduce it here.

In the conventional model, a process is the unit of communication and the unit of failure. It is assumed that there is a one-to-one communication link between every pair of processes in the system. The conventional model allows for processes to fail. The widely used model of process failure is crash-stop, where a process crashes, stops, and therefore becomes inaccessible. A communication link may also fail, but this failure shows up as a process failure - processes at both ends of the link perceive each other as failed processes. However, other processes may still view them as correct processes, and the conventional model allows for this disparity in processes' views of which other processes have failed.

Replicated data stores are usually geo-distributed, so there do not exist one-to-one links between processes. Processes are organized into subnets, which are often connected through a single communication link. Therefore, in the distributed web services model, a communication link is considered a unit of failure. Failure of a link can cause the network to be *partitioned* into disconnected subnets, and web services are often required to tolerate these partitions. For most purposes, a process failure can also be modeled as network partitioning by simply assuming that the failed process is in its own partition. In summary, in the distributed web services model, faults occur in communication links thereby leading to network partitions, and fault tolerance effectively means partition tolerance.

Both the conventional and the current models of a distributed system are asynchronous models - they do not make any assumptions about time bounds on communication delays or (relative) process speeds. In both models, we rely on logical clocks (eg: vector clocks) and causal ordering to understand the behaviour of the system with respect to the passage of time.

Ideally, a distributed web-service has to be consistent in the sense that it should behave as if it were operating on an atomic data object. Any operation performed on the data object should see the effects of all previous operations. For example, consider a web-service for a simple read/write register. The operations allowed are *write* and *read*, which write a value to the register and read the current value in the register, respectively. Let us assume that the client performs two successful writes of values 20 and 40 to the register. Then a subsequent *read* should return 40, failing which the system is deemed inconsistent. This kind of consistency guarantee is called *strong consistency* or *sequential consistency*. Under this consistency guarantee, there must exist a total order on all operations such that each operation looks as if it were completed at a single instant.

Along with being consistent, a distributed web-service needs to be available - every request made to the web-service should be met with a response, given that the web-service remains accessible on the network. To be available, no non-failing node in the distributed system implementing the web-service should wait infinitely for an event to occur before responding to a client request.

The CAP theorem says that it is impossible to guarantee (strong) consistency and availability in a system that needs to be partition tolerant. First, let us see if this makes sense intuitively. Consider the read/write register web-service described previously. Assume that it is implemented by a distributed system with two geo-distributed nodes, each holding a replica of the register. Consider a client which makes the following requests to the web-service, in the order shown:

```
1. write 20
2. write 40
3. read
```

This order is called the session order. Let us tag each request with its serial number in session order. Assume that `write 20` goes to the first node, which immediately writes 20 to its local register and simultaneously forwards the request to the second node. Now, should the first node wait for the acknowledgement from the second node before responding to the client? Given that we are operating in an asynchronous environment and there is no time bound on the delivery of the acknowledgement, the wait time for the first node could potentially be infinite. This leads to the violation of the availability guarantee. Therefore, the node should respond immediately to the client, and use timeouts and retransmissions to eventually propagate the client request to the second node.

Consider a scenario where the second write request (`write 40`) also goes to the first node, but the `read` request goes to the second node. Assume that by the time the `read` request was made to the second node, it had already received the `write 20` request forwarded by the first node. Looking at their serial numbers (1 for `write 20` and 3 for `read`), the second node knows that there is a request made by the client before the `read` that it has not yet received. This missing request could be a `write` (it is indeed a `write` in this case). Therefore, if the node has to respond with a correct value for the `read` request, it has to wait until it receives the missing request, in which case it might have to wait infinitely, as the network may get partitioned in the meantime. This violates availability. The other option is to respond with the current value of the register without waiting for the second request to be delivered. In this case, the client reads 20 instead of the expected 40, which means that the system is no longer (strongly) consistent. Therefore, it is impossible for our service to be both available and consistent in the presence of network partitions.

The conventional model accounts for network partitioning through process failures - a faulty process that stops communicating with the rest of the processes in the system effectively leads to network partitioning. Fault tolerance is the ability of a system to behave in a well-defined manner once faults occur. There has been extensive research on fault tolerance in conventional models, but does this research include a study of consistency and availability properties in the presence of faults?

The answer is yes. They were studied under broad categories of safety
and liveness properties, respectively. A safety property of a system
is usually expressed as a set of legal system configurations, commonly
referred to as an *invariant*. To be safe, a system has to always
remain in the set of safe states as defined by the safety property.
Consistency is a safety property as it restricts the set of observable
states of the system. For the read/write register example, consistency
dictates that the observable state of the register after the first two
writes is the value 40. Any other state is inconsistent or unsafe.

On the other hand, a liveness property claims that some good thing will
eventually happen during system execution. Liveness properties are
*eventuality* properties - a traffic signal should *eventually* allow
every car waiting at an intersection to pass through. Availability is
a liveness property, as it requires every request to be eventually met
with a response. In literature on fault tolerance, availability
property is more commonly referred to as *guaranteed service*.

Recall that fault tolerance is the ability of a system to behave in a well-defined manner once faults occur. In his landmark survey paper on fault tolerance in distributed systems, Gartner identifies four forms of fault tolerance, based on which of the safety (S) and liveness (L) properties hold when faults occur in the system. The four are listed below:

- Masking fault tolerance: when both S and L hold in presence of faults
- Fail-safe: when only S holds
- Non-masking: when only L holds
- None (or fault intolerant): when none of them hold.

Gartner's paper was published in 1999, before CAP was conjectured, and before eventually consistent data stores proliferated. The paper reflects the kind of applications that researchers had in mind when studying distributed systems. These applications are quite different from replicated data stores, which explains the disparity between conventional models and current systems. Below, I reproduce verbatim some of the relevant excerpts from the paper, which are suggestive of this gulf:

fail-safe fault tolerance and is preferable to non-masking whenever safety is much more important than liveness. An example is the ground control system of the Ariane 5 space missile project [Dega 1996]. The system was masking fault tolerance for a single component failure, but was also designed to stop in a safe state whenever two successive component failures occurred [Dega 1996]. For the latter type of faults, the launch of the missile (liveness) was less important than the protection of its precious cargo and launch site (safety). ... In effect (of non-masking fault tolerance), the user may experience a certain amount of incorrect system behavior (i.e., failures). For example, a calculation result will be wrong or a replication variable may not be up to date. ... Research has traditionally focused on forms of fault tolerance that continuously ensure safety. This can be attributed to the fact that in most fault-tolerance applications, safety is much more important than liveness. ... For a long time nonmasking fault tolerance has been the "ugly duckling" in the field, as application scenarios for this type of fault tolerance are not readily visible. However, the potential of non-masking fault tolerance lies in the fact that it is strictly weaker than masking fault tolerance, and can therefore be used in cases where masking fault tolerance is too costly to implement or even provably impossible. ... (Talking about self-stabilizing programs, which are precursors of eventually consistent programs) Examples show that such programs are quite difficult to construct and verify [Theel and Gärtner 1998]. Also, their nonmasking nature has inhibited them from yet becoming practically relevant. ...

In summary, the paper says that masking fault tolerance, where both safety and liveness are preserved in the presence of faults, is the "strictest and most costly" form of fault tolerance, and that ensuring such tolerance is a "major area of research". Instead, fail-safe fault tolerance is preferable for most practical applications.

Now that we are aware of Gartner’s categorization of fault tolerance, we can state the CAP theorem simply as:

```
It is impossible to have masking fault tolerance in an unreliable
distributed system
```

In the words of Gilbert and Lynch, who gave the first proof of the theorem:

```
The CAP Theorem, in this light, is simply one example of the
fundamental fact that you cannot achieve both safety
and liveness in an unreliable distributed system
- From "Perspectives on the CAP Theorem"
```

It should therefore be noted that the oft-reproduced formulation of the CAP theorem as "pick any two among Consistency, Availability and Partition Tolerance" is misleading at best. A better formulation is:

```
A distributed system that is network partition tolerant cannot be
consistent and available at the same time.
```

So, it is more about picking one among the two rather than picking two among the three.

Bailis et al's paper on *Highly Available Transactions: Virtues and Limitations* classifies operations on a replicated data store as unavailable, sticky-available, and available. An operation, such as a *write* on a read/write register, that has no requirements on consistency is classified as available. This is expected, as the *write* can be applied to any replica without any need to wait for an event, and the client can be informed of the success/failure of the write immediately.
Consider the definition of a Nat-indexed vector GADT, which can be written in both OCaml and Haskell.
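
A minimal Haskell sketch of such a definition (the type-level naturals are encoded as empty data types, and the constructor names are assumed; the discussion below refers to `VNil` and to the kind of `Vec`):

```haskell
{-# LANGUAGE GADTs #-}

-- Type-level naturals, used only as indices.
data Z
data S n

-- A length-indexed vector: the index n records the length in the type.
data Vec a n where
  VNil  :: Vec a Z
  VCons :: a -> Vec a n -> Vec a (S n)

-- The index rules out taking the head of an empty vector at compile time.
vhead :: Vec a (S n) -> a
vhead (VCons x _) = x
```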

Observe that the quantified type variables in the types of the constructors are generalized by the type-constructor signature. In fact, this is not always the case. For example, consider a GADT for higher-order abstract syntax:
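
A sketch of such a GADT in Haskell (again, the constructor set is illustrative; the discussion below refers to the `Exp` type and its `App` constructor):

```haskell
{-# LANGUAGE GADTs #-}

data Exp b where
  Lam :: (Exp a -> Exp b) -> Exp (a -> b)   -- binders as host-language functions
  App :: Exp (a -> b) -> Exp a -> Exp b     -- 'a' does not occur in the result type
```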

The type scheme of *App* contains a type variable (*a*) that is not generalized by the type-constructor signature. This type variable is treated as an existential when an expression of type `Exp b` is destructed and matched against `App`.

Therefore, the correct observation is this: all type variables generalized by the signature of the type constructor are also generalized in the type schemes of its value constructors (but not necessarily vice versa).

Here is the nat-indexed vector inductive datatype in Coq:

The above definition should be read as follows: `vec` is a `Type` indexed by a `Type` and a `nat`. Here are the only ways to construct values of `Vec X N`, where X could be any `Type` and N could be any `nat`:

- For any `Type` *A*, `vnil A` is a value of `Vec A O`.
- For any `Type` *A* and for any `nat` *n*, `vcons A n` accepts a value of type *A* and a value of type `vec A n`, and generates a value of type `Vec A (S n)`.

Notice that it looks quite similar to the Haskell definition, with the type variables explicated. An empty list in explicitly typed Haskell is `VNil Int`, whereas in Coq it is `vnil int`. However, unlike Haskell, where there is a term-type-kind hierarchy, Coq has full dependent types with an infinite hierarchy of universes. One consequence of this is that we will frequently see arrows mapping members of one universe to members of a different universe. For example, if we `Print vec.`, this is what we see:

Evidently, `vec` maps a member of the Type(59) universe and a member of the Set universe to a type universe at a higher level. Compare this to the kind of the `Vec` type constructor in Haskell, `* -> * -> *`, which simply says that when *a* is a type and *n* is a type (i.e., *a* and *n* have kind `*`), then `Vec a n` is a type (i.e., it has kind `*`).

In the inductive definition of `vec`, observe that both constructors construct values of type `vec A _`. Therefore, we can as well generalize *A* at the type of the type-constructor (here, this is a mere syntactic convenience):

The above definition should be read as follows: `vec` is a `Type` indexed by a `Type` *A* and a `nat`. Here are the only ways to construct values of `Vec A N`, where N could be any `nat` …

Another consequence of having full dependent types in Coq is that the type of an inductive type-constructor can be dependent as well. For eg, here is the inductive definition of the `sig` datatype:

`sig T P` is the type of values of type *T* that satisfy the proposition *P*. `sig` is a function of a type and of propositions on *that* type. Observe that the type of the second argument to the type-constructor `sig` depends on the value of its first argument.

How did we arrive at that type definition? Let us say we want to define a refinement type for values of `nat` satisfying a predicate *P*. Effectively, we would like to index `nat` with a predicate *P* over natural numbers. Since *P* is a predicate, `P : nat -> Prop`. We define a new `Type` *natST* (read "nat, such that") that is indexed by `P : nat -> Prop`:

What should its constructors be? There has to be only one constructor that, given a nat (*n*) and a proof that it satisfies the predicate (i.e., a proof of `P n`), returns evidence for `natST P`. In other words, we have evidence for `natST P` if and only if we have a nat and the evidence that it satisfies the predicate. Destructing a `natST P` value would produce a nat and a proof that it satisfies *P*.

The type `natST` helps us easily write a type for the division operation:

The following sets the implicit arguments option in Coq.

When the mode for automatic declaration of implicit arguments is on, the default is to automatically set implicit only the strict implicit arguments (eg: the type argument of `Cons` is strict implicit, whereas that of `Nil` is contextual implicit).

By default, Coq does not automatically set implicit the contextual implicit arguments. To tell Coq to also infer contextual implicit arguments, use the command

Setting contextual implicit too may not infer the type annotation for Nil. Why? How does Coq know whether we want to partially apply Nil or fully apply it? Setting Maximal Implicit works, but that would prevent us from writing certain partial function applications. Therefore, we explicitly set implicit args for Nil:

The level is for precedence. Assume for instance that we want conjunction to bind more than disjunction. This is expressed by assigning a precedence level to each notation, knowing that a lower level binds more than a higher level. Hence the level for disjunction must be higher than the level for conjunction. 100 is the highest level. There is a special 200 level higher than 100.

Since connectives are the less tight articulation points of a text, it is reasonable to choose levels not so far from the higher level which is 100, for example 85 for disjunction and 80 for conjunction

Omega is a decision procedure for linear arithmetic (also known as Presburger arithmetic). It can be used to discharge any silly goals on nats and integers like:

Omega can deal with nats; it converts nats to integers. The following operations on nat are recognized by omega:

- Predicates, such as =, le, lt, gt, ge
- Functions, such as +, minus, mult, pred, S, O

Omega needs to be imported:

`apply H` tries to unify the current goal with the conclusion of H. If it succeeds, it generates a sub-goal for each assumption (argument) of H. For instance, in the following context:

using `apply Union_introl`, where `Union_introl : forall (U : Type) (B C : Ensemble U) (x : U), In U B x -> In U (Union U B C) x`, unifies U with A, B with s1, C with (Union A s2 s3), and x with x, generating the sub-goal `In A s1 x`, which is already an assumption.

The tactic `eapply` does the same, except that instead of failing due to an inability to instantiate variables, it replaces them with existentials. When you have an existential in your context, destruct it to skolemize. The `?6201`-style variables that appear in the goal when you use eapply are actually existential variables.

The auto tactic solves goals that are solvable by any combination of

- intros, and
- apply.

By default, auto only does `apply` on local hypotheses. To add to the global hint database of auto:

One can optionally declare a hint database using the command `Create HintDb`.

To see the hints applicable in the current context, and the list of all hints, respectively:

Using the auto tactic with the hint database named sets:

You can extend the hint DB for one application of auto with the `using` keyword. To extend with `Union_introl`:

To extend with all constructors of `Union`:

More avatars of auto:

- `trivial` tactic: non-recursive auto (depth = 0).
- `autounfold` and `autorewrite`: use the `Hint Unfold` and `Hint Rewrite` hints in the hint database.

A completely manual proof of the theorem union_is_associative:

A proof with auto:

An even simpler auto proof:

The SF automation chapter (sfauto) says this on `auto`:

Note that proof search tactics never perform any rewriting step (tactics rewrite, subst), nor any case analysis on an arbitrary data structure or predicate (tactics destruct and inversion), nor any proof by induction (tactic induction). So, proof search is really intended to automate the final steps from the various branches of a proof. It is not able to discover the overall structure of a proof.

The `solve` tactical takes a list of tactics and tries them left to right, stopping when a tactic is successful. If none of the tactics succeeds, `solve` fails.

The difference between `try solve [e1 | e2]` and `try e1; try e2` is the short-cut behaviour of solve.

`intuition t` applies a propositional decision procedure to simplify the goal, and then applies the tactic `t` to the result.

- `intuition fail` succeeds if the goal is a propositional tautology, and fails otherwise. This is equivalent to the `tauto` tactic.
- Simply using `intuition` is equivalent to using `intuition auto with *`.

- To instantiate a universally quantified hypothesis H on a variable n, just do (H n). Eg: `destruct (H n).`
- When there is ambiguity about which term `rewrite` rewrites (because there are multiple rewritable terms in the goal), assert the required hypothesis as a separate assertion, and prove it (using apply).
- It is futile to define strong lemmas with a conjunction in the consequent. Not only is applying them difficult, but `auto` only tries such lemmas when the current goal is a conjunction at the top level. Instead, define two separate lemmas.

- Universes chapter in CPDT.
