Introduction to (Quantitative?) (Relational?) Semantics

Guy McCusker, University of Bath

Denotational semantics

\[ \def\sem#1{{[\![#1]\!]}} \]

Denotational semantics is the study of programming languages or proof systems via their models.

The idea is to define an interpretation function that takes a program or proof to an element of the model: \[ \sem{-}: \mathcal{L} \rightarrow \mathcal{M} \]

Why do this?

Understanding programs via models

Programs are typically defined via an inductive grammar and perhaps a type system.

This tells you which things are programs.

But it doesn’t tell you what properties they have: we don’t just want to know what they are but what they are like.

For example: do lambda-terms behave like functions? Can we explain the way in which they are functions? A model can.

Plan

  • Denotational semantics of typed λ-calculus using sets and functions
  • Relational semantics, as introduced by Scott
  • Multiset-relational semantics
  • Quantitative (weighted) relational semantics

Typed lambda-terms as functions

Consider a typed lambda-calculus. Types are given by \[ A, B ::= \gamma \mid A \rightarrow B \]

Terms are given by the grammar \[ M ::= x \mid \lambda x^A. M | MM \]

We can define typing rules:

Gamma proves that x:T if x:T is part of Gamma

if Gamma, x:T1 proves M:T'2 then Gamma proves lambda x.M:T1=>T2

if Gamma proves M:T1=>T2 and N:T1, then Gamma proves MN:T2

We can then define α-equivalence, β-reduction, etc.

A simple model

Let us set up a model.

For each ground type \(\gamma\), choose a set \(\sem{\gamma}\).

We plan to show that λ-terms behave like functions so we will define \[ \sem{A \rightarrow B} = \{f \mid f:\sem{A}\rightarrow \sem{B}~\text{is a function}\} \]

We write \(X \Rightarrow Y\) for the set of functions from \(X\) to \(Y\); so this is \(\sem{A}\Rightarrow\sem{B}\).

For each well-typed term \(x_1:A_1, \ldots, x_n:A_n \vdash M:B\) we will define its meaning: a function \[ \sem{M} :\sem{A_1} \times \cdots \times \sem{A_n} \rightarrow \sem{B} \]

Idea: a term with free variables gives us a function whose inputs are the meanings of its variables.

We will define this function compositionally. That is, the meaning of a term will be given by a semantic operation applied to the meanings of its subterms:

\begin{eqnarray*} \sem{\Gamma \vdash x} & = & \text{something}\\ \sem{\Gamma \vdash \lambda x.M:A\rightarrow B} & = & \Lambda(\sem{\Gamma, x:A\vdash M:B}) \\ \sem{\Gamma \vdash MN} & = & @ (\sem{\Gamma\vdash M}, \sem{\Gamma\vdash N}) \\ \end{eqnarray*}

So far we seem to have achieved nothing… But if we can complete our plan, we will have shown:

  • every λ-term describes a function
  • the meaning of the type \(A \rightarrow B\) is the set of functions from \(\sem{A}\) to \(\sem{B}\)
  • not only does every term have a meaning, every term former has a meaning (compositionality)

And we can do more: stay tuned

Completing the semantics

  • \(\sem{x_1:A_1, \ldots, x_n:A_n \vdash x_i:A_i} = \pi_i\), the \(i\)-th projection
  • Given \(f: A\times B \rightarrow C\), define \(\Lambda f: A \rightarrow (B \Rightarrow C)\) by

\[ \Lambda f(a) = g:B \rightarrow C~\text{where}~g(b) = f(a,b) \]

  • Given \(f:A \rightarrow (B \Rightarrow C)\) and \(g:A \rightarrow B\), define

\[ @(f,g)(a) = f(a)g(a). \]

Note that if \(f\) does not depend on \(a\) then the clause for application reduces to function composition.

What about reduction?

We have shown that lambda-terms can be used as notation for functions. But we can show that the λ-calculus is consistent with this idea:

Soundness for β-reduction says that \[ \text{if}~M \rightarrow_\beta N~\text{then}~\sem{M} = \sem{N}. \] This is fairly easy to show for our semantics. The proof hinges on understanding the meaning of substitutions \(M[N/x]\).

Composition is substitution

Suppose we have \[ y:A \vdash N:B \quad\quad x:B \vdash M:C \]

The semantics gives us meanings \[ \sem{N}: \sem{A} \rightarrow \sem{B} \quad\quad \sem{M}:\sem{B}\rightarrow \sem{C} \]

\(\sem{x:B \vdash M:C}\) is a function whose input is meant to represent the meaning of the variable \(x\). So if we supply \(\sem{N}\) as the input \[ \sem{M} \circ \sem{N} : \sem{A} \rightarrow \sem{C} \] we should get the result of replacing \(x\) with \(N\): \[ \sem{M[N/x]} \]

This can be proved by induction on the structure of the term \(M\). Most of the hard work comes from the fact that capture-avoiding substitution is tricky.

Note that composition of functions is a lot less tricky!

To show that β-reduction is modelled soundly, we argue:

  • application is interpreted by (a variant of) composition
  • composition is substitution
  • therefore application becomes substitution: \(\sem{(\lambda x.M)N} = \sem{M[N/x]}\).

(A lot more detail is required.)

What about completeness?

We know that if \(M =_\beta N\) then \(\sem{M} = \sem{N}\).

What about the converse? If terms have equal meanings, are they equal in the lambda-calculus?

This is a much harder problem! But there are answers:

Theorem [Friedman 1975] Let \(\Gamma \vdash M:\alpha\) and \(\Gamma\vdash N:\alpha\) be two terms and let \(\sem{M}\) and \(\sem{N}\) be their denotations in the model of sets and functions, where \(\sem{\gamma} = \mathbf{N}\), the set of natural numbers

Then \(\sem{M} = \sem{N}\) if and only if \(M =_{\beta\eta} N\).

Scott’s Graph Model

In the late 1960s, Scott wanted to model untyped lambda-calculus.

When every term has its meaning in the same set \(D\), the approach we’ve taken means you want to be able to view a function \(D \rightarrow D\) as an element of \(D\).

This is impossible: for any set with more than one element, the set \(D \Rightarrow D\) is strictly larger than \(D\).

Scott found more than one solution to this problem and set off a programme of research which is still alive today. We’ll look at one of his solutions.

From functions to relations

Scott realised that functions can be viewed as collections of input-output entries.

More generally, computations can be viewed as collections of sub-computations: computing \[ f(0) + f(1) \] involves:

  • evaluating \(f\) at 0
  • evaluating \(f\) at 1
  • adding the answers

If we know that \(f\) contains \((0,3)\) and \((1,9)\) then we know this computation returns \(12\).

We are going to model this as something like a pair \[ ( \{ (\{0\},3),(\{1\},9)\}, 12 ) \]

Let’s see how.

Big idea: getting a single piece of information out should require only a finite amount of input-information.

So we can model lambda-terms using relations of the form \[ \mathcal{P}_f A \rightarrow B \]

A term of type \(B\) with no free variables would be a set of elements of \(B\).

Modelling untyped terms

We can use the set of natural numbers to give every term the same (semantic) type.

When the types \(A\) and \(B\) are both the set \(\mathbf{N}\), we can encode any pair \[ (S,n) \] as a natural number.

Then a relation (= set of such pairs) is encoded as a set of natural numbers – which is the same thing that models a term!

So we achieve the desired embedding of functions on terms into terms

Composition in the relational model

We’ll look at how these relations compose. First we recall relational composition.

Given relations \(f: A\rightarrow B\) and \(g:B \rightarrow C\) we can define \[ g \circ f = \{ (a,c) \mid \exists (a,b) \in f, (b,c) \in g\} \]

This directly extends function composition.

But this does not let us compose the interpretations we have in mind!

Instead we have relations of the form: \[ f : \mathcal{P}_f A \rightarrow B \quad\quad g : \mathcal{P}_f B \rightarrow C \] Note the type mismatch in the middle.

But we can try composing like this: \(g \circ f\) is \[ \{ (S_1 \cup \ldots \cup S_n,c) \mid (S_i,b_i) \in f, (\{b_1,\ldots,b_n\},c) \in g \} \]

graphcomp.png

\[ (S_1 \cup \cdots \cup S_n, c) \in g \circ f \]

Reworking our semantics

We can (almost) now make a model using these ideas:

  • types are interpreted as sets
  • the type \(A \rightarrow B\) is interpreted as the set of relations from \(\mathcal{P}_f \sem{A}\) to \(\sem{B}\).
  • we can define the meaning of application using the composition idea above (details skipped!)

Problem

This almost works as stated. However, there is a small problem.

The enumeration \(\{b_1,\ldots,b_n\}\) is not unique because we could repeat elements without changing anything.

This means that composition with a relation like \(\{ (\{a\},a) \mid a \in A\}\) — which might be the interpretation of \(x:A \vdash x:A\) — does not behave as we want.

This can be fixed (Scott fixed it!); instead, we’ll just try something slightly different.

From powersets to multisets

The issue above arises because \(\mathcal{P}_f A\) does not count the multiplicity of its members.

Replacing \(\mathcal{P}_f\) by \(\mathcal{M}_f\) fixes it immediately.

\( \mathcal{M}_f A = \) finite multisets drawn from \(A\): “subsets with multiplicities”.

We get a model in which a term \[ x_1:A_1, \ldots, x_n:A_n \vdash M:B \] is interpreted as a set of tuples \[ (S_1, \ldots, S_n, b) \] where:

  • \(b\) is an “output event”
  • each \(S_i\) is a multiset of “input events” that the term needs to obtain from its variables, in order to produce \(b\).

Examples

Assuming we have a type of Booleans: a term \[ x:B, y:B \vdash x \wedge y:B \] would include tuples

\begin{array}{ccc} ( [\mathbf{true}], & [\mathbf{true}], & \mathbf{true}) \\ ([\mathbf{true}], & [\mathbf{false}], & \mathbf{false}) \\ ([\mathbf{false}], & [\mathbf{true}], & \mathbf{false}) \\ ([\mathbf{false}], & [\mathbf{false}], & \mathbf{false}) \\ \end{array}

And the term \(x:B \vdash x \wedge x:B\) would include

\begin{array}{cc} ( [\mathbf{true},\mathbf{true}], & \mathbf{true}) \\ ([\mathbf{true},\mathbf{false}], & \mathbf{false}) \\ ([\mathbf{false},\mathbf{true}], & \mathbf{false}) \\ ([\mathbf{false},\mathbf{false}], & \mathbf{false}) \\ \end{array}

Notice that the second and third lines are equal: the same entry appears for more than one reason. This will be interesting later.

We can now distinguish between the “full evaluation” semantics of \(\wedge\) and a “short-circuit” semantics that gives \(\mathbf{false}\) as soon as it sees a \(\mathbf{false}\) input:

In that semantics the term \(x:B \vdash x \wedge x:B\) would include

\begin{array}{ccl} ( [\mathbf{true},\mathbf{true}], & \mathbf{true}) \\ ([\mathbf{true},\mathbf{false}], & \mathbf{false}) &\text{now only once!}\\ ([\mathbf{false}], & \mathbf{false}) \\ \end{array}

Notice also that the model handles nondeterminism.

No deterministic program can be substituted for \(x\) and deliver \([\mathbf{false},\mathbf{true}]\).

So the model is flexible (good!) but perhaps too large (bad?).

History

This model seems to have been known since the late 1980s, where it arose as an early model of linear logic.

Girard wrote about a slightly more sophisticated version of this model — which eliminates the nondeterminism using coherence spaces — in 1995.

The linear logic point of view splits our definition of composition into two:

  • first “promote” \(f:\mathcal{M}_f A \rightarrow B\) to a relation \[ f^!: \mathcal{M}_f A \rightarrow \mathcal{M}_f B \]
  • then compose this with \(g: \mathcal{M}_f B \rightarrow C\) using ordinary relational composition.

From relations to weighted relations

We can view relations as matrices:

  • a relation from \(A\) to \(B\) is a matrix indexed by \(A\) and \(B\), with entries in the Booleans.
  • the entry at \((a,b)\) is \(\mathbf{true}\) if \((a,b)\) is in the relation
  • relational composition is matrix multiplication!

\[ (a,c) \in g \circ f \Leftrightarrow \exists b\in B. (a,b)\in f \wedge (b,c) \in g \]

\[ (g\circ f)_{(a,c)} = \sum_{b \in B} f_{(a,b)} \times g_{(b,c)} \] if we read the sum as logical disjunction and product as conjunction.

Idea: generalise from the Booleans

More generally we can construct this model using matrices over any structure that lets us compute the required sums and products.

For example: the natural numbers.

(To handle infinite sets, which may require infinite sums, we add an \(\infty\) element)

Now our model can tell us

  • what input-output computations a term can perform; but also
  • how many ways the term can perform a computation: the weight of its entry in the matrix

Composition

wgraphcomp.png

\[ (S_1 \cup \cdots \cup S_n, c) \in g \circ f \] This entry contributes weight \(w \times \prod_{i=1}^n w_i\).

The full composition weighs an entry this with the sum of all possible ways to achieve this: \((g \circ f)_{(T,c)}\) is given by \[ \sum_{T = S_1 \cup \cdots \cup S_n} \sum_{b_1,\ldots,b_n} g_{([b_1,\ldots,b_n],c)}\prod_i f_{(S_i,b_i)} \]

Example, again

In this model, the term \[ x:B, y:B \vdash x \wedge y:B \] has weights as follows:

\begin{array}{cccc} ( [\mathbf{true}], & [\mathbf{true}], & \mathbf{true}): & 1 \\ ([\mathbf{true}], & [\mathbf{false}], & \mathbf{false}): & 1 \\ ([\mathbf{false}], & [\mathbf{true}], & \mathbf{false}): & 1 \\ ([\mathbf{false}], & [\mathbf{false}], & \mathbf{false}):& 1 \\ \end{array}

But when we identify \(x\) and \(y\) to get the term \(x:B \vdash x \wedge x:B\), we have

\begin{array}{ccc} ( [\mathbf{true},\mathbf{true}], & \mathbf{true}): & 1 \\ ([\mathbf{true},\mathbf{false}], & \mathbf{false}):& 2 \\ ([\mathbf{false},\mathbf{false}], & \mathbf{false}):& 1 \\ \end{array}

Varieties of models

This turns out to be a relatively simple algebraic generalisation of the relational model.

With it, we obtain a range of models that let us calculate interesting facts about lambda-terms:

  • using \(\mathbf{N}\) with \(\times\) and \(+\): how many paths are there?
  • using \(\mathbf{N}\) with \(+\) and \(\min\): what is the length of the shortest path?
  • using \(\mathbf{R}\) with \(\times\) and \(+\): what is the probability of termination?
  • … and so on.

History

  • These ideas were first explained by Lamarche in 1992.
  • They cropped up again in Ehrhard and Regnier’s work on differential linear logic and lambda-calculus around ten years later
  • Ten more years later, we (Laird, Manzonetto, Pagani + M) showed how the ideas could extend to model programming languages with recursion
  • And the probabilistic model turned out to be complete (Ehrhard, Tasson, Pagani)

Summary

  • Denotational semantics models programs compositionally
  • Models can be built from sets and functions
  • … or from sets and relations
  • Quite good quantitative models are available
  • And there is a lot more out there