Guy McCusker, University of Bath

\[ \def\sem#1{{[\![#1]\!]}} \]

*Denotational semantics* is the study of programming languages or proof systems via their *models*.

The idea is to define an *interpretation function* that takes a program or proof to an element of the model:
\[
\sem{-}: \mathcal{L} \rightarrow \mathcal{M}
\]

Why do this?

Programs are typically defined via an inductive *grammar* and perhaps a *type system*.

This tells you *which things are programs*.

But it doesn’t tell you what *properties* they have: we don’t just want to know what they *are* but what they are *like*.

For example: do lambda-terms behave like functions? Can we explain the way in which they are functions? A model can.

- Denotational semantics of typed λ-calculus using sets and functions
- Relational semantics, as introduced by Scott
- Multiset-relational semantics
- Quantitative (weighted) relational semantics

Consider a typed lambda-calculus. Types are given by \[ A, B ::= \gamma \mid A \rightarrow B \]

Terms are given by the grammar \[ M ::= x \mid \lambda x^A. M \mid M\,M \]
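As a concrete sketch, the grammar transcribes directly into Haskell datatypes (the constructor names here are chosen for illustration, with a single ground type):

```haskell
-- Types: the ground type gamma and function types A -> B.
data Type = Gamma | Arrow Type Type
  deriving (Eq, Show)

-- Terms: variables, typed abstractions \x^A. M, and applications M N.
data Term
  = Var String
  | Lam String Type Term
  | App Term Term
  deriving Show
```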

We can define typing rules:
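In the usual presentation there is one rule per term former:

\[
\frac{(x:A) \in \Gamma}{\Gamma \vdash x : A}
\qquad
\frac{\Gamma, x:A \vdash M : B}{\Gamma \vdash \lambda x^A. M : A \rightarrow B}
\qquad
\frac{\Gamma \vdash M : A \rightarrow B \quad\quad \Gamma \vdash N : A}{\Gamma \vdash M\,N : B}
\]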

We can then define α-equivalence, β-reduction, etc.

Let us set up a model.

For each ground type \(\gamma\), choose a set \(\sem{\gamma}\).

We plan to show that λ-terms behave like functions, so we will define \[ \sem{A \rightarrow B} = \{f \mid f:\sem{A}\rightarrow \sem{B}~\text{is a function}\} \]

We write \(X \Rightarrow Y\) for the set of functions from \(X\) to \(Y\); so this is \(\sem{A}\Rightarrow\sem{B}\).

For each well-typed term \(x_1:A_1, \ldots, x_n:A_n \vdash M:B\) we will define its *meaning*: a function
\[
\sem{M} :\sem{A_1} \times \cdots \times \sem{A_n} \rightarrow \sem{B}
\]

Idea: a term with free variables gives us a *function* whose inputs are the meanings of its *variables*.

We will define this function *compositionally*. That is, the meaning of a term will be given by a *semantic operation* applied to the meanings of its subterms:

\begin{eqnarray*}
\sem{\Gamma \vdash x} & = & \text{something}\\
\sem{\Gamma \vdash \lambda x.M:A\rightarrow B} & = & \Lambda(\sem{\Gamma, x:A\vdash M:B}) \\
\sem{\Gamma \vdash MN} & = & @ (\sem{\Gamma\vdash M}, \sem{\Gamma\vdash N}) \\
\end{eqnarray*}

So far we seem to have achieved nothing… But if we can complete our plan, we will have shown:

- every λ-term describes a function
- the meaning of the type \(A \rightarrow B\) is the set of functions from \(\sem{A}\) to \(\sem{B}\)
- not only does every *term* have a meaning, every *term former* has a meaning (compositionality)

And we can do more: stay tuned

- \(\sem{x_1:A_1, \ldots, x_n:A_n \vdash x_i:A_i} = \pi_i\), the \(i\)-th projection
- Given \(f: A\times B \rightarrow C\), define \(\Lambda f: A \rightarrow (B \Rightarrow C)\) by

\[ \Lambda f(a) = g:B \rightarrow C~\text{where}~g(b) = f(a,b) \]

- Given \(f:A \rightarrow (B \Rightarrow C)\) and \(g:A \rightarrow B\), define

\[ @(f,g)(a) = f(a)g(a). \]

Note that if \(f\) does not depend on \(a\) then the clause for application reduces to *function composition*.
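To make the plan concrete, here is a minimal interpreter sketch in Haskell, reusing the `Term` datatype above and reading the ground type as `Integer` (the representation is illustrative, not canonical). Variables are looked up in an environment, abstraction is the currying operation \(\Lambda\), and application is \(@\):

```haskell
-- Semantic values: a ground-type value, or a genuine Haskell function,
-- so that [[A -> B]] is represented by VFun.
data Value = VGround Integer | VFun (Value -> Value)

-- The meanings of the free variables: a point of [[A1]] x ... x [[An]].
type Env = [(String, Value)]

eval :: Env -> Term -> Value
eval env (Var x) =                 -- projection: pick out the meaning of x
  maybe (error ("unbound variable " ++ x)) id (lookup x env)
eval env (Lam x _ m) =             -- Lambda: feed in the argument as the value of x
  VFun (\v -> eval ((x, v) : env) m)
eval env (App m n) =               -- @: apply the meaning of M to the meaning of N
  case eval env m of
    VFun f -> f (eval env n)
    _      -> error "applying a non-function"
```

Because abstractions are interpreted by actual Haskell functions, β-reduction is invisible to `eval`: a redex and its reduct receive the same semantic value, which is the soundness property discussed below.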

We have shown that lambda-terms can be used as *notation* for functions. We can also show that the λ-calculus itself is consistent with this idea:

*Soundness* for β-reduction says that
\[
\text{if}~M \rightarrow_\beta N~\text{then}~\sem{M} = \sem{N}.
\]
This is fairly easy to show for our semantics. The proof hinges on understanding the meaning of substitutions \(M[N/x]\).

Suppose we have \[ y:A \vdash N:B \quad\quad x:B \vdash M:C \]

The semantics gives us *meanings*
\[
\sem{N}: \sem{A} \rightarrow \sem{B} \quad\quad \sem{M}:\sem{B}\rightarrow \sem{C}
\]

\(\sem{x:B \vdash M:C}\) is a function whose input is meant to represent the meaning of the variable \(x\). So if we supply \(\sem{N}\) as the input \[ \sem{M} \circ \sem{N} : \sem{A} \rightarrow \sem{C} \] we should get the result of replacing \(x\) with \(N\): \[ \sem{M[N/x]} \]

This can be proved by induction on the structure of the term \(M\). Most of the hard work comes from the fact that capture-avoiding substitution is tricky.

Note that composition of functions is a lot less tricky!

To show that β-reduction is modelled soundly, we argue:

- application is interpreted by (a variant of) composition
- composition is substitution
- therefore application becomes substitution: \(\sem{(\lambda x.M)N} = \sem{M[N/x]}\).

(A lot more detail is required.)

We know that if \(M =_\beta N\) then \(\sem{M} = \sem{N}\).

What about the converse? If terms have equal meanings, are they equal in the lambda-calculus?

This is a much harder problem! But there are answers:

**Theorem** [Friedman 1975]
Let \(\Gamma \vdash M:\alpha\) and \(\Gamma\vdash N:\alpha\) be two terms
and let \(\sem{M}\) and \(\sem{N}\) be their denotations in
the model of sets and functions, where \(\sem{\gamma} = \mathbf{N}\), the set of natural numbers.

Then \(\sem{M} = \sem{N}\) if and only if \(M =_{\beta\eta} N\).

In the late 1960s, Scott wanted to model *untyped lambda-calculus*.

When every term has its meaning in the same set \(D\), the approach we’ve taken means you want
to be able to view a *function* \(D \rightarrow D\) as an element of \(D\).

This is impossible: for any set with more than one element, the set \(D \Rightarrow D\) is strictly larger than \(D\).

Scott found more than one solution to this problem and set off a programme of research which is still alive today. We’ll look at one of his solutions.

Scott realised that functions can be viewed as *collections* of input-output entries.

More generally, computations can be viewed as collections of sub-computations: computing \[ f(0) + f(1) \] involves:

- evaluating \(f\) at 0
- evaluating \(f\) at 1
- adding the answers

If we know that \(f\) contains \((0,3)\) and \((1,9)\) then we know this computation returns \(12\).

We are going to model this as something like a pair \[ ( \{ (\{0\},3),(\{1\},9)\}, 12 ) \]

Let’s see how.

Big idea: getting a single piece of information out should require only a *finite* amount of input-information.

So we can model lambda-terms using *relations* of the form
\[
\mathcal{P}_f A \rightarrow B
\]
where \(\mathcal{P}_f A\) is the set of *finite subsets* of \(A\).

A term of type \(B\) with no free variables would be a *set* of elements of \(B\).

We can use the set of natural numbers to give every term the same (semantic) type.

When the types \(A\) and \(B\) are both the set \(\mathbf{N}\), we can *encode* any pair
\[
(S,n)
\]
as a natural number.
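For instance, here is one possible coding, sketched in Haskell (the particular choice of a bit-set code and the Cantor pairing function is for illustration only, not necessarily Scott's):

```haskell
import Data.List (nub)

-- A finite set of naturals is coded by the number whose binary digits mark
-- its members.
setCode :: [Integer] -> Integer
setCode s = sum [ 2 ^ i | i <- nub s ]

-- The Cantor pairing function: a bijection between pairs of naturals and naturals.
cantorPair :: Integer -> Integer -> Integer
cantorPair m n = (m + n) * (m + n + 1) `div` 2 + n

-- Code for an entry (S, n): a finite set of inputs S and an output n.
encodeEntry :: [Integer] -> Integer -> Integer
encodeEntry s n = cantorPair (setCode s) n
```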

Then a relation (= set of such pairs) is encoded as a set of natural numbers – which is the same thing that models a term!

So we achieve the desired embedding of *functions on terms* into *terms*

We’ll look at how these relations compose. First we recall relational composition.

Given relations \(f: A\rightarrow B\) and \(g:B \rightarrow C\) we can define \[ g \circ f = \{ (a,c) \mid \exists (a,b) \in f, (b,c) \in g\} \]

This directly extends function composition.
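As a quick sanity check, the definition transcribes directly into Haskell, representing a relation from \(A\) to \(B\) as a list of pairs (an illustrative representation only):

```haskell
import Data.List (nub)

-- Relational composition: (a, c) is related when some b witnesses both steps.
composeR :: (Eq a, Eq b, Eq c) => [(a, b)] -> [(b, c)] -> [(a, c)]
composeR f g = nub [ (a, c) | (a, b) <- f, (b', c) <- g, b == b' ]
```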

But this does not let us compose the interpretations we have in mind!

Instead we have relations of the form: \[ f : \mathcal{P}_f A \rightarrow B \quad\quad g : \mathcal{P}_f B \rightarrow C \] Note the type mismatch in the middle.

But we can try composing like this: \(g \circ f\) is \[ \{ (S_1 \cup \ldots \cup S_n,c) \mid (S_i,b_i) \in f, (\{b_1,\ldots,b_n\},c) \in g \} \]

That is, if \((S_i,b_i) \in f\) for each \(i = 1, \ldots, n\) and \((\{b_1,\ldots,b_n\},c) \in g\), then \[ (S_1 \cup \cdots \cup S_n, c) \in g \circ f. \]

We can (almost) now make a model using these ideas:

- types are interpreted as sets
- the type \(A \rightarrow B\) is interpreted as the set of relations from \(\mathcal{P}_f \sem{A}\) to \(\sem{B}\).
- we can define the meaning of application using the composition idea above (details skipped!)

This almost works as stated. However, there is a small problem.

The enumeration \(\{b_1,\ldots,b_n\}\) is not unique because we could repeat elements without changing anything.

This means that composition with a relation like \(\{ (\{a\},a) \mid a \in A\}\) — which might be the interpretation of \(x:A \vdash x:A\) — does not behave as we want.

This can be fixed (Scott fixed it!); instead, we’ll just try something slightly different.

The issue above arises because \(\mathcal{P}_f A\) does not count the *multiplicity* of its members.

Replacing \(\mathcal{P}_f\) by \(\mathcal{M}_f\) fixes it immediately.

\( \mathcal{M}_f A = \) finite multisets drawn from \(A\): “subsets with multiplicities”.

We get a model in which a term \[ x_1:A_1, \ldots, x_n:A_n \vdash M:B \] is interpreted as a set of tuples \[ (S_1, \ldots, S_n, b) \] where:

- \(b\) is an “output event”
- each \(S_i\) is a *multiset* of “input events” that the term needs to obtain from its variables, in order to produce \(b\).

Assuming we have a type of Booleans: a term \[ x:B, y:B \vdash x \wedge y:B \] would include tuples

\begin{array}{ccc}
( [\mathbf{true}], & [\mathbf{true}], & \mathbf{true}) \\
([\mathbf{true}], & [\mathbf{false}], & \mathbf{false}) \\
([\mathbf{false}], & [\mathbf{true}], & \mathbf{false}) \\
([\mathbf{false}], & [\mathbf{false}], & \mathbf{false}) \\
\end{array}

And the term \(x:B \vdash x \wedge x:B\) would include

\begin{array}{cc}
( [\mathbf{true},\mathbf{true}], & \mathbf{true}) \\
([\mathbf{true},\mathbf{false}], & \mathbf{false}) \\
([\mathbf{false},\mathbf{true}], & \mathbf{false}) \\
([\mathbf{false},\mathbf{false}], & \mathbf{false}) \\
\end{array}

Notice that the second and third lines are equal: the same entry appears for more than one reason. This will be interesting later.

We can now distinguish between the “full evaluation” semantics of \(\wedge\) and a “short-circuit” semantics that gives \(\mathbf{false}\) as soon as it sees a \(\mathbf{false}\) input:

In that semantics the term \(x:B \vdash x \wedge x:B\) would include

\begin{array}{ccl}
( [\mathbf{true},\mathbf{true}], & \mathbf{true}) \\
([\mathbf{true},\mathbf{false}], & \mathbf{false}) &\text{now only once!}\\
([\mathbf{false}], & \mathbf{false}) \\
\end{array}

Notice also that the model handles *nondeterminism*.

No deterministic program can be substituted for \(x\) and deliver \([\mathbf{false},\mathbf{true}]\).

So the model is flexible (good!) but perhaps too large (bad?).

This model seems to have been known since the late 1980s, when it arose as an early model of *linear logic*.

Girard wrote about a slightly more sophisticated version of this model — which eliminates the nondeterminism using *coherence spaces* — in 1995.

The linear logic point of view splits our definition of composition into two:

- first “promote” \(f:\mathcal{M}_f A \rightarrow B\) to a relation \[ f^!: \mathcal{M}_f A \rightarrow \mathcal{M}_f B \]
- then compose this with \(g: \mathcal{M}_f B \rightarrow C\) using ordinary relational composition.
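Concretely, the promotion can be described as follows (a sketch, writing \(\cup\) for multiset union as before):
\[
f^! = \{\, (S_1 \cup \cdots \cup S_n,\ [b_1,\ldots,b_n]) \mid n \geq 0,\ (S_i,b_i) \in f \ \text{for each}\ i \,\}
\]
Composing \(f^!\) with \(g\) by ordinary relational composition then reproduces the composition given earlier.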

We can view relations as matrices:

- a relation from \(A\) to \(B\) is a matrix indexed by \(A\) and \(B\), with entries in the Booleans.
- the entry at \((a,b)\) is \(\mathbf{true}\) if \((a,b)\) is in the relation
- relational composition is matrix multiplication!

\[ (a,c) \in g \circ f \Leftrightarrow \exists b\in B. (a,b)\in f \wedge (b,c) \in g \]

\[ (g\circ f)_{(a,c)} = \sum_{b \in B} f_{(a,b)} \times g_{(b,c)} \] if we read the sum as logical disjunction and product as conjunction.

More generally we can construct this model using matrices over *any* structure that lets us compute the required sums and products.

For example: the natural numbers.

(To handle infinite sets, which may require infinite sums, we add an \(\infty\) element)
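Here is a sketch of that idea in Haskell. The `Semiring` class and the names are chosen for illustration; matrices are represented as functions of their indices, and the middle index set is passed explicitly so that the sum is finite:

```haskell
-- Any structure with the required sums and products.
class Semiring r where
  zero  :: r
  one   :: r
  (.+.) :: r -> r -> r
  (.*.) :: r -> r -> r

-- Relations: sum is "or", product is "and".
instance Semiring Bool where
  zero = False
  one  = True
  (.+.) = (||)
  (.*.) = (&&)

-- Counting: how many witnesses b connect a to c.
instance Semiring Integer where
  zero = 0
  one  = 1
  (.+.) = (+)
  (.*.) = (*)

-- A matrix indexed by a and b, with entries in r.
type Matrix a b r = a -> b -> r

-- Composition is matrix multiplication: sum over the middle index.
composeM :: Semiring r => [b] -> Matrix a b r -> Matrix b c r -> Matrix a c r
composeM bs f g a c = foldr (.+.) zero [ f a b .*. g b c | b <- bs ]
```

With the `Bool` instance this is exactly relational composition; with the `Integer` instance it counts how many ways each entry can arise.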

Now our model can tell us:

- what input-output computations a term can perform; but also
- *how many ways* the term can perform a computation: the *weight* of its entry in the matrix

If \(f\) contains entries \((S_i,b_i)\) with weights \(w_i\), and \(g\) contains the entry \(([b_1,\ldots,b_n],c)\) with weight \(w\), then as before \[ (S_1 \cup \cdots \cup S_n, c) \in g \circ f \] and this entry contributes weight \(w \times \prod_{i=1}^n w_i\).

The full composition weights an entry with the *sum over all the ways* of achieving it:
\((g \circ f)_{(T,c)}\) is given by
\[
\sum_{T = S_1 \cup \cdots \cup S_n} \sum_{b_1,\ldots,b_n} g_{([b_1,\ldots,b_n],c)}\prod_i f_{(S_i,b_i)}
\]

In this model, the term \[ x:B, y:B \vdash x \wedge y:B \] has weights as follows:

\begin{array}{cccc}
( [\mathbf{true}], & [\mathbf{true}], & \mathbf{true}): & 1 \\
([\mathbf{true}], & [\mathbf{false}], & \mathbf{false}): & 1 \\
([\mathbf{false}], & [\mathbf{true}], & \mathbf{false}): & 1 \\
([\mathbf{false}], & [\mathbf{false}], & \mathbf{false}):& 1 \\
\end{array}

But when we identify \(x\) and \(y\) to get the term \(x:B \vdash x \wedge x:B\), we have

\begin{array}{ccc}
( [\mathbf{true},\mathbf{true}], & \mathbf{true}): & 1 \\
([\mathbf{true},\mathbf{false}], & \mathbf{false}):& 2 \\
([\mathbf{false},\mathbf{false}], & \mathbf{false}):& 1 \\
\end{array}

This turns out to be a relatively simple algebraic generalisation of the relational model.

With it, we obtain a range of models that let us calculate interesting facts about lambda-terms:

- using \(\mathbf{N}\) with \(\times\) and \(+\): how many paths are there?
- using \(\mathbf{N}\) with \(+\) and \(\min\): what is the length of the shortest path? (sketched after this list)
- using \(\mathbf{R}\) with \(\times\) and \(+\): what is the probability of termination?
- … and so on.
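As one example, the shortest-path reading mentioned above can be expressed by reusing the `Semiring` class from the earlier matrix sketch (the `Tropical` type is illustrative; `Nothing` plays the role of the missing \(\infty\), meaning "no path"):

```haskell
-- Weights are path lengths; min is the "sum" and + is the "product".
newtype Tropical = Tropical (Maybe Integer) deriving Show

instance Semiring Tropical where
  zero = Tropical Nothing            -- no path at all
  one  = Tropical (Just 0)           -- the empty path
  Tropical x .+. Tropical y = Tropical (minMaybe x y)
    where
      minMaybe Nothing  b        = b
      minMaybe a        Nothing  = a
      minMaybe (Just a) (Just b) = Just (min a b)
  Tropical x .*. Tropical y = Tropical ((+) <$> x <*> y)
```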

- These ideas were first explained by Lamarche in 1992.
- They cropped up again in Ehrhard and Regnier’s work on differential linear logic and lambda-calculus around ten years later
- Ten more years later, we (Laird, Manzonetto, Pagani + M) showed how the ideas could extend to model programming languages with recursion
- And the probabilistic model turned out to be *complete* (Ehrhard, Tasson, Pagani)

- Denotational semantics models programs *compositionally*
- Models can be built from sets and functions
- … or from sets and relations
- Quite good quantitative models are available
- And there is a *lot* more out there