Guy McCusker, University of Bath
\[ \def\sem#1{{[\![#1]\!]}} \]
Denotational semantics is the study of programming languages or proof systems via their models.
The idea is to define an interpretation function that takes a program or proof to an element of the model: \[ \sem{-}: \mathcal{L} \rightarrow \mathcal{M} \]
Why do this?
Programs are typically defined via an inductive grammar and perhaps a type system.
This tells you which things are programs.
But it doesn’t tell you what properties they have: we don’t just want to know what they are but what they are like.
For example: do lambda-terms behave like functions? Can we explain the way in which they are functions? A model can.
Consider a typed lambda-calculus. Types are given by \[ A, B ::= \gamma \mid A \rightarrow B \]
Terms are given by the grammar \[ M ::= x \mid \lambda x^A. M \mid M\,M \]
We can define typing rules:
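The rules are the usual ones: a variable has the type its context declares, abstraction introduces an arrow type, and application eliminates one. \[ \frac{}{\Gamma, x:A \vdash x : A} \qquad \frac{\Gamma, x:A \vdash M : B}{\Gamma \vdash \lambda x^A. M : A \rightarrow B} \qquad \frac{\Gamma \vdash M : A \rightarrow B \quad \Gamma \vdash N : A}{\Gamma \vdash M\,N : B} \]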
We can then define α-equivalence, β-reduction, etc.
Let us set up a model.
For each ground type \(\gamma\), choose a set \(\sem{\gamma}\).
We plan to show that λ-terms behave like functions so we will define \[ \sem{A \rightarrow B} = \{f \mid f:\sem{A}\rightarrow \sem{B}~\text{is a function}\} \]
We write \(X \Rightarrow Y\) for the set of functions from \(X\) to \(Y\); so this is \(\sem{A}\Rightarrow\sem{B}\).
For each well-typed term \(x_1:A_1, \ldots, x_n:A_n \vdash M:B\) we will define its meaning: a function \[ \sem{M} :\sem{A_1} \times \cdots \times \sem{A_n} \rightarrow \sem{B} \]
Idea: a term with free variables gives us a function whose inputs are the meanings of its variables.
We will define this function compositionally. That is, the meaning of a term will be given by a semantic operation applied to the meanings of its subterms: a variable is interpreted as a projection, \[ \sem{x_i}(a_1,\ldots,a_n) = a_i, \] while abstraction and application are interpreted using two operations \(\Lambda\) and \(@\) defined below: \[ \sem{\lambda x^A. M} = \Lambda\sem{M} \qquad \sem{M\,N} = @(\sem{M},\sem{N}). \]
So far we seem to have achieved nothing… But if we can complete our plan, we will have shown that λ-terms really can be read as notation for functions. And we can do more: stay tuned.
The operation \(\Lambda\) is currying: given \(f : A \times B \rightarrow C\), \[ \Lambda f(a) = g:B \rightarrow C~\text{where}~g(b) = f(a,b). \] The operation \(@\) applies one function to another, pointwise: \[ @(f,g)(a) = f(a)(g(a)). \]
Note that if \(f\) does not depend on \(a\) then the clause for application reduces to function composition.
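These two operations are just ordinary currying and pointwise application. Here is a minimal Haskell rendering (the names `lam` and `app` are mine), with the environment of free variables made explicit:

```haskell
-- Lambda: currying, the operation called Λ above
lam :: ((env, a) -> b) -> env -> a -> b
lam f env a = f (env, a)

-- Application: the operation called @ above, pointwise in the environment
app :: (env -> a -> b) -> (env -> a) -> env -> b
app f g env = f env (g env)
```

Note that `app (lam f) g env` computes `f (env, g env)`: the argument's meaning is fed into the body's meaning, which is exactly what soundness for β-reduction will require.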
We have shown that lambda-terms can be used as notation for functions. We can also show that the λ-calculus itself is consistent with this idea:
Soundness for β-reduction says that \[ \text{if}~M \rightarrow_\beta N~\text{then}~\sem{M} = \sem{N}. \] This is fairly easy to show for our semantics. The proof hinges on understanding the meaning of substitutions \(M[N/x]\).
Suppose we have \[ y:A \vdash N:B \quad\quad x:B \vdash M:C \]
The semantics gives us meanings \[ \sem{N}: \sem{A} \rightarrow \sem{B} \quad\quad \sem{M}:\sem{B}\rightarrow \sem{C} \]
\(\sem{x:B \vdash M:C}\) is a function whose input is meant to represent the meaning of the variable \(x\). So if we supply \(\sem{N}\) as the input \[ \sem{M} \circ \sem{N} : \sem{A} \rightarrow \sem{C} \] we should get the result of replacing \(x\) with \(N\): \[ \sem{M[N/x]} \]
This can be proved by induction on the structure of the term \(M\). Most of the hard work comes from the fact that capture-avoiding substitution is tricky.
Note that composition of functions is a lot less tricky!
To show that β-reduction is modelled soundly, we argue:
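In outline, writing \(a\) for a tuple of values for the remaining free variables, the semantic clauses and the substitution lemma give \[ \sem{(\lambda x^A. M)\,N}(a) = \Lambda\sem{M}(a)(\sem{N}(a)) = \sem{M}(a, \sem{N}(a)) = \sem{M[N/x]}(a) \] so a β-redex and its contractum have the same denotation; the general case follows by compositionality.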
(A lot more detail is required.)
We know that if \(M =_\beta N\) then \(\sem{M} = \sem{N}\).
What about the converse? If terms have equal meanings, are they equal in the lambda-calculus?
This is a much harder problem! But there are answers:
Theorem [Friedman 1975] Let \(\Gamma \vdash M:A\) and \(\Gamma\vdash N:A\) be two terms, and let \(\sem{M}\) and \(\sem{N}\) be their denotations in the model of sets and functions in which \(\sem{\gamma} = \mathbf{N}\), the set of natural numbers.
Then \(\sem{M} = \sem{N}\) if and only if \(M =_{\beta\eta} N\).
In the late 1960s, Scott wanted to model untyped lambda-calculus.
When every term has its meaning in the same set \(D\), the approach we’ve taken means you want to be able to view a function \(D \rightarrow D\) as an element of \(D\).
This is impossible: for any set with more than one element, the set \(D \Rightarrow D\) is strictly larger than \(D\).
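This is Cantor's diagonal argument: if \(\phi : D \rightarrow (D \Rightarrow D)\) were surjective, we could pick distinct \(a, b \in D\) and define \(f(x) = b\) whenever \(\phi(x)(x) = a\) and \(f(x) = a\) otherwise; then \(f\) disagrees with every \(\phi(x)\) at the input \(x\), so \(f\) is not in the image of \(\phi\).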
Scott found more than one solution to this problem and set off a programme of research which is still alive today. We’ll look at one of his solutions.
Scott realised that functions can be viewed as collections of input-output entries.
More generally, computations can be viewed as collections of sub-computations: computing \[ f(0) + f(1) \] involves computing \(f(0)\), computing \(f(1)\), and adding the results.
If we know that \(f\) contains \((0,3)\) and \((1,9)\) then we know this computation returns \(12\).
We are going to model this as something like a pair \[ ( \{ (\{0\},3),(\{1\},9)\}, 12 ) \] recording the finite information required of the input together with the output produced.
Let’s see how.
Big idea: getting a single piece of information out should require only a finite amount of input-information.
So we can model lambda-terms using relations of the form \[ \mathcal{P}_f A \rightarrow B \] that is, sets of pairs \((S,b)\) with \(b \in B\) and \(S\) a finite subset of \(A\) (we write \(\mathcal{P}_f A\) for the set of finite subsets of \(A\)).
A term of type \(B\) with no free variables would be a set of elements of \(B\).
We can use the set of natural numbers to give every term the same (semantic) type.
When the types \(A\) and \(B\) are both the set \(\mathbf{N}\), we can encode any pair \[ (S,n) \] as a natural number.
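For instance, here is one such encoding sketched in Haskell (the bit-mask coding of finite sets and the Cantor pairing function are my choices; any standard coding works):

```haskell
-- a finite set of naturals becomes a bit-mask
encodeSet :: [Integer] -> Integer
encodeSet s = sum [2 ^ k | k <- s]   -- assumes no duplicates in s

-- the standard Cantor pairing function
pair :: Integer -> Integer -> Integer
pair m n = (m + n) * (m + n + 1) `div` 2 + n

-- a pair (S, n) becomes a single natural number
encodePair :: ([Integer], Integer) -> Integer
encodePair (s, n) = pair (encodeSet s) n
```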
Then a relation (= set of such pairs) is encoded as a set of natural numbers – which is the same thing that models a term!
So we achieve the desired embedding: functions on term-meanings are themselves term-meanings.
We’ll look at how these relations compose. First we recall relational composition.
Given relations \(f: A\rightarrow B\) and \(g:B \rightarrow C\) we can define \[ g \circ f = \{ (a,c) \mid \exists (a,b) \in f, (b,c) \in g\} \]
This directly extends function composition.
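In Haskell, representing a relation as a list of pairs, this is a one-liner (a sketch; the name `compRel` is mine):

```haskell
type Rel a b = [(a, b)]

-- relational composition, read straight off the definition
compRel :: Eq b => Rel b c -> Rel a b -> Rel a c
compRel g f = [ (a, c) | (a, b) <- f, (b', c) <- g, b == b' ]
```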
But this does not let us compose the interpretations we have in mind!
Instead we have relations of the form: \[ f : \mathcal{P}_f A \rightarrow B \quad\quad g : \mathcal{P}_f B \rightarrow C \] Note the type mismatch in the middle.
But we can try composing like this: whenever \((S_i,b_i) \in f\) for \(i = 1,\ldots,n\) and \((\{b_1,\ldots,b_n\},c) \in g\), we include the entry \[ (S_1 \cup \cdots \cup S_n, c) \in g \circ f. \]
We can now try to make a model using these ideas. It almost works as stated, but there is a small problem.
The enumeration \(\{b_1,\ldots,b_n\}\) is not unique because we could repeat elements without changing anything.
This means that composition with a relation like \(\{ (\{a\},a) \mid a \in A\}\), which we would expect to be the interpretation of \(x:A \vdash x:A\), does not behave as we want: it fails to be an identity for composition.
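For instance, take \(A = \mathbf{N}\) and \(f = \{(\{0\},5),(\{1\},5)\}\). The definition of composition allows \(n = 2\) with \(b_1 = b_2 = 5\), and then \(\{b_1,b_2\} = \{5\}\) is related to \(5\) by the candidate identity, so the composite contains the entry \((\{0\} \cup \{1\}, 5) = (\{0,1\},5)\), which is not in \(f\).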
This can be fixed (Scott fixed it!); instead, we’ll just try something slightly different.
The issue above arises because \(\mathcal{P}_f A\) does not count the multiplicity of its members.
Replacing \(\mathcal{P}_f\) by \(\mathcal{M}_f\) fixes it immediately, where \(\mathcal{M}_f A\) is the set of finite multisets drawn from \(A\): “subsets with multiplicities”, so that \([b,b] \neq [b]\).
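Here is a Haskell sketch of composition in this setting (my own rendering: finite multisets are represented as sorted lists, and relations as lists of entries):

```haskell
import Data.List (sort)

type MSet a = [a]              -- finite multisets as sorted lists
type MRel a b = [(MSet a, b)]  -- entries (S, b), S a finite multiset over a

-- multiset union of a list of multisets
munion :: Ord a => [MSet a] -> MSet a
munion = sort . concat

-- an entry ([b1,...,bn], c) of g combines with one f-entry (Si, bi) for
-- each bi; mapM in the list monad enumerates all such choices
compM :: (Ord a, Eq b) => MRel b c -> MRel a b -> MRel a c
compM g f =
  [ (munion ss, c)
  | (bs, c) <- g
  , ss <- mapM (\b -> [ s | (s, b') <- f, b' == b ]) bs
  ]
```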
We get a model in which a term \[ x_1:A_1, \ldots, x_n:A_n \vdash M:B \] is interpreted as a set of tuples \[ (S_1, \ldots, S_n, b) \] where each \(S_i\) is a finite multiset of elements of \(\sem{A_i}\) and \(b \in \sem{B}\).
Assuming we have a type of Booleans: a term \[ x:B, y:B \vdash x \wedge y:B \] would include tuples \[ ([\mathbf{true}],[\mathbf{true}],\mathbf{true}) \] \[ ([\mathbf{true}],[\mathbf{false}],\mathbf{false}) \] \[ ([\mathbf{false}],[\mathbf{true}],\mathbf{false}) \] \[ ([\mathbf{false}],[\mathbf{false}],\mathbf{false}) \]
And the term \(x:B \vdash x \wedge x:B\) would include \[ ([\mathbf{true},\mathbf{true}],\mathbf{true}) \] \[ ([\mathbf{true},\mathbf{false}],\mathbf{false}) \] \[ ([\mathbf{false},\mathbf{true}],\mathbf{false}) \] \[ ([\mathbf{false},\mathbf{false}],\mathbf{false}) \]
Notice that the second and third lines are equal: the same entry appears for more than one reason. This will be interesting later.
We can now distinguish between the “full evaluation” semantics of \(\wedge\) above and a “short-circuit” semantics that gives \(\mathbf{false}\) as soon as it sees a \(\mathbf{false}\) input: \[ ([\mathbf{true}],[\mathbf{true}],\mathbf{true}) \] \[ ([\mathbf{true}],[\mathbf{false}],\mathbf{false}) \] \[ ([\mathbf{false}],[\,],\mathbf{false}) \] In the last entry the multiset for \(y\) is empty: \(y\) is never inspected.
In that semantics the term \(x:B \vdash x \wedge x:B\) would include \[ ([\mathbf{true},\mathbf{true}],\mathbf{true}) \] \[ ([\mathbf{true},\mathbf{false}],\mathbf{false}) \] \[ ([\mathbf{false}],\mathbf{false}) \]
Notice also that the model handles nondeterminism.
No deterministic program can be substituted for \(x\) and deliver \([\mathbf{false},\mathbf{true}]\).
So the model is flexible (good!) but perhaps too large (bad?).
This model seems to have been known since the late 1980s, when it arose as an early model of linear logic.
Girard wrote about a slightly more sophisticated version of this model, which eliminates the nondeterminism using coherence spaces, in 1995.
The linear logic point of view splits our definition of composition into two parts: ordinary relational composition (the linear part), and a promotion operation that handles the copying of inputs recorded by the multisets.
We can view relations as matrices: a relation \(f: A \rightarrow B\) is a matrix of truth values indexed by \(A \times B\), with entry \(f_{(a,b)}\) true exactly when \((a,b) \in f\). Relational composition \[ (a,c) \in g \circ f \Leftrightarrow \exists b\in B.\ (a,b)\in f \wedge (b,c) \in g \] then becomes matrix multiplication \[ (g\circ f)_{(a,c)} = \sum_{b \in B} f_{(a,b)} \times g_{(b,c)} \] if we read the sum as logical disjunction and product as conjunction.
More generally, we can construct this model using matrices over any semiring in which the required sums, which may be infinite, exist.
For example: the natural numbers.
(To handle infinite sets, which may require infinite sums, we adjoin an element \(\infty\).)
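A Haskell sketch of this semiring and of composition as matrix multiplication (the type `NInf` and all names are mine; the middle index set is assumed to be enumerated by a finite list):

```haskell
-- the counting semiring: natural numbers extended with infinity
data NInf = Fin Integer | Inf deriving (Eq, Show)

plus :: NInf -> NInf -> NInf
plus (Fin m) (Fin n) = Fin (m + n)
plus _       _       = Inf

times :: NInf -> NInf -> NInf
times (Fin 0) _       = Fin 0   -- by convention 0 × ∞ = 0
times _       (Fin 0) = Fin 0
times (Fin m) (Fin n) = Fin (m * n)
times _       _       = Inf

-- a matrix over the semiring, indexed by a × b
type Matrix a b = a -> b -> NInf

-- composition: sum over the middle index, as in the formula above
compW :: [b] -> Matrix b c -> Matrix a b -> Matrix a c
compW bs g f a c = foldr plus (Fin 0) [ f a b `times` g b c | b <- bs ]
```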
Now our model can tell us not just whether an entry is present, but how many ways there are to produce it.
Suppose \(g_{([b_1,\ldots,b_n],c)} = w\) and \(f_{(S_i,b_i)} = w_i\) for each \(i\). Then the resulting entry \[ (S_1 \cup \cdots \cup S_n, c) \in g \circ f \] contributes weight \(w \times \prod_{i=1}^n w_i\).
The full composition weights an entry with the sum over all the ways it can arise: \((g \circ f)_{(T,c)}\) is given by \[ \sum_{T = S_1 \cup \cdots \cup S_n} \sum_{b_1,\ldots,b_n} g_{([b_1,\ldots,b_n],c)}\prod_i f_{(S_i,b_i)} \]
In this model, with weights drawn from the natural numbers, the term \[ x:B, y:B \vdash x \wedge y:B \] has weights as follows: each of the entries \[ ([\mathbf{true}],[\mathbf{true}],\mathbf{true}), \quad ([\mathbf{true}],[\mathbf{false}],\mathbf{false}), \quad ([\mathbf{false}],[\mathbf{true}],\mathbf{false}), \quad ([\mathbf{false}],[\mathbf{false}],\mathbf{false}) \] has weight \(1\), and all other entries have weight \(0\).
But when we identify \(x\) and \(y\) to get the term \(x:B \vdash x \wedge x:B\), we have \[ ([\mathbf{true},\mathbf{true}],\mathbf{true}) \mapsto 1 \qquad ([\mathbf{true},\mathbf{false}],\mathbf{false}) \mapsto 2 \qquad ([\mathbf{false},\mathbf{false}],\mathbf{false}) \mapsto 1 \] The weight \(2\) records the fact that this entry arises in two ways, as we noticed earlier.
This turns out to be a relatively simple algebraic generalisation of the relational model.
With it, we obtain a range of models that let us calculate interesting facts about lambda-terms: with the counting semiring above, for instance, the number of ways a term can produce a given result; other semirings support other quantities, such as probabilities.