## Introduction to (Quantitative?) (Relational?) Semantics

Guy McCusker, University of Bath

## Denotational semantics

$\def\sem#1{{[\![#1]\!]}}$

Denotational semantics is the study of programming languages or proof systems via their models.

The idea is to define an interpretation function that takes a program or proof to an element of the model: $\sem{-}: \mathcal{L} \rightarrow \mathcal{M}$

Why do this?

### Understanding programs via models

Programs are typically defined via an inductive grammar and perhaps a type system.

This tells you which things are programs.

But it doesn’t tell you what properties they have: we don’t just want to know what they are but what they are like.

For example: do lambda-terms behave like functions? Can we explain the way in which they are functions? A model can.

### Plan

• Denotational semantics of typed λ-calculus using sets and functions
• Relational semantics, as introduced by Scott
• Multiset-relational semantics
• Quantitative (weighted) relational semantics

## Typed lambda-terms as functions

Consider a typed lambda-calculus. Types are given by $A, B ::= \gamma \mid A \rightarrow B$

Terms are given by the grammar $M ::= x \mid \lambda x^A. M \mid M\,M$

We can define typing rules:
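For instance, the usual rules, one per term former:

$\frac{(x:A) \in \Gamma}{\Gamma \vdash x:A} \qquad \frac{\Gamma, x:A \vdash M:B}{\Gamma \vdash \lambda x^A. M : A \rightarrow B} \qquad \frac{\Gamma \vdash M : A \rightarrow B \quad\quad \Gamma \vdash N : A}{\Gamma \vdash MN : B}$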

We can then define α-equivalence, β-reduction, etc.

### A simple model

Let us set up a model.

For each ground type $$\gamma$$, choose a set $$\sem{\gamma}$$.

We plan to show that λ-terms behave like functions so we will define $\sem{A \rightarrow B} = \{f \mid f:\sem{A}\rightarrow \sem{B}~\text{is a function}\}$

We write $$X \Rightarrow Y$$ for the set of functions from $$X$$ to $$Y$$; so this is $$\sem{A}\Rightarrow\sem{B}$$.

For each well-typed term $$x_1:A_1, \ldots, x_n:A_n \vdash M:B$$ we will define its meaning: a function $\sem{M} :\sem{A_1} \times \cdots \times \sem{A_n} \rightarrow \sem{B}$

Idea: a term with free variables gives us a function whose inputs are the meanings of its variables.

We will define this function compositionally. That is, the meaning of a term will be given by a semantic operation applied to the meanings of its subterms:

\begin{eqnarray*} \sem{\Gamma \vdash x} & = & \text{something}\\ \sem{\Gamma \vdash \lambda x.M:A\rightarrow B} & = & \Lambda(\sem{\Gamma, x:A\vdash M:B}) \\ \sem{\Gamma \vdash MN} & = & @ (\sem{\Gamma\vdash M}, \sem{\Gamma\vdash N}) \\ \end{eqnarray*}

So far we seem to have achieved nothing… But if we can complete our plan, we will have shown:

• every λ-term describes a function
• the meaning of the type $$A \rightarrow B$$ is the set of functions from $$\sem{A}$$ to $$\sem{B}$$
• not only does every term have a meaning, every term former has a meaning (compositionality)

And we can do more: stay tuned

### Completing the semantics

• $$\sem{x_1:A_1, \ldots, x_n:A_n \vdash x_i:A_i} = \pi_i$$, the $$i$$-th projection
• Given $$f: A\times B \rightarrow C$$, define $$\Lambda f: A \rightarrow (B \Rightarrow C)$$ by

$\Lambda f(a) = g:B \rightarrow C~\text{where}~g(b) = f(a,b)$

• Given $$f:A \rightarrow (B \Rightarrow C)$$ and $$g:A \rightarrow B$$, define

$@(f,g)(a) = f(a)g(a).$

Note that if $$f$$ does not depend on $$a$$ then the clause for application reduces to function composition.
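As an aside, here is a minimal executable sketch of this semantics in Haskell (the names and simplifications are mine: semantic values live in one Haskell datatype rather than in sets indexed by types, the context is an association list rather than a product, and type errors are only caught dynamically):

```haskell
-- Terms of the lambda-calculus, with named variables.
data Term = Var String
          | Lam String Term
          | App Term Term

-- Semantic values: a ground value or a function on values.
data Value = Ground Int | Fun (Value -> Value)

-- An environment assigns a value to each free variable.
type Env = [(String, Value)]

-- The interpretation: an environment in, a value out.
-- Each clause uses only the meanings of the subterms (compositionality).
eval :: Env -> Term -> Value
eval env (Var x)   = maybe (error ("unbound variable " ++ x)) id (lookup x env)
eval env (Lam x m) = Fun (\v -> eval ((x, v) : env) m)   -- the Λ operation
eval env (App m n) = case eval env m of                  -- the @ operation
                       Fun f -> f (eval env n)
                       _     -> error "applying a non-function"
```

The `Lam` clause is the $$\Lambda$$ operation and the `App` clause is $$@$$, read pointwise in the environment.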

We have shown that lambda-terms can be used as notation for functions. But we can show that the λ-calculus is consistent with this idea:

Soundness for β-reduction says that $\text{if}~M \rightarrow_\beta N~\text{then}~\sem{M} = \sem{N}.$ This is fairly easy to show for our semantics. The proof hinges on understanding the meaning of substitutions $$M[N/x]$$.

### Composition is substitution

Suppose we have $y:A \vdash N:B \quad\quad x:B \vdash M:C$

The semantics gives us meanings $\sem{N}: \sem{A} \rightarrow \sem{B} \quad\quad \sem{M}:\sem{B}\rightarrow \sem{C}$

$$\sem{x:B \vdash M:C}$$ is a function whose input is meant to represent the meaning of the variable $$x$$. So if we supply $$\sem{N}$$ as the input $\sem{M} \circ \sem{N} : \sem{A} \rightarrow \sem{C}$ we should get the result of replacing $$x$$ with $$N$$: $\sem{M[N/x]}$

This can be proved by induction on the structure of the term $$M$$. Most of the hard work comes from the fact that capture-avoiding substitution is tricky.

Note that composition of functions is a lot less tricky!

To show that β-reduction is modelled soundly, we argue:

• application is interpreted by (a variant of) composition
• composition is substitution
• therefore application becomes substitution: $$\sem{(\lambda x.M)N} = \sem{M[N/x]}$$.

(A lot more detail is required.)
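In symbols, writing $$\vec{a}$$ for a tuple of values for the variables of $$\Gamma$$, the calculation runs roughly as follows (a sketch, suppressing the bookkeeping):

\begin{eqnarray*} \sem{(\lambda x.M)N}(\vec{a}) & = & @(\sem{\lambda x.M}, \sem{N})(\vec{a}) \\ & = & \Lambda(\sem{M})(\vec{a})\,(\sem{N}(\vec{a})) \\ & = & \sem{M}(\vec{a}, \sem{N}(\vec{a})) \\ & = & \sem{M[N/x]}(\vec{a}) \end{eqnarray*}

where the last step is exactly “composition is substitution”.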

We know that if $$M =_\beta N$$ then $$\sem{M} = \sem{N}$$.

What about the converse? If terms have equal meanings, are they equal in the lambda-calculus?

This is a much harder problem! But there are answers:

Theorem [Friedman 1975] Let $$\Gamma \vdash M:\alpha$$ and $$\Gamma\vdash N:\alpha$$ be two terms and let $$\sem{M}$$ and $$\sem{N}$$ be their denotations in the model of sets and functions, where $$\sem{\gamma} = \mathbf{N}$$, the set of natural numbers.

Then $$\sem{M} = \sem{N}$$ if and only if $$M =_{\beta\eta} N$$.

## Scott’s Graph Model

In the late 1960s, Scott wanted to model untyped lambda-calculus.

When every term has its meaning in the same set $$D$$, the approach we’ve taken means you want to be able to view a function $$D \rightarrow D$$ as an element of $$D$$.

This is impossible: for any set with more than one element, the set $$D \Rightarrow D$$ is strictly larger than $$D$$.

Scott found more than one solution to this problem and set off a programme of research which is still alive today. We’ll look at one of his solutions.

### From functions to relations

Scott realised that functions can be viewed as collections of input-output entries.

More generally, computations can be viewed as collections of sub-computations: computing $f(0) + f(1)$ involves:

• evaluating $$f$$ at 0
• evaluating $$f$$ at 1

If we know that $$f$$ contains $$(0,3)$$ and $$(1,9)$$ then we know this computation returns $$12$$.

We are going to model this as something like a pair $( \{ (\{0\},3),(\{1\},9)\}, 12 )$

Let’s see how.

Big idea: getting a single piece of information out should require only a finite amount of input-information.

So we can model lambda-terms using relations of the form $\mathcal{P}_f A \rightarrow B$ where $$\mathcal{P}_f A$$ denotes the set of finite subsets of $$A$$.

A term of type $$B$$ with no free variables would be a set of elements of $$B$$.

### Modelling untyped terms

We can use the set of natural numbers to give every term the same (semantic) type.

When the types $$A$$ and $$B$$ are both the set $$\mathbf{N}$$, we can encode any pair $(S,n)$ as a natural number.

Then a relation (= set of such pairs) is encoded as a set of natural numbers – which is the same thing that models a term!

So we achieve the desired embedding: functions from $$D$$ to $$D$$ can be represented as elements of $$D$$.
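As a sketch of how such an encoding might go (any injective coding will do; this one is for illustration and is not necessarily Scott's), in Haskell:

```haskell
import Data.Bits (setBit)
import Data.List (foldl')

-- Encode a finite set of naturals as a bitmask: bit i is set iff i is in the set.
encodeSet :: [Int] -> Integer
encodeSet = foldl' setBit 0

-- The Cantor pairing function: an injection from pairs of naturals to naturals.
pair :: Integer -> Integer -> Integer
pair x y = (x + y) * (x + y + 1) `div` 2 + y

-- An entry (S, n) of a relation between finite sets of naturals and naturals
-- becomes a single natural number.
encodeEntry :: [Int] -> Integer -> Integer
encodeEntry s n = pair (encodeSet s) n
```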

### Composition in the relational model

We’ll look at how these relations compose. First we recall relational composition.

Given relations $$f: A\rightarrow B$$ and $$g:B \rightarrow C$$ we can define $g \circ f = \{ (a,c) \mid \exists b \in B.\ (a,b) \in f \wedge (b,c) \in g\}$

This directly extends function composition.
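In Haskell, for finite relations represented as sets of pairs, this is (a sketch, with names of my own choosing):

```haskell
import qualified Data.Set as Set

-- A relation from a to b, as a set of pairs.
type Rel a b = Set.Set (a, b)

-- Relational composition: (a,c) is related whenever some b links them.
compose :: (Ord a, Eq b, Ord c) => Rel a b -> Rel b c -> Rel a c
compose f g = Set.fromList
  [ (a, c) | (a, b) <- Set.toList f, (b', c) <- Set.toList g, b == b' ]
```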

But this does not let us compose the interpretations we have in mind!

Instead we have relations of the form: $f : \mathcal{P}_f A \rightarrow B \quad\quad g : \mathcal{P}_f B \rightarrow C$ Note the type mismatch in the middle.

But we can try composing like this: whenever $(S_i,b_i) \in f$ for $i = 1,\ldots,n$ and $(\{b_1,\ldots,b_n\},c) \in g$, we put $(S_1 \cup \cdots \cup S_n, c) \in g \circ f.$
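To illustrate the formula (an informal instance echoing the $$f(0) + f(1)$$ example, not the precise semantics of that term): if $$f$$ contains $$(\{0\},3)$$ and $$(\{1\},9)$$, and $$g$$ contains $$(\{3,9\},12)$$ (“add the two values you are given”), then taking $$n = 2$$ gives $(\{0\} \cup \{1\}, 12) = (\{0,1\},12) \in g \circ f.$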

### Reworking our semantics

We can (almost) now make a model using these ideas:

• types are interpreted as sets
• the type $$A \rightarrow B$$ is interpreted as the set of relations from $$\mathcal{P}_f \sem{A}$$ to $$\sem{B}$$.
• we can define the meaning of application using the composition idea above (details skipped!)

### Problem

This almost works as stated. However, there is a small problem.

The enumeration $$\{b_1,\ldots,b_n\}$$ is not unique because we could repeat elements without changing anything.

This means that composition with a relation like $$\{ (\{a\},a) \mid a \in A\}$$ — which might be the interpretation of $$x:A \vdash x:A$$ — does not behave as we want.

This can be fixed (Scott fixed it!); instead, we’ll just try something slightly different.

## From powersets to multisets

The issue above arises because $$\mathcal{P}_f A$$ does not count the multiplicity of its members.

Replacing $$\mathcal{P}_f$$ by $$\mathcal{M}_f$$ fixes it immediately.

$$\mathcal{M}_f A =$$ finite multisets drawn from $$A$$: “subsets with multiplicities”. We write $$[a_1,\ldots,a_n]$$ for the multiset with members $$a_1,\ldots,a_n$$ (repetitions counted).

We get a model in which a term $x_1:A_1, \ldots, x_n:A_n \vdash M:B$ is interpreted as a set of tuples $(S_1, \ldots, S_n, b)$ where:

• $$b$$ is an “output event”
• each $$S_i$$ is a multiset of “input events” that the term needs to obtain from its variables, in order to produce $$b$$.

### Examples

Assuming we have a type of Booleans: a term $x:B, y:B \vdash x \wedge y:B$ would include tuples

\begin{array}{ccc} ( [\mathbf{true}], & [\mathbf{true}], & \mathbf{true}) \\ ([\mathbf{true}], & [\mathbf{false}], & \mathbf{false}) \\ ([\mathbf{false}], & [\mathbf{true}], & \mathbf{false}) \\ ([\mathbf{false}], & [\mathbf{false}], & \mathbf{false}) \\ \end{array}

And the term $$x:B \vdash x \wedge x:B$$ would include

\begin{array}{cc} ( [\mathbf{true},\mathbf{true}], & \mathbf{true}) \\ ([\mathbf{true},\mathbf{false}], & \mathbf{false}) \\ ([\mathbf{false},\mathbf{true}], & \mathbf{false}) \\ ([\mathbf{false},\mathbf{false}], & \mathbf{false}) \\ \end{array}

Notice that the second and third lines are equal: the same entry appears for more than one reason. This will be interesting later.

We can now distinguish between the “full evaluation” semantics of $$\wedge$$ and a “short-circuit” semantics that gives $$\mathbf{false}$$ as soon as it sees a $$\mathbf{false}$$ input:

In that semantics the term $$x:B \vdash x \wedge x:B$$ would include

\begin{array}{ccl} ( [\mathbf{true},\mathbf{true}], & \mathbf{true}) \\ ([\mathbf{true},\mathbf{false}], & \mathbf{false}) &\text{now only once!}\\ ([\mathbf{false}], & \mathbf{false}) \\ \end{array}

Notice also that the model handles nondeterminism.

No deterministic program can be substituted for $$x$$ and deliver $$[\mathbf{false},\mathbf{true}]$$.

So the model is flexible (good!) but perhaps too large (bad?).

### History

This model seems to have been known since the late 1980s, when it arose as an early model of linear logic.

Girard wrote about a slightly more sophisticated version of this model — which eliminates the nondeterminism using coherence spaces — in 1995.

The linear logic point of view splits our definition of composition into two:

• first “promote” $$f:\mathcal{M}_f A \rightarrow B$$ to a relation $f^!: \mathcal{M}_f A \rightarrow \mathcal{M}_f B$
• then compose this with $$g: \mathcal{M}_f B \rightarrow C$$ using ordinary relational composition (a sketch follows below).
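Concretely, a sketch of promotion (up to the details of how decompositions are indexed): $f^! = \{\, (S_1 \uplus \cdots \uplus S_n,\ [b_1,\ldots,b_n]) \mid n \geq 0,\ (S_i,b_i) \in f \text{ for each } i \,\}$ where $$\uplus$$ is multiset sum. Composing $$f^!$$ with $$g$$ by ordinary relational composition then recovers the composition described earlier.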

## From relations to weighted relations

We can view relations as matrices:

• a relation from $$A$$ to $$B$$ is a matrix indexed by $$A$$ and $$B$$, with entries in the Booleans.
• the entry at $$(a,b)$$ is $$\mathbf{true}$$ if $$(a,b)$$ is in the relation
• relational composition is matrix multiplication!

$(a,c) \in g \circ f \Leftrightarrow \exists b\in B. (a,b)\in f \wedge (b,c) \in g$

$(g\circ f)_{(a,c)} = \sum_{b \in B} f_{(a,b)} \times g_{(b,c)}$ if we read the sum as logical disjunction and product as conjunction.

### Idea: generalise from the Booleans

More generally we can construct this model using matrices over any structure that lets us compute the required sums and products: a semiring.

For example: the natural numbers.

(To handle infinite sets, which may require infinite sums, we add an $$\infty$$ element)
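A minimal sketch of this in Haskell, assuming nothing beyond a home-made `Semiring` class (the names are mine, and the index sets are assumed finite so that the sums are ordinary folds):

```haskell
-- Weighted relations as matrices over a semiring,
-- with composition as matrix multiplication.
class Semiring r where
  zero, one    :: r
  (.+.), (.*.) :: r -> r -> r

-- The Booleans: ordinary relations.
instance Semiring Bool where
  zero = False; one = True
  (.+.) = (||); (.*.) = (&&)

-- The natural numbers (here Integer): counting how many ways.
instance Semiring Integer where
  zero = 0; one = 1
  (.+.) = (+); (.*.) = (*)

-- A weighted relation from a to b: a matrix, presented as a function.
type WRel r a b = a -> b -> r

-- Composition: sum over the middle index, which must be enumerable.
composeW :: Semiring r => [b] -> WRel r a b -> WRel r b c -> WRel r a c
composeW bs f g a c = foldr (.+.) zero [ f a b .*. g b c | b <- bs ]
```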

Now our model can tell us

• what input-output computations a term can perform; but also
• how many ways the term can perform a computation: the weight of its entry in the matrix

### Composition

Suppose $(S_i,b_i) \in f$ with weight $w_i$ for $i = 1,\ldots,n$, and $([b_1,\ldots,b_n],c) \in g$ with weight $w$. As before this produces an entry $(S_1 \uplus \cdots \uplus S_n, c) \in g \circ f$ (the $$S_i$$ are now multisets, combined by multiset sum), and it contributes weight $$w \times \prod_{i=1}^n w_i$$.

The full composition weights an entry with the sum over all the ways it can arise: $$(g \circ f)_{(T,c)}$$ is given by $\sum_{T = S_1 \uplus \cdots \uplus S_n} \sum_{b_1,\ldots,b_n} g_{([b_1,\ldots,b_n],c)}\prod_i f_{(S_i,b_i)}$

### Example, again

In this model, the term $x:B, y:B \vdash x \wedge y:B$ has weights as follows:

\begin{array}{cccc} ( [\mathbf{true}], & [\mathbf{true}], & \mathbf{true}): & 1 \\ ([\mathbf{true}], & [\mathbf{false}], & \mathbf{false}): & 1 \\ ([\mathbf{false}], & [\mathbf{true}], & \mathbf{false}): & 1 \\ ([\mathbf{false}], & [\mathbf{false}], & \mathbf{false}):& 1 \\ \end{array}

But when we identify $$x$$ and $$y$$ to get the term $$x:B \vdash x \wedge x:B$$, we have

\begin{array}{ccc} ( [\mathbf{true},\mathbf{true}], & \mathbf{true}): & 1 \\ ([\mathbf{true},\mathbf{false}], & \mathbf{false}):& 2 \\ ([\mathbf{false},\mathbf{false}], & \mathbf{false}):& 1 \\ \end{array}
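The weight $$2$$ is the sum described above: the entry $$([\mathbf{true},\mathbf{false}], \mathbf{false})$$ can arise in two ways when $$x$$ and $$y$$ are identified, from $$([\mathbf{true}], [\mathbf{false}], \mathbf{false})$$ and from $$([\mathbf{false}], [\mathbf{true}], \mathbf{false})$$ (the two equal lines of the unweighted table), each of weight $$1$$, so $$1 + 1 = 2$$.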

### Varieties of models

This turns out to be a relatively simple algebraic generalisation of the relational model.

With it, we obtain a range of models that let us calculate interesting facts about lambda-terms:

• using $$\mathbf{N}$$ with $$\times$$ and $$+$$: how many paths are there?
• using $$\mathbf{N}$$ with $$+$$ and $$\min$$: what is the length of the shortest path? (see the sketch after this list)
• using $$\mathbf{R}$$ with $$\times$$ and $$+$$: what is the probability of termination?
• … and so on.
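For instance, the (min, +) case from the second bullet could be rendered as another instance of the `Semiring` sketch above (again an illustration under my own naming, with an `Inf` element standing for the empty minimum):

```haskell
-- The tropical (min, +) semiring: "sum" is min, "product" is +.
data Tropical = Finite Integer | Inf deriving (Eq, Show)

instance Semiring Tropical where
  zero = Inf                                  -- unit of min
  one  = Finite 0                             -- unit of +
  Inf      .+. t        = t
  t        .+. Inf      = t
  Finite m .+. Finite n = Finite (min m n)
  Inf      .*. _        = Inf
  _        .*. Inf      = Inf
  Finite m .*. Finite n = Finite (m + n)
```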

### History

• These ideas were first explained by Lamarche in 1992.
• They cropped up again in Ehrhard and Regnier’s work on differential linear logic and lambda-calculus around ten years later
• Ten more years later, we (Laird, Manzonetto, Pagani + M) showed how the ideas could extend to model programming languages with recursion
• And the probabilistic model turned out to be fully abstract (Ehrhard, Tasson, Pagani)

## Summary

• Denotational semantics models programs compositionally
• Models can be built from sets and functions
• … or from sets and relations
• Quite good quantitative models are available
• And there is a lot more out there