This project is a typechecker for the Pat language, introduced in the paper Special Delivery: Programming with Mailbox Types.
This paper extends the mailbox typing discipline introduced by de' Liguoro and Padovani at ECOOP 2018 (https://drops.dagstuhl.de/opus/volltexte/2018/9220/pdf/LIPIcs-ECOOP-2018-15.pdf) to the setting of a programming language.
Several core ideas of our typechecking algorithm originated in the following, both by Luca Padovani:
The type checker for Pat
is developed in OCaml, a general-purpose functional programming language. OCaml can be installed by following these instructions.
The easiest way to install OCaml on macOS is to use the Homebrew package manager. Homebrew can be downloaded and installed by following the instructions on the website. Once Homebrew is installed and configured, the latest OCaml package manager, opam
, can be installed as shown:
$ brew install opam
opam
on Ubuntu Linux can be installed via the apt
package manager.
$ sudo add-apt-repository ppa:avsm/ppa
$ sudo apt update
$ sudo apt install opam
On Ubuntu, one also needs to install libgmp-dev
, which is a development library for the GMP (GNU Multiple Precision Arithmetic) that provides functions for performing arithmetic operations on large numbers. This library is used by the Z3 constraint solver invoked externally by the 'Pat' type checker.
sudo apt install libgmp-dev
The next step is to initialise opam
:
$ opam init
Once the initialisation is completed, the following command updates the current shell environment.
$ eval $(opam env --switch=default)
Pat
can be cloned from this GitHub repository as follows.
$ git clone https://github.com/SimonJF/mbcheck.git
The Pat
type checker uses a number of OCaml libraries that should be installed prior to compiling it.
$ opam install dune menhir ppx_import visitors cmdliner z3 bag
The type-checking tool can then be compiled using make
after installing these dependencies.
$ cd mbcheck
mbcheck$ make
Usage: ./mbcheck <file>
. If the program completes successfully, then the
program is correct.
For more information, you can use the -d
(debug) and -v
(verbose) flags --
although the outputs of these are currently pretty unpleasant :)
Mailbox types characterise communication with mailboxes: incoming message queues with many writers and a single reader.
In short, mailbox types consist of a capability : either ! for output, similar to a PID in Erlang, or ? for input; and a pattern which characterises the
A pattern is a commutative regular expression which denotes the pattern of messages contained within that mailbox (for input capabilities), or the pattern of messages which must be sent (for output capabilities).
There are various patterns:
- 0: the unreliable mailbox (denotes an error)
- 1: the empty mailbox
- M: a message with tag M
- E + F: either pattern E or pattern F
- E . F: pattern E or F in any order
- *E: many instances of pattern E
As a concrete example, the receive endpoint for an empty future can be given the type
?(Put.*Get)
which states that the mailbox contains at most one Put message and many Get messages; this would rule out the invalid behaviour of sending two Put messages.
Pat also supports base types: String, Int, Bool.
In order to avoid some horrible name hygiene issues, Pat uses quasi-linear typing. In essence, this means that mailbox names can be used at most once in a thread as a 'full' name (i.e., being allowed to be returned as part of an expression, stored as part of a data type, etc.), and other times as a 'partial' or 'second-class' name (allowing it to be used as part of an expression -- e.g., sending or passing as an argument).
Furthermore, the full or 'returnable' use must be the final lexical use of the name.
Another point is that when receiving a mailbox name as part of a message, this
can only be treated as usable (and may not, therefore, escape the scope of the
receive
).
These restrictions in turn rule out bad aliasing, use-after-free, and self-deadlocks.
Pat is a concurrent functional language with first-class mailboxes and mailbox
types. Examples can be found in the examples
directory, but here we go over
some of the main points here.
A message consists of a tag (i.e., a message name) and a list of message payloads.
An interface describes the messages that a mailbox can receive.
Note that it does not describe any sequencing or pattern information about the
message. As an example, the interface for the Future
example is as follows:
interface Future { Put(Int), Get(User!) }
interface User { Reply(Int) }
Here, we can see that the Future
interface says that a Future
mailbox can
receive a Put
message (with an Int
payload), and a Get
message (with a
User!
payload), but it does not describe the invariant that the mailbox can
only contain a single Put message, for example. Interfaces are used to provide
type information for the inference pass.
A Pat program consists of a series of definitions followed by a body.
Pat has very limited support for first-class functions, so instead a Pat program typically consists of a series of definitions, which are explicitly annotated with argument and return types, and contain an expression. Definitions must be declared at the top level, and do not close over any values.
The body is a single expression, defined at the end of a file; a definition can be called in the usual way.
Here is a simple Pat program:
def addTwoNumbers(x: Int, y: Int): Int {
x + y
}
addTwoNumbers(10, 15)
This would have type Int.
As a functional language, Pat has immutable data. Therefore, typically a Pat
expression consists of let
bindings for intermediate computation steps:
let x = 5 in
let y = 10 in
x + y
Variables x
and y
are standard.
Pat also includes side-effecting operations (such as sending messages), so we
also allow sequencing e1; e2
where e1
returns the unit value ()
.
The unit value is a value which conveys no information and is often used as a
return value for side-effecting operations.
A mailbox is a message queue which can be written to by many processes, but read by only one. In the actor model, a mailbox is associated with every process, and messages are sent to that particular process ID.
The spawn M
construct spawns a term as a new process. Since the process will
run concurrently, all usages are masked as 2nd-class. For example, whereas the
following will not compile:
guard x : M { receive M() -> free(x) };
x ! M()
(since the first guard
is returnable)
the following is permitted, since the guard
is in a separate thread:
spawn { guard x : M { receive M() -> free(x) } };
x ! M()
The mailbox calculus decouples creation of a mailbox from the process with
which it is associated. For this, we use the new
construct:
let mb = new[InterfaceName] in ...
Here, mb
will have type ?1
, denoting that all messages must have been
received.
To send a message to a mailbox, we use the !
construct, which is similar to
Erlang.
mb ! Foo(10)
Here, we are sending a message with tag Foo
and payload 10
to mailbox mb
.
In this case, as a result of the send, mb
will have type !Foo
. By sending to
a mailbox reference, we create an obligation that the message must be received
by that mailbox.
Erlang has a receive
statement to retrieve messages from a mailbox. The
analogous construct in Pat is guard
, which is slightly generalised in that it
also allows us to free a mailbox that we know isn't being used.
As an example, here is a guard expression on a mailbox for a mailbox with type
(M . N) + 1
(i.e., a mailbox which either has both M
and N
messages, or
does not contain any messages)
interface Test { M(Int), N(Int) }
let mb = new[Test] in
if (rand()) then {
spawn { mb ! M(5) };
spawn { mb ! N(10) }
} else {
()
};
in
guard mb : (M . N) + 1 {
free -> ()
receive M(x) from mb ->
guard mb : N {
receive N[y] from mb -> free(mb)
}
receive N(x) from mb ->
guard mb : M {
receive M[y] from mb -> free(mb)
}
}
Note that in the receive
clauses, the variable after from
is a binding
occurrence.
Here, we specify the mailbox name we're guarding on (in this case, mb
), along
with the pattern it currently contains ((M . N) + 1
). We then add three guard
clauses:
-
free
which handles the case where there are no messages in the mailbox, and the mailbox can be freed. Note thatmb
is not re-bound and so is unusable. -
receive M(x) from mb -> ...
which receives a messageM
from the mailbox, re-binding the mailbox name tomb
with type?N
-
receive N(x) from mb -> ...
which receives a messageN
from the mailbox, re-binding the mailbox name tomb
with type?M
In both of the receive clauses, we need a further receive to clear the other message.
We can also write free(mb)
as syntactic sugar for the more verbose:
guard mb : 1 { free -> () }
- Lexer:
lib/frontend/lexer.mll
- Parser:
lib/frontend/parser.mly
- Desugaring:
lib/frontend/desugar_sugared_guards.ml
andlib/frontend/insert_pattern_variables.ml
- IR conversion:
lib/frontend/sugar_to_ir.ml
- Constraint generation (backwards bidirectional typing):
lib/typecheck/gen_cosntraints.ml
- Constraint solving:
lib/typecheck/solve_constraints.ml
andlib/typecheck/z3_solver.ml
- Examples:
test/examples
- Examples deliberately raising errors:
test/errors