Skip to content

Yatima.CodeGen

Arthur Paulino edited this page Mar 1, 2023 · 4 revisions

The code generator is the compiler from Lean 4 to Lurk. It transforms executable Lean 4 code into executable Lurk code.

The general plan for compilation is to build a (big) letrec block that contains all the prerequisites needed to compute the head call from Lean 4. To illustrate how this is done, consider the head call as the function f, which depends on the definitions of a, b and c. This would be the resulting letrec block:

(letrec (
    (a <body of a>)
    (b <body of b>)
    (c <body of c>)
    (f <body of f, which should call a, b and c>))
  f)

If a, b or c have their own prerequisites, they would be prepended in the list of binders on demand. But before the compilation of f even starts, we need to define the sets of "preloads" and "overrides".

Preloads are like a stdlib of Lurk functions to facilitate the implementation of overrides as well as the construction of the binder bodies. This is an example of a preloaded function for pushing an element to the end of a CONS list:

def push : Lean.Name × Expr := (`push, ⟦
  (lambda (xs y) (
    if xs
      (cons (car xs) (push (cdr xs) y))
      (cons y nil)))
⟧)

Overrides are how we (re)define the behavior of some Lean 4 definitions when compiled to Lurk, ignoring their original behaviors. They are used for two reasons:

  1. When functions are marked with the extern tag in Lean, their bodies are erased by the compiler because they're supposed to use custom definitions written in external C code.
  2. When we simply want the function to behave differently in Lurk. This will be essential to make the typechecker, when running in Lurk, pull the right data from the store. More on it later.

As an example, this is an override for Array.push, since Lean's compiler uses low level C code instead:

/-- Warning: this is `O(n)` and extremely inefficient. -/
def Array.push : Override := Override.decl ⟨``Array.push, ⟦
  (lambda (α a v) (push a v))
⟧⟩

Actually compiling

[mention why and how we're using LCNF]

Compiling terms of inductive types

Understanding how terms of inductive types are compiled is necessary in order to understand how constants are hashed in the content-addressing routine.

We use raw lists of strings and numbers to represent them. An example involving Nat should be enough to get the idea across.

  • Nat.zero is compiled to '("Nat" 0)
  • Nat.zero.succ is compiled to '("Nat" 1 ("Nat" 0))
  • Nat.zero.succ.succ is compiled to '("Nat" 1 ("Nat" 1 ("Nat" 0)))

In short, we concatenate the name of the inductive type, the constructor index and the arguments for that constructor.

Compiling declarations