Skip to content

Commit

Permalink
[ doc ] Added a chapter on the build system to the documentation.
Browse files Browse the repository at this point in the history
  • Loading branch information
attila-lendvai committed Apr 13, 2022
1 parent d26cf53 commit e7a1f9c
Show file tree
Hide file tree
Showing 3 changed files with 200 additions and 11 deletions.
25 changes: 14 additions & 11 deletions INSTALL.md
Original file line number Diff line number Diff line change
Expand Up @@ -52,29 +52,32 @@ Make sure that:
You can build from pre-built Chez Scheme source, as long as you have Chez Scheme
installed (or, alternatively, Racket). To do this, enter one of the following:

- `make bootstrap SCHEME=chez`
- `make bootstrap-racket`
- `make bootstrap CHEZ=chez-scheme`
- `make bootstrap IDRIS2_CG=racket RACKET_RACO=raco`

`chez` is the executable name of the Chez Scheme compiler. You may need to
But it is recommended to edit `config.mk` to set the preferences permanently.

`chez-scheme` is the executable name of the Chez Scheme compiler. You may need to
replace this with the executable for Chez Scheme on your system. This could be
`scheme`, `chezscheme` or `chezscheme9.5` or something else, depending on your
system and the Chez Scheme version.

This builds an Idris 2 compiler from scheme code output from a working Idris 2
compiler (which isn't necessarily up to date, but is up to date enough to
build the current repository). It then rebuilds using the result, and runs
the tests.
This builds an Idris 2 compiler from the scheme code output made by a
working Idris 2 compiler previously (which isn't necessarily up to
date, but is up to date enough to build the current repository). It
then rebuilds using the result, and runs the tests.

If all is well, to install, type:

- `make install`

### 3: Installing with an existing Idris 2

If you have the latest *released* version of Idris 2
(0.4.0 at the time of writing) installed:
If you have the latest *released* version of Idris 2 (0.5.1 at the
time of writing) installed, then you can omit the variable as it
defaults to `idris2` in the path:

- `make all`
- `make all BOOTSTRAP_IDRIS=/path/to/idris2-v0.4.0-binary`
- `make install`

### 4: (Optional) Installing Idris 2 library documentation
Expand All @@ -90,7 +93,7 @@ that everything has worked correctly. Assuming that `idris2` is in your

- `make clean` -- to make sure you're building everything with the new version
- `make all && make install` -- OR
`make all IDRIS2_BOOT='idris2 --codegen racket' && make install`
`make all BOOTSTRAP_IDRIS='idris2 --codegen racket' && make install`
if using Racket.

### 6: Running tests
Expand Down
185 changes: 185 additions & 0 deletions docs/source/implementation/build-system.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,185 @@
*************************
The Build System of Idris
*************************

Introduction
------------

Compiling a compiler that is written in its own language (i.e. that is
*self-hosted*) is a somewhat trickier business than compiling other
applications. A `strange loop
<https://en.wikipedia.org/wiki/Strange_loop/>`_ forms when the
application that we are feeding in for the compiler is some version of
the compiler itself.

This process is called *bootstrapping*.

When we want to evolve our language/compiler, e.g. by adding a new language
feature, then we need to first implement it without assuming its existence, so
that the executable of our *bootstrap host* -- which of course has no clue about
the new feature yet --, can compile our sources. Once the new feature is working
(i.e. it is available as an executable binary, that we may call the *binary
seed*), then we can move to the next iteration of our language/compiler, and
start using the new feature in the next *evolutionary stage* of the language. At
this point we can even use the new feature in the implementation of the very
feature itself; our new language feature got fully *bootstrapped*. If you want
to explore this idea, then you may want to read the classic `Reflections on
Trusting Trust
<https://web.archive.org/web/20080731145744/http://cm.bell-labs.com/who/ken/trust.html/>`_
paper by Ken Thompson.

Idris compiles itself in the following consecutive stages, where each
stage is compiled by the previous stage:

* **stage 0** -- this is version ``n-1`` of the language/compiler. It
is either provided by setting the ``BOOTSTRAP_IDRIS`` variable, or
otherwise ``STAGE0_GIT_REF`` will be checked out under
``build/stage0/src-[git ref]``, and built using ``make
bootstrap``. Only cleaned by ``make distclean``.

* **stage 1** -- version ``n`` of the sources compiled by version
``n-1``. This stage is not automatically rebuilt, only after a
``make clean``. The ``make bootstrap`` target compiles the stored
build output of this stage (i.e. compile the checked in
``idris2.ss`` file using Chez) to produce an executable,
i.e. doesn't need a working Idris executable. Let's call this trick
a *bootstrap shortcut* (see `Bootstrap shortcuts`_ for details).

* **stage 2** -- these are the fully bootstrapped build artifacts, i.e.
the output of version ``n`` compiling its own source. This target is
forced (i.e. always rebuilt when invoked).

* **stage 3** -- this is an optional stage that mostly only needs to
get built when we want to verify that the output of our compiler is
reproducible, i.e. it repeatedly produces the same output for the
same input (this can be done with the ``make check-reproducibility``
makefile target).

Workflow
--------

TL;DR:

1) ``git clone``
2) edit ``config.mk``
3) ``make all``
4) ``make test``

(i.e. an external Idris executable is not necessary).

In normal, everyday development it is only stage 2 that gets rebuilt,
and used as a driver for running the tests.

When ``STAGE0_GIT_REF`` is empty (which should be the default in the
repo), then ``make all`` (or simply ``make``) will take a bootstrap
shortcut.

If you want to **make an evolutionary step** of the language
(i.e. regenerate the stage 1 build artifacts, aka the *bootstrap
shortcut*), then you will need a working stage 0. This is not strictly
necessary, because stage 1 can also be produced by using a
bootstrapped stage 1 to compile stage 2, and then copying the stage 2
build artifacts using ``make prepare-bootstrap-files``. But that way
you wouldn't notice if version ``n-1`` was not able to compile version
``n`` anymore, i.e. when some change broke the bootstrappability
chain.

You can make an evolutionary step in two similar ways:

1) by setting the ``BOOTSTRAP_IDRIS`` variable to point to a working
Idris binary, or

2) by setting the ``STAGE0_GIT_REF`` variable in your local makefile
to point to a git tag or branch. Doing so will instruct the
makefile to check out, bootstrap, and use that git ref as stage 0,
i.e. use it as bootstrap host to regenerate the stage 1 build
artifacts.

Once stage 2 is built, you can use the ``make
prepare-bootstrap-files`` to prepare them, and add them to the git
index, i.e. ready to be committed.

``make clean`` does not delete ``build/stage0`` (only ``make
distclean`` does), because the bootstrap compiler is not affected by
any changes made to *our* sources.

``make clean`` only deletes stage 1 when either ``STAGE0_GIT_REF`` or
``BOOTSTRAP_IDRIS`` is set, becuase stage 1 cannot be regenerated
without a valid stage 0, and thus in those configurations stage 1 gets
bootstrapped from the checked in bootstrap shortcut files.

Language evolution vs. releases
-------------------------------

Having made a new evolutionary step in the language doesn't
necessarily mean that we also want to stamp its implementation as a
public release. Therefore, we branch off every new evolutionary stage
into an ``idris.[consecutive stage number]`` git branch, and use git
tags independently for stamping public releases. The consequtive
branches are the evolutionary stages of the language that can
bootstrap each other in a row. The git release tags are the versions
that we want to disseminate to our users.

Note on nomenclature: an *evolutionary stage* is different from a
*stage*: the former is a new version of the language, the latter is a
new stage of the compiler compiling itself. Suggestions for a better
nomenclature are welcome.

Assorted benefits of this setup:

* we can go back and patch older versions and keep them bootstrappable
all the way back until Idris v1.3.4 that is written in Haskell.

* this way it's `bootstrappable <http://bootstrappable.org//>`_!

* enables experimenting with other ways to bootstrap Idris than the
original Haskell implementation; e.g. someone may venture into
writing a subset of Idris in another language that is good enough to
compile Idris proper. If we store this in a branch, then we can
retain a shorter path to bootstrappability than all the way back
from GHC.

Implementation
--------------

Each stage is built and installed in a directory under
``build/stage[n]``, and can be built by ``make sage0`` .. ``make
stage3``.

The version of stage 0 is selected by the ``STAGE0_GIT_REF`` variable,
and can point to any git ref, including branches.

All build output is produced under ``build/`` (for now the exceptions
are ``IdrisPaths.idr`` and the output of the tests).

The makefile is split into two parts:

* ``Makefile.one-stage`` contains the common parts that build a stage.

* ``Makefile`` is the main makefile that manages the chaining of the
build stages, among other things by invoking (calling) the
``Makefile.one-stage`` sub-makefile with varying parameters, and
potentially multiple times.

Bootstrap shortcuts
-------------------

If we need version ``n-1`` to build us, and if ``n-1`` needs ``n-2``,
then we quickly find ourselves all the way back at Idris v1.3.4, the
last version in the now mostly abandoned v1 line of Idris, that was
written in Haskell, and thus can be compiled using GHC.

If we want to avoid this, then we can make a *bootstrap shortcut*: if
we check in e.g. the output of the *chez* backend (a Scheme file) into
the Idris repo, then all we need to produce a working Idris binary is
a working ``chez-scheme`` binary to compile it for us.

The makefile is set up so that you can simply ``git add`` the ``.ss``
and ``.rkt`` backend output files (hint: ``make
prepare-bootstrap-files``), and then ``git commit`` them into the
repo. The ``make bootstrap`` target then ``git checkout build/`` the
bootstrap shortcut files, and produces the stage 1 executable by
simply compiling the intermediate output of one of the CG backends
using the ``chez-scheme`` binary, or Racket's ``raco`` binary (as
opposed to invoking Idris version ``n-1``). The CG to be used for the
bootstrap is selected by the usual ``IDRIS2_CG`` variable.
1 change: 1 addition & 0 deletions docs/source/implementation/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -22,3 +22,4 @@ debugging and future contributions.

overview
ide-protocol
build-system

0 comments on commit e7a1f9c

Please # to comment.