Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[ build ] A major refactor of the build system #1990

Closed

Conversation

attila-lendvai
Copy link
Contributor

@attila-lendvai attila-lendvai commented Oct 11, 2021

what

The main point of this refactor is to:

  1. make it possible to bootstrap Idris all the way up from Haskell, and to integrate this into the normal development workflow. with that, it keeps the language ever bootstrappable as the normal flow of development.
  2. add infrastructure to verify whether the compiler builds itself reproducibly.
  3. and ultimately to integrate this bootstrappable and reproducible build into the Guix package.

The detailed description of this PR is in the form of extending the documentation, please see the relevant commit.

some highlights

  • make release
  • make check-reproducibility
  • make prepare-bootstrap-files
  • eliminates the bootstrap related artifacts (besides a couple of files under build/ checked into git)
  • can transparently checkout and build any selected bootstrap host (i.e. the previous version) under build/
  • runs Idris from an installed dir under build/, i.e. it better exercises the real-life setup

notes

  • Regarding the part in the docs about having the evolutionary stages of the language stored in branches: these are the idris.1..idris.7 branches in my github fork (they don't need to be merged/pushed for this PR to work).

status

short of much encouragement from the maintainers, i have apparently lost interest in this PR.

it's ready for review/merge. i only know about one last issues/TODO:

  • fix on windows (the chez/chez020 test fails to print Idris2, even though the output of idris2 --version seems to be fine, and the value, too. and yet, the Idris2 string doesn't reach stdout. this is most probably just exposed by this PR, not caused by it. possibly related to Let bound expressions are evaluated after failed guard in comprehension #2215 or at least the comments over their contain other examples of the same behavior in the Guix build sandbox)
  • test whether this all works fine with the Guix packages
  • fix install-api and install-with-src-api (IDRIS2_PREFIX is used both as the dir to look for libs, and the destination dir of installation (what is usually called DESTDIR in GNU parlance). work it around using a kludge in the makefile)
  • finish debugging the CI scripts (i don't have much experience with it)
  • resolve the bug due to which IdrisPaths.idr doesn't get incorporated into the final executable, and thus make PREFIX=/new/prefix install installs an executable with the wrong yprefix value (a reproducer: failure to recompile modules, seemingly randomly #2033). but the best way to resolve it would be to stop capturing the install path into the executable. works when rebased on top of [fix #2033] determining when to rebuild modules #2188
  • test/fix on mac
  • test whether the release archive created by the release script can be bootstrapped
  • test the more esoteric targets that are not exercised by the bootstrap
  • update INSTALL.md
  • ability to bootstrap using an external executable
  • deal with the windows .exe extension in the makefile
  • test/finish racket based bootstrap
  • test when the git checkout is in a path that contains a space
  • test bootstrapping when checking it out into a different directory

for later

  • separate 1) the lib path of the standard library (read-only on Nix, Guix), from 2) the lib path of the user-installed libs (which must be read-write)
  • introduce DESTDIR and a CLI arg, or some other way to control the installation path separately from the lib search path

downsides

  • certain pathways of the build now only work in a git checkout and requires the git executable. but it can do everything without git that the old makefile used to be able to do.

cross refs

@attila-lendvai attila-lendvai marked this pull request as draft October 11, 2021 11:13
@attila-lendvai attila-lendvai force-pushed the build-system branch 7 times, most recently from 2b1d222 to b5146c8 Compare October 11, 2021 23:16
@attila-lendvai attila-lendvai mentioned this pull request Oct 11, 2021
@benhormann
Copy link
Contributor

I've been working on switching to raco make: (benhormann:raco-make-plus). It'll help that the filenames created this way are consistent across systems. I'd suggest putting idris2.so in idris2_app/compiled/ to further reduce disparities.


Miscellaneous comments:

Add ".mk" extension to fix syntax highlighting in GitHub.

Should add quotes when doing IDRIS2_PREFIX="$(IDRIS2_PREFIX)".

You can use cmp -s ... instead of diff ... 2>/dev/null. Using the short switches for diff will increase portability.

network depends on contrib, but I wonder how often do people invoke make on libs individually? Can it be combined into simply "libs"?
Otherwise I'd call the individual targets <name>.ipkg to get around the "test" clash. Use $(basename $@) for the directory name.

In theory you could build only the modules imported by the compiler for the first stage(s). I'm not sure if it is worth the complexity, but it would trim some bootstrap time.

Trick to skip bytecode compilation: CHEZ=true idris2 --build idris2.ipkg.


Some snippets

cygpath = $(and $1, $(if $(isWindows), $(shell cygpath -m $2 "$1"), $(abspath $1))
IDRIS2_PATH := $(TARGETDIR)/$(subst $= ,/build/ttc:$(TARGETDIR)/,$(wildcard libs/*))/build/ttc
IDRIS2_PATH := $(call cygpath, $(IDRIS2_PATH), --path)
# Use support files without installing for every stage
IDRIS2_DATA := $(call cygpath, $(TARGETDIR)/support)
IDRIS2_LIBS := $(call cygpath, $(TARGETDIR)/support/c)

You could use space := $() $() where I've used $= ,, but fewer variables is nicer IMO.

#!/bin/sh

BOOTDIR=$(cd -P "$(dirname "$0" && pwd)"

export LD_LIBRARY_PATH=$BOOTPATH/lib:$LD_LIBRARY_PATH
if [ -n "$OS" ] && [ -z "${OS##*_NT*}" ] || [ "$OS" = windows ]; then
    PATH=$BOOTPATH/lib:$PATH
fi

if [ "$IDRIS2_CG" = racket ]; then
    "${RACKET-racket}" "$DIR/bin/compiled/idris2_rkt.zo" "$@"
elif [ -n "$SCHEME" ]; then
    "$SCHEME" --script "$BOOTDIR/bin/idris2.so" "$@"
else
    echo 'idris2.sh - Usage:' >&2
    echo 'SCHEME=<chez-scheme> idris2.sh' >&2
    echo 'IDRIS2_CG=racket idris2.sh' >&2
    exit 1
fi
idris2-paths:
  printf %s\\n \
    "-- @""generated" \
    "module IdrisPaths" "" \
    "export" \
    "idrisVersion : ((Nat,Nat,Nat), String)" \
    "idrisVersion = (($(MAJOR),$(MINOR),$(PATCH)), \"$(VERSION_TAG)\")" "" \
    "export" \
    "yprefix : String" \
    "yprefix = \"$(PREFIX)\"" > src/IdrisPaths.idr

It's better than the old one, just. It'd be nice to shove it in a file instead. Maybe:

IdrisPaths:
    sed \
      "s|([0-9,]*), \"\"|($(MAJOR),$(MINOR),$(PATCH)), \"$(GIT_SHA1)\"|" \
      "s|__PREFIX__|$(IDRIS2_PREFIX)|" \
      IdrisPaths.template > build/paths.idr
    if ! cmp -s build/paths.idr src/IdrisPaths.idr; then \
      mv build/paths.idr src/IdrisPaths.idr; \
    fi

You don't need FORCE if you don't use the file's path for the target.

From my custom.mk — maybe interesting, maybe not:

quick: idris2-exec libs
quick: export IDRIS2_BOOT := $(CURDIR)/build/exec/idris2

run: ; IDRIS2_PREFIX=$(CURDIR)/build IDRIS2_PATH=$(IDRIS2_BOOT_PATH) ./build/exec/idris2 $(ARGS)

libs/*: ; $(MAKE) run ARGS+='--build $@/$(@F).ipkg'
.PHONY: $(wildcard libs/*)

version: run
version: ARGS = --version

mostlyclean: ; $(RM) -r build/ttc libs/*/build/ttc

@attila-lendvai
Copy link
Contributor Author

I've been working on switching to raco make: (benhormann:raco-make-plus). It'll help that the filenames created this way are consistent across systems. I'd suggest putting idris2.so in idris2_app/compiled/ to further reduce disparities.

i was hoping that we could get rid of idris2_app dir entirely by adding a simple logic that looks for the libidris2_support.so file somewhere relative to the executable, and in the standard install paths.

but reading you makes me think... shall i give up that pursuit as not feasible/desirable?

Miscellaneous comments:

Add ".mk" extension to fix syntax highlighting in GitHub.

bah, of course. done. thanks!

Should add quotes when doing IDRIS2_PREFIX="$(IDRIS2_PREFIX)".

you probably have reviewed an earlier state. i ran a phase of testing where the checkout dir is in a parent dir that contained spaces, and i have fixed several issues. this above being one of them.

in the current revision i have undone the change that puts the CG's name in the output file's name as file extension. that was only necessary when i was defining every backend's rule unconditionally. now that i had to add conditional definition of the relevant rules to the makefile, i can manage with the standard file extensions, so i reverted that part to keep it all simpler.

You can use cmp -s ... instead of diff ... 2>/dev/null. Using the short switches for diff will increase portability.

makes sense, i have switched entirely to cmp with the IdrisPaths.idr usage.

re diff in the check-reproducibility: i always preferred --long-names to increase code readability. does the increased portability that you mention have actual practical benefits? IOW, should i forgo the better readability?

network depends on contrib, but I wonder how often do people invoke make on libs individually? Can it be combined into simply "libs"?

i'm keeping an eye on supporting make -j parallelism, and merging the rules serializes their compilation (unless there's some makefile magic to pralellize the shell commands of a rule). i'm not sure about this all, though.

Otherwise I'd call the individual targets .ipkg to get around the "test" clash.

hrm. how about renaming it to golden-tests, or some variation of it? inside it's modules = Test.Golden after all...

any suggestions for a better name for the test lib than test?

either way, i'll do that in a separate commit/PR afterwards.

Use $(basename $@) for the directory name.

i don't get what you are referring to here.

In theory you could build only the modules imported by the compiler for the first stage(s). I'm not sure if it is worth the complexity, but it would trim some bootstrap time.

in a future commit... :) this is extensive enough already.

Trick to skip bytecode compilation: CHEZ=true idris2 --build idris2.ipkg.

oh, nice trick! :)

unfortunately, i'm also trying to get away with the discrepancy between racket and the .so output of chez, and between their different wrapper scripts, and also of the fact that the wrapper script needs SCHEME to be defined. this is part of the in-place bootstrap shortcuts, so that the wrapper script can be the same for both CG's.

when i employ this trick, it gets to the wrong place, i.e. into the shell line:

> compiling build/stage2/idris2_app/idris2.ss with output to build/stage2/idris2_app/idris2
Exception in read: invalid syntax #!true at line 1, char 1 of build/stage2/idris2_app/idris2.ss

i hacked this away with some more sed complexity in the makefile... i'm not too happy about it, this should be resolved in a nicer way, e.g. by a command line switch? maybe passing something through --directive? but just learning idris, so i'm not confident enough to implement/debug such a thing in the codebase.

as for an eventual merging of this PR: @benhormann, do you have the commit bit? if you do, then how about you fetch this PR locally (when i consider it done/tested), and then you fix/reshape anything you see, and then push both commits to main? judging from your comment you seem to be much more experienced with make than i am. i keep finding myself websearching for basic makefile stuff...

@benhormann
Copy link
Contributor

i was hoping that we could get rid of idris2_app dir entirely by adding a simple logic that looks for the libidris2_support.so file somewhere relative to the executable, and in the standard install paths.

I don't think we can, or at least not feasible for dev.
To continue using bytecode, we have to tell scheme / racket where to load the library. There are two options (apart from static linking): install the lib on standard search path or hardcode the path. I don't know if it can be relative to the executable, also symlinking would break it.

The current directory is searched. That's enough to do what you want, right?

i always preferred --long-names to increase code readability. does the increased portability that you mention have actual practical benefits?

Alpine Linux uses BusyBox, then there's BSD and macOS (generally the worst offender). It varies.

invalid syntax #!true at line 1

Ah, you'll could try /usr/bin/true (or /bin/true, or /bin/busybox true, or ...).
I didn't realize that you can add a shebang to idris2_rkt.zo, seems to work.

do you have the commit bit?

No, but I could help tidy it up in your branch.

@attila-lendvai attila-lendvai force-pushed the build-system branch 13 times, most recently from da61358 to 2140982 Compare October 17, 2021 13:59
@benhormann
Copy link
Contributor

Check out benhormann:refactor-make-1, it's more of a refactor and less of a new system. Even if a new system is preferred, there are some decent ideas:

  • Extracts VERSION from the ipkg file.
    • Uses --match=v$(VERSION) so other git tags aren't a problem.
  • Drops testenv, uses IDRIS2_PACKAGE_PATH instead.
  • Tries to support DESTDIR when installing.
    • TODO: adjust nix.
  • Builds a subset of libs for quicker bootstrap.
    • TODO: automate the ipkg files?
  • Checksums the bootstrap src and bin, so you know if either changed.
    • Calls maybe-compile-program, which compares by file mtime.

It's based off other work, so the CI has more Racket jobs. And re-enables tests on Windows.


How about a './configure' script?

A fairly simple script to handle things like OS specifics, Chez vs Racket, etc. It would populate config.mk with every option that stays consistent between builds. The version string would still require invoking git each build. Could skip it for many cases by assuming Linux/BSD defaults. (Could even make it easier to support BSD Make...)

It would avoid having to do things like:
make PREFIX=$HOME/.custom/idris2 && make install PREFIX=$HOME/.custom/idris2
Instead you'd do:
./configure --prefix=$HOME/.custom/idris2 && make && make install

@attila-lendvai
Copy link
Contributor Author

attila-lendvai commented Mar 23, 2022

FTR, the Exception: invalid memory reference. Some debugging context lost error is a memory fault coming from Chez. my Windows is running in a VM indeed (boots my partition into QEMU -- we came a long way!), but it's not the memory: a simple ./build/stage1/idris2 --version errors out with this.

there are bug reports to Idris itself that suggests that it may be an Idris compiler bug (see e.g. #1974 or #2366), or i'm miscompiling Chez itself, or compile it in an incompatible way with the C side of Idris.

the CI file suggests that it's all compiled using Mingw on Windows (e.g. it uses pacman, which is clearly not cygwin), but then the wrapper script only handles CYGWIN, and not Mingw (I had to add support to handle the MSYS_NT* case for matching the output of uname -s). which is very confusing. next, i'll try to compile everything in a cygwin environment (update: this fails with missing symbols at link time).

Check out benhormann:refactor-make-1, it's more of a refactor and less of a new system. Even if a new system is preferred, there are some decent ideas:

good stuff! i should incorporate those, but this PR is already turning into a never ending story... :)

as for a ./configure script, i have zero experience creating them, and my general, and admittedly uneducated impression is that they bring more complexity than what they help deal with.

@benhormann
Copy link
Contributor

There is a windows installer for Chez, I'd use that version (pity it is awkward to install for CI).

the CI file suggests that it's all compiled using Mingw on Windows (e.g. it uses pacman, which is clearly not cygwin), but then the wrapper script only handles CYGWIN, and not Mingw (I had to add support to handle the MSYS_NT* case for matching the output of uname -s). which is very confusing. next, i'll try to compile everything in a cygwin environment (update: this fails with missing symbols at link time).

Seems MinGW is the only working option. *_NT* matches all the above.

as for a ./configure script, i have zero experience creating them, and my general, and admittedly uneducated impression is that they bring more complexity than what they help deal with.

I don't mean autoconf, a handwritten shell script will do. Question is if that is OK or not wanted.

@melted
Copy link
Collaborator

melted commented Mar 23, 2022

FTR, the Exception: invalid memory reference. Some debugging context lost error is a memory fault coming from Chez. my Windows is running in a VM indeed (boots my partition into QEMU -- we came a long way!), but it's not the memory: a simple ./build/stage1/idris2 --version errors out with this.

there are bug reports to Idris itself that suggests that it may be an Idris compiler bug (see e.g. #1974 or #2366), or i'm miscompiling Chez itself, or compile it in an incompatible way with the C side of Idris.

the CI file suggests that it's all compiled using Mingw on Windows (e.g. it uses pacman, which is clearly not cygwin), but then the wrapper script only handles CYGWIN, and not Mingw (I had to add support to handle the MSYS_NT* case for matching the output of uname -s). which is very confusing. next, i'll try to compile everything in a cygwin environment (update: this fails with missing symbols at link time).

I don't understand this error, I build Chez on Windows with MSYS2 without any modifications, other than those I have submitted upstream to Chez. It should work as well with ordinary MSYS, as in the one that comes with Git for Windows, but I don't do that.

Oh wait, maybe it's because the CI and I are using the Microsoft compiler to build it. I have built it using GCC in the past. But that was a while ago.

Check out benhormann:refactor-make-1, it's more of a refactor and less of a new system. Even if a new system is preferred, there are some decent ideas:

good stuff! i should incorporate those, but this PR is already turning into a never ending story... :)

as for a ./configure script, i have zero experience creating them, and my general, and admittedly uneducated impression is that they bring more complexity than what they help deal with.

@attila-lendvai
Copy link
Contributor Author

attila-lendvai commented Mar 23, 2022

I don't understand this error, I build Chez on Windows with MSYS2 without any modifications, other than those I have submitted upstream to Chez. It should work as well with ordinary MSYS, as in the one that comes with Git for Windows, but I don't do that.

sorry for the confusion: Chez does build/work on both MSYS and Cygwin, but Idris itself doesn't build on Cygwin (fails at link time with a few missing symbol errors).

(Further edited to clarify)

@attila-lendvai
Copy link
Contributor Author

attila-lendvai commented Mar 23, 2022

There is a windows installer for Chez, I'd use that version (pity it is awkward to install for CI).

when i try to use the binary install of chez, i get this:

C:/msys64/usr/bin/scheme.exe: error while loading shared libraries: api-ms-win-crt-locale-l1-1-0.dll: cannot open shared object file: No such file or directory

even though find /c/windows/ | grep ... gives me these matches:

/c/Windows/WinSxS/amd64_microsoft-windows-m..namespace-downlevel_31bf3856ad364e35_10.0.19041.1_none_7d55e8342077d456/api-ms-win-crt-locale-l1-1-0.dll
/c/Windows/WinSxS/x86_microsoft-windows-m..namespace-downlevel_31bf3856ad364e35_10.0.19041.1_none_21374cb0681a6320/api-ms-win-crt-locale-l1-1-0.dll

but it's not in PATH:

$ echo $PATH
/usr/local/bin:/usr/bin:/bin:/opt/bin:/c/Windows/System32:/c/Windows:/c/Windows/System32/Wbem:/c/Windows/System32/WindowsPowerShell/v1.0/:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl

i have very little experience coding on windows... but maybe it's because the $PATH of MSYS doesn't contain the standard windows directories? maybe this is the MSYS2_PATH_TYPE: inherit in the CI file?

if i add it by hand:

PATH=/c/Windows/WinSxS/amd64_microsoft-windows-m..namespace-downlevel_31bf3856ad364e35_10.0.19041.1_none_7d55e8342077d456/:$PATH

then it just moves the goalpost to some moar pain:

C:/msys64/usr/bin/scheme.exe: error while loading shared libraries: csv956.dll: cannot open shared object file: No such file or directory

@benhormann
Copy link
Contributor

csv956.dll would normally be in "C:\Program Files\Chez Scheme 9.5.6\bin\ti3nt". I just install Chez and the vc_redist, then add Chez to PATH. Seems like you already have the VC Redist.

There is a way to configure it to inherit PATH but you shouldn't need to. IIRC, the CI uses that setting to locate git or other command.


Comparing the logs between ubuntu and windows, it looks equivalent up to make[2]: Leaving directory '/d/a/Idris2/Idris2/support/chez'. This is odd: 1 [main] cmp (4144) shared_info::initialize: size of shared memory region changed from 56248 to 57272

I'd stop silencing commands with @, at least until it is all working. Prepending set -x; might be helpful in some cases. (Normally make --dry-run is useful because it prints all commands, but here it doesn't terminate...)

Eventually you'll discover cmp is not included in base...
Either use something like if [ "$$(cat $@ 2>/dev/null)" != "$$(cat $@.new)" ];, or install diffutils.

Also use #!/usr/bin/env sh to be more accommodating to NixOS and Guix.
attila-lendvai added a commit to attila-lendvai-patches/Idris2 that referenced this pull request Apr 13, 2022
Please find the detailed description of this refactor in the form of
an upcoming commit extending the documentation with a chapter on the
build system.

* Eliminates everything bootstrap related beyond the two Makefiles.

* Adds a 'make release' target.

* Eliminates several sub makefiles from subdirs.

* Uses the $() convention in makefiles instead of ${}.

* Puts all build artifacts under build/ (except for the tests, because
for now we can't control how runtests invokes the idris binary).

* Changes the filename of intermediate outputs to include the CG's
name as file extension. This is needed for having better control in
the makefile to implement automated bootstrap shortcuts.

Discussion available at: idris-lang#1990
Please find the detailed description of this refactor in the form of
an upcoming commit extending the documentation with a chapter on the
build system.

* Eliminates everything bootstrap related beyond the two Makefiles.

* Adds a 'make release' target.

* Eliminates several sub makefiles from subdirs.

* Uses the $() convention in makefiles instead of ${}.

* Puts all build artifacts under build/ (except for the tests, because
for now we can't control how runtests invokes the idris binary).

* Changes the filename of intermediate outputs to include the CG's
name as file extension. This is needed for having better control in
the makefile to implement automated bootstrap shortcuts.

Discussion available at: idris-lang#1990
@attila-lendvai
Copy link
Contributor Author

attila-lendvai commented Apr 13, 2022

i've rebased the commits to master, and looks like i have fixed the windows build once again.

now it's back to the single, mysterious chez/chez020 failure on windows. i still think it's unrelated to this PR, only exposed by it.

any chance of merging this with that one windows failure? maybe someone who is actively developing on windows can pick this up and debug it that way?

@attila-lendvai
Copy link
Contributor Author

a reminder: i can see seemingly the same test failures on linux, in the Guix build sandbox: part of the output of the tests is simply missing, and thus many tests fail. last time i checked it was in a consistent manner, but not just one but several dozens.

the sandbox is very barren and isolated, it may even miss the glibc UTF-8 locales, etc.

@benhormann
Copy link
Contributor

@attila-lendvai Should at least fix the make --dry-run issue.

It looks like you choose what the first step is, build it, then recurse to build the next step. Since nothing changed it repeats.

Better to have bootstrap depend on each stage that needs to be built:

bootstrap: stage1 stage2
stage1: $(or $(FromScheme),build/bootstrap/idris2,build/stage0/idris2)
	# Commands...
build/bootstrap/idris2:
	# Commands for the first stage only.

This makes make bootstrap an alias for make stage2, so is nearly useless. The distinction is having an existing idris2 command or not. I'd use FromScheme=$(filter bootstrap,$(MAKECMDGOALS)) to decide, or a target-specific variable (also visible to its prerequisites) e.g. bootstrap: FromScheme=Yes.

I believe recursing the top Makefile is best avoided. Giving the commands to build the scheme files their own target should simplify matters.


I don't recall chez020 being an issue in my branch. I've noticed output buffering in a vm, so it could be related to asynchronous vs synchronous (needs to flush)? But in this case it's a pipe?
Could the commit used to generate the bootstrap files affect this test? Seems unlikely, but perhaps worth looking at.

@benhormann
Copy link
Contributor

benhormann commented Apr 17, 2022

There is still a lot that could/should be improved:

  • Avoid non-POSIX options (primarily echo -e).
    • Use printf '\n%s\n\n' <string> instead.
  • Avoid in-place editing.
  • Nested ifeq cannot be indented with tabs. Avoid nesting.
  • RefC build is not working. YAGNI, remove.
  • CHEZ=true should only be used for stage3, or not at all.
    • On the other hand, it requires less memory...
    • So add a --codegen-only / --no-compile option?
  • Do not rely on shebangs!
  • Do not rely on git / tags.
    • Get the build's version from idris2api.ipkg instead.
      • Do mismatched version numbers even work correctly?
    • Filter tags with --match v$(IDRIS_VERSION) for IDRIS_VERSION_TAG.
  • Revert .github/workflows/ci-bootstrap.yml paths.
  • Use export for common variables (IDRIS2_CG).
  • Too slow.
    • Avoid rebuilding for install.
    • Skip stage2 for quick-check (or stage1).
  • Too many comments?
    • Why document it multiple places? Can we simplify it so it makes sense?

TL;DR Not "finished" yet.


More thoughts:

Another reason to redo the bootstrap: that commit's history is not readily available.

Should IDRIS_VERSION be exported? Or just unset it for stage0.
Has this system been tested for when it becomes the "previous version"?

If you name it One-stage.mk it would sort nicely :)

What's $(4) $(5) $(6) $(7) $(8) for?
Are the make-stage functions worth it? Wouldn't it be clearer, though a little repetitive, with the actual command in full? Might help if STAGE[0-3]_IDRIS vars were absolute paths from the start.

Naming private variables with leading underscore hides it from autocomplete, which is nice. Makes it clear what is public and override more easily avoided too.

The build doesn't work if $(NAME) is not exactly idris2. Let's inline it.

What else can be simplified?


@gallais Another concern is this will require commitment to fixes/compatibility, and perhaps time to stabilise, before any new releases. Not sure what plans are there... Might be better to add a --no-compile option now, wait for next version, then tidy up this PR?

If this PR is merged as is, half of it would need changed for #1992.

@attila-lendvai
Copy link
Contributor Author

@benhormann thank you for the detailed feedback! i'll look into implementing them and report back with the result. (and sorry for losing track of those that you have already mentioned previously!)

@benhormann
Copy link
Contributor

@attila-lendvai All good. I could try a PR to your branch, but... I'm working on an alternate implementation / PoC.

IMO the stage0 recipe is not really worth including, building prerequisites is normally the caller's responsibility.
If it only builds one stage (i.e. much faster than make -C stage0 bootstrap), then it would be useful. But not worth the effort just to make it faster. How often would you use this?

Question: Is a two stage build useful outside of bootstrapping?
You're currently skipping stage1 if it exists, but it could be outdated somewhat easily (e.g. changing the launcher/support). Two stages is best if you want to install, in which case you probably don't want to skip stage1. In that case the caller should clean stage1 first.

If it always builds two stages, is there much difference if it starts from Scheme or installed Idris (assuming first stage from Scheme is cached)? Perhaps... if the installed Idris is significantly newer, such that only one stage is needed.

We could have make all do a simple build, and make bootstrap build two stages (either from Scheme or from an installed Idris). This could be the best of both, and would suit the CI nicely too.

Or we could keep it simple — let the caller decide to self-host or not...
This is almost equivalent of what you're doing (assuming no caches): make all install PREFIX=/tmp/x && make clean && make IDRIS2=/tmp/x/bin/idris2. Just needs a parameter for --build-dir to avoid cleaning in the middle, then add these commands as a handy recipe.

Food for thought. (Has the rationale behind the stages been discussed elsewhere?)

@attila-lendvai
Copy link
Contributor Author

attila-lendvai commented Apr 19, 2022

IMO the stage0 recipe is not really worth including, building prerequisites is normally the caller's responsibility. If it only builds one stage (i.e. much faster than make -C stage0 bootstrap), then it would be useful. But not worth the effort just to make it faster. How often would you use this?

it's more than just an optimization: it enables testing easily whether the version that was marked as the previous evolutionary stage (estage from now on) of the language can bootstrap the current HEAD (as opposed to the currently recorded bootstrap shortcut, i.e. the scheme files checked into the repo).

i think anyone who is changing stuff that influences the bootstrap should always run with STAGE0_GIT_REF set (EDIT: or BOOTSTRAP_IDRIS defined), and go through a make distclean before pushing the changes; i.e. go through a proper bootstrap off of the previous estage, not merely through a bootstrap shortcut (which is just a semi-random, intermediate revision, somewhere between two estages).

if the proper bootstrap fails, then it's a sign that a feature of idris has been used in HEAD that is not yet available in the previous estage. such a situation must be resolved either by introducing a new estage, or by rewriting the code without using/assuming anything that has been introduced since the previous estage.

idris currently seems to stick to this setup: HEAD must be compilable verbatim both by the previous estage and HEAD itself. in maru (the tiny lisp where i experimented with these things) it's a bit more flexible: there are two variables that the codebase can dispatch on: bootstrapping?, and evolving?. the former is set whenever maru is bootstrapping itself, and the latter is only set when the previous estage is used as the bootstrap host.

i'm not sure how applicable this is to idris, though. maru is a tiny, dynamically typed lisp, and it's optimized mostly for sourcecode size + readability/clarity, while avoiding any external dependencies.

Question: Is a two stage build useful outside of bootstrapping?

i'm not sure i understand your question in this wording. bootstrapping is an essential part of developing idris (and testing/preserving bootstrappability). also, which two stages do you mean? there are 4 stages currently.

do you mean simply compiling the scheme files vs. also using it to then compile HEAD?

You're currently skipping stage1 if it exists, but it could be outdated somewhat easily (e.g. changing the launcher/support).

stage1 being outdated is not a problem as long as its outdatedness is not affecting whether it can compile stage2 (i.e. HEAD). if someone wants to check whether the build is reproducible (i.e. compare the build artifacts in stage2 to stage3), then stage1 must be recent enough so that its compiler's output is the same as that of HEAD.

If it always builds two stages, is there much difference if it starts from Scheme or installed Idris (assuming first stage from Scheme is cached)?

yes, there's a difference: one tests whether HEAD (or more precisely the current scheme files) can bootstrap HEAD, while the other tests whether the previous estage of the language can bootstrap HEAD.

also note re "installed Idris": with this PR i'm also proposing the separation of public releases from the estages of the language that can bootstrap each other in a row. in the current setup they are the same, but it means that whenever a bootstrap stage is necessary for the introduction of a new feature, then a new release of idris also needs to be made.

have you read the documentation that i added in this PR? i'm asking because if you have, then it's apparently not written clearly enough. i was hoping that this all is explained there. i'd welcome some feedback on which parts i should extend.

(also note that i'm not sure i fully understood all the ideas that you proposed above. if i may sound confused somewhere...)

@benhormann
Copy link
Contributor

I have release version installed at ~/.idris2-$VERSION and symlinked as idris2-$VERSION. This way I don't have to rebuild it in each worktree. You have a good point about checking compatibility, making that easier and faster is a good idea.

It would be ideal to keep the build artifacts separate, e.g. building 'stage1' compiler having build dirs for 'bootstrap' and 'stage0' compilers. The TTC version can differ at that point (as you are aware). Basically you want to be able to test both and have each only compile modules that changed since last time. Ideally you could run such a test twice and no modules need rebuilt the second time.

Question: Is a two stage build useful outside of bootstrapping?

i'm not sure i understand your question in this wording. bootstrapping is an essential part of developing idris (and testing/preserving bootstrappability). also, which two stages do you mean? there are 4 stages currently.

Sorry, "bootstrapping" in this project is overloaded. I meant building from Scheme or previous version. I'd say "bootstrapping from GHC" for the traditional meaning. (I mean stage1 and stage2 — stage0 doesn't count i.e. isn't HEAD, and stage3 is not really a build stage).

BTW, if the stage3 idris2.ss is identical to stage2 idris2.ss, isn't that enough. What's the point of building the libs too?
Also, why run install for stage3?

if someone wants to check whether the build is reproducible (i.e. compare the build artifacts in stage2 to stage3), then stage1 must be recent enough so that its compiler's output is the same as that of HEAD.

The @generated by string might change often, but that's easy to ignore.

whenever a bootstrap stage is necessary for the introduction of a new feature, then a new release of idris also needs to be made

My understanding is this is intentional. Make evolutionary changes more often than public releases is a hassle (speculation)?
From what I've seen it is the libraries that tend to be more of an evolution bottleneck, but I'm not well informed here. I suppose it doesn't matter why, you either need a new step or you don't.


have you read the documentation that i added in this PR? i'm asking because if you have, then it's apparently not written clearly enough. i was hoping that this all is explained there. i'd welcome some feedback on which parts i should extend.

I read it. This description talks about traditional bootstrapping. That's important, but I'm more interested in getting the thing working.
If I want to bootstrap from GHC/Idris1, then I'd use code similar to my previous comments (instead of self-hosting each step). I'm still not convinced it needs to be part of the ordinary Makefile, better to have it as a separate script.

It makes more sense to me that we should optimise the Makefile for "build & install" and "testing my changes" scenarios.

@attila-lendvai
Copy link
Contributor Author

short of much encouragement, i have apparently lost interest in this PR.

If I want to bootstrap from GHC/Idris1, then I'd use code similar to my previous comments (instead of self-hosting each step). I'm still not convinced it needs to be part of the ordinary Makefile, better to have it as a separate script.

it's always just bootstrapping from the previous evolutionary stage of the language. sometimes it just happens to be GHC/Idris1, some other times it's V8.5.

i see value in reifying it into the main workflow to keep it in a functional state. i also see value in keeping the language bootstrappable at all times (see the link for the argument), but i don't have enough stake in Idris to advocate for these values more than what i have already done.

@gallais
Copy link
Member

gallais commented Sep 25, 2022

Sorry that you didn't get much feedback but:

  1. it's a huge change
  2. to a crucial component that currently works satisfactorily
  3. that very few people will be knowledgeable enough to review
  4. and that still does not pass the CI

As highlighted in the CONTRIBUTING:

Please remember when making contributions via pull requests, though, that Idris is primarily a research project, and we are a small team, which limits the time we have available for reviewing and maintaining PRs. Nobody works full time on Idris - we're a team of academics and students at a university with other demands on our time such as teaching, and writing and presenting research papers, which will always take priority. This also means we have to be careful to make sure that we can commit to maintaining any new features which are contributed.

Things that should be discussed via the issue tracker first
(...)
Any fundamental changes to build system, library structure, or CI workflow

@attila-lendvai
Copy link
Contributor Author

attila-lendvai commented Sep 26, 2022

@gallais no hard feelings, i'm just stating the fact for any other interested parties that i probably won't work on this anymore. i enjoy exploring bootstrapping issues, i had my fun with Idris, but i got exhausted and by now most of it is gone from my short-term memory. whether the result is of any value is up to the maintainers to decide.

still does not pass the CI

i don't think this is a fair statement. the CI was clear for a long time except a single test failure on windows, and i'm pretty sure it's not related to this PR, it only exposes it. even though i hate touching windows, i looked into it, and the value seems to be fine, yet it fails to get printed for some misterious reason.

@gallais
Copy link
Member

gallais commented Sep 26, 2022

i don't think this is a fair statement.

My bad, I hadn't looked into the details of the failure.

@gallais gallais closed this Oct 2, 2022
# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants