[AST] Add 'Fix' #6793

Open
wants to merge 5 commits into
base: master

@effectfully effectfully commented Jan 15, 2025

This is an experiment adding Fix to the AST to see how much faster we could recurse.

The main change is adding this constructor to the untyped AST:

data Term name uni fun ann
    = <...>
    | Fix !ann !name !(Term name uni fun ann)

which in the typed ASTs becomes

data Term tyname name uni fun ann
    = <...>
    | Fix ann name (Type tyname uni ann) (Term tyname name uni fun ann)

with the following typing:

-- [check| G !- ty :: *]    ty ~> vTy    [check| G , rec : vTy !- body : vTy]
-- ----------------------------------------------------------------------------
-- [infer| G !- fix rec ty body : vTy]
inferTypeM (Fix ann rec ty body) = do
    checkKindM ann ty $ Type ()
    vTy <- normalizeTypeM $ void ty
    withVar rec vTy $ checkTypeM ann body vTy
    pure vTy

and the following operational semantics:

        body ~> bodyV
----------------------------- (1)
fix rec body ~> fix rec bodyV

                  <empty>
-------------------------------------------- (2)
fix rec bodyV ~> [fix rec bodyV / rec] bodyV

I.e. we make fix a binder so that we can evaluate the body of a fix once (the (1) rule) and then substitute it for the name of the recursive call without recomputing it each time (the (2) rule).
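
The two rules above can be sketched as a tiny substitution-based evaluator. This is only an illustration on a toy untyped calculus: the `Term` type, `subst` and `eval` below are hypothetical and not taken from the Plutus codebase, and the non-lambda check mirrors the restriction that the machines described further down impose.

```haskell
module Main where

-- A toy untyped calculus with a 'Fix' binder (hypothetical, for illustration only).
data Term
    = Var String
    | Lam String Term
    | App Term Term
    | Lit Integer
    | Sub Term Term
    | Mul Term Term
    | IfZero Term Term Term
    | Fix String Term
    deriving (Show, Eq)

-- Substitute a closed term for a variable; closedness means no capture can occur.
subst :: String -> Term -> Term -> Term
subst x s = go
  where
    go t = case t of
        Var y | y == x    -> s
              | otherwise -> t
        Lam y b | y == x    -> t
                | otherwise -> Lam y (go b)
        App f a      -> App (go f) (go a)
        Lit _        -> t
        Sub a b      -> Sub (go a) (go b)
        Mul a b      -> Mul (go a) (go b)
        IfZero c a b -> IfZero (go c) (go a) (go b)
        Fix y b | y == x    -> t
                | otherwise -> Fix y (go b)

-- Call-by-value evaluation of closed terms.
eval :: Term -> Term
eval t = case t of
    Var x -> error ("unbound variable: " ++ x)
    Lam{} -> t
    Lit{} -> t
    App f a -> case eval f of
        Lam x b -> eval (subst x (eval a) b)
        _       -> error "not a function"
    Sub a b -> arith (-) a b
    Mul a b -> arith (*) a b
    IfZero c a b -> case eval c of
        Lit 0 -> eval a
        Lit _ -> eval b
        _     -> error "not an integer"
    Fix recName body ->
        -- Rule (1): evaluate the body once, with recName still free.
        let bodyV = eval body
        in case bodyV of
            -- Rule (2): substitute the evaluated fixpoint for recName.
            Lam{} -> subst recName (Fix recName bodyV) bodyV
            _     -> error "fix body is not a lambda"
  where
    arith op a b = case (eval a, eval b) of
        (Lit m, Lit n) -> Lit (op m n)
        _              -> error "not an integer"

factorial :: Term
factorial =
    Fix "fact" $ Lam "n" $
        IfZero (Var "n")
            (Lit 1)
            (Mul (Var "n") (App (Var "fact") (Sub (Var "n") (Lit 1))))

main :: IO ()
main = print (eval (App factorial (Lit 5)))
```

Note that in this sketch a body that refers to the recursive name before producing a lambda, such as `Fix "x" (Var "x")`, fails with an unbound-variable error instead of looping.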

In the CK machine this looks as follows:

data Frame uni fun
    = <...>
    | FrameFix Name (Type TyName uni ())

-- By (1)
stack |> Fix _ recName ty body = FrameFix recName ty : stack |> body

-- By (2)
FrameFix recName ty : stack <| bodyV =
    let bodyTerm = ckValueToTerm bodyV
    in case bodyV of
        VLamAbs{} -> stack |> termSubstClosedTerm recName (Fix () recName ty bodyTerm) bodyTerm
        _         -> throwingWithCause _MachineError NonLambdaFixedMachineError $ Just bodyTerm

The reason why we match on VLamAbs is to make the semantics of the CK machine match the ones of the CEK machine, where we have to match on VLamAbs in order to update its environment. This is how it looks:

data Context uni fun ann
    = <...>
    | FrameFix {-# UNPACK #-} !Word64 !(Context uni fun ann)

-- By (1)
computeCek !ctx !env (Fix _ rec body) = do
    stepAndMaybeSpend BFix
    let !len' = Env.length env + 1
    computeCek (FrameFix len' ctx) (Env.cons (VBlackHole (ndbnString rec) len') env) body

-- By (2)
returnCek (FrameFix recIx ctx) bodyV =
    case bodyV of
        VLamAbs nameArg bodyLam env -> do
            let env' = Env.contUpdateZero (\_ -> bodyV') env (Env.length env - recIx)
                bodyV' = VLamAbs nameArg bodyLam env'
            returnCek ctx bodyV'
        VBlackHole{} -> throwingDischarged _MachineError FixLoopMachineError bodyV
        _ -> throwingDischarged _MachineError NonLambdaFixedMachineError bodyV

The tricky parts:

  1. In computeCek we save the current size of the environment in FrameFix and proceed to evaluate the body in the current environment extended with a VBlackHole. VBlackHole is a new kind of CekValue that we add to provide a value for the recursive call while we're evaluating the body of a fixpoint to a value. If we reach VBlackHole while evaluating the body, we know that this is infinite recursion and terminate the program early. The name is inspired by black holes in GHC, which are thunks under evaluation (if evaluating a thunk runs into its own black hole, that is also a loop and the Haskell program terminates).
  2. Once we've evaluated the body, we need to update the rec :=> VBlackHole environment entry by replacing VBlackHole with an actual value for rec. For that I added contUpdateZero (similar to contIndexZero). At that point we might have added new entries to the environment, so we need to calculate the De Bruijn index of rec in the environment by subtracting the old size of the environment from the new one.
  3. We completely discharge FrameFix in returnCek without creating any new frames. We're able to achieve that because we encode recursion through a circular object: env' refers to bodyV' and vice versa -- this is an actual loop in memory. The idea here is to make the environment of bodyV' contain bodyV' itself, including its environment containing bodyV', including its environment...
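
The three points above can be illustrated with a small environment-based evaluator. This is a hedged sketch, not the PR's code: it uses a direct-style `eval` instead of explicit frames, string names with an association list instead of De Bruijn environments, and a hypothetical `updateAt` helper standing in for contUpdateZero. It also reports the loop at variable lookup, which is a simplification.

```haskell
module Main where

-- Toy term and value types (hypothetical, for illustration only).
data Term
    = Var String
    | Lam String Term
    | App Term Term
    | Lit Integer
    | Add Term Term
    | Sub Term Term
    | IfZero Term Term Term
    | Fix String Term

data Value
    = VLit Integer
    | VLam String Term Env
    | VBlackHole String

-- The most recently bound entry comes first.
type Env = [(String, Value)]

-- Replace the entry at a zero-based position; out-of-range positions leave the list as is.
updateAt :: Int -> a -> [a] -> [a]
updateAt i v xs
    | i < 0 = xs
    | otherwise = case splitAt i xs of
        (pre, _ : post) -> pre ++ v : post
        _               -> xs

eval :: Env -> Term -> Value
eval env term = case term of
    Var x -> case lookup x env of
        Just (VBlackHole n) -> error ("fix loop on " ++ n)
        Just v              -> v
        Nothing             -> error ("unbound variable " ++ x)
    Lam x b -> VLam x b env
    Lit n -> VLit n
    App f a -> case eval env f of
        VLam x b envF ->
            let v = eval env a
            in v `seq` eval ((x, v) : envF) b   -- force the argument first
        _ -> error "not a function"
    Add a b -> arith (+) a b
    Sub a b -> arith (-) a b
    IfZero c a b -> case eval env c of
        VLit 0 -> eval env a
        VLit _ -> eval env b
        _      -> error "not an integer"
    Fix recName body ->
        -- Remember the size of the extended environment (point 1).
        let len' = length env + 1
        in case eval ((recName, VBlackHole recName) : env) body of
            VLam x b envB ->
                -- Locate the black hole by comparing sizes (point 2) and tie the
                -- knot: envB' contains bodyV' and bodyV' contains envB' (point 3).
                let envB'  = updateAt (length envB - len') (recName, bodyV') envB
                    bodyV' = VLam x b envB'
                in bodyV'
            _ -> error "fix body is not a lambda"
  where
    arith op a b = case (eval env a, eval env b) of
        (VLit m, VLit n) -> VLit (op m n)
        _                -> error "not an integer"

-- Values are rendered by hand: a derived Show would never finish on the cyclic closure.
render :: Value -> String
render (VLit n)       = show n
render VLam{}         = "<lambda>"
render (VBlackHole n) = "<black hole " ++ n ++ ">"

sumTo :: Term
sumTo =
    Fix "go" $ Lam "n" $
        IfZero (Var "n")
            (Lit 0)
            (Add (Var "n") (App (Var "go") (Sub (Var "n") (Lit 1))))

main :: IO ()
main = putStrLn (render (eval [] (App sumTo (Lit 10))))
```

The sketch relies on Haskell's laziness for the mutually recursive `envB'` and `bodyV'` bindings, and it only patches the closure's own environment, so it should not be read as covering every case.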

This should be the fastest way of evaluating Fix, since it doesn't create any additional frames, we just literally evaluate the body of a fixpoint multiple times. But it comes with a big disadvantage of creating an infinite structure requiring very careful handling if we want to support it in dischargeCekValue or the Show instance (we'll need to either use unsafePtrEquality to check for pointer loops or make VLamAbs carry a different sort of name indicating that it's an infinitely unrolled fixpoint).

Yet, even with such a performance-focused design the results are very underwhelming: nofib isn't detectably faster (do we have other benchmarks that get recompiled every time?). The lists benchmarks are faster by up to 14.5%, which is normally a very welcome improvement, but given that we're adding an entire new AST node (with very non-trivial evaluation rules), is it worth it?

Note that we don't save much size either. This is what replacing the Z combinator with native recursion in a script amounts to:
image

@effectfully effectfully added Evaluation AST Performance EXPERIMENT Experiments that we probably don't want to merge labels Jan 15, 2025
@effectfully effectfully self-assigned this Jan 15, 2025
({cpu: 410655782016
| mem: 1584672947})
Contributor Author

Lol. Must be Fix impeding optimization, I can't think of anything else that could cause code to become slower.


Aggregate Multi Key

n Script size CPU usage Memory usage
----------------------------------------------------------------------
- 1705 (10.4%) 3446371236 (34.5%) 422386 (3.0%)
- 1698 (10.4%) 3439411236 (34.4%) 378886 (2.7%)
Contributor Author

About 10% improvement for things in this file. Not a lot, but not too little either.

Contributor Author

Very insignificant changes.

({cpu: 193142904
| mem: 819552})
Contributor Author

2% for CPU and 3% for MEM is nothing.

({cpu: 77960900
| mem: 424300})
Contributor Author

5.9% for CPU and 6.7% for MEM is better, but still not a lot.

| budget: ({cpu: 2999382
| mem: 14712})
| budget: ({cpu: 2215382
| mem: 9812})
Contributor Author

26% CPU and 33.3% MEM.

| budget: ({cpu: 6960100
| mem: 43600})
| budget: ({cpu: 5600100
| mem: 35100})
Contributor Author

19.5% CPU and 19.5% MEM.

| budget: ({cpu: 8124345
| mem: 48505})
| budget: ({cpu: 6684345
| mem: 39505})
Contributor Author

17.7% CPU and 18.5% MEM.
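
The percentages quoted in these review comments can be rechecked from the budget pairs shown in the snippets. A minimal sketch, assuming each pair is (old, new) and the percentage is the relative reduction; the `reduction` helper is hypothetical:

```haskell
module Main where

import Text.Printf (printf)

-- Relative reduction, in percent, when going from an old budget to a new one.
reduction :: Double -> Double -> Double
reduction old new = (old - new) / old * 100

main :: IO ()
main = mapM_ report
    [ (2999382, 2215382, 14712, 9812)
    , (6960100, 5600100, 43600, 35100)
    , (8124345, 6684345, 48505, 39505)
    ]
  where
    -- Each tuple is (old CPU, new CPU, old MEM, new MEM) as quoted above.
    report :: (Double, Double, Double, Double) -> IO ()
    report (cpuOld, cpuNew, memOld, memNew) =
        printf "%.1f%% CPU and %.1f%% MEM\n"
            (reduction cpuOld cpuNew)
            (reduction memOld memNew)
```

The printed values agree with the quoted figures up to rounding in the last digit.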

({cpu: 128100
| mem: 900})
Contributor Author

Warmup cost for the Z combinator is really high somehow, compared to the native fixpoint combinator.

({cpu: 5486960
| mem: 26920})
Contributor Author

9.5% for CPU and 11.8% for MEM.

3410
Contributor Author

Size differences are all insignificant, unsurprisingly.

@effectfully
Contributor Author

/benchmark nofib

Comparing benchmark results of 'nofib' on '52664194d' (base) and '2da59cbe1' (PR)

Results table
Script 5266419 2da59cb Change
clausify/formula1 2.431 ms 2.468 ms +1.5%
clausify/formula2 3.273 ms 3.319 ms +1.4%
clausify/formula3 9.009 ms 9.113 ms +1.2%
clausify/formula4 21.18 ms 21.47 ms +1.4%
clausify/formula5 43.30 ms 44.35 ms +2.4%
knights/4x4 15.33 ms 15.51 ms +1.2%
knights/6x6 39.23 ms 39.18 ms -0.1%
knights/8x8 68.64 ms 68.43 ms -0.3%
primetest/05digits 9.461 ms 9.718 ms +2.7%
primetest/10digits 18.43 ms 19.13 ms +3.8%
primetest/30digits 56.56 ms 58.62 ms +3.6%
primetest/50digits 92.77 ms 97.25 ms +4.8%
queens4x4/bt 4.526 ms 4.578 ms +1.1%
queens4x4/bm 5.618 ms 5.733 ms +2.0%
queens4x4/bjbt1 5.428 ms 5.511 ms +1.5%
queens4x4/bjbt2 5.107 ms 5.190 ms +1.6%
queens4x4/fc 11.21 ms 11.39 ms +1.6%
queens5x5/bt 61.62 ms 63.07 ms +2.4%
queens5x5/bm 63.08 ms 65.15 ms +3.3%
queens5x5/bjbt1 71.47 ms 73.05 ms +2.2%
queens5x5/bjbt2 69.78 ms 71.03 ms +1.8%
queens5x5/fc 140.5 ms 144.3 ms +2.7%
5266419 2da59cb Change
TOTAL 818.0 ms 837.6 ms +2.4%

@effectfully
Contributor Author

looooooooooooooool

@effectfully
Contributor Author

/benchmark nofib

@effectfully
Contributor Author

/benchmark lists

Comparing benchmark results of 'nofib' on '52664194d' (base) and '2da59cbe1' (PR)

Results table
Script 5266419 2da59cb Change
clausify/formula1 2.514 ms 2.466 ms -1.9%
clausify/formula2 3.394 ms 3.312 ms -2.4%
clausify/formula3 9.323 ms 9.114 ms -2.2%
clausify/formula4 22.10 ms 21.50 ms -2.7%
clausify/formula5 44.78 ms 44.39 ms -0.9%
knights/4x4 15.96 ms 15.64 ms -2.0%
knights/6x6 41.23 ms 39.22 ms -4.9%
knights/8x8 72.12 ms 68.49 ms -5.0%
primetest/05digits 9.890 ms 9.696 ms -2.0%
primetest/10digits 19.32 ms 18.90 ms -2.2%
primetest/30digits 58.92 ms 59.08 ms +0.3%
primetest/50digits 97.39 ms 97.02 ms -0.4%
queens4x4/bt 4.694 ms 4.574 ms -2.6%
queens4x4/bm 5.817 ms 5.738 ms -1.4%
queens4x4/bjbt1 5.627 ms 5.647 ms +0.4%
queens4x4/bjbt2 5.303 ms 5.192 ms -2.1%
queens4x4/fc 11.67 ms 11.47 ms -1.7%
queens5x5/bt 63.62 ms 63.04 ms -0.9%
queens5x5/bm 65.43 ms 65.21 ms -0.3%
queens5x5/bjbt1 74.50 ms 73.14 ms -1.8%
queens5x5/bjbt2 72.50 ms 71.35 ms -1.6%
queens5x5/fc 146.5 ms 144.5 ms -1.4%
5266419 2da59cb Change
TOTAL 852.6 ms 838.7 ms -1.6%

@effectfully
Contributor Author

effectfully commented Jan 15, 2025

looooooooooooooool

OK, so that was noise. It's kinda hard to believe that the new way of doing recursion is only as efficient as the old one... But at the same time I don't think the benchmarks still use the old Z combinator, because the golden budget tests are updated. This is really funny: is the cost of doing recursion this insignificant for complex programs like nofib? Let's see what the lists benchmarks say.

Comparing benchmark results of 'lists' on '52664194d' (base) and '2da59cbe1' (PR)

Results table
Script 5266419 2da59cb Change
sort/ghcSort/50 194.6 μs 190.4 μs -2.2%
sort/ghcSort/100 454.0 μs 445.7 μs -1.8%
sort/ghcSort/150 784.1 μs 778.7 μs -0.7%
sort/ghcSort/200 1.053 ms 1.038 ms -1.4%
sort/ghcSort/250 1.360 ms 1.343 ms -1.3%
sort/ghcSort/300 1.794 ms 1.771 ms -1.3%
sort/insertionSort/50 672.0 μs 680.5 μs +1.3%
sort/insertionSort/100 2.688 ms 2.696 ms +0.3%
sort/insertionSort/150 6.060 ms 6.039 ms -0.3%
sort/insertionSort/200 10.85 ms 10.79 ms -0.6%
sort/insertionSort/250 16.99 ms 16.88 ms -0.6%
sort/insertionSort/300 24.59 ms 24.42 ms -0.7%
sort/mergeSort/50 610.0 μs 602.0 μs -1.3%
sort/mergeSort/100 1.396 ms 1.368 ms -2.0%
sort/mergeSort/150 2.250 ms 2.200 ms -2.2%
sort/mergeSort/200 3.155 ms 3.103 ms -1.6%
sort/mergeSort/250 4.132 ms 4.092 ms -1.0%
sort/mergeSort/300 5.049 ms 4.923 ms -2.5%
sort/quickSort/50 1.596 ms 1.593 ms -0.2%
sort/quickSort/100 6.498 ms 6.527 ms +0.4%
sort/quickSort/150 14.65 ms 14.67 ms +0.1%
sort/quickSort/200 26.00 ms 25.92 ms -0.3%
sort/quickSort/250 41.00 ms 40.86 ms -0.3%
sort/quickSort/300 59.32 ms 59.08 ms -0.4%
sum/compiled-from-Haskell/sum-right-builtin/100 82.00 μs 77.06 μs -6.0%
sum/compiled-from-Haskell/sum-right-builtin/500 408.7 μs 398.6 μs -2.5%
sum/compiled-from-Haskell/sum-right-builtin/1000 889.6 μs 844.6 μs -5.1%
sum/compiled-from-Haskell/sum-right-builtin/2500 2.700 ms 2.577 ms -4.6%
sum/compiled-from-Haskell/sum-right-builtin/5000 5.833 ms 5.504 ms -5.6%
sum/compiled-from-Haskell/sum-right-Scott/100 50.19 μs 48.26 μs -3.8%
sum/compiled-from-Haskell/sum-right-Scott/500 266.4 μs 252.0 μs -5.4%
sum/compiled-from-Haskell/sum-right-Scott/1000 578.9 μs 546.0 μs -5.7%
sum/compiled-from-Haskell/sum-right-Scott/2500 1.931 ms 1.793 ms -7.1%
sum/compiled-from-Haskell/sum-right-Scott/5000 4.705 ms 4.514 ms -4.1%
sum/compiled-from-Haskell/sum-right-data/100 255.5 μs 253.1 μs -0.9%
sum/compiled-from-Haskell/sum-right-data/500 1.407 ms 1.395 ms -0.9%
sum/compiled-from-Haskell/sum-right-data/1000 3.176 ms 3.135 ms -1.3%
sum/compiled-from-Haskell/sum-right-data/2500 8.531 ms 8.405 ms -1.5%
sum/compiled-from-Haskell/sum-right-data/5000 17.80 ms 17.72 ms -0.4%
sum/compiled-from-Haskell/sum-left-builtin/100 77.23 μs 73.47 μs -4.9%
sum/compiled-from-Haskell/sum-left-builtin/500 406.3 μs 377.2 μs -7.2%
sum/compiled-from-Haskell/sum-left-builtin/1000 862.9 μs 801.9 μs -7.1%
sum/compiled-from-Haskell/sum-left-builtin/2500 2.647 ms 2.461 ms -7.0%
sum/compiled-from-Haskell/sum-left-builtin/5000 5.758 ms 5.405 ms -6.1%
sum/compiled-from-Haskell/sum-left-Scott/100 48.85 μs 46.55 μs -4.7%
sum/compiled-from-Haskell/sum-left-Scott/500 257.7 μs 243.9 μs -5.4%
sum/compiled-from-Haskell/sum-left-Scott/1000 560.7 μs 527.7 μs -5.9%
sum/compiled-from-Haskell/sum-left-Scott/2500 1.826 ms 1.690 ms -7.4%
sum/compiled-from-Haskell/sum-left-Scott/5000 4.394 ms 4.236 ms -3.6%
sum/compiled-from-Haskell/sum-left-data/100 258.7 μs 255.0 μs -1.4%
sum/compiled-from-Haskell/sum-left-data/500 1.441 ms 1.394 ms -3.3%
sum/compiled-from-Haskell/sum-left-data/1000 3.241 ms 3.154 ms -2.7%
sum/compiled-from-Haskell/sum-left-data/2500 8.668 ms 8.380 ms -3.3%
sum/compiled-from-Haskell/sum-left-data/5000 18.28 ms 17.68 ms -3.3%
sum/hand-written-PLC/sum-right-builtin/100 53.56 μs 51.88 μs -3.1%
sum/hand-written-PLC/sum-right-builtin/500 266.2 μs 267.1 μs +0.3%
sum/hand-written-PLC/sum-right-builtin/1000 554.9 μs 541.4 μs -2.4%
sum/hand-written-PLC/sum-right-builtin/2500 1.605 ms 1.560 ms -2.8%
sum/hand-written-PLC/sum-right-builtin/5000 3.558 ms 3.412 ms -4.1%
sum/hand-written-PLC/sum-right-Scott/100 39.10 μs 33.43 μs -14.5%
sum/hand-written-PLC/sum-right-Scott/500 201.2 μs 171.9 μs -14.6%
sum/hand-written-PLC/sum-right-Scott/1000 420.3 μs 361.1 μs -14.1%
sum/hand-written-PLC/sum-right-Scott/2500 1.262 ms 1.053 ms -16.6%
sum/hand-written-PLC/sum-right-Scott/5000 3.161 ms 2.722 ms -13.9%
sum/hand-written-PLC/sum-left-builtin/100 55.76 μs 55.38 μs -0.7%
sum/hand-written-PLC/sum-left-builtin/500 275.9 μs 273.7 μs -0.8%
sum/hand-written-PLC/sum-left-builtin/1000 548.2 μs 552.7 μs +0.8%
sum/hand-written-PLC/sum-left-builtin/2500 1.367 ms 1.362 ms -0.4%
sum/hand-written-PLC/sum-left-builtin/5000 2.731 ms 2.751 ms +0.7%
sum/hand-written-PLC/sum-left-Scott/100 43.15 μs 36.06 μs -16.4%
sum/hand-written-PLC/sum-left-Scott/500 214.5 μs 181.1 μs -15.6%
sum/hand-written-PLC/sum-left-Scott/1000 437.4 μs 372.4 μs -14.9%
sum/hand-written-PLC/sum-left-Scott/2500 1.219 ms 1.026 ms -15.8%
sum/hand-written-PLC/sum-left-Scott/5000 2.775 ms 2.416 ms -12.9%
5266419 2da59cb Change
TOTAL 351.3 ms 345.4 ms -1.7%

Comparing benchmark results of 'lists' on '52664194d' (base) and '2da59cbe1' (PR)

Results table
Script 5266419 2da59cb Change
sort/ghcSort/50 194.9 μs 186.6 μs -4.3%
sort/ghcSort/100 453.9 μs 436.1 μs -3.9%
sort/ghcSort/150 782.0 μs 762.5 μs -2.5%
sort/ghcSort/200 1.053 ms 1.016 ms -3.5%
sort/ghcSort/250 1.362 ms 1.317 ms -3.3%
sort/ghcSort/300 1.796 ms 1.738 ms -3.2%
sort/insertionSort/50 671.9 μs 657.8 μs -2.1%
sort/insertionSort/100 2.680 ms 2.612 ms -2.5%
sort/insertionSort/150 6.053 ms 5.865 ms -3.1%
sort/insertionSort/200 10.81 ms 10.46 ms -3.2%
sort/insertionSort/250 17.02 ms 16.41 ms -3.6%
sort/insertionSort/300 24.57 ms 23.74 ms -3.4%
sort/mergeSort/50 604.4 μs 581.5 μs -3.8%
sort/mergeSort/100 1.388 ms 1.329 ms -4.3%
sort/mergeSort/150 2.237 ms 2.139 ms -4.4%
sort/mergeSort/200 3.136 ms 3.008 ms -4.1%
sort/mergeSort/250 4.121 ms 3.938 ms -4.4%
sort/mergeSort/300 5.013 ms 4.787 ms -4.5%
sort/quickSort/50 1.600 ms 1.531 ms -4.3%
sort/quickSort/100 6.529 ms 6.292 ms -3.6%
sort/quickSort/150 14.69 ms 14.12 ms -3.9%
sort/quickSort/200 26.00 ms 25.05 ms -3.7%
sort/quickSort/250 40.95 ms 39.51 ms -3.5%
sort/quickSort/300 59.36 ms 56.89 ms -4.2%
sum/compiled-from-Haskell/sum-right-builtin/100 78.55 μs 76.45 μs -2.7%
sum/compiled-from-Haskell/sum-right-builtin/500 409.0 μs 395.0 μs -3.4%
sum/compiled-from-Haskell/sum-right-builtin/1000 873.7 μs 865.9 μs -0.9%
sum/compiled-from-Haskell/sum-right-builtin/2500 2.700 ms 2.555 ms -5.4%
sum/compiled-from-Haskell/sum-right-builtin/5000 5.821 ms 5.510 ms -5.3%
sum/compiled-from-Haskell/sum-right-Scott/100 50.22 μs 48.17 μs -4.1%
sum/compiled-from-Haskell/sum-right-Scott/500 266.6 μs 253.8 μs -4.8%
sum/compiled-from-Haskell/sum-right-Scott/1000 579.5 μs 545.7 μs -5.8%
sum/compiled-from-Haskell/sum-right-Scott/2500 1.934 ms 1.792 ms -7.3%
sum/compiled-from-Haskell/sum-right-Scott/5000 4.693 ms 4.525 ms -3.6%
sum/compiled-from-Haskell/sum-right-data/100 256.0 μs 251.6 μs -1.7%
sum/compiled-from-Haskell/sum-right-data/500 1.415 ms 1.384 ms -2.2%
sum/compiled-from-Haskell/sum-right-data/1000 3.169 ms 3.113 ms -1.8%
sum/compiled-from-Haskell/sum-right-data/2500 8.569 ms 8.347 ms -2.6%
sum/compiled-from-Haskell/sum-right-data/5000 17.92 ms 17.60 ms -1.8%
sum/compiled-from-Haskell/sum-left-builtin/100 77.27 μs 73.26 μs -5.2%
sum/compiled-from-Haskell/sum-left-builtin/500 406.9 μs 378.5 μs -7.0%
sum/compiled-from-Haskell/sum-left-builtin/1000 862.4 μs 803.1 μs -6.9%
sum/compiled-from-Haskell/sum-left-builtin/2500 2.646 ms 2.459 ms -7.1%
sum/compiled-from-Haskell/sum-left-builtin/5000 5.755 ms 5.416 ms -5.9%
sum/compiled-from-Haskell/sum-left-Scott/100 48.75 μs 46.62 μs -4.4%
sum/compiled-from-Haskell/sum-left-Scott/500 257.7 μs 244.2 μs -5.2%
sum/compiled-from-Haskell/sum-left-Scott/1000 560.9 μs 526.2 μs -6.2%
sum/compiled-from-Haskell/sum-left-Scott/2500 1.824 ms 1.693 ms -7.2%
sum/compiled-from-Haskell/sum-left-Scott/5000 4.337 ms 4.243 ms -2.2%
sum/compiled-from-Haskell/sum-left-data/100 258.5 μs 253.4 μs -2.0%
sum/compiled-from-Haskell/sum-left-data/500 1.433 ms 1.389 ms -3.1%
sum/compiled-from-Haskell/sum-left-data/1000 3.237 ms 3.128 ms -3.4%
sum/compiled-from-Haskell/sum-left-data/2500 8.669 ms 8.322 ms -4.0%
sum/compiled-from-Haskell/sum-left-data/5000 18.28 ms 17.58 ms -3.8%
sum/hand-written-PLC/sum-right-builtin/100 53.59 μs 47.55 μs -11.3%
sum/hand-written-PLC/sum-right-builtin/500 267.4 μs 240.0 μs -10.2%
sum/hand-written-PLC/sum-right-builtin/1000 556.7 μs 500.3 μs -10.1%
sum/hand-written-PLC/sum-right-builtin/2500 1.607 ms 1.411 ms -12.2%
sum/hand-written-PLC/sum-right-builtin/5000 3.548 ms 3.195 ms -9.9%
sum/hand-written-PLC/sum-right-Scott/100 39.17 μs 33.20 μs -15.2%
sum/hand-written-PLC/sum-right-Scott/500 201.9 μs 171.6 μs -15.0%
sum/hand-written-PLC/sum-right-Scott/1000 420.0 μs 361.1 μs -14.0%
sum/hand-written-PLC/sum-right-Scott/2500 1.263 ms 1.053 ms -16.6%
sum/hand-written-PLC/sum-right-Scott/5000 3.156 ms 2.720 ms -13.8%
sum/hand-written-PLC/sum-left-builtin/100 55.52 μs 49.66 μs -10.6%
sum/hand-written-PLC/sum-left-builtin/500 275.7 μs 245.3 μs -11.0%
sum/hand-written-PLC/sum-left-builtin/1000 549.0 μs 494.1 μs -10.0%
sum/hand-written-PLC/sum-left-builtin/2500 1.364 ms 1.225 ms -10.2%
sum/hand-written-PLC/sum-left-builtin/5000 2.726 ms 2.451 ms -10.1%
sum/hand-written-PLC/sum-left-Scott/100 43.23 μs 36.27 μs -16.1%
sum/hand-written-PLC/sum-left-Scott/500 215.1 μs 180.9 μs -15.9%
sum/hand-written-PLC/sum-left-Scott/1000 438.9 μs 369.7 μs -15.8%
sum/hand-written-PLC/sum-left-Scott/2500 1.221 ms 1.008 ms -17.4%
sum/hand-written-PLC/sum-left-Scott/5000 2.774 ms 2.378 ms -14.3%
5266419 2da59cb Change
TOTAL 351.2 ms 336.4 ms -4.2%

@effectfully
Contributor Author

/benchmark marlowe

@effectfully
Contributor Author

/benchmark validation

@effectfully
Contributor Author

/benchmark bls12-381-benchmarks

Comparing benchmark results of 'nofib' on '52664194d' (base) and '2da59cbe1' (PR)

Results table
Script 5266419 2da59cb Change
clausify/formula1 2.458 ms 2.492 ms +1.4%
clausify/formula2 3.343 ms 3.320 ms -0.7%
clausify/formula3 9.154 ms 9.119 ms -0.4%
clausify/formula4 21.64 ms 21.51 ms -0.6%
clausify/formula5 44.24 ms 45.29 ms +2.4%
knights/4x4 15.65 ms 15.58 ms -0.4%
knights/6x6 39.94 ms 39.21 ms -1.8%
knights/8x8 69.90 ms 69.09 ms -1.2%
primetest/05digits 9.620 ms 9.799 ms +1.9%
primetest/10digits 18.79 ms 19.09 ms +1.6%
primetest/30digits 57.80 ms 59.56 ms +3.0%
primetest/50digits 94.75 ms 98.07 ms +3.5%
queens4x4/bt 4.608 ms 4.596 ms -0.3%
queens4x4/bm 5.713 ms 5.793 ms +1.4%
queens4x4/bjbt1 5.538 ms 5.554 ms +0.3%
queens4x4/bjbt2 5.220 ms 5.356 ms +2.6%
queens4x4/fc 11.43 ms 11.50 ms +0.6%
queens5x5/bt 63.04 ms 63.73 ms +1.1%
queens5x5/bm 64.17 ms 67.00 ms +4.4%
queens5x5/bjbt1 72.81 ms 73.94 ms +1.6%
queens5x5/bjbt2 71.03 ms 71.89 ms +1.2%
queens5x5/fc 142.7 ms 145.8 ms +2.2%
5266419 2da59cb Change
TOTAL 833.5 ms 847.3 ms +1.6%

@michaelpj
Contributor

What happens if we evaluate a reference to rec before we have done a substitution? It will fail, right? That's a little odd. Should we really be requiring the body of the fix to be a syntactic value or something?

The reason why we match on VLamAbs is to make the semantics of the CK machine match the ones of the CEK machine where we have to match on VLamAbs in order to update its environment, this is how it looks

What if we changed the definition of CekValue so that values always had an environment? In most cases that would be empty; in the case of a lambda/delay it would generally have something in it, but crucially it would let us add to the environment of a value regardless of what the value is. This is also what we'd want if we were going to add a native let to PLC (unsurprisingly!): you just want it to be the value of the inner term plus whatever new bindings you have.

This should be the fastest way of evaluating Fix, since it doesn't create any additional frames, we just literally evaluate the body of a fixpoint multiple times. But it comes with a big disadvantage of creating an infinite structure requiring very careful handling

I agree, this is pretty tricky. I'd be tempted to at least try a version that doesn't rely on the in-memory loop. I'd be interested to see if there is much difference in performance.


Overall the benchmarks are a bit surprising. I was thinking there might be size benefits, but also no. Unless we can squeeze more performance out of it, I'm unsure if it's enough of a win. I guess we just don't do enough recursion for it to really bite 🤔

@effectfully
Contributor Author

What happens if we evaluate a reference to rec before we have done a substitution? It will fail, right? That's a little odd.

Very good point, thank you. It is indeed odd. But a bottom is a bottom, I guess: we can always say that we don't distinguish one bottom from another, like GHC does. Operationally it's kind of a mess though, I agree.

What if we changed the definition of CekValue so that values always had an environment? In most cases that would be empty; in the case of a lambda/delay it would generally have something in it, but crucially it would let us add to the environment of a value regardless of what the value is.

It'd be a bunch of pointless overhead, as far as this specific PR is concerned. I think looking inside the body of a fix once per fix node is totally fine. I haven't thought through whether requiring the body to be a lambda changes any semantics, but at least we know that we still have to evaluate the body first in order to only do it once.

I'd be tempted to at least try a version that doesn't rely on the in-memory loop.

Me too, but I couldn't come up with a solution quickly and ended up implementing a circular program instead 🙈

I'd be interested to see if there is much difference in performance.

I'd also be curious, but if we agree that what we see doesn't warrant an extra AST node, then neither will another solution and we shouldn't waste time figuring it out.

Unless we can squeeze more performance out of it, I'm unsure if it's enough of a win. I guess we just don't do enough recursion for it to really bite 🤔

Yeah, I feel the same way and @colll78 said something along these lines too.

@effectfully
Contributor Author

effectfully commented Jan 23, 2025

It will fail, right?

Actually, I think it's a bug, so thanks for catching! We don't have names and even if we had them, that would be variable capture. But currently referencing rec without returning a lambda first would just point to a different variable, so this is entirely screwed up.

I think what we need to do in computeCek is to save the current environment in FrameFix, extend the environment with a variable referencing itself and then proceed to returnCek.

I'll try it out. Thanks!

@michaelpj
Contributor

I guess overall I would expect this to have basically identical semantics to our existing recursion combinators, and if it doesn't then that's a bit suspicious.

Being strict does make the naive approach tricky. We can't just stick the recursively defined value into an environment, since it's not a value until we evaluate it, which we can't do until we've put it in the environment. Did we consider a version that is instead an encoding of Z? So we know that the body can be eta-expanded?
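
For reference, the eta-expansion that the Z combinator relies on can be written out in Haskell. This is only a sketch of the shape of the encoding, not of anything in the PR: `Rec` is a hypothetical wrapper needed to type self-application, and since Haskell is lazy the eta-expansion is not strictly required here, whereas in a strict language it is what delays the self-application until an argument arrives.

```haskell
module Main where

-- A wrapper that lets a function be applied to itself in a typed setting.
newtype Rec a = Rec { unRec :: Rec a -> a }

-- Z combinator: the recursive call is eta-expanded as \v -> x x v.
z :: ((a -> b) -> a -> b) -> a -> b
z f = w (Rec w)
  where
    w x = f (\v -> unRec x x v)
    -- Keep the simplifier from repeatedly unfolding the self-application
    -- when compiling with optimizations.
    {-# NOINLINE w #-}

factorial :: Integer -> Integer
factorial = z (\self n -> if n == 0 then 1 else n * self (n - 1))

main :: IO ()
main = print (factorial 5)
```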

@effectfully effectfully mentioned this pull request Jan 24, 2025
@effectfully effectfully force-pushed the effectfully/ast/add-Fix branch from 834aa00 to b2b4864 Compare January 28, 2025 03:00
@effectfully
Contributor Author

/benchmark nofib

@effectfully
Contributor Author

/benchmark lists

Comparing benchmark results of 'nofib' on '30c3db402' (base) and '97c1471b6' (PR)

Results table
Script 30c3db4 97c1471 Change
clausify/formula1 2.473 ms 2.409 ms -2.6%
clausify/formula2 3.319 ms 3.217 ms -3.1%
clausify/formula3 9.091 ms 8.930 ms -1.8%
clausify/formula4 21.10 ms 21.11 ms +0.0%
clausify/formula5 44.15 ms 43.41 ms -1.7%
knights/4x4 15.68 ms 15.83 ms +1.0%
knights/6x6 39.58 ms 39.53 ms -0.1%
knights/8x8 69.06 ms 68.14 ms -1.3%
primetest/05digits 9.406 ms 10.02 ms +6.5%
primetest/10digits 18.40 ms 19.29 ms +4.8%
primetest/30digits 57.34 ms 59.90 ms +4.5%
primetest/50digits 96.87 ms 98.39 ms +1.6%
queens4x4/bt 4.565 ms 4.575 ms +0.2%
queens4x4/bm 5.696 ms 5.705 ms +0.2%
queens4x4/bjbt1 5.506 ms 5.523 ms +0.3%
queens4x4/bjbt2 5.160 ms 5.186 ms +0.5%
queens4x4/fc 11.32 ms 11.44 ms +1.1%
queens5x5/bt 62.70 ms 62.87 ms +0.3%
queens5x5/bm 64.54 ms 64.65 ms +0.2%
queens5x5/bjbt1 72.87 ms 72.90 ms +0.0%
queens5x5/bjbt2 70.99 ms 71.11 ms +0.2%
queens5x5/fc 143.5 ms 145.2 ms +1.2%
30c3db4 97c1471 Change
TOTAL 833.3 ms 839.3 ms +0.7%

Comparing benchmark results of 'nofib' on '30c3db402' (base) and '97c1471b6' (PR)

Results table
Script 30c3db4 97c1471 Change
clausify/formula1 2.497 ms 2.418 ms -3.2%
clausify/formula2 3.334 ms 3.252 ms -2.5%
clausify/formula3 9.207 ms 8.960 ms -2.7%
clausify/formula4 21.27 ms 21.20 ms -0.3%
clausify/formula5 44.48 ms 43.41 ms -2.4%
knights/4x4 15.79 ms 15.73 ms -0.4%
knights/6x6 39.88 ms 39.02 ms -2.2%
knights/8x8 69.48 ms 67.75 ms -2.5%
primetest/05digits 9.888 ms 9.925 ms +0.4%
primetest/10digits 18.50 ms 19.05 ms +3.0%
primetest/30digits 59.35 ms 58.77 ms -1.0%
primetest/50digits 95.48 ms 97.27 ms +1.9%
queens4x4/bt 4.572 ms 4.566 ms -0.1%
queens4x4/bm 5.693 ms 5.685 ms -0.1%
queens4x4/bjbt1 5.482 ms 5.497 ms +0.3%
queens4x4/bjbt2 5.157 ms 5.147 ms -0.2%
queens4x4/fc 11.32 ms 11.43 ms +1.0%
queens5x5/bt 62.63 ms 62.90 ms +0.4%
queens5x5/bm 64.53 ms 64.92 ms +0.6%
queens5x5/bjbt1 72.91 ms 72.77 ms -0.2%
queens5x5/bjbt2 71.11 ms 70.82 ms -0.4%
queens5x5/fc 143.8 ms 144.1 ms +0.2%
30c3db4 97c1471 Change
TOTAL 836.4 ms 834.6 ms -0.2%

Comparing benchmark results of 'lists' on '30c3db402' (base) and '97c1471b6' (PR)

Results table
Script 30c3db4 97c1471 Change
sort/ghcSort/50 193.9 μs 192.3 μs -0.8%
sort/ghcSort/100 450.7 μs 446.0 μs -1.0%
sort/ghcSort/150 770.6 μs 775.0 μs +0.6%
sort/ghcSort/200 1.043 ms 1.035 ms -0.8%
sort/ghcSort/250 1.342 ms 1.345 ms +0.2%
sort/ghcSort/300 1.767 ms 1.769 ms +0.1%
sort/insertionSort/50 654.4 μs 655.2 μs +0.1%
sort/insertionSort/100 2.625 ms 2.612 ms -0.5%
sort/insertionSort/150 5.915 ms 5.854 ms -1.0%
sort/insertionSort/200 10.58 ms 10.60 ms +0.2%
sort/insertionSort/250 16.64 ms 16.43 ms -1.3%
sort/insertionSort/300 24.04 ms 23.70 ms -1.4%
sort/mergeSort/50 602.2 μs 603.6 μs +0.2%
sort/mergeSort/100 1.369 ms 1.378 ms +0.7%
sort/mergeSort/150 2.207 ms 2.212 ms +0.2%
sort/mergeSort/200 3.091 ms 3.107 ms +0.5%
sort/mergeSort/250 4.052 ms 4.035 ms -0.4%
sort/mergeSort/300 4.945 ms 4.922 ms -0.5%
sort/quickSort/50 1.585 ms 1.554 ms -2.0%
sort/quickSort/100 6.484 ms 6.351 ms -2.1%
sort/quickSort/150 14.61 ms 14.28 ms -2.3%
sort/quickSort/200 25.90 ms 25.25 ms -2.5%
sort/quickSort/250 40.66 ms 39.76 ms -2.2%
sort/quickSort/300 58.78 ms 57.39 ms -2.4%
sum/compiled-from-Haskell/sum-right-builtin/100 79.79 μs 75.72 μs -5.1%
sum/compiled-from-Haskell/sum-right-builtin/500 414.5 μs 390.1 μs -5.9%
sum/compiled-from-Haskell/sum-right-builtin/1000 878.1 μs 829.1 μs -5.6%
sum/compiled-from-Haskell/sum-right-builtin/2500 2.700 ms 2.546 ms -5.7%
sum/compiled-from-Haskell/sum-right-builtin/5000 5.829 ms 5.477 ms -6.0%
sum/compiled-from-Haskell/sum-right-Scott/100 48.15 μs 48.81 μs +1.4%
sum/compiled-from-Haskell/sum-right-Scott/500 255.9 μs 256.3 μs +0.2%
sum/compiled-from-Haskell/sum-right-Scott/1000 556.5 μs 553.4 μs -0.6%
sum/compiled-from-Haskell/sum-right-Scott/2500 1.880 ms 1.835 ms -2.4%
sum/compiled-from-Haskell/sum-right-Scott/5000 4.594 ms 4.578 ms -0.3%
sum/compiled-from-Haskell/sum-right-data/100 257.7 μs 266.1 μs +3.3%
sum/compiled-from-Haskell/sum-right-data/500 1.420 ms 1.438 ms +1.3%
sum/compiled-from-Haskell/sum-right-data/1000 3.177 ms 3.214 ms +1.2%
sum/compiled-from-Haskell/sum-right-data/2500 8.565 ms 8.673 ms +1.3%
sum/compiled-from-Haskell/sum-right-data/5000 17.90 ms 18.18 ms +1.6%
sum/compiled-from-Haskell/sum-left-builtin/100 78.17 μs 77.50 μs -0.9%
sum/compiled-from-Haskell/sum-left-builtin/500 410.7 μs 397.0 μs -3.3%
sum/compiled-from-Haskell/sum-left-builtin/1000 873.2 μs 838.7 μs -4.0%
sum/compiled-from-Haskell/sum-left-builtin/2500 2.672 ms 2.545 ms -4.8%
sum/compiled-from-Haskell/sum-left-builtin/5000 5.822 ms 5.666 ms -2.7%
sum/compiled-from-Haskell/sum-left-Scott/100 47.67 μs 47.62 μs -0.1%
sum/compiled-from-Haskell/sum-left-Scott/500 253.8 μs 245.8 μs -3.2%
sum/compiled-from-Haskell/sum-left-Scott/1000 552.5 μs 539.9 μs -2.3%
sum/compiled-from-Haskell/sum-left-Scott/2500 1.801 ms 1.737 ms -3.6%
sum/compiled-from-Haskell/sum-left-Scott/5000 4.307 ms 4.350 ms +1.0%
sum/compiled-from-Haskell/sum-left-data/100 264.8 μs 264.2 μs -0.2%
sum/compiled-from-Haskell/sum-left-data/500 1.460 ms 1.455 ms -0.3%
sum/compiled-from-Haskell/sum-left-data/1000 3.284 ms 3.232 ms -1.6%
sum/compiled-from-Haskell/sum-left-data/2500 8.778 ms 8.666 ms -1.3%
sum/compiled-from-Haskell/sum-left-data/5000 18.47 ms 18.37 ms -0.5%
sum/hand-written-PLC/sum-right-builtin/100 52.32 μs 46.99 μs -10.2%
sum/hand-written-PLC/sum-right-builtin/500 263.4 μs 238.7 μs -9.4%
sum/hand-written-PLC/sum-right-builtin/1000 545.0 μs 495.1 μs -9.2%
sum/hand-written-PLC/sum-right-builtin/2500 1.581 ms 1.433 ms -9.4%
sum/hand-written-PLC/sum-right-builtin/5000 3.514 ms 3.266 ms -7.1%
sum/hand-written-PLC/sum-right-Scott/100 37.74 μs 33.58 μs -11.0%
sum/hand-written-PLC/sum-right-Scott/500 194.7 μs 172.8 μs -11.2%
sum/hand-written-PLC/sum-right-Scott/1000 410.9 μs 363.3 μs -11.6%
sum/hand-written-PLC/sum-right-Scott/2500 1.244 ms 1.070 ms -14.0%
sum/hand-written-PLC/sum-right-Scott/5000 3.117 ms 2.852 ms -8.5%
sum/hand-written-PLC/sum-left-builtin/100 54.97 μs 50.49 μs -8.1%
sum/hand-written-PLC/sum-left-builtin/500 271.2 μs 250.5 μs -7.6%
sum/hand-written-PLC/sum-left-builtin/1000 542.7 μs 497.5 μs -8.3%
sum/hand-written-PLC/sum-left-builtin/2500 1.354 ms 1.235 ms -8.8%
sum/hand-written-PLC/sum-left-builtin/5000 2.689 ms 2.466 ms -8.3%
sum/hand-written-PLC/sum-left-Scott/100 41.29 μs 35.60 μs -13.8%
sum/hand-written-PLC/sum-left-Scott/500 206.2 μs 181.2 μs -12.1%
sum/hand-written-PLC/sum-left-Scott/1000 422.5 μs 365.6 μs -13.5%
sum/hand-written-PLC/sum-left-Scott/2500 1.185 ms 1.013 ms -14.5%
sum/hand-written-PLC/sum-left-Scott/5000 2.721 ms 2.450 ms -10.0%
TOTAL 348.4 ms 341.6 ms -2.0%

Comparing benchmark results of 'lists' on '30c3db402' (base) and '97c1471b6' (PR)

Results table
Script 30c3db4 (base) 97c1471 (PR) Change
sort/ghcSort/50 193.7 μs 191.0 μs -1.4%
sort/ghcSort/100 448.1 μs 443.4 μs -1.0%
sort/ghcSort/150 774.3 μs 768.9 μs -0.7%
sort/ghcSort/200 1.041 ms 1.024 ms -1.6%
sort/ghcSort/250 1.349 ms 1.335 ms -1.0%
sort/ghcSort/300 1.775 ms 1.755 ms -1.1%
sort/insertionSort/50 657.7 μs 649.7 μs -1.2%
sort/insertionSort/100 2.636 ms 2.584 ms -2.0%
sort/insertionSort/150 5.918 ms 5.779 ms -2.3%
sort/insertionSort/200 10.59 ms 10.32 ms -2.5%
sort/insertionSort/250 16.64 ms 16.17 ms -2.8%
sort/insertionSort/300 24.08 ms 24.12 ms +0.2%
sort/mergeSort/50 599.0 μs 595.8 μs -0.5%
sort/mergeSort/100 1.373 ms 1.357 ms -1.2%
sort/mergeSort/150 2.205 ms 2.185 ms -0.9%
sort/mergeSort/200 3.090 ms 3.063 ms -0.9%
sort/mergeSort/250 4.051 ms 3.993 ms -1.4%
sort/mergeSort/300 4.947 ms 4.878 ms -1.4%
sort/quickSort/50 1.586 ms 1.543 ms -2.7%
sort/quickSort/100 6.464 ms 6.318 ms -2.3%
sort/quickSort/150 14.59 ms 14.19 ms -2.7%
sort/quickSort/200 25.93 ms 25.05 ms -3.4%
sort/quickSort/250 40.72 ms 39.47 ms -3.1%
sort/quickSort/300 58.92 ms 56.99 ms -3.3%
sum/compiled-from-Haskell/sum-right-builtin/100 79.83 μs 75.20 μs -5.8%
sum/compiled-from-Haskell/sum-right-builtin/500 414.6 μs 404.5 μs -2.4%
sum/compiled-from-Haskell/sum-right-builtin/1000 878.9 μs 823.2 μs -6.3%
sum/compiled-from-Haskell/sum-right-builtin/2500 2.706 ms 2.541 ms -6.1%
sum/compiled-from-Haskell/sum-right-builtin/5000 5.839 ms 5.435 ms -6.9%
sum/compiled-from-Haskell/sum-right-Scott/100 48.25 μs 48.45 μs +0.4%
sum/compiled-from-Haskell/sum-right-Scott/500 255.6 μs 254.0 μs -0.6%
sum/compiled-from-Haskell/sum-right-Scott/1000 557.1 μs 549.7 μs -1.3%
sum/compiled-from-Haskell/sum-right-Scott/2500 1.876 ms 1.826 ms -2.7%
sum/compiled-from-Haskell/sum-right-Scott/5000 4.582 ms 4.537 ms -1.0%
sum/compiled-from-Haskell/sum-right-data/100 258.4 μs 258.3 μs -0.0%
sum/compiled-from-Haskell/sum-right-data/500 1.421 ms 1.417 ms -0.3%
sum/compiled-from-Haskell/sum-right-data/1000 3.208 ms 3.170 ms -1.2%
sum/compiled-from-Haskell/sum-right-data/2500 8.638 ms 8.536 ms -1.2%
sum/compiled-from-Haskell/sum-right-data/5000 18.05 ms 17.95 ms -0.6%
sum/compiled-from-Haskell/sum-left-builtin/100 78.39 μs 76.25 μs -2.7%
sum/compiled-from-Haskell/sum-left-builtin/500 415.4 μs 393.7 μs -5.2%
sum/compiled-from-Haskell/sum-left-builtin/1000 879.5 μs 832.4 μs -5.4%
sum/compiled-from-Haskell/sum-left-builtin/2500 2.690 ms 2.534 ms -5.8%
sum/compiled-from-Haskell/sum-left-builtin/5000 5.875 ms 5.644 ms -3.9%
sum/compiled-from-Haskell/sum-left-Scott/100 48.32 μs 46.93 μs -2.9%
sum/compiled-from-Haskell/sum-left-Scott/500 256.7 μs 245.9 μs -4.2%
sum/compiled-from-Haskell/sum-left-Scott/1000 558.2 μs 532.6 μs -4.6%
sum/compiled-from-Haskell/sum-left-Scott/2500 1.817 ms 1.731 ms -4.7%
sum/compiled-from-Haskell/sum-left-Scott/5000 4.343 ms 4.326 ms -0.4%
sum/compiled-from-Haskell/sum-left-data/100 267.3 μs 260.4 μs -2.6%
sum/compiled-from-Haskell/sum-left-data/500 1.479 ms 1.438 ms -2.8%
sum/compiled-from-Haskell/sum-left-data/1000 3.313 ms 3.206 ms -3.2%
sum/compiled-from-Haskell/sum-left-data/2500 8.849 ms 8.567 ms -3.2%
sum/compiled-from-Haskell/sum-left-data/5000 18.65 ms 18.11 ms -2.9%
sum/hand-written-PLC/sum-right-builtin/100 52.76 μs 46.90 μs -11.1%
sum/hand-written-PLC/sum-right-builtin/500 265.4 μs 238.0 μs -10.3%
sum/hand-written-PLC/sum-right-builtin/1000 551.6 μs 491.3 μs -10.9%
sum/hand-written-PLC/sum-right-builtin/2500 1.597 ms 1.420 ms -11.1%
sum/hand-written-PLC/sum-right-builtin/5000 3.528 ms 3.213 ms -8.9%
sum/hand-written-PLC/sum-right-Scott/100 37.48 μs 33.33 μs -11.1%
sum/hand-written-PLC/sum-right-Scott/500 194.2 μs 172.1 μs -11.4%
sum/hand-written-PLC/sum-right-Scott/1000 407.6 μs 361.5 μs -11.3%
sum/hand-written-PLC/sum-right-Scott/2500 1.243 ms 1.063 ms -14.5%
sum/hand-written-PLC/sum-right-Scott/5000 3.112 ms 2.838 ms -8.8%
sum/hand-written-PLC/sum-left-builtin/100 54.92 μs 50.57 μs -7.9%
sum/hand-written-PLC/sum-left-builtin/500 273.1 μs 248.9 μs -8.9%
sum/hand-written-PLC/sum-left-builtin/1000 543.3 μs 494.9 μs -8.9%
sum/hand-written-PLC/sum-left-builtin/2500 1.353 ms 1.231 ms -9.0%
sum/hand-written-PLC/sum-left-builtin/5000 2.685 ms 2.454 ms -8.6%
sum/hand-written-PLC/sum-left-Scott/100 41.32 μs 35.22 μs -14.8%
sum/hand-written-PLC/sum-left-Scott/500 207.4 μs 179.8 μs -13.3%
sum/hand-written-PLC/sum-left-Scott/1000 422.6 μs 368.1 μs -12.9%
sum/hand-written-PLC/sum-left-Scott/2500 1.186 ms 1.019 ms -14.1%
sum/hand-written-PLC/sum-left-Scott/5000 2.721 ms 2.407 ms -11.5%
TOTAL 349.4 ms 338.9 ms -3.0%
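
The Change column appears to be the relative difference of the PR timing against the base timing. A minimal sketch of that arithmetic, assuming this reading is right (the helper name `percentChange` is hypothetical and not part of the benchmark tooling), applied to the TOTAL row above:

```haskell
import Text.Printf (printf)

-- Relative change of the PR timing against the base timing, in percent.
-- Negative values mean the PR run was faster than the base run.
-- (Hypothetical helper, not taken from the benchmark scripts.)
percentChange :: Double -> Double -> Double
percentChange base pr = (pr - base) / base * 100

main :: IO ()
main =
  -- TOTAL row: 349.4 ms on the base commit, 338.9 ms on the PR commit.
  printf "%+.1f%%\n" (percentChange 349.4 338.9)
```

Since the two runs of the same 'lists' benchmark report totals of -2.0% and -3.0%, differences of around one percent between individual rows are probably within noise.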

wadler commented Jan 30, 2025

Re: @michaelpj's comment above. Yes, it is common to require the body of the fix to be a value. Here are the relevant lines from a denotational semantics I wrote recently. It might be worth checking whether requiring the fixpoint body to be a lambda abstraction makes a difference in the benchmarks.

Intrinsically-typed term constructor (where Γ ⊢ A is a term in context Γ of type A).

  μƛ_ :     Γ , A ⇒ B , A ⊢ B
            -----------------
          → Γ ⊢ A ⇒ B

Fixpoint operator.

fixpoint : {{_ : Pointed X}} → (X →ᵐ X) → X
fixpoint f = ⊔ (iterate f ⊥ least)

Denotational semantics.

⟦ μƛ N ⟧ᵗ ρ        =  pure (fixpoint {{Pointed⇒}} (λ k v → ⟦ N ⟧ᵗ (ρ , k , v)))
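
For readers more at home in Haskell than Agda, here is a rough analogue of the μƛ constructor, offered as a sketch only: the toy `Term`, `Value` and `eval` below are hypothetical and unrelated to the PR's actual AST or CK/CEK machines. The fixpoint body is syntactically a lambda, so `MuLam` binds both the recursive function and its argument, and its meaning is obtained by tying the knot once with `Data.Function.fix`.

```haskell
import Data.Function (fix)

-- De Bruijn-indexed toy language; every name here is illustrative only.
data Term
  = Var Int               -- index 0 is the innermost binder
  | Lit Integer
  | App Term Term
  | MuLam Term            -- binds the function itself (index 1) and its argument (index 0)
  | IfZero Term Term Term
  | Mul Term Term
  | Sub Term Term

data Value
  = VInt Integer
  | VFun (Value -> Value)

type Env = [Value]

eval :: Env -> Term -> Value
eval env (Var i) = env !! i
eval _ (Lit n) = VInt n
eval env (App f a) = case eval env f of
  VFun g -> g (eval env a)
  VInt _ -> error "applied a non-function"
eval env (MuLam body) =
  -- `self` is the function value being defined; the environment is extended
  -- with the function first and the argument second, mirroring Γ , A ⇒ B , A.
  fix (\self -> VFun (\v -> eval (v : self : env) body))
eval env (IfZero c t e) = case eval env c of
  VInt 0 -> eval env t
  VInt _ -> eval env e
  VFun _ -> error "ifzero on a function"
eval env (Mul a b) = arith (*) (eval env a) (eval env b)
eval env (Sub a b) = arith (-) (eval env a) (eval env b)

arith :: (Integer -> Integer -> Integer) -> Value -> Value -> Value
arith op (VInt x) (VInt y) = VInt (op x y)
arith _ _ _ = error "arithmetic on a function"

-- factorial = μƛ. ifzero arg then 1 else arg * self (arg - 1)
factorial :: Term
factorial =
  MuLam (IfZero (Var 0) (Lit 1)
          (Mul (Var 0) (App (Var 1) (Sub (Var 0) (Lit 1)))))

applyInt :: Term -> Integer -> Integer
applyInt t n = case eval [] (App t (Lit n)) of
  VInt r -> r
  VFun _ -> error "expected an integer result"

main :: IO ()
main = print (applyInt factorial 5)  -- prints 120
```

Because the body of `MuLam` is always a function, evaluating the fixpoint itself does no work beyond building a closure, which is the property the value restriction is meant to guarantee.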

Labels
AST Evaluation EXPERIMENT Experiments that we probably don't want to merge Performance

4 participants