Costing for caseList and caseData #6929

Draft
wants to merge 44 commits into base: master


@kwxm (Contributor) commented Mar 9, 2025

This is an initial attempt to cost the caseList and caseData builtins, but it's turning out to be a little tricky. The problem is that these functions really return a function together with a list of arguments for it, and the evaluator then has to carry on and do some more work to turn this into a real application and evaluate it. I used λx.() and λx.λy.() in the benchmarks and assumed that the cost of evaluating these would be minimal, but that may not be true.

Extensive benchmarking shows that both functions are constant time (or at least approximately: see later), but the CPU costs that get inferred from the benchmark results are as follows.

caseList: 2297053
caseData: 4121410

In contrast, the costs for chooseList and chooseData are

chooseList: 132994
chooseData: 94375

(These costs are based on some quite old benchmark results, but I re-ran the benchmarks and got numbers that were pretty similar).

The cost of caseList exceeds that of chooseList by 2164059 units, equivalent to about 135 CEK steps, and the cost of caseData exceeds that of chooseData by 4027035 units, or about 251 CEK steps. If we really use these numbers then it'll probably be cheaper to use the choose functions (and do some extra work) than it is to use the case functions, which kind of defeats the point of adding the builtins.
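
The arithmetic behind those step counts can be sketched as follows. The figure of 16000 cost units per CEK step is an assumption (it is what the step counts quoted above imply), and the names `COSTS`, `CEK_STEP_COST` and `excess` are purely illustrative.

```python
# Inferred CPU costs quoted above (cost model units).
COSTS = {
    "chooseList": 132_994,
    "caseList": 2_297_053,
    "chooseData": 94_375,
    "caseData": 4_121_410,
}

# Assumption: one CEK machine step costs roughly 16000 units.
CEK_STEP_COST = 16_000


def excess(case_name, choose_name):
    """Return (extra cost units, equivalent whole CEK steps) of case over choose."""
    diff = COSTS[case_name] - COSTS[choose_name]
    return diff, diff // CEK_STEP_COST


list_excess = excess("caseList", "chooseList")
data_excess = excess("caseData", "chooseData")
print(list_excess)  # (2164059, 135)
print(data_excess)  # (4027035, 251)
```

So any replacement for a case builtin that uses a choose builtin plus fewer than roughly that many extra machine steps would come out cheaper under these numbers.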

Why is this happening? The mean times for the raw benchmark results are

chooseList: 1.101 µs
caseList:      3.255 µs

chooseData: 1.4257 µs
caseData:      5.4173 µs

so the case functions are more expensive than the choose ones, but only by a factor of 3 or 4, whereas the inferred costs for the case functions are about 17 and 43 times those of the choose ones. The reason for this is that we try to account for the time taken for the machine to load the function's arguments in the benchmarks and just cost the time required for actual execution. There are some Nop functions that do nothing except load some arguments and return a constant result, and we subtract their times from the raw benchmark results and fit some modelling function to the adjusted data. The time for the two-argument Nop is 0.838 µs and for the six-argument Nop it's 1.331 µs. These are quite close to the raw times for the choose functions, so the adjusted times for those are pretty small. However, the Nop times are much smaller than the times for the case functions, so the adjusted times for the case functions remain quite large.
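
Here is a rough sketch of how the Nop subtraction inflates the ratios, using only the mean times quoted above. Pairing the two-argument Nop with the List builtins and the six-argument Nop with the Data builtins is an assumption, and the real costs come from fitting a model to the adjusted data (and, for the choose functions, from older benchmark runs), so the adjusted ratios here won't match the inferred cost ratios exactly.

```python
# Mean raw benchmark times in microseconds, as quoted above.
RAW_US = {
    "chooseList": 1.101,
    "caseList": 3.255,
    "chooseData": 1.4257,
    "caseData": 5.4173,
}

# Nop overheads in microseconds. Assumption: the two-argument Nop is used
# for the List builtins and the six-argument Nop for the Data builtins.
NOP_US = {"list": 0.838, "data": 1.331}


def adjusted(name, nop_key):
    """Raw mean time with the Nop overhead subtracted."""
    return RAW_US[name] - NOP_US[nop_key]


def ratios(choose, case, nop_key):
    """Return (raw ratio, Nop-adjusted ratio) of case time to choose time."""
    raw = RAW_US[case] / RAW_US[choose]
    adj = adjusted(case, nop_key) / adjusted(choose, nop_key)
    return raw, adj


raw_list, adj_list = ratios("chooseList", "caseList", "list")
raw_data, adj_data = ratios("chooseData", "caseData", "data")
print(f"List: raw x{raw_list:.1f}, adjusted x{adj_list:.1f}")
print(f"Data: raw x{raw_data:.1f}, adjusted x{adj_data:.1f}")
```

Because the choose times sit only just above the Nop times, the denominator of the adjusted ratio is tiny, which is what blows the ratio up.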

I think that we really will have to account for the extra work done by the evaluator after the case functions return, and it now occurs to me that maybe subtracting the Nop times twice would help, because the Nop builtins are very similar to the functions that I've supplied to the case builtins in the costing benchmarks. I'll try that and see what happens. [UPDATE: I've tried that and it reduces the cost of caseList from 2297053 to 1328893 and the cost of caseData from 4121410 to 2790214, so it's not that effective.]
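
As a quick check on "not that effective", this sketch compares the costs obtained after subtracting the Nop times twice with the choose costs quoted earlier. It only uses numbers from this comment; the variable names are illustrative.

```python
# Costs after subtracting the Nop times twice (from the UPDATE above).
CASE_AFTER_DOUBLE_NOP = {"caseList": 1_328_893, "caseData": 2_790_214}

# Existing costs of the corresponding choose builtins.
CHOOSE = {"caseList": 132_994, "caseData": 94_375}

# How many times more expensive each case builtin still is.
remaining_ratio = {
    name: CASE_AFTER_DOUBLE_NOP[name] / CHOOSE[name]
    for name in CASE_AFTER_DOUBLE_NOP
}

for name, ratio in remaining_ratio.items():
    print(f"{name}: still about {ratio:.0f}x the choose cost")
```

Even with the double subtraction, the case builtins stay roughly 10 and 30 times more expensive than their choose counterparts.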

It would also be useful to have some realistic benchmarks that make a lot of use of the choose and case builtins so that we can see how the cost model predictions compare to actual execution times.

@kwxm added the No Changelog Required, Builtins, and Costing labels Mar 9, 2025
@kwxm (Contributor, Author) commented Mar 9, 2025

There's also a slight peculiarity in the benchmark results for caseData. The results for caseList are pretty uniform, but for caseData they look like this:
[Figure: caseData-times]

The benchmark first applies caseData to 30 Constr objects, then 30 Map objects, then the same for List, I, and B. The vertical red lines separate these groups, and it appears that Map uniformly takes longer than the average time and Constr uniformly takes less than the average. If anything you'd expect Constr to take longer, since it returns a deferred application with two arguments and all of the rest only have one. The difference is actually too small to worry about, but I'm mildly curious as to why this might happen (and I've observed a similar pattern over multiple benchmark runs). Maybe it's a sign that I got something wrong ...
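
One way to quantify the pattern would be to compute each group's mean offset from the overall mean. This is only a sketch: it assumes the timings come as a flat list in the order described above, and the sample values are synthetic placeholders, not real benchmark output.

```python
from statistics import mean

KINDS = ["Constr", "Map", "List", "I", "B"]  # order described above
BLOCK = 30  # benchmark inputs per constructor kind


def group_offsets(times_us):
    """Split a flat list of timings into consecutive blocks of BLOCK and
    return {kind: block mean minus overall mean}."""
    expected = BLOCK * len(KINDS)
    if len(times_us) != expected:
        raise ValueError(f"expected {expected} timings, got {len(times_us)}")
    overall = mean(times_us)
    return {
        kind: mean(times_us[i * BLOCK:(i + 1) * BLOCK]) - overall
        for i, kind in enumerate(KINDS)
    }


# Synthetic placeholder timings (µs), NOT real benchmark data:
# Constr slightly fast, Map slightly slow, the rest in between.
synthetic = [5.3] * 30 + [5.5] * 30 + [5.4] * 90
offsets = group_offsets(synthetic)
for kind, off in offsets.items():
    print(f"{kind}: {off:+.3f} µs relative to the overall mean")
```

Running this over several benchmark runs would show whether the Constr and Map offsets are consistent in sign and size.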

@kwxm requested a review from effectfully March 9, 2025 22:06
@kwxm marked this pull request as draft March 9, 2025 22:09