
Skip schedule points if we've done one recently enough #1757


Open

oremanj opened this issue Oct 12, 2020 · 4 comments

Comments

@oremanj
Member

oremanj commented Oct 12, 2020

This is a performance improvement that we keep talking about, but I couldn't find an issue for it, except #32 which is broader-reaching.

The idea: checkpoint_if_cancelled() (which is also used by checkpoint() in the not-cancelled case) should not actually yield to the scheduler if it has yielded within the past... some amount of time that is probably in the 0.5ms to 5ms range.

Exception: we should always yield on the first checkpoint after an assignment to Task.coro or Task.context, because there is code in the wild that relies on this as a way to pick up the effects of those assignments. This can be done by making these members into properties that set some flag/etc.
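
A rough sketch of that flag-on-assignment idea (the class and flag names below are made up for illustration; this is not Trio's actual Task implementation):

class _TaskSketch:
    """Illustrative stand-in for Task, showing the property-with-flag pattern."""

    def __init__(self, coro):
        self._coro = coro
        self._force_next_yield = False  # hypothetical flag consulted at the next checkpoint

    @property
    def coro(self):
        return self._coro

    @coro.setter
    def coro(self, new_coro):
        self._coro = new_coro
        # Guarantee that the next checkpoint really yields, so the effects of
        # swapping the coroutine (or context) are picked up immediately.
        self._force_next_yield = True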

We should measure whether it works better to do this in unrolled_run() (so checkpoint_if_cancelled() remains unchanged, but the scheduler immediately resumes the same task if it hasn't been long enough) or in checkpoint_if_cancelled() (the "when last yielded" would be a member of Task in that case). It depends on how the overhead of yielding (which will be worse for deeper call stacks) compares to the overhead of looking up the current task from thread-locals.
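
For concreteness, a minimal sketch of the checkpoint_if_cancelled() variant, assuming a hypothetical "when last yielded" timestamp (modeled here with a stand-in dataclass rather than the real Task, and glossing over the cancellation delivery a real implementation would still have to do):

import time
from dataclasses import dataclass, field

import trio

SKIP_WINDOW = 0.001  # seconds; somewhere in the 0.5ms-5ms range mentioned above

@dataclass
class _TaskTimestamps:
    # stand-in for the hypothetical "when last yielded" member of Task
    last_yield: float = field(default=-float("inf"))

async def checkpoint_if_cancelled_sketch(ts: _TaskTimestamps) -> None:
    now = time.perf_counter()
    if now - ts.last_yield < SKIP_WINDOW:
        # Yielded recently enough: skip the round trip through the scheduler.
        return
    ts.last_yield = now
    await trio.lowlevel.checkpoint_if_cancelled()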

It should probably be possible (for the benefit of tests & other code that relies on the reschedule-every-tick assumption) to disable checkpoint skipping in a particular region, e.g. with a context manager that sets some flag on the current task.
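
At user level that opt-out could look roughly like this (the context manager and the task-tracking set are hypothetical; no such mechanism exists in Trio today):

from contextlib import contextmanager

import trio

# Tasks in this set would have checkpoint skipping disabled; purely illustrative.
_no_skip_tasks: set = set()

@contextmanager
def no_checkpoint_skipping():
    task = trio.lowlevel.current_task()
    already_disabled = task in _no_skip_tasks
    _no_skip_tasks.add(task)
    try:
        yield
    finally:
        if not already_disabled:
            _no_skip_tasks.discard(task)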

@njsmith
Member

njsmith commented Oct 12, 2020

Since this is a pure optimization, we can also consider approximations. E.g., keep a single global (or thread local) record of the last time any cancel_shielded_checkpoint was executed.
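
That approximation could look roughly like the following (names are illustrative; the real change would live inside Trio rather than in a wrapper):

import threading
import time

import trio

_last = threading.local()

async def cancel_shielded_checkpoint_sketch(window: float = 0.001) -> None:
    now = time.perf_counter()
    if now - getattr(_last, "yielded_at", -float("inf")) < window:
        return  # some task on this thread yielded recently enough; skip
    _last.yielded_at = now
    await trio.lowlevel.cancel_shielded_checkpoint()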

We'll also want to think about how this affects test determinism and any future pluggable-scheduler support. Though those are low-level enough, and cancel_shielded_checkpoint is performance-critical enough, that it might make sense to monkeypatch in a different implementation in those cases.

@belm0
Member

belm0 commented Oct 14, 2020

Selecting some hard-coded threshold sounds... not suitable for the variety of computers x environments in the present and future?

I'd like to be able to run some blessed calibration code on a particular platform that estimates the overhead of a checkpoint (the yield + scheduler's machinery), and suggests a reasonable threshold range. The application needs to decide where it wants to be in that range, trading off wasted CPU cycles vs. scheduling responsiveness.
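
As a starting point, calibration could be as simple as timing a batch of uncontended checkpoints, e.g. (a rough sketch, not a blessed benchmark):

import trio

async def estimate_checkpoint_overhead(iterations: int = 100_000) -> float:
    # Average cost of one yield through the scheduler with no other runnable tasks.
    start = trio.current_time()
    for _ in range(iterations):
        await trio.lowlevel.checkpoint()
    return (trio.current_time() - start) / iterations

async def main() -> None:
    per_checkpoint = await estimate_checkpoint_overhead()
    print(f"~{per_checkpoint * 1e6:.1f} usec per checkpoint on this machine")

trio.run(main)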

@belm0
Member

belm0 commented Oct 14, 2020

by the way, we track scheduling responsiveness like this:

# Latency of sleep(0) is used as a proxy for trio scheduler health.
# A large latency implies that there are tasks performing too much work
# between checkpoints.
# (`periodic` and `trio_scheduling_latency_metric` are application-level
# helpers, not part of Trio.)
async for _ in periodic(1 / 10):
    start_time = trio.current_time()
    await trio.sleep(0)
    trio_scheduling_latency_metric.observe(trio.current_time() - start_time)

and typical output is:
[screenshot: scheduling latency histogram]

i.e. the typical median pass of the scheduler is 500 usec (on a measly low-powered i5, with a fair number of active tasks at any moment), and as a soft real-time app that's kind of important to us

@A5rocks
Contributor

A5rocks commented Sep 2, 2024

> Selecting some hard-coded threshold sounds... not suitable for the variety of computers x environments in the present and future?
>
> I'd like to be able to run some blessed calibration code on a particular platform that estimates the overhead of a checkpoint (the yield + scheduler's machinery), and suggests a reasonable threshold range. The application needs to decide where it wants to be in that range, trading off wasted CPU cycles vs. scheduling responsiveness.

Should it be possible for users to pass a function that decides whether to skip (plus an extra data field on trio.lowlevel.Task, with the task passed to the function), and for Trio to provide a trio.skip_checkpoints_within(time: int) implementation that returns a good default? That would handle any sort of testing concerns. I'm concerned it's a bit too abstract, but it might work well.

If that's not too abstract, I'm not sure where exactly we should allow this function to be passed. Individual start_soon calls? (That would force changing checkpoint_if_cancelled rather than unrolled_run.) At the level of nurseries? At trio.run, like instruments? There's a pretty large matrix of possibilities if this level of configuration is what we want (I'm not sure it is).


Alternatively, just some sort of skip_checkpoints_within: float argument on trio.run would allow tuning, not induce the paradox of choice, be faster, and allow testing (just pass 0 I guess?).
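
Hypothetical usage of that knob (skip_checkpoints_within is not a real trio.run() argument today, so the proposed calls are shown commented out):

import trio

async def main() -> None:
    await trio.sleep(0)

# Proposed: skip the yield at any checkpoint reached within 1 ms of the last one.
# trio.run(main, skip_checkpoints_within=0.001)

# Proposed: pass 0 to keep yield-at-every-checkpoint behaviour (e.g. for tests).
# trio.run(main, skip_checkpoints_within=0)

trio.run(main)  # current API: every checkpoint yields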
