
Skip schedule points if we've done one recently enough #1757


Open

oremanj opened this issue Oct 12, 2020 · 4 comments

Comments

@oremanj
Member

oremanj commented Oct 12, 2020

This is a performance improvement that we keep talking about, but I couldn't find an issue for it, except #32 which is broader-reaching.

The idea: checkpoint_if_cancelled() (which is also used by checkpoint() in the not-cancelled case) should not actually yield to the scheduler if it has yielded within the past... some amount of time that is probably in the 0.5ms to 5ms range.

Exception: we should always yield on the first checkpoint after an assignment to Task.coro or Task.context, because there is code in the wild that relies on this as a way to pick up the effects of those assignments. This can be done by making these members into properties that set some flag/etc.
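
A rough sketch of that flag-on-assignment idea (the class and flag names below are made up for illustration; this is not Trio's actual Task implementation):

class _TaskSketch:
    """Illustrative stand-in for Task, showing the property-with-flag pattern."""

    def __init__(self, coro):
        self._coro = coro
        self._force_next_yield = False  # hypothetical flag consulted at the next checkpoint

    @property
    def coro(self):
        return self._coro

    @coro.setter
    def coro(self, new_coro):
        self._coro = new_coro
        # Guarantee that the next checkpoint really yields, so the effects of
        # swapping the coroutine (or context) are picked up immediately.
        self._force_next_yield = True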

We should measure whether it works better to do this in unrolled_run() (so checkpoint_if_cancelled() remains unchanged, but the scheduler immediately resumes the same task if it hasn't been long enough) or in checkpoint_if_cancelled() (the "when last yielded" would be a member of Task in that case). It depends on how the overhead of yielding (which will be worse for deeper call stacks) compares to the overhead of looking up the current task from thread-locals.
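
For concreteness, a minimal sketch of the checkpoint_if_cancelled() variant, assuming a hypothetical "when last yielded" timestamp (modeled here with a stand-in dataclass rather than the real Task, and glossing over the cancellation delivery a real implementation would still have to do):

import time
from dataclasses import dataclass, field

import trio

SKIP_WINDOW = 0.001  # seconds; somewhere in the 0.5ms-5ms range mentioned above

@dataclass
class _TaskTimestamps:
    # stand-in for the hypothetical "when last yielded" member of Task
    last_yield: float = field(default=-float("inf"))

async def checkpoint_if_cancelled_sketch(ts: _TaskTimestamps) -> None:
    now = time.perf_counter()
    if now - ts.last_yield < SKIP_WINDOW:
        # Yielded recently enough: skip the round trip through the scheduler.
        return
    ts.last_yield = now
    await trio.lowlevel.checkpoint_if_cancelled()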

It should probably be possible (for the benefit of tests & other code that relies on the reschedule-every-tick assumption) to disable checkpoint skipping in a particular region, e.g. with a context manager that sets some flag on the current task.
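
At user level that opt-out could look roughly like this (the context manager and the task-tracking set are hypothetical; no such mechanism exists in Trio today):

from contextlib import contextmanager

import trio

# Tasks in this set would have checkpoint skipping disabled; purely illustrative.
_no_skip_tasks: set = set()

@contextmanager
def no_checkpoint_skipping():
    task = trio.lowlevel.current_task()
    already_disabled = task in _no_skip_tasks
    _no_skip_tasks.add(task)
    try:
        yield
    finally:
        if not already_disabled:
            _no_skip_tasks.discard(task)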

@njsmith
Member

njsmith commented Oct 12, 2020

Since this is a pure optimization, we can also consider approximations. E.g., keep a single global (or thread local) record of the last time any cancel_shielded_checkpoint was executed.
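
That approximation could look roughly like the following (names are illustrative; the real change would live inside Trio rather than in a wrapper):

import threading
import time

import trio

_last = threading.local()

async def cancel_shielded_checkpoint_sketch(window: float = 0.001) -> None:
    now = time.perf_counter()
    if now - getattr(_last, "yielded_at", -float("inf")) < window:
        return  # some task on this thread yielded recently enough; skip
    _last.yielded_at = now
    await trio.lowlevel.cancel_shielded_checkpoint()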

We'll also want to think about how this affects test determinism and any future pluggable-scheduler support. Though those are low-level enough, and cancel_shielded_checkpoint is performance-critical enough, that it might make sense to monkeypatch in a different implementation in those cases.

@belm0
Member

belm0 commented Oct 14, 2020

Selecting some hard-coded threshold sounds... not suitable for the variety of computers x environments in the present and future?

I'd like to be able to run some blessed calibration code on a particular platform that estimates the overhead of a checkpoint (the yield + scheduler's machinery), and suggests a reasonable threshold range. The application needs to decide where it wants to be in that range, trading off wasted CPU cycles vs. scheduling responsiveness.
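
As a starting point, calibration could be as simple as timing a batch of uncontended checkpoints, e.g. (a rough sketch, not a blessed benchmark):

import trio

async def estimate_checkpoint_overhead(iterations: int = 100_000) -> float:
    # Average cost of one yield through the scheduler with no other runnable tasks.
    start = trio.current_time()
    for _ in range(iterations):
        await trio.lowlevel.checkpoint()
    return (trio.current_time() - start) / iterations

async def main() -> None:
    per_checkpoint = await estimate_checkpoint_overhead()
    print(f"~{per_checkpoint * 1e6:.1f} usec per checkpoint on this machine")

trio.run(main)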

@belm0
Member

belm0 commented Oct 14, 2020

by the way, we track scheduling responsiveness like this:

# Latency of sleep(0) is used as a proxy for trio scheduler health.
# A large latency implies that there are tasks performing too much work
# between checkpoints.
# (`periodic` and `trio_scheduling_latency_metric` are application-level
# helpers, not part of Trio.)
async for _ in periodic(1 / 10):
    start_time = trio.current_time()
    await trio.sleep(0)
    trio_scheduling_latency_metric.observe(trio.current_time() - start_time)

and typical output is:
[screenshot: scheduling latency histogram]

i.e. the typical median pass of the scheduler is 500 usec (on a measly low-powered i5, with a fair number of active tasks at any moment), and as a soft real-time app that's kind of important to us

@A5rocks
Contributor

A5rocks commented Sep 2, 2024

> Selecting some hard-coded threshold sounds... not suitable for the variety of computers x environments in the present and future?
>
> I'd like to be able to run some blessed calibration code on a particular platform that estimates the overhead of a checkpoint (the yield + scheduler's machinery), and suggests a reasonable threshold range. The application needs to decide where it wants to be in that range, trading off wasted CPU cycles vs. scheduling responsiveness.

Should it be possible for users to pass a function that decides whether to skip (plus an extra data field on trio.lowlevel.Task, with the task passed to the function), and for Trio to provide a trio.skip_checkpoints_within(time: int) implementation that returns a good default? That would handle any sort of testing concerns. I'm concerned it's a bit too abstract, but it might work well.

If that's not too abstract, I'm not sure where exactly we should allow this function to be passed. Individual start_soon calls? (That would force changing checkpoint_if_cancelled rather than unrolled_run.) At the level of nurseries? At trio.run, like instruments? There's a pretty large matrix of possibilities if this level of configuration is what we want (I'm not sure it is).


Alternatively, just some sort of skip_checkpoints_within: float argument on trio.run would allow tuning, not induce the paradox of choice, be faster, and allow testing (just pass 0 I guess?).
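
Hypothetical usage of that knob (skip_checkpoints_within is not a real trio.run() argument today, so the proposed calls are shown commented out):

import trio

async def main() -> None:
    await trio.sleep(0)

# Proposed: skip the yield at any checkpoint reached within 1 ms of the last one.
# trio.run(main, skip_checkpoints_within=0.001)

# Proposed: pass 0 to keep yield-at-every-checkpoint behaviour (e.g. for tests).
# trio.run(main, skip_checkpoints_within=0)

trio.run(main)  # current API: every checkpoint yields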
