Added local queue scheduling and "next_task" optimization #22


Closed
wants to merge 41 commits

Conversation

nullchinchilla

Two major changes significantly improve performance:

  • When Executor::run() is called, handles to the local queue and the ticker are cached in TLS. This lets tasks schedule onto a thread-local queue rather than always onto the global queue.
  • Within the local queue, we implement a next_task optimization (see https://tokio.rs/blog/2019-10-scheduler) that greatly reduces context-switch costs in message-passing patterns. To prevent starvation, we never place the same task into next_task twice in a row. A sketch of both mechanisms follows this list.
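
Roughly, the scheduling path looks like the sketch below. This is a simplified, standalone illustration rather than the PR's actual code: the names (LocalQueue, GLOBAL, schedule, next_runnable) and the plain Mutex<VecDeque> standing in for the global queue are placeholders for async-executor's real concurrent structures.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::sync::Mutex;

// Stand-in for the executor's task handle; in async-executor this would be
// an async_task::Runnable.
type Runnable = Box<dyn FnOnce() + Send>;

// Global injector queue shared by every thread (simplified to a mutexed deque).
static GLOBAL: Mutex<VecDeque<Runnable>> = Mutex::new(VecDeque::new());

thread_local! {
    // Cached by Executor::run() for as long as the ticker lives on this thread.
    // `None` means "no local queue here, fall back to the global queue".
    static LOCAL: RefCell<Option<LocalQueue>> = RefCell::new(None);
}

#[derive(Default)]
struct LocalQueue {
    // LIFO slot holding the task most recently woken by the currently running
    // task; running it next keeps message-passing hot paths on one thread.
    next_task: Option<Runnable>,
    // Set once the currently running task has used the slot, so a ping-pong
    // pair of tasks cannot monopolize it and starve the FIFO queue.
    next_task_used: bool,
    fifo: VecDeque<Runnable>,
}

// Schedule a runnable, preferring the thread-local queue when one is cached.
fn schedule(runnable: Runnable) {
    LOCAL.with(|local| match &mut *local.borrow_mut() {
        // This thread is currently inside Executor::run().
        Some(queue) => {
            if queue.next_task.is_none() && !queue.next_task_used {
                queue.next_task = Some(runnable);
                queue.next_task_used = true;
            } else {
                queue.fifo.push_back(runnable);
            }
        }
        // No cached local queue (e.g. a nested executor invalidated the cache,
        // or we are on a non-executor thread): push onto the global queue.
        None => GLOBAL.lock().unwrap().push_back(runnable),
    });
}

// Called by the ticker to fetch the next runnable, preferring the LIFO slot.
fn next_runnable() -> Option<Runnable> {
    LOCAL.with(|local| -> Option<Runnable> {
        let mut local = local.borrow_mut();
        let queue = local.as_mut()?;
        queue.next_task_used = false; // the next task gets a fresh shot at the slot
        queue.next_task.take().or_else(|| queue.fifo.pop_front())
    })
}
```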

Through both unit testing and production deployment in https://github.com/geph-official/geph4, whose QUIC-like sosistab protocol is structured in an actor-like fashion that heavily stresses the scheduler, I see significant improvements in real-world throughput (up to 30%, on a server whose CPU time is dominated by cryptography) and massive improvements in microbenchmarks (up to 10x faster in the yield_now benchmark and similar context-switch benchmarks). I see no downsides: the code gracefully falls back to pushing onto the global queue when, for example, nesting Executors invalidates the TLS cache.

I also added criterion benchmarks.
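
For reference, the yield_now microbenchmark is roughly of the following shape. This is a hedged sketch rather than the exact benchmark added here, and it assumes criterion and futures-lite as dev-dependencies.

```rust
use async_executor::Executor;
use criterion::{criterion_group, criterion_main, Criterion};
use futures_lite::future;

// Stresses the context-switch path: a task that repeatedly yields back to the
// scheduler, so most of the time is spent scheduling rather than in user code.
fn yield_now(c: &mut Criterion) {
    let ex = Executor::new();
    c.bench_function("yield_now", |b| {
        b.iter(|| {
            future::block_on(ex.run(ex.spawn(async {
                for _ in 0..100 {
                    future::yield_now().await;
                }
            })))
        })
    });
}

criterion_group!(benches, yield_now);
criterion_main!(benches);
```

With criterion, a benchmark like this lives under benches/ and is registered in Cargo.toml as a [[bench]] target with harness = false.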

@notgull
Member

notgull commented Oct 17, 2023

@nullchinchilla Would you be open to rebasing and cutting down this PR? These optimizations are important, and I would be glad to review them.

@nullchinchilla
Author

I've actually decided on a different course, since I've realized that local scheduling and an unstealable next_task cell can cause issues (such as deadlocks if we nest smol::block_on).
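
To illustrate the hazard, here is a hypothetical sketch of the problematic shape (not code from this PR): the inner block_on parks the only thread that can reach its own unstealable next_task slot, so a wake-up sitting in that slot can never run.

```rust
use smol::{block_on, Executor};

// Hypothetical shape of the deadlock: the outer block_on drives the executor,
// the inner block_on parks this thread, and `inner`'s wake-up may sit in this
// thread's unstealable next_task slot where no other worker can reach it.
fn nested_block_on_hazard(ex: &Executor<'static>) {
    block_on(ex.run(async {
        let inner = ex.spawn(async { 42 });
        // Nested block_on: this thread stops ticking the executor while it
        // waits on a task that only this thread's local queue can run.
        let _value = block_on(inner);
    }));
}
```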

You can check out my latest executor work in the "smolscale" crate, which also uses smol::Task and is fully compatible with the smol-rs ecosystem, but forces a global executor to make it easier to optimize.

@notgull
Member

notgull commented Oct 18, 2023

Thanks for letting me know! I think this crate should act more as a "reference" executor that aims to implement features rather than be as optimal as possible. I'll close this for now.

@notgull notgull closed this Oct 18, 2023