
Devise a scheme for scheduler shutdown at process-exit-time. #7702

Closed
bblum opened this issue Jul 11, 2013 · 6 comments
Labels
A-concurrency Area: Concurrency A-runtime Area: std's runtime and "pre-main" init for handling backtraces, unwinds, stack overflows

Comments

@bblum
Contributor

bblum commented Jul 11, 2013

We can't rely on the main task collecting its children's failures for shutting down the schedulers, because there might be tasks whose exit statuses don't propagate that far (any unlinked task may outlive the main task). We should instead use a global refcount that counts the total task population, or perhaps devise some other scheme that might avoid global synchronization on every spawn/exit.
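
A minimal sketch of the global-refcount idea above, assuming modern Rust with `std` atomics and channels standing in for the runtime's own machinery; `on_spawn`, `on_exit`, and the shutdown channel are illustrative names, not actual runtime APIs.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::Sender;

// Process-wide count of live tasks.
static TASK_COUNT: AtomicUsize = AtomicUsize::new(0);

// Called by the runtime when a task is spawned.
fn on_spawn() {
    TASK_COUNT.fetch_add(1, Ordering::SeqCst);
}

// Called when a task exits; the task that drops the count to zero asks the
// schedulers to shut down, so no parent/child bookkeeping is required.
fn on_exit(shutdown: &Sender<()>) {
    if TASK_COUNT.fetch_sub(1, Ordering::SeqCst) == 1 {
        let _ = shutdown.send(());
    }
}
```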

@brson
Contributor

brson commented Jul 19, 2013

I still prefer using a task tree. We can maintain the invariant that all tasks wait on their children even if children don't propagate failure.
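
A rough sketch of that task-tree invariant, assuming OS threads and `JoinHandle` as stand-ins for tasks; the `Task` type and `spawn_child` are illustrative only. Each task joins every child before it goes away, whether or not the child's failure propagates.

```rust
use std::thread::{self, JoinHandle};

// Each task remembers the children it spawned.
struct Task {
    children: Vec<JoinHandle<()>>,
}

impl Task {
    fn spawn_child<F: FnOnce() + Send + 'static>(&mut self, body: F) {
        self.children.push(thread::spawn(body));
    }
}

impl Drop for Task {
    // Wait on every child before this task finishes; a child's panic is
    // observed but not propagated.
    fn drop(&mut self) {
        for child in self.children.drain(..) {
            let _ = child.join();
        }
    }
}
```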

@bblum
Contributor Author

bblum commented Jul 20, 2013

This seems pricey to me. Even if we are optimizing for the fastest possible spawn path, that would require the extra synchronization of tombstoning for tasks that wouldn't otherwise need to do it. For KillHandle, this is a refcount at each end of the task's lifecycle, and (in the case of outliving children) an extra lock and allocation. Maintaining population counts would just be one atomic op.

Hmm, actually, I just realized that population-counting can have basically zero contention in the best case by keeping the population counts per-scheduler and making use of the number-of-schedulers-awake refcount. When the last scheduler goes to sleep, it can scan the counts of all the other schedulers (zero? process exit. nonzero? in debug mode, check for deadlock; otherwise, do nothing). Then cross-CPU contention will only happen (a) on the schedulers-awake refcount, which is only touched when schedulers go to sleep, so it can be a slowpath, and (b) when a scheduler steals work, since it will need to steal a refcount from the other scheduler too.
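
A rough sketch of the per-scheduler counting idea as described here (the next comment points out that counts would also have to migrate when a task blocks); the `Scheduler` struct and `on_scheduler_sleep` are illustrative names, not runtime APIs.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

struct Scheduler {
    // Number of tasks currently owned by this scheduler.
    task_count: AtomicUsize,
}

// Slow path, taken only when a scheduler runs out of work and goes to sleep.
// Returns true if this was the last awake scheduler and no scheduler has any
// tasks left, i.e. the process can exit.
fn on_scheduler_sleep(schedulers: &[Scheduler], awake: &AtomicUsize) -> bool {
    if awake.fetch_sub(1, Ordering::SeqCst) == 1 {
        schedulers
            .iter()
            .all(|s| s.task_count.load(Ordering::SeqCst) == 0)
    } else {
        false
    }
}
```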

@bblum
Contributor Author

bblum commented Jul 20, 2013

No wait, that doesn't make any sense. Refcounts would also have to migrate when a task blocks.

Anyway, task tree maintenance will still be more expensive than one global refcount, both in atomics and in space. True that we would already be doing those ops for some spawn modes, but someone who wants the fastest possible spawn will want to avoid those modes anyway.

Still trying to think if we can do better than a global refcount.

@toddaaro
Contributor

What exactly is the issue here? I don't quite see it. The current implementation is such that we shouldn't be getting early shutdown, due to how handles to schedulers keep the scheduler alive. If we are missing a case due to how task killing interacts with this, I can start thinking about it.

The hole that might exist is the main task sending Shutdown to all the schedulers while there is still work to do. As long as work exists we can't lose all the schedulers, but we might lose a fraction of the schedulers that still have outstanding work. A solution might be to put a set of SchedHandles inside each scheduler struct, and then have the "I am actually really done" logic for a scheduler destruct them. Then, when all the schedulers are done and these handles drop, all the schedulers shut down together.
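
A sketch of that handle-dropping idea, using `Arc` and a channel as stand-ins for SchedHandle and the runtime's messaging; `KeepAlive` and `really_done` are illustrative names. Each scheduler holds a clone of a shared token, and shutdown is broadcast only when the last clone is dropped.

```rust
use std::sync::mpsc::Sender;
use std::sync::Arc;

// Shared keep-alive token; its Drop runs exactly once, when the last
// scheduler lets go of its clone, and that signals the real shutdown.
struct KeepAlive {
    shutdown: Sender<()>,
}

impl Drop for KeepAlive {
    fn drop(&mut self) {
        let _ = self.shutdown.send(());
    }
}

struct Scheduler {
    keep_alive: Option<Arc<KeepAlive>>,
}

impl Scheduler {
    // "I am actually really done": drop our clone of the token.
    fn really_done(&mut self) {
        self.keep_alive.take();
    }
}
```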

I'm not opposed to a global refcount though. The paper below describes a high-performance, scalable data structure for doing parallel refcounts, so it could be a good fit. If we moved to something like this for shutdown, a lot of other logic in the scheduler would become simpler, and we could eliminate the sleeper list in favor of checking this.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.83.3091&rep=rep1&type=pdf

bblum added a commit to bblum/rust that referenced this issue Aug 22, 2013
@bblum
Contributor Author

bblum commented Aug 22, 2013

My latest leading plan is to use a global population counter that is only accessed by the "root" tasks in the "watched task trees", sort of a best-of-both-worlds between my idea and Brian's.

I attempted to implement this, but we don't have a mechanism for global (cross-scheduler) data except for using static mut variables, and those are broken when running stdtest because we end up with two copies of each one. As such it works fine for any programs run outside of libstd, but it breaks stdtest itself.

However the logic for this plan can be seen at: bblum@ac4c393
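
A sketch of the "root tasks only" refinement (see the linked commit for the actual attempt): only the root of a watched task tree touches the global counter, since the tree already waits on its own children. Names here are illustrative, not the runtime's.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts only the "root" tasks of watched task trees; their children are
// already waited on within the tree, so they never touch this counter.
static ROOT_TASKS: AtomicUsize = AtomicUsize::new(0);

fn on_root_spawn() {
    ROOT_TASKS.fetch_add(1, Ordering::SeqCst);
}

// Called once a root task has finished waiting on its whole watched tree.
fn on_root_tree_done(request_shutdown: impl FnOnce()) {
    if ROOT_TASKS.fetch_sub(1, Ordering::SeqCst) == 1 {
        request_shutdown();
    }
}
```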

@alexcrichton
Member

Closing: each pool of schedulers has a shared counter of live tasks plus a shared channel back to the task that will shut down the pool. Only after the entire pool of tasks has been drained do the schedulers actually get shut down.

From the looks of it, this issue sounds like it was asking for something along those lines.
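
A sketch of that shape, assuming modern Rust with `std` types standing in for the runtime's: a pool-wide live-task counter plus a channel back to the task responsible for tearing the pool down. `Pool`, `task_started`, `task_finished`, and `shutdown_task` are illustrative names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{Receiver, Sender};
use std::sync::Arc;

struct Pool {
    live_tasks: Arc<AtomicUsize>,
    shutdown_tx: Sender<()>,
}

impl Pool {
    fn task_started(&self) {
        self.live_tasks.fetch_add(1, Ordering::SeqCst);
    }

    // The task that drains the pool notifies the task responsible for
    // shutting the schedulers down.
    fn task_finished(&self) {
        if self.live_tasks.fetch_sub(1, Ordering::SeqCst) == 1 {
            let _ = self.shutdown_tx.send(());
        }
    }
}

// Blocks until every task in the pool has exited, then tears the pool down.
fn shutdown_task(rx: Receiver<()>) {
    let _ = rx.recv();
    // ... shut down each scheduler in the pool ...
}
```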
