Timely remote channels assume strong synchronization #63

Closed
frankmcsherry opened this issue May 3, 2017 · 1 comment

Comments

@frankmcsherry
Member

The set-up of timely communication channels, from timely_communication, makes strong assumptions about how closely synchronized the workers are. In particular, it assumes that if a message arrives for a channel that does not yet exist, it is safe to spin until the channel appears (the associated worker is assumed to be constructing the same graph at the same moment, perhaps just more slowly).

This can go wrong if the worker is blocked for whatever reason, for example on the wrong side of a worker.step_while() call. While the workers are expected to be running equivalent code, slight non-determinism could cause them to diverge.

Instead, the channels could probably rendezvous quite easily, with either endpoint creating the appropriate (send, recv) pair in some common location and then extracting its own endpoint from the list. The process-local channels look a bit like this.
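
To make the rendezvous idea concrete, here is a minimal sketch using only the standard library. It is not timely's code: `Rendezvous`, `sender`, `receiver`, and `demo` are hypothetical names, and the sketch assumes one sender and one receiver per channel identifier.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical shared registry: channel identifier -> unclaimed endpoints.
pub struct Rendezvous<T> {
    pending: Mutex<HashMap<usize, (Option<Sender<T>>, Option<Receiver<T>>)>>,
}

impl<T> Rendezvous<T> {
    pub fn new() -> Self {
        Rendezvous { pending: Mutex::new(HashMap::new()) }
    }

    /// Claims the send endpoint for `id`, creating the pair if this side arrives first.
    /// Returns `None` if the sender was already claimed.
    pub fn sender(&self, id: usize) -> Option<Sender<T>> {
        let mut pending = self.pending.lock().unwrap();
        let entry = pending.entry(id).or_insert_with(|| {
            let (send, recv) = channel();
            (Some(send), Some(recv))
        });
        entry.0.take()
    }

    /// Claims the receive endpoint for `id`, creating the pair if this side arrives first.
    /// Returns `None` if the receiver was already claimed.
    pub fn receiver(&self, id: usize) -> Option<Receiver<T>> {
        let mut pending = self.pending.lock().unwrap();
        let entry = pending.entry(id).or_insert_with(|| {
            let (send, recv) = channel();
            (Some(send), Some(recv))
        });
        entry.1.take()
    }
}

/// The "fast" side sends on channel 7 before the "slow" side has asked for it.
pub fn demo() -> Vec<u64> {
    let registry = Arc::new(Rendezvous::<u64>::new());

    let fast = {
        let registry = Arc::clone(&registry);
        thread::spawn(move || {
            let send = registry.sender(7).expect("sender already claimed");
            for x in 0..3 {
                // Succeeds because the registry still holds the receive endpoint.
                send.send(x).unwrap();
            }
        })
    };
    fast.join().unwrap();

    // The slow side only now constructs its end of the channel; nothing had to spin.
    let recv = registry.receiver(7).expect("receiver already claimed");
    recv.iter().collect()
}
```

Calling `demo()` collects the three messages even though the receiving side asked for channel 7 only after the sender had finished. Whichever side arrives first creates the pair, so neither side depends on the other being at the same point in graph construction.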

I haven't actually seen this happen in practice yet, but we haven't exercised dataflow construction at anything other than the start of a computation, before workers have had a chance to drift out of sync. If nothing else, it would be valuable to spec out what is expected to work when, as guidance for writing worker code that doesn't diverge.

@frankmcsherry
Member Author

The zero-copy allocators in #135 no longer make this assumption, and create shared queues for channels either when the channel is constructed or when they first see a message bearing that identifier.

There is still an assumption of determinism in graph construction, but the problem above of channel construction should be resolved.
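
As a rough illustration of the "create the queue on construction or on first message" behaviour, here is a single-threaded sketch. It is not the code from #135: `ChannelTable`, `on_message`, `allocate`, and `early_message_demo` are hypothetical names, and the `Rc<RefCell<..>>` queue type is an assumption chosen to keep the example small.

```rust
use std::cell::RefCell;
use std::collections::{HashMap, VecDeque};
use std::rc::Rc;

/// Hypothetical shared queue of raw message payloads for one channel.
type Queue = Rc<RefCell<VecDeque<Vec<u8>>>>;

/// Hypothetical table mapping channel identifiers to their shared queues.
#[derive(Default)]
pub struct ChannelTable {
    queues: HashMap<usize, Queue>,
}

impl ChannelTable {
    /// Returns the queue for `id`, creating an empty one if none exists yet.
    fn queue(&mut self, id: usize) -> Queue {
        self.queues.entry(id).or_default().clone()
    }

    /// Called when a message arrives from the network: the queue is created
    /// on demand, so an early message is buffered rather than spun on.
    pub fn on_message(&mut self, id: usize, payload: Vec<u8>) {
        self.queue(id).borrow_mut().push_back(payload);
    }

    /// Called when the worker constructs the channel: it picks up the same
    /// queue, including anything that arrived before construction.
    pub fn allocate(&mut self, id: usize) -> Queue {
        self.queue(id)
    }
}

/// Two messages arrive for channel 3 before the worker has constructed it.
pub fn early_message_demo() -> Vec<Vec<u8>> {
    let mut table = ChannelTable::default();
    table.on_message(3, vec![1, 2]);
    table.on_message(3, vec![3]);

    let queue = table.allocate(3);
    let drained: Vec<Vec<u8>> = queue.borrow_mut().drain(..).collect();
    drained
}
```

Both entry points go through the same `queue` lookup, so the order of "message arrives" and "channel is constructed" stops mattering. The sketch still relies on both sides agreeing on the identifier, which is the remaining determinism assumption.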
