Timely remote channels assume strong synchronization #63

frankmcsherry · 2017-05-03T15:04:04Z

The set-up of timely communication channels, from timely_communication, makes strong assumptions about the synchronization of the workers. In particular, it assumes that if a message is received for a channel that does not yet exist, it is safe to spin waiting for the channel to appear (the associated worker is assumed to be constructing the same graph at the same moment, perhaps just more slowly).

This has the potential to go wrong if the worker is for whatever reason blocked, for example on the wrong side of a worker.step_while() call. While the workers are expected to be running equivalent code, slight non-determinism could cause some divergence.

Instead, the channels could probably easily rendezvous, with either end-point creating the appropriate (send, recv) pairs in some common location, and extracting their endpoint from the list. The process-local channels look a bit like this.
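
For illustration, a minimal sketch of such a rendezvous, using only the standard library. The names `Rendezvous`, `take_sender`, and `take_receiver` are hypothetical and are not part of timely_communication. Whichever endpoint asks first creates the (send, recv) pair in the shared table, and each side then extracts its own half, so neither side has to spin waiting for the other.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

/// Hypothetical shared table of channel endpoints, keyed by channel identifier.
pub struct Rendezvous<T> {
    slots: Mutex<HashMap<usize, (Option<Sender<T>>, Option<Receiver<T>>)>>,
}

impl<T> Rendezvous<T> {
    pub fn new() -> Self {
        Rendezvous { slots: Mutex::new(HashMap::new()) }
    }

    /// Claims the send endpoint for `id`, creating the pair if this side arrived first.
    /// Returns `None` if the send endpoint was already claimed.
    pub fn take_sender(&self, id: usize) -> Option<Sender<T>> {
        let mut slots = self.slots.lock().unwrap();
        let slot = slots.entry(id).or_insert_with(|| {
            let (send, recv) = channel();
            (Some(send), Some(recv))
        });
        slot.0.take()
    }

    /// Claims the receive endpoint for `id`, creating the pair if this side arrived first.
    /// Returns `None` if the receive endpoint was already claimed.
    pub fn take_receiver(&self, id: usize) -> Option<Receiver<T>> {
        let mut slots = self.slots.lock().unwrap();
        let slot = slots.entry(id).or_insert_with(|| {
            let (send, recv) = channel();
            (Some(send), Some(recv))
        });
        slot.1.take()
    }
}
```

The order of the two calls does not matter in this sketch: the first caller inserts the pair and the second finds it already there. Entries are never removed here, which a real implementation would presumably want to handle.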

I haven't actually seen this happen in practice yet, but we haven't exercised dataflow construction at anything other than start of computation, before workers might diverge on their synchrony. If nothing else, it would be valuable to spec out what is expected to work when, for guidance on writing worker code that doesn't diverge.

frankmcsherry · 2018-09-16T07:48:06Z

The zero-copy allocators in #135 no longer make this assumption: they create the shared queue for a channel either when the channel is constructed or when they first see a message bearing that identifier.
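
As a rough illustration of that behavior (not the actual code in #135), the sketch below keeps one shared queue per channel identifier and creates it on first use, whether that first use is a message arriving or a worker constructing the channel. The names `Queues`, `deliver`, and `allocate` are hypothetical, and messages are modeled as plain byte vectors.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::{Arc, Mutex};

/// A queue shared between the receiving thread and a worker (hypothetical type).
type SharedQueue = Arc<Mutex<VecDeque<Vec<u8>>>>;

/// Hypothetical table of per-channel queues.
#[derive(Default)]
pub struct Queues {
    by_channel: Mutex<HashMap<usize, SharedQueue>>,
}

impl Queues {
    /// Returns the queue for `id`, creating it on first use by either side.
    fn queue(&self, id: usize) -> SharedQueue {
        let mut map = self.by_channel.lock().unwrap();
        let queue = Arc::clone(map.entry(id).or_default());
        queue
    }

    /// Receiving side: buffer a message even if no worker has built the channel yet.
    pub fn deliver(&self, id: usize, bytes: Vec<u8>) {
        self.queue(id).lock().unwrap().push_back(bytes);
    }

    /// Worker side: called whenever the worker gets around to constructing the channel.
    pub fn allocate(&self, id: usize) -> SharedQueue {
        self.queue(id)
    }
}
```

With this shape, a message that arrives before the worker has constructed the channel is simply buffered, and the worker finds it waiting when it allocates the channel later.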

There is still an assumption of determinism in graph construction, but the problem above of channel construction should be resolved.

frankmcsherry closed this as completed Sep 16, 2018
