Memory model review #134
Comments
In the specific case of lock-unlock regions it is customary to allow memory actions outside the critical region to be moved inside it, but in the case of sequentially consistent atomics -- which is all this proposal has -- it would be very surprising to me if they could be reordered with respect to non-synchronizing accesses. Doing so could introduce a data race where there was none. With acquire-release semantics it's different of course, a read could be moved from after the release to before it, which is the situation you're describing, but that's not what we have in this proposal. It's known that we need acquire-release for good performance on some systems, see issue #15, but I consider that future work. Events are, I think, properly limited to shared-memory accesses, wait/wake, and other inter-agent operations, and the prohibition should have no impact on accesses to unshared memory. (Edit: clarified)
You're right, that definition is wrong. For sequentially consistent atomics we have that the synchronizing events appear in a total order, but the non-synchronizing events do not. That impacts the definition of viability (irrespective of your other remarks about viability).
At a minimum I was a poor writer here. The definition says that the execution order is a total order of the events of an execution of the agent cluster consistent with each agent's program order, and the definition of program order does not make sufficiently clear that there are many possible program orders (each one consistent with sequential semantics) and that any of them can be chosen for any execution of the agent. What I wish to express here is that a program order represents a valid execution of a program in an agent (consistent with intra-agent semantics) and that an execution order represents a valid interleaving of a set of programs' program orders in an agent cluster. Validity does not allow for arbitrary interleavings of events, just those interleavings that could follow from program actions on possible inputs. (Edit: clarified, several times)
One more observation: Regarding how "new" memory affects viability; the base case is that memory does not start out shared, it becomes shared when the creator of the memory shares it with another agent. Suppose A creates the memory. A can arrange to initialize a location with an atomic write before it sends the memory to B, and B's read from the location should then be viable: all initializing writes to the memory happened in A before A performed the atomic initializing write, and B observes only that write. Agent A:
Agent B:
A similar situation arises if a location is first used for non-atomic purposes and later becomes atomic (or first used for atomics of one size and later of another size, or if the memory was originally created by an agent C before A or B existed); one agent (A) must initialize the location with an appropriate atomic write, and until that has happened another agent (B) cannot access it with an atomic read that is viable. In this case, there must be other synchronization between A and B so that B knows when it is safe to access the memory with an atomic read. (I'm not saying the prose in the spec expresses that, but that is the intent.)
About viability: The intent is that for a candidate atomic read from locations i..j to be viable (i.e., to participate in synchronization) it needs to observe a prior atomic write to i..j without any of the i..j having been written by anything else since that write, be that a non-atomic write to any of the i..j or an atomic write to a subsequence of i..j. However, talking about "prior" and "since" without talking about synchronization is hard, since synchronization establishes agent interdependencies, giving rise to our understanding of time. I thought I had broken that loop with the idea that one observation, in one execution, of an atomic read of something that was not a matching atomic write makes the read non-viable, but the expression of that idea does not appear to work in its current form. Viability is operationally intended to be a dynamic type system, where we can imagine that each byte in memory carries information about how it was written (non-atomically, or as part of an atomic write of N bytes of which it is the Kth) and these tags are updated along with the memory. An atomic read of N locations i..j is viable if the tag of i is (atomic,N,0), the tag of i+1 is (atomic,N,1), and so on. One way to view it is that the type system carves reliable "atomic cells" out of the flat memory, for a section of the program. I've been reluctant to go with such an operational definition but at this point it seems worth an attempt.
Completed.
Comments from @waldemarhorwat that reached me by email:
My comments:
Such a total order does not exist on modern architectures.
This may make it hard to implement. It's customary to, for example, allow a non-synchronizing read that follows a synchronizing write to move up before the write.
Now I'm a bit confused. The definition of an execution order defined "the execution order", not "an execution order". Now I interpret this as there being many, with the only constraint being that they are consistent with the program order on each agent. OK, but then...
...by the above definition, no candidate atomic read that reads something atomically written by another agent will ever be viable. You can always put the reader's read R before the first instruction of all other agents and obtain an execution order consistent with the agents' program orders that puts the read before the write.
I suspect the answer has something to do with the exact meaning of "consistent with the semantics" in the definition of a program order, but elaborating that to include atomic synchronizations risks making the definition circular.
I can kind of see what you're after, but it's not clear how to make it work. For example, assume that all variables start with 0 and run:
Agent A:
Agent B:
Here the atomic read from b done by Agent B is not viable because there exists an execution order in which Agent A's atomic store has not happened yet. OK, so let's fix this by saying that all memory is born pre-written with atomic writes.
The problem remains, in that Agent B's atomic read of b could have seen the atomic store done by Agent A or some initial atomic store of 0 that precedes the execution of both agents. Is it viable or not? I could read your definition of viability both ways and wasn't sure on first reading, but given the context of "After requiring atomic reads to be viable there may still be data races (by the definition below) that involve atomic writes, but none of them will be observed by atomic reads", I see that you intended to make Agent B's atomic load non-viable. That's not workable -- it makes the most common patterns of the use of atomics useless.
So let's try to read the definition of viability the other way: an atomic read is viable if it can read the results of one of many different atomic writes, even if they race with each other. You still have a problem with compiling C++ code that uses malloc to dynamically turn memory into atomic or non-atomic. OK, so let's fix that by writing a malloc that initializes all atomic values that it will use. Let's say Agent A does that, as well as destroying the object later (after other synchronization omitted from the example). You still have a problem because an execution order exists in which Agent B doesn't see Agent A doing that; moreover, an execution order exists in which Agent B sees the object after it was destroyed and reused for non-atomic data. You can disallow those execution orders by making them depend on atomic access synchronization, but that makes the definition of an execution order circular.
Given that there are many different execution orders postulated, do they all result in the same single synchronization order? I don't see how that's possible for nontrivial programs. That would mean that all programs that use atomics are deterministic.
At this point I'm stuck. In other languages the notion of synchronization critically depends on what value was read from a variable, whereas this appears to be trying to make synchronization value-independent.