-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Merge is not idempotent #1187
Comments
My inclination is: let's do pointwise merging on arrays. I have thought about it all of 2min, and I'm not saying that it's a particularly smart answer. But it's where I'm at right now. Concatenation, for instance, is very much not idempotent. There is a question of how we can allow custom “datatypes” with their own merge function in the future (and how to make it not be terrible, which is really non-obvious). I think concatenation (by which we really mean union of sets) could be one of these. |
Pointwise merging sounds mathematically natural, but I think the behaviour sounds like it could be quite counter-intuitive to an end-user. It really seems like the behaviour you want here is largely dependent on the domain the array is representing, which I guess is another vote for some form of custom merge function. In in the meantime I'd be in favour of doing the least surprising thing. I think it's a lot easier for a layperson to understand "arrays merge iff they're equal" than to understand "arrays use pointwise merging, which means the exact behaviour depends on the contents of the arrays & how those merge". |
The problem is that testing for equality is strict. And may cause massive evaluations in edge cases. I'm not sure that it is quite so reasonable. There is most definitely no perfect solution there. Arrays are just trickier than they first appear. |
Ah, that's a good point. I suppose we could implement some sort of lazy pointwise equality contract(s) instead, by deferring the equality check until the array members are actually accessed? The main concern for me is situations like:
where a pointwise merge leads to an error that wouldn't necessarily be simple to diagnose. Even if a lazy equality check is more complex to implement, I still think the logic is simpler to understand & would lead to better diagnostics for the above kind of mistake. That said, I guess this could be somewhat mitigated by investing in good tutorial content around merging. |
This is fair enough. It's not harder than lazy contract checking or anything. So it could be a good non-committal approach. |
Out of curiosity, what is the behaviour you believe that a user expects in this situation? |
I don't know that users will have a specific general expectation of what should happen here, but I can certainly imagine situations where one would write code like the above assuming it would concatenate the arrays, because it's the most immediate way to "merge" the values. I think that users will likely have different expectations depending on the specific domain they're representing, and - at least for me personally - pointwise merging is unlikely to be the most intuitive option. I also think it's quite likely many of these expectations will not be idempotent, but that's probably something to confront later if/when we think about overriding merge behaviour. This is broadly why I think it makes sense to be as restrictive as possible here, at least initially. |
I feel like point-wise "lazy" equality testing does sound like a reasonable compromise. As @aspiwack says, it's less committing. Technically, I don't expect it to be hard to implement: merge already exists in two different modes (normal merging and contract application), I think lazy equality would correspond to a form of symmetric contract application, which could be a third mode (the original one is: the left-hand side mustn't have fields that aren't in the right-hand side, while the "lazy equality" would be fields must be exactly the same). As often, the most challenging part might be to provide good error reporting, because it involves threading error reporting data alongside recursive merges. Maybe we can make use of labels here as well?
In principle, this is answered already in RFC001 using the custom merge functions approach. It hasn't been implemented yet, and has been postponed to post 1.0, being nice but not strictly necessary to write useful programs. Custom merge functions must be "semantically" associative, idempotent and commutative (e.g. representing stuff as arrays and doing concatenation is fine if the associated final contract, or the consuming system outside of the boundaries of Nickel, does de-duplication and sorting, or doesn't care about duplication and order). |
This is what I'd expect. It's essentially a contract application (we can think of equality as a binary contract. Then, partially applied equality is a regular contract. So we should have the machinerie to give reasonable error messages for this). |
Describe the bug
The merge operation is advertised as being idempotent, but it is currently not. In particular, non-empty arrays are not mergeable, even if they are the same. It's probably fine not to be idempotent on functions (or is it?), but it's a real problem on data, as detailed through actual examples below.
To Reproduce
Here is a typical example of non idempotency:
It actually happens in real life, in particular with default values. If a contract sets a default value which is a non-empty array, and the contract happened to be applied several times, then we get the following. For example, in Nixel:
BashShell
defines anenv
field, that automatically compute the environment from thestructured_env
fieldRustShell
orCShell
"derives" from this base builder, that isRustShell = BashShell & {...}
.MyShell = {...} & RustShell & CShell
evaluates toMyShell = {...} & BashShell & { ...def of Rust shell.. } & BashShell & { ... def of CShell ... }
, leading to two occurrences ofBashShell
.So, "contract inheritance" can lead to the same contract being applied several time. Different parts of the code (a framework consuming a configuration versus the configuration itself) may also have to ensure that some contract is applied, and can't rely on the fact that another part will do it.
Expected behavior
Merging pure data (excluding functions, at least as a first step) should be idempotent.
Possible solutions
Until now, we've refrained from merging arrays, because it might not correspond to what the user has in mind (typically, concatenation makes more sense that pointwise merging in a lot of concrete applications). But this is becoming a problem. Here are possible solutions:
0
,1
, ...,n-1
(wheren
the length). This would naturally make merging idempotent on arrays, and doesn't sound far-fetched. However, it could lead to unwilling merge to actually proceed without errors (that being said, we have a type and contract system for that).In practice, it seems most problematic merges are actually merging the exact same definition (pointing to the same part of the AST). They could in principle be elided by an optimization which checks, before merging, if two values are physically equal, in which case the merging would be eliminated altogether (we already do that for equality testing). However, at least currently, this optimization would break referential transparency and makes the semantics quite fragile. As long as one stays within the zone where the interpreter can prove two things are physically equal, everything works fine. But if we add one indirection and get out of this zone, suddenly the semantically identical config refuses to evaluate. Thus, I see this as a possible addendum to one of the two previous solutions: only if merge is actually idempotent, this optimization is sound and doesn't change the behavior. We shouldn't rely on this optimizations for program to have the right behavior.
The text was updated successfully, but these errors were encountered: