Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Position of duplicates in ordered maps #1725

Closed
michaelhkay opened this issue Jan 23, 2025 · 4 comments · Fixed by #1727 · May be fixed by #1740
Closed

Position of duplicates in ordered maps #1725

michaelhkay opened this issue Jan 23, 2025 · 4 comments · Fixed by #1727 · May be fixed by #1740
Labels
Editorial Minor typos, wording clarifications, example fixes, etc. PR Pending A PR has been raised to resolve this issue XQFO An issue related to Functions and Operators

Comments

@michaelhkay
Copy link
Contributor

It became clear to me when writing test cases that the specs aren't entirely clear about what happens when building a map from an input sequence that contains duplicate keys. It says clearly what entry should be created for the duplicated key, but it doesn't say clearly where this entry should appear in the result.

There are four functions/instructions that this applies to: map:merge, map:build, map:of-pairs, and xsl:map.

I propose that in each case, the position of the entry for the duplicated key in the resulting map should correspond to the position of the first occurrence of that key in the input sequence. That is, "order of first appearance": the effect should be the same as if new entries are always created using a map:put() operation.

This might be slightly unexpected in the case of map:merge() with the option duplicates=use-last. It means the value will be that of the last duplicate, but its position will be that of the first duplicate. However, the other three functions/instructions achieve the effect of use-last with the callback on-duplicates=fn{$a, $b){$a} which only controls the value of the entry, and cannot be used to control its position, and I think it makes sense for map:merge with duplicates=use-last to behave in the same way.

Of course we could introduce a separate option to control the position of the combined entry, but I think that would be overkill. xsl:for-each-group and distinct-values both use the "order of first appearance" rule and this has never caused any problems. (group-by in XQuery delivers groups in implementation-dependent order, however).

@michaelhkay
Copy link
Contributor Author

I think we can make this simpler if we define map:merge in terms of map:of-pairs. I think the equivalent becomes:

let $FOJS0003 := QName("http://www.w3.org/2005/xqt-errors", "FOJS0003")
let $duplicates-handler := {
  "use-first": fn($a, $b) { $a },
  "use-last":  fn($a, $b) { $b },
  "combine":   fn($a, $b) { $a, $b },
  "reject":    fn($a, $b) { fn:error($FOJS0003) },
  "use-any":   fn($a, $b) { fn:random-number-generator()?permute(($a, $b))[1] }
}
return map:of-pairs($maps =!> map:pairs(), 
                                   $duplicates-handler?($options?duplicates otherwise 'use-first'))

@ChristianGruen
Copy link
Contributor

I propose that in each case, the position of the entry for the duplicated key in the resulting map should correspond to the position of the first occurrence of that key in the input sequence. That is, "order of first appearance": the effect should be the same as if new entries are always created using a map:put() operation.

I agree: use-last should only affect values, not keys. I checked various projects, and a popular use case for this option is to overwrite values in a default map:

map:merge(
  ($default-map, { 'debug': true(), 'level': 3 }),
  { 'duplicates': use-last' }
)

When it comes to reordering keys, I believe we should rather offer a separate set of functions in a later step (if we want to do so at all).

Another edge case that may need to be refined in the spec is the replacement of different keys that are deemed equal (1 vs 1.0) with map:put:

{ 1: 'integer', 2e0: 'double' }
=> map:put(1.0, 'decimal')

As the stored key is changed by this update, I think we should treat it not as an in-place update, but as a delete-and-insert operation. One advantage of this would be that the resulting order would unveil what has been going on.

I think we can make this simpler if we define map:merge in terms of map:of-pairs. I think the equivalent becomes:
"use-any": fn($a, $b) { fn:random-number-generator()?permute(($a, $b))[1] }

I think that this one should stay implementation-defined (but I have no idea how to express this in pseudo code…).

@ChristianGruen ChristianGruen added XQFO An issue related to Functions and Operators Editorial Minor typos, wording clarifications, example fixes, etc. labels Jan 23, 2025
@michaelhkay
Copy link
Contributor Author

Since we are contemplating additional options for map:build and map:of-pairs, perhaps we should move the current $combine parameter into an extensible $options parameter?

@ChristianGruen
Copy link
Contributor

Since we are contemplating additional options for map:build and map:of-pairs, perhaps we should move the current $combine parameter into an extensible $options parameter?

…true; this has never been a mistake in the past.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
Editorial Minor typos, wording clarifications, example fixes, etc. PR Pending A PR has been raised to resolve this issue XQFO An issue related to Functions and Operators
Projects
None yet
2 participants