Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

fn:ranks: Produce all ranks in applying a function on the items of a sequence #150

Open
dnovatchev opened this issue Sep 21, 2022 · 7 comments
Labels
Feature A change that introduces a new feature PR Pending A PR has been raised to resolve this issue Propose for V4.0 The WG should consider this item critical to 4.0 XQFO An issue related to Functions and Operators

Comments

@dnovatchev
Copy link
Contributor

dnovatchev commented Sep 21, 2022

We all know the value and usefulness of functions such as fn:highest() and fn:lowest().

Sometimes, when we need to see all rankings of a particular sorting result, for example the rankings of a sport competition, we realize that highest and lowest are just the highest and lowest ranking-groups from all the rankings.

We define these three overloads for fn:ranks:

fn:ranks($input as item()*) as array(item()*)*

fn:ranks($input as item()*, $collation as xs:string?) as array(item()*)*

fn:ranks($input as item()*, $collation as xs:string?, $key as function(item()) as xs:anyAtomicType*) as array(item()*)*

The rules and semantics for the arguments are the same as those for fn:highest, fn:lowest, fn:sort. What is different is just the result.

Here is one possible XPath implementation and also a complete example:

let $ranks := function(
                $input as item()*,
                $collation as xs:string?,
                $key as function(item()) as xs:anyAtomicType*) as array(item()*)*
 {
    for $v in sort(distinct-values($input ! $key(.)),  $collation)
     return [$input[$key(.) eq $v]]
 },
 
   $inp := (3, 2, 4),
   $keyfun := function($n) {$n mod 2},
   $theRanks :=  $ranks($inp, (), $keyfun),
   $theHighest := $theRanks[last()], 
   $theLowest := $theRanks[1]
 return
 
   ( "Ranks:", $theRanks,
     "=================",
     "Highest:",
     $theHighest,
     "=================",
     "Lowest:",
     $theLowest
   )

The result, as intended is all the rankings (in this case they are just 2 groups of equally-ranked items), then the highest and lowest, extracted as the last and first of the rankings:

Ranks:
[(2, 4)]
[3]
=================
Highest:
[3]
=================
Lowest:
[(2, 4)]
@dnovatchev dnovatchev changed the title [XPath] Function fn:rank. Ranking the result of sorting in groups of items each having the same sort-key value [XPath] Function fn:ranks. Ranking the result of sorting in groups of items each having the same sort-key value Sep 21, 2022
@ChristianGruen ChristianGruen added XPath An issue related to XPath Feature A change that introduces a new feature labels Sep 21, 2022
@michaelhkay
Copy link
Contributor

It would be useful to see some use cases for this function.

This seems to be essentially a combination of group-by() and sort(). It's essentially equivalent, as far as I can see, to

<xsl:for-each-group select="$input" group-by="$key(.)" collation="{$collation}">
  <xsl:sort select="current-grouping-key()"/>
  ....
</xsl:for-each>

or in XQuery

for $item in $input
group by $key($item)
order by $key($item)
return ...

Which raises questions like:

(a) do we actually need a pure XPath / functional way of doing this?
(b) should we follow the example of XSLT and XQuery in treating grouping and sorting as two separate operations, rather than combining them into a single function

My feeling is I'd like to see grouping done separately from sorting:

group-by($input, $key)

where $key is a function that computes a grouping key; the result of

group-by(("apple", "banana", "cherry", "apricot"), ->{characters(.)[1]})

is

map{"a":("apple", "apricot"), "b":"banana", "c":"cherry}

and then we provide a separate function to process a map in sorted order of keys:

map:for-each-sorted($map, $action)

@ChristianGruen ChristianGruen added XQUF An issue related to the XQuery Update Facility and removed XPath An issue related to XPath labels Sep 21, 2022
@dnovatchev
Copy link
Contributor Author

dnovatchev commented Sep 21, 2022

It would be useful to see some use cases for this function

Here are some of the most-frequent use cases (winner's podium, the 10 most cold/hot and the 10 average-temperature days in a year. etc.)

Large_Speed Cat Podium11

There are 4 different ranking functions in T-SQL, all with their use-cases:

  1. ROW_NUMBER()
  2. RANK()
  3. DENSE_RANK()
  4. NTILE()

The proposed fn:ranks() is analogous to DENSE_RANK

So, all use-cases of the ranking functions (and numerous examples) from T-SQL are also use-cases for fn:ranks()

There are many examples within the existing sample SQL databases, such as in NorthWind, Pubs, AdventureWorks, WideWorldImporters, ..., etc.

@dnovatchev
Copy link
Contributor Author

dnovatchev commented Sep 21, 2022

Which raises questions like:

(a) do we actually need a pure XPath / functional way of doing this?

Yes, because the XPath solution is both an XQuery and XSLT solution. If we always provide a single sample implementation (suitable to both XQuery and XSLT users/implementers) then our Specs might shrink to half their current size.

(b) should we follow the example of XSLT and XQuery in treating grouping and sorting as two separate operations, rather than combining them into a single function

Let's see how they did it in T-SQL (SQL Server). They have both sorting (ORDER BY), grouping (GROUP BY) and yes, they have also 4 ranking functions (ROW_NUMBER(), RANK(), DENSE_RANK() and NTILE())

Hmm... Wonder why they did this? And no users are complaining???

@dnovatchev dnovatchev added XQFO An issue related to Functions and Operators and removed XQUF An issue related to the XQuery Update Facility labels Sep 22, 2022
@dnovatchev dnovatchev changed the title [XPath] Function fn:ranks. Ranking the result of sorting in groups of items each having the same sort-key value [XPath] Function fn:ranks(). Ranking the result of sorting in groups of items each having the same sort-key value Sep 24, 2022
@michaelhkay
Copy link
Contributor

I feel this tries to pack too much into one function: it ought to be possible to decompose it into smaller functions that do one thing only.

For grouping by key we have (proposed) map:build().

Next we need a convenient way to sort a map by key. Such an operation needs produce a sequence of map entries as output. We have a bit of a problem here in that there are two competing ways to represent a map entry as an item: it can be a singleton map (a single key-value pair) or it can be a two-entry map (record(key as xs:anyAtomic, value as item()*)). So this becomes map:build(..) => map:entries() => sort(key := ->{?key}).

Then we can augment the sorted sequence of entries $seq with an integer position using $seq => map:put('rank', position()), and we can construct a map from key to position using map:build($entries, ->{?key}, ->{?rank}).

Perhaps the missing primitive we need here is the ability to process a map in sorted key order? A simple function map:for-each-sorted($map, function(key, value, position)) would seem useful with position being an optional parameter we are adding to all such callbacks.

Then the ranks() function becomes map:build($input, key:=EXPR) => map:for-each-sorted(->($key, $value, $pos){map($key, $pos)} => map:merge().

I don't think this is the final answer but I think it suggests some useful primitives for map processing. The critical requirement, I think, is for it to become easier to process a map as a sequence of entries and recombine those entries into a new map.

@dnovatchev
Copy link
Contributor Author

I feel this tries to pack too much into one function: it ought to be possible to decompose it into smaller functions that do one thing only.

The purpose of fn:ranks() is to provide to the XPath developer the same level of convenience, conciseness, compactness, and readability as has been available for decades to the T-SQL programmer. We know that 1% to 10% of people could probably write the equivalent of fn:ranks() from smaller primitives (even in pure XPath 3.1 as shown above).

The goal of this proposal to give everyone a small, compact and simple function to do the same

. . . . .

Then the ranks() function becomes map:build($input, key:=EXPR) => map:for-each-sorted(->($key, $value, $pos){map($key, $pos)} => map:merge().

Just a look at this expression confirms the necessity to have a simpler way of doing this, which is the sole purpose of fn:ranks()

I don't think this is the final answer but I think it suggests some useful primitives for map processing. The critical requirement, I think, is for it to become easier to process a map as a sequence of entries and recombine those entries into a new map.

I definitely agree that we need to have further, more evolved ways of working with maps. This however should not be used as a brake in providing useful functions that are well-established in other DP languages, and have been used en masse for decades.

@ChristianGruen ChristianGruen changed the title [XPath] Function fn:ranks(). Ranking the result of sorting in groups of items each having the same sort-key value fn:ranks: Rank the result of sorting in groups of items each having the same sort-key value Apr 27, 2023
@ChristianGruen
Copy link
Contributor

There hasn't been discussion on this for 15 months. Do we think it will result in a pull request, or should we close it?

@ChristianGruen ChristianGruen added the Propose Closing with No Action The WG should consider closing this issue with no action label Feb 13, 2024
@dnovatchev dnovatchev removed the Propose Closing with No Action The WG should consider closing this issue with no action label Feb 13, 2024
@dnovatchev
Copy link
Contributor Author

I can write a PR when the time allows it.

@dnovatchev dnovatchev changed the title fn:ranks: Rank the result of sorting in groups of items each having the same sort-key value fn:ranks: Produce all ranks in applying a function on the items of a sequence Feb 17, 2024
@dnovatchev dnovatchev added PR Pending A PR has been raised to resolve this issue Propose for V4.0 The WG should consider this item critical to 4.0 labels Feb 18, 2024
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
Feature A change that introduces a new feature PR Pending A PR has been raised to resolve this issue Propose for V4.0 The WG should consider this item critical to 4.0 XQFO An issue related to Functions and Operators
Projects
None yet
Development

No branches or pull requests

3 participants