Filtering by type in lookup expressions #1456

Open
michaelhkay opened this issue Sep 17, 2024 · 10 comments · May be fixed by #1778
Labels
Feature A change that introduces a new feature PR Pending A PR has been raised to resolve this issue XPath An issue related to XPath

Comments

@michaelhkay
Contributor

We have dropped the syntax ??type(T) for filtering the results of lookup expressions, because of problems with syntax ambiguity. This issue seeks an alternative.

Although selection by type also makes sense with shallow lookup, it is most relevant with deep lookup. The main need arises with intermediate steps of a path such as ?? X ?? Y which gives a dynamic error if X selects something that is not a map or array. This is consistent at one level with // X // Y, except that // X can never select something that isn't a node.
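To make the failure mode concrete, here is a rough Python analogue, not XPath itself. It assumes maps are modelled as `dict` and arrays as `list`, and that deep lookup by key simply collects every entry with that key at any depth; the helper name `deep_lookup` and the sample data are hypothetical, and integer-keyed array lookup is deliberately ignored.

```python
def deep_lookup(value, key):
    """Collect every entry named `key` at any depth (a rough analogue of `?? key`)."""
    if not isinstance(value, (dict, list)):
        # Models the dynamic error raised when the operand is not a map or array.
        raise TypeError(f"cannot look up {key!r} in {type(value).__name__}")
    found = []
    if isinstance(value, dict) and key in value:
        found.append(value[key])
    children = value.values() if isinstance(value, dict) else value
    for child in children:
        if isinstance(child, (dict, list)):
            found.extend(deep_lookup(child, key))
    return found


# Hypothetical data: one X is a map, the other X is a plain string.
data = {"X": {"Y": 1}, "other": {"X": "plain string"}}

selected = deep_lookup(data, "X")

# Analogue of `?? X ?? Y` with a type filter on the intermediate step:
# only maps and arrays are passed on to the second lookup.
filtered = [
    y
    for item in selected
    if isinstance(item, (dict, list))
    for y in deep_lookup(item, "Y")
]
```

Without the `isinstance` guard, the second lookup would be applied to `"plain string"` and raise, which is the situation the intermediate type filter is meant to avoid.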

The main problem with filtering using a [. instance of record(p, q)] predicate is that it's very long-winded. For example, if we want to select only those members of a selected array that are sequences of a particular record type, without flattening everything else, we have to write something like ?? values::* ?[. instance of record(p, q)+] ? *, which is a bit of a nightmare.

Starting from the end goal, I would like to be able to write something close to ??record(first, last) to select all the items of this record type at any depth. We know that syntax doesn't work, because ??NCName is already taken. That's also true for ??items::record(first, last), unless we change the rules for what can appear after ::.
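As an illustration of the intended result only, the following Python sketch selects every item of a given record shape at any depth. It assumes maps are `dict`, arrays are `list`, and that `record(first, last)` is read as "a map with exactly these keys"; that reading, and the names `descendants`, `is_record`, and `deep_type_filter`, are assumptions made for this example rather than anything defined in the proposal.

```python
def descendants(value):
    """Yield every value nested at any depth inside dicts and lists (roughly `??*`)."""
    if isinstance(value, dict):
        children = value.values()
    elif isinstance(value, list):
        children = value
    else:
        children = ()
    for child in children:
        yield child
        yield from descendants(child)


def is_record(item, fields):
    """True if `item` is a dict whose keys are exactly `fields` (assumed reading)."""
    return isinstance(item, dict) and set(item) == set(fields)


def deep_type_filter(value, fields):
    """Rough analogue of a deep lookup that keeps only items of the record type."""
    return [item for item in descendants(value) if is_record(item, fields)]


# Hypothetical sample data.
people = {
    "author": {"first": "Ada", "last": "Lovelace"},
    "reviewers": [{"first": "Alan", "last": "Turing"}, "anonymous"],
    "count": 2,
}

matches = deep_type_filter(people, ("first", "last"))
```

Non-matching items such as the string `"anonymous"` and the integer `2` are simply skipped rather than causing an error.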

Also, there's another syntax hazard: what we want here is a SequenceType, not an ItemType, and that means that it can contain a trailing ? occurrence indicator, which is easily confused with the next lookup operator in a path.

Looking at it from all angles, I do feel the best solution is to prefix the record(first, last) with a marker character so that we know we've got a type filter here. Characters that might do the job include @, #, $, %, ^, ~. Of these, my preference remains ~, for three reasons:

(a) it's currently unused: overloading a different symbol is more likely to cause visual confusion
(b) one of the traditional uses of ~ is to indicate a "matches" or "is kind of like" relationship.
(c) there's a mnemonic association between "tilde" and "type" (compare "at" and "attribute")

@johnlumley
Contributor

For named types would we use a construct ??~type(FOO) and for atomics ??~xs:integer or even ??~integer given xs: default?

@michaelhkay
Contributor Author

I'm working backwards from the common case of selection using a record type to the more general case (just as path expressions focus on having convenient syntax for the common cases).

But I think we could achieve something like

KeySpecifier ::= .... | "~" SequenceType

but allowing the SequenceType to be in parentheses, or perhaps requiring it to be in parentheses if there is an occurrence indicator, which would make it "~" ( ItemType | "(" SequenceType ")")
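A toy Python splitter can show why the parenthesized form removes the ambiguity around a trailing ?. This is a sketch under strong simplifying assumptions: an ItemType is taken to be a name optionally followed by one balanced parenthesized argument list, which is far narrower than the real grammar, and the function names are hypothetical.

```python
def balanced_end(text, start):
    """Return the index just past the parenthesis group that opens at text[start]."""
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return i + 1
    raise ValueError("unbalanced parentheses")


def split_type_filter(spec):
    """Split '~ItemType...' or '~(SequenceType)...' into (type text, rest of path)."""
    if not spec.startswith("~"):
        raise ValueError("expected '~'")
    body = spec[1:]
    if body.startswith("("):
        # Parenthesized form: everything inside the outer parentheses is the type,
        # so an occurrence indicator can appear without ambiguity.
        end = balanced_end(body, 0)
        return body[1:end - 1].strip(), body[end:]
    # Bare form: a name, optionally followed by one argument list; a '?' that
    # follows is treated as the next lookup operator, never as an indicator.
    i = 0
    while i < len(body) and (body[i].isalnum() or body[i] in ":_-."):
        i += 1
    if i < len(body) and body[i] == "(":
        i = balanced_end(body, i)
    return body[:i], body[i:]


bare = split_type_filter("~record(a, b)?name")
wrapped = split_type_filter("~(record(a, b)?)?name")
atomic = split_type_filter("~xs:integer")
```

In the bare form the ? belongs to the following lookup; in the parenthesized form the first ? is part of the type and the second starts the next step.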

@ChristianGruen
Contributor

Let’s assume we have XML encoded either in a document or in a “structured item” (which is what we occasionally call maps/arrays internally). Are the following two expressions comparable to some extent, and would they both return the element <a/>?

let $doc := document { <a/> }
return $doc / element()

let $struct := [ <a/> ]
return $struct ?~ element()

If we wanted to try to make the syntax accessible to non-experts, would it be fair to present / and ?~ as somewhat equivalent?

@michaelhkay
Contributor Author

It would be great if we could agree on a collective term for "maps and arrays". "Structured item" feels too generic to me. I've toyed with terms like "tabulation", "tabula", "composition", "dataset", "compendium", "aggregate".

Perhaps "combo"? It's best to have a word that stands out from the crowd if we can't find one whose meaning is self-explanatory.

With "/", the RHS is always selecting nodes, and we are primarily selecting nodes by nodekind and name, occasionally by type. So we can write a/element(*, xs:integer) but we rarely need to, because element names usually provide the handle that we need. With JSON, we don't have element names, so selecting by type becomes a much more common requirement.

The syntax a/element() works only because element is reserved as a function name. We don't have the luxury of reserving any names after "?" in the same way. Logically we could think of a/element() as an abbreviation for a/~element(), where the ~ can be omitted because element is a reserved name.

@ChristianGruen ChristianGruen added XPath An issue related to XPath Feature A change that introduces a new feature labels Sep 18, 2024
@johnlumley
Contributor

Is there any restriction on using something like element as an ItemType name? (I can only see restrictions against using atomic type names.) If there are none, then a/~element would be legal (assuming a suitable declaration), but somewhat confusing!

@michaelhkay
Contributor Author

There's no restriction on using bare NCNames as atomic type names or declared item type names. It's quite legal today to do a/element(element, element).

@dnovatchev
Contributor

My first reaction is to use syntax like:

?? X ?? Y::map

or

?? X ?? Y[isMap(.)]

or

? X ?? maps(Y)

or

?? X ?? Y[hasKeys(.)]

Or why not:

?? X ?? map::Y

I am against introducing new, unreadable symbols into the already quite messy symbol set we are using at present.

Readability must have much higher priority in our design than introducing new, fancy (cryptic) symbols.

@dnovatchev
Contributor

And of course, if the proposal for Total Maps is accepted, then any constant non-map value can be represented as (or coerced to) a map:

map {
'\' : ()
}      (: produces the empty sequence  for any lookup:)

@michaelhkay
Contributor Author

michaelhkay commented Feb 7, 2025

We definitely need a convenient way to do the equivalent of $map??record(long, lat).

We can't use that syntax because ??record(X) is legal and means something else.

I've toyed with all sorts of alternatives and listened to suggestions, and I can't come up with anything better than ~type. So I'm going to raise a PR to that effect.

I'm also inclined to reinforce the association of ~ with a type test by allowing A ~ T as a synonym of instance of, with the unary form ~ T meaning . ~ T, and by using ~ in place of type in XSLT type patterns. This would allow, for example:

<xsl:switch select="$x">
   <xsl:when test="~xs:integer" select="..."/>
   <xsl:when test="~xs:float" select="..."/>
   <xsl:when test="~xs:double" select="..."/>
</xsl:switch>

and
<xsl:template match="~record(a, b)">...</xsl:template>

but perhaps that should be a separate proposal.

@dnovatchev
Contributor

"I can't come up with anything better than ~type."

My preference is very much more for:
??<-<map>->,
??<-<integer>->
Or
??$<map>,
??$<integer>
Or anything that is easy to read and well-protected from a single key-mistyping that results in another correct lexical representation.

@michaelhkay michaelhkay linked a pull request Feb 7, 2025 that will close this issue
@michaelhkay michaelhkay added the PR Pending A PR has been raised to resolve this issue label Feb 7, 2025
4 participants