From 889e63ffcfc3f6e3d2c1f43b382e8a991fb89524 Mon Sep 17 00:00:00 2001
From: Andy Seaborne
Date: Wed, 29 Jan 2025 17:32:10 +0000
Subject: [PATCH] SPARQL String. Unicode escapes exclude surrogates
---
 spec/index.html | 62 +++++++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 44 insertions(+), 18 deletions(-)

diff --git a/spec/index.html b/spec/index.html
index 6a1582a..f0c4765 100644
--- a/spec/index.html
+++ b/spec/index.html
@@ -10510,30 +10510,49 @@

Notes

SPARQL Grammar

The SPARQL grammar covers both SPARQL Query and [[[SPARQL11-UPDATE]]].

-

SPARQL Request String

+

SPARQL String

- A SPARQL Request String is - a SPARQL Query String or SPARQL Update String and is a Unicode character string - (c.f. section 6.1 String concepts of [[CHARMOD]]) in the language defined by the following - grammar.

+ + A SPARQL string is an + RDF string that + conforms to the grammar given in this section. +

+

+ An RDF string is + a sequence of + Unicode code points + which are Unicode scalar values. + Unicode scalar values do not include the + surrogate code points. +
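As a rough illustration of the RDF string definition above, the following Python sketch checks whether a string consists only of Unicode scalar values. It relies on the fact that a Python `str` can hold any code point from U+0000 to U+10FFFF, including lone surrogates. `is_rdf_string` is a hypothetical helper name and is not part of any SPARQL or RDF API.

```python
def is_rdf_string(s: str) -> bool:
    r"""Return True if every code point in s is a Unicode scalar value.

    Unicode scalar values are U+0000..U+10FFFF minus the surrogate code
    points U+D800..U+DFFF. A Python str never holds a code point above
    U+10FFFF, but it can hold a lone surrogate such as "\ud800", so only
    the surrogate range needs checking.
    """
    return not any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)


print(is_rdf_string("SELECT * { ?s ?p ?o }"))  # True
print(is_rdf_string("ab\u00e9xy"))             # True
print(is_rdf_string("x\ud800y"))               # False: lone surrogate
```

A surrogate pair written as two separate code points (for example `"\ud83d\ude00"` in Python) is also rejected. The scalar value it would encode in UTF-16 is U+1F600, and that value must appear as one code point.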

- A SPARQL Query String starts - at the QueryUnit production.

+ + A SPARQL query string is a + SPARQL string that conforms to the grammar starting at + the QueryUnit production. +

- A SPARQL Update String starts - at the UpdateUnit production.

-

For compatibility with future versions of Unicode, the characters in this string may + + A SPARQL update string is a + SPARQL string that conforms to the grammar starting at + the UpdateUnit production. +

+

+ For compatibility with future versions of Unicode, the characters in this string may include Unicode codepoints that are unassigned as of the date of this publication (see [[[UAX31]]] [[UAX31]] section 4 Pattern Syntax). For productions with excluded character classes (for example [^<>'{}|^`]), the characters are excluded from the - range #x0 - #x10FFFF.

+ range #x0 - #x10FFFF. +
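The following is a minimal sketch of how a negated character class such as [^<>'{}|^`] can be read under the rule above: any code point in #x0 - #x10FFFF matches unless it is listed. The helper `in_negated_class` is hypothetical and tests one character at a time. It is not taken from any SPARQL implementation.

```python
MAX_CODE_POINT = 0x10FFFF


def in_negated_class(ch: str, excluded: str) -> bool:
    """Return True if the single character ch matches [^excluded].

    The complement is taken over #x0 - #x10FFFF, so any code point in that
    range matches unless it appears in `excluded`, whether or not it is
    assigned in the Unicode version that Python ships with.
    """
    if len(ch) != 1:
        raise ValueError("expected exactly one code point")
    # For a Python str this range test always holds; it mirrors the text.
    return 0 <= ord(ch) <= MAX_CODE_POINT and ch not in excluded


# The example class from the text: [^<>'{}|^`]
EXCLUDED = "<>'{}|^`"

print(in_negated_class("a", EXCLUDED))           # True
print(in_negated_class("`", EXCLUDED))           # False
print(in_negated_class("\U0001F600", EXCLUDED))  # True: outside the BMP
```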

Codepoint Escape Sequences

-

A SPARQL Query String is processed for codepoint escape sequences before parsing by the +

+ A SPARQL string is processed for codepoint escape sequences before parsing by the grammar defined in EBNF below. The codepoint escape sequences for a SPARQL string - are:

+ are: +

@@ -10551,7 +10570,9 @@

Codepoint Escape Sequences

\u HEX HEX HEX HEX
A Unicode code point in the range U+0 to U+FFFF inclusive corresponding to the - encoded hexadecimal value. + encoded hexadecimal value, excluding U+D800 to U+DFFF, the + surrogate code points. +
@@ -10559,7 +10580,9 @@

Codepoint Escape Sequences

\U HEX HEX HEX HEX HEX HEX HEX HEX
A Unicode code point in the range U+0 to U+10FFFF inclusive corresponding to the - encoded hexadecimal value. + encoded hexadecimal value, excluding U+D800 to U+DFFF, the + surrogate code points. +
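The two escape forms, including the surrogate exclusion, can be sketched in Python as a pass that runs before parsing. This is an illustration under stated assumptions and not a conforming implementation. `decode_codepoint_escapes` is a hypothetical name. Malformed sequences such as `\u12` are left untouched so that the grammar can reject them, and the sketch gives a preceding backslash no special meaning.

```python
import re

# \u followed by exactly four hex digits, or \U followed by exactly eight.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def decode_codepoint_escapes(text: str) -> str:
    r"""Replace \uHHHH and \UHHHHHHHH escapes with the code points they encode.

    Raises ValueError if an escape encodes a surrogate code point
    (U+D800 to U+DFFF) or a value above U+10FFFF. Anything that does not
    match either escape form is copied through unchanged.
    """

    def replace(m):
        hex_digits = m.group(1) or m.group(2)
        cp = int(hex_digits, 16)
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"escape {m.group(0)} encodes surrogate U+{cp:04X}")
        if cp > 0x10FFFF:
            raise ValueError(f"escape {m.group(0)} is above U+10FFFF")
        return chr(cp)

    return _ESCAPE.sub(replace, text)


print(decode_codepoint_escapes(r"a\u003Ab"))                         # a:b
print(decode_codepoint_escapes(r"<ab\u00E9xy>") == "<ab\u00e9xy>")  # True

try:
    decode_codepoint_escapes(r"\uD800")
except ValueError as err:
    print(err)  # escape \uD800 encodes surrogate U+D800
```

Decoding happens before parsing, so `a\u003Ab` becomes `a:b`, which the grammar then reads as a prefixed name.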
@@ -10572,13 +10595,16 @@

Codepoint Escape Sequences

<ab\u00E9xy>    # Codepoint 00E9 is Latin small e with acute - é
\u03B1:a        # Codepoint x03B1 is Greek small alpha - α
a\u003Ab        # a:b -- codepoint x3A is colon
-

Codepoint escape sequences can appear anywhere in the query string. They are processed +

+ Codepoint escape sequences can appear anywhere in a SPARQL string. They are processed before parsing based on the grammar rules and so may be replaced by codepoints with - significance in the grammar, such as ":" marking a prefixed name.

+ significance in the grammar, such as ":" marking a prefixed name. +

These escape sequences are not included in the grammar below. Only escape sequences for characters that would be legal at that point in the grammar may be given. For example, the variable "?x\u0020y" is not legal (\u0020 is a space and is not - permitted in a variable name).

+ permitted in a variable name). +
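The rule that an escape must decode to a character that is legal at that point can be illustrated by decoding first and then checking the result. The sketch below uses `_SIMPLE_VAR`, a hypothetical and deliberately simplified ASCII-only stand-in for the variable-name productions. The real productions allow many more characters. The surrogate check from the escape rules is omitted here for brevity.

```python
import re

_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")

# Hypothetical, ASCII-only stand-in for the variable productions: a '?' or
# '$' followed by letters, digits, or '_'. The real grammar allows more.
_SIMPLE_VAR = re.compile(r"[?$][A-Za-z0-9_]+")


def unescape(text: str) -> str:
    """Decode \\uHHHH and \\UHHHHHHHH escapes (surrogate check omitted)."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), text)


def is_simple_variable(token: str) -> bool:
    """Decode escapes first, then match the result against the grammar."""
    return _SIMPLE_VAR.fullmatch(unescape(token)) is not None


print(is_simple_variable(r"?x\u0079"))   # True: decodes to "?xy"
print(is_simple_variable(r"?x\u0020y"))  # False: decodes to "?x y"
```

The parser never sees the escape in `?x\u0079`. It sees only the decoded `y`, which is why that variable is accepted while `?x\u0020y` is not.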

White Space