From 889e63ffcfc3f6e3d2c1f43b382e8a991fb89524 Mon Sep 17 00:00:00 2001
From: Andy Seaborne <andy@apache.org>
Date: Wed, 29 Jan 2025 17:32:10 +0000
Subject: [PATCH] SPARQL String. Unicode escapes exclude surrogates

---
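Note for reviewers (not part of the commit): a minimal sketch, in Python, of
how a processor might apply the codepoint escape rule changed by this patch.
The function name decode_codepoint_escapes and the use of ValueError are
assumptions made for illustration; the specification does not prescribe an
API or an error type. Text that does not form a complete \uXXXX or \UXXXXXXXX
escape is left untouched here, which is also an assumption.

```python
import re

# \u followed by 4 hex digits, or \U followed by 8 hex digits.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def decode_codepoint_escapes(text: str) -> str:
    """Replace codepoint escapes, rejecting surrogates and out-of-range values.

    Hypothetical helper: intended to run before the string is handed to a
    grammar-based parser.
    """
    def replace(match: re.Match) -> str:
        code_point = int(match.group(1) or match.group(2), 16)
        # U+D800 to U+DFFF are surrogate code points, not Unicode scalar values.
        if 0xD800 <= code_point <= 0xDFFF:
            raise ValueError(f"{match.group(0)} names a surrogate code point")
        # \U can encode values above the Unicode range.
        if code_point > 0x10FFFF:
            raise ValueError(f"{match.group(0)} is beyond U+10FFFF")
        return chr(code_point)

    return _ESCAPE.sub(replace, text)
```

With this sketch, the input a\u003Ab decodes to a:b, while \uD800 is rejected
instead of producing a lone surrogate in the resulting string.
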
 spec/index.html | 62 +++++++++++++++++++++++++++++++++++--------------
 1 file changed, 44 insertions(+), 18 deletions(-)
diff --git a/spec/index.html b/spec/index.html
index 6a1582a..f0c4765 100644
--- a/spec/index.html
+++ b/spec/index.html
@@ -10510,30 +10510,49 @@ <h4>Notes</h4>
       <h2>SPARQL Grammar</h2>
       <p>The SPARQL grammar covers both SPARQL Query and [[[SPARQL11-UPDATE]]].</p>
       <section id="queryString">
-        <h3>SPARQL Request String</h3>
+        <h3>SPARQL String</h3>
         <p>
-          A <dfn data-lt="SPARQLRequestString">SPARQL Request String</dfn> is
-          a <a>SPARQL Query String</a> or <a>SPARQL Update String</a> and is a Unicode character string
-          (c.f. section 6.1 String concepts of [[CHARMOD]]) in the language defined by the following
-          grammar.</p>
+          <span id="defn_SPARQLRequestString"></span>
+          A <dfn>SPARQL string</dfn> is an
+          <a data-cite="RDF12-CONCEPTS#dfn-rdf-string">RDF string</a> that
+          conforms to the grammar given in this section.
+        </p>
+        <p class="note">
+          An <a data-cite="RDF12-CONCEPTS#dfn-rdf-string">RDF string</a> is
+          a sequence of 
+          <a data-cite="I18N-GLOSSARY#dfn-code-point" class="lint-ignore">Unicode code points</a>
+          which are <a data-cite="I18N-GLOSSARY#dfn-scalar-value" class="lint-ignore">Unicode scalar values</a>.
+          Unicode scalar values do not include the
+          <a data-cite="I18N-GLOSSARY#dfn-surrogate" class="lint-ignore">surrogate code points</a>.
+        </p>
         <p>
-          A <dfn data-lt="SPARQLQueryString">SPARQL Query String</dfn> starts
-          at the <a href="#rQueryUnit">QueryUnit</a> production.</p>
+          <span id="defn_SPARQLQueryString"></span>
+          A <dfn>SPARQL query string</dfn> is a
+          <a>SPARQL string</a> that conforms to the grammar starting at
+          the <a href="#rQueryUnit">QueryUnit</a> production.
+        </p>
         <p>
-          A <dfn data-lt="SPARQLUpdateString">SPARQL Update String</dfn> starts
-          at the <a href="#rUpdateUnit">UpdateUnit</a> production.</p>
-        <p>For compatibility with future versions of Unicode, the characters in this string may
+          <span id="defn_SPARQLUpdateString"></span>
+          A <dfn>SPARQL update string</dfn> is a 
+          <a>SPARQL string</a> that conforms to the grammar starting at
+          the <a href="#rUpdateUnit">UpdateUnit</a> production.
+        </p>
+        <p>
+          For compatibility with future versions of Unicode, the characters in this string may
           include Unicode codepoints that are unassigned as of the date of this publication (see
           [[[UAX31]]] [[UAX31]] section 4 Pattern Syntax). For productions with excluded character
           classes (for example <code>[^&lt;&gt;'{}|^`]</code>), the characters are excluded from the
-          range <code>#x0 - #x10FFFF</code>.</p>
+          range <code>#x0 - #x10FFFF</code>.
+        </p>
       </section>
 
       <section id="codepointEscape">
         <h3>Codepoint Escape Sequences</h3>
-        <p>A SPARQL Query String is processed for codepoint escape sequences before parsing by the
+        <p>
+          A <a>SPARQL string</a> is processed for codepoint escape sequences before parsing by the
           grammar defined in EBNF below. The codepoint escape sequences for a SPARQL query string
-          are:</p>
+          are:
+        </p>
         <span class="doc-ref" id="table68"></span>
         <table title="Codepoint escapes">
           <colgroup>
@@ -10551,7 +10570,9 @@ <h3>Codepoint Escape Sequences</h3>
                 <a href="#HEX">HEX</a> <a href="#HEX">HEX</a>
               </td>
               <td>A Unicode code point in the range U+0 to U+FFFF inclusive corresponding to the
-                encoded hexadecimal value.</td>
+                encoded hexadecimal value, excluding U+D800 to U+DFFF, the 
+                <a data-cite="I18N-GLOSSARY#dfn-surrogate">surrogate code points</a>.
+              </td>
             </tr>
             <tr>
               <td>
@@ -10559,7 +10580,9 @@ <h3>Codepoint Escape Sequences</h3>
                 <a href="#HEX">HEX</a> <a href="#HEX">HEX</a> <a href="#HEX">HEX</a> <a href="#HEX">HEX</a> <a href="#HEX">HEX</a> <a href="#HEX">HEX</a>
               </td>
               <td>A Unicode code point in the range U+0 to U+10FFFF inclusive corresponding to the
-                encoded hexadecimal value.</td>
+                encoded hexadecimal value, excluding U+D800 to U+DFFF, the 
+                <a data-cite="I18N-GLOSSARY#dfn-surrogate">surrogate code points</a>.
+              </td>
             </tr>
           </tbody>
         </table>
@@ -10572,13 +10595,16 @@ <h3>Codepoint Escape Sequences</h3>
           &lt;ab\u00E9xy&gt;        # Codepoint 00E9 is Latin small e with acute - é
           \u03B1:a            # Codepoint x03B1 is Greek small alpha - α
           a\u003Ab            # a:b -- codepoint x3A is colon</pre>
-        <p>Codepoint escape sequences can appear anywhere in the query string. They are processed
+        <p>
+          Codepoint escape sequences can appear anywhere in the query string. They are processed
           before parsing based on the grammar rules and so may be replaced by codepoints with
-          significance in the grammar, such as "<code>:</code>" marking a prefixed name.</p>
+          significance in the grammar, such as "<code>:</code>" marking a prefixed name.
+        </p>
         <p>These escape sequences are not included in the grammar below. Only escape sequences for
           characters that would be legal at that point in the grammar may be given. For example, the
           variable "<code>?x\u0020y</code>" is not legal (<code>\u0020</code> is a space and is not
-          permitted in a variable name).</p>
+          permitted in a variable name).
+        </p>
       </section>
       <section id="whitespace">
         <h3>White Space</h3>
