From b01d508c3ee4548710f660415f5497450289e976 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kat=20March=C3=A1n?= <kzm@zkat.tech>
Date: Sun, 19 Jan 2025 14:30:15 -0800
Subject: [PATCH] =?UTF-8?q?Create=20spec-text=20for=20=E2=80=9Cclassic?=
 =?UTF-8?q?=E2=80=9D=20RFC=20experience?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 src/spec-text | 1512 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 1512 insertions(+)
 create mode 100644 src/spec-text

diff --git a/src/spec-text b/src/spec-text
new file mode 100644
index 0000000..3709f66
--- /dev/null
+++ b/src/spec-text
@@ -0,0 +1,1512 @@
+
+
+
+
+KDL Community                                                 K. Marchán
+                                                               Microsoft
+                                                        KDL Contributors
+                                                         19 January 2025
+
+
+                       The KDL Document Language
+                       draft-marchan-kdl2-latest
+
+Abstract
+
+   KDL is a node-oriented document language.  Its niche and purpose
+   overlaps with XML, and as do many of its semantics.  You can use KDL
+   both as a configuration language, and a data exchange or storage
+   format, if you so choose.
+
+   This is the formal specification for KDL, including the intended data
+   model and the grammar.
+
+   This document describes KDL version KDL 2.0.0.  It was released on
+   2024-12-21.  It is the latest stable version of the language, and
+   will only be edited for minor copyedits or major errata.
+
+About This Document
+
+   This note is to be removed before publishing as an RFC.
+
+   Status information for this document may be found at
+   https://datatracker.ietf.org/doc/draft-marchan-kdl2/.
+
+   information can be found at https://kdl.dev/.
+
+   Source for this draft and an issue tracker can be found at
+   https://github.com/kdl-org/kdl.
+
+License
+
+   This work is licensed under Creative Commons Attribution-ShareAlike
+   4.0 International.  To view a copy of this license, visit
+   https://creativecommons.org/licenses/by-sa/4.0/
+
+Table of Contents
+
+   1.  Compatibility . . . . . . . . . . . . . . . . . . . . . . . .   3
+   2.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
+   3.  Components  . . . . . . . . . . . . . . . . . . . . . . . . .   3
+     3.1.  Document  . . . . . . . . . . . . . . . . . . . . . . . .   3
+
+
+
+
+Marchán & KDL Contributors    Experimental                      [Page 1]
+
+                                   KDL                      January 2025
+
+
+       3.1.1.  Example . . . . . . . . . . . . . . . . . . . . . . .   4
+     3.2.  Node  . . . . . . . . . . . . . . . . . . . . . . . . . .   4
+       3.2.1.  Example . . . . . . . . . . . . . . . . . . . . . . .   5
+     3.3.  Line Continuation . . . . . . . . . . . . . . . . . . . .   5
+       3.3.1.  Example . . . . . . . . . . . . . . . . . . . . . . .   5
+     3.4.  Property  . . . . . . . . . . . . . . . . . . . . . . . .   5
+     3.5.  Argument  . . . . . . . . . . . . . . . . . . . . . . . .   6
+       3.5.1.  Example . . . . . . . . . . . . . . . . . . . . . . .   6
+     3.6.  Children Block  . . . . . . . . . . . . . . . . . . . . .   6
+       3.6.1.  Example . . . . . . . . . . . . . . . . . . . . . . .   6
+     3.7.  Value . . . . . . . . . . . . . . . . . . . . . . . . . .   6
+     3.8.  Type Annotation . . . . . . . . . . . . . . . . . . . . .   7
+       3.8.1.  Reserved Type Annotations for Numbers Without
+               Decimals: . . . . . . . . . . . . . . . . . . . . . .   7
+       3.8.2.  Reserved Type Annotations for Numbers With
+               Decimals: . . . . . . . . . . . . . . . . . . . . . .   8
+       3.8.3.  Reserved Type Annotations for Strings:  . . . . . . .   8
+       3.8.4.  Examples  . . . . . . . . . . . . . . . . . . . . . .   9
+     3.9.  String  . . . . . . . . . . . . . . . . . . . . . . . . .   9
+     3.10. Identifier String . . . . . . . . . . . . . . . . . . . .  10
+       3.10.1.  Non-initial characters . . . . . . . . . . . . . . .  10
+       3.10.2.  Non-identifier characters  . . . . . . . . . . . . .  11
+     3.11. Quoted String . . . . . . . . . . . . . . . . . . . . . .  11
+       3.11.1.  Escapes  . . . . . . . . . . . . . . . . . . . . . .  11
+     3.12. Multi-line String . . . . . . . . . . . . . . . . . . . .  13
+       3.12.1.  Newline Normalization  . . . . . . . . . . . . . . .  14
+       3.12.2.  Examples . . . . . . . . . . . . . . . . . . . . . .  14
+       3.12.3.  Interaction with Whitespace Escapes  . . . . . . . .  16
+     3.13. Raw String  . . . . . . . . . . . . . . . . . . . . . . .  17
+       3.13.1.  Example  . . . . . . . . . . . . . . . . . . . . . .  17
+     3.14. Number  . . . . . . . . . . . . . . . . . . . . . . . . .  18
+       3.14.1.  Keyword Numbers  . . . . . . . . . . . . . . . . . .  19
+     3.15. Boolean . . . . . . . . . . . . . . . . . . . . . . . . .  19
+       3.15.1.  Example  . . . . . . . . . . . . . . . . . . . . . .  19
+     3.16. Null  . . . . . . . . . . . . . . . . . . . . . . . . . .  20
+       3.16.1.  Example  . . . . . . . . . . . . . . . . . . . . . .  20
+     3.17. Whitespace  . . . . . . . . . . . . . . . . . . . . . . .  20
+       3.17.1.  Single-line comments . . . . . . . . . . . . . . . .  21
+       3.17.2.  Multi-line comments  . . . . . . . . . . . . . . . .  22
+       3.17.3.  Slashdash comments . . . . . . . . . . . . . . . . .  22
+     3.18. Newline . . . . . . . . . . . . . . . . . . . . . . . . .  22
+     3.19. Disallowed Literal Code Points  . . . . . . . . . . . . .  23
+   4.  Full Grammar  . . . . . . . . . . . . . . . . . . . . . . . .  24
+     4.1.  Grammar language  . . . . . . . . . . . . . . . . . . . .  26
+   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  27
+
+
+
+
+
+
+Marchán & KDL Contributors    Experimental                      [Page 2]
+
+                                   KDL                      January 2025
+
+
+1.  Compatibility
+
+   KDL 2.0 is designed such that for any given KDL document written as
+   KDL 1.0 (./SPEC_v1.md) or KDL 2.0, the parse will either fail
+   completely, or, if the parse succeeds, the data represented by a v1
+   or v2 parser will be identical.  This means that it's safe to use a
+   fallback parsing strategy in order to support both v1 and v2
+   simultaneously.  For example, node "foo" is a valid node in both
+   versions, and should be represented identically by parsers.
+
+   A version marker /- kdl-version 2 (or 1) _MAY_ be added to the
+   beginning of a KDL document, optionally preceded by the BOM, and
+   parsers _MAY_ use that as a hint as to which version to parse the
+   document as.
+
+2.  Introduction
+
+   KDL is a node-oriented document language.  Its niche and purpose
+   overlaps with XML, and as do many of its semantics.  You can use KDL
+   both as a configuration language, and a data exchange or storage
+   format, if you so choose.
+
+   The bulk of this document is dedicated to a long-form description of
+   all Components (Section 3) of a KDL document.  There is also a much
+   more terse Grammar (Section 4) at the end of the document that covers
+   most of the rules, with some semantic exceptions involving the data
+   model.
+
+   KDL is designed to be easy to read _and_ easy to implement.
+
+   In this document, references to "left" or "right" refer to directions
+   in the _data stream_ towards the beginning or end, respectively; in
+   other words, the directions if the data stream were only ASCII text.
+   They do not refer to the writing direction of text, which can flow in
+   either direction, depending on the characters used.
+
+3.  Components
+
+3.1.  Document
+
+   The toplevel concept of KDL is a Document.  A Document is composed of
+   zero or more Nodes (Section 3.2), separated by newlines and
+   whitespace, and eventually terminated by an EOF.
+
+   All KDL documents should be UTF-8 encoded and conform to the
+   specifications in this document.
+
+
+
+
+
+Marchán & KDL Contributors    Experimental                      [Page 3]
+
+                                   KDL                      January 2025
+
+
+3.1.1.  Example
+
+   The following is a document composed of two toplevel nodes:
+
+   foo {
+       bar
+   }
+   baz
+
+3.2.  Node
+
+   Being a node-oriented language means that the real core component of
+   any KDL document is the "node".  Every node must have a name, which
+   must be a String (Section 3.9).
+
+   The name may be preceded by a Type Annotation (Section 3.8) to
+   further clarify its type, particularly in relation to its parent
+   node.  (For example, clarifying that a particular date child node is
+   for the _publication_ date, rather than the last-modified date, with
+   (published)date.)
+
+   Following the name are zero or more Arguments (Section 3.5) or
+   Properties (Section 3.4), separated by either whitespace
+   (Section 3.17) or a slash-escaped line continuation (Section 3.3).
+   Arguments and Properties may be interspersed in any order, much like
+   is common with positional arguments vs options in command line tools.
+   Collectively, Arguments and Properties may be referred to as
+   "Entries".
+
+   Children (Section 3.6) can be placed after the name and the optional
+   Entries, possibly separated by either whitespace or a slash-escaped
+   line continuation.
+
+   Arguments are ordered relative to each other and that order must be
+   preserved in order to maintain the semantics.  Properties between
+   Arguments do not affect Argument ordering.
+
+   By contrast, Properties _SHOULD NOT_ be assumed to be presented in a
+   given order.  Children (Section 3.6) should be used if an order-
+   sensitive key/value data structure must be represented in KDL.  Cf.
+   JSON objects preserving key order.
+
+   Nodes _MAY_ be prefixed with Slashdash (Section 3.17.3) to "comment
+   out" the entire node, including its properties, arguments, and
+   children, and make it act as plain whitespace, even if it spreads
+   across multiple lines.
+
+
+
+
+
+Marchán & KDL Contributors    Experimental                      [Page 4]
+
+                                   KDL                      January 2025
+
+
+   Finally, a node is terminated by either a Newline (Section 3.18), a
+   semicolon (;), the end of a child block (}) or the end of the file/
+   stream (an EOF).
+
+3.2.1.  Example
+
+   // `foo` will have an Argument value list like `[1, 3]`.
+   foo 1 key=val 3 {
+       bar
+       (role)baz 1 2
+   }
+
+3.3.  Line Continuation
+
+   Line continuations allow Nodes (Section 3.2) to be spread across
+   multiple lines.
+
+   A line continuation is a \ character followed by zero or more
+   whitespace items (including multiline comments) and an optional
+   single-line comment.  It must be terminated by a Newline
+   (Section 3.18) (including the Newline that is part of single-line
+   comments).
+
+   Following a line continuation, processing of a Node can continue as
+   usual.
+
+3.3.1.  Example
+
+   my-node 1 2 \  // comments are ok after \
+           3 4    // This is the actual end of the Node.
+
+3.4.  Property
+
+   A Property is a key/value pair attached to a Node (Section 3.2).  A
+   Property is composed of a String (Section 3.9), followed immediately
+   by an equals sign (=, U+003D), and then a Value (Section 3.7).
+
+   Properties should be interpreted left-to-right, with rightmost
+   properties with identical names overriding earlier properties.  That
+   is:
+
+   node a=1 a=2
+
+   In this example, the node's a value must be 2, not 1.
+
+   No other guarantees about order should be expected by implementers.
+   Deserialized representations may iterate over properties in any order
+   and still be spec-compliant.
+
+
+
+Marchán & KDL Contributors    Experimental                      [Page 5]
+
+                                   KDL                      January 2025
+
+
+   Properties _MAY_ be prefixed with /- to "comment out" the entire
+   token and make it act as plain whitespace, even if it spreads across
+   multiple lines.
+
+3.5.  Argument
+
+   An Argument is a bare Value (Section 3.7) attached to a Node
+   (Section 3.2), with no associated key.  It shares the same space as
+   Properties (Section 3.4), and may be interleaved with them.
+
+   A Node may have any number of Arguments, which should be evaluated
+   left to right.  KDL implementations _MUST_ preserve the order of
+   Arguments relative to each other (not counting Properties).
+
+   Arguments _MAY_ be prefixed with /- to "comment out" the entire token
+   and make it act as plain whitespace, even if it spreads across
+   multiple lines.
+
+3.5.1.  Example
+
+   my-node 1 2 3 a b c
+
+3.6.  Children Block
+
+   A children block is a block of Nodes (Section 3.2), surrounded by {
+   and }. They are an optional part of nodes, and create a hierarchy of
+   KDL nodes.
+
+   Regular node termination rules apply, which means multiple nodes can
+   be included in a single-line children block, as long as they're all
+   terminated by ;.
+
+3.6.1.  Example
+
+   parent {
+       child1
+       child2
+   }
+
+   parent { child1; child2; }
+
+3.7.  Value
+
+   A value is either: a String (Section 3.9), a Number (Section 3.14), a
+   Boolean (Section 3.15), or Null (Section 3.16).
+
+
+
+
+
+
+Marchán & KDL Contributors    Experimental                      [Page 6]
+
+                                   KDL                      January 2025
+
+
+   Values _MUST_ be either Arguments (Section 3.5) or values of
+   Properties (Section 3.4).  Only String (Section 3.9) values may be
+   used as Node (Section 3.2) names or Property (Section 3.4) keys.
+
+   Values (both as arguments and in properties) _MAY_ be prefixed by a
+   single Type Annotation (Section 3.8).
+
+3.8.  Type Annotation
+
+   A type annotation is a prefix to any Node Name (Section 3.2) or Value
+   (Section 3.7) that includes a _suggestion_ of what type the value is
+   _intended_ to be treated as, or as a _context-specific elaboration_
+   of the more generic type the node name indicates.
+
+   Type annotations are written as a set of ( and ) with a single String
+   (Section 3.9) in it.  It may contain Whitespace after the ( and
+   before the ), and may be separated from its target by Whitespace.
+
+   KDL does not specify any restrictions on what implementations might
+   do with these annotations.  They are free to ignore them, or use them
+   to make decisions about how to interpret a value.
+
+   Additionally, the following type annotations MAY be recognized by KDL
+   parsers and, if used, SHOULD interpret these types as follows:
+
+3.8.1.  Reserved Type Annotations for Numbers Without Decimals:
+
+   Signed integers of various sizes (the number is the bit size):
+
+   *  i8
+
+   *  i16
+
+   *  i32
+
+   *  i64
+
+   *  i128
+
+   Unsigned integers of various sizes (the number is the bit size):
+
+   *  u8
+
+   *  u16
+
+   *  u32
+
+   *  u64
+
+
+
+Marchán & KDL Contributors    Experimental                      [Page 7]
+
+                                   KDL                      January 2025
+
+
+   *  u128
+
+   Platform-dependent integer types, both signed and unsigned:
+
+   *  isize
+
+   *  usize
+
+3.8.2.  Reserved Type Annotations for Numbers With Decimals:
+
+   IEEE 754 floating point numbers, both single (32) and double (64)
+   precision:
+
+   *  f32
+
+   *  f64
+
+   IEEE 754-2008 decimal floating point numbers
+
+   *  decimal64
+
+   *  decimal128
+
+3.8.3.  Reserved Type Annotations for Strings:
+
+   *  date-time: ISO8601 date/time format.
+
+   *  time: "Time" section of ISO8601.
+
+   *  date: "Date" section of ISO8601.
+
+   *  duration: ISO8601 duration format.
+
+   *  decimal: IEEE 754-2008 decimal string format.
+
+   *  currency: ISO 4217 currency code.
+
+   *  country-2: ISO 3166-1 alpha-2 country code.
+
+   *  country-3: ISO 3166-1 alpha-3 country code.
+
+   *  country-subdivision: ISO 3166-2 country subdivision code.
+
+   *  email: RFC5322 email address.
+
+   *  idn-email: RFC6531 internationalized email address.
+
+   *  hostname: RFC1123 internet hostname (only ASCII segments)
+
+
+
+Marchán & KDL Contributors    Experimental                      [Page 8]
+
+                                   KDL                      January 2025
+
+
+   *  idn-hostname: RFC5890 internationalized internet hostname (only xn
+      ---prefixed ASCII "punycode" segments, or non-ASCII segments)
+
+   *  ipv4: RFC2673 dotted-quad IPv4 address.
+
+   *  ipv6: RFC2373 IPv6 address.
+
+   *  url: RFC3986 URI.
+
+   *  url-reference: RFC3986 URI Reference.
+
+   *  irl: RFC3987 Internationalized Resource Identifier.
+
+   *  irl-reference: RFC3987 Internationalized Resource Identifier
+      Reference.
+
+   *  url-template: RFC6570 URI Template.
+
+   *  uuid: RFC4122 UUID.
+
+   *  regex: Regular expression.  Specific patterns may be
+      implementation-dependent.
+
+   *  base64: A Base64-encoded string, denoting arbitrary binary data.
+
+3.8.4.  Examples
+
+   node (u8)123
+   node prop=(regex).*
+   (published)date "1970-01-01"
+   (contributor)person name="Foo McBar"
+
+3.9.  String
+
+   Strings in KDL represent textual UTF-8 Values (Section 3.7).  A
+   String is either an Identifier String (Section 3.10) (like foo), a
+   Quoted String (Section 3.11) (like "foo") or a Multi-Line String
+   (Section 3.12).  Both Quoted and Multiline strings come in normal and
+   Raw String (Section 3.13) variants (like #"foo"#):
+
+   *  Identifier Strings let you write short, "single-word" strings with
+      a minimum of syntax
+
+   *  Quoted Strings let you write strings "like normal", with
+      whitespace and escapes.
+
+   *  Multi-Line Strings let you write strings across multiple lines and
+      with indentation that's not part of the string value.
+
+
+
+Marchán & KDL Contributors    Experimental                      [Page 9]
+
+                                   KDL                      January 2025
+
+
+   *  Raw Strings don't allow any escapes, allowing you to not worry
+      about the string's content containing anything that might look
+      like an escape.
+
+   Strings _MUST_ be represented as UTF-8 values.
+
+   Strings _MUST NOT_ include the code points for disallowed literal
+   code points (Section 3.19) directly.  Quoted and Multi-Line Strings
+   may include these code points as _values_ by representing them with
+   their corresponding \u{...} escape.
+
+3.10.  Identifier String
+
+   An Identifier String (sometimes referred to as just an "identifier")
+   is composed of any Unicode Scalar Value (https://unicode.org/
+   glossary/#unicode_scalar_value) other than non-initial characters
+   (Section 3.10.1), followed by any number of Unicode Scalar Values
+   other than non-identifier characters (Section 3.10.2).
+
+   A handful of patterns are disallowed, to avoid confusion with other
+   values:
+
+   *  idents that appear to start with a Number (Section 3.14) (like
+      1.0v2 or -1em) or the "almost a number" pattern of a decimal point
+      without a leading digit (like .1).
+
+   *  idents that are the language keywords (inf, -inf, nan, true,
+      false, and null) without their leading #.
+
+   Identifiers that match these patterns _MUST_ be treated as a syntax
+   error; such values can only be written as quoted or raw strings.  The
+   precise details of the identifier syntax is specified in the Full
+   Grammar in Section 4.
+
+3.10.1.  Non-initial characters
+
+   The following characters cannot be the first character in an
+   Identifier String (Section 3.10):
+
+   *  Any decimal digit (0-9)
+
+   *  Any non-identifier characters (Section 3.10.2)
+
+   Additionally, the following initial characters impose limitations on
+   subsequent characters:
+
+
+
+
+
+
+Marchán & KDL Contributors    Experimental                     [Page 10]
+
+                                   KDL                      January 2025
+
+
+   *  the + and - characters can only be used as an initial character if
+      the second character is _not_ a digit.  If the second character is
+      ., then the third character must _not_ be a digit.
+
+   *  the . character can only be used as an initial character if the
+      second character is _not_ a digit.
+
+   This allows identifiers to look like --this or .md, and removes the
+   ambiguity of having an identifier look like a number.
+
+3.10.2.  Non-identifier characters
+
+   The following characters cannot be used anywhere in a Identifier
+   String (Section 3.10):
+
+   *  Any of (){}[]/\"#;=
+
+   *  Any Whitespace (Section 3.17) or Newline (Section 3.18).
+
+   *  Any disallowed literal code points (Section 3.19) in KDL
+      documents.
+
+3.11.  Quoted String
+
+   A Quoted String is delimited by " on either side of any number of
+   literal string characters except unescaped " and \.
+
+   Literal Newline (Section 3.18) characters can only be included if
+   they are Escaped Whitespace (Section 3.11.1.1), which discards them
+   from the string value.  Actually including a newline in the value
+   requires using a newline escape sequence, like \n, or using a Multi-
+   Line String (Section 3.12) which is actually designed for strings
+   stretching across multiple lines.
+
+   Like Identifier Strings, Quoted Strings _MUST NOT_ include any of the
+   disallowed literal code-points (Section 3.19) as code points in their
+   body.
+
+   Quoted Strings have a Raw String (Section 3.13) variant, which
+   disallows escapes.
+
+3.11.1.  Escapes
+
+   In addition to literal code points, a number of "escapes" are
+   supported in Quoted Strings.  "Escapes" are the character \ followed
+   by another character, and are interpreted as described in the
+   following table:
+
+
+
+
+Marchán & KDL Contributors    Experimental                     [Page 11]
+
+                                   KDL                      January 2025
+
+
+   +==============+=========+=========================================+
+   | Name         | Escape  | Code Pt                                 |
+   +==============+=========+=========================================+
+   | Line Feed    | \n      | U+000A                                  |
+   +--------------+---------+-----------------------------------------+
+   | Carriage     | \r      | U+000D                                  |
+   | Return       |         |                                         |
+   +--------------+---------+-----------------------------------------+
+   | Character    | \t      | U+0009                                  |
+   | Tabulation   |         |                                         |
+   | (Tab)        |         |                                         |
+   +--------------+---------+-----------------------------------------+
+   | Reverse      | \\      | U+005C                                  |
+   | Solidus      |         |                                         |
+   | (Backslash)  |         |                                         |
+   +--------------+---------+-----------------------------------------+
+   | Quotation    | \"      | U+0022                                  |
+   | Mark (Double |         |                                         |
+   | Quote)       |         |                                         |
+   +--------------+---------+-----------------------------------------+
+   | Backspace    | \b      | U+0008                                  |
+   +--------------+---------+-----------------------------------------+
+   | Form Feed    | \f      | U+000C                                  |
+   +--------------+---------+-----------------------------------------+
+   | Space        | \s      | U+0020                                  |
+   +--------------+---------+-----------------------------------------+
+   | Unicode      | \u{(1-6 | Code point described by hex characters, |
+   | Escape       | hex     | as long as it represents a Unicode      |
+   |              | chars)} | Scalar Value (https://unicode.org/      |
+   |              |         | glossary/#unicode_scalar_value)         |
+   +--------------+---------+-----------------------------------------+
+   | Whitespace   | See     | N/A                                     |
+   | Escape       | below   |                                         |
+   +--------------+---------+-----------------------------------------+
+
+                                 Table 1
+
+3.11.1.1.  Escaped Whitespace
+
+   In addition to escaping individual characters, \ can also escape
+   whitespace.  When a \ is followed by one or more literal whitespace
+   characters, the \ and all of that whitespace are discarded.  For
+   example,
+
+   "Hello World"
+
+   and
+
+
+
+
+Marchán & KDL Contributors    Experimental                     [Page 12]
+
+                                   KDL                      January 2025
+
+
+   "Hello \    World"
+
+   are semantically identical.  See whitespace (Section 3.17) and
+   newlines (Section 3.18) for how whitespace is defined.
+
+   Note that only literal whitespace is escaped; whitespace escapes (\n
+   and such) are retained.  For example, these strings are all
+   semantically identical:
+
+   "Hello\       \nWorld"
+
+       "Hello\n\
+       World"
+
+   "Hello\nWorld"
+
+   """
+     Hello
+     World
+     """
+
+3.11.1.2.  Invalid escapes
+
+   Except as described in the escapes table, above, \ _MUST NOT_ precede
+   any other characters in a string.
+
+3.12.  Multi-line String
+
+   Multi-Line Strings support multiple lines with literal, non-escaped
+   Newlines.  They must use a special multi-line syntax, and they
+   automatically "dedent" the string, allowing its value to be indented
+   to a visually matching level as desired.
+
+   A Multi-Line String is opened and closed by _three_ double-quote
+   characters, like """.  Its first line _MUST_ immediately start with a
+   Newline (Section 3.18) after its opening """.  Its final line _MUST_
+   contain only whitespace before the closing """.  All in-between lines
+   that contain non-newline, non-whitespace characters _MUST_ start with
+   _at least_ the exact same whitespace as the final line (precisely
+   matching codepoints, not merely counting characters or "size"); they
+   may contain additional whitespace following this prefix.  The lines
+   in between may contain unescaped " (but no unescaped """ as this
+   would close the string).
+
+   The value of the Multi-Line String omits the first and last Newline,
+   the Whitespace of the last line, and the matching Whitespace prefix
+   on all intermediate lines.  The first and last Newline can be the
+   same character (that is, empty multi-line strings are legal).
+
+
+
+Marchán & KDL Contributors    Experimental                     [Page 13]
+
+                                   KDL                      January 2025
+
+
+   In other words, the final line specifies the whitespace prefix that
+   will be removed from all other lines.
+
+   Multi-line Strings that do not immediately start with a Newline and
+   whose final """ is not preceeded by optional whitespace and a Newline
+   are illegal.  This also means that """ may not be used for a single-
+   line String (e.g. """foo""").
+
+3.12.1.  Newline Normalization
+
+   Literal Newline sequences in Multi-line Strings must be normalized to
+   a single U+000A (LF) during deserialization.  This means, for
+   example, that CR LF becomes a single LF during parsing.
+
+   This normalization does not apply to non-literal Newlines entered
+   using escape sequences.  That is:
+
+   multi-line """
+       \r\n[CRLF]
+       foo[CRLF]
+       """
+
+   becomes:
+
+   single-line "\r\n\nfoo"
+
+   For clarity: this normalization applies to each individual Newline
+   sequence.  That is, the literal sequence CRLF CRLF becomes LF LF, not
+   LF.
+
+3.12.2.  Examples
+
+3.12.2.1.  Indented multi-line string
+
+   multi-line """
+           foo
+       This is the base indentation
+               bar
+       """
+
+   This example's string value will be:
+
+       foo
+   This is the base indentation
+           bar
+
+   which is equivalent to
+
+
+
+
+Marchán & KDL Contributors    Experimental                     [Page 14]
+
+                                   KDL                      January 2025
+
+
+   "    foo\nThis is the base indentation\n        bar"
+
+   when written as a single-line string.
+
+3.12.2.2.  Shorter last-line indent
+
+   If the last line wasn't indented as far, it won't dedent the rest of
+   the lines as much:
+
+   multi-line """
+           foo
+       This is no longer on the left edge
+               bar
+     """
+
+   This example's string value will be:
+
+         foo
+     This is no longer on the left edge
+             bar
+
+   Equivalent to
+
+   "      foo\n  This is no longer on the left edge\n          bar"
+
+3.12.2.3.  Empty lines
+
+   Empty lines can contain any whitespace, or none at all, and will be
+   reflected as empty in the value:
+
+   multi-line """
+       Indented a bit
+
+       A second indented paragraph.
+       """
+
+   This example's string value will be:
+
+   Indented a bit.
+
+   A second indented paragraph.
+
+   Equivalent to
+
+   "Indented a bit.\n\nA second indented paragraph."
+
+
+
+
+
+
+Marchán & KDL Contributors    Experimental                     [Page 15]
+
+                                   KDL                      January 2025
+
+
+3.12.2.4.  Syntax errors
+
+   The following yield *syntax errors*:
+
+   multi-line """can't be single line"""
+
+   multi-line """
+     closing quote with non-whitespace prefix"""
+
+   multi-line """stuff
+     """
+
+   // Every line must share the exact same prefix as the closing line.
+   multi-line """[\n]
+   [tab]a[\n]
+   [space][space]b[\n]
+   [space][tab][\n]
+   [tab]"""
+
+3.12.3.  Interaction with Whitespace Escapes
+
+   Multi-line strings support the same mechanism for escaping whitespace
+   as Quoted Strings.
+
+   When processing a Multi-line String, implementations MUST dedent the
+   string _after_ resolving all whitespace escapes, but _before_
+   resolving other backslash escapes.  This means a whitespace escape
+   that attempts to escape the final line's newline and/or whitespace
+   prefix can be invalid: if removing escaped whitespace places the
+   closing """ on a line with non-whitespace characters, this escape is
+   invalid.
+
+   For example, the following example is illegal:
+
+     """
+     foo
+     bar\
+     """
+
+     // equivalent to
+     """
+     foo
+     bar"""
+
+   while the following example is allowed
+
+
+
+
+
+
+Marchán & KDL Contributors    Experimental                     [Page 16]
+
+                                   KDL                      January 2025
+
+
+     """
+     foo \
+   bar
+     baz
+     \   """
+
+     // equivalent to
+     """
+     foo bar
+     baz
+     """
+
+3.13.  Raw String
+
+   Both Quoted (Section 3.11) and Multi-Line Strings (Section 3.12) have
+   Raw String variants, which are identical in syntax except they do not
+   support \-escapes.  This includes line-continuation escapes (\ + ws
+   collapsing to nothing).  They otherwise share the same properties as
+   far as literal Newline (Section 3.18) characters go, multi-line
+   rules, and the requirement of UTF-8 representation.
+
+   The Raw String variants are indicated by preceding the strings's
+   opening quotes with one or more # characters.  The string is then
+   closed by its normal closing quotes, followed by a _matching_ number
+   of # characters.  This means that the string may contain any
+   combination of " and # characters other than its closing delimiter
+   (e.g., if a raw string starts with ##", it can contain " or "#, but
+   not "## or "###).
+
+   Like other Strings, Raw Strings _MUST NOT_ include any of the
+   disallowed literal code-points (Section 3.19) as code points in their
+   body.  Unlike with Quoted Strings, these cannot simply be escaped,
+   and are thus unrepresentable when using Raw Strings.
+
+3.13.1.  Example
+
+   just-escapes #"\n will be literal"#
+
+   The string contains the literal characters \n will be literal.
+
+   quotes-and-escapes ##"hello\n\r\asd"#world"##
+
+   The string contains the literal characters hello\n\r\asd"#world
+
+
+
+
+
+
+
+
+Marchán & KDL Contributors    Experimental                     [Page 17]
+
+                                   KDL                      January 2025
+
+
+   raw-multi-line #"""
+       Here's a """
+           multiline string
+           """
+       without escapes.
+       """#
+
+   The string contains the value
+
+   Here's a """
+       multiline string
+       """
+   without escapes.
+
+   or equivalently,
+
+   "Here's a \"\"\"\n    multiline string\n    \"\"\"\nwithout escapes."
+
+   as a Quoted String.
+
+3.14.  Number
+
+   Numbers in KDL represent numerical Values (Section 3.7).  There is no
+   logical distinction in KDL between real numbers, integers, and
+   floating point numbers.  It's up to individual implementations to
+   determine how to represent KDL numbers.
+
+   There are five syntaxes for Numbers: Keywords, Decimal, Hexadecimal,
+   Octal, and Binary.
+
+   *  All non-Keyword (Section 3.14.1) numbers may optionally start with
+      one of - or +, which determine whether they'll be positive or
+      negative.
+
+   *  Binary numbers start with 0b and only allow 0 and 1 as digits,
+      which may be separated by _. They represent numbers in radix 2.
+
+   *  Octal numbers start with 0o and only allow digits between 0 and 7,
+      which may be separated by _. They represent numbers in radix 8.
+
+   *  Hexadecimal numbers start with 0x and allow digits between 0 and
+      9, as well as letters A through F, in either lower or upper case,
+      which may be separated by _. They represent numbers in radix 16.
+
+   *  Decimal numbers are a bit more special:
+
+      -  They have no radix prefix.
+
+
+
+
+Marchán & KDL Contributors    Experimental                     [Page 18]
+
+                                   KDL                      January 2025
+
+
+      -  They use digits 0 through 9, which may be separated by _.
+
+      -  They may optionally include a decimal separator ., followed by
+         more digits, which may again be separated by _.
+
+      -  They may optionally be followed by E or e, an optional - or +,
+         and more digits, to represent an exponent value.
+
+   Note that, similar to JSON and some other languages, numbers without
+   an integer digit (such as .1) are illegal.  They must be written with
+   at least one integer digit, like 0.1.  (These patterns are also
+   disallowed from Identifier Strings (Section 3.10), to avoid
+   confusion.)
+
+3.14.1.  Keyword Numbers
+
+   There are three special "keyword" numbers included in KDL to
+   accomodate the widespread use of IEEE 754
+   (https://en.wikipedia.org/wiki/IEEE_754) floats:
+
+   *  #inf - floating point positive infinity.
+
+   *  #-inf - floating point negative infinity.
+
+   *  #nan - floating point NaN/Not a Number.
+
+   To go along with this and prevent foot guns, the bare Identifier
+   Strings (Section 3.10) inf, -inf, and nan are considered illegal
+   identifiers and should yield a syntax error.
+
+   The existence of these keywords does not imply that any numbers be
+   represented as IEEE 754 floats.  These are simply for clarity and
+   convenience for any implementation that chooses to represent their
+   numbers in this way.
+
+3.15.  Boolean
+
+   A boolean Value (Section 3.7) is either the symbol #true or #false.
+   These _SHOULD_ be represented by implementation as boolean logical
+   values, or some approximation thereof.
+
+3.15.1.  Example
+
+   my-node #true value=#false
+
+
+
+
+
+
+
+Marchán & KDL Contributors    Experimental                     [Page 19]
+
+                                   KDL                      January 2025
+
+
+3.16.  Null
+
+   The symbol #null represents a null Value (Section 3.7).  It's up to
+   the implementation to decide how to represent this, but it generally
+   signals the "absence" of a value.
+
+3.16.1.  Example
+
+   my-node #null key=#null
+
+3.17.  Whitespace
+
+   The following characters should be treated as non-Newline
+   (Section 3.18) white space
+   (https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt):
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Marchán & KDL Contributors    Experimental                     [Page 20]
+
+                                   KDL                      January 2025
+
+
+                  +===========================+=========+
+                  | Name                      | Code Pt |
+                  +===========================+=========+
+                  | Character Tabulation      | U+0009  |
+                  +---------------------------+---------+
+                  | Space                     | U+0020  |
+                  +---------------------------+---------+
+                  | No-Break Space            | U+00A0  |
+                  +---------------------------+---------+
+                  | Ogham Space Mark          | U+1680  |
+                  +---------------------------+---------+
+                  | En Quad                   | U+2000  |
+                  +---------------------------+---------+
+                  | Em Quad                   | U+2001  |
+                  +---------------------------+---------+
+                  | En Space                  | U+2002  |
+                  +---------------------------+---------+
+                  | Em Space                  | U+2003  |
+                  +---------------------------+---------+
+                  | Three-Per-Em Space        | U+2004  |
+                  +---------------------------+---------+
+                  | Four-Per-Em Space         | U+2005  |
+                  +---------------------------+---------+
+                  | Six-Per-Em Space          | U+2006  |
+                  +---------------------------+---------+
+                  | Figure Space              | U+2007  |
+                  +---------------------------+---------+
+                  | Punctuation Space         | U+2008  |
+                  +---------------------------+---------+
+                  | Thin Space                | U+2009  |
+                  +---------------------------+---------+
+                  | Hair Space                | U+200A  |
+                  +---------------------------+---------+
+                  | Narrow No-Break Space     | U+202F  |
+                  +---------------------------+---------+
+                  | Medium Mathematical Space | U+205F  |
+                  +---------------------------+---------+
+                  | Ideographic Space         | U+3000  |
+                  +---------------------------+---------+
+
+                                  Table 2
+
+3.17.1.  Single-line comments
+
+   Any text after //, until the next literal Newline (Section 3.18) is
+   "commented out", and is considered to be Whitespace (Section 3.17).
+
+
+
+
+
+Marchán & KDL Contributors    Experimental                     [Page 21]
+
+                                   KDL                      January 2025
+
+
+3.17.2.  Multi-line comments
+
+   In addition to single-line comments using //, comments can also be
+   started with /* and ended with */. These comments can span multiple
+   lines.  They are allowed in all positions where Whitespace
+   (Section 3.17) is allowed and can be nested.
+
+3.17.3.  Slashdash comments
+
+   Finally, a special kind of comment called a "slashdash", denoted by
+   /-, can be used to comment out entire _components_ of a KDL document
+   logically, and have those elements not be included as part of the
+   parsed document data.
+
+   Slashdash comments can be used before the following, including before
+   their type annotations, if present:
+
+   *  A Node (Section 3.2): the entire Node is treated as Whitespace,
+      including all props, args, and children.
+
+   *  An Argument (Section 3.5): the Argument value is treated as
+      Whitespace.
+
+   *  A Property (Section 3.4) key: the entire property, including both
+      key and value, is treated as Whitespace.  A slashdash of just the
+      property value is not allowed.
+
+   *  A Children Block (Section 3.6): the entire block, including all
+      children within, is treated as Whitespace.  Only other children
+      blocks, whether slashdashed or not, may follow a slashdashed
+      children block.
+
+   A slashdash may be be followed by any amount of whitespace, including
+   newlines and comments (other than other slashdashes), before the
+   element that it comments out.
+
+3.18.  Newline
+
+   The following character sequences should be treated as new lines
+   (https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-
+   5/#G41643):
+
+
+
+
+
+
+
+
+
+
+Marchán & KDL Contributors    Experimental                     [Page 22]
+
+                                   KDL                      January 2025
+
+
+       +=========+===============================+=================+
+       | Acronym | Name                          | Code Pt         |
+       +=========+===============================+=================+
+       | CRLF    | Carriage Return and Line Feed | U+000D + U+000A |
+       +---------+-------------------------------+-----------------+
+       | CR      | Carriage Return               | U+000D          |
+       +---------+-------------------------------+-----------------+
+       | LF      | Line Feed                     | U+000A          |
+       +---------+-------------------------------+-----------------+
+       | NEL     | Next Line                     | U+0085          |
+       +---------+-------------------------------+-----------------+
+       | VT      | Vertical tab                  | U+000B          |
+       +---------+-------------------------------+-----------------+
+       | FF      | Form Feed                     | U+000C          |
+       +---------+-------------------------------+-----------------+
+       | LS      | Line Separator                | U+2028          |
+       +---------+-------------------------------+-----------------+
+       | PS      | Paragraph Separator           | U+2029          |
+       +---------+-------------------------------+-----------------+
+
+                                  Table 3
+
+   Note that for the purpose of new lines, the specific sequence CRLF is
+   considered _a single newline_.
+
+3.19.  Disallowed Literal Code Points
+
+   The following code points may not appear literally anywhere in the
+   document.  They may be represented in Strings (but not Raw Strings)
+   using Unicode Escapes (Section 3.11.1) (\u{...}, except for non
+   Unicode Scalar Value, which can't be represented even as escapes).
+
+   *  The codepoints U+0000-0008 or the codepoints U+000E-001F (various
+      control characters).
+
+   *  U+007F (the Delete control character).
+
+   *  Any codepoint that is not a Unicode Scalar Value
+      (https://unicode.org/glossary/#unicode_scalar_value)
+      (U+D800-DFFF).
+
+   *  U+200E-200F, U+202A-202E, and U+2066-2069, the unicode "direction
+      control" characters (https://www.w3.org/International/questions/
+      qa-bidi-unicode-controls)
+
+   *  U+FEFF, aka Zero-width Non-breaking Space (ZWNBSP)/Byte Order Mark
+      (BOM), except as the first code point in a document.
+
+
+
+
+Marchán & KDL Contributors    Experimental                     [Page 23]
+
+                                   KDL                      January 2025
+
+
+4.  Full Grammar
+
+   This is the full official grammar for KDL and should be considered
+   authoritative if something seems to disagree with the text above.
+   The grammar language syntax is defined in Section 4.1.
+
+   document := bom? version? nodes
+
+   // Nodes
+   nodes := (line-space* node)* line-space*
+
+   base-node := slashdash? type? node-space* string
+       (node-space+ slashdash? node-prop-or-arg)*
+       // slashdashed node-children must always be after props and args.
+       (node-space+ slashdash node-children)*
+       (node-space+ node-children)?
+       (node-space+ slashdash node-children)*
+       node-space*
+   node := base-node node-terminator
+   final-node := base-node node-terminator?
+
+   // Entries
+   node-prop-or-arg := prop | value
+   node-children := '{' nodes final-node? '}'
+   node-terminator := single-line-comment | newline | ';' | eof
+
+   prop := string node-space* '=' node-space* value
+   value := type? node-space* (string | number | keyword)
+   type := '(' node-space* string node-space* ')'
+
+   // Strings
+   string := identifier-string | quoted-string | raw-string ¶
+
+   identifier-string := unambiguous-ident | signed-ident | dotted-ident
+   unambiguous-ident :=
+       ((identifier-char - digit - sign - '.') identifier-char*)
+       - disallowed-keyword-strings
+   signed-ident :=
+       sign ((identifier-char - digit - '.') identifier-char*)?
+   dotted-ident :=
+       sign? '.' ((identifier-char - digit) identifier-char*)?
+   identifier-char :=
+       unicode - unicode-space - newline - [\\/(){};\[\]"#=]
+       - disallowed-literal-code-points
+   disallowed-keyword-identifiers :=
+       'true' | 'false' | 'null' | 'inf' | '-inf' | 'nan'
+
+   quoted-string :=
+
+
+
+Marchán & KDL Contributors    Experimental                     [Page 24]
+
+                                   KDL                      January 2025
+
+
+       '"' single-line-string-body '"' |
+       '"""' newline
+       (multi-line-string-body newline)?
+       (unicode-space | ws-escape)* '"""'
+   single-line-string-body := (string-character - newline)*
+   multi-line-string-body := (('"' | '""')? string-character)*
+   string-character :=
+       '\\' (["\\bfnrts] |
+       'u{' hex-unicode '}') |
+       ws-escape |
+       [^\\"] - disallowed-literal-code-points
+   ws-escape := '\\' (unicode-space | newline)+
+   hex-digit := [0-9a-fA-F]
+   hex-unicode := hex-digit{1, 6} - surrogates
+   surrogates := [dD][8-9a-fA-F]hex-digit{2}
+   // U+D800-DFFF: D  8         00
+   //              D  F         FF
+
+   raw-string := '#' raw-string-quotes '#' | '#' raw-string '#'
+   raw-string-quotes :=
+       '"' single-line-raw-string-body '"' |
+       '"""' newline
+       (multi-line-raw-string-body newline)?
+       unicode-space* '"""'
+   single-line-raw-string-body :=
+       '' |
+       (single-line-raw-string-char - '"')
+           single-line-raw-string-char*? |
+       '"' (single-line-raw-string-char - '"')
+           single-line-raw-string-char*?
+   single-line-raw-string-char :=
+       unicode - newline - disallowed-literal-code-points
+   multi-line-raw-string-body :=
+       (unicode - disallowed-literal-code-points)*?
+
+   // Numbers
+   number := keyword-number | hex | octal | binary | decimal
+
+   decimal := sign? integer ('.' integer)? exponent?
+   exponent := ('e' | 'E') sign? integer
+   integer := digit (digit | '_')*
+   digit := [0-9]
+   sign := '+' | '-'
+
+   hex := sign? '0x' hex-digit (hex-digit | '_')*
+   octal := sign? '0o' [0-7] [0-7_]*
+   binary := sign? '0b' ('0' | '1') ('0' | '1' | '_')*
+
+
+
+
+Marchán & KDL Contributors    Experimental                     [Page 25]
+
+                                   KDL                      January 2025
+
+
+   // Keywords and booleans.
+   keyword := boolean | '#null'
+   keyword-number := '#inf' | '#-inf' | '#nan'
+   boolean := '#true' | '#false'
+
+   // Specific code points
+   bom := '\u{FEFF}'
+   disallowed-literal-code-points :=
+       See Table (Disallowed Literal Code Points)
+   unicode := Any Unicode Scalar Value
+   unicode-space := See Table
+       (All White_Space unicode characters which are not `newline`)
+
+   // Comments
+   single-line-comment := '//' ^newline* (newline | eof)
+   multi-line-comment := '/*' commented-block
+   commented-block :=
+       '*/' | (multi-line-comment | '*' | '/' | [^*/]+) commented-block
+   slashdash := '/-' line-space*
+
+   // Whitespace
+   ws := unicode-space | multi-line-comment
+   escline := '\\' ws* (single-line-comment | newline | eof)
+   newline := See Table (All Newline White_Space)
+   // Whitespace where newlines are allowed.
+   line-space := node-space | newline | single-line-comment
+   // Whitespace within nodes,
+   // where newline-ish things must be esclined.
+   node-space := ws* escline ws* | ws+
+
+   // Version marker
+   version :=
+       '/-' unicode-space* 'kdl-version' unicode-space+ ('1' | '2')
+       unicode-space* newline
+
+4.1.  Grammar language
+
+   The grammar language syntax is a combination of ABNF with some regex
+   spice thrown in.  Specifically:
+
+   *  Single quotes (') are used to denote literal text. \ within a
+      literal string is used for escaping other single-quotes, for
+      initiating unicode characters using hex values (\u{FEFF}), and for
+      escaping \ itself (\\).
+
+
+
+
+
+
+
+Marchán & KDL Contributors    Experimental                     [Page 26]
+
+                                   KDL                      January 2025
+
+
+   *  * is used for "zero or more", + is used for "one or more", and ?
+      is used for "zero or one".  Per standard regex semantics, * and +
+      are _greedy_; they match as many instances as possible without
+      failing the match.
+
+   *  *? (used only in raw strings) indicates a _non-greedy_ match; it
+      matches as _few_ instances as possible without failing the match.
+
+   *  ¶ is a _cut point_. It always matches and consumes no characters,
+      but once matched, the parser is not allowed to backtrack past that
+      point in the source.  If a parser would rewind past the cut point,
+      it must instead fail the overall parse, as if it had run out of
+      options.  (This is only used with the raw-string production, to
+      ensure the first instance of the appropriate closing quote
+      sequence is guaranteed to be the end of the raw string, rather
+      than allowing it to potentially consume more of the document
+      unexpectedly.)
+
+   *  () can be used to group matches that must be matched together.
+
+   *  a | b means a or b, whichever matches first.  If multiple items
+      are before a |, they are a single group. a b c | d is equivalent
+      to (a b c) | d.
+
+   *  [] are used for regex-style character matches, where any character
+      between the brackets will be a single match. \ is used to escape
+      \, [, and ].  They also support character ranges (0-9), and
+      negation (^)
+
+   *  - is used for "except for" or "minus" whatever follows it.  For
+      example, a - 'x' means "any a, except something that matches the
+      literal 'x'".
+
+   *  The prefix ^ means "something that does not match" whatever
+      follows it.  For example, ^foo means "must not match foo".
+
+   *  A single definition may be split over multiple lines.  Newlines
+      are treated as spaces.
+
+   *  // followed by text on its own line is used as comment syntax.
+
+Authors' Addresses
+
+   Katerina Zoé Marchán Salvá
+   Microsoft
+
+
+   The KDL Contributors
+
+
+
+Marchán & KDL Contributors    Experimental                     [Page 27]