Simplify the cql2-text grammar (future version improvements?) #705

jerstlouis · 2022-06-14T06:55:43Z

This is feedback from trying to implement cql2-text.
Implementers (or at least we) struggle with the current grammar.

I think it comes down mainly to these two things:

- Some of the capabilities from extension conformance classes are defined as separate rules. I think it would be much easier to simply define new possible values for operators, or pre-defined function identifiers (using the same grammar rule as function calls) for operators that use a function call syntax (i.e., array/spatial/temporal operators and predicates). This would cut down the number of rules dramatically, and I think it would also make the requirements in each conformance class clearer.
- Some rules seem to exist only to restrict the data types (e.g., numericExpression, characterExpression, temporalExpression...). However, this is purely a runtime concept, since the data type that a given expression (e.g., a property) evaluates to depends on the queryables. Therefore I would not have used grammar rules (which are about syntax) to make this distinction. Instead, I think what is needed is requirements and/or permissions that specify the interpretation when an unexpected data type is used in such a context.

I think simplifying these two aspects of the grammar would directly result in simpler parser implementations, greater ease of implementation and greater interoperability.

cportele · 2022-06-20T14:21:53Z

Meeting 2022-06-20: It would be good to understand why this would result in an easier implementation. We need to discuss this in a meeting when @jerstlouis is present.

jerstlouis · 2022-06-20T22:20:25Z

Thanks @cportele. I should be attending the next meeting in a couple of weeks.

As a summary, from a syntactic point of view, I think the two things I suggested above would result in fewer grammar rules (simpler grammar), and parser node classes would be a more direct / natural match to the rules. We would implement the function/operator name validation / data types checking separately from the parsing, since some of it is only known at runtime (e.g., available functions, queryable data types). e.g., in our implementation we have a CQL2CallExp node class which we plan to use to handle the array / spatial / temporal operators which syntactically look like function calls. We are hand-writing a Recursive Descent parser, borrowing heavily from our ECCSS/CMSS parser.
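To illustrate the separation described above, here is a minimal sketch in Python (not our eC implementation) of a recursive descent parser in which anything shaped like `name(args)` becomes one generic call node, and name checking happens in a separate pass. The class names (`Identifier`, `Literal`, `Call`), the `validate` helper, and the simplified token rules are all hypothetical and only cover a tiny subset of the syntax.

```python
import re
from dataclasses import dataclass

@dataclass
class Identifier:
    name: str

@dataclass
class Literal:
    value: object

@dataclass
class Call:            # hypothetical stand-in for a generic call node
    name: str
    args: list

# Simplified tokens: numbers, single-quoted strings, ASCII identifiers, punctuation.
TOKEN = re.compile(
    r"\s*(?:(?P<num>\d+(?:\.\d+)?)|(?P<str>'(?:[^']|'')*')"
    r"|(?P<id>[A-Za-z_][A-Za-z0-9_]*)|(?P<punct>[(),]))"
)

def tokenize(text):
    text, pos, tokens = text.rstrip(), 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def take(self):
        tok = self.peek()
        self.i += 1
        return tok

    def expression(self):
        kind, value = self.take()
        if kind == "num":
            return Literal(float(value))
        if kind == "str":
            # Strip the quotes and undo the '' escape.
            return Literal(value[1:-1].replace("''", "'"))
        if kind == "id":
            if self.peek() != ("punct", "("):
                return Identifier(value)
            self.take()  # consume "("
            args = []
            if self.peek() != ("punct", ")"):
                args.append(self.expression())
                while self.peek() == ("punct", ","):
                    self.take()
                    args.append(self.expression())
            if self.take() != ("punct", ")"):
                raise SyntaxError("expected ')'")
            # Assumption: function names are matched case-insensitively.
            return Call(value.upper(), args)
        raise SyntaxError(f"unexpected token {value!r}")

def validate(node, known_functions, queryables):
    """Separate pass: resolve names that are only known at runtime."""
    problems = []
    if isinstance(node, Call):
        if node.name not in known_functions:
            problems.append(f"unknown function {node.name}")
        for arg in node.args:
            problems += validate(arg, known_functions, queryables)
    elif isinstance(node, Identifier) and node.name not in queryables:
        problems.append(f"unknown queryable {node.name}")
    return problems

ast = Parser(tokenize("T_AFTER(updated, DATE('2022-04-16'))")).expression()
print(validate(ast, {"T_AFTER", "DATE"}, {"updated"}))
```

The parser itself never needs to know which operators a given conformance class adds; only the sets passed to `validate` change.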

jerstlouis · 2022-06-28T02:08:16Z

The following excerpt from our internal CQL2 design document mapping CQL2 conformance classes and providing a concise summary of the CQL2 syntax might be insightful. A simpler grammar could potentially closely match those CQL2* AST node classes to rules. We could eventually prototype such a simpler grammar together with railroad diagrams demonstrating the idea.

Basic CQL2

Defines predicate expressions evaluating to a boolean value, which we parse as the following eC AST node classes:
- CQL2Identifier for identifiers, which are sequences of UTF-8 characters. Identifiers can also be double-quoted to include any arbitrary characters. As in ECCSS, true, false and null will be treated as identifiers in our implementation (with the drawback that they cannot be used as identifiers, even when double-quoted).
  - Valid identifier starting characters: ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
  - Additional valid identifier continuing characters: "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
- CQL2Expression for a generic expression class from which other CQL2Exp* are derived:
  - Sub-expressions enclosed in parentheses ( ) to override default operator priorities
  - CQL2ExpIdentifier for expressions consisting of an identifier (CQL2Identifier).
  - CQL2ExpConstant for decimal numeric literals, integer or fractional using ., no suffixes used, including support for scientific notation (E separating power of 10 exponent)
  - CQL2ExpString for UTF-8 character string literals enclosed in single quote ('); single quote characters within a string literal are represented by two consecutive single-quote characters ('')
    - Defines the concept of a date and date-time as string literals (CQL2ExpString) following RFC 3339 (profile of ISO 8601).
  - CQL2ExpCall with support for:
    - DATE as a well-known function taking a string literal (CQL2ExpString) defining a date instant
    - TIMESTAMP as a well-known function taking a string literal (CQL2ExpString) defining a datetime instant
  - CQL2ExpOperation, with support for the following operators (note that all CQL2 keywords are case-insensitive):
    - Unary operator NOT followed by an operand
    - Binary logical operators AND and OR
    - Binary relational operators =, <, >, <=, >=, <>
    - Binary relational operators IS and IS NOT (ignore extra spaces between IS and NOT) for checking against the NULL identifier.
    - For relational operators with Basic CQL2, CQL2ExpIdentifier (other than true, false and null) are only supported for the left operand, while true, false, null and literals are supported in second operands only.
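As a rough illustration of the identifier character rules listed above, the following Python sketch checks whether a string qualifies as an unquoted identifier. The function and table names are hypothetical; the code point ranges are copied from the two lists above, and the check deliberately ignores reserved keywords and double-quoted identifiers.

```python
# Code point ranges for the first character of an unquoted identifier.
START_RANGES = [
    (0x3A, 0x3A),      # ":"
    (0x41, 0x5A),      # A-Z
    (0x5F, 0x5F),      # "_"
    (0x61, 0x7A),      # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

# Additional ranges allowed after the first character.
CONTINUE_EXTRA = [
    (0x2D, 0x2D),      # "-"
    (0x2E, 0x2E),      # "."
    (0x30, 0x39),      # 0-9
    (0xB7, 0xB7),
    (0x300, 0x36F),
    (0x203F, 0x2040),
]

def _in_ranges(ch, ranges):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)

def is_unquoted_identifier(text):
    """True if text matches the start/continue character rules (keywords not checked)."""
    if not text or not _in_ranges(text[0], START_RANGES):
        return False
    return all(
        _in_ranges(c, START_RANGES) or _in_ranges(c, CONTINUE_EXTRA)
        for c in text[1:]
    )

for candidate in ["depth", "road-class", "1st", "my name"]:
    print(candidate, is_unquoted_identifier(candidate))
```

A name that fails this check (for example one containing a space) would need to be double-quoted.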

Property-Property

Removes the limitation on which operands of relational operators can be identifiers or literals.

Arithmetic Expressions

Adds support for the following binary operators in CQL2ExpOperation: +, -, *, / (fractional division, see also Features#711), ^ (exponent)
Adds support for the - unary operator? (Features#709)

Advanced Comparison Operators

Adds LIKE and NOT LIKE relational operators (ignore extra spaces between NOT and LIKE) that accept a pattern where % matches 0..n arbitrary characters and _ matches exactly one arbitrary character (both characters can be escaped with a \ character); expects text expressions only, and a string literal as the right operand.
- NOTE: Equivalent but different functionality in ECCSS is provided by the ^ (starts with), $ (ends with), ~ (contains) text operators and their negated counterparts.
Adds BETWEEN and NOT BETWEEN ternary relational operators (e.g., depth BETWEEN 100.0 AND 150.0); expects numeric expressions only.
Adds IN and NOT IN relational operators taking a comma-separated list of expressions (CQL2ExpList) within parentheses as second operand; items in the list are expected to be of same type as value being tested.
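The LIKE pattern rules above (% for 0..n characters, _ for exactly one, \ as the escape) can be sketched in Python by translating the pattern to a regular expression. This is only an illustration: `like_to_regex` and `like` are hypothetical helper names, and treating a trailing lone backslash as a literal backslash is an assumption, not something stated above.

```python
import re

def like_to_regex(pattern):
    """Translate a LIKE pattern into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Escaped character: match it literally, even if it is % or _.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            parts.append(".*")   # 0..n arbitrary characters
        elif ch == "_":
            parts.append(".")    # exactly one arbitrary character
        else:
            # Assumption: a trailing lone backslash is taken literally.
            parts.append(re.escape(ch))
        i += 1
    # DOTALL so that "arbitrary character" also covers newlines.
    return re.compile("".join(parts), re.DOTALL)

def like(value, pattern):
    """True if the whole value matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None

print(like("Smith", "Smi%"))
print(like("100%", r"100\%"))
```

NOT LIKE is then simply the negation of `like(value, pattern)`.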

Functions

Adds CQL2ExpCall with support for implementation-defined custom functions, taking a list of expressions within parentheses ( ) as arguments following an identifier (CQL2Identifier) for the function to call
Implies use of CQL2ExpList for function arguments separated by commas
Although the CQL2 specification and grammar do not currently define them as such, syntactically all of the following extended conformance classes could have been defined using the function call grammar rule, and our parser implements them as such using a CQL2ExpCall AST node. This demonstrates that functions are a mechanism by which CQL2 could be extended independently from the specification.
- Except for WKT, only array literals would require the addition of a new grammar rule, since they use [...] rather than e.g., ARRAY(...). My suggestion in Features#718 is to use (1,2,3) for array literals instead. To support WKT, support for space-separated tuples is also required, e.g., 10 30 in POLYGON((10 30, 40 20, 50 80, 10 30)).

Case-insensitive Comparison

Adds CQL2ExpCall with support for the CASEI well-known function returning a case-desensitised version of a string.

Accent-insensitive Comparison

Adds CQL2ExpCall with support for the ACCENTI well-known function returning an accent-desensitised version of a string.

Basic Spatial Operators

Adds CQL2ExpCall with support for the POINT, LINESTRING, POLYGON, MULTIPOINT, MULTILINESTRING, MULTIPOLYGON, GEOMETRYCOLLECTION and ENVELOPE well-known functions defining vector geometry objects following the simple features model (WKT encoding).
- Also implies support for space-separated tuples and array literals using ( ) to support the WKT notation as arguments to those function calls
Adds the S_INTERSECTS well-known function for the spatial intersection operator
Implies use of CQL2ExpList for function arguments separated by commas.

Spatial Operators

Implies Basic Spatial operator support, and adds the following well-known functions for additional spatial operators:
- S_CONTAINS, S_CROSSES, S_DISJOINT, S_EQUALS, S_OVERLAPS, S_TOUCHES, S_WITHIN

Temporal Operators

Adds CQL2ExpCall with support for:
- INTERVAL as a well-known function taking two instant string literals (CQL2ExpString) defining a temporal interval object
- the following operators taking both instants and intervals as arguments: T_AFTER, T_BEFORE, T_DISJOINT, T_EQUALS, T_INTERSECTS
- the following operators taking only intervals as arguments: T_CONTAINS, T_DURING, T_FINISHEDBY, T_FINISHES, T_MEETS, T_METBY, T_OVERLAPPEDBY, T_OVERLAPS, T_STARTEDBY, T_STARTS
Implies use of CQL2ExpList for function arguments separated by commas

Array Operators

Adds CQL2ExpArray (array literals as a list of expressions (CQL2ExpList) within [ ])
Adds CQL2ExpCall with support for the A_CONTAINEDBY, A_CONTAINS, A_EQUALS and A_OVERLAPS array operators as well-known functions
Implies use of CQL2ExpList for expressions array and for function arguments separated by commas

@pvretano

jerstlouis · 2022-07-01T19:28:02Z

See first draft of proposed simpler grammar rules in #723 (comment).

jerstlouis · 2022-07-19T03:14:11Z

Note that in the approach I suggest for defining the grammar production rules, operators / functions are not really keywords, but regular identifiers used in function call expressions (or in spatial/temporal/array literal definitions using the same syntax as function calls). For example, this means that a date or s_intersects queryable would not need to be double-quoted (as in the current abstract tests), since date would only take on its meaning of a temporal literal when it is followed by an opening parenthesis (, and therefore there really is no ambiguity in date<>DATE('2022-04-16').

In my opinion this makes it much easier to extend the language with additional functions / operators, since those additions would not introduce additional keywords that break implementations not previously requiring queryables with the same name to be double-quoted. The list of keywords in 8.2 (which would need to be double-quoted, if allowed at all) would be reduced to:

AND
BETWEEN
DIV
FALSE
IN
IS
LIKE
NOT
NULL
OR
TRUE

All of the other ones would get tokenized by the lexer as an identifier which can be used as operators/function calls, or to define literals and only get resolved in the contexts where they apply. This is the approach taken in C-like languages where standard functions and data types/structs (or classes in C++) are not classified as keywords.
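A minimal Python sketch of that lexing approach, under simplifying assumptions: only the eleven words listed above are classified as keywords, every other word is a plain identifier, and an identifier is treated as a call only when the next token is an opening parenthesis. The token names, the `tokenize` and `identifier_roles` helpers, and the ASCII-only identifier pattern are hypothetical and cover far less than the real grammar.

```python
import re

# The reduced keyword list proposed above.
KEYWORDS = {"AND", "BETWEEN", "DIV", "FALSE", "IN", "IS",
            "LIKE", "NOT", "NULL", "OR", "TRUE"}

TOKEN = re.compile(r"""
    \s*(?:
        (?P<string>'(?:[^']|'')*')
      | (?P<number>\d+(?:\.\d+)?)
      | (?P<word>[A-Za-z_][A-Za-z0-9_]*)
      | (?P<op><>|<=|>=|[=<>(),])
    )""", re.VERBOSE)

def tokenize(text):
    text, pos, tokens = text.rstrip(), 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        kind, value = m.lastgroup, m.group(m.lastgroup)
        if kind == "word":
            # Keywords are case-insensitive; everything else is an identifier.
            kind = "keyword" if value.upper() in KEYWORDS else "identifier"
        tokens.append((kind, value))
        pos = m.end()
    return tokens

def identifier_roles(tokens):
    """Resolve each identifier by context: 'call' if followed by '(', else 'name'."""
    roles = []
    for i, (kind, value) in enumerate(tokens):
        if kind == "identifier":
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            roles.append((value, "call" if nxt == ("op", "(") else "name"))
    return roles

tokens = tokenize("date<>DATE('2022-04-16')")
print(identifier_roles(tokens))
```

With this scheme the lexer needs no change when a new function or operator is added; only the later name resolution does.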

Also note that SQL keywords (or "reserved" words) do not seem to include any function-like keywords either. Things like UPPER() changing case are described as functions instead.

See somewhat related comment here: opengeospatial#705 (comment) that led me to this discovery.

jerstlouis · 2024-04-01T23:09:10Z

See the CartoSym-CSS BNF lexer / grammar for ANTLR4 which should (in theory) be a true superset of CQL2:

https://github.com/opengeospatial/styles-and-symbology/blob/main/core/schemas/CartoSym-CSS-Lexer.g4

https://github.com/opengeospatial/styles-and-symbology/blob/main/core/schemas/CartoSym-CSS-Grammar.g4

The starting rule for CQL2 is expression (e.g., you can paste the Lexer and Grammar at http://lab.antlr.org/ and test any CQL2 expression with expression as the start rule).

When I have a chance I will extract only the CQL2 relevant part.

cportele added the CQL2 label Jun 20, 2022

jerstlouis added a commit to jerstlouis/ogcapi-features that referenced this issue Jul 19, 2022

IS was missing from list of keywords

ec59b84

See somewhat related comment here: opengeospatial#705 (comment) that led me to this discovery.

This was referenced Jul 19, 2022

IS was missing from list of keywords #747

Merged

Spatial operators #745

Merged

WillGunther mentioned this issue Jul 20, 2022

Consider Removing Left Recursion From CQL 2 BNF grammar #751

Closed

cportele mentioned this issue Sep 26, 2022

CQL2: Remove - from list of valid identifier characters? #766

Closed

jerstlouis mentioned this issue Mar 10, 2023

Change names of some "literal" production because they are not stricly literals. #790

Merged

cportele added the Future work support in an additional part of OGC API Features label Dec 29, 2023

jerstlouis mentioned this issue Feb 8, 2024

What is everybody going to be working on, at the 2024 Joint OGC - ASF - OSGeo Sprint? opengeospatial/developer-events#127

Open

cportele added this to Features Part 3: Filtering / Common Query Language (CQL2) Jun 3, 2024

cportele moved this to Future work in Features Part 3: Filtering / Common Query Language (CQL2) Jun 3, 2024

This was referenced Nov 13, 2024

Part 3, "CQL2 functions"... only for CQL2? #964

Open

/functions: Adding a well-known function URI for identification #966

Open
