FeatureProductionSubclasses

Per Cederberg edited this page Mar 13, 2015 · 1 revision

Feature - Subclass Generation

Generate subclasses of the Production class for each entry in the grammar.

Rationale

The Analyzer interface generated by Grammatica uses the generic Token and Production classes. Using these classes to extract specific child productions or tokens is cumbersome and error-prone, since values and child nodes are accessed by their positional index or by their unique constant names. When a grammar changes, any resulting errors in how the analyzer handles child nodes therefore surface only at run-time.

Generating and using specific subclasses of the Production class would alleviate this. Each child node would then primarily be accessed via an accessor method (a getter), making child access errors visible at compile-time. The analyzer code itself would probably also be more straightforward to read and write.

Here is an example of what the code generated from grammar.grammar might look like:

// HeaderPart = "%header%" HeaderDeclaration* ;
public class HeaderPartProduction extends Production {
    public HeaderToken getHeader() { ... }
    public List<HeaderDeclarationProduction> getHeaderDeclarations() { ... }
}

// HeaderDeclaration = IDENTIFIER "=" QUOTED_STRING ;
public class HeaderDeclarationProduction extends Production {
    public IdentifierToken getIdentifier() { ... }
    public EqualsToken getEquals() { ... }
    public QuotedStringToken getQuotedString() { ... }
}
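
To make the compile-time benefit concrete, here is a minimal sketch that contrasts positional access with typed accessors for HeaderDeclaration. The Node, Token and Production classes below are simplified, hypothetical stand-ins written for this illustration, not Grammatica's real classes, and AccessorSketch with its methods is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the generic parse tree classes (not the real Grammatica API).
abstract class Node {
    private final List<Node> children = new ArrayList<>();
    void addChild(Node child) { children.add(child); }
    Node getChildAt(int index) { return children.get(index); }
    int getChildCount() { return children.size(); }
}

class Token extends Node {
    private final String image;
    Token(String image) { this.image = image; }
    String getImage() { return image; }
}

class Production extends Node { }

// Assumed generated token subclasses.
class IdentifierToken extends Token { IdentifierToken(String s) { super(s); } }
class EqualsToken extends Token { EqualsToken(String s) { super(s); } }
class QuotedStringToken extends Token { QuotedStringToken(String s) { super(s); } }

// HeaderDeclaration = IDENTIFIER "=" QUOTED_STRING ;
class HeaderDeclarationProduction extends Production {
    // The positional knowledge lives in one generated place instead of in analyzer code.
    IdentifierToken getIdentifier() { return (IdentifierToken) getChildAt(0); }
    EqualsToken getEquals() { return (EqualsToken) getChildAt(1); }
    QuotedStringToken getQuotedString() { return (QuotedStringToken) getChildAt(2); }
}

class AccessorSketch {
    // Generic style: a wrong index or cast is only detected at run-time.
    static String describeGeneric(Production decl) {
        Token name = (Token) decl.getChildAt(0);
        Token value = (Token) decl.getChildAt(2);
        return name.getImage() + " -> " + value.getImage();
    }

    // Typed style: a renamed or removed child breaks compilation instead.
    static String describeTyped(HeaderDeclarationProduction decl) {
        return decl.getIdentifier().getImage() + " -> " + decl.getQuotedString().getImage();
    }

    // Builds a small tree by hand; a real parser would do this.
    static HeaderDeclarationProduction sample() {
        HeaderDeclarationProduction decl = new HeaderDeclarationProduction();
        decl.addChild(new IdentifierToken("VERSION"));
        decl.addChild(new EqualsToken("="));
        decl.addChild(new QuotedStringToken("\"1.0\""));
        return decl;
    }

    public static void main(String[] args) {
        System.out.println(describeTyped(sample()));
    }
}
```

If the grammar later dropped the "=" token, describeGeneric would silently read the wrong child or fail at run-time, whereas the regenerated class would no longer offer getEquals() and the shifted indices would be corrected inside the generated accessors.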

Discussion

To map grammar productions cleanly onto classes, each production in the grammar should be classified as one of two kinds:

  1. Alternatives - Listing zero or more productions as alternatives.
  2. Sequences - Listing zero or more tokens or productions as a sequence.

Alternative productions would then map onto an abstract superclass in a class hierarchy. This has the advantage of automatically compacting the generated parse trees. But it would also strictly require that each alternative consists of exactly one production, since tokens cannot inherit from the superclass.

Alternative = SequenceOne
            | SequenceTwo ;

SequenceOne = ProductionOne ;

SequenceTwo = "Token"+ ProductionTwo ;
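
As a rough sketch of the class hierarchy such a grammar might yield, the alternative becomes an abstract superclass and each sequence a concrete subclass. All class names, constructors and the simplified token representation (plain strings) below are assumptions made for illustration, not output from Grammatica.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical base class standing in for the generic Production class.
abstract class Production { }

// Alternative = SequenceOne | SequenceTwo ;
// An alternative carries no children of its own; it is only a common supertype.
abstract class AlternativeProduction extends Production { }

class ProductionOneProduction extends Production { }
class ProductionTwoProduction extends Production { }

// SequenceOne = ProductionOne ;
class SequenceOneProduction extends AlternativeProduction {
    private final ProductionOneProduction productionOne;
    SequenceOneProduction(ProductionOneProduction productionOne) {
        this.productionOne = productionOne;
    }
    ProductionOneProduction getProductionOne() { return productionOne; }
}

// SequenceTwo = "Token"+ ProductionTwo ;
class SequenceTwoProduction extends AlternativeProduction {
    private final List<String> tokens; // token images, simplified to strings here
    private final ProductionTwoProduction productionTwo;
    SequenceTwoProduction(List<String> tokens, ProductionTwoProduction productionTwo) {
        this.tokens = tokens;
        this.productionTwo = productionTwo;
    }
    List<String> getTokens() { return tokens; }
    ProductionTwoProduction getProductionTwo() { return productionTwo; }
}

class AlternativeSketch {
    // The parse tree holds the sequence node directly where an Alternative is expected,
    // so no separate wrapper node for the alternative is needed.
    static String describe(AlternativeProduction alt) {
        if (alt instanceof SequenceOneProduction) {
            return "SequenceOne";
        } else if (alt instanceof SequenceTwoProduction) {
            SequenceTwoProduction two = (SequenceTwoProduction) alt;
            return "SequenceTwo with " + two.getTokens().size() + " token(s)";
        }
        throw new IllegalArgumentException("Unknown alternative: " + alt);
    }

    public static void main(String[] args) {
        AlternativeProduction alt = new SequenceTwoProduction(
                Arrays.asList("Token", "Token"), new ProductionTwoProduction());
        System.out.println(describe(alt));
    }
}
```

The sketch also shows why a token cannot appear directly as an alternative: a token class would have to extend AlternativeProduction, which is itself a Production.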

This structure could turn out to be too restrictive for grammar writers, so perhaps a more flexible model has to be offered as a fallback for productions that do not match the above structure perfectly, either by generating intermediate productions automatically (as is done today in a few cases) or by providing flexible accessor methods in the generated classes.

Related Work

The Grammatica 1.5 feature that introduced a factory method for productions is probably related to this. Perhaps the default implementation of it should just be changed. Or perhaps the API should be changed entirely and pushed into a 2.0 release of Grammatica.

Also relevant to this feature is the possibility of offering more tree traversal alternatives than the current copying depth-first one.