otac0n edited this page Sep 18, 2012 · 18 revisions

Grammar

A Pegasus grammar is a text file with two sections, in order:

  1. The "Settings" section.
  2. The "Rules" section.

Settings

Settings are specified in one of three ways:

  • @setting value For simple values, just write the setting value out. This is parsed as a type name.
  • @setting { value } For more complex values, wrap the setting value in curly braces. This is parsed as a code section.
  • @setting "value" An alternative to using curly braces is to use a string.

Supported settings

  • @namespace Specifies the namespace in which the parser class will be placed.
  • @accessibility Specifies the accessibility of the generated class.
  • @classname Specifies the name of the generated class.
  • @using Adds a using directive to the generated class file. (Multiple Allowed)
  • @members Allows for the definition of additional class members.

Combined Example

@namespace MyProject.Parsers
@accessibility internal
@classname MyParser
@using System.Linq
@using { Foo = System.String }
@members
{
    private static bool HelperFunction()
    {
        return true; // placeholder body so the member compiles
    }
}

Rules

Basic Syntax

The basic syntax of a rule is:

name = expression

Rule Types

By default, rules have a return type of string. This can be modified by specifying a type for the rule, like so:

name <type> = expression
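
As a small illustration of both forms, the sketch below defines one untyped rule and one typed rule. The rule names are hypothetical. The second rule uses a label and a code expression, both described under Expressions, and assumes that a label on a parenthesized sequence yields the matched text as a string.

greeting = "hello"

decimal <double>
    = value:([0-9]+ ("." [0-9]+)?) { double.Parse(value) }

Here greeting returns the default string type, while decimal returns the double produced by its code expression.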

Rule Flags

Rule flags are Boolean settings that are enabled on a per-rule basis. Flags come after the rule type, if there is one:

rule -flag = expression
rule <type> -flag = expression

Currently, the only supported rule flag is the -memoize flag, which enables memoization for a particular rule.
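
A sketch of the flag in use, with hypothetical rule names. Memoization caches a rule's result at each input position, so it tends to help most for rules that several alternatives may try at the same position.

// The flag follows the rule type.
value <double> -memoize
    = decimal
    / "(" inner:value ")" { inner }

decimal <double>
    = digits:([0-9]+ ("." [0-9]+)?) { double.Parse(digits) }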

Expressions

Character Matching Expressions

  • String 'foo' or "bar": String expressions match a string literally.
  • Character Class [a-z]: Matches a single character that is within the character class.
  • Wildcard .: Wildcard expressions match any single character.

Strings and character classes can be marked as case-insensitive by suffixing the string or class with the letter i. For example: "foo"i, 'bar'i, and [baz]i.
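
The following sketch puts these expressions into rules. The rule names are hypothetical, and the identifier rule assumes that the i suffix may be followed directly by a repetition operator.

// Matches "select", "SELECT", "Select", and so on.
keyword = "select"i

// A letter or underscore, then any number of letters, digits, or underscores, in either case.
identifier = [a-z_]i [a-z0-9_]i*

// Matches exactly one character of any kind.
anyChar = .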

Control Flow Expressions

  • Name a: Name expressions refer to a rule by name.
  • Labeled foo:a: Labeled expressions store a parse result for use in code assertions and expressions.
  • Sequence a b c: Sequence expressions match each component consecutively.
  • Choice a / b / c: Choice expressions provide options for parsing. The alternatives are tried in order, and the first one that matches is used.
  • Assertions !a &b: Assertion expressions act as look-aheads. They only peek at the parsing subject; they do not advance the cursor.
  • Code Assertions !{foo} &{bar}: Code assertions are similar to regular assertions. They represent C# code that returns a Boolean value, rather than performing a look-ahead.
  • Repetition a? b+ c*: Repetition expressions allow another expression to be repeated: ? matches zero or one time, + matches one or more times, and * matches zero or more times.
  • Parentheses ( ... ): Parentheses are used to group expressions.
  • Type (<type> ... ): Type expressions allow part of a rule to have a certain return type. This has the same meaning as having a type for a rule, except it is constrained to the expression wrapped by the parentheses.
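
The expressions above can be combined as in the following sketch. All rule names are hypothetical, and the C# snippets assume that a label on a string-typed rule is available as a string.

// Labeled sequence: each label is visible to the code that follows it.
assignment = name:identifier "=" value:identifier { name + " := " + value }

// Ordered choice: "true" is tried before "false".
boolean = "true" / "false"

// Negative look-ahead: fail if the input at this point starts with a boolean literal.
variable = !boolean id:identifier { id }

// Code assertion: accept the identifier only when the C# predicate holds.
shortName = id:identifier &{ id.Length <= 8 } { id }

// Repetition and grouping: one identifier, then zero or more ", identifier" groups.
nameList = identifier ("," identifier)*

identifier = [a-z]+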

State and Error Handling Expressions

  • Code { code }: Code expressions contain C# code that specifies the result of an expression. Code expressions must come at the end of a sequence.
  • Error #ERROR{ code }: Error-type code expressions are a special kind of code expression. The result of an error expression becomes the error message of an exception. Like code expressions, they must come at the end of a sequence.
  • State #STATE{ code; }: State-type code expressions allow for stateful parsing. The code in a state-type code expression is allowed to modify the state object in a way that supports backtracking and memoization. State expressions may appear anywhere in a rule definition.
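
A rough sketch of the error and state forms, using hypothetical rule names. The state line assumes that the state object is exposed as a variable named state and can be indexed by a string key; check this against the version of Pegasus in use before relying on it.

// Error: any character where end of input was expected raises an exception with this message.
EOF
    = !.
    / unexpected:. #ERROR{ "Unexpected character '" + unexpected + "'." }

// State (assumed API): record that a header was seen. The change is undone if the parser backtracks.
header = "#header" #STATE{ state["sawHeader"] = true; }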

Miscellaneous

  • /* ... */ Multi-line comment
  • // ... Single-line comment