John Gietzen edited this page Aug 9, 2016 · 18 revisions

This is a guide to the basic syntax of Pegasus. For more advanced topics, see the "How Do I... ?" article.

Grammar

A Pegasus grammar is a text file with two sections, in this order:

  1. The "Settings" section.
  2. The "Rules" section.

Settings

Settings are specified in one of three ways:

  • @setting value: For simple values, write the value out directly. It is parsed as a type name.
  • @setting { value }: For more complex values, wrap the value in curly braces. It is parsed as a code section.
  • @setting "value": As an alternative to curly braces, write the value as a string.

Supported settings

  • @namespace: Specifies the namespace in which the parser class will be placed.
  • @accessibility: Specifies the accessibility of the generated class.
  • @classname: Specifies the name of the generated class.
  • @ignorecase: Specifies whether the parser ignores case by default. Individual strings and character classes can override this default.
  • @resources: Specifies the resources class used for resource-based strings.
  • @start: Specifies the starting rule. Defaults to the first rule in the grammar.
  • @trace: Enables or disables tracing. Defaults to false.
  • @using: Adds a using directive to the generated class file. May be specified multiple times.
  • @members: Defines additional members of the generated class.

Combined Example

@namespace MyProject.Parsers
@accessibility internal
@classname MyParser
@ignorecase true
@resources MyProject.Properties.Resources
@start startingRule
@trace true
@using System.Linq
@using { Foo = System.String }
@members
{
    private static bool HelperFunction()
    {
        // Members declared here are added to the generated parser class.
        return true;
    }
}

Rules

Basic Syntax

The basic syntax of a rule is:

name = expression
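
As a minimal sketch, the rules section of a grammar is a series of such definitions, and an expression may refer to other rules by name. The rule names below are hypothetical:

```
// Matches input such as "hello world".
greeting = "hello" _ name

// One or more lowercase letters.
name = [a-z]+

// One or more spaces.
_ = " "+
```

Since greeting is the first rule, it is the starting rule unless @start names a different one.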

Rule Types

By default, a rule's return type is inferred; for a sequence expression it is string. To change it, specify a type for the rule:

name <type> = expression { ... }
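
For example, a rule that converts the digits it matches into a number might declare <int> and compute the value in a code expression. This is a sketch with hypothetical rule names (integer, sum); the labeled sequence value:(...) yields the matched text as a string, which int.Parse then converts:

```
// The labeled sequence is a string; the code expression turns it into an int.
integer <int> = value:("-"? [0-9]+) { int.Parse(value) }

// Labels on typed rules carry that type, so left and right are ints here.
sum <int> = left:integer "+" right:integer { left + right }
```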

Rule Flags

Rule flags are Boolean settings that are enabled on a per-rule basis. Flags come after the rule type, if there is one:

rule -flag = expression
rule <type> -flag = expression

Supported flags

  • -memoize: Enables memoization for the rule.
  • -lexical: Specifies that the rule should be added to the lexicalElements collection whenever it is successfully parsed.
  • -public: Makes this rule a public entry point for the grammar.
  • -export: Includes this rule in the grammar's exported rules.
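
As an illustration, the sketch below places a flag after the rule type, or directly after the name when there is no type. The rule names are hypothetical; -memoize is shown on a rule that may be retried at the same position when an enclosing choice backtracks:

```
// Memoized: a result computed at a given position is reused instead of re-parsed.
value <int> -memoize
    = "(" inner:value ")" { inner }
    / digits:("-"? [0-9]+) { int.Parse(digits) }

// Lexical: each successful match is added to the lexicalElements collection.
identifier -lexical = [a-zA-Z_] [a-zA-Z0-9_]*
```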

Expressions

Character Matching Expressions

  • String 'foo' or "bar": String expressions match a string literally.
  • Character Class [a-z]: Matches a single character that is within the character class.
  • Wildcard .: Wildcard expressions match any single character.

Strings and character classes can be marked as case-insensitive by suffixing them with the letter i, for example "foo"i, 'bar'i, or [baz]i. Likewise, suffixing them with the letter s marks them as case-sensitive.

Strings can be read from resources by suffixing them with the letter r. The quoted text then names a resource, and the string to be matched is read from the resources class specified by the @resources setting described above.
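
The sketch below combines these suffixes. The rule names are hypothetical, and KeywordFrom stands for an entry assumed to exist in the class named by @resources:

```
// Case-insensitive: matches "select", "SELECT", "Select", and so on.
select = "select"i

// Case-insensitive character class: matches 0-9, a-f, and A-F.
hexDigit = [0-9a-f]i

// Case-sensitive: matches only "null", regardless of the default.
nullLiteral = "null"s

// Resource-based: the text to match is read from the hypothetical KeywordFrom resource.
from = "KeywordFrom"r
```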

Control Flow Expressions

  • Name a: Name expressions refer to a rule by name.
  • Labeled foo:a: Labeled expressions store a parse result for use in code assertions and expressions.
  • Sequence a b c: Sequence expressions match each component consecutively.
  • Choice a / b / c: Choice expressions try each alternative in order and use the first one that succeeds.
  • Assertions !a &b: Assertion expressions act as look-aheads. !a succeeds only if a does not match, and &b succeeds only if b matches. In both cases they only peek at the input and never advance the cursor.
  • Code Assertions !{foo} &{bar}: Code assertions are similar to regular assertions, but instead of performing a look-ahead they evaluate C# code that returns a Boolean value.
  • Repetition a? b+ c* d<3> e<2,> f<1,5>: Repetition expressions allow another expression to be repeated: ? matches zero or one time, + one or more times, * zero or more times, <3> exactly three times, <2,> at least twice, and <1,5> between one and five times.
  • Delimited Repetition a<0,,",">: Repetition expressions also support a delimiter, which is matched (and consumed) between consecutive repetitions. The delimiter follows the minimum and maximum counts; here the maximum is left empty, so any number of a's separated by commas is matched.
  • Parentheses ( ... ): Parentheses are used to group expressions.
  • Type (<type> ... ): Type expressions give part of a rule a specific return type. This has the same meaning as a type on a rule, except that it applies only to the expression wrapped in the parentheses.
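
The sketch below combines several of these expressions in a set of hypothetical rules. The last rule assumes that a label on a repetition holds the list of repeated results, so items.Count is the number of digits matched:

```
// Choice: keyword is tried before identifier, so "if" is never read as an identifier.
token = keyword / identifier

// Negative assertion: "if" is a keyword only when no identifier character follows,
// so the input "iffy" falls through to identifier instead.
keyword = "if" ![a-zA-Z0-9_]

identifier = [a-zA-Z_] [a-zA-Z0-9_]*

// Delimited repetition: one to five digits separated by commas, such as "1,2,3".
digitCount <int> = items:[0-9]<1,5,","> { items.Count }
```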

State and Error Handling Expressions

  • Code { code }: Code expressions contain C# code that specifies the result of an expression. Code expressions must come at the end of a sequence.
  • Error #error{ code }: Error-type code expressions throw a System.FormatException whose message is given by the code section. The exception's Data["cursor"] entry is also set, so that the location of the error can be determined.
  • State #{ code; }: State-type code expressions allow for stateful parsing. The code in a state-type code expression is allowed to modify the state object in a way that supports backtracking and memoization. State expressions may appear anywhere in a rule definition.
  • Parse #parse{ code }: Like state expressions, parse-type code expressions may mutate the cursor, but they also return a ParseResult<T>, which allows more complex parsing logic to be integrated. The canonical example is invoking an exported rule from another Pegasus parser.
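
A common use of #error is an end-of-input rule that reports unexpected trailing text. In this sketch (the rule name EOF is an illustrative choice), !. succeeds only when no character remains; otherwise the second alternative captures the stray character and throws:

```
// Succeeds only at the end of the input.
EOF
    = !.
    / unexpected:. #error{ "Unexpected character '" + unexpected + "'." }
```

Because #error throws a System.FormatException, the second alternative never completes normally; a caller can read the exception's Data["cursor"] entry to find where parsing stopped.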

Miscellaneous

  • /* ... */: Multi-line comment.
  • // ...: Single-line comment.