-
Notifications
You must be signed in to change notification settings - Fork 155
Indentation Based Grammars
Sometimes a DSL provides for block scoping by indentation levels (something also know as the off-side rule). Languages exhibiting this behavior are, for example, Python or YAML. These languages are classic examples of the “non-context free” kind and generally require some custom logic input preprocessing in order to simplify parser grammar design. Traditionally this job is left to the lexer but since PEGs (and therefore parboiled) are lexer-less another solution is needed.
It would of course be possible to provide for an input preprocessor that is completely separate from parboileds parsing engine. However, this level of indirection would greatly complicate error reporting, since users still expect to receive error messages for their original input text and not some intermediate representation.
Starting with release 0.9.9 parboiled provides for a dedicated InputBuffer implementation that takes care of the required indentation preprocessing while maintaining full original input transparency for error messages and recovery. You can use it by passing a newly created IndentDedentInputBuffer instance to the ParseRunner of your choice (Java) or call the “transformIndents” method on the (probably implicitly) created Input instance (Scala).
Java Example:
new ReportingParseRunner(rule).run(new IndentDedentInputBuffer("... input ...", 2))
Scala Example:
ReportingParseRunner(rule).run("... input ...".transformIndents(2))
Upon creation the IndentDedentInputBuffer collapses all line indentations, i.e. space and tab characters at the beginning of a line, replacing them with special INDENT or DEDENT characters where required. All other characters in the input are left untouched, including the “newline” sequences at the end of the lines. Specifically the InputBuffer collapses indentations into
- nothing, if the line has the same indentation level as the previous line
- one INDENT, if the line has a greater indentation level than previous line
- one or more DEDENTs, if the line has a lower indentation level than the previous line
Here is an example of how the IndentDedentBuffer transforms indented input text:
level 1
level 2
still level 2
level 3
also 3
2 again
another 3
and back to 1
another level 2 again
will be turned into the following (the brackets indicating the special, unprintable INDENT and DEDENT characters)
level 1
>level 2
still level 2
>level 3
also 3
<2 again
>another 3
<<and back to 1
>another level 2 again
<
With the line indentation transformed in the described fashion grammar specification becomes much simpler since INDENTs and DEDENTs can now be matched just like brace pairs in other languages.
- Introduction
- ... Motivation
- ... Features
- ... Simple Java Example
- ... Simple Scala Example
- ... RegEx vs. parboiled vs. Parser Generators
- ... Projects using parboiled
- Installation
- Concepts
- ... The Big Picture
- ... The Rule Tree
- ... The Value Stack
- ... The Parse Tree
- ... AST Construction
- ... Parse Error Handling
- parboiled for Java
- ... Rule Construction in Java
- ... Parser Action Expressions
- ... Working with the Value Stack
- ... Action Variables
- ... Parser Extension in Detail
- ... Style Guide
- ... Java Examples
- ...... ABC Grammar
- ...... Calculators
- ...... Time Parser
- ...... Java Parser
- ...... Markdown processor
- parboiled for Scala
- ... Rule Construction in Scala
- ... Parser Actions in Scala
- ... Parser Testing in Scala
- ... Scala Examples
- ...... Simple Calculator
- ...... JSON Parser
- Advanced Topics
- ... Handling Whitespace
- ... Parsing Performance Tuning
- ... Indentation Based Grammars
- ... The ProfilingParseRunner
- ... Grammar and Parser Debugging
- ... Thread Safety
- Building parboiled
- parboiled-core Javadoc API
- parboiled-java Javadoc API
- parboiled-scala Scaladoc API
- Change Log
- Patch Policy