Handling Whitespace
One disadvantage of PEGs compared to lexer-based CFG parsers can be the handling of whitespace. In a traditional CFG-based parser with a separate lexer (scanner) phase, the lexer can simply skip all whitespace and generate tokens only for the actual parser to operate on. This frees the parser grammar itself from any whitespace handling. Since PEGs have no lexer but operate directly on the raw input, they have to deal with whitespace in the grammar itself. Language designers with little PEG experience are sometimes unsure how best to handle whitespace in their grammar.
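For contrast, the lexer-based approach can be sketched as a tiny hand-written scanner. This is plain Java, not parboiled or any parser generator's API, and the class name `Lexer` and its token rules (digit runs become numbers, any other non-blank character becomes a one-character token) are hypothetical. Whitespace is dropped during scanning, so a parser consuming the token list never has to mention it:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal scanner: whitespace is discarded here, before any parsing happens. */
final class Lexer {
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                // Skipped: whitespace never becomes a token
                i++;
            } else if (c >= '0' && c <= '9') {
                // A run of ASCII digits becomes one number token
                int start = i;
                while (i < input.length() && input.charAt(i) >= '0' && input.charAt(i) <= '9') i++;
                tokens.add(input.substring(start, i));
            } else {
                // Any other character becomes a one-character operator token
                tokens.add(String.valueOf(c));
                i++;
            }
        }
        return tokens;
    }
}
```

For example, `Lexer.tokenize(" 12 +\t3\n")` yields the tokens `12`, `+` and `3`. A PEG has no such phase, which is why the grammar itself must absorb the whitespace.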
A common and highly recommended pattern is to always match whitespace immediately after every terminal (a single character or string) and nowhere else.
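The pattern can be sketched with a small hand-written PEG-style parser for sums such as `1 + 2 + 3`. This is plain Java for illustration, not parboiled's rule API; the class `SumParser` and its method names are hypothetical. Whitespace is consumed only right after each terminal (`+` and the digits of a number). The one extra place it is matched is the very start of the input, because no terminal precedes leading whitespace:

```java
/** Hypothetical PEG-style parser (plain Java, not parboiled) for sums like "1 + 2 + 3". */
final class SumParser {
    private final String input;
    private int pos = 0;

    SumParser(String input) { this.input = input; }

    // WhiteSpace <- [ \t\r\n\f]*   (always succeeds)
    private void whiteSpace() {
        while (pos < input.length() && " \t\r\n\f".indexOf(input.charAt(pos)) >= 0) pos++;
    }

    // '+' WhiteSpace : whitespace is handled right after the terminal
    private boolean plus() {
        if (pos < input.length() && input.charAt(pos) == '+') {
            pos++;
            whiteSpace();
            return true;
        }
        return false;
    }

    // Number <- [0-9]+ WhiteSpace ; returns null if no digit is found
    private Integer number() {
        int start = pos;
        while (pos < input.length() && input.charAt(pos) >= '0' && input.charAt(pos) <= '9') pos++;
        if (pos == start) return null;
        int value = Integer.parseInt(input.substring(start, pos));
        whiteSpace();
        return value;
    }

    // Sum <- WhiteSpace Number ('+' Number)* EOI ; returns null on a parse error
    Integer parseSum() {
        pos = 0;
        whiteSpace(); // leading whitespace: the only match not preceded by a terminal
        Integer total = number();
        if (total == null) return null;
        while (true) {
            int mark = pos;
            if (!plus()) break;
            Integer next = number();
            if (next == null) { pos = mark; break; } // backtrack as a PEG would
            total += next;
        }
        return pos == input.length() ? total : null;
    }
}
```

Because every terminal cleans up after itself, the `Sum` rule never mentions whitespace between its elements, and no rule ever has to guess whether whitespace was already consumed before it.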
With parboiled you can take this rule one step further and factor most of the whitespace handling out into a single helper method. One way to do this is shown in the CalculatorParser3 example for parboiled for Java and in the JSON Parser example for parboiled for Scala.
The technique is to override the default String-to-Rule conversion method and inject custom logic. Both examples listed above give string literals that end with a blank a special meaning: the trailing blank acts as a marker, and such a literal is wrapped in a sequence rule that matches the string (or character) itself followed by all trailing whitespace.
As a result, wherever your grammar uses a string literal ending with a blank, any trailing whitespace is consumed automatically as well. This can make your grammar rules much more compact and readable, and therefore easier to maintain.
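The same convention can be mimicked outside parboiled to show the idea. The following plain-Java sketch is an analogue, not parboiled's actual conversion hook; the helper name `lit` and the class `ListParser` are hypothetical. Every string terminal goes through `lit`, and a literal such as `", "` means "match `,` and then any trailing whitespace". The grammar accepts bracketed number lists such as `[ 1, 2 , 3 ]`:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical plain-Java analogue of the "literal ending in a blank" convention. */
final class ListParser {
    private final String input;
    private int pos = 0;

    ListParser(String input) { this.input = input; }

    private void whiteSpace() {
        while (pos < input.length() && " \t\r\n\f".indexOf(input.charAt(pos)) >= 0) pos++;
    }

    // Single helper for all string terminals: "x " matches "x" plus trailing
    // whitespace (the blank is only a marker); "x" matches "x" alone.
    private boolean lit(String s) {
        boolean skipWs = s.endsWith(" ");
        String text = skipWs ? s.substring(0, s.length() - 1) : s;
        if (!input.startsWith(text, pos)) return false;
        pos += text.length();
        if (skipWs) whiteSpace();
        return true;
    }

    // Digits are matched by character class, so lit() does not apply here:
    // trailing whitespace must still be consumed manually.
    private Integer number() {
        int start = pos;
        while (pos < input.length() && input.charAt(pos) >= '0' && input.charAt(pos) <= '9') pos++;
        if (pos == start) return null;
        int value = Integer.parseInt(input.substring(start, pos));
        whiteSpace();
        return value;
    }

    // List <- WhiteSpace "[ " (Number (", " Number)*)? "] " EOI ; null on error
    List<Integer> parseList() {
        pos = 0;
        whiteSpace();
        if (!lit("[ ")) return null;
        List<Integer> items = new ArrayList<>();
        Integer first = number();
        if (first != null) {
            items.add(first);
            while (true) {
                int mark = pos;
                if (!lit(", ")) break;
                Integer next = number();
                if (next == null) { pos = mark; break; }
                items.add(next);
            }
        }
        if (!lit("] ")) return null;
        return pos == input.length() ? items : null;
    }
}
```

The grammar methods now read almost like the rule they implement. The only whitespace logic left outside `whiteSpace()` and `lit()` is in `number()`, which matches a character class rather than a string literal.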
However, there are a few things to keep in mind when using this solution:
- The input text matched by any rule containing “whitespace-enabled” string literals now also contains an arbitrary amount of whitespace, which can throw off parser action methods that do not expect it.
- CharRange and AnyOf(String) rules are not affected by this solution, so after them you still have to match trailing whitespace “manually”.
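The first caveat can be illustrated with another plain-Java sketch. As before, this is not parboiled's API; `KeywordDemo`, `lit`, `matchBool` and the two "action" methods are hypothetical names. A rule built from the whitespace-enabled literals `"true "` and `"false "` reports a matched text that includes the consumed whitespace, so an action comparing against the bare keyword silently gets the wrong answer unless it trims first:

```java
/** Hypothetical sketch (plain Java, not parboiled) of whitespace leaking into matched text. */
final class KeywordDemo {
    private final String input;
    private int pos = 0;

    KeywordDemo(String input) { this.input = input; }

    private void whiteSpace() {
        while (pos < input.length() && " \t\r\n\f".indexOf(input.charAt(pos)) >= 0) pos++;
    }

    // "x " = match "x" then trailing whitespace; "x" = match "x" only
    private boolean lit(String s) {
        boolean skipWs = s.endsWith(" ");
        String text = skipWs ? s.substring(0, s.length() - 1) : s;
        if (!input.startsWith(text, pos)) return false;
        pos += text.length();
        if (skipWs) whiteSpace();
        return true;
    }

    // Bool <- "true " / "false " ; returns the matched input text, or null
    String matchBool() {
        pos = 0;
        int start = pos;
        if (lit("true ") || lit("false ")) return input.substring(start, pos);
        return null;
    }

    // Naive action: assumes the matched text is exactly the keyword
    Boolean naiveAction() {
        String text = matchBool();
        return text == null ? null : Boolean.valueOf(text.equals("true"));
    }

    // Robust action: drops the whitespace the literal consumed before comparing
    Boolean robustAction() {
        String text = matchBool();
        return text == null ? null : Boolean.valueOf(text.trim().equals("true"));
    }
}
```

On the input `"true   "`, `matchBool()` returns `"true   "`, so `naiveAction()` yields `false` while `robustAction()` yields `true`. Trimming in the action is one option; another is to capture the text before the whitespace is consumed.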