Skip to content

Potential bug with indirect left-recursion #247

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Open
LPeter1997 opened this issue Nov 6, 2019 · 2 comments
Open

Potential bug with indirect left-recursion #247

LPeter1997 opened this issue Nov 6, 2019 · 2 comments

Comments

@LPeter1997
Copy link

LPeter1997 commented Nov 6, 2019

I've isolated a 3-rule pattern, where I'd expect the parser to succeed for a given input, but it fails instead. The sample grammar:

RuleA ::= RuleB.
RuleB ::= IDENTIFIER
        | RuleC '.' IDENTIFIER
        .
RuleC ::= RuleB
        | RuleA
        .

The implementation:

import scala.util.parsing.combinator._
import scala.util.parsing.combinator.syntactical.StandardTokenParsers

object SimpleParser extends StandardTokenParsers with PackratParsers {
  lexical.delimiters ++= List(".", "$")

  lazy val ruleA: PackratParser[String] = ruleB <~ "$"
  lazy val ruleB: PackratParser[String] = (ident
                                       ||| ruleC ~> "." ~> ident)
  lazy val ruleC: PackratParser[String] = (ruleB
                                       ||| ruleA)

  def main(args: Array[String]) = {
    println(ruleA(new PackratReader(new lexical.Scanner("x.x$"))))
  }
}

It fails for input x.x$, telling me that it expects a $ instead of the ..

For me, this seems to be a problem with how indirect left-recursion is handled. I'm not sure if the original algorithm is incapable of handling this pattern, or this is an implementation bug.

Edit:
I've accidentally used | (first matching alt.) instead of ||| (longest matching alt.), I've fixed that in the code, but doesn't change the outcome.

@LPeter1997
Copy link
Author

Perhaps an interesting/helpful finding: Introducing an "alias" rule for RuleB seems to successfully parse x.x$. The grammar:

RuleA ::= RuleB.
RuleB ::= IDENTIFIER
        | RuleC '.' IDENTIFIER
        .
RuleB' ::= IDENTIFIER
         | RuleC '.' IDENTIFIER
         .
RuleC ::= RuleB'
        | RuleA
        .

Implementation:

lazy val ruleA: PackratParser[String] = ruleB <~ "$"
lazy val ruleB: PackratParser[String] = (ident
                                     ||| ruleC ~> "." ~> ident)
lazy val ruleBvar: PackratParser[String] = (ident
                                        ||| ruleC ~> "." ~> ident)
lazy val ruleC: PackratParser[String] = (ruleBvar
                                     ||| ruleA)

However, it still rejects a valid input, x.x.x$.
This makes me further believe that the problem is with indirect left-recursion.

@toddobryan
Copy link

I think the problem is the one identified in this paper: https://tratt.net/laurie/research/pubs/html/tratt__direct_left_recursive_parsing_expression_grammars/

That talks about an interaction with left- and right-hand recursion and identifies a problem with the Warth et al algorithm used in the PackratParsers.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants