Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Monadic parsers and error-correction #2

Open
noughtmare opened this issue Jan 14, 2021 · 4 comments
Open

Monadic parsers and error-correction #2

noughtmare opened this issue Jan 14, 2021 · 4 comments

Comments

@noughtmare
Copy link
Contributor

I am running into an issue with monadic parsers and error correction.

Take for example this parser that parses natural numbers, but only if their value is below 42:

pUpTo42 :: Parser Integer
pUpTo42 = do
  n <- pNatural
  if n <= 42 then return n else pFail

If we now try to run this parser on a string that contains a number that is too large we get into an infinite loop:

λ parse ((,) <$> pUpTo42 <*> pEnd) (createStr (LineColPos 0 0 0) "43")
(^CInterrupted.

I think this is caused by the error correction trying to add more digits until it parses correctly, but no amount of digit-adding will correct this issue. If I change the type to Parser Int you can clearly see this:

λ (\(x, y) -> print x *> mapM_ print y) $ parse ((,) <$> pUpTo42Int <*> pEnd) (createStr (LineColPos 0 0 0) "43")
-6071859544651202560
--    Inserted  '0' at position LineColPos 0 2 2 expecting one of ['0'..'9', Whitespace]
--    Inserted  '0' at position LineColPos 0 2 2 expecting one of ['0'..'9', Whitespace]
--    Inserted  '0' at position LineColPos 0 2 2 expecting one of ['0'..'9', Whitespace]
--    Inserted  '0' at position LineColPos 0 2 2 expecting one of ['0'..'9', Whitespace]
--    Inserted  '0' at position LineColPos 0 2 2 expecting one of ['0'..'9', Whitespace]
--    Inserted  '0' at position LineColPos 0 2 2 expecting one of ['0'..'9', Whitespace]
--    Inserted  '0' at position LineColPos 0 2 2 expecting one of ['0'..'9', Whitespace]
--    Inserted  '0' at position LineColPos 0 2 2 expecting one of ['0'..'9', Whitespace]
--    Inserted  '0' at position LineColPos 0 2 2 expecting one of ['0'..'9', Whitespace]
--    Inserted  '0' at position LineColPos 0 2 2 expecting one of ['0'..'9', Whitespace]
--    Inserted  '0' at position LineColPos 0 2 2 expecting one of ['0'..'9', Whitespace]
--    Inserted  '0' at position LineColPos 0 2 2 expecting one of ['0'..'9', Whitespace]
--    Inserted  '0' at position LineColPos 0 2 2 expecting one of ['0'..'9', Whitespace]
--    Inserted  '0' at position LineColPos 0 2 2 expecting one of ['0'..'9', Whitespace]
--    Inserted  '0' at position LineColPos 0 2 2 expecting one of ['0'..'9', Whitespace]
--    Inserted  '0' at position LineColPos 0 2 2 expecting one of ['0'..'9', Whitespace]
--    Inserted  '0' at position LineColPos 0 2 2 expecting one of ['0'..'9', Whitespace]
--    Inserted  '0' at position LineColPos 0 2 2 expecting one of ['0'..'9', Whitespace]
--    Inserted  '0' at position LineColPos 0 2 2 expecting one of ['0'..'9', Whitespace]
--    Inserted  '0' at position LineColPos 0 2 2 expecting one of ['0'..'9', Whitespace]
--    Inserted  '0' at position LineColPos 0 2 2 expecting one of ['0'..'9', Whitespace]
--    Inserted  '0' at position LineColPos 0 2 2 expecting one of ['0'..'9', Whitespace]
--    Inserted  '0' at position LineColPos 0 2 2 expecting one of ['0'..'9', Whitespace]
--    Inserted  '0' at position LineColPos 0 2 2 expecting one of ['0'..'9', Whitespace]

Ideally, I would like the error correction to replace any larger number by the number 42. Is there a way to accomplish that?

Also note that this is not my actual problem, I've simplified it to this problem. My original motivation was this blog post: https://juliu.is/permutate-parsers/, which reminded me of this library.

P.S. Is anybody still watching this repo? (you can add the eyes emoji as reaction to let me know without filling up the thread)

@noughtmare
Copy link
Contributor Author

noughtmare commented Jan 14, 2021

One pretty nice solution is to use a more granular function:

pUpTo :: Integer -> Parser Integer
pUpTo n = go 0
  where
    go :: Integer -> Parser Integer
    go s
      | n - 10 * s < 0 = pFail
      | otherwise = addLength 1 $ do
        s' <- (+ 10 * s) . fromIntegral . digitToInt
          <$> pRange ('0', intToDigit (fromIntegral (min 9 (n - 10 * s))))
        go s' `opt` s'

@jbransen
Copy link
Member

Getting error correction working together perfectly in the context of monads in such examples will not work. The problem is that there is no way for the parsing machinery to know about the usage of the values. The expression n <= 42 is not actually happing in "parser world" but rather outside of it, and variants of it such as "n > 100" or "isPrime n" would give equal amounts of information to the parsing machinery, but the error correction needed to get to a succesful parse is quite different.

I believe there are however ways of influencing the error correction. Indeed you could define primitives in a different way, but instead of doing pFail you can probably also insert some cost saying that this is a valid alternative but to a certain cost. It has been a while since I used this library though. Alternatively, the set of natural numbers up to a certain limit can be describe as a regular language, so you could construct an applicative parser for them, which is kind of similar to your pUpTo example.

@noughtmare
Copy link
Contributor Author

Alternatively, the set of natural numbers up to a certain limit can be describe as a regular language

Oh yeah, it is even a finite language, so you could just list all options. And that could be optimized similarly to my pUpTo function by looking at each digit individually. But, I guess my larger question concerns cases where this is not possible.

instead of doing pFail you can probably also insert some cost saying that this is a valid alternative but to a certain cost.

This is more what I intended with the question. I was looking for a way to instruct the library to perform certain corrections. In the original example you can see that the library tries to insert 0, which is not a good idea, so I was wondering if I could instead tell the library in some way to remove or change existing digits.

@jbransen
Copy link
Member

so I was wondering if I could instead tell the library in some way to remove or change existing digits.

Yes, by showing what format would be valid, e.g. by listing all options or doing something like pUpTo. I don't think you can expect to have anything better in the context of monadic parsing. When you build your parser as such, the pNatural has consumed part of its input, and then you use the result to decide what to do next, but trying to send data back to the pNatural for backtracking is not possible.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants