Skip to content

Latest commit

 

History

History
146 lines (123 loc) · 6.72 KB

File metadata and controls

146 lines (123 loc) · 6.72 KB

Is There Some Way to Influence Error Recovery?

One of tree-sitter's notable features is its ability to "recover from errors". Anecdotal evidence suggests that this works well in many cases.

Unfortunately, according to the tree-sitter creator, for the other cases, there does not appear to be much / anything one can do at the moment:

Yeah, there is intentionally not much that you as a grammar author can do to manipulate the error recovery.

though he has mentioned that:

I do think there is a clear path to fixing these things within Tree-sitter, and I do plan to work on this eventually, but it'll take some significant focused time, so it may be a while. [...elided...] I wish I had more time to fix this right now.

Discussion

One of the explanations of error-recovery given by the tree-sitter creator is:

The basic idea is that the parser accepts any string. If the string does not match the grammar, the parser corrects it to match the grammar. Currently, there are two types of corrections:

  1. Skip some number of tokens or subtrees
  2. Insert one missing token

Obviously, for a given syntax error, there are multiple combinations of corrections that would work. The parser handles this decision similarly to how ambiguities are handled by the classic GLR algorithm: the parse stack forks into multiple branches, and tries a different possibility on each branch.

We maintain two quantities associated with each branch of the parse stack:

  • error_cost - an integer measure of the number of subtrees skipped, the total size of the subtrees skipped, and the number of missing tokens inserted.
  • node_count - the number of valid syntax tree nodes that have been added, since the last error.

There is some heuristic logic for deciding when to discard branches of the parse stack based on their error cost and their node count. When the node count is small, it means that we've just recently encountered an error, so we want to allow a fairly large number of different branches to "compete", to see which one will turn out the best. When the node count is large, it means we have progressed far past the error, so we should allow very few different branches (eventually just one), in order to restore good parsing performance.

At one point, he gave a sketch of an approach to providing some "customization":

Yeah, I'd still like to add support for generating error messages in Tree-sitter. I think it would "fit" in fine to the current library; it'd probably be a function that takes an ERROR node and returns some structured information about the location where the error was first detected, and which state the parser was in when the error was detected. From there, you could call another function to iterate over all of the valid symbols in that parse state.

Video Clips

Similar content, but perhaps nice for fans or those looking for tiny hints? (^^;

Papers and the Like

Tree-sitter Issues