Better support for parsing LLM output #355

verhovsky · 2023-12-12T09:05:06Z

cmark-gfm is used by a number of apps that interface with text-generating Large Language Models (LLMs) (this one and this one are the ones I know of). These models produce a few characters of Markdown every 200 ms (on my machine), and cmark-gfm is called repeatedly to render the text generated so far as Markdown. This is inefficient because, as far as I can tell, the entire generated Markdown has to be re-parsed from the beginning for every new token, even though everything except the latest token has already been parsed.

cmark-gfm has a streaming interface, cmark_parser_feed and cmark_parser_finish, but it seems I need to call cmark_parser_finish every time I actually want a parse result, and after that I need to create a new parser; I can't feed more tokens and re-parse. I would have expected a way to call cmark_parser_feed, then something like cmark_parser_parse, then cmark_parser_feed again, or a more complicated interface for editing the parse tree like the one tree-sitter has.
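To make the cost concrete, here is a minimal sketch of the pattern apps end up with today: keep the whole text in a buffer and hand the full buffer to a renderer after every token. Everything in it (`stream_buf`, `stream_push`, `render_fn`, `count_bytes`, `total_reparsed`) is a hypothetical name invented for illustration, not part of cmark-gfm. The renderer is a stand-in that only counts bytes; in a real app it would be the place where a fresh parser is created, fed the whole buffer with cmark_parser_feed, and finished with cmark_parser_finish.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback: renders the whole document generated so far.
   In a real app this would create a new parser, feed it `doc`, finish
   it, render the result, and free everything again. */
typedef void (*render_fn)(const char *doc, size_t len, void *user);

/* Growable buffer holding all text received so far. */
typedef struct {
    char *data;
    size_t len;
    size_t cap;
} stream_buf;

/* Append one token, then re-render the entire buffer.
   Returns 0 on success, -1 on allocation failure. */
static int stream_push(stream_buf *b, const char *tok,
                       render_fn render, void *user) {
    size_t n = strlen(tok);
    if (b->len + n + 1 > b->cap) {
        size_t cap = b->cap ? b->cap : 64;
        while (cap < b->len + n + 1) cap *= 2;
        char *p = realloc(b->data, cap);
        if (!p) return -1;
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, tok, n);
    b->len += n;
    b->data[b->len] = '\0';
    render(b->data, b->len, user);
    return 0;
}

/* Stand-in renderer: only adds up how many bytes it was asked to parse. */
static void count_bytes(const char *doc, size_t len, void *user) {
    (void)doc;
    *(size_t *)user += len;
}

/* Total number of bytes handed to the renderer while streaming `count` tokens. */
static size_t total_reparsed(const char *const *tokens, size_t count) {
    stream_buf b = {0};
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        if (stream_push(&b, tokens[i], count_bytes, &total) != 0) break;
    }
    free(b.data);
    return total;
}
```

With three 2-byte tokens the renderer sees 2 + 4 + 6 = 12 bytes for 6 bytes of output, so the work grows quadratically with the number of tokens. A feed-then-parse-then-feed interface would only have to look at the new bytes.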

Also, while we're at it, the other issue is that the rendering isn't stable while the parser hasn't yet seen the entire input. Namely, a trailing single backtick ` should open a code span that runs until the end of the line/input even if there's no closing backtick yet. The way it is now leads to jitter in the UI: the UI first prints a literal backtick, and a few seconds later, when the LLM generates the closing backtick, removes it and re-renders everything after it in monospace. This is also a problem for horizontal rules and bold/italic, though the latter definitely isn't doable because many people use single * characters for multiplication.
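Until something like this exists in the parser, an app could approximate the backtick behaviour by patching the partial text before handing it to cmark-gfm. The sketch below is only a heuristic under stated assumptions: `close_dangling_backtick` is a hypothetical helper, not a cmark-gfm function. It looks at the last line only, treats single backticks as code span delimiters, skips backslash-escaped characters outside a span, leaves runs of two or more backticks alone, and does not know whether the line sits inside a fenced code block.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of `partial`. If the last line contains an
   unclosed single-backtick span with at least one character after the opening
   backtick, a closing backtick is appended. The caller frees the result.
   Returns NULL on allocation failure. */
static char *close_dangling_backtick(const char *partial) {
    size_t len = strlen(partial);
    const char *nl = strrchr(partial, '\n');
    size_t i = nl ? (size_t)(nl - partial) + 1 : 0;  /* start of last line */
    int in_code = 0;
    size_t open_at = 0;

    while (i < len) {
        char c = partial[i];
        if (!in_code && c == '\\' && i + 1 < len) {
            i += 2;                 /* backslash escape outside a code span */
            continue;
        }
        if (c == '`') {
            size_t run = 1;
            while (i + run < len && partial[i + run] == '`') run++;
            if (run == 1) {         /* single backtick toggles the span */
                if (!in_code) open_at = i;
                in_code = !in_code;
            }
            i += run;               /* longer runs are left untouched */
            continue;
        }
        i++;
    }

    /* A bare trailing backtick is not closed: appending one would turn it
       into a two-backtick run rather than a code span. */
    int need_close = in_code && open_at + 1 < len;
    char *out = malloc(len + (need_close ? 1 : 0) + 1);
    if (!out) return NULL;
    memcpy(out, partial, len);
    if (need_close) out[len++] = '`';
    out[len] = '\0';
    return out;
}
```

For example, the partial text "call `foo" would be passed to the parser as "call `foo`", so the span is already rendered in monospace while the model is still generating it. Text whose backticks are already balanced comes back unchanged.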

nerocui pushed a commit to nerocui/cmark-gfm that referenced this issue May 29, 2024

Add MAX_INDENT for xml.

f7e31f8

Otherwise we can get quadratic increase in size with deeply nested structures. See github#355.
