mwparserfromhell fails to parse this page properly. My suspicion is that this is caused by the unescaped tag-like syntax used for various keybindings, e.g. <TAB>, <RET>, <backspace>, <f1>, <f2>, etc.
MediaWiki has some elaborate fallbacks for invalid tags, so the page is rendered correctly. Does mwparserfromhell try to do the same?
This issue also appears to affect the .us page on Wikipedia. After some number of <locality>.<state> blobs in the text, the parser appears to skip over wikilinks, identifying them as text.
For my application, it wouldn't be too bad to just skip pages where this parse issue occurs. I'm not quite sure what the best approach to identifying that case is, though.
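One rough way to flag such pages, sketched here as a heuristic rather than anything mwparserfromhell provides: compare the number of "[[" openers in the raw text with the number of wikilinks the parser actually returned. The function name looks_misparsed and the tolerance parameter are made up for illustration, and the raw count will overestimate on pages that contain "[[" inside comments or nowiki sections.

```python
import re

# Hypothetical heuristic, not part of mwparserfromhell:
# count raw "[[" openers and compare with what the parser found.
_LINK_OPEN = re.compile(r"\[\[")


def looks_misparsed(raw_text: str, parsed_link_count: int, tolerance: float = 0.9) -> bool:
    """Return True if the parser found noticeably fewer wikilinks than the raw text suggests.

    `tolerance` is an assumed threshold: the page is flagged when fewer than
    that fraction of the raw "[[" openers were recognised as wikilinks.
    """
    raw_count = len(_LINK_OPEN.findall(raw_text))
    if raw_count == 0:
        # Nothing to compare against, so there is no evidence of a problem.
        return False
    return parsed_link_count < tolerance * raw_count
```

The parsed count could come from len(mwparserfromhell.parse(text).filter_wikilinks()), and pages for which looks_misparsed() returns True could then be skipped.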
Unfortunately, the usual solution of passing skip_style_tags=True to mwparserfromhell.parse() doesn't work here because these aren't style tags.
I don't have a simple solution in mind right now. You could muck around in the parser internals to make skip_style_tags skip all tags, if that works for you, but it'll be a bit annoying.
To add, this isn't as simple as hard-coding a large list of valid tags like in definitions.py since they are extension-dependent. It would need to be a configuration option.
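As a workaround outside the parser, something like the following pre-processing step might help. It is only a sketch: escape_unknown_tags and DEFAULT_KNOWN_TAGS are hypothetical names, and the allowlist is an assumed, incomplete set that would need to be configured per wiki, since valid tags depend on the installed extensions. It escapes any tag-like token whose name is not in the allowlist, so the parser sees plain text instead of an unclosed tag.

```python
import re

# Assumed, incomplete allowlist; a real wiki's valid tags depend on its extensions.
DEFAULT_KNOWN_TAGS = frozenset(
    {"ref", "references", "nowiki", "pre", "code", "span", "div", "br", "gallery", "math"}
)

# Matches tag-like tokens such as <TAB>, </ref> or <br />.
# HTML comments (<!-- ... -->) do not match because the name must start with a letter.
_TAG_LIKE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9-]*)([^<>]*)>")


def escape_unknown_tags(text, known_tags=DEFAULT_KNOWN_TAGS):
    """Replace the angle brackets of unrecognised tag-like tokens with HTML entities."""
    known = {tag.lower() for tag in known_tags}

    def _replace(match):
        if match.group(2).lower() in known:
            # Recognised tag: leave it untouched.
            return match.group(0)
        # Unknown tag: keep the inner text but escape the brackets.
        return "&lt;" + match.group(0)[1:-1] + "&gt;"

    return _TAG_LIKE.sub(_replace, text)
```

The escaped text can then be passed to mwparserfromhell.parse(). Note that this changes the wikitext, so it is only suitable when extracting content rather than round-tripping edits, and it will also escape tag-like text inside nowiki or pre sections.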