Fallback on invalid tags? #102

lahwaacz · 2015-05-31T08:05:07Z

mwparserfromhell fails to parse this page properly. My suspicion is that this is caused by the unescaped tag-like syntax used for various keybindings, e.g. <TAB>, <RET>, <backspace>, <f1>, <f2> etc.
MediaWiki has some elaborate fallbacks for invalid tags so the page is rendered correctly; does mwparserfromhell try to do the same?


earwig · 2015-06-05T04:36:57Z

I'm going to go out on a limb here and say this will be resolved by #42. Keeping this open so we can verify when that's resolved.

timwu · 2015-10-26T21:19:32Z

This issue also appears to affect the .us page from Wikipedia. After some number of <locality>.<state> blobs in the text, the parser appears to skip over wikilinks, identifying them as text.

For my application, it wouldn't be too bad to just skip pages where this parse issue occurs. I'm not quite sure what the best approach to identifying that case is, though.
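
One possible way to identify such pages before parsing is a plain-text scan for tag-like tokens whose names are not on an allow-list. The sketch below is only a heuristic and is not part of mwparserfromhell: `KNOWN_TAGS`, `TAG_LIKE`, and `unknown_tag_names` are hypothetical names, and the allow-list is a small illustrative sample rather than the set of tags MediaWiki or any extension actually accepts.

```python
import re

# Hypothetical allow-list for illustration only; the real set of valid tags
# depends on which MediaWiki extensions are installed.
KNOWN_TAGS = frozenset({"ref", "nowiki", "br", "div", "span", "code", "pre"})

# Matches tag-like tokens such as <TAB>, </ref>, <br/>, or <ref name="x">.
# A "<" followed by whitespace (as in "a < b") is deliberately not matched.
TAG_LIKE = re.compile(r"</?([A-Za-z][A-Za-z0-9_-]*)[^<>]*>")


def unknown_tag_names(text, known_tags=KNOWN_TAGS):
    """Return the sorted, lowercased tag-like names in text not in known_tags."""
    known = {name.lower() for name in known_tags}
    found = {match.group(1).lower() for match in TAG_LIKE.finditer(text)}
    return sorted(found - known)


# A page could be skipped whenever the result is non-empty.
suspicious = unknown_tag_names("Press <TAB> or <RET>; see <ref>docs</ref>.")
```

A non-empty result only means the page contains tag-like text that may confuse the parser; it does not prove that parsing actually went wrong.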

earwig · 2015-10-26T21:22:58Z

Unfortunately, the usual solution of passing skip_style_tags=True to mwparserfromhell.parse() doesn't work there because they're not style tags.

I don't have a simple solution in mind right now. You could muck around in the parser internals to make skip_style_tags skip all tags, if that works for you, but it'll be a bit annoying.

earwig · 2017-06-04T21:34:03Z

To add, this isn't as simple as hard-coding a large list of valid tags like in definitions.py since they are extension-dependent. It would need to be a configuration option.
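
As a rough illustration of what a configurable tag list could look like from the user's side, the sketch below escapes tag-like tokens whose names are not in a caller-supplied set, so that only the listed tags remain as markup. This is a preprocessing workaround under stated assumptions, not the approach mwparserfromhell itself takes: `escape_unknown_tags` and `TAG_LIKE` are hypothetical names, and the function changes the source text, so the result no longer round-trips to the original wikitext.

```python
import re

# Matches tag-like tokens such as <TAB>, </ref>, or <ref name="x">.
TAG_LIKE = re.compile(r"</?([A-Za-z][A-Za-z0-9_-]*)[^<>]*>")


def escape_unknown_tags(text, known_tags):
    """Replace the leading "<" with "&lt;" for tags not named in known_tags."""
    known = {name.lower() for name in known_tags}

    def repl(match):
        if match.group(1).lower() in known:
            return match.group(0)  # keep tags the caller considers valid
        return "&lt;" + match.group(0)[1:]  # neutralize everything else

    return TAG_LIKE.sub(repl, text)


# The caller decides which tags are valid for their wiki's extensions.
cleaned = escape_unknown_tags("Press <TAB>; see <ref>docs</ref>.", {"ref"})
```

The escaped text can then be handed to the parser, with the caveat that the set passed as `known_tags` has to match the extensions installed on the wiki in question.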

earwig self-assigned this Jun 5, 2015

earwig added this to the version 0.5 milestone Jun 5, 2015

earwig added aspect: parser priority: low labels Jun 5, 2015

lahwaacz mentioned this issue Jun 4, 2017

rewrite and extend Caveats #180

Merged

earwig closed this as completed in 8a9c922 Jun 23, 2017

earwig added result: fixed and removed priority: low labels Jun 23, 2017
