Fallback on invalid tags? #102

Closed
lahwaacz opened this issue May 31, 2015 · 4 comments

Comments

@lahwaacz
Contributor

mwparserfromhell fails to parse this page properly. My suspicion is that this is because of the unescaped tag-like syntax used for various keybindings, e.g. <TAB>, <RET>, <backspace>, <f1>, <f2>, etc.
MediaWiki has some elaborate fallbacks for invalid tags, so the page is rendered correctly. Does mwparserfromhell try to do the same?
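Roughly speaking, MediaWiki's fallback renders a tag whose name it does not recognize as literal text. The sketch below approximates that outside the parser by escaping the `<` of any unrecognized tag before the text reaches mwparserfromhell. It is an assumption-laden workaround, not part of mwparserfromhell. `KNOWN_TAGS` and `escape_unknown_tags` are hypothetical names, and the allowlist is a small hand-picked sample:

```python
import re

# Hypothetical, deliberately small allowlist; a real wiki recognizes many more tags.
KNOWN_TAGS = frozenset({"b", "i", "br", "div", "span", "ref", "references", "nowiki", "pre"})

# Matches an opening, closing, or self-closing tag and captures its name in group 2.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^<>]*)>")


def escape_unknown_tags(text, known=KNOWN_TAGS):
    """Replace '<' with '&lt;' for tag-like tokens whose name is not in `known`."""

    def repl(match):
        name = match.group(2).lower()
        if name in known:
            return match.group(0)  # leave recognized tags untouched
        return "&lt;" + match.group(0)[1:]  # neutralize the opening bracket

    return TAG_RE.sub(repl, text)


print(escape_unknown_tags("Press <TAB> or <RET>, then see <ref>x</ref>."))
# Press &lt;TAB> or &lt;RET>, then see <ref>x</ref>.
```

Because it works on raw text with a regex, it has no notion of context such as `<nowiki>` sections or HTML comments. Treat it as a stopgap rather than a fix.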

@earwig
Owner

earwig commented Jun 5, 2015

I'm going to go out on a limb here and say this will be resolved by #42. Keeping this open so we can verify when that's resolved.

@earwig earwig self-assigned this Jun 5, 2015
@earwig earwig added this to the version 0.5 milestone Jun 5, 2015
@timwu

timwu commented Oct 26, 2015

This issue also appears to affect the .us page on Wikipedia. After some number of <locality>.<state> blobs in the text, the parser appears to skip over wikilinks, treating them as plain text.

For my application, it wouldn't be too bad to just skip pages where this parse issue occurs. I'm not quite sure what the best approach to identifying that case would be, though.
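One cheap way to decide whether to skip a page is a pre-check before parsing that flags tag-like tokens whose names are not in an allowlist. This is a heuristic sketch, not something mwparserfromhell provides. `KNOWN_TAGS`, `unknown_tag_names` and `should_skip` are hypothetical names, and the allowlist contents are an assumption:

```python
import re

# Hypothetical allowlist; adjust it to the tags your target wiki actually supports.
KNOWN_TAGS = frozenset({"b", "i", "br", "div", "span", "ref", "references", "nowiki", "pre"})

# Captures the name of any opening, closing, or self-closing tag-like token.
TAG_NAME_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)[^<>]*>")


def unknown_tag_names(text, known=KNOWN_TAGS):
    """Return the sorted tag-like names in `text` that are not in `known`."""
    names = {m.group(1).lower() for m in TAG_NAME_RE.finditer(text)}
    return sorted(names - known)


def should_skip(text, known=KNOWN_TAGS):
    """Heuristic: skip a page if it contains any unrecognized tag-like token."""
    return bool(unknown_tag_names(text, known))


page = "Towns such as <locality>.<state> are listed at [[Example]]."
print(unknown_tag_names(page))  # ['locality', 'state']
print(should_skip(page))  # True
```

This errs on the side of skipping. Tag-like text inside `<nowiki>` or comments is flagged too, even though it would not confuse the parser.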

@earwig
Owner

earwig commented Oct 26, 2015

Unfortunately, the usual solution of passing skip_style_tags=True to mwparserfromhell.parse() doesn't work there, because they're not style tags.

I don't have a simple solution in mind right now. You could muck around in the parser internals to make skip_style_tags skip all tags, if that works for you, but it'll be a bit annoying.

@earwig
Owner

earwig commented Jun 4, 2017

To add to this: it isn't as simple as hard-coding a large list of valid tags like in definitions.py, since the set of valid tags is extension-dependent. It would need to be a configuration option.
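A configuration option like that might be shaped as a per-wiki object that combines a core tag set with the tags registered by installed extensions. This is a hypothetical sketch, not mwparserfromhell's API. `TagConfig` and `CORE_TAGS` are invented names, and the core set is an illustrative subset. The extension tags shown (`ref`/`references` from Cite, `math` from Math, `syntaxhighlight` from SyntaxHighlight) are examples of tags that exist only when those extensions are installed:

```python
from dataclasses import dataclass

# Hypothetical, illustrative subset of HTML tags accepted by MediaWiki core.
CORE_TAGS = frozenset({"b", "i", "u", "s", "br", "div", "span", "p", "table", "tr", "td", "th"})


@dataclass(frozen=True)
class TagConfig:
    """Hypothetical per-wiki configuration of which tag names count as valid."""

    core_tags: frozenset = CORE_TAGS
    # Tags registered by installed extensions; these differ from wiki to wiki.
    extension_tags: frozenset = frozenset()

    def is_valid(self, name):
        """Case-insensitive check against core and extension tags."""
        name = name.lower()
        return name in self.core_tags or name in self.extension_tags


wikipedia_like = TagConfig(extension_tags=frozenset({"ref", "references", "math", "syntaxhighlight"}))
bare_wiki = TagConfig()

print(wikipedia_like.is_valid("ref"))  # True
print(bare_wiki.is_valid("ref"))  # False
print(wikipedia_like.is_valid("locality"))  # False
```

Keeping the extension tags separate from the core set makes it easy to describe each target wiki with its own `TagConfig` instead of maintaining one global list.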

3 participants