C tokenizer exited with BAD_ROUTE. #165

Closed
halfak opened this issue Sep 23, 2016 · 1 comment

halfak commented Sep 23, 2016

Error while processing 2006 French Open – Girls' Singles @ 685719491:

ParserError: This is a bug and should be reported. Info: C tokenizer exited with BAD_ROUTE.
Traceback (most recent call last):
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/revscoring/dependencies/functions.py", line 244, in _solve
    value = dependent(*args)
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/revscoring/dependencies/dependent.py", line 52, in __call__
    return self.process(*args, **kwargs)
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/revscoring/features/wikitext/datasources/parsed.py", line 210, in _process_wikicode
    return mwparserfromhell.parse(text)
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/mwparserfromhell/utils.py", line 58, in parse_anything
    return Parser().parse(value, context, skip_style_tags)
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/mwparserfromhell/parser/__init__.py", line 93, in parse
    tokens = self._tokenizer.tokenize(text, context, skip_style_tags)
mwparserfromhell.parser.ParserError: This is a bug and should be reported. Info: C tokenizer exited with BAD_ROUTE.
$ python
Python 3.4.3 (default, Oct 14 2015, 20:28:29) 
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mwparserfromhell
>>> mwparserfromhell.__version__
'0.4.3'
>>> import mwapi
>>> text = mwapi.Session("https://en.wikipedia.org").get(action='query', prop='revisions', revids=685719491, rvprop=['content'], formatversion=2)['query']['pages'][0]['revisions'][0]['content']
Sending requests with default User-Agent.  Set 'user_agent' on mwapi.Session to quiet this message.
>>> wikicode = mwparserfromhell.parse(text)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/mwparserfromhell/utils.py", line 58, in parse_anything
    return Parser().parse(value, context, skip_style_tags)
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/mwparserfromhell/parser/__init__.py", line 93, in parse
    tokens = self._tokenizer.tokenize(text, context, skip_style_tags)
mwparserfromhell.parser.ParserError: This is a bug and should be reported. Info: C tokenizer exited with BAD_ROUTE.
@lahwaacz
Contributor

Most likely a duplicate of #40 or #65; the skip_style_tags=True workaround works here as well.
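
For anyone hitting this in a pipeline, the workaround could be wrapped roughly as below. This is only a sketch: `parse_with_fallback` and its `parse`/`error` parameters are hypothetical names made up for illustration, not part of mwparserfromhell. It assumes `mwparserfromhell.parse` accepts `skip_style_tags` and that `mwparserfromhell.parser.ParserError` is the exception raised, as the traceback above suggests.

```python
def parse_with_fallback(text, parse=None, error=Exception):
    """Parse wikitext; if the tokenizer fails, retry with skip_style_tags=True.

    `parse` and `error` are hypothetical hooks so a stub parser can be
    swapped in; by default the real mwparserfromhell parser is used.
    """
    if parse is None:
        # Imported lazily so the helper can be exercised without the library.
        import mwparserfromhell
        from mwparserfromhell.parser import ParserError
        parse, error = mwparserfromhell.parse, ParserError
    try:
        # First attempt: normal parse, with ''italic'' / '''bold''' as tags.
        return parse(text)
    except error:
        # Fallback: treat '' and ''' as plain text instead of style tags.
        return parse(text, skip_style_tags=True)
```

With the fallback, bold and italic markup in the affected revision is left as plain text rather than parsed into tag nodes, so any code that relies on those tags would need to account for that.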
