Skip to content

Commit

Permalink
Speed up parsing deeply nested syntax by caching bad routes (fixes #42)
Browse files Browse the repository at this point in the history
Also removed the max cycles stop-gap, allowing much more complex pages
to be parsed quickly without losing nodes at the end

Also fixes #65, fixes #102, fixes #165, fixes #183
Also fixes #81 (Rafael Nadal parsing bug)
Also fixes #53, fixes #58, fixes #88, fixes #152 (duplicate issues)
  • Loading branch information
earwig committed Jun 23, 2017
1 parent 08e5f7e commit 8a9c922
Show file tree
Hide file tree
Showing 15 changed files with 1,337 additions and 53 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,10 @@ v0.5 (unreleased):
contained within another Wikicode object.
- Added Wikicode.get_ancestors() and Wikicode.get_parent() to find all
ancestors and the direct parent of a Node, respectively.
- Fixed a long-standing performance issue with deeply nested, invalid syntax
(issue #42). The parser should be much faster on certain complex pages. The
"max cycle" restriction has also been removed, so some situations where
templates at the end of a page were being skipped are now resolved.
- Made Template.remove(keep_field=True) behave more reasonably when the
parameter is already empty.
- Added the keep_template_params argument to Wikicode.strip_code(). If True,
Expand Down
2 changes: 1 addition & 1 deletion LICENSE
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
Copyright (C) 2012-2016 Ben Kurtovic <ben.kurtovic@gmail.com>
Copyright (C) 2012-2017 Ben Kurtovic <ben.kurtovic@gmail.com>

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
Expand Down
9 changes: 7 additions & 2 deletions docs/changelog.rst
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,11 @@ Unreleased
object.
- Added :meth:`.Wikicode.get_ancestors` and :meth:`.Wikicode.get_parent` to
find all ancestors and the direct parent of a :class:`.Node`, respectively.
- Fixed a long-standing performance issue with deeply nested, invalid syntax
(`issue #42 <https://github.com/earwig/mwparserfromhell/issues/42>`_). The
parser should be much faster on certain complex pages. The "max cycle"
restriction has also been removed, so some situations where templates at the
end of a page were being skipped are now resolved.
- Made :meth:`Template.remove(keep_field=True) <.Template.remove>` behave more
reasonably when the parameter is already empty.
- Added the *keep_template_params* argument to :meth:`.Wikicode.strip_code`.
Expand Down Expand Up @@ -54,7 +59,7 @@ v0.4.3
v0.4.2
------

`Released July 30, 2015 <https://github.com/earwig/mwparserfromhell/tree/v0.4.2>`_
`Released July 30, 2015 <https://github.com/earwig/mwparserfromhell/tree/v0.4.2>`__
(`changes <https://github.com/earwig/mwparserfromhell/compare/v0.4.1...v0.4.2>`__):

- Fixed setup script not including header files in releases.
Expand All @@ -63,7 +68,7 @@ v0.4.2
v0.4.1
------

`Released July 30, 2015 <https://github.com/earwig/mwparserfromhell/tree/v0.4.1>`_
`Released July 30, 2015 <https://github.com/earwig/mwparserfromhell/tree/v0.4.1>`__
(`changes <https://github.com/earwig/mwparserfromhell/compare/v0.4...v0.4.1>`__):

- The process for building Windows binaries has been fixed, and these should be
Expand Down
6 changes: 5 additions & 1 deletion mwparserfromhell/parser/contexts.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012-2016 Ben Kurtovic <ben.kurtovic@gmail.com>
# Copyright (C) 2012-2017 Ben Kurtovic <ben.kurtovic@gmail.com>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
Expand Down Expand Up @@ -100,6 +100,8 @@
* :const:`TABLE_TH_LINE`
* :const:`TABLE_CELL_LINE_CONTEXTS`
* :const:`HTML_ENTITY`
Global contexts:
* :const:`GL_HEADING`
Expand Down Expand Up @@ -176,6 +178,8 @@
TABLE = (TABLE_OPEN + TABLE_CELL_OPEN + TABLE_CELL_STYLE + TABLE_ROW_OPEN +
TABLE_TD_LINE + TABLE_TH_LINE)

HTML_ENTITY = 1 << 37

# Global contexts:

GL_HEADING = 1 << 0
Expand Down
Loading

0 comments on commit 8a9c922

Please # to comment.