wbr element shouldn't be balanced #488
Hmm, yeah, I can reproduce. `wbr` is listed as a self-closing tag in bleach/bleach/_vendor/html5lib/html5parser.py (lines 964 to 965 at a06cd77) and should get `token["selfClosingAcknowledged"] = True`, but I get
at https://github.com/mozilla/bleach/blob/master/bleach/sanitizer.py#L271, so I'm thinking one of these things might be going on:
but I'll need to find more time to look into it further.
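For background on the `selfClosingAcknowledged` flag mentioned above, here is a simplified model of how an HTML5 tree builder treats a start tag with a trailing solidus (e.g. `<wbr/>`). The token dicts only loosely mirror html5lib's token shape; `process_start_tag`, `VOID_ELEMENTS`, and the return strings are hypothetical names for illustration, not html5lib's actual code.

```python
# Hypothetical model, NOT html5lib's implementation: void elements are inserted
# and immediately popped, so a trailing "/" on them is acknowledged; on any
# other element it is a parse error and the element stays open.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
})


def process_start_tag(token, errors):
    """Return what happens to the element; record parse errors in `errors`."""
    name = token["name"]
    if name in VOID_ELEMENTS:
        # Void elements never have children or an end tag, so "/>" is harmless.
        token["selfClosingAcknowledged"] = True
        return "inserted-and-popped"
    if token.get("selfClosing") and not token.get("selfClosingAcknowledged"):
        # A trailing solidus on a non-void element is ignored and reported.
        errors.append("non-void-element-with-trailing-solidus: " + name)
    return "left-open"
```

In this model the parser side handles `wbr` correctly, which fits the observation that the element is recognized as self-closing during parsing; any stray `</wbr>` would then have to come from a later stage such as serialization.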
OK, this is a bug in html5lib (v1.1 at least):

```
» python
Python 3.8.2 (default, Mar 26 2020, 12:39:19)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import bleach._vendor.html5lib as html5lib
>>> html5lib.__version__
'1.1'
>>> html5lib.serialize(html5lib.parseFragment('<area>')) # this is correct
'<area>'
>>> html5lib.serialize(html5lib.parseFragment('<wbr>')) # should be <wbr>
'<wbr></wbr>'
>>> html5lib.serialize(html5lib.parseFragment('<keygen>')) # HTML 5.2 deprecates the tag
'<keygen></keygen>'
>>> html5lib.serialize(html5lib.parseFragment('<menuitem>')) # https://github.com/html5lib/html5lib-python/issues/203 mentions this but https://developer.mozilla.org/en-US/docs/Web/HTML/Element/menuitem shows non-void examples and says HTML 5.2 deprecates it
'<menuitem></menuitem>'
```

The upstream issue is html5lib/html5lib-python#203. Not sure what html5lib's position on deprecated elements is.
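The session above shows the serializer emitting an end tag for `wbr` while handling `area` correctly. A minimal sketch of how that can happen, assuming the serializer decides per tag name from a table of void elements: if a tag is missing from the table, it gets serialized like a normal element with an end tag. `serialize`, `VOID_ELEMENTS`, and the `(tag, attrs, children)` tree shape are hypothetical and are not html5lib's API.

```python
from html import escape

# Hypothetical void-element table; removing "wbr" from it reproduces the
# "<wbr></wbr>" symptom seen in the session above.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
})


def serialize(node, void_elements=VOID_ELEMENTS):
    """Serialize a (tag, attrs, children) tree; plain strings are text nodes."""
    if isinstance(node, str):
        return escape(node, quote=False)
    tag, attrs, children = node
    attr_text = "".join(
        f' {name}="{escape(value)}"' for name, value in attrs.items()
    )
    if tag in void_elements:
        # Void elements have no content and must not get an end tag.
        return f"<{tag}{attr_text}>"
    inner = "".join(serialize(child, void_elements) for child in children)
    return f"<{tag}{attr_text}>{inner}</{tag}>"
```

For example, `serialize(("p", {}, ["long", ("wbr", {}, []), "word"]))` gives `<p>long<wbr>word</p>`, while passing `VOID_ELEMENTS - {"wbr"}` as the table makes the same `wbr` node come out as `<wbr></wbr>`.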
This is now addressed in html5lib:
Waiting on an html5lib release with this fix. Then we can update the vendored html5lib and test everything.
The `<wbr>` element is balanced by `bleach.clean` even though it is an empty element. Using the list of empty tags from MDN:
The output includes `<wbr></wbr>` when it should just be `<wbr>`, like the others. `keygen` has the same problem, but that's deprecated, so I'm not sure if it's worth including.
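To check cleaned output for the `<wbr></wbr>` symptom without eyeballing it, a small stdlib-only checker can flag any end tag that appears for a void element. The `VOID_ELEMENTS` set is an assumption modeled on the WHATWG/MDN void-element list (deprecated `keygen` is left out); `VoidEndTagFinder` and `stray_void_end_tags` are hypothetical helper names.

```python
from html.parser import HTMLParser

# Assumed void-element list (WHATWG/MDN); add "keygen" to also check it.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
})


class VoidEndTagFinder(HTMLParser):
    """Collect end tags that should never appear for void elements."""

    def __init__(self):
        super().__init__()
        self.stray_end_tags = []

    def handle_startendtag(self, tag, attrs):
        # "<wbr/>" is a start tag with a trailing solidus, not an end tag;
        # the default implementation would also call handle_endtag.
        pass

    def handle_endtag(self, tag):
        # html.parser passes tag names lowercased.
        if tag in VOID_ELEMENTS:
            self.stray_end_tags.append(tag)


def stray_void_end_tags(markup):
    """Return the void-element end tags found in `markup`, in order."""
    finder = VoidEndTagFinder()
    finder.feed(markup)
    finder.close()
    return finder.stray_end_tags
```

Running it on the string returned by `bleach.clean` should give an empty list once `wbr` is serialized as a void element; with the current behavior it would report `["wbr"]`.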