retain authored HTML for empty elements #95
Comments
Note that “retain” is actually the wrong behaviour. You should normalise to having no trailing solidus. The microformats parsing specification for
BS4 recently added an "html5" formatter that apparently does this: I propose switching output to that, since we do not make any other promise about the HTML output AFAIK?
BeautifulSoup 4.6.2 introduced the ability to control the slashes in void elements through formatters. This adds a formatter that does this, but otherwise only does minimal encoding (as the previous default formatter did).
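To illustrate the two serializations being discussed, here is a minimal sketch. It assumes a BeautifulSoup release that ships the "html5" formatter mentioned above and uses the stdlib `html.parser` backend; the sample markup is invented for the example and this is not the code from the PR.

```python
from bs4 import BeautifulSoup

# Invented sample markup with two void elements.
authored = "<p>line one<br>line two</p><hr>"
soup = BeautifulSoup(authored, "html.parser")

# "minimal" is the default formatter: void elements get a trailing solidus.
xhtml_style = soup.decode(formatter="minimal")

# "html5" writes void elements without the trailing solidus.
html5_style = soup.decode(formatter="html5")

print(xhtml_style)
print(html5_style)
```

With the "html5" formatter the output should match the authored markup for void elements, which is what the issue asks for.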
@snarfed While investigating another issue I discovered today that granary appears to rely on mf2py to produce somewhat XHTML-compatible output, which this would break. We could expose the HTML formatter in the API to allow granary to force the old behavior? EDIT: alternatively, some variation of #97 that allows granary to force the serialization downstream, which might give it more options to generate proper XML.
thanks for the heads up! i'm not entirely sure how this would affect granary yet, but i can always update it. let me know when you have an mf2py PR you want me to test!
#136 is said PR. Granary produces Atom feeds with the content declared to be XHTML and, unless I missed something, taken straight from mf2 parsing when turning an mf2 HTML feed into Atom. Not closing void elements isn't allowed in XHTML as far as I know, and would make those feeds invalid?
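The concern can be checked with the standard library alone. This is a rough sketch, assuming the e-content is dropped into an XHTML `div` wrapper as Atom's XHTML content type requires; `is_well_formed_xhtml` is a hypothetical helper, and it only tests XML well-formedness, not full XHTML validity.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def is_well_formed_xhtml(fragment: str) -> bool:
    """Return True if `fragment` parses as XML inside an XHTML <div> wrapper."""
    wrapped = f'<div xmlns="{XHTML_NS}">{fragment}</div>'
    try:
        ET.fromstring(wrapped)
    except ET.ParseError:
        return False
    return True


print(is_well_formed_xhtml("one<br/>two"))  # self-closed void element
print(is_well_formed_xhtml("one<br>two"))   # unclosed void element
```

The unclosed `<br>` leaves the XML parser with a mismatched tag, so a feed embedding it as XHTML would not parse.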
thanks! yeah, i got that part, i just don't remember the exact transformation steps in granary. no matter, i'll try it and see.
@sknebel you're right. thanks for thinking of granary! it does pass the HTML content pretty much straight through to Atom. ideally yes, i'd love a flag or exposed HTML formatter in mf2py so i can control this in granary. i tested just now though, and if i change the Atom to
If you use html5lib you can reserialize with options turned on - I think use_trailing_solidus is the one to autoclose null elements; I don't know that it can guarantee full XHTML compliance though, which is what you need to inline them in Atom.
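A minimal sketch of that reserialization, assuming html5lib is installed and its default etree tree type is used. `omit_optional_tags=False` is set here only so closing tags such as `</p>` are kept; as noted above, this does not by itself guarantee XHTML compliance.

```python
import html5lib

# Invented sample fragment containing a void element.
fragment = html5lib.parseFragment("<p>one<br>two</p>")

# Default serializer behaviour: no trailing solidus on void elements.
default_out = html5lib.serialize(fragment, omit_optional_tags=False)

# With use_trailing_solidus, void elements are written self-closed.
closed_out = html5lib.serialize(
    fragment,
    omit_optional_tags=False,
    use_trailing_solidus=True,
)

print(default_out)
print(closed_out)
```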
Currently mf2py, due to using BeautifulSoup, closes empty HTML tags, e.g. `<br>` gets converted to `<br/>` and `<hr>` gets converted into `<hr/>`. This makes the `e-content[html]` different from the authored one.
This does not seem to be an issue in actual use but will be for any tests. So I am documenting this here.
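For tests that compare parsed `e-content[html]` against authored markup, one workaround is to normalise both sides before comparing. The sketch below is an assumption of how such a helper might look, not part of mf2py; `strip_void_solidus` is a hypothetical name, and the regex is deliberately simple (it does not handle a `>` inside attribute values).

```python
import re

# HTML void elements whose trailing solidus should be ignored in comparisons.
VOID_ELEMENTS = (
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
)

# Matches e.g. <br/>, <hr />, <img src="x"/> and captures name and attributes.
_VOID_CLOSE = re.compile(
    r"<(%s)\b([^<>]*?)\s*/>" % "|".join(VOID_ELEMENTS),
    re.IGNORECASE,
)


def strip_void_solidus(html: str) -> str:
    """Rewrite self-closed void elements without the trailing solidus."""
    return _VOID_CLOSE.sub(r"<\1\2>", html)


authored = "one<br>two<hr>"
serialized = "one<br/>two<hr/>"
print(strip_void_solidus(serialized) == strip_void_solidus(authored))
```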
Details
html5lib does not do this by default; see: https://github.com/html5lib/html5lib-python/blob/5e6b61b4630165dd4765fff41d0f855534d5e2fe/html5lib/serializer.py#L114
The relevant lines in BeautifulSoup which explicitly do this are https://github.com/waylan/beautifulsoup/blob/480367ce8c8a4d1ada3012a95f0b5c2cce4cf497/bs4/element.py#L1106-L1107 (note that this is not the canonical source for BS4).