When generating documentation, rustdoc appears to percent-encode `~` in link destinations. For an example, see moka 0.8.3. At the bottom of the crate documentation is a link with the title "hierarchical timer wheel". The href in the HTML is `http://www.cs.columbia.edu/%7Enahum/w6998/papers/ton97-timing-wheels.pdf` (note the `%7E`), whereas the source is `//! [timer-wheel]: http://www.cs.columbia.edu/~nahum/w6998/papers/ton97-timing-wheels.pdf`.
RFC 1738 declared `~` to be an "unsafe" character, but it was obsoleted 17 years ago by RFC 3986, which explicitly lists `~` as an unreserved character and says that unreserved characters should not be percent-encoded.
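To make the RFC 3986 point concrete, here is a minimal sketch of the unreserved set from §2.3 (ALPHA, DIGIT, `-`, `.`, `_`, `~`). The helper names `is_unreserved` and `decode_unreserved` are hypothetical and are not part of rustdoc or pulldown-cmark; the second one only illustrates that `%7E` and `~` denote the same URL under the RFC.

```rust
/// Returns true for the RFC 3986 section 2.3 unreserved characters:
/// ALPHA / DIGIT / "-" / "." / "_" / "~".
fn is_unreserved(byte: u8) -> bool {
    byte.is_ascii_alphanumeric() || matches!(byte, b'-' | b'.' | b'_' | b'~')
}

/// Hypothetical helper: rewrites percent-encoded unreserved characters
/// (such as "%7E") back to their literal form and leaves everything else alone.
fn decode_unreserved(url: &str) -> String {
    let bytes = url.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        // A percent escape needs two more bytes after the '%'.
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            let hi = (bytes[i + 1] as char).to_digit(16);
            let lo = (bytes[i + 2] as char).to_digit(16);
            if let (Some(hi), Some(lo)) = (hi, lo) {
                let decoded = (hi * 16 + lo) as u8;
                if is_unreserved(decoded) {
                    out.push(decoded);
                    i += 3;
                    continue;
                }
            }
        }
        // Anything else, including escapes of reserved characters, is copied as-is.
        out.push(bytes[i]);
        i += 1;
    }
    String::from_utf8(out).expect("only ASCII bytes were substituted")
}
```

Applied to the href from the moka documentation, `decode_unreserved` gives back the URL as it was written in the source, while an escape such as `%2F` (a reserved `/`) is left untouched.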
I've also noticed that rustdoc percent-encodes `^`, which is annoying when trying to use a link like https://docs.rs/parking_lot/^0.12/parking_lot/type.Mutex.html, as the result looks ugly. RFC 3986 disallows `^` inside URLs, but the HTML5 spec extends the URL syntax to add `^` to the set of unreserved characters (along with other characters that RFC 3986 omitted). As such, rustdoc should target HTML5's notion of what constitutes a valid URL rather than RFC 3986's definition, as the URLs it produces will be parsed according to the HTML spec.
More generally, rustdoc should attempt to preserve the URL as it was written to the extent possible. This may in fact mean not adding any percent-encoding at all, as the URL is written directly in the markdown and RFC 3986 §2.4 specifies that under normal circumstances, URL-encoding should only be done when producing a URL from its component parts. As rustdoc is not producing a URL from component parts it should probably just leave the URL alone.
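As a rough sketch of what "leave the URL alone" could look like, the function below escapes only the characters that matter inside an HTML attribute value and performs no percent-encoding at all. The name `href_attribute` is hypothetical, and this is an assumption about one possible approach, not how rustdoc or pulldown-cmark actually render links.

```rust
/// Hypothetical sketch: prepares a URL for use inside an HTML attribute by
/// escaping HTML-significant characters only. No percent-encoding is applied,
/// so characters such as '~' and '^' pass through exactly as written.
fn href_attribute(url: &str) -> String {
    let mut out = String::with_capacity(url.len());
    for ch in url.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#x27;"),
            _ => out.push(ch),
        }
    }
    out
}
```

With this approach both example links from this issue would come out unchanged, and only something like a `&` in a query string would be rewritten (to `&amp;`), which the HTML parser turns back into `&` before the URL is interpreted.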
Meta
This occurs both in Rust 1.60.0 and in the unstable compiler used by docs.rs (currently 1.63.0-nightly (c52b9c10b 2022-05-16)).
rustdoc uses pulldown-cmark for its Markdown parsing and rendering.
As I understand it, the library uses a very simple and somewhat naive algorithm to decide which characters to encode, based on this table: https://github.com/raphlinus/pulldown-cmark/blob/9bfba94ca849c7d9d75b53ba1f505761954e6290/src/escape.rs#L29-L38, where 1 represents true and the table follows ASCII order.
We can clearly see in row 7, entry 14 of the table, counting from zero (~ = 0x7E = 126 => 126/16 = 7.875 = 7 + (14/16)), that the entry is 1 instead of 0. The fix is therefore simply to set it to 0.
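The index arithmetic above can be sketched as follows, assuming the table is laid out as 8 rows of 16 entries in ASCII order. The function name `table_position` is hypothetical and does not exist in pulldown-cmark.

```rust
/// Splits an ASCII byte into (row, column) for a 128-entry table laid out
/// as 8 rows of 16 entries, with both indices counted from zero.
fn table_position(byte: u8) -> (usize, usize) {
    ((byte / 16) as usize, (byte % 16) as usize)
}
```

For `~` (0x7E = 126) this gives row 7, entry 14, matching the calculation above. The same arithmetic puts `^` (0x5E = 94) at row 5, entry 14.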
The fact that rustdoc encodes this is a problem because it actually breaks links. Case in point: the link from the motivating example is broken by the percent-encoding. It shouldn't be, but not all servers percent-decode paths before interpreting them. If you click on http://www.cs.columbia.edu/%7Enahum/w6998/papers/ton97-timing-wheels.pdf you get a 404, but if you click on the originally specified http://www.cs.columbia.edu/~nahum/w6998/papers/ton97-timing-wheels.pdf it works.