Reference-style hyperlinks #11032

Open
knowler opened this issue Feb 15, 2025 · 9 comments
Labels
addition/proposal New features or enhancements needs implementer interest Moving the issue forward requires implementers to express interest

Comments

@knowler

knowler commented Feb 15, 2025

What problem are you trying to solve?

Raw URLs within regular text content make that content more difficult to read and therefore write, edit, and maintain.

Example:

<p>Here’s an <a href="https://example.com/painfully/long/link/maybe/with/gibberish">example</a>. Here’s <a href="https://example.com/another/painfully/long/link/maybe/with/gibberish">another one</a>.</p>

Often this is a contributing reason that authors turn to “simpler” markup languages that compile to HTML, namely Markdown. Unfortunately, languages such as Markdown lose fidelity, since their syntax only represents a subset of HTML, and authors must fall back to HTML within the Markdown document to avoid that loss (e.g. to set the cite attribute on a quotation, one must write an HTML <blockquote> element instead of using Markdown’s blockquote syntax).

I won’t digress into the unstandardized mess that is Markdown, but what I’m trying to highlight is that just using HTML itself becomes more appealing at a certain point when dealing with some of these issues and I think it’s worth addressing some of the reasons why folks turn to alternatives like Markdown in the first place, improving the HTML authoring experience.

What solutions exist today?

Within HTML, there are no solutions without abusing <form> elements but at that point it’s not semantically a link anymore.

Incredibly hacky workaround with incorrect semantics
<p>Here’s an <button form=example>example</button>. Here’s <button form=another>another one</button>.</p>

<form id=example action="https://example.com/painfully/long/link/maybe/with/gibberish"></form>
<form id=another action="https://example.com/another/painfully/long/link/maybe/with/gibberish"></form>

How would you solve it?

Markdown has a syntax feature, one might even say its killer feature, that solves this issue: “reference-style links”, which allow a link within content to use an identifier that’s later mapped to a URL outside of the content. This also means that multiple links can reference the same identifier.

Here’s an [example][example]. Here’s [another one][another].

[example]: https://example.com/painfully/long/link/maybe/with/gibberish
[another]: https://example.com/another/painfully/long/link/maybe/with/gibberish

I propose that HTML adds a syntax for reference-style links. I’d use an attribute on the <a> element, perhaps the name attribute (maybe too much historical baggage?). If its value references some element with a matching id, perhaps a <link> element (maybe with some rel attribute type), then that <a> element should become a link with an implicit href of the referenced URL.

<!-- `link` attribute here isn’t a full proposal. -->
<p>Here’s an <a link=example>example</a>. Here’s <a link=another>another one</a>.</p>

<link id=example href="https://example.com/painfully/long/link/maybe/with/gibberish">
<link id=another href="https://example.com/another/painfully/long/link/maybe/with/gibberish">
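
The lookup described above could be sketched roughly as follows. This is only an illustration, not part of the proposal: `ReferenceDefinition` and `resolveReference` are hypothetical names, the definitions are modeled as plain objects rather than real `<link>` elements, and "first matching id wins" is an assumption borrowed from how id lookups usually behave.

```typescript
// Hypothetical stand-in for a <link id=... href=...> element.
interface ReferenceDefinition {
  id: string;
  href: string;
}

// Resolve the value of the (not final) `link` attribute to an implicit href.
// Returns undefined when no definition matches; in that case the <a> would
// presumably behave like an <a> without an href.
function resolveReference(
  linkValue: string,
  definitions: ReferenceDefinition[]
): string | undefined {
  const wantedId = linkValue.trim();
  for (const definition of definitions) {
    // Assumption: the first definition with a matching id wins.
    if (definition.id === wantedId) {
      return definition.href;
    }
  }
  return undefined;
}

const sampleDefinitions: ReferenceDefinition[] = [
  { id: "example", href: "https://example.com/painfully/long/link/maybe/with/gibberish" },
  { id: "another", href: "https://example.com/another/painfully/long/link/maybe/with/gibberish" },
];

// Two <a link=example> elements would both resolve to the same URL.
const resolvedExample = resolveReference("example", sampleDefinitions);
```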

Anything else?

No response

@knowler knowler added addition/proposal New features or enhancements needs implementer interest Moving the issue forward requires implementers to express interest labels Feb 15, 2025
@KeithHenry

Great idea!

However, the name syntax is already used for anchors...

<a href="#example">clicking here</a>

...

<a name="example">jumps the scroll to here in the same document</a>

You can use that to build citations - the anchor contains the long form.

It would have to be sympathetic to documents already using that.

@schalkneethling

schalkneethling commented Feb 15, 2025

@knowler I wonder, other than potentially for search engines, would there be a downside if these links are not in the <body> of the HTML document? The reason I ask is because these could then be meta information, something like:

<head>
  <meta id="github-11032" name="reference-link" href="https://github.com/whatwg/html/issues/11032#issue-2855295872">
</head>

<body>
  <a href="ref:github-11032">Reference-style hyperlinks</a>
</body>

Update: Then again, the browser could replace the reference with the raw URL during parsing avoiding the potential downside for search crawlers mentioned earlier.

@knowler
Author

knowler commented Feb 15, 2025

In #11032 (comment), @KeithHenry said:

the name syntax is already used for anchors...

I had that in mind, but I forgot how they specifically worked. I thought maybe it didn’t conflict, but I see now that it could. I think a different attribute would be needed then. I avoided href itself since almost any microsyntax for references would still be a valid URL.

@Merri

Merri commented Feb 15, 2025

As HTML already has support for multiple ID references in a few places (such as in aria-labelledby) would that be supported?

For example:

<a link="english finnish japanese">Wikipedia</a>

<link id="english" hreflang="en" href="https://en.wikipedia.org" content="...">
<link id="finnish" hreflang="fi" href="https://fi.wikipedia.org" content="...">
<link id="japanese" hreflang="ja" href="https://ja.wikipedia.org" content="...">

This is of course kind of a feature of its own, but I'm merely pointing out the possibility, from a syntax perspective, that could have additional valid use cases. Say, often in blog posts you can see people linking to a bunch of stuff and currently the way to solve it "neatly" is to choose one word or a few to represent one link so that the paragraph doesn't become visually too cluttered or lengthy when including all the links in the text (but it does become tedious for a screen reader user).
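
As a rough sketch of what resolving a space-separated ID list might involve (similar in spirit to how aria-labelledby takes several IDs): `ReferenceTarget` and `resolveReferenceList` are hypothetical names, and silently skipping unknown IDs is an assumption, not something discussed here.

```typescript
// Hypothetical stand-in for a <link id=... hreflang=... href=...> element.
interface ReferenceTarget {
  id: string;
  hreflang: string;
  href: string;
}

// Split the attribute value on ASCII whitespace, as HTML does for
// space-separated token lists, and look up each ID in order.
function resolveReferenceList(
  attributeValue: string,
  targets: ReferenceTarget[]
): ReferenceTarget[] {
  const ids = attributeValue
    .split(/[ \t\n\f\r]+/)
    .filter((token) => token.length > 0);
  const resolved: ReferenceTarget[] = [];
  for (const id of ids) {
    for (const target of targets) {
      if (target.id === id) {
        resolved.push(target);
        break; // Assumption: only the first target with this id counts.
      }
    }
    // Assumption: IDs with no matching target are skipped.
  }
  return resolved;
}

const wikipediaTargets: ReferenceTarget[] = [
  { id: "english", hreflang: "en", href: "https://en.wikipedia.org" },
  { id: "finnish", hreflang: "fi", href: "https://fi.wikipedia.org" },
  { id: "japanese", hreflang: "ja", href: "https://ja.wikipedia.org" },
];

// The <a link="english finnish japanese"> example yields three targets.
const allThreeTargets = resolveReferenceList("english finnish japanese", wikipediaTargets);
```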

Anyway I leave this thought here and escape back to the other parts of the Internet.

@knowler
Author

knowler commented Feb 15, 2025

@Merri It’s very interesting to consider how something like this could be extended. For example, partly following your example, what if multiple links could be defined for the same identifier and depending on the document language or if it was auto-translated, the browser could pick the most relevant option:

<a link="wikipedia">Wikipedia</a>

<link name="wikipedia" hreflang="en" href="https://en.wikipedia.org">
<link name="wikipedia" hreflang="fi" href="https://fi.wikipedia.org">
<link name="wikipedia" hreflang="ja" href="https://ja.wikipedia.org">

That’d be different than the case you mentioned:

Say, often in blog posts you can see people linking to a bunch of stuff and currently the way to solve it "neatly" is to choose one word or a few to represent one link so that the paragraph doesn't become visually too cluttered or lengthy when including all the links in the text (but it does become tedious for a screen reader user).

In that case the links would all be relevant, and I wonder whether something like foot/end-notes would be a better solution if they were easier to author. As you allude to, the current authoring pattern is certainly an anti-pattern. Solutions to that might be worth exploring elsewhere (I’m interested though).
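
For the language-based selection floated earlier in this comment, a minimal sketch might look like the following. Everything here is an assumption for illustration: the names `LocalizedReference` and `pickLocalizedReference`, the matching order (exact tag, then primary subtag, then the first definition), and the fallback itself. A real design would likely need proper language-range matching.

```typescript
// Hypothetical stand-in for <link name=... hreflang=... href=...>.
interface LocalizedReference {
  name: string;
  hreflang: string;
  href: string;
}

// "en-GB" -> "en", "fi" -> "fi"
function primarySubtag(languageTag: string): string {
  const dash = languageTag.indexOf("-");
  return (dash === -1 ? languageTag : languageTag.slice(0, dash)).toLowerCase();
}

// Pick the most relevant URL for a reference name given the document language.
function pickLocalizedReference(
  referenceName: string,
  documentLanguage: string,
  candidates: LocalizedReference[]
): string | undefined {
  const wantedTag = documentLanguage.toLowerCase();
  const wantedPrimary = primarySubtag(documentLanguage);
  let primaryMatch: string | undefined;
  let firstDefinition: string | undefined;
  for (const candidate of candidates) {
    if (candidate.name !== referenceName) {
      continue;
    }
    if (candidate.hreflang.toLowerCase() === wantedTag) {
      return candidate.href; // An exact language match always wins.
    }
    if (primaryMatch === undefined && primarySubtag(candidate.hreflang) === wantedPrimary) {
      primaryMatch = candidate.href;
    }
    if (firstDefinition === undefined) {
      firstDefinition = candidate.href;
    }
  }
  // Assumed fallback: the first definition for this name, if any.
  return primaryMatch !== undefined ? primaryMatch : firstDefinition;
}

const wikipediaReferences: LocalizedReference[] = [
  { name: "wikipedia", hreflang: "en", href: "https://en.wikipedia.org" },
  { name: "wikipedia", hreflang: "fi", href: "https://fi.wikipedia.org" },
  { name: "wikipedia", hreflang: "ja", href: "https://ja.wikipedia.org" },
];

// In a document with lang="fi", this is "https://fi.wikipedia.org".
const finnishChoice = pickLocalizedReference("wikipedia", "fi", wikipediaReferences);
```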

@zzzzBov

zzzzBov commented Feb 15, 2025

Thinking about malicious use cases, I could see potential for abuse if content is injected with an identifier that was already in use, causing the links to redirect to a malicious website.

That said, I'm not sure it's a real concern as it would imply that the site was not correctly sanitizing inputs, in which case XSS would have already been a problem due to the ability to inject a <script>.


Another approach to the same problem that could be worth considering is to not introduce any new attributes, but instead introduce a new protocol, such as ref: or link: or [insert your ideal name here].

<a href="link:example">Example</a>

I think referencing a <link> element makes the most sense, as this feature would probably work well with other existing features like prefetch or preload.

@knowler
Author

knowler commented Feb 15, 2025

In #11032 (comment), @zzzzBov said:

Another approach to the same problem that could be worth considering is to not introduce any new attributes, but instead introduce a new protocol, such as ref: or link: or [insert your ideal name here].

I think that a new protocol would be the only way to make the attribute href work. Otherwise, almost any content would be parseable as relative to the document’s base URL. It would add extra noise and I imagine it might be harder to achieve (i.e. likely involves a different standards authority and might need use cases beyond the web).
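
To illustrate the parsing point, here is a small sketch using the standard `URL` constructor. The base URL is made up, `ref:` is just one of the placeholder scheme names from this thread (not a registered scheme as far as I know), and `isReferenceHref` is a hypothetical helper.

```typescript
// Made-up document URL standing in for the document's base URL.
const documentBase = "https://example.com/articles/post.html";

// A bare identifier in href already parses as a relative URL.
const bareIdentifier = new URL("example", documentBase);
// bareIdentifier.href is "https://example.com/articles/example"

// With a scheme prefix, the same identifier parses as an absolute URL
// with its own scheme, so it can be told apart from a relative path.
const schemedReference = new URL("ref:example", documentBase);
// schemedReference.protocol is "ref:" and schemedReference.pathname is "example"

// Hypothetical check a user agent (or polyfill) might perform on an href value.
function isReferenceHref(rawHref: string, baseUrl: string): boolean {
  return new URL(rawHref, baseUrl).protocol === "ref:";
}
```

This is why a plain identifier in href is ambiguous: it is indistinguishable from an existing relative link to a sibling resource named `example`.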

@schalkneethling

@Merri It’s very interesting to consider how something like this could be extended. For example, partly following your example, what if multiple links could be defined for the same identifier and depending on the document language or if it was auto-translated, the browser could pick the most relevant option:

That is interesting and would be very useful. With my meta tag idea I was also thinking using hreflang would be a perfect fit:

<meta name="reference-link" hrefname="wikipedia" hreflang="en" href="https://en.wikipedia.org">
<meta name="reference-link" hrefname="wikipedia" hreflang="fi" href="https://fi.wikipedia.org">

I also think a link element might be even better than a meta tag as you proposed so something like:

<link rel="reference" hrefname="wikipedia" hreflang="en" href="https://en.wikipedia.org">
<link rel="reference" hrefname="wikipedia" hreflang="fi" href="https://fi.wikipedia.org">

I think that a new protocol would be the only way to make the attribute href work. Otherwise, almost any content would be parseable as relative to the document’s base URL.

100% agree @knowler which is why I included that in my original comment:

<a href="ref:wikipedia">Reference-style hyperlinks</a>

@KeithHenry

I avoided href itself since almost any microsyntax for references would still be a valid URL.

@knowler They can already work together, sort of...

<a href="#cite1">APA reference</a>

... lots of content

<a name="cite1" href="https://apastyle.apa.org/style-grammar-guidelines/references/examples/webpage-website-references">
  APA Style Guide (2020, February) 
  <i>Webpage on a Website References</i>
</a>

I think the issue is that clicking on <a href="#cite1"> jumps to the place in the document with the anchor, but you'd want it to follow the link too.


5 participants