Link parsing doesn't correctly match URLs when they include brackets #116

Closed
grafana-dee opened this issue Sep 16, 2014 · 6 comments · Fixed by #222

@grafana-dee
Contributor

For a given piece of Markdown:
[disambiguation](http://en.wikipedia.org/wiki/Disambiguation_(disambiguation))

I would expect to see the output:
<p><a href="http://en.wikipedia.org/wiki/Disambiguation_(disambiguation)">disambiguation</a></p>

However, blackfriday moves the second ) outside of the link, which breaks the URL:
<p><a href="http://en.wikipedia.org/wiki/Disambiguation_(disambiguation">disambiguation</a>)</p>

You can recreate this with this example code:

package main

import (
    "fmt"

    "github.com/russross/blackfriday"
)

func main() {
    md := []byte(`[disambiguation](http://en.wikipedia.org/wiki/Disambiguation_(disambiguation))`)

    html := blackfriday.MarkdownCommon(md)

    fmt.Println(`Output  : ` + string(html))
    fmt.Println(`Expected: <p><a href="http://en.wikipedia.org/wiki/Disambiguation_(disambiguation)">disambiguation</a></p>`)
}

Most other Markdown parsers generate the correct output. Here is Pandoc, since its online demo link is the easiest to share:
http://johnmacfarlane.net/pandoc/try/?text=[disambiguation]%28http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FDisambiguation_%28disambiguation%29%29&from=markdown&to=html

Bracketed links are very popular on Wikipedia and a few other top-100 internet sites.

@grafana-dee
Contributor Author

This also affects images:

![](http://www.broadgate.co.uk/Content/Upload/DetailImages/Cyclus700(1).jpg)

Becomes:

<p><img src="http://www.broadgate.co.uk/Content/Upload/DetailImages/Cyclus700%281" alt=""/> .jpg)</p>

When it should become:

<p><img src="http://www.broadgate.co.uk/Content/Upload/DetailImages/Cyclus700%281%29.jpg" alt=""/></p>

@rtfb
Collaborator

rtfb commented Sep 20, 2014

This is definitely a bug, but the fix is a bit unclear. Skipping balanced pairs of parentheses is easy, but how far ahead should we look? Until EOL? Until a blank line?

@grafana-dee
Contributor Author

Gruber encountered this problem with the original Markdown library and while helping to support others. He basically used a URL-matching regexp that handles balanced pairs of brackets in URLs, which resolves the vast majority of instances in which it happens:
http://daringfireball.net/2010/07/improved_regex_for_matching_urls

\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))

Debuggex Demo

Note example 6 of the linked demo of that regex.
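
For illustration, here is a minimal Go sketch of the same idea. It does not use Gruber's full pattern: `urlWithParens` is a hypothetical, much simplified regexp that only recognises `http`/`https` URLs and allows one level of balanced parentheses, using Go's standard `regexp` package. It is not what blackfriday does.

```go
package main

import (
	"fmt"
	"regexp"
)

// urlWithParens is a hypothetical, simplified pattern (not Gruber's full
// regexp). A URL body is a run of either plain characters or one balanced
// "(...)" group, so an unpaired trailing ")" is left out of the match.
var urlWithParens = regexp.MustCompile(`https?://(?:[^\s()<>]+|\([^\s()<>]*\))+`)

func main() {
	inputs := []string{
		`[disambiguation](http://en.wikipedia.org/wiki/Disambiguation_(disambiguation))`,
		`![](http://www.broadgate.co.uk/Content/Upload/DetailImages/Cyclus700(1).jpg)`,
	}
	for _, in := range inputs {
		// FindString returns the leftmost match, or "" if there is none.
		fmt.Println(urlWithParens.FindString(in))
	}
}
```

For both inputs from this issue, the match keeps the inner parentheses and stops before the final `)` that closes the Markdown link.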

@rtfb
Collaborator

rtfb commented Sep 22, 2014

Ooh, what a beastie :-)

I'll take a look at whether I can plug a regexp into the middle of our other parsing, and at how that would affect performance. Thanks!

miekg added a commit to miekg/mmark that referenced this issue Dec 22, 2014
Links like:
![](http://www.broadgate.co.uk/Content/Upload/DetailImages/Cyclus700(1).jpg)
are not correctly parsed because the closing brace of (1) is seen as the
end of the link, which it isn't.

Add code that detects opening braces and tries to find a matching
closing one. If we are left with an uneven pair, fail the link
detections.

Add tests from blackfriday issue #116
russross#116
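
The approach described in the commit message above can be sketched roughly as follows. This is not the mmark or blackfriday code: `findLinkEnd` is a hypothetical helper, and stopping at the end of the line is an assumption (it is one possible answer to the "how far ahead should we look?" question raised earlier).

```go
package main

import (
	"bytes"
	"fmt"
)

// findLinkEnd is a hypothetical helper, not part of blackfriday or mmark.
// start must be the index of the first byte after the "(" that opens the
// link destination. It returns the index of the ")" that closes the link,
// or -1 if no balancing ")" is found before the end of the line.
func findLinkEnd(data []byte, start int) int {
	depth := 0 // number of currently open "(" inside the URL
	for i := start; i < len(data); i++ {
		switch data[i] {
		case '(':
			depth++
		case ')':
			if depth == 0 {
				return i // this ")" closes the link itself
			}
			depth-- // this ")" closes a "(" inside the URL
		case '\n':
			return -1 // assumption: never look past the end of the line
		}
	}
	return -1 // uneven pair: fail the link detection
}

func main() {
	md := []byte(`[disambiguation](http://en.wikipedia.org/wiki/Disambiguation_(disambiguation))`)
	start := bytes.Index(md, []byte("](")) + 2
	if end := findLinkEnd(md, start); end >= 0 {
		fmt.Println(string(md[start:end]))
	}
}
```

A real parser would also need to handle link titles, escapes, and `<...>` destinations, which this sketch ignores.
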
@sdebruyn

@miekg, can you do a PR to blackfriday with your fix?

@icco
Contributor

icco commented Jul 23, 2015

Yeah, I still run into this almost daily. Right now I just hand-replace parens with %28 and %29, but it's not a great fix.
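
As a stopgap, that manual replacement could be automated before the Markdown is generated. A minimal sketch, assuming the URL is available as a separate string; `escapeParens` is a hypothetical helper built on the standard `strings.NewReplacer`:

```go
package main

import (
	"fmt"
	"strings"
)

// parenEscaper percent-encodes parentheses so that a Markdown parser
// cannot mistake them for the end of the link destination.
var parenEscaper = strings.NewReplacer("(", "%28", ")", "%29")

// escapeParens is a hypothetical helper for this workaround.
func escapeParens(rawURL string) string {
	return parenEscaper.Replace(rawURL)
}

func main() {
	escaped := escapeParens("http://en.wikipedia.org/wiki/Disambiguation_(disambiguation)")
	fmt.Printf("[disambiguation](%s)\n", escaped)
}
```

This only works around the bug for links you generate yourself; it does nothing for Markdown written by others.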
