Linkify punycode-encodes em-dash #28
Comments
Hmm, that link doesn't load for me. Could you provide a copy here?
Ah, sorry -- I shuffled things around a bit on my site. Here's the fixed link:
I see. That's an interesting case, because
In that case, the whole text including the em-dash would get linked. Also note that GitHub behaves the same way here: https://www.example.com— We could fix the case where it's part of the domain; see also #29, which has some discussion around that. But what would you expect in the case where it's part of the path?
I think that if it's part of the path, it should be treated as such. I guess this is a broader question: should this library reject invalid TLDs? For example: https://lists.flounder.online/test/threads/YgYAU45J+dZURu1F@localhost.lan.html I think the tradeoffs you've made with the library as written are reasonable, though.
This came up in a couple of places: #41, #29, #38, #28. Hopefully we can fix all of them with these changes. Not done yet: I still want to have domain checking for URLs with certain schemes (https) but allow everything for others. If we do that, we may be able to unify the email and plain domain parsing with the scheme one too.
I've reworked domain parsing in 0.9.0 (see https://github.com/robinst/linkify/blob/main/CHANGELOG.md#090---2022-07-11), but I haven't addressed this yet. I think we could now do this by rejecting TLDs that contain non-alphanumeric Unicode characters. Note that there are TLDs that contain non-ASCII characters, see examples here (but they would be alphanumeric): https://en.wikipedia.org/wiki/Internationalized_country_code_top-level_domain
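To make the idea above concrete, here is a minimal sketch of such a TLD check using only the standard library. The function name `tld_is_alphanumeric` is hypothetical and is not part of linkify's API; it also assumes the host is given in its Unicode form rather than as punycode.

```rust
/// Hypothetical helper for illustration only; not part of linkify.
///
/// Returns true if the last dot-separated label of `host` (the TLD) is
/// non-empty and consists only of Unicode alphanumeric characters.
/// Assumption: `host` is in Unicode form. A punycode TLD such as
/// "xn--p1ai" contains hyphens and would be rejected by this check.
fn tld_is_alphanumeric(host: &str) -> bool {
    match host.rsplit('.').next() {
        // `char::is_alphanumeric` accepts non-ASCII letters and digits,
        // so internationalized TLDs pass while an em-dash (U+2014) fails.
        Some(tld) if !tld.is_empty() => tld.chars().all(char::is_alphanumeric),
        _ => false,
    }
}
```

With this check, `www.example.com` is accepted, `www.example.com—` is rejected because the em-dash is punctuation, and an internationalized domain such as `пример.рф` is still accepted.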
Hi! Thanks for this library -- I use it in my new mailing list software to detect links in emails. Someone brought what appears to be a bug to my attention: https://lists.flounder.online/crabmail/threads/1beaffd2384b.html
Here's my code:
https://git.alexwennerberg.com/crabmail/file/src/utils.rs.html#l22
I think that this could unambiguously be parsed, but I'm not 100% sure. What do you think?