Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

normalized_encode incorrectly replaces %3A and %2F in path #472

Open
davidtaylorhq opened this issue Aug 4, 2022 · 2 comments
Open

normalized_encode incorrectly replaces %3A and %2F in path #472

davidtaylorhq opened this issue Aug 4, 2022 · 2 comments

Comments

@davidtaylorhq
Copy link
Contributor

Given a valid URL like https://example.com/article/id%3A1.2%2F1/bar, addressable's normalization replaces %3A with a literal :, and %2F with a literal /. Both of these replacements change the URL, meaning that requests for the URL may fail.

> Addressable::URI.normalized_encode("https://example.com/article/id%3A1.2%2F1/bar")
=> "https://example.com/article/id:1.2/1/bar"

Using URI#normalize has the same issue with %3A, but seems to preserve the %2F correctly:

> Addressable::URI.parse("https://example.com/article/id%3A1.2%2F1/bar").normalize.to_s
=> "https://example.com/article/id:1.2%2F1/bar"

My understanding of RFC3986 is that reserved characters (including : and /) should not be decoded during normalization.

davidtaylorhq pushed a commit to discourse/discourse that referenced this issue Aug 9, 2022
normalized_encode in addressable has a number of issues, including sporkmonger/addressable#472

To temporaily work around those issues for the majority of cases, we try parsing with `::URI`. If that fails (e.g. due to non-ascii characters) then we will fall back to addressable.

Hopefully we can simplify this back to `Addressable::URI.normalized_encode` in the future.

This commit also adds support for unicode domain names and emoji domain names with escape_uri.

This removes an unneeded hack checking for pre-signed urls, which are now handled by the general case due to starting off valid and only being minimally normalized. Previous test case continues to pass.

UrlHelper.s3_presigned_url? which was somewhat wide was removed.
davidtaylorhq pushed a commit to discourse/discourse that referenced this issue Aug 9, 2022
normalized_encode in addressable has a number of issues, including sporkmonger/addressable#472

To temporaily work around those issues for the majority of cases, we try parsing with `::URI`. If that fails (e.g. due to non-ascii characters) then we will fall back to addressable.

Hopefully we can simplify this back to `Addressable::URI.normalized_encode` in the future.

This commit also adds support for unicode domain names and emoji domain names with escape_uri.

This removes an unneeded hack checking for pre-signed urls, which are now handled by the general case due to starting off valid and only being minimally normalized. Previous test case continues to pass.

UrlHelper.s3_presigned_url? which was somewhat wide was removed.
@sporkmonger
Copy link
Owner

I agree w/ your reading of the spec, only unreserved characters should be getting decoded.

@dentarg
Copy link
Collaborator

dentarg commented Jul 19, 2023

I think this issue is the same as #366 but I'll keep this one open until it has been addressed.

# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

No branches or pull requests

3 participants