Skip to content

reuse url encoding from url crate, don't use separate percent-encoding #11750

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Merged
merged 1 commit into from
Feb 21, 2023

Conversation

klensy
Copy link
Contributor

@klensy klensy commented Feb 21, 2023

Reuse encoding from url, don't use separate percent-encoding.

@rustbot
Copy link
Collaborator

rustbot commented Feb 21, 2023

r? @ehuss

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added A-interacts-with-crates.io Area: interaction with registries S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 21, 2023
@klensy
Copy link
Contributor Author

klensy commented Feb 21, 2023

There is similar one use with percent-encoding

pub fn search(&mut self, query: &str, limit: u32) -> Result<(Vec<Crate>, u32)> {
let formatted_query = percent_encode(query.as_bytes(), NON_ALPHANUMERIC);
let body = self.req(
&format!("/crates?q={}&per_page={}", formatted_query, limit),
None,
Auth::Unauthorized,
)?;

and url

pub fn is_url_crates_io(url: &str) -> bool {
Url::parse(url)
.map(|u| u.host_str() == Some("crates.io"))
.unwrap_or(false)
}

but i don't have good (and simple) idea what to do.

@arlosi
Copy link
Contributor

arlosi commented Feb 21, 2023

Thanks! The example in the crates-io is more difficult since it's a relative URL, and the Url crate can't handle that. The other example doesn't use URL encoding at all, so I think it's fine as-is.

@bors r+

@bors
Copy link
Contributor

bors commented Feb 21, 2023

📌 Commit 37d429c has been approved by arlosi

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 21, 2023
@bors
Copy link
Contributor

bors commented Feb 21, 2023

⌛ Testing commit 37d429c with merge b6c7fbe...

@bors
Copy link
Contributor

bors commented Feb 21, 2023

☀️ Test successful - checks-actions
Approved by: arlosi
Pushing b6c7fbe to master...

@bors bors merged commit b6c7fbe into rust-lang:master Feb 21, 2023
@Eh2406
Copy link
Contributor

Eh2406 commented Feb 21, 2023

Are there tests for the encoding of +? I don't want to break existing use cases re: rust-lang/crates.io#4891

@klensy
Copy link
Contributor Author

klensy commented Feb 21, 2023

Are there tests for the encoding of +? I don't want to break existing use cases re: rust-lang/crates.io#4891

No, i don't see tests for that.

cargo search "hello+f" --limit 101
cargo run search "hello+f" --limit 101

both: (go to https://crates.io/search?q=hello%2Bf to see more)

but here the diff (For that usecase, new behavior for is actually more accurate?)

 cargo search "hello f" --limit 101
(go to https://crates.io/search?q=hello%20f to see more)

cargo run search "hello f" --limit 101
(go to https://crates.io/search?q=hello+f to see more)

@arlosi
Copy link
Contributor

arlosi commented Feb 21, 2023

crates.io can handle both encodings of (space) in the query string (%20 and and +). + must always be encoded as %2B

I think there's a copy-paste error in your diff. Searching for "hello f" produces %20 on existing Cargo.

 cargo search "hello f" --limit 101
(go to https://crates.io/search?q=hello%20f to see more)

@Eh2406
Copy link
Contributor

Eh2406 commented Feb 21, 2023

crates.io can handle both encodings of (space) in the query string (%20 and and +).

Yes.

+ must always be encoded as %2B

I think you have it backwards. https://static.crates.io/crates/libgit2-sys/libgit2-sys-0.12.25+1.3.0.crate works but https://static.crates.io/crates/libgit2-sys/libgit2-sys-0.12.25%2B1.3.0.crate dose not. This is a bug in crate.io, or really a bug in s3, but we can not start hitting it.

weihanglo added a commit to weihanglo/rust that referenced this pull request Feb 23, 2023
15 commits in 17b3d0de0897e1c6b8ca347bd39f850bb0a5b9f6..9d5b32f503fc099c4064298465add14d4bce11e6
2023-02-17 19:45:09 +0000 to 2023-02-22 23:04:16 +0000

- refactor(job_queue): docs and move types around (rust-lang/cargo#11758)
- Scrub more of the test environment (rust-lang/cargo#11757)
- Make more reads of environment variables go through the `Config` (rust-lang/cargo#11754)
- Revert "Update curl-sys to use libcurl 7.88.1" (rust-lang/cargo#11755)
- use consistent case (rust-lang/cargo#11748)
- Switch some tests from `build` to `check` (rust-lang/cargo#11725)
- Fix typo in sparse-registry warning message (rust-lang/cargo#11753)
- reuse url encoding from `url` crate, don't use separate `percent-encoding` (rust-lang/cargo#11750)
- Read environment variables through `Config` instead of `std::env::var(_os)` (rust-lang/cargo#11727)
- Update curl-sys to use libcurl 7.88.1 (rust-lang/cargo#11749)
- mdman: update pretty_assertions to reduce deps (rust-lang/cargo#11747)
- Cleanup tests (rust-lang/cargo#11745)
- Enhance help texts of position args (rust-lang/cargo#11740)
- Fix typo (rust-lang/cargo#11741)
- Update comment about cargo-ok (rust-lang/cargo#11724)
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 23, 2023
Update cargo

15 commits in 17b3d0de0897e1c6b8ca347bd39f850bb0a5b9f6..9d5b32f503fc099c4064298465add14d4bce11e6
2023-02-17 19:45:09 +0000 to 2023-02-22 23:04:16 +0000

- refactor(job_queue): docs and move types around (rust-lang/cargo#11758)
- Scrub more of the test environment (rust-lang/cargo#11757)
- Make more reads of environment variables go through the `Config` (rust-lang/cargo#11754)
- Revert "Update curl-sys to use libcurl 7.88.1" (rust-lang/cargo#11755)
- use consistent case (rust-lang/cargo#11748)
- Switch some tests from `build` to `check` (rust-lang/cargo#11725)
- Fix typo in sparse-registry warning message (rust-lang/cargo#11753)
- reuse url encoding from `url` crate, don't use separate `percent-encoding` (rust-lang/cargo#11750)
- Read environment variables through `Config` instead of `std::env::var(_os)` (rust-lang/cargo#11727)
- Update curl-sys to use libcurl 7.88.1 (rust-lang/cargo#11749)
- mdman: update pretty_assertions to reduce deps (rust-lang/cargo#11747)
- Cleanup tests (rust-lang/cargo#11745)
- Enhance help texts of position args (rust-lang/cargo#11740)
- Fix typo (rust-lang/cargo#11741)
- Update comment about cargo-ok (rust-lang/cargo#11724)
@Eh2406
Copy link
Contributor

Eh2406 commented Mar 2, 2023

I did not notice that this was just for search. Please ignore my rant.

@arlosi
Copy link
Contributor

arlosi commented Mar 2, 2023

This is a bug in crate.io, or really a bug in s3, but we can not start hitting it.

I think it's a bug in Cargo & crates.io. We should be encoding the + in the URL. However, fixing it would be a significant breaking change. The actual files in S3 are storing the + as a .

@ehuss ehuss added this to the 1.69.0 milestone Mar 9, 2023
@Turbo87
Copy link
Member

Turbo87 commented Jun 6, 2023

I think it's a bug in Cargo & crates.io. We should be encoding the + in the URL. However, fixing it would be a significant breaking change.

FWIW I looked into the situation today. I agree that it would be a breaking change for cargo, but I don't agree that it is a bug in cargo.

the myth that + needs to be encoded apparently comes from https://datatracker.ietf.org/doc/html/rfc1866#section-8.2.1, but that is only relevant for the application/x-www-form-urlencoded MIME type and for query parameters and
form submission request payloads. it is essentially irrelevant for URL paths from what I can tell.

I've started a more detailed write-up today, but it's still WIP.

@Turbo87
Copy link
Member

Turbo87 commented Jun 7, 2023

for cross-referencing, I've posted my investigation results at rust-lang/crates.io#4891 (comment) :)

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
A-interacts-with-crates.io Area: interaction with registries S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants