Mysterious 404 on a few sites #30

Closed
JoshOrndorff opened this issue Mar 23, 2020 · 5 comments

Comments

@JoshOrndorff

I use linkcheck for the Substrate Recipes. Thank you for the excellent backend.

So far I've encountered two links that regularly cause the link checker to fail, despite loading fine in a normal web browser. You can see the more recent occurrence in this PR: JoshOrndorff/substrate-recipes#180. You can also see that I've worked around the issue by adding the URL to my exclude list.

Ultimately I'd prefer to properly diagnose the failure rather than excluding them.

@Michael-F-Bryan
Owner

I don't think this is specific to the linkchecker. Running curl against the num-traits crate returns a 404 for the same URL.

$ curl -I https://crates.io/crates/num-traits
HTTP/2 404
content-type: application/json; charset=utf-8
content-length: 35
server: nginx
date: Tue, 24 Mar 2020 02:45:06 GMT
set-cookie: cargo_session=sJIiNcfM9yvCHoGNENQaO8JrPoTF1c7xuZ6xe/LTieY=; HttpOnly; Secure; Path=/
strict-transport-security: max-age=31536000
via: 1.1 vegur, 1.1 6e19875b14d906dfd0ef8e65e8726f1d.cloudfront.net (CloudFront)
x-cache: Error from cloudfront
x-amz-cf-pop: PER50-C1
x-amz-cf-id: yBCN032584y1tHHrOzh9Er41QMS01bZ4OZ1IeCBJHpjwwlyH7Y2n9A==
age: 63

I have a feeling this is because crates.io is built using a JavaScript framework like Ember or React. When you open it in your browser, it'll fall back to / and then the JS router will change the URL to /crates/num-traits. The linkchecker essentially calls reqwest::get(), so we don't run any JS and only see the raw 404 response.

This is probably related to rust-lang/crates.io#788 (see rust-lang/rustc-dev-guide#184 (comment)).

@mark-i-m
Contributor

Yes, this is true for any crates.io URL. We have explicitly blacklisted URLs to crates.io in the rustc-dev-guide.
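
For anyone wanting to do the same, here is a minimal sketch of host-based exclusion in Rust using only the standard library. The helper names `host_of` and `is_excluded` are hypothetical and are not part of linkcheck or the rustc-dev-guide tooling; the hand-rolled host parsing is an assumption made to keep the example dependency-free, and a real checker would more likely use a proper URL parser.

```rust
/// Extract the host of an http(s) URL without external crates.
/// Assumption: plain host names only (no IPv6 literals such as `[::1]`).
fn host_of(url: &str) -> Option<&str> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    // The authority ends at the first '/', '?' or '#'.
    let end = rest
        .find(|c: char| matches!(c, '/' | '?' | '#'))
        .unwrap_or(rest.len());
    let authority = &rest[..end];
    // Drop any userinfo ("user@") and any port (":443").
    let host = authority.rsplit('@').next()?;
    let host = host.split(':').next()?;
    if host.is_empty() {
        None
    } else {
        Some(host)
    }
}

/// Hypothetical helper: true when the URL's host is on the skip list.
/// URLs that do not parse are never excluded, so they still get checked.
fn is_excluded(url: &str, excluded_hosts: &[&str]) -> bool {
    match host_of(url) {
        Some(host) => excluded_hosts
            .iter()
            .any(|h| host.eq_ignore_ascii_case(h)),
        None => false,
    }
}
```

Calling `is_excluded(url, &["crates.io"])` before issuing the request would skip every crates.io link while leaving other hosts untouched. Matching on the exact host rather than a substring avoids accidentally skipping unrelated URLs that merely mention crates.io in their path.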

@JoshOrndorff
Author

Okay, guess not much to do here then. Thanks for the explanation.

@dogweather

dogweather commented Jan 26, 2024

I found a workaround for link-checking crates.io URLs. Check docs.rs instead:

Instead of

curl --head https://crates.io/crates/num-complex

Do:

curl --head https://docs.rs/num-complex/latest/num_complex/
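
If you have many such links, the rewrite can be automated. Below is a rough sketch in Rust; `docs_rs_url` is a hypothetical helper, not something linkcheck provides. It assumes the library name is the crate name with `-` replaced by `_`, which is Cargo's default but can be overridden by a crate, and it assumes the crate has documentation on docs.rs at all.

```rust
/// Rewrite `https://crates.io/crates/<name>` to the docs.rs URL shape
/// shown above. Returns `None` for any other URL.
fn docs_rs_url(crates_io_url: &str) -> Option<String> {
    let rest = crates_io_url.strip_prefix("https://crates.io/crates/")?;
    // Keep only the crate name; ignore a trailing slash, version
    // segment, query string, or fragment.
    let name = rest
        .split(|c: char| matches!(c, '/' | '?' | '#'))
        .next()?;
    if name.is_empty() {
        return None;
    }
    // Assumption: the library name is the crate name with '-' -> '_'.
    let lib = name.replace('-', "_");
    Some(format!("https://docs.rs/{}/latest/{}/", name, lib))
}
```

With this, `docs_rs_url("https://crates.io/crates/num-complex")` yields the docs.rs URL from the curl command above, and the link checker can be pointed at that instead.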

@dlaehnemann

I had the same problem with a couple of domains / websites and found a different GitHub Action that works for me for link checking: linkspector

It seems to run its checks by simulating some kind of credible browser session, and with that, all the websites I currently have in there give a proper response. It also checks internal Markdown links correctly and can check links in other formats (like reStructuredText).

For the maintainer, maybe there are good ideas in there? Or maybe it already covers your needs in a more general way? In any case, many thanks for your efforts on this linkchecker, it was very useful!
