Network errors when installing hello world #52

Closed
matklad opened this issue Jun 4, 2024 · 6 comments

Comments

@matklad
Contributor

matklad commented Jun 4, 2024

If I try

cargo run -r  -- install -r hello_world

I get a timeout:

        ERROR run_install:bake:bake_inner:run_bake: brioche_core::bake: error=Request error: error sending request for url (https://registry.brioche.dev/v0/blobs/85b898d9dda3158822f5a0c25e71bc5dceb83c06620b4a0f7bf40370a362556f.zst?brioche=0.1.0): error sending request for url (https://registry.brioche.dev/v0/blobs/85b898d9dda3158822f5a0c25e71bc5dceb83c06620b4a0f7bf40370a362556f.zst?brioche=0.1.0): operation timed out scope=Project { project_hash: ProjectHash(Hash("d0aeff98c15b839e2d99e04c023e84414c48e249eb6e2c695696685c42e3d355")), export: "default" } recipe_hash=39cf103288f270b8c774b4d9c5d5b4d10fff20c5da3faf8a0e2721257cc4773e recipe_kind=Process recipe_hash=39cf103288f270b8c774b4d9c5d5b4d10fff20c5da3faf8a0e2721257cc4773e recipe_kind=Process

If I manually bump the in-code timeouts from 10s to 60s, I then get an error about "temporary DNS failure" instead (sadly, I lost the exact text of the error somewhere in the git history).

@kylewlacy
Member

Huh, interesting... I hadn't seen that particular error for the handful of times I've tested from a fresh environment (although I've only really tested on Ubuntu and my good home internet connection). I remember you mentioned that you were using NixOS, so I might see if I could reproduce through that, but I have a few questions if you feel comfortable answering:

All of the downloads are handled with Reqwest, and the blob fetches specifically don't have a global timeout set (they do at least have a connect timeout and a read timeout). The registry is hosted on Fly.io, but the blob requests get redirected and served from Cloudflare R2.
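
To make the difference between a connect timeout and a read timeout concrete, here is a minimal sketch using only the Rust standard library as a stand-in for Reqwest. It is not Brioche's actual code: the helper names `connect_with_timeouts` and `is_read_timeout` and the durations are assumptions chosen for illustration. The connect timeout bounds the TCP handshake, the read timeout bounds each wait for more bytes, and nothing bounds the transfer as a whole.

```rust
use std::io::{ErrorKind, Read};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::time::Duration;

/// Connects with a bound on the handshake, then bounds each individual read.
/// There is deliberately no overall deadline for the whole transfer.
/// (Hypothetical helper, not part of Brioche or Reqwest.)
fn connect_with_timeouts(
    addr: &SocketAddr,
    connect_timeout: Duration,
    read_timeout: Duration,
) -> std::io::Result<TcpStream> {
    let stream = TcpStream::connect_timeout(addr, connect_timeout)?;
    stream.set_read_timeout(Some(read_timeout))?;
    Ok(stream)
}

/// Returns true if a read error means "no bytes arrived within the read timeout".
fn is_read_timeout(err: &std::io::Error) -> bool {
    // Unix-like systems report WouldBlock here, Windows reports TimedOut.
    matches!(err.kind(), ErrorKind::WouldBlock | ErrorKind::TimedOut)
}

fn main() -> std::io::Result<()> {
    // A local listener that never sends any data, standing in for a server
    // that accepts the connection but is slow to respond.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;

    // Illustrative values only.
    let mut stream = connect_with_timeouts(
        &addr,
        Duration::from_secs(1),
        Duration::from_millis(200),
    )?;

    let mut buf = [0u8; 16];
    match stream.read(&mut buf) {
        Ok(n) => println!("read {n} bytes"),
        Err(e) if is_read_timeout(&e) => println!("read timed out: {e}"),
        Err(e) => return Err(e),
    }
    Ok(())
}
```

In this sketch the connection succeeds quickly and the read is what times out, which is roughly the shape of a request to an instance that is still booting.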

@kylewlacy
Member

Actually, now that I think about it, I only have Fly.io instances near Seattle. Maybe I just need to expand to more Fly.io regions...

@matklad
Contributor Author

matklad commented Jun 4, 2024

I am in Lisbon. I can curl the file manually, but it feels very slow for 4 kilobytes (about 10 seconds actually).

@kylewlacy
Member

Okay, I've come up with a few possible explanations, and I've also made a few changes to try and address them. Let me try and summarize:

  • One theory I had was that the roundtrip time to the Seattle Fly.io instances was the problem. I've reduced the total number of instances but distributed them over more regions, so if that was the cause, then that should hopefully help
  • Another theory was that the roundtrip time to Cloudflare R2 on the US west coast was the problem. I needed to do some reconfiguration, but the bucket itself is now sitting behind Cloudflare's cache, so that should help too
  • My final theory is just that the Fly.io startup time takes so long that requests will time out. I currently have all the instances set to scale to 0, and they can take a few seconds to spin back up. I'm currently paying the bill out of pocket, so if this is the problem I think the best option would be to just increase the timeouts, unfortunately...

Could you try curling the URL from before again, ideally twice in a row? (the first time will likely be a cold boot, and the second should then hit the already-running instance)


If it seems like things have improved, then the final test would be trying the hello world installation from scratch again (also with some extra debugging for good measure):

  1. Run chmod -R +w ~/.local/share/brioche && rm -rf ~/.local/share/brioche to remove all the locally-stored files
  2. Re-run the install command with env vars: BRIOCHE_LOG_OUTPUT='./brioche.log' BRIOCHE_LOG_DEBUG='[]=debug' brioche install -r hello_world

If that still fails, then brioche.log should at least give some insights

@matklad
Contributor Author

matklad commented Jun 5, 2024

> Could you try curling the URL from before again, ideally twice in a row? (the first time will likely be a cold boot, and the second should then hit the already-running instance)

Yup, the first time around it took a minute, the second one was fast. I guess it might make sense to bump the default timeouts to something like 120 seconds, rather than just 10? 10 is a reasonable number for the steady state, but with cold boots, network topology changes and whatnot, I think P100 (the worst-case latency) could go higher than that.
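
The P100 argument can be sketched with a small nearest-rank percentile helper. The latency samples below are hypothetical, loosely modelled on "one request took about a minute, the rest were fast", and are not measurements from this thread; `percentile` is an illustrative helper, not something from Brioche.

```rust
use std::time::Duration;

/// Nearest-rank percentile of the samples; `p` is clamped to 0..=100.
/// Returns None for an empty slice.
fn percentile(samples: &[Duration], p: u32) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    let n = sorted.len();
    // rank = ceil(p / 100 * n), computed in integers and clamped to 1..=n.
    let rank = (p.min(100) as usize * n + 99) / 100;
    Some(sorted[rank.clamp(1, n) - 1])
}

fn main() {
    // HYPOTHETICAL samples: four warm requests and one cold boot.
    let samples = [
        Duration::from_millis(300),
        Duration::from_millis(400),
        Duration::from_millis(500),
        Duration::from_millis(800),
        Duration::from_secs(60),
    ];

    let p50 = percentile(&samples, 50).unwrap();
    let p100 = percentile(&samples, 100).unwrap();

    for timeout in [Duration::from_secs(10), Duration::from_secs(120)] {
        println!(
            "timeout {:?}: covers P50 = {}, covers P100 = {}",
            timeout,
            p50 <= timeout,
            p100 <= timeout
        );
    }
}
```

With samples shaped like these, a 10 second timeout is comfortable for the median request but fails the cold-boot one, while 120 seconds covers both.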

@kylewlacy
Member

Oh yeah, it sounds like this was resolved by #54
