ipfs disk io going haywire on running ipfs-cluster-follower #6921
Comments
I think I understand a bit why the network drops out. I see a lot of those lines in the debug output:
Then the dials fail because the port cannot be reused. The log suggests that the daemon tries to use a random port but just uses the default port instead, which obviously fails:
Which leads to a completely disconnected node:
On the other hand, the cluster-follower has reached the latest version of the cluster state and started a lot of repinning operations (of blake2b-256 content, which is actually no longer part of the cluster, since there's a bug: ipfs-cluster/ipfs-cluster#1006)
So they cannot complete, because there's no connection to the DHT, and there's probably no timeout for these operations to signal that there's no source anymore. (Why are they even being fetched in the first place if they are no longer part of the cluster data set? Maybe related to this: ipfs-cluster/ipfs-cluster#1006) Additionally, the I/O hasn't stopped, while there's (apparently) no new input from the cluster-follower and the network connection is basically completely down. (IPFS is still reporting 131 connections, but the log states that the DHT is not available.) Yet the cluster-follower is still consuming a lot of CPU time while "doing nothing" according to the log.
After a while IPFS seems to regain some degree of connectivity, but it is very limited and there's still a lot of read I/O going on. A longer debug output log: It's now 5 hours since starting the cluster-follower, and IPFS has constantly kept the disk at the I/O limit without interruption. Some more graphs: I'll leave the daemon and ipfs-cluster-follower running in this condition to see how it looks in a few hours (after the night, 11pm here). Wondering if the IO is looping or maybe it's doing a |
I let the process run for an additional 2 days; in this time ipfs encountered a memory leak and nearly filled up my swap space... Stopping the service with systemd failed:
and left ipfs behind as a zombie process, still issuing I/O like crazy and consuming quite a lot of CPU power...
|
Please follow https://github.com/ipfs/go-ipfs/blob/master/docs/debug-guide.md and post the resulting dump. That'll help us see what's going on. |
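For reference, a minimal Go sketch of collecting the dumps that debug guide asks for, assuming the go-ipfs API is listening on the default 127.0.0.1:5001 and exposes the standard net/http/pprof endpoints there; the output file names are only examples:

```go
// Hedged sketch: fetch the pprof dumps from the go-ipfs API port (assumed
// to be the default 127.0.0.1:5001) and write them to local files so they
// can be attached to the issue.
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

func fetch(client *http.Client, path, out string) error {
	resp, err := client.Get("http://127.0.0.1:5001" + path)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	f, err := os.Create(out)
	if err != nil {
		return err
	}
	defer f.Close()

	_, err = io.Copy(f, resp.Body)
	return err
}

func main() {
	// The CPU profile endpoint blocks for ~30s while sampling, so allow a
	// generous timeout.
	client := &http.Client{Timeout: 2 * time.Minute}

	dumps := map[string]string{
		"/debug/pprof/goroutine?debug=2": "ipfs.stacks",
		"/debug/pprof/heap":              "ipfs.heap",
		"/debug/pprof/profile":           "ipfs.cpuprof",
	}
	for path, out := range dumps {
		if err := fetch(client, path, out); err != nil {
			fmt.Fprintf(os.Stderr, "fetching %s: %v\n", path, err)
			continue
		}
		fmt.Println("wrote", out)
	}
}
```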
@Stebalien I suspect that the ipfs-cluster-follower caused or triggered this bug if there are cluster pins with a CIDv1 and a blake2b-256 hash in the cluster state. Wiping all databases and importing everything with CIDv1 but sha2-256 made the issue disappear. |
Interesting.... Ah, ok, that makes sense. Cluster is repeatedly asking IPFS to do something. |
Just very strange that the ipfs log, even on debug, isn't showing what's going on 🤔 Maybe in the long term we could add a test to IPFS-Cluster that runs through all hash options to make sure they work in the setup. I mean, there are so many that there might be edge cases nobody has checked for specific ones. |
Well, if it's just something repeatedly hitting the go-ipfs API, nothing is wrong as far as go-ipfs is concerned. |
True, but regardless of what's hitting the API, the ipfs daemon should avoid pushing the system load to > 300. If I had to point a finger at the metric that is too high, I would select
This might be hard to implement as a hard limit without running into deadlocks, but maybe limit additional I/O to one per process every 500ms or something like this. |
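A minimal sketch of what such a throttle could look like, using golang.org/x/time/rate; the 500ms interval, the burst size, and the blockRead placeholder are assumptions for illustration, not anything go-ipfs actually does:

```go
// Hedged sketch of the "one extra I/O every 500ms" idea from the comment
// above. blockRead stands in for whatever datastore/blockstore read would
// actually happen; it is not a real go-ipfs function.
package main

import (
	"context"
	"time"

	"golang.org/x/time/rate"
)

// ioLimiter allows one operation per 500ms, with a small burst so
// latency-sensitive callers are not always forced to queue.
var ioLimiter = rate.NewLimiter(rate.Every(500*time.Millisecond), 4)

func blockRead(ctx context.Context, key string) ([]byte, error) {
	// Wait blocks until the limiter grants a token or the context is done.
	if err := ioLimiter.Wait(ctx); err != nil {
		return nil, err
	}
	// ... the actual disk read would happen here ...
	return nil, nil
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	for i := 0; i < 5; i++ {
		_, _ = blockRead(ctx, "example-key")
	}
}
```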
IPFS is designed to do the thing you tell it to do, as fast as it can. We do need better internal load balancing (and rate limiting in certain cases), but if you repeatedly tell it to do something, it's going to do it. |
The main issue in this case is that ipfs opens so many files that it cannot hold any network connection open, since every reconnect runs into the fd limit. I guess the first step would be to rate-limit file openings whenever they would impact opening new connections, e.g. at 'high water' plus 20% or something like this. Anyway, to close this ticket (since nothing is particularly wrong with IPFS in this case): if there's a ticket for file access/file opening rate limiting somewhere, feel free to add this as an example of why it's needed. Otherwise a ticket in the backlog might be nice, with a link to this one as an example. Additionally, there's a memory leak somewhere, which can obviously be triggered by hitting the API with requests (somehow). I don't know if it's reproducible, but it might be worth running an ipfs-cluster-follower like I did, which has trouble processing blake2b-256 hashes, and seeing if it happens again. I've got no spare VMs/hardware to set something up which can run for days; maybe someone else would like to give this a try. |
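A rough sketch of the "reserve fds for the network" idea above: size a file-open semaphore from the process rlimit minus high water plus 20%. The HighWater value and the openBlockFile/closeBlockFile helpers are hypothetical; this is not how go-ipfs actually manages descriptors:

```go
// Hedged sketch: cap concurrent file opens so that roughly highWater*1.2
// descriptors stay free for network connections. All numbers are made up
// for illustration.
package main

import (
	"fmt"
	"os"
	"syscall"
)

const highWater = 900 // assumed connection manager HighWater setting

// fileSem is a counting semaphore gating simultaneous file opens.
var fileSem chan struct{}

func init() {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		lim.Cur = 1024 // conservative fallback
	}
	reserved := uint64(float64(highWater) * 1.2) // connections + 20% headroom
	slots := int64(lim.Cur) - int64(reserved)
	if slots < 16 {
		slots = 16 // never starve file access completely
	}
	fileSem = make(chan struct{}, slots)
}

func openBlockFile(path string) (*os.File, error) {
	fileSem <- struct{}{} // acquire a slot (blocks if none free)
	f, err := os.Open(path)
	if err != nil {
		<-fileSem // release on failure
		return nil, err
	}
	return f, nil
}

func closeBlockFile(f *os.File) error {
	err := f.Close()
	<-fileSem // release the slot
	return err
}

func main() {
	fmt.Println("file-open slots:", cap(fileSem))
}
```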
That could just be Go holding on to memory; Go does that a lot. It could also be the peerstore if you're connecting to a ton of nodes (which doesn't currently garbage collect). I'd need a pprof profile to tell. |
I'm going to close this for now as we'd really need a pprof profile to understand exactly what happened. And, as you say, it's probably going to come down to rate limiting. |
Testing for this will be added (hopefully in the short term). As a side note, the cluster pintracker runs, by default, at most 10 concurrent pin operations (with a 24h timeout). Everything else should be queued in memory (but it seems it was not cluster's memory that ran away). |
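For readers unfamiliar with the pintracker behaviour described above, a minimal sketch of the same pattern (at most 10 concurrent pin operations, each with a 24h timeout, everything else queued); pinToIPFS and the queue contents are placeholders, not cluster's real code:

```go
// Hedged sketch of a bounded pin queue: a semaphore caps concurrency at 10
// and each operation carries a 24h timeout.
package main

import (
	"context"
	"fmt"
	"time"
)

const maxConcurrentPins = 10

func pinToIPFS(ctx context.Context, cid string) error {
	// Placeholder for the actual /api/v0/pin/add call.
	select {
	case <-time.After(100 * time.Millisecond):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	queue := []string{"cid-1", "cid-2", "cid-3"} // hypothetical pin queue
	sem := make(chan struct{}, maxConcurrentPins)
	done := make(chan string)

	for _, cid := range queue {
		go func(cid string) {
			sem <- struct{}{} // take a slot; wait if 10 are already running
			defer func() { <-sem }()

			ctx, cancel := context.WithTimeout(context.Background(), 24*time.Hour)
			defer cancel()
			if err := pinToIPFS(ctx, cid); err != nil {
				fmt.Println("pin failed:", cid, err)
			}
			done <- cid
		}(cid)
	}
	for range queue {
		fmt.Println("finished:", <-done)
	}
}
```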
@hsanjuan it's not writing, it's reading. |
Okay guys, I started my cluster setup again yesterday: fresh IPFS datastore on both sides, fresh blockstore on both sides, and now on both the follower and the cluster-service side I've got IPFS trying to read as fast as the disk can provide data. On the server it's reading over 100 MB/s from the storage, while the network is pretty silent at below 3 Mbit/s. Both IPFS clients still work properly, they are just pretty slow. The server is called loki, the follower notebook i3.
On loki's side the ipfs log is empty after startup. The ipfs-cluster log just shows normal pinning operations, which happen from time to time. On i3's side, the ipfs log is also empty except for one message, cited below, and the cluster-follower log just shows normal pinning operations, exactly as expected.
Both sides run Arch Linux and have the ipfs daemon installed from the community repo. cluster-follower, cluster-ctl and cluster-service are all binaries from
All IPFS data (cluster and node) is stored on ZFS storage (non-raid/mirror) and atime is deactivated.
Load on loki: 70.83, 90.54, 104.17
This is how disk IO and network IO on loki look:
I'm not terribly good at debugging network communication, but the communication towards port 5001 on i3 looks perfectly normal to me. There are many concurrent pinning operations, but it's basically always "POST /api/v0/pin/add?arg=&recursive=true&progress=true HTTP/1.1" from the client, and IPFS answers with 4 progress reports every second (a sketch of this request pattern follows after this comment).
Just to recap what my script is doing, to make sure I'm not holding IPFS entirely wrong:
This does create a lot of new 'versions', because each file operation is a commit to the cluster, but this doesn't really feel like an issue to me. I could pin the folders non-recursively; then I would only push the actually changed stuff to the cluster and avoid having everything else re-checked for changes. There are currently about 4-5 new versions per hour, with about 10-15 changed files on average. |
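The request pattern mentioned above, sketched in Go under the assumption that the daemon streams newline-delimited JSON progress objects with Pins/Progress fields; the CID is a placeholder:

```go
// Hedged sketch: POST to /api/v0/pin/add with progress=true and decode the
// streamed JSON progress objects until the final one lists the pinned roots.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

type pinProgress struct {
	Pins     []string `json:"Pins"`
	Progress int      `json:"Progress"`
}

func main() {
	cid := "QmExampleCID" // placeholder CID
	u := "http://127.0.0.1:5001/api/v0/pin/add?" + url.Values{
		"arg":       {cid},
		"recursive": {"true"},
		"progress":  {"true"},
	}.Encode()

	resp, err := http.Post(u, "application/json", nil)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// The daemon streams newline-delimited JSON; progress objects arrive
	// periodically while blocks are fetched.
	dec := json.NewDecoder(resp.Body)
	for {
		var p pinProgress
		if err := dec.Decode(&p); err != nil {
			break
		}
		if len(p.Pins) > 0 {
			fmt.Println("pinned:", p.Pins)
		} else {
			fmt.Println("blocks fetched so far:", p.Progress)
		}
	}
}
```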
Damnit. I got the issue.
I don't pin the folder plainly; I pin the folder recursively with an "--expire-in" in the "pin add". I bet this somehow triggers the whole data structure being read again, to make sure IPFS has all the blocks. I think I'll take the "pinning the root folder" part out again and hope this fixes the issue. |
Version information:
go-ipfs version: 0.4.23-6ce9a355f
Repo version: 7
System version: amd64/linux
Golang version: go1.13.7
Description:
I started the ipfs-cluster-follower for my 'pacman.store' cluster on a notebook with a fresh slate, but some of the data is already pinned there, so the full cluster state needs to be run through ipfs to determine which data needs to be unpinned and which data needs to be pinned.
IPFS/ipfs-cluster-follower is running on a dedicated HDD and the repo has no space pressure:
I've been running this for about 3 hours now. IPFS started to do a lot of I/O, and at first I wasn't really concerned, but it looks like the process got really I/O locked, doing only single-digit kbit/s in/out on the network.
If I try to load a website from IPFS (which is stored locally) like
http://127.0.0.1:8080/ipns/ipfs.io/
the page takes 3-4 minutes to load. The system load:
The system is responsive as usual, but the CPU is 'doing' 350% IOWait because of all the I/O.
On the HDD there's basically just reads being done; ZFS's
iostat 1
shows something like this all the time (so 42 read requests with a total of 5.5 MByte, 0 write requests and 0 bytes written in the last second). But since ZFS caches constant IO to a single target extremely well, IPFS is probably reading up and down the full repo size, so most of it cannot be cached (just an educated guess).
iotop
shows that the actual disk read bandwidth is a lot lower than the reads done by ipfs, so obviously a lot of the read requests are served by zfs from its cache:
'ps aux' shows the following numbers for ipfs (which are very steady):
systemd status on ipfs:
fatrace
doesn't seem to work with ZFS, so I couldn't see which reads/writes are actually being done by zfs. I've set the loglevel of IPFS to warning, but I don't see anything related to IO popping up in the log, just a bunch of 'network connecting failed' messages etc.
Thoughts:
If ipfs weren't running on a dedicated hard drive, the extremely high system load would probably make the system completely unresponsive.
Additionally, the network performance is basically non-existent while such a process is running, which means you can stall a server/ipfs-node just by issuing a lot of IO-bound API calls, which is somewhat concerning.
I think IPFS needs some basic level of accounting for how high the I/O latency of its operations currently is, or how much time its processes are currently spending waiting on I/O. If this number is too high, IPFS should throttle its I/O to the disk to give the system some breathing room...
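A minimal sketch of that accounting idea: track a smoothed read latency and back off when it climbs past a threshold. The threshold, smoothing factor and readBlock placeholder are invented for illustration and do not reflect go-ipfs internals:

```go
// Hedged sketch: keep an exponentially weighted moving average of block-read
// latency and insert a pause between reads when the average is too high.
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"time"
)

type ioThrottle struct {
	mu      sync.Mutex
	ewmaMs  float64       // smoothed read latency in milliseconds
	limitMs float64       // above this, start backing off
	backoff time.Duration // pause inserted between reads when overloaded
}

func (t *ioThrottle) observe(d time.Duration) {
	t.mu.Lock()
	defer t.mu.Unlock()
	const alpha = 0.2 // smoothing factor
	t.ewmaMs = alpha*float64(d.Milliseconds()) + (1-alpha)*t.ewmaMs
}

func (t *ioThrottle) maybePause() {
	t.mu.Lock()
	overloaded := t.ewmaMs > t.limitMs
	t.mu.Unlock()
	if overloaded {
		time.Sleep(t.backoff) // give the disk some breathing room
	}
}

func readBlock(key string) []byte {
	// Stand-in for a real datastore read; latency is simulated here.
	time.Sleep(time.Duration(rand.Intn(30)) * time.Millisecond)
	return nil
}

func main() {
	t := &ioThrottle{limitMs: 20, backoff: 250 * time.Millisecond}
	for i := 0; i < 10; i++ {
		t.maybePause()
		start := time.Now()
		readBlock(fmt.Sprintf("block-%d", i))
		t.observe(time.Since(start))
	}
	fmt.Printf("final smoothed latency: %.1f ms\n", t.ewmaMs)
}
```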
Some munin-graphs:
You can see that I'd run the cluster-follower before, but cleaned its state and started it again to see how it would run with a fresh database. So around 18:00 I restarted the process.