
ipfs disk io going haywire on running ipfs-cluster-follower #6921

Closed
RubenKelevra opened this issue Feb 21, 2020 · 16 comments
Labels
kind/bug A bug in existing code (including security flaws)

Comments

@RubenKelevra
Contributor

Version information:

go-ipfs version: 0.4.23-6ce9a355f
Repo version: 7
System version: amd64/linux
Golang version: go1.13.7

Description:

I started the ipfs-cluster-follower for my 'pacman.store' cluster on a notebook with a fresh state, but some of the data is already pinned locally, so the full cluster state has to be walked through ipfs to determine which data needs to be unpinned and which needs to be pinned.

IPFS and the ipfs-cluster-follower are running on a dedicated HDD, and the repo is under no space pressure:

NumObjects: 460034
RepoSize:   82 GB
StorageMax: 135 GB
RepoPath:   /mnt/data/ipfs
Version:    fs-repo@7

This has been running for about 3 hours now. IPFS started doing a lot of IO, which didn't concern me at first, but it looks like the process is now thoroughly IO-bound, doing only single-digit kbit/s in/out on the network.

If I try to load a website from IPFS (which is stored locally), like http://127.0.0.1:8080/ipns/ipfs.io/, the page takes 3-4 minutes to load.
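
(For reference, one way to measure that gateway latency from the shell, assuming curl is available; the URL is the one above:)

# time a fetch of the locally pinned site through the local gateway
time curl -sS -o /dev/null -w 'HTTP %{http_code} after %{time_total}s\n' \
  http://127.0.0.1:8080/ipns/ipfs.io/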

The system load:

$ uptime
 20:23:46 up 6 days,  2:59, 14 users,  load average: 337,79, 334,95, 333,55
$ egrep '' /proc/pressure/*
/proc/pressure/cpu:some avg10=4.47 avg60=6.77 avg300=4.81 total=16938566618
/proc/pressure/io:some avg10=99.70 avg60=99.47 avg300=99.80 total=82896961387
/proc/pressure/io:full avg10=68.67 avg60=63.31 avg300=71.07 total=67733411744
/proc/pressure/memory:some avg10=0.05 avg60=0.16 avg300=0.12 total=76255322
/proc/pressure/memory:full avg10=0.00 avg60=0.03 avg300=0.04 total=45394303

The system is still about as responsive as usual, but the CPU is sitting at roughly 350% IOWait because of all the IO.

On the HDD there are basically only reads being done; zpool iostat 1 shows something like this all the time (in this snapshot: 42 read requests totalling about 4.5 MByte, 0 write requests and 0 bytes written in the last second).

              capacity     operations     bandwidth 
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
data        96,0G  53,0G     42      0  4,49M      0

But since ZFS caches constant IO to a single target extremely well, IPFS is probably reading up and down the full repo, so most of it cannot be cached (just an educated guess).

iotop shows that the actual disk read bandwidth is a lot lower than the reads issued by ipfs, so a good portion of the read requests is apparently served by ZFS from its cache:

Screenshot_20200221_211321
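
(To check that cache-hit guess, one could look at the ZFS ARC counters; a sketch, assuming ZFS on Linux, which exposes them under /proc/spl/kstat/zfs/arcstats:)

# overall ARC hits vs. misses since boot; a poor hit ratio would confirm
# that the reads span more data than the ARC can hold
awk '$1 == "hits" || $1 == "misses" {print $1, $3}' /proc/spl/kstat/zfs/arcstats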

'ps aux' shows the following numbers for ipfs (which are very steady):

$ ps aux | grep -E "^USER|ipfs" | grep -vE "journalctl|grep"
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
ruben    14358 12.1  6.1 4097620 744788 pts/6  Sl+  17:04  26:44 ./ipfs-cluster-follow cluster.pacman.store run --init cluster.pacman.store/default.json
ruben    14426 33.8 12.9 10264444 1564436 ?    Ssl  15:39 103:43 /usr/bin/ipfs daemon --enable-pubsub-experiment --enable-namesys-pubsub --enable-mplex-experiment

systemd status on ipfs:

# systemctl status ipfs@ruben
● ipfs@ruben.service - InterPlanetary File System (IPFS) daemon
   Loaded: loaded (/usr/lib/systemd/system/ipfs@.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/ipfs@.service.d
           └─override.conf
   Active: active (running) since Fri 2020-02-21 15:39:06 CET; 5h 31min ago
 Main PID: 14426 (ipfs)
    Tasks: 610 (limit: 4915)
   Memory: 1.4G
   CGroup: /system.slice/system-ipfs.slice/ipfs@ruben.service
           └─14426 /usr/bin/ipfs daemon --enable-pubsub-experiment --enable-namesys-pubsub --enable-mplex-experiment

fatrace doesn't seem to work with ZFS, so I couldn't see which reads/writes are actually being issued.
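
(As an alternative to fatrace, attaching strace to the daemon should at least show which files it opens; a sketch, using the PID from the ps output above:)

# record every file the daemon opens for about a minute, then summarize the hot paths
sudo timeout 60 strace -f -e trace=openat -o /tmp/ipfs-opens.log -p 14426
grep -o '"[^"]*"' /tmp/ipfs-opens.log | sort | uniq -c | sort -rn | head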

I've set the log level of IPFS to warning, but nothing IO-related shows up in the log, just a bunch of 'network connecting failed' messages and the like.

Thoughts:

If ipfs weren't running on a dedicated hard drive, the extremely high system load would probably make the system completely unresponsive.

Additionally, network performance is basically non-existent while such a process is running, which means you can stall a server/ipfs node just by issuing a lot of IO-bound API calls; that is somewhat concerning.

I think IPFS needs some basic accounting of how high the IO latency of its operations currently is, or how much time it is currently spending waiting on IO. If that number gets too high, IPFS should throttle its IO to the disk to give the system some breathing room...
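
(As long as that doesn't exist, a possible workaround from the outside is to deprioritize or cap the daemon's disk IO through the systemd drop-in that is already in place; a sketch, using standard systemd.exec/systemd.resource-control directives, with the 20M cap as an example value only:)

# /etc/systemd/system/ipfs@.service.d/override.conf (additional lines)
[Service]
# put the daemon's disk IO into the idle scheduling class ...
IOSchedulingClass=idle
# ... and/or cap reads on the filesystem holding the repo (needs the cgroup IO controller)
IOReadBandwidthMax=/mnt/data 20M

# then: systemctl daemon-reload && systemctl restart ipfs@ruben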

Some munin-graphs:

memory-day
load-day
cpu-day

As you can see, I had run the cluster-follower before, but cleaned its state and started it again to see how it would behave with a fresh database. So around 18:00 I restarted the process.

@RubenKelevra RubenKelevra added the kind/bug A bug in existing code (including security flaws) label Feb 21, 2020
@RubenKelevra
Contributor Author

I think I now understand, at least in part, why the network drops out. I see a lot of lines like this in the debug output:

Feb 21 22:03:16 i3 ipfs[14426]: 22:03:16.970 DEBUG swarm2: [limiter] adding a dial job through limiter:...

Then the dials fail because the port cannot be reused; the log suggests that the daemon tries to use a random port, but it apparently just uses the default port instead - which obviously fails:

Feb 21 22:03:16 i3 ipfs[14426]: 22:03:16.977 DEBUG reuseport-: failed to reuse port, dialing with a random port: dial tcp4 0.0.0.0:4001->172.x.x.x:4001: connect: cannot assign requested address reuseport.go:60
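
('cannot assign requested address' on a dial from 0.0.0.0:4001 usually means the local address/port combinations are exhausted or already in use; a quick way to check, assuming iproute2 is installed:)

# sockets currently bound to the libp2p port as their local port
ss -tan 'sport = :4001' | wc -l
# connections lingering in TIME_WAIT, which still occupy those 4-tuples
ss -tan state time-wait | wc -l
# ephemeral port range available for fallback dials
cat /proc/sys/net/ipv4/ip_local_port_range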

Which leads to a completely disconnected node:
Feb 21 22:03:17 i3 ipfs[14426]: 22:03:17.221 DEBUG dht: error connecting: failed to dial : all dials failed

and

Feb 21 22:03:17 i3 ipfs[14426]: 22:03:17.221 DEBUG dht: not connected. dialing. query.go:244

On the other hand, the cluster-follower has caught up to the latest version of the cluster state and is restarting a lot of repinning operations (of blake2b-256 content which is actually no longer part of the cluster, due to a bug: ipfs-cluster/ipfs-cluster#1006):

21:29:42.134  INFO pintracker: Restarting pin operation for bafkreiee4izj7yezixz4rb4r7zxqokqwqirjs6ke6svlddhr2em6ngxkxe stateless.go:413
21:29:42.134  INFO pintracker: Restarting pin operation for bafkreiaunymn7uqqefsjcllq7dhzedmotloazmlago5oi6asti25lsuah4 stateless.go:413
21:29:42.134  INFO pintracker: Restarting pin operation for bafkreibhezrv6pst3oqcpd62lnatgulcvbyszg3wijzvjljfxhqyrkl6wi stateless.go:413
21:29:42.134  INFO pintracker: Restarting pin operation for bafkreidvwxokkoervj534wmc5fk6yhr4wxfvpwy2eweziawqcvwbek52qa stateless.go:413
21:29:42.135  INFO pintracker: Restarting pin operation for bafkreiegglmtlx4kvfpre72tcm42mpmyre75tguxzmi5tiy4b3cybaec2i stateless.go:413
21:29:42.138  INFO pintracker: Restarting pin operation for bafkreicqrgbjwsl6c4xnupgzyvp7rewyitq7abyjg4gp7vghijxrwp3zyy stateless.go:413
21:29:42.138  INFO pintracker: Restarting pin operation for bafkreic6masts3cw4wed7ld3iwyysich7cmxlnkhukfgwqlls4qjgogjhy stateless.go:413
21:29:42.138  INFO pintracker: Restarting pin operation for bafkreicyuznlfosf7carenskudtkfozqlxcdgsjtgogkt6ry3zc34pgwg4 stateless.go:413
21:29:42.138  INFO pintracker: Restarting pin operation for bafkreieej67zrpzlw2ixzdanzvlszd6c4japgljqixn24io2csfu6j5azi stateless.go:413
21:29:42.138  INFO pintracker: Restarting pin operation for bafkreibuqsy5zaxpybxhohtj2lpe4g47nvdcjymv3bgdvxyjrhjhtyjvna stateless.go:413
21:29:42.138  INFO pintracker: Restarting pin operation for bafkreib6xhnzviq2fxtmxhmubjawotoiuxl3xnm6wq4zr6ighxgzyllv4u stateless.go:413
21:29:42.138  INFO pintracker: Restarting pin operation for bafkreiasihysntcucscx5mmle3r3ppghl4b3cmb2czueur2y7bos3fevke stateless.go:413
21:29:42.139  INFO pintracker: Restarting pin operation for bafkreicmhnpfs5j7l4skdkejy5rsdo56jvqkxmsdjsyhbwulj3h7f7toki stateless.go:413
21:29:42.139  INFO pintracker: Restarting pin operation for bafkreibp2svlizv4u4zfk5qjrohju6liuyiujmrkmz2u6i5ng6vw6r4fku stateless.go:413
21:29:42.139  INFO pintracker: Restarting pin operation for bafkreialwgkk3uvh7vt3o4qqhu7gfzaut27mqj63i5daxmnsbpk6nkxfjq stateless.go:413
21:29:42.139  INFO pintracker: Restarting pin operation for bafkreib2vbna2dlej3p5ylno6t6xrjiclpae6xkf4uuyoyrsdu7esxqyze stateless.go:413
21:29:42.139  INFO pintracker: Restarting pin operation for bafkreibjv3vyil7grhhirjm5pcez46fz64ww4sanfbrwarvjfzn7ilffjq stateless.go:413
21:29:42.139  INFO pintracker: Restarting pin operation for bafkreic3mp2ds3rd4auv7jveq2oatvvhqysbdg6uqijkco5axinp7coxfm stateless.go:413
21:29:42.139  INFO pintracker: Restarting pin operation for bafkreiasnc5jnw6nojbbejsogf4stt4ijqt4lujb3l4hwwswx5rns3myvq stateless.go:413
21:29:42.139  INFO pintracker: Restarting pin operation for bafkreiawrcnqjxtuqyr5mqazowkh6dnz6v73pkxvsns3opgs3phuyan24q stateless.go:413
21:29:42.140  INFO pintracker: Restarting pin operation for bafkreiba3x5fxr2zv2xn2palrzppsp2jnpevai6myqrw4mivtqfmrgbuu4 stateless.go:413
21:29:42.140  INFO pintracker: Restarting pin operation for bafkreiafonqtekohldw57jllzmtoufpkgfbatowpiusnxcdtyxwop7nzay stateless.go:413
21:29:42.140  INFO pintracker: Restarting pin operation for bafkreicnxa6bs4mclvyy7fj5tqfzzfhzmor7bwflwmbemxqjgvhlxuoium stateless.go:413
21:29:42.140  INFO pintracker: Restarting pin operation for bafkreiesah5a3qkw5sefeas7vq2oiz5t3pd6xl3gdr3zvu2cxk5kq3xc4m stateless.go:413
21:29:42.140  INFO pintracker: Restarting pin operation for bafkreiawapv2yewmvq7b4yh2khinjcmkgn2rp7mcci4ovfno6362uen5uy stateless.go:413
21:29:42.140  INFO pintracker: Restarting pin operation for bafkreialsene74n4swqqt6rlhydud7pn5iqkk6e3lugngzpn3qswl6iks4 stateless.go:413
21:29:42.141  INFO pintracker: Restarting pin operation for bafkreickkrzclsnminbdeum7rjntsxv5nywm6uj76fbljxuyr7amnjtpha stateless.go:413
21:29:42.141  INFO pintracker: Restarting pin operation for bafkreidemqbwayvlh2llexto54bxidnjckvoid637awcve33d6gsh5odwe stateless.go:413
21:29:42.142  INFO pintracker: Restarting pin operation for bafkreicl5zulyiagikoipipsd4l4uowvdcriceweeayvar4yon3ts2dgxu stateless.go:413
21:29:42.151  INFO pintracker: Restarting pin operation for bafkreibajuj2k3t2ghuheobktkq2utz65n7l6exeojsdt3vpgdm3l64hg4 stateless.go:413
21:29:42.151  INFO pintracker: Restarting pin operation for bafkreiep4dp5u4xaw6c7twwz4xfbbfa22cejr7phpmral7ywqtubyorneq stateless.go:413
21:29:42.151  INFO pintracker: Restarting pin operation for bafkreicaoi33ozkymn5ld274td3j2pk2ykyb4zo2ogvggzvuhelge5qnda stateless.go:413
21:29:42.152  INFO pintracker: Restarting pin operation for bafkreice3efjijgalr4khy5ndzvum46x7vd7rk64raxzubhafbpoze4ldy stateless.go:413
21:29:42.152  INFO pintracker: Restarting pin operation for bafkreidvrpc7nublkqe56accwqdukv7wcapcscqxm2gh23jtliu7yjta7a stateless.go:413
21:29:42.152  INFO pintracker: Restarting pin operation for bafkreibxxrimsc4oljaledfeuwanlgu2ql7dvpwi4fpfeht74e4orwlmkq stateless.go:413
21:29:42.152  INFO pintracker: Restarting pin operation for bafkreidib5sjjkztgrdoyc4useywtj4qxkefpgk4pavi2hgxye36zyj4ce stateless.go:413
21:29:42.152  INFO pintracker: Restarting pin operation for bafkreibe3ltbkat4bhf65izs2wulavvg6qtl2drz3smi6uzvfxdqqwtg3q stateless.go:413
21:29:42.152  INFO pintracker: Restarting pin operation for bafkreidcdb5m2erhjzrylhlumsaljdopvgyuryqppb3mv7hoxfasahxm7e stateless.go:413
21:29:42.152  INFO pintracker: Restarting pin operation for bafkreidnmunbq5xuusabkggf3to34ff5gqbzzqqbxs7gvtm3ma57wwsbly stateless.go:413

So they cannot complete, because there's no connection to the DHT, and there's apparently no timeout on these operations that would indicate there is no source anymore. (Why are they even being fetched in the first place if they are no longer part of the cluster data set? Maybe related to ipfs-cluster/ipfs-cluster#1006.)

Additionally, the IO hasn't stopped, even though there's (apparently) no new input from the cluster-follower and the network connection is basically completely down (IPFS still reports 131 connections, but the log states that the DHT is not available).

Meanwhile the cluster-follower is still consuming a lot of CPU time while "doing nothing" according to its log.

$ ps aux | grep -E "^USER|ipfs" | grep -vE "journalctl|grep"
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
ruben    14358 12.3  6.2 4165528 756328 pts/6  Sl+  17:04  42:52 ./ipfs-cluster-follow cluster.pacman.store run --init cluster.pacman.store/default.json
ruben    14426 29.8 10.9 10265212 1332968 ?    Ssl  15:39 129:32 /usr/bin/ipfs daemon --enable-pubsub-experiment --enable-namesys-pubsub --enable-mplex-experiment

After a while IPFS seems to regain some degree of connectivity, but it's very limited, and there's still a lot of read IO going on.

A longer debug output log:
debug_output_ipfs.log

It's now been 5 hours since starting the cluster-follower, and IPFS has kept the disk at its IO limit without interruption. Some more graphs:

sdc-day
sdc-day (1)
vmstat-day
cpu-day (1)
load-day (1)
memory-day (1)
netstat-day
if_wlp3s0-day
forks-day
threads-day

I'll leave the daemon and ipfs-cluster-follower running in this condition to see how it looks in a few hours (it's 11pm here, so after the night).

I'm wondering whether the IO is looping, or whether it's effectively doing a repo verify/pin verify once per cluster pin, or something like that.

@RubenKelevra
Contributor Author

I let the process run for another 2 days; in that time ipfs developed a memory leak and nearly filled up my swap space...

Stopping the service with systemd failed:

# systemctl status ipfs@ruben
● ipfs@ruben.service - InterPlanetary File System (IPFS) daemon
   Loaded: loaded (/usr/lib/systemd/system/ipfs@.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/ipfs@.service.d
           └─override.conf
   Active: failed (Result: timeout) since Mon 2020-02-24 19:56:22 CET; 44min ago
 Main PID: 14426

Feb 24 19:54:52 i3 systemd[1]: ipfs@ruben.service: Killing process 12040 (ipfs) with signal SIGKILL.
Feb 24 19:54:52 i3 systemd[1]: ipfs@ruben.service: Killing process 12042 (ipfs) with signal SIGKILL.
Feb 24 19:54:52 i3 systemd[1]: ipfs@ruben.service: Killing process 12047 (ipfs) with signal SIGKILL.
Feb 24 19:54:52 i3 systemd[1]: ipfs@ruben.service: Killing process 12049 (ipfs) with signal SIGKILL.
Feb 24 19:54:52 i3 systemd[1]: ipfs@ruben.service: Killing process 12050 (ipfs) with signal SIGKILL.
Feb 24 19:54:52 i3 systemd[1]: ipfs@ruben.service: Killing process 12051 (ipfs) with signal SIGKILL.
Feb 24 19:54:52 i3 systemd[1]: ipfs@ruben.service: Killing process 12059 (ipfs) with signal SIGKILL.
Feb 24 19:56:22 i3 systemd[1]: ipfs@ruben.service: Processes still around after final SIGKILL. Entering failed mode.
Feb 24 19:56:22 i3 systemd[1]: ipfs@ruben.service: Failed with result 'timeout'.
Feb 24 19:56:22 i3 systemd[1]: Stopped InterPlanetary File System (IPFS) daemon.

It left ipfs behind as a zombie process, still issuing IO like crazy and consuming quite a lot of CPU...

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
ruben    14426 40.8  0.0      0     0 ?        Zsl  Feb21 1887:42 [ipfs] <defunct>

memory-week
cpu-week

@Stebalien
Member

Please follow https://github.com/ipfs/go-ipfs/blob/master/docs/debug-guide.md and post the resulting dump. That'll help us see what's going on.
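
(For reference, the dump that guide asks for essentially boils down to pulling the daemon's pprof endpoints on the API port; a sketch, assuming the default API address 127.0.0.1:5001:)

curl -o ipfs.stacks  'http://127.0.0.1:5001/debug/pprof/goroutine?debug=2'   # goroutine stacks
curl -o ipfs.heap    'http://127.0.0.1:5001/debug/pprof/heap'                # heap profile
curl -o ipfs.cpuprof 'http://127.0.0.1:5001/debug/pprof/profile'             # ~30s CPU profile
ipfs version --all > ipfs.version
tar czf ipfs-debug.tar.gz ipfs.stacks ipfs.heap ipfs.cpuprof ipfs.version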

@RubenKelevra
Contributor Author

@Stebalien I suspect that the ipfs-cluster-follower causes or triggers this bug when there are cluster pins with a CIDv1 and a blake2b-256 hash in the cluster state.

Wiping all databases and importing everything as CIDv1, but with sha2-256, made the issue disappear.

@Stebalien
Member

Interesting.... Ah, ok, that makes sense. Cluster is repeatedly asking IPFS to do something.

@RubenKelevra
Contributor Author

It's just very strange that the ipfs log, even on debug, isn't showing what's going on 🤔

Maybe, in the long term, we could add a test to IPFS-Cluster that runs through all hash options to make sure they work in the setup. There are so many of them that there are bound to be edge cases nobody has checked for specific combinations.

@Stebalien
Member

It's just very strange that the ipfs log, even on debug, isn't showing what's going on

Well, if it's just something repeatedly hitting the go-ipfs API, nothing is wrong as far as go-ipfs is concerned.

@RubenKelevra
Contributor Author

True, but regardless of what's hitting the API, the ipfs daemon should avoid pushing the system load to > 300.

If I had to point a finger at the metric that is too high, I would pick the I/O sleep shown in the vmstat graph. I think the ipfs daemon should limit itself to something like 4 times the number of cores (which would be 16 or 8 here, depending on whether you count the virtual cores).

This might be hard to implement as a hard limit without running into deadlocks, but maybe additional I/O could be limited to one new operation per process every 500 ms or something like that.

@Stebalien
Member

IPFS is designed to do the thing you tell it to do, as fast as it can. We do need better internal load balancing (and rate limiting in certain cases), but if you repeatedly tell it to do something, it's going to do it.

@RubenKelevra
Contributor Author

The main issue in this case is that ipfs opens so many files that it cannot hold any network connection open, since every reconnect runs into the fd limit.

I guess the first step would be to rate-limit file opens when they start to impact opening new connections, for example at 'high water' plus 20% or so.
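
(To see whether the daemon is actually sitting at its descriptor limit, something like this should do; as far as I know go-ipfs also reads an IPFS_FD_MAX environment variable to raise its own target, but treat that as an assumption:)

# open file descriptors of the running daemon vs. the limit it was started with
ls /proc/$(pgrep -x ipfs)/fd | wc -l
grep 'Max open files' /proc/$(pgrep -x ipfs)/limits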

Anyway, to close this ticket (since nothing is particularly wrong with IPFS in this case): if there's a ticket for file access/file-open rate limiting somewhere, feel free to add this one as an example of why it's needed.

Otherwise a ticket in the backlog would be nice, with a link to this one as an example.

Additionally there's a memory leak somewhere, which can evidently be triggered by hitting the API with requests (somehow).

I don't know if it's reproducible, but it might be worth running an ipfs-cluster-follower the way I did, with a state it struggles to process because of the blake2b-256 hashes, and see if it happens again.

I've got no spare VMs/hardware to set up something that can run for days; maybe someone else would like to give this a try.

@Stebalien
Member

I guess the first step would be to rate-limit file opens when they start to impact opening new connections, for example at 'high water' plus 20% or so.

libp2p/go-libp2p-swarm#165

Additionally there's a memory leak somewhere, which can evidently be triggered by hitting the API with requests (somehow).

That could just be Go holding on to memory; Go does that a lot. It could also be the peerstore (which doesn't currently garbage-collect) if you're connecting to a ton of nodes.

I'd need a pprof profile to tell.

@Stebalien
Member

I'm going to close this for now as we'd really need a pprof profile to understand exactly what happened. And, as you say, it's probably going to come down to rate limiting.

@hsanjuan
Contributor

Maybe, in the long term, we could add a test to IPFS-Cluster that runs through all hash options

Testing for this will be added (hopefully in the short-term)

As a side note, the cluster pintracker runs, by default, at most 10 concurrent pin operations (with a 24h timeout). Everything else should be queued in memory (though it doesn't seem to have been cluster's memory that ran away). ipfs diag cmds should have shown how many API operations were ongoing. I guess that if it was a very wide DAG and a directory node had the wrong CIDs, it may trigger a very large number of lookups for un-gettable blocks inside IPFS, so hopefully correcting ipfs-cluster/ipfs-cluster#1006 will fix that. But in that case I'm left wondering what ipfs was writing to disk in the first place.
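
(For a future run, something like this would capture that, assuming the default API on the node:)

# list the API commands the daemon is currently executing
ipfs diag cmds
# or keep an eye on the number of in-flight commands over time
watch -n 5 'ipfs diag cmds | wc -l'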

@Stebalien
Member

@hsanjuan it's not writing, it's reading.

@RubenKelevra
Contributor Author

Okay, I started my cluster setup again yesterday: fresh IPFS datastore on both sides, fresh blockstore on both sides, and now, on both the follower and the cluster-service side, IPFS is trying to read as fast as the disk can provide data.

On the server it's reading over 100 MB/s from storage, while the network is pretty quiet at below 3 Mbit/s.

Both IPFS nodes still work properly; they are just pretty slow.

The server is called loki, the client is called i3.

On loki's side the ipfs log is empty after startup. The ipfs-cluster log just shows normal pinning operations, which happen from time to time.

On i3's side, the ipfs log is also empty except for one message, cited below, and the cluster-follower log just shows normal pinning operations, exactly as expected.

03:16:52.649 ERROR     swarm2: swarm listener accept error: unexpected flag swarm_listen.go:78

Both machines run Arch Linux and have the ipfs daemon installed from the community repo; cluster-follower, cluster-ctl and cluster-service are all binaries from dist.ipfs.io.

All IPFS data (cluster and node) is stored on ZFS (no raid/mirror) and atime is disabled.

Load on loki: 70,83, 90,54, 104,17
Load on i3: 153,06, 156,34, 179,37

This is what disk IO and network IO on loki look like:

Screenshot_20200227_205010
Screenshot_20200227_211018


I'm not terribly good at debugging network communication, but the traffic towards port 5001 on i3 looks perfectly normal to me.

There are many concurrent pinning operations, but it's basically always "POST /api/v0/pin/add?arg=&recursive=true&progress=true HTTP/1.1" from the client, and IPFS answers with 4 progress reports every second.


Just to recap what my script is doing (a rough shell sketch follows the list below), to make sure I'm not holding IPFS entirely wrong:

  • I sync two folders with rsync.
  • I process the rsync log file to see which files have been changed, added, or deleted.
    • New files are added via ipfs-cluster-ctl add and linked into an MFS structure via ipfs files cp.
    • Changed files are removed from the MFS via ipfs files rm, and I rerun ipfs-cluster-ctl pin add with a pin timeout, so the old version stays in the cluster for another 2 months and is then dropped automatically. Then I add the new version via ipfs-cluster-ctl add and link it into the MFS via ipfs files cp.
    • Deleted files get the same treatment as changed files, except no new version gets added.
  • When this is done, I pin the root of the MFS folder to the cluster (this is only possible recursively at the moment, see the request for a non-recursive pin option: ipfs-cluster/ipfs-cluster#1009) to hold the folder structure in the cluster, and publish the new CID via IPNS.
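
Roughly like this, as a simplified sketch (paths, the remote and the output parsing are placeholders/assumptions; the real script handles the rsync log and errors more carefully):

#!/usr/bin/env bash
set -euo pipefail

# one update cycle of the mirror (simplified)
rsync -a --out-format='%o %n' rsync://upstream.example/repo/ /srv/mirror/ > rsync.log

while read -r op file; do
  case "$op" in
    send|recv)                                   # new or changed file
      ipfs files rm "/repo/$file" 2>/dev/null || true
      # for changed files the real script also re-pins the old CID with --expire-in
      cid=$(ipfs-cluster-ctl add "/srv/mirror/$file" | awk '{print $2}')   # output 'added <cid> <name>' assumed
      ipfs files mkdir -p "/repo/$(dirname "$file")" 2>/dev/null || true
      ipfs files cp "/ipfs/$cid" "/repo/$file"
      ;;
    del.)                                        # deleted upstream
      ipfs files rm "/repo/$file" 2>/dev/null || true
      ;;
  esac
done < rsync.log

# pin the MFS root in the cluster (recursive for now) and publish it
root=$(ipfs files stat --hash /repo)
ipfs-cluster-ctl pin add "$root"
ipfs name publish "/ipfs/$root"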

This does create a lot of new 'versions', because each file operation is a commit to the cluster, but that doesn't really feel like an issue to me. If I could pin the folders non-recursively, I would only push the actually changed content to the cluster and avoid having everything else re-checked for changes.

There are currently about 4-5 new versions per hour, with around 10-15 changed files on average.

@RubenKelevra
Contributor Author

RubenKelevra commented Feb 27, 2020

Dammit. I found the issue.

When this is done, I pin the root of the MFS folder to the cluster (this is only possible recursively at the moment, see ipfs-cluster/ipfs-cluster#1009) to hold the folder structure in the cluster, and publish the new CID via IPNS.

I don't simply pin the folder; I pin the folder recursively, with an "--expire-in" set on the "pin add".
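
(Concretely, the call in question is roughly this; the CID and the duration are placeholders:)

# recursive cluster pin of the MFS root, set to expire after ~2 months
ipfs-cluster-ctl pin add --expire-in 1440h <root-folder-cid>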

I bet this somehow triggers a re-read of the whole data structure, to make sure IPFS has all the blocks.

I think I'll take the "pinning the root folder" step out again and hope this fixes the issue.
