Is possible to find out new added block numbers into local repo after each “ipfs add file”? #5826

Open
avatar-lavventura opened this issue Dec 7, 2018 · 13 comments
Labels
kind/enhancement A net-new feature or improvement to an existing feature

Comments

@avatar-lavventura

avatar-lavventura commented Dec 7, 2018

$ ipfs version --all
go-ipfs version: 0.4.17-
Repo version: 7
System version: 386/linux
Golang version: go1.10.3


I would like to find out how many new blocks were added to the local repo after each ipfs add file. This could be reported in the command's output once each ipfs add completes.

The first time I run ipfs add file, the size of the data added should equal the object's original size. But when I later run ipfs add file on content with the same hash, the object's blocks are already cached, so the added size should be 0 and the number of added blocks should be 0 as well. Is it possible to find that out?

[Q] Is it possible to find out the number of blocks added to the repo after each ipfs add file operation? From the block count, I guess I could obtain the total size of the blocks added to the local repo.

Please note that I can run ipfs repo stat before and after ipfs add file to observe the change in RepoSize. But if another ipfs add is running in parallel in the background, this approach will not work, because the change in RepoSize will reflect both ipfs add operations. So it would be useful for each ipfs add to report the blocks it newly added to the local repo.
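
To make the measurement problem concrete, here is a minimal Go sketch. It assumes a hypothetical in-memory repo type that models only a total size; nothing in it is go-ipfs API. It shows how a before/after RepoSize delta picks up bytes from another add that lands inside the measurement window.

```go
package main

import "fmt"

// repo is a hypothetical stand-in for the local repo; only its total size is modeled.
type repo struct{ size int64 }

// add models one "ipfs add" that stores n new bytes.
func (r *repo) add(n int64) { r.size += n }

// measuredDelta mimics reading RepoSize before and after our own add,
// while another add (otherBytes) finishes inside the same window.
func measuredDelta(r *repo, ourBytes, otherBytes int64) int64 {
	before := r.size
	r.add(ourBytes)
	r.add(otherBytes) // bytes from a concurrent add, indistinguishable from ours
	return r.size - before
}

func main() {
	r := &repo{}
	fmt.Println(measuredDelta(r, 100, 0))  // no concurrent add: the delta is ours alone
	fmt.Println(measuredDelta(r, 100, 50)) // concurrent add: the delta includes its bytes
}
```

With no concurrent add the delta matches our own 100 bytes; with one, the delta is 150 and cannot be split between the two operations after the fact.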

@avatar-lavventura avatar-lavventura changed the title Is possible to find out newly added block numbers after each “ipfs add file”? Is possible to find out new added block numbers after each “ipfs add file”? Dec 7, 2018
@avatar-lavventura avatar-lavventura changed the title Is possible to find out new added block numbers after each “ipfs add file”? Is possible to find out new added block numbers into local repo after each “ipfs add file”? Dec 7, 2018
@eingenito
Contributor

Hey @avatar-lavventura - thanks for the feature request. Can you clarify a bit what you're trying to achieve? Are you specifically interested in the number of blocks added? Or are you more interested in how much total new data was added to the repo for a specific ipfs add invocation? Blocks can vary in size dramatically so a count of blocks added would often be a poor indicator of data added.

I think you are right; I don't think there's any way to do either (meaning added blocks or added data size) right now (someone please correct me if I'm wrong). If you could provide additional information about how this data is useful to you that would be great for helping us learn about how people use ipfs.

@Stebalien
Member

@eingenito is right. There's no way to do this right now.

Unfortunately, I'm not sure if there is a way to implement this without some pretty invasive changes to some internal APIs.

@magik6k
Member

magik6k commented Dec 9, 2018

The easiest way to do this would be on the blockstore layer, since this is where we check if blocks already exist - https://github.com/ipfs/go-ipfs-blockstore/blob/master/blockstore.go#L148 - but it will be fairly hard to wrap in a way we would need in ipfs add - https://github.com/ipfs/go-ipfs/blob/master/core/coreapi/unixfs.go#L68

@Stebalien
Member

@magik6k the issue is that that's racy (the same race OP is trying to avoid). To do this reliably, we'd effectively need to turn "add" into an atomic "check and add" (which isn't going to be free).

@magik6k
Member

magik6k commented Dec 10, 2018

We already check whether Put in the blockstore should call datastore.Put, so it shouldn't be hugely expensive. We only need to:

  • In 'top-level' blockstore
    • Create something (like a map+lock) to cover time between Has and Put
    • Expose whether a block reached datastore.Put (or how many did in case of PutMany)
  • Create a special blockstore that counts those results, wrap n.Blockstore in it in Unixfs().Add, and once adding is done expose the collected results somehow
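
The steps above could look roughly like the following Go sketch. It is an illustration under stated assumptions, not the go-ipfs-blockstore API: Block, memBlockstore, and countingBlockstore are hypothetical simplified types, and a single mutex held across the existence check and the write stands in for the "map+lock" idea.

```go
package main

import (
	"fmt"
	"sync"
)

// Block is a simplified block: a key (stand-in for a CID) plus its raw bytes.
type Block struct {
	Key  string
	Data []byte
}

// memBlockstore is a hypothetical top-level blockstore. Put reports whether
// the block actually reached the underlying store, i.e. whether it was new.
type memBlockstore struct {
	mu   sync.Mutex
	data map[string][]byte
}

func newMemBlockstore() *memBlockstore {
	return &memBlockstore{data: make(map[string][]byte)}
}

// Put holds the lock across the existence check and the write, so the
// "check and add" pair is atomic with respect to other Put calls.
func (bs *memBlockstore) Put(b Block) (stored bool) {
	bs.mu.Lock()
	defer bs.mu.Unlock()
	if _, ok := bs.data[b.Key]; ok {
		return false
	}
	bs.data[b.Key] = b.Data
	return true
}

// countingBlockstore wraps the shared blockstore for the duration of one add
// and tallies only the blocks that this add caused to be written.
type countingBlockstore struct {
	inner     *memBlockstore
	mu        sync.Mutex
	newBlocks int
	newBytes  int64
}

func (c *countingBlockstore) Put(b Block) bool {
	stored := c.inner.Put(b)
	if stored {
		c.mu.Lock()
		c.newBlocks++
		c.newBytes += int64(len(b.Data))
		c.mu.Unlock()
	}
	return stored
}

func main() {
	shared := newMemBlockstore()

	first := &countingBlockstore{inner: shared}
	first.Put(Block{Key: "a", Data: []byte("hello")})
	first.Put(Block{Key: "b", Data: []byte("world!")})

	second := &countingBlockstore{inner: shared}
	second.Put(Block{Key: "a", Data: []byte("hello")}) // already present, not counted
	second.Put(Block{Key: "c", Data: []byte("new")})

	fmt.Println(first.newBlocks, first.newBytes)   // blocks and bytes new to the repo from the first add
	fmt.Println(second.newBlocks, second.newBytes) // only block "c" is new for the second add
}
```

Because each add gets its own counting wrapper while sharing the underlying store, two adds running in parallel would each see only the blocks they themselves caused to be written.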

@Stebalien
Member

Yeah, you're right. We could do that. We could even use a sync.Map and make it lock-free.
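
One possible reading of the sync.Map idea, sketched in Go: LoadOrStore performs the check and the insert as a single atomic step, so exactly one caller learns that it was the one that stored the block. The claimStore type is hypothetical, and in this sketch the sync.Map doubles as the store itself, which is a simplification rather than how the real blockstore is laid out.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// claimStore is a hypothetical store whose Put is an atomic "check and add".
type claimStore struct {
	blocks sync.Map // key (string) -> []byte
}

// Put returns true only for the caller whose LoadOrStore actually stored the value.
func (s *claimStore) Put(key string, data []byte) bool {
	_, loaded := s.blocks.LoadOrStore(key, data)
	return !loaded
}

func main() {
	var s claimStore
	var added int64
	var wg sync.WaitGroup

	// Eight goroutines race to add the same block; only one Put should report "new".
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if s.Put("same-key", []byte("payload")) {
				atomic.AddInt64(&added, 1)
			}
		}()
	}
	wg.Wait()
	fmt.Println(atomic.LoadInt64(&added))
}
```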

Personally, I'd expose those results via a channel registered with the context, the way we do with DHT queries.

@Stebalien Stebalien added the kind/enhancement A net-new feature or improvement to an existing feature label Dec 13, 2018
@avatar-lavventura
Author

avatar-lavventura commented May 18, 2019

I am using ipfs in my research, where I pass large data files between users and clusters. My objective is to find out how much data is transmitted between a user and a cluster node using ipfs add, and how much total new data was added to the repo for that specific ipfs add invocation.

Based on the size of the transmitted data (communication usage) and the size of the data newly added to the repo (cache usage), I will charge users for their communication and cache usage through a smart contract.

Example:
Step 1:

user_A:

$ mkdir folder
$ cd folder
$ fallocate -l 1G gentoo_root.img
$ fallocate -l 1G gentoo_root_static.img
$ ipfs add -r .
added QmdiETTY5fiwTkJeERbWAbPKtzcyjzMEJTJJosrqo2qKNm folder/gentoo_root.img
added QmdiETTY5fiwTkJeERbWAbPKtzcyjzMEJTJJosrqo2qKNm folder/gentoo_root_fix.img
added QmXrG14cBN4yz7wnkWCQgcApjiyMUszBP4oPftwGJawzxx folder
 2.00 GiB / 2.00 GiB [=================================================] 100.00%

cluster_B:

$ ipfs get QmXrG14cBN4yz7wnkWCQgcApjiyMUszBP4oPftwGJawzxx
Saving file(s) to QmXrG14cBN4yz7wnkWCQgcApjiyMUszBP4oPftwGJawzxx
 2.00 GB / 2.00 GB [==================================] 100.00% 49s

// Here I assume that the transmitted data size should be 2 GB and that the repo size should increase by 2 GB. So the communication and cache cost will be for 2 GB.


Step 2:

user_A:

$ cd folder
$ echo 'hello' >> gentoo_root.img
$ ipfs add -r .
added Qmew8yVjNzs2r54Ti6R64W9psxYFd16X3yNY28gZS4YeM3 folder/gentoo_root.img
added QmdiETTY5fiwTkJeERbWAbPKtzcyjzMEJTJJosrqo2qKNm folder/gentoo_root_fix.img
added QmRT279mp9tE4vcgwpPZuxhxCgexKPecNoFLgxXwcJagmt junk
 2.00 GiB / 2.00 GiB [=================================================] 100.00%

cluster_B:

$ ipfs get QmRT279mp9tE4vcgwpPZuxhxCgexKPecNoFLgxXwcJagmt
Saving file(s) to QmRT279mp9tE4vcgwpPZuxhxCgexKPecNoFLgxXwcJagmt
 2.00 GB / 2.00 GB [=========================] 100.00% 45s

// Here, as I understand from the answer to a question I asked before, data that is already stored is not downloaded again:

They will not fetch the full 1GB, only the block that contains the changed character

Since the data is already cached in the repo, ipfs only downloads the updated block, which should be a few bytes, and the repo size increases by only a few blocks. So the communication and cache cost should be for a few bytes, and that is the information I want to detect.
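
The expectation described here can be sketched in Go with a toy content-addressed store. This assumes fixed-size chunking with a deliberately tiny chunk size and a hypothetical store type; it ignores the intermediate DAG nodes that a real ipfs add also creates, so the real numbers would be somewhat larger.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// chunkSize is tiny for illustration; real chunkers use much larger blocks.
const chunkSize = 4

// store records which chunk hashes are already present (hypothetical type).
type store map[[32]byte]bool

// add splits data into fixed-size chunks and returns the number of bytes
// that were not already in the store.
func (s store) add(data []byte) (newBytes int) {
	for off := 0; off < len(data); off += chunkSize {
		end := off + chunkSize
		if end > len(data) {
			end = len(data)
		}
		chunk := data[off:end]
		h := sha256.Sum256(chunk)
		if !s[h] {
			s[h] = true
			newBytes += len(chunk)
		}
	}
	return newBytes
}

func main() {
	s := store{}
	file := []byte("abcdefghijkl") // three distinct 4-byte chunks

	fmt.Println(s.add(file))                               // first add: every chunk is new
	fmt.Println(s.add(file))                               // same content again: nothing new
	fmt.Println(s.add(append(file, []byte("hello\n")...))) // appended data: only the tail chunks are new
}
```

Under these assumptions the three adds report 12, 0, and 6 new bytes, which is the per-invocation figure the billing scheme would need.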

@avatar-lavventura
Author

@Stebalien, @magik6k: I was wondering whether there is any update on this feature request.

@Stebalien
Member

No. Unfortunately, this really isn't a high priority issue. If you'd like to see it happen, you'll probably need to implement it yourself.

@avatar-lavventura
Author

avatar-lavventura commented Jul 24, 2019

=> Could you please point me to where I should start in order to implement it?

@Stebalien
Member

The IPFS team is open to feature requests and discussions. I'm just setting expectations and stating that the core IPFS team won't have time to design or implement this feature.


There's a good starting point here: #5826 (comment). Unfortunately, I can't give you much more guidance without digging through the code myself.

@avatar-lavventura
Author

It's easy to say that we don't have time, but it would be more honest to say that the IPFS team does not have the capability to design this implementation, since they didn't consider it at all.

@Stebalien
Member

I'm having trouble parsing your statement but yes, we don't have capacity (people-time).
