-
Notifications
You must be signed in to change notification settings - Fork 148
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Verify uploaded files #130
Conversation
cc26aee
to
4abb0d2
Compare
Great work! This does miss one important scenario: since the file is still written in place, you can still end up with corrupt files:
In order to actually guarantee integrity, data should always be written to a temporary file, which is atomically moved in place once the entire file has been written and its hash verified. Additionally, I suggest adding a flag to disable this feature or making it optional. While sha256 is really fast on modern systems, there are plenty of cheap and old NAS servers where this feature would kill performance, like an old Synology DS213j NAS where rest-server is CPU limited to 20MB/s even without this feature. |
I'd like to handle atomic uploads in a separate PR. The idea for this PR was to ensure that successfully uploaded files are guaranteed to be intact.
My raspberrypi 4 agrees:
It would also be possible to let restic provide a md5 hash (see restic/restic#3178) which is about 4 times faster to verify on my raspberrypi. |
Great!
That would add an extra overhead on the client side without actually providing the integrity benefits that checking the object sha256 would give. It would detect errors introduced during the network transfer, but these are I believe fairly unlikely and protected against by checksums on the datas link layer (e.g. CRC32 in Ethernet). TLS may provide extra protection here, but perhaps you could still end up with garbage as long as the right number of bytes were written. But I am not sure about this part. |
BTW, if we want to add some hash to protect against data link layer errors, it may make more sense to pick a fast modern non-crypto hash like FNV instead of something like MD5, which is neither fast nor secure. |
The idea behind restic/restic#3178 is to verify the integrity from the time the pack file is assembled (at which point also the sha256/(md5) hashes are calculated) all the way until it is received at the server-side. That would detect all sorts of bit flips while uploading a pack file. A non-crypto hash would thus actually be enough. On the other hand using the sha256 hash would make it trivial to avoid all sorts of (possible) problems with mismatches between hashes etc. Could you test how slow the sha256 hash is on the NAS you've mentioned?
CRC32 will only filter out most, but by far not all damaged ethernet packets. As the checksum calculation is usually offloaded to the network card/adapter it won't detect bugs, like garbage data in otherwise valid network packets after waking up from standby (I have one network adapter which exhibits that exact problem.). TLS will detect such transmission errors as the MAC for the packets will be invalid. The MAC there is also long enough to reliably filter out corrupted network packets even when transferring TBs of data. On a large enough scale, we'll also see bitflips in the CPU / memory (just take a look at some of the prune failures in the restic bugtracker). Therefore the idea behind this PR is to lay the foundation for end-to-end checksums. The most important part is to ensure that the upload works correctly as we'd otherwise store broken data. Therefore the goal is ensure the integrity of the pack files from as early as possible (effectively the moment it was assembled) up to the moment when it is stored. That also allows distinguishing whether data corruption happened on the client or the rest-server side. |
OK, this makes sense. Handling bitflips in CPU/memory is a hard one though, there is always some part of the process that you cannot check.
I retired this NAS a few weeks ago and can no longer test, but a few datapoints:
So likely very very slow. |
In the end I think it's also fine for rest-server to just not offer verification on this kind of hardware. Bitflips in the CPU, non-ECC memory and a filesystem without data checksums and scrubbing are probably more likely to introduce errors. And since these kind of devices tend to run locally, it is doable to run restic verify once in a while. |
d121f92
to
9a14875
Compare
That looks like ~20MB/s SHA256 could be possible, such that the end result would also be 5-10MB/s.
It's definitely a much better user experience when at least some of these bitflips are already detected during a backup such that restic can retry saving data in case of a temporary bitflip (during transmission) or fail the backup run in case of a permanent one (the pack file content was already corrupted at the client). The nice part about having that verification is that it's much easier to pinpoint the component which is at fault: if the packfile content matches its hash but is invalid anyways, then the client is to blame. If the packfile content doesn't match its hash, then the error happened at the rest-server, as it only stores valid packfiles. (Ignoring that the save operation is currently not atomic). But for low-power hardware it's probably better to offer a way to skip the verification. I've added an option "--skip-upload-verification". |
I really like this feature. I could also create a separate PR for this change of library once this is merged. |
@buschjost I think a separate PR would be the way to go. It will probably help already fast servers the most. The biggest speed improvement I've seen is due to support for the x86 SHA extension (2.3 GB/s vs. 0.6GB/s with the standard library). For hosts without special hardware support, the performance advantage seems to be shrinking with recent go versions. |
9a14875
to
76cddb9
Compare
After talking to @rawtaz I've renamed the option to |
76cddb9
to
df464de
Compare
This pull request looks like it has been finished so far. Whats preventing it from beeing merged? |
The PR still needs a review and the plan was to merge it after #112 lands. |
Restic uses the sha256 hash to calculate filenames based on the file content. Check on the rest-server side that the uploaded file is intact and reject it otherwise.
I've ported the code on the master branch (with #112 included), please have a look! From my side, we still need to decide whether we will just use the file name (which does not work for |
Look good to me! Note that my concerns about this still being written in-place are still true, but this can be solved in a different PR (added issue #152). If
For the data files, it should really use the filename, or you could end up with a repo where the contents do not match the filename. Passing an sha256sum for the config file would be nice to have, but since this is only written once I think it's fine not to check it. There is a small chance of a repo becoming unusable due to a partial config write in combination with |
Huh? How can that happen? If the hash is checked and there's an error, the file is removed directly afterwards (independent of whether or not append-only mode is enabled), here [...]
// calculate hash for current request
hasher := sha256.New()
written, err = io.Copy(outFile, io.TeeReader(r.Body, hasher))
// reject if file content doesn't match file name
if err == nil && hex.EncodeToString(hasher.Sum(nil)) != objectID {
err = fmt.Errorf("file content does not match hash")
}
}
if err != nil {
_ = tf.Close()
_ = os.Remove(path)
[...]
} |
It can happen if the server crashes in the middle of a write. |
Ok, good point. We'll address it in the other PR. |
If a config file is only written partially, then a user won't be able to upload any data to the repository. It might not be the optimal user experience, but it shouldn't be much of a problem either. An additional header would either be used exclusively for the config file or add a few dozen bytes of overhead to each request. In total, I think we should just stick with the filename. |
Fine with me, thanks for looking at the code again @MichaelEischer! |
What is the purpose of this change? What does it change?
The restic uses the sha256 hash to calculate filenames based on the file content. Check on the rest-server side that the uploaded file is intact and reject it otherwise.
This PR complements restic/restic#3178 . However, this PR does not depend on the mentioned restic PR and should work with every restic version.
Was the change discussed in an issue or in the forum before?
Fixes #122
Checklist
[ ] I have added documentation for the changes (in the manual)changelog/unreleased/
that describes the changes for our users (template here)gofmt
on the code in all commits