Add --inplace option for syncing e.g. 15G files with small changes, trading away safety for speed #65
For comparison: rsync --inplace takes about half the time (around 5 minutes) for the example above. |
@klemens-u Have you checked if you have |
@klemens-u |
@brabalan: yes, I saw that copyprog works only for new files. Do you know why there is no option to use rsync also for subsequent syncs? |
I did some further investigating, and there is another big issue: I'm using btrfs and btrbk snapshot backup on the remote machine. Now because there is no "inplace" option, unison creates a separate 15GB file during transfer which finally replaces the original file when finished. This means that for btrfs all blocks have changed! So every btrfs snapshot of a changed virtual machine image will take the whole 15GB space instead of only a few changed MBs.... |
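To illustrate why an in-place update would touch only a few blocks, here is a minimal Python sketch. It is not Unison's or rsync's code; the function name `update_in_place` and the 4096-byte block size are assumptions made for the example, and it holds the new contents in memory, which would not be practical for a 15GB image.

```python
import os

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


def update_in_place(target_path: str, new_data: bytes,
                    block_size: int = BLOCK_SIZE) -> int:
    """Overwrite only the blocks of target_path that differ from new_data.

    Returns the number of blocks written. This is NOT crash-safe: if the
    process is interrupted, the file is left as a mix of old and new blocks.
    """
    written = 0
    with open(target_path, "r+b") as f:
        for offset in range(0, len(new_data), block_size):
            new_block = new_data[offset:offset + block_size]
            f.seek(offset)
            old_block = f.read(len(new_block))
            if old_block != new_block:
                # Rewrite just this block; unchanged blocks are never written.
                f.seek(offset)
                f.write(new_block)
                written += 1
        # Drop any trailing bytes if the new contents are shorter.
        f.truncate(len(new_data))
        f.flush()
        os.fsync(f.fileno())
    return written
```

On a copy-on-write filesystem, blocks that are never rewritten can stay shared with earlier snapshots, which is the space saving being asked for here.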
Ouch, this is painful. I think the reason we don't use rsync and don't do inplace transfer is to minimize the time where the filesystem is in an inconsistent state. In other words, if you interrupt unison, you either want the old file or the new file, and not something in between. I guess we could have an option to allow unsafe inplace transfer, but the code to do that needs to be written. |
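The "old file or new file, never something in between" guarantee is commonly obtained by writing to a temporary file and renaming it over the target. The following Python sketch shows that general pattern under stated assumptions: it is not taken from Unison's copy.ml, and the helper name `replace_atomically` and the `.sync-tmp-` prefix are hypothetical.

```python
import os
import tempfile


def replace_atomically(target_path: str, new_data: bytes) -> None:
    """Write new_data to a temp file, then rename it over target_path.

    Readers see either the complete old file or the complete new file.
    The cost is that the whole file is rewritten, even for small changes.
    """
    directory = os.path.dirname(os.path.abspath(target_path))
    # The temp file must live on the same filesystem as the target so the
    # final rename is a single atomic operation.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".sync-tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, target_path)
    except BaseException:
        # Clean up the partial temp file; the original target is untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Because the replacement is a brand-new file, a block-level snapshot sees every block as changed, which matches the btrfs behaviour reported in this thread.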
@brabalan: you're right, for normal operation unison's attempt to minimize inconsistent states is very good and the way to go. I don't have any insight into the inner workings of the current "copyprog" option. But wouldn't it be a simple and clean way to extend the "copyprog / copythreshold" options to delegate all syncing of big files to rsync? This is what a future config file could look like:
Benefits:
Your thoughts? |
I think it would be great, but I don't know how big of a change this would be. One needs to change this line (src/copy.ml, line 757 in f30b3e3; there Copy means the file is new), but I don't know where unison deals with temporary files.
@bcpierce00 : would this be difficult to implement? |
In principle, this wouldn’t be a huge change, but not completely trivial either, and the part of the codebase that deals with copying is rather complex — it always takes me some time to re-figure out how it works. If someone wants to have a go at it, the file copy.ml is the place to start reading.
- B
On Feb 15, 2017, at 5:07 AM, Alan Schmitt wrote:
I think it would be great, but I don't know how big of a change this would be. One needs to change this line https://github.com/bcpierce00/unison/blob/f30b3e3215a942a6566a79d09bb12c012825a9d1/src/copy.ml#L757 (there Copy means the file is new), but I don't know where unison deals with temporary files.
@bcpierce00: would this be difficult to implement?
|
Thanks @bcpierce00. I'd appreciate it if someone could take a look at it. |
I believe that the introduction of |
This issue is old and there have been many improvements over the years. Please retest with 2.53.1 to see if the builtin sync is slower than using rsync, and post results and a repro recipe, preferably a script. Or, really, the question is to articulate the difference between how unison behaves and some rsync invocation. Without test results, I'll assume this issue is no longer relevant (standard 30-day feedback timer). (In addition, I'm not really comfortable with an optimization which can result in bad data.) |
There are several issues (or areas of potential improvement) here.
First, the lack of inplace update, which not only causes a lot of extra I/O but also ruins fs block-level snapshots. An inplace update seems like a valuable option for expert users to have. It is still an extremely dangerous option, but it could be useful for people who know what they're doing (can correctly re-run a sync, or can restore from a snapshot, for example), and it makes the next point below (verification) even more important. I can see this being implemented directly in Unison or delegated to
Note that, while not exactly inplace updates, some work has already been done to make block-level snapshots easier and reduce I/O loads. See #577. That code is currently working for whole-file copies. At least on some systems, this work could hopefully be adapted easily to simulate inplace updates. (I don't know if any fs actually supports this; if not, then a "real" inplace could be implemented.) The good thing about such a simulated inplace update is that it would be safer than a real inplace update because it would never leave the target file in an inconsistent state.
Then, the last bullet from the original report:
Yes, this is done to verify that 1) the transfer was correct; and 2) that the source file hasn't changed during the transfer. I can only agree that this is very time and I/O consuming but I can't see turning this off either, even for expert users. |
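The two checks described above can be sketched in Python as follows. This only illustrates the idea and is not Unison's verification code: the use of SHA-256, the local `shutil.copyfile` standing in for the transfer, and the names `file_digest` and `copy_and_verify` are all assumptions.

```python
import hashlib
import shutil


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_and_verify(source_path: str, dest_path: str) -> bool:
    """Copy a file and check both conditions from the comment above.

    1) the copy matches the source, and
    2) the source did not change while it was being copied.
    """
    before = file_digest(source_path)
    shutil.copyfile(source_path, dest_path)  # stand-in for the real transfer
    after = file_digest(source_path)
    copied = file_digest(dest_path)
    return before == after == copied
```

Each digest is a full read of the file, which is where the extra time and I/O on large files comes from in a scheme like this.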
I've opened a PR with a proof-of-concept alternative solution to this request. See #876. Granted, it only works in some configurations (must have platform and filesystem support) but when it does work then it is completely safe, unlike rsync's It works as you'd expect, not copying unchanged data and not breaking snapshots. Anyone interested in this is welcome to test the PR. Even though it is supposed to be safe, please do initial tests on non-production data. |
@klemens-u Have you been able to update to recent and test the draft PR? |
Hello, after some investigation I was happy to see that unison uses an intelligent algorithm which transfers only the modified part e.g. of a 15GB large virtual disk image.
But still, the progress is quite slow and generates a lot of I/O.
My test scenario:
This process takes more than 10 minutes although only a few megabytes of data are transferred over the network.
Therefore I monitored I/O on both machines and the network traffic.
Here is what I found:
Apropos rsync - in reference to "Making Unison Faster on Large Files" (http://www.cis.upenn.edu/~bcpierce/unison/download/releases/stable/unison-manual.html#speeding):
Regarding the "copyprog" option: I couldn't detect any reference to rsync in the debug output (debug=all), or as a separate process. How can I make sure Unison is delegating sync of large files to rsync?
Furthermore, as I understand, rsync is only used for the initial first-time transfer. Is it possible to use rsync also for subsequent syncs?
Thanks!