Pack multiple stdin outputs into a single snapshot #2133

Open
dhoffend opened this issue Jan 2, 2019 · 7 comments
@dhoffend
Contributor

dhoffend commented Jan 2, 2019

Output of restic version

restic 0.9.3 compiled with go1.11.1 on linux/amd64

What are you trying to do?

I would like to run several hundred mysqldump commands (backing up separate tables instead of the whole database), and I ran into several issues that make such backups impractical and hard to use:

  1. Every stdin backup becomes its own snapshot, which clutters the snapshot list and makes it hard to read. The only way to group them would be to use tags with dates included, which is not very practical tbh.
  2. The main problem: each restic invocation takes 10-20 s before it actually starts doing its job (due to index loading).
  3. Instead of restoring all mysqldumps at once, you have to restore from every single snapshot ...

What should restic do differently? Which functionality do you think we should add?

I would like to propose an alternative way to backup multiple stdin outputs into a single snapshot

  1. Please provide a --stdin-commands-file <file> option for the backup command.
  2. The --stdin-commands-file would contain a list of backup jobs/commands (one per line), with the resulting filename as the first argument: <filename><whitespace><command whose stdout should be saved><newline> (one filename + command per line). A config file with a different syntax would also be okay.
    Example:
db01.sql mysqldump [...] "db01"
db02.sql mysqldump [...] "db02"
db03.sql mysqldump [...] "db03"
  3. The path name for the whole snapshot could be the basename of the stdin-commands-file or the --stdin-filename parameter.
  4. restic itself would execute every single command and pipe the stdout into the archiver code internally, under the given filename.

This way, a commands file can be prepared prior to running restic, and a single restic instance can save multiple stdin streams into a single snapshot. Not only is it easier to handle mysqldump backup jobs (or anything similar), you also get faster execution (only 1x the index loading instead of multiple hundred) and are done in minutes rather than hours.

Maybe this could help with #1873 as well. In my case I would like to avoid piping mysqldumps to disk first before backing them up as a single snapshot.

Did restic help you or made you happy in any way?

Sure. I'm about to switch my private servers to restic (from rsnapshot), and I'm already using restic in a different environment to back up 100+ servers, but I'm struggling with database dumps and other performance-related things (like index loading and memory usage, though it has become far better in the last year).

@dhoffend
Contributor Author

dhoffend commented Jan 2, 2019

Maybe it makes more sense to call it a --commands-file parameter, to avoid confusion with the --stdin mode. Basically, backup would execute the commands and then back up their stdout. This no longer has anything to do with the original stdin mode.

@dhoffend
Contributor Author

dhoffend commented Jan 2, 2019

After looking at the code for a while, I can see the following possible way to go:

Clone the fs_reader code to provide a directory of fake files with their commands, or enhance the fs_reader object to support multiple fake files, including optional stdout of commands instead of stdin.

The scanner would get a list of all fake files defined in the commands file and hand them over to the Archiver. When the Archiver code calls Open() (or OpenFile()?) and fs.Command is set for that fake file entry, we would execute the given command and pipe cmd.Stdout into the returned reader, so the archiver code can store it until the command execution reaches EOF. The scanner would then continue with the next fake entry and call the next Open().

It sounds possible without too large a change, apart from messing with the fs.Reader for fake stdin files.

Any ideas?

@fd0
Member

fd0 commented Jan 6, 2019

Thanks for taking the time to submit this idea. To be honest, I'm not convinced this is a good thing to add to restic. It makes the (already complex) code for reading something from stdin even more complicated. After all, restic is intended to be a tool to back up files.

Would you mind elaborating why the straightforward way (running mysqldump to create files, then backup those files) does not work for you?

I'm sorry if my comment comes across as negative, it's not meant that way. We are a Free Software project for which most people (at least me) work in their spare time, and our development/maintenance/debugging time is very limited. So we're trying to keep restic's scope as small as possible. This also applies to #1873. :)

@fd0 added the labels "state: need feedback" (waiting for feedback, e.g. from the submitter) and "type: feature suggestion" (suggesting a new feature) on Jan 6, 2019
@dhoffend
Contributor Author

dhoffend commented Jan 8, 2019

Hi fd0, thanks for taking the time to comment on this. I know restic's strength is backing up files. The feature to back up data from stdin also makes it a great tool for non-file-based content.

The main reason why I prefer backups via mysqldump ... | restic backup --stdin ... is I/O usage. Imagine you have one or more databases with a size of multiple gigabytes (10 GB+). Running mysqldump to the filesystem first and then running the backup creates a lot of write I/O while the system is doing lots of reads as well, which reduces overall performance quite a lot. In the past I've countered this by piping the dump through gzip first. A second problem comes in when you run servers that sync their storage over DRBD: creating mysqldump files first would also generate network traffic, with the I/O operations and latency bound to the network link. This is why I would like to avoid write I/O operations where possible.

The reason why I would like some way of backing up multiple stdin streams is restic's memory usage and index loading time. It takes quite some time before restic starts backing things up (~10 s, but it depends on the size of the repo). If you then want to back up several hundred stdin commands (say, mysqldumps) while avoiding lots of write I/O, you end up executing restic several hundred times, and the index-loading wait stacks up.

Sure, a --stdin-commands parameter is not mission critical, and the old-fashioned way (creating files and then backing them up) still works. But the larger the dumps get, the more I would like to save write operations in shared environments or when using DRBD over the network.

Thanks in advance. Maybe I can get my head around it myself, but I've never used Go before.

@fd0
Member

fd0 commented Jan 11, 2019

Okay, thanks for taking the time to describe your use case.

@micw

micw commented Jan 29, 2019

@fd0 We are internally brainstorming our backup strategies, and it turns out that the use case "backup files plus large streamed output of multiple commands" is very common for us. Streaming the command outputs to files and backing them up along with all the regular files is a workaround, but it has massive drawbacks in performance and space usage. E.g. we back up apps containing small local data plus large elasticsearch dumps (which would not even fit on a single disk of the system that runs restic). Having all of this together in one snapshot would be great for restore consistency.

I'd appreciate it if you'd consider supporting this use case.

Best regards,
Michael.

@MichaelEischer
Member

Related to #4804
