Requirements
Marc Zimmermann edited this page Apr 20, 2020 · 3 revisions
- create an archive package from a given directory and its subtree: build a tar, compress it, and calculate md5 sums (for the individual files and for the compressed archive)
- additionally split large archives
- optionally encrypt tars
- integrity checks: take an archive package, unpack the files, calculate their md5 sums and verify that they match the stored ones
- list files in archive package (transparently across splits)
- extract files: given a path inside the archive package, extract the files under that path
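The integrity check in the list above could be sketched roughly as follows. This is a minimal sketch, not the project's implementation: it assumes the files have already been unpacked into a directory and that the stored md5 sums use the `md5sum` text format (`<digest>  <relative path>`), which this page does not specify. The names `md5_of_file`, `read_manifest` and `verify_tree` are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_manifest(manifest_path):
    """Parse lines of the form "<digest>  <relative path>" into {path: digest}.

    Assumption: md5sum-style text format with two spaces as separator.
    """
    expected = {}
    for line in Path(manifest_path).read_text().splitlines():
        if not line.strip():
            continue
        digest, rel_path = line.split("  ", 1)
        expected[rel_path] = digest
    return expected


def verify_tree(root, manifest_path):
    """Return (mismatched, missing) lists of relative paths, in manifest order."""
    root = Path(root)
    mismatched, missing = [], []
    for rel_path, digest in read_manifest(manifest_path).items():
        candidate = root / rel_path
        if not candidate.is_file():
            missing.append(rel_path)
        elif md5_of_file(candidate) != digest:
            mismatched.append(rel_path)
    return mismatched, missing
```

A call such as `verify_tree("unpacked/project_name", "project_name.md5")` returns two empty lists when every listed file is present and unchanged.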
A project is archived into an archive package. An archive package is a directory containing the following files:
- base archive: project_name.tar.lz
- content listing: project_name.tar.lst (roughly the output of `ls -l`)
- archive md5: project_name.tar.lz.md5
- files md5: project_name.md5
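As an illustration of this layout, the sketch below writes three of the four files using only the Python standard library. Several things are assumptions: the function name `build_package` is hypothetical, the files md5 uses the `md5sum` text format, and the listing is simplified to size and path instead of full `ls -l` output. Compression to `.tar.lz` and the archive md5 are deliberately left out, because lzip is an external tool that the standard library cannot write.

```python
import hashlib
import tarfile
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_package(source_dir, package_dir, project_name):
    """Write <name>.tar, <name>.tar.lst and <name>.md5 into package_dir.

    Returns the path of the uncompressed tar; compressing it with lzip and
    hashing the result would be the next (external) step.
    """
    source_dir, package_dir = Path(source_dir), Path(package_dir)
    package_dir.mkdir(parents=True, exist_ok=True)
    tar_path = package_dir / f"{project_name}.tar"

    # files md5: one line per regular file, path relative to source_dir
    files = sorted(p for p in source_dir.rglob("*") if p.is_file())
    lines = [
        f"{md5_of_file(p)}  {p.relative_to(source_dir).as_posix()}" for p in files
    ]
    (package_dir / f"{project_name}.md5").write_text("\n".join(lines) + "\n")

    # base archive, still uncompressed; members are stored under project_name/
    with tarfile.open(tar_path, "w") as tar:
        tar.add(source_dir, arcname=project_name)

    # content listing: simplified to "<size> <path>" per archive member
    listing_path = package_dir / f"{project_name}.tar.lst"
    with tarfile.open(tar_path) as tar, open(listing_path, "w") as listing:
        for member in tar:
            listing.write(f"{member.size:>12} {member.name}\n")
    return tar_path
```

Hashing the files before they are added to the tar keeps the files md5 independent of the archive format, so the same sums can be checked again after unpacking.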
Projects with large data should be split into smaller parts to simplify file handling and enhance parallelization.
The files should be partitioned into different archives such that every archive part is independent of the others (so that one bad archive cannot corrupt the rest). Partitioning could be done by enumerating all files to be archived and then greedily packing files into a part until the size limit given by the user is reached.
Every subarchive could follow the same structure as a "normal" archive, using `project_name_part0001.tar.lz` instead of `project_name.tar.lz`.
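The greedy partitioning described above might look roughly like the following sketch. It assumes that sizes are the uncompressed file sizes in bytes, that files are packed in enumeration order, and that a single file larger than the limit gets a part of its own; none of this is fixed by the page. The function names are hypothetical, and the 1-based part numbering is only inferred from the `part0001` example.

```python
def partition_greedily(files, size_limit):
    """Split (path, size) pairs into parts of at most size_limit bytes.

    Files are taken in the given order. A part is closed as soon as adding
    the next file would exceed the limit. A file larger than the limit ends
    up alone in an oversized part, since files are never split.
    """
    parts, current, current_size = [], [], 0
    for path, size in files:
        if current and current_size + size > size_limit:
            parts.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        parts.append(current)
    return parts


def part_name(project_name, index):
    """Name a part after the project_name_part0001.tar.lz pattern (1-based, assumed)."""
    return f"{project_name}_part{index:04d}.tar.lz"


if __name__ == "__main__":
    files = [("a", 40), ("b", 50), ("c", 30), ("d", 200), ("e", 10)]
    for index, part in enumerate(partition_greedily(files, 100), start=1):
        print(part_name("project_name", index), part)
```

Because every file lands in exactly one part, each part can be archived, checked, and extracted without reading any other part. Packing in enumeration order keeps neighbouring files together but does not minimize the number of parts.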
- use lzip (improved data recovery, TODO document better)
- parallel version: https://www.nongnu.org/lzip/plzip.html
TBD
- several keys per archive (i.e. several people can decrypt)
- implement some sort of revocation mechanism, i.e. some people are no longer allowed to decrypt?
- encrypt then compress or compress then encrypt?
- use a different key for every archive (part)? how to handle initialization vectors? (see https://crypto.stackexchange.com/questions/30077/would-it-be-better-to-split-a-file-and-then-encrypt-or-vice-versa)
- use AES256? Is there a multicore implementation? Exploiting hardware support?
- optionally encrypt metadata (i.e. filenames)?