
Requirements

Marc Zimmermann edited this page Apr 20, 2020 · 3 revisions

Basic Functionality

  • create an archive package from a given directory and its subtree: tar the files, compress the tar, and calculate md5 sums for the individual files and for the compressed archive
    • additionally split large archives
    • optionally encrypt tars
  • integrity checks: given an archive package, unpack the files, recalculate their md5 sums and verify that they match the stored sums
  • list files in archive package (transparently across splits)
  • extract files: given a path inside an archive package, extract the files under that path
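
The integrity check listed above could be sketched as follows. This is only an illustration, not the tool's implementation: the function names `md5_of_file` and `find_mismatches` are hypothetical, and the listing format (one `<hex digest>  <relative path>` pair per line, as produced by `md5sum`) is an assumption, since this page does not specify the layout of the md5 files.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(listing_text, root):
    """Return the relative paths that are missing or whose md5 differs.

    Assumed listing format (md5sum style): "<hex digest>  <relative path>".
    """
    mismatches = []
    for line in listing_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, rel_path = line.split(None, 1)
        rel_path = rel_path.strip()
        full_path = os.path.join(root, rel_path)
        if not os.path.isfile(full_path) or md5_of_file(full_path) != expected:
            mismatches.append(rel_path)
    return mismatches
```

An empty result from `find_mismatches` would mean that every unpacked file matches its recorded sum.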

Archive Package Structure

A project is archived into an archive package. An archive package is a directory containing the following files:

  • base archive: project_name.tar.lz
  • content listing: project_name.tar.lst (roughly the output of ls -l)
  • archive md5: project_name.tar.lz.md5
  • files md5: project_name.md5
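
As a rough sketch, the file names above can be derived from the project name alone. The helpers `package_file_names` and `missing_package_files` are hypothetical names introduced here for illustration; only the four file name patterns come from the list above.

```python
import os


def package_file_names(project_name):
    """Map each role in an archive package to its expected file name."""
    return {
        "base_archive": f"{project_name}.tar.lz",
        "content_listing": f"{project_name}.tar.lst",
        "archive_md5": f"{project_name}.tar.lz.md5",
        "files_md5": f"{project_name}.md5",
    }


def missing_package_files(package_dir, project_name):
    """Return the expected file names that are not present in package_dir."""
    expected = package_file_names(project_name).values()
    return sorted(
        name for name in expected
        if not os.path.isfile(os.path.join(package_dir, name))
    )
```

A check like `missing_package_files` could run before any integrity check, so that an incomplete package is reported early.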

Splitting

Projects with large data should be split into smaller parts to simplify file handling and enhance parallelization.

The files should be partitioned into separate archives such that every archive part is independent of the others, so that one bad archive cannot corrupt the rest. Partitioning could be done by enumerating all files to be archived and then greedily packing files into a part until the size limit given by the user is reached.
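
A minimal sketch of such a greedy partitioning, under some assumptions: the function name `partition_files` is hypothetical, file sizes are taken as given (uncompressed) byte counts, files are packed in enumeration order, and a single file larger than the limit simply gets a part of its own.

```python
from typing import Iterable, List, Tuple


def partition_files(
    files: Iterable[Tuple[str, int]], max_part_size: int
) -> List[List[str]]:
    """Greedily group (path, size) pairs into parts of at most max_part_size.

    Files are taken in the order given. A file that alone exceeds the limit
    is placed in its own part (an assumption; the page does not cover this).
    """
    parts: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for path, size in files:
        # Close the current part if adding this file would exceed the limit.
        if current and current_size + size > max_part_size:
            parts.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        parts.append(current)
    return parts
```

Because every file lands in exactly one part, each part can be tarred, compressed and verified without looking at any other part.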

Every subarchive could follow the same structure as a "normal" archive, using project_name_part0001.tar.lz instead of project_name.tar.lz.
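
The part naming could be generated like this. The helper names are hypothetical, and both the four-digit zero padding and the numbering starting at 1 are inferred from the single example `project_name_part0001.tar.lz`, so treat them as assumptions.

```python
def part_stem(project_name, part_number, width=4):
    """Return the name stem of one part, e.g. 'project_name_part0001'."""
    return f"{project_name}_part{part_number:0{width}d}"


def part_archive_name(project_name, part_number):
    """Return the compressed archive file name of one part."""
    return part_stem(project_name, part_number) + ".tar.lz"
```

For instance, `part_archive_name("project_name", 1)` yields `project_name_part0001.tar.lz`; the listing and md5 files of a part would reuse the same stem.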

Compression

Encryption

TBD
