[Draft] Export Tales that have at least one version and run #435

Open
ThomasThelen opened this issue Jan 12, 2021 · 1 comment

Comments

Purpose:

The purpose of this issue is to discuss and track the progress of exporting Tales that have version/run structures.

Background:

Tales were recently updated to include the notions of versions and runs. Users can create multiple versions of a Tale; each version may contain multiple runs. Each run has a results/ directory where computational outputs are stored. Each run also has workspace and data directories, which are symlinked back to the corresponding folders of its version. Note that this is where the mutability of the workspace folder comes into play.

During the Jan 11 2021 dev call we discussed a few different possibilities as to what this might look like when Tales are exported and re-imported.

The main takeaway is that we can export Tales either with or without the run/version structure.

Proposed Approaches:

The following approaches each have strengths, weaknesses, and varying levels of complexity.

Exporting All Runs in a Version

This approach exports all of the runs under a particular version of a Tale. The advantage of this is that users can have a record of all of the runs in the version rather than a limited view of what happened. When importing, a more complete version of the Tale is reconstructed. Note that the original Tale may have many versions. The versions that aren't exported will be lost on an import.

This may be confusing for some published Tales because (presumably) only one of the runs is going to be referenced in a linked paper. This also conflicts with the idea of exporting/publishing individual recorded runs (how do we let users export ALL runs and also only a recorded run?).

Proposed BagIt structure

bagit.txt
bag-info.txt
.
.
.
data/
  |-workspace/
  |-data/
  |-versions/
     |-version_1/
       |-workspace/
       |-data/
  |-runs/
     |-run1/
       |-workspace/
       |-data/
       |-results/
       |-version
       |-.stderr
       |-.stdout
     |-run2/
       |-workspace/
       |-data/
       |-results/
       |-version
       |-.stderr
       |-.stdout
     |-run3/
       |-workspace/
       |-data/
       |-results/
       |-version
       |-.stderr
       |-.stdout
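
The layout above can be enumerated programmatically, which may be handy when checking that an export contains everything the proposal calls for. This is only a sketch: `bag_layout` is a hypothetical helper, not part of any existing Whole Tale code, and it treats `version`, `.stderr`, and `.stdout` as plain entries under each run without assuming whether they are files or links.

```python
from pathlib import PurePosixPath

# Entries that the proposed structure places under every run folder.
RUN_ENTRIES = ("workspace", "data", "results", "version", ".stderr", ".stdout")


def bag_layout(version_name, run_names):
    """Return the bag-relative paths expected under data/ (hypothetical helper)."""
    root = PurePosixPath("data")
    paths = [root / "workspace", root / "data"]

    # One exported version with its own workspace and data folders.
    version_dir = root / "versions" / version_name
    paths += [version_dir / "workspace", version_dir / "data"]

    # Every run of that version, each with the same set of entries.
    for run_name in run_names:
        run_dir = root / "runs" / run_name
        paths += [run_dir / entry for entry in RUN_ENTRIES]

    return [str(path) for path in paths]


if __name__ == "__main__":
    for path in bag_layout("version_1", ["run1", "run2", "run3"]):
        print(path)
```

Exporting a single run uses the same function with a one-element run list.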

Importing Changes

To import a Tale with multiple versions, we need to know:

  1. The name of the version
  2. The name of each run
  3. A mapping between the run folders on the exported Tale and the name of the run that the user may have specified in Whole Tale.

These constraints can be tackled by:

  1. Enforcing a naming convention on the folder names (the version folder name is the name of the Tale's version, each run folder is the name of each run). This can easily be parsed during import.
  2. Adding additional structure to the manifest.json to include metadata about each run and version (most likely requires us to come up with new terms for runs & versions).
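
As a rough illustration of the first option, the version and run names could be recovered from the bag's file paths alone. The sketch below assumes the folder names are exactly the user-facing names and that the layout follows the proposed structure; `names_from_paths` is a hypothetical helper, not an existing importer function.

```python
from pathlib import PurePosixPath


def names_from_paths(paths):
    """Collect version and run names from bag-relative paths (hypothetical helper).

    Assumes data/versions/<version name>/... and data/runs/<run name>/...
    """
    versions, runs = set(), set()
    for raw_path in paths:
        parts = PurePosixPath(raw_path).parts
        # Only paths of the form data/<kind>/<name>/... carry a name.
        if len(parts) < 3 or parts[0] != "data":
            continue
        if parts[1] == "versions":
            versions.add(parts[2])
        elif parts[1] == "runs":
            runs.add(parts[2])
    return sorted(versions), sorted(runs)


if __name__ == "__main__":
    example = [
        "data/workspace/analysis.py",
        "data/versions/version_1/workspace/analysis.py",
        "data/runs/run1/results/output.csv",
        "data/runs/run2/.stdout",
    ]
    print(names_from_paths(example))
```

A convention like this breaks down if run names contain characters that are not safe in folder names, which is one argument for also recording the names in the manifest.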

For example, the manifest entries might look like:

{
    "@id": "run/1",
    "@type": "wt:run",
    "schema:name": "run_1"
}

{
    "@id": "version/1",
    "@type": "wt:version",
    "schema:name": "version_1"
}

We can make this arbitrarily complex by introducing membership predicates (wasPartOf, etc.) to describe relations between versions and runs.
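
A minimal sketch of how such entries might be generated, including one membership link per run. The `wt:run` and `wt:version` types come from the draft above; the `wt:wasPartOf` predicate and the `run_version_entries` helper are assumptions made for illustration, since no terms have been agreed on yet.

```python
import json


def run_version_entries(version_name, run_names):
    """Build draft manifest entries for one version and its runs (hypothetical)."""
    version_id = "version/1"
    entries = [
        {"@id": version_id, "@type": "wt:version", "schema:name": version_name}
    ]
    for index, run_name in enumerate(run_names, start=1):
        entries.append(
            {
                "@id": f"run/{index}",
                "@type": "wt:run",
                "schema:name": run_name,
                # Hypothetical membership predicate linking the run to its version.
                "wt:wasPartOf": {"@id": version_id},
            }
        )
    return entries


if __name__ == "__main__":
    print(json.dumps(run_version_entries("version_1", ["run_1", "run_2"]), indent=4))
```

Keeping the user-facing name in `schema:name` and a stable identifier in `@id` means the folder names in the bag do not have to match what the user typed.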

Exporting Individual Runs

This approach exports a particular run of a version, in contrast to exporting all of the runs. The visible difference is that the export looks a little cleaner (personal opinion), which can be useful for users who are interested in a particular result.

This approach is also more streamlined for the use case of exporting reproducible runs: the user interface should look the same for a user exporting a recorded run and a non-recorded run.

Proposed BagIt structure (1)

bagit.txt
bag-info.txt
.
.
.
data/
  |-workspace/
  |-data/
  |-versions/
     |-version_1/
       |-workspace/
       |-data/
  |-runs/
     |-run1/
       |-workspace/
       |-data/
       |-results/
       |-version
       |-.stderr
       |-.stdout

Importing Changes (1)

The constraints are the same as in the case of exporting all of the runs. It may be useful to preserve the original names that were given in the frontend.

Proposed BagIt structure (2)

This BagIt structure differs from the first in that nothing indicates that the exported Tale came from a version/run other than the filesystem artifacts from the run. This is nice because it is conceptually simpler (compared to many symlinks that users would be asking about) and much easier to navigate.

bagit.txt
bag-info.txt
.
.
.
data/
  |-workspace/
  |-data/
  |-results/
    |-.stderr
    |-.stdout

Importing Changes (2)

When importing a Tale with this structure there are a few options.

If we want to preserve the version/run names to partially reconstruct the Tale, these can be encoded in the manifest.json file.

We can also ignore the version/run information and place the content in the results/ folder into the workspace/ folder.

The third option is to create a generic Version & Run name and place the results/ artifacts in the appropriate place.
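
The three options can be compared side by side as a mapping from bag folders to destinations in the imported Tale. This is a sketch only: the mode names, the destination paths, the generic names "Version 1" and "Run 1", and the `plan_import` helper are all assumptions for illustration, and the merge option here places results directly in the workspace.

```python
def plan_import(mode, manifest_names=None):
    """Map bag folders to destinations for one import option (hypothetical).

    mode: "manifest" (names read from manifest.json), "merge" (ignore the
    version/run information), or "generic" (use placeholder names).
    """
    if mode == "merge":
        # Ignore version/run information; results land in the workspace.
        return {
            "data/workspace": "workspace",
            "data/data": "data",
            "data/results": "workspace",
        }
    if mode == "manifest":
        if manifest_names is None:
            raise ValueError("manifest mode needs (version_name, run_name)")
        version_name, run_name = manifest_names
    elif mode == "generic":
        version_name, run_name = "Version 1", "Run 1"
    else:
        raise ValueError(f"unknown import mode: {mode}")
    return {
        "data/workspace": f"versions/{version_name}/workspace",
        "data/data": f"versions/{version_name}/data",
        "data/results": f"runs/{run_name}/results",
    }


if __name__ == "__main__":
    for mode in ("merge", "generic"):
        print(mode, plan_import(mode))
    print("manifest", plan_import("manifest", ("version_1", "run_1")))
```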

ThomasThelen commented Jan 13, 2021

We also need to consider users who want to export Tales without Recorded Runs or versions. I think that this is still a legitimate use case that we should support. On the girder_wholetale side this should be mostly trivial since it's already implemented; the trick is getting a flag from the export endpoint indicating whether a run or a Tale is being exported.

ThomasThelen changed the title from "Export Tales that have at least one version and run" to "[Draft] Export Tales that have at least one version and run" on Jan 22, 2021