caching: investigate supporting between-workflows caching #68

Open
tiborsimko opened this issue Nov 3, 2023 · 0 comments

Snakemake offers automated caching capabilities when a user restarts the same workflow on the same workspace. Snakemake automatically reuses outputs of past rules if their inputs did not change, and re-executes only those rules that really need it. This works well and is already fully supported in REANA.

Snakemake offers another, experimental, caching feature for between-workflows caching. Here the cache lives outside the workspaces, so it can hold, for example, input files or the results of big computations that will be reused by several independent workflows. The user controls Snakemake's behaviour by adding a cache: True directive to the rules whose outputs should be taken from, and stored in, the cache. This feature is not currently supported by REANA.
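To make the moving parts concrete, here is a minimal sketch of how such a run could be prepared outside of REANA. It assumes the SNAKEMAKE_OUTPUT_CACHE environment variable and Snakemake's `--cache` command-line flag; the helper name `build_cached_run` and the `/tmp/snakemake-cache` location are hypothetical and only serve the illustration. The sketch builds the command and environment without launching anything.

```python
import os


def build_cached_run(cache_dir: str, cores: int = 1) -> tuple[list[str], dict[str, str]]:
    """Return (command, environment) for a Snakemake run using a shared cache.

    Hypothetical helper: it only assembles the pieces, it does not run Snakemake.
    """
    # Copy the current environment and point Snakemake at the external cache.
    env = dict(os.environ)
    env["SNAKEMAKE_OUTPUT_CACHE"] = cache_dir
    # --cache asks Snakemake to use the cache for rules marked as cacheable.
    command = ["snakemake", "--cores", str(cores), "--cache"]
    return command, env


# Hypothetical cache location, for illustration only.
command, env = build_cached_run("/tmp/snakemake-cache")
print(" ".join(command))
print(env["SNAKEMAKE_OUTPUT_CACHE"])
```

The returned pair could be handed to `subprocess.run(command, env=env)` in a directory containing a Snakefile whose rules carry cache: True.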

The goals of this issue are:

  • First, experiment with Snakemake's between-workflows caching feature outside of REANA to see whether it works well in situations ranging from simple ones (when only code or data is changed) to complex ones (when the container image is changed whilst "hiding" behind the same fully-qualified image name, such as the user changing the image and repushing it under the same "latest" tag).

  • Second, investigate whether we can support this feature in REANA easily. For example, the user John Doe could set as a secret the environment variable SNAKEMAKE_OUTPUT_CACHE pointing to his EOS directory (/eos/home-j/johndoe/mysnakemakecache) that would be used for between-workflows cache storage, and he would then add cache: True to the Snakefile rules for which the cache should be used.

  • We don't really need any commands to inspect the cache or otherwise manipulate its files, since the cache will be stored on a storage solution external to REANA such as EOS. Hence the users could use regular tools to access, inspect, or otherwise manage the cached content.
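The "hidden" image change in the first goal can be illustrated with a toy cache key. This is not Snakemake's hashing algorithm; it is a sketch under the assumption that a key is derived from the rule code, the input data, and some identifier of the container image. The `Image` class, the `cache_key` function, and the image name and digests are all hypothetical. It shows why a key built from the image name alone cannot notice a repush under the same tag, whereas a key built from the content digest can.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    """Hypothetical container image: a mutable name/tag plus a content digest."""
    name: str
    digest: str


def cache_key(code: str, input_data: bytes, image_id: str) -> str:
    """Toy cache key (an illustration, not Snakemake's actual algorithm)."""
    h = hashlib.sha256()
    # Hash each part separately so that part boundaries cannot be confused.
    for part in (code.encode(), input_data, image_id.encode()):
        h.update(hashlib.sha256(part).digest())
    return h.hexdigest()


code = "sort {input} > {output}"
data = b"3\n1\n2\n"

# The same "latest" tag before and after the user repushes a changed image.
before = Image("docker.io/johndoe/analysis:latest", "sha256:aaaa")
after = Image("docker.io/johndoe/analysis:latest", "sha256:bbbb")

# Keyed by name: the repush is invisible, so a stale result would be reused.
stale_hit = cache_key(code, data, before.name) == cache_key(code, data, after.name)

# Keyed by digest: the repush changes the key, so the rule would be re-executed.
digest_hit = cache_key(code, data, before.digest) == cache_key(code, data, after.digest)

# The simple case: changing only the code always changes the key.
code_hit = cache_key(code, data, before.name) == cache_key("sort -r {input} > {output}", data, before.name)

print(stale_hit, digest_hit, code_hit)
```

The experiments proposed above would establish which of these two behaviours the real between-workflows cache exhibits when an image is repushed under an unchanged tag.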
