RSECon22 Walkthrough: A FAIR Data Pipeline: provenance-driven data management for traceable scientific workflows

This repo contains the material for the RSECon22 walkthrough titled "A FAIR Data Pipeline: provenance-driven data management for traceable scientific workflows". The walkthrough uses a Docker container and Jupyter Lab (more precisely, a set of notebooks) to run through an example usage of the FAIR Data Pipeline. You can view the walkthrough presentation in the Society of Research Software Engineering channel.

Prerequisites

The only prerequisite is an installation of Docker, which is available for free from docker.com.

Running the Docker Container

The Docker image is available from the GitHub Container Registry (ghcr.io) and can be pulled using the command for your architecture:

AMD64

docker pull ghcr.io/fairdatapipeline/rsecon:latest

ARM64

docker pull ghcr.io/fairdatapipeline/rsecon:aarm64

The container can then be run using the command that matches the image you pulled:

docker run -p 8000:8000 -p 8888:8888 ghcr.io/fairdatapipeline/rsecon:latest

OR

docker run -p 8000:8000 -p 8888:8888 ghcr.io/fairdatapipeline/rsecon:aarm64
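
If you are unsure which of the two images applies to your machine, the choice can be sketched in Python as below. The image name, tags and port flags are copied from the commands above; the mapping from machine names (as reported by platform.machine()) to tags is an assumption, and the helper names image_for and run_command are hypothetical.

```python
import platform

# Image name and tags are copied from the pull commands in this README.
IMAGE = "ghcr.io/fairdatapipeline/rsecon"

# Assumption: these are the machine names platform.machine() typically reports.
TAG_BY_MACHINE = {
    "x86_64": "latest",
    "amd64": "latest",
    "arm64": "aarm64",
    "aarch64": "aarm64",
}


def image_for(machine: str) -> str:
    """Return the image reference listed in this README for a machine name."""
    tag = TAG_BY_MACHINE.get(machine.lower())
    if tag is None:
        raise ValueError(f"no image listed for architecture {machine!r}")
    return f"{IMAGE}:{tag}"


def run_command(machine: str) -> list[str]:
    """Build the docker run command from this README as an argument list."""
    return ["docker", "run", "-p", "8000:8000", "-p", "8888:8888", image_for(machine)]


if __name__ == "__main__":
    # Print the command for the current machine rather than executing it.
    print(" ".join(run_command(platform.machine())))
```

The script only prints the command, so you can check it before running it yourself.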

Once the container has started, the console will print an address for accessing Jupyter Lab. This address includes a token for authenticating to the Jupyter Lab page. The link takes the form: http://127.0.0.1:8888/lab?token=<token>.

This address can then be accessed through your web browser to give you access to the Jupyter Lab installation.
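
If the console output is long, the address can be picked out with a small script. The following is a minimal sketch that assumes the link has exactly the form shown above and that the token is a run of non-whitespace characters; the function name find_lab_url and the sample log line are hypothetical.

```python
import re
from typing import Optional

# Matches the link form given in this README; the token is assumed to be
# any run of non-whitespace characters after "token=".
URL_PATTERN = re.compile(r"http://127\.0\.0\.1:8888/lab\?token=\S+")


def find_lab_url(console_output: str) -> Optional[str]:
    """Return the first Jupyter Lab address found in the text, or None."""
    match = URL_PATTERN.search(console_output)
    return match.group(0) if match else None


if __name__ == "__main__":
    # Hypothetical console line, for illustration only.
    sample = "    or http://127.0.0.1:8888/lab?token=abc123\n"
    print(find_lab_url(sample))
```

In practice you would pass in the text captured from the container's console rather than the sample line.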

N.B. The container binds ports 8000 and 8888 on the host, so please make sure these ports are available.
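
One way to check this before starting the container is sketched below. It treats a port as available when nothing accepts a connection on it at 127.0.0.1, which is an approximation of what Docker needs; the function name port_is_free is hypothetical.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0


if __name__ == "__main__":
    # The two ports published by the docker run command in this README.
    for port in (8000, 8888):
        state = "available" if port_is_free(port) else "already in use"
        print(f"port {port}: {state}")
```

If either port is reported as already in use, stop the process that holds it before running the container.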

Some packages and their dependencies have been pre-installed to save time.

Jupyter Notebooks

The Docker container contains 8 Jupyter notebooks, detailed below.

CLI and Registry Installation

1_registry_cli_install.ipynb

This notebook contains code blocks to install the FAIR Command Line Interface (CLI) and the FAIR Local Registry.

Simple Models

These notebooks contain code blocks to run the SEIRS model example in different languages. Each one registers the model inputs and then runs the model.

All the models use the same input, and therefore the pull code block only needs to be run from one of the notebooks.

2a_python.ipynb

Code blocks to clone the simple model repo, install the simple model package, initialise a FAIR repository, register ('pull') the inputs for the model and then 'run' the model in Python.

2b_cpp.ipynb

Code blocks to initialise a FAIR repository, register ('pull') the inputs for the model and then 'run' the model in C++. The C++ repo has already been cloned and the executable has been compiled.

2c_java.ipynb

Code blocks to initialise a FAIR repository, register ('pull') the inputs for the model and then 'run' the model in Java. The Java repo has already been cloned and the project pre-built.

2d_julia.ipynb

Code blocks to initialise a FAIR repository, register ('pull') the inputs for the model and then 'run' the model in Julia. The Julia repo has already been cloned into the Docker container and the Julia package has already been initialised.

2e_r.ipynb

Code blocks to initialise a FAIR repository, register ('pull') the inputs for the model and then 'run' the model in R. The R repo has already been cloned and the R package installed.

Comparison of the Simple Models

The SEIRS models can be compared and graphed using the following notebook.

3_comparison.ipynb

Code block to run a comparison of the simple models and produce a graph.

Explore the registry

The local registry can be explored by running it and navigating to the web interface at http://127.0.0.1:8000.

4_start_registry

Notebook to start and stop the registry.
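
To confirm that the registry's web interface is reachable once it has been started, a quick check along the following lines may help. This is a sketch that only assumes an HTTP server answers on port 8000, as stated above; it does not use any FAIR-specific API, and the function name registry_is_up is hypothetical.

```python
import urllib.error
import urllib.request

# An empty ProxyHandler makes sure requests to the loopback address are
# never routed through a proxy configured in the environment.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def registry_is_up(url: str = "http://127.0.0.1:8000", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at url, whatever the status code."""
    try:
        with _OPENER.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timed out, or otherwise unreachable.
        return False


if __name__ == "__main__":
    print("registry reachable" if registry_is_up() else "registry not reachable")
```

Run it from your own machine rather than from inside the container, so that it tests the published port.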
