Runnable examples of the YW provenance queries highlighted in the poster for DataONE AHM 2016.
The purpose of this demo is to show how YesWorkflow (YW) queries can combine the prospective provenance created by YW with retrospective provenance to answer questions that cannot be answered by either kind of provenance alone.
The prospective provenance in this demo is created by YW which models conventional scripts and programs as scientific workflows. YW can provide a number of the benefits of using a scientific workflow management system without having to rewrite scripts and other scientific software. A YW user simply adds special YW comments to existing scripts. These comments declare how data is used and results produced, step by step, by the script. Then, YW interprets these comments and produces graphical output that reveals the stages of computation and the flow of data in the script.
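As a minimal sketch of what such comments can look like, the hypothetical Python script below marks up a two-step computation with the @begin, @end, @in, and @out comment tags. The block names (average_readings, parse_values, compute_mean) and port names are invented for illustration and do not come from the demo examples. The comments do not change how the script runs; they only describe it.

```python
# @begin average_readings
# @in raw_lines
# @out mean_value

# @begin parse_values
# @in raw_lines
# @out values
def parse_values(raw_lines):
    """Convert non-empty text lines to floats."""
    return [float(line) for line in raw_lines if line.strip()]
# @end parse_values

# @begin compute_mean
# @in values
# @out mean_value
def compute_mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)
# @end compute_mean

# @end average_readings

if __name__ == "__main__":
    # Hypothetical in-memory input; a real script would read a data file.
    print(compute_mean(parse_values(["1.0", "", "3.0"])))
```

In this sketch the outer block declares the overall input and output, while the two nested blocks declare that values flows from parse_values to compute_mean, which is the kind of data flow YW renders graphically.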
There are various approaches to capturing retrospective provenance. Retrospective provenance observables, e.g., from DataONE RunManagers (file-level), ReproZip (OS-level), or noWorkflow (Python code-level), only yield isolated fragments of the overall data lineage and processing history. In this demo, two types of retrospective provenance observables are used: yw-recon and DataONE RunManager. yw-recon searches the file system for files that match the URI templates declared for @IN and @OUT ports in the script. DataONE RunManager, on the other hand, records a list of input and output files for a script run.
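The idea of matching files against a URI template can be sketched in a few lines of Python. This is an illustration only, not the yw-recon implementation: the function names are hypothetical, and it assumes templates use {name} placeholders, that each placeholder appears once, and that a placeholder never spans a path separator.

```python
import os
import re


def template_to_regex(template):
    """Compile a template such as 'run/{sample}/image_{frame}.raw' into a regex.

    Each {name} placeholder becomes a named group that matches one path segment
    or part of one. Assumes every placeholder name appears only once.
    """
    # re.split with a capturing group alternates literal text and placeholder names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)           # literal text, matched exactly
        else:
            pattern += "(?P<%s>[^/]+)" % part    # template variable
    return re.compile("^" + pattern + "$")


def find_matches(root, template):
    """Return (relative_path, bindings) pairs for files under root matching template."""
    regex = template_to_regex(template)
    results = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            rel = rel.replace(os.sep, "/")       # compare with URI-style separators
            match = regex.match(rel)
            if match:
                results.append((rel, match.groupdict()))
    return sorted(results, key=lambda item: item[0])
```

Each match carries the values bound to the template variables, so a file found on disk can be tied back to the port whose template it satisfies.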
Directory | Description |
---|---|
examples/ | Contains examples demonstrating the queries in the queries folder |
queries/ | Stores the scripts for the nine demo queries |
rules/ | Contains Prolog rules: prospective YesWorkflow view rules (yw_rules.P and yw_views.P), retrospective reconstruction rules (recon_rules.P), graph rendering rules (gv_rules.P), and graph population rules (yw_graph_rules.P) |
Each example subfolder follows a typical layout:
dataone-ahm-2016-poster/examples/<my_example>/
Every <my_example> folder contains the following:
Directory | Description |
---|---|
script/ | the example script or scripts that make up <my_example> |
facts/ | the YW facts for <my_example>, generated by running YW on the example script(s) |
views/ | materialized views for <my_example> |
recon/ | reconstructed provenance used for <my_example> |
results/ | all artifacts generated by make.sh |
supplementary/ | a folder with supplementary files and information about the example |
clean.sh | removes generated demo artifacts for <my_example> |
make.sh | creates demo artifacts for <my_example> |
Note: after running clean.sh and make.sh, you can use git status to see which demo artifacts have just been created.
simulate_data_collection/
├── clean.sh
├── facts
│   ├── yw_extract_facts.P
│   └── yw_model_facts.P
├── make.sh
├── results
├── script
│   ├── calibration.img
│   ├── cassette_q55_spreadsheet.csv
│   └── simulate_data_collection.py
└── views
    └── yw_views.P
- The following free software is required to run this demo.
  - XSB: a logic programming and deductive database system for Unix and Windows. It is available at the [XSB homepage](http://xsb.sourceforge.net); see also the [download and installation page](http://xsb.sourceforge.net/downloads/downloads.html).
  - Graphviz: graph visualization software for Unix and Windows. It is available at the Graphviz homepage, which also provides the download and installation pages.
  - SQLite: a high-reliability, embedded, zero-configuration, public-domain SQL database engine. It is available at the SQLite homepage.
- The following open-source packages are used in our demo project.
- Clone the dataone-ahm-2016-poster git repo to your local machine using the command: git clone https://github.com/idaks/dataone-ahm-2016-poster.git
- Go to the examples/ folder. We have provided four examples here:
  - One MATLAB example (C3C4/)
  - Three Python examples (LIGO/, Twitter/, and simulate_data_collection/)
- Go to one of the above examples. First, run the cleaning script by calling bash clean.sh or ./clean.sh.
- Run the demo example by calling bash make.sh or ./make.sh.
- Copy your example folder into the examples/ folder. There are already four examples there: C3C4, LIGO, Twitter, and simulate_data_collection.
- Reorganize the directory layout of your example to match that of C3C4, LIGO, and simulate_data_collection. Create a recon/ folder that contains your reconfacts.P.
- Copy the two script files clean.sh and make.sh from simulate_data_collection, one of the existing examples, to your own example folder.
- Open make.sh and customize the script name, output file name, and parameter data object name for your example.
- Run bash make.sh.
Please read the Query README in the demo repo.
We have created a Docker image (yesworkflow/provenance-demo) to help readers explore the provenance queries demonstrated with YesWorkflow. The yesworkflow/provenance-demo image comes with XSB, Graphviz, YesWorkflow, noWorkflow, and the DataONE demo queries installed. Users can start a Docker container from this image and run the demo provenance queries within seconds, without manually installing any packages.
Here are instructions for each OS:
As part of this installation process, you will need a shell prompt. Docker provides a version of the shell that comes pre-configured for Docker commands, and the Docker commands in this document should be typed into that shell. Here is how to open it:
- Mac OS – launch the Docker Quickstart Terminal application from Launchpad.
- Linux – launch any bash shell prompt, and docker will already be available.
- Windows – click the Docker Quickstart Terminal icon on your desktop.
Users can download the image from Docker Hub, a hosting service for Docker images that plays a role similar to GitHub's for source code. The command syntax is docker pull IMAGE_NAME. Our current provenance query image is named yesworkflow/provenance-demo, so type the following command into a shell prompt:
docker pull yesworkflow/provenance-demo
This downloads the image from Docker Hub to your computer.
Once the image is downloaded, users can run it using the command docker run. Executing docker run creates a Docker container that is isolated from the user's local computer. Here are some configuration options for docker run:
- -i: interactive session
- -t: TTY
- -v H:C: mount the host path H on your computer at the path C inside the Docker container.
The full command to run the provenance query looks like:
docker run -it -v $HOME:$HOME yesworkflow/provenance-demo
Then, users can go to ... to check the query results.
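For readers who prefer to script the launch, the sketch below assembles the same docker run invocation as an argument list in Python. The helper name docker_run_command is hypothetical, and the sketch only builds and prints the command; it assumes nothing about Docker beyond the -i, -t, and -v options described above.

```python
import os


def docker_run_command(image, host_path, container_path):
    """Build the argument list for an interactive docker run with one mounted folder."""
    return [
        "docker", "run",
        "-i",                                        # interactive session
        "-t",                                        # allocate a TTY
        "-v", "%s:%s" % (host_path, container_path),  # mount host_path at container_path
        image,
    ]


if __name__ == "__main__":
    home = os.path.expanduser("~")
    # Mount the home directory at the same path inside the container,
    # mirroring the $HOME:$HOME mount in the full command above.
    print(" ".join(docker_run_command("yesworkflow/provenance-demo", home, home)))
```

Passing the returned list to subprocess.call would start the container. Mounting the home directory at the same path inside the container means files written there by the demo remain visible on the host after the container exits.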