
Intent Specification to Workflow Generation

Requirements

  • Python (3.11)
    • List of packages in requirements.txt (install with pip install -r requirements.txt)
      • PySHACL package available at DTIM PySHACL repo (requirements.txt already has the correct link).
  • Node.js (for the demo)
    • List of packages in package.json (install with npm install)

Directory Structure

  • common/: Common code for ontology and workflow generation, essentially namespace definitions and base graph generation.
  • dataset_annotator/: Code for annotating datasets with ontology terms. Check usage in the section below.
  • demo/: Code for the demo. Check usage in the section below.
  • experiment_lab/: Code for running the experiments. Check usage in the section below.
  • ontologies/: Ontologies used in the project, divided into three files.
  • ontology_populator/: Code for generating the ontology. Check usage in the section below.
  • pipeline_generator/: Code for generating workflows. Check usage in the section below.
  • pipeline_translator/: Code for translating ontology-represented workflows into KNIME workflows. Check usage in the section below.

Dataset Annotator

Utility script to annotate CSV datasets with ontology terms.
It reads all the CSV files in the datasets directory and writes the annotated datasets to the annotated_datasets directory.
It must be run from the dataset_annotator directory.

cd dataset_annotator
python3 main.py
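
For orientation, a minimal sketch of that read/annotate/write flow is shown below. Only the directory layout comes from the description above; the annotate_dataset helper is a hypothetical stand-in for the real ontology-term annotation logic in main.py.

# Hypothetical sketch: annotate_dataset() stands in for the real logic.
from pathlib import Path

def annotate_dataset(csv_path: Path) -> str:
    # Placeholder: the real script maps the dataset's columns and values
    # to ontology terms before serializing the annotations.
    return csv_path.read_text()

out_dir = Path('annotated_datasets')
out_dir.mkdir(exist_ok=True)
for csv_file in Path('datasets').glob('*.csv'):
    (out_dir / csv_file.name).write_text(annotate_dataset(csv_file))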

Experiments

Scripts to execute the complexity experiments. The directory contains two scripts:

  • fake_cbox_generator.py: Creates a series of CBoxes representing different scenarios. The parameters of the scenarios (number of components, number of requirements per component, and number of components per requirement) can be modified from inside the script (a sketch of these knobs follows the list). Stores the generated CBoxes in the fake_cboxes directory.

    cd experiment_lab
    python3 fake_cbox_generator.py
  • experiment_runner.py: Runs the experiments and outputs the results in the results directory.

    cd experiment_lab
    python3 experiment_runner.py
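
The scenario parameters are plain module-level values along the following lines. This is illustrative only: the real script defines its own names and values, so edit them inside fake_cbox_generator.py.

# Illustrative only: these names are not the script's actual identifiers.
import itertools

NUM_COMPONENTS = [10, 100, 1000]          # components in each fake CBox
REQS_PER_COMPONENT = [1, 2, 4]            # requirements attached to each component
COMPONENTS_PER_REQUIREMENT = [1, 2, 4]    # components satisfying each requirement

for scenario in itertools.product(NUM_COMPONENTS, REQS_PER_COMPONENT,
                                  COMPONENTS_PER_REQUIREMENT):
    print('would generate a CBox in fake_cboxes/ for scenario', scenario)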

Ontology Populator

Scripts and classes to generate the TBox and the CBox.

TBox

The TBox can be generated by the tbox_generator.py script, which stores it in the ontologies directory (you can pass an alternative destination path as a parameter to the script).

cd ontology_populator
python3 tbox_generator.py

CBox

The CBox can be generated by the cbox_generator.py script, which stores it in the ontologies directory (you can pass an alternative destination path as a parameter to the script).

cd ontology_populator
python3 cbox_generator.py

It adds the following elements (a small sketch of the resulting triples follows the list):

  • Problems: Specified in the script, along with their hierarchy
  • Algorithms: Specified in the script, along with the problem they solve
  • Models: Specified in the script
  • Shapes: Defined in the script
  • Implementations: Defined in the implementations directory
  • Components: Defined in the implementations directory
  • Parameters: Defined in the implementations directory
  • Transformations: Defined in the implementations directory
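
To give a feel for what the generator emits, here is a minimal rdflib sketch of the kind of triples involved. The namespace URI, the solves property, and the DataAnalysis/DecisionTree terms are placeholders, not the project's actual vocabulary; rdflib itself is an assumption (a reasonable one, since PySHACL builds on it).

# Placeholder namespace and terms; the real vocabulary lives in common/
# and is populated by cbox_generator.py.
from rdflib import Graph, Namespace, RDF, RDFS

NS = Namespace('http://example.org/cbox#')  # placeholder namespace
g = Graph()

# A problem and its place in the problem hierarchy
g.add((NS.Description, RDF.type, NS.Problem))
g.add((NS.Description, RDFS.subClassOf, NS.DataAnalysis))

# An algorithm linked to the problem it solves
g.add((NS.DecisionTree, RDF.type, NS.Algorithm))
g.add((NS.DecisionTree, NS.solves, NS.Description))

print(g.serialize(format='turtle'))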

Implementations and related entities

Implementations and their related entities (namely components, parameters, and transformations) are defined in the implementations directory. The directory contains the core package, which defines the base classes, and the knime package, which contains subclasses of the base classes that add KNIME-specific behaviour and defines all the available implementations (and related entities).

The classes in the core package are listed below (a usage sketch follows the list):

  • Implementation: Base class for implementations. Contains all the information related to the implementation (name, algorithm, parameters, inputs, and outputs), and is responsible for creating the RDF triples for the implementation (add_to_graph method).
  • Component: Base class for components. Contains all the information related to the component (name, implementation, exposed parameters, and overridden parameters), and is responsible for creating the RDF triples for the component (add_to_graph method).
  • Parameter: Base class for parameters. Contains all the information related to the parameter.
  • Transformation: Base class for transformations. Contains all the information related to the transformation (query and language). Two specialized transformations are also provided: CopyTransformation and LoaderTransformation.
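
As a rough picture of how these classes fit together, the sketch below defines a hypothetical normalizer. The import path, constructor signatures, and the Normalizer/MinMaxScaler names are all guesses for illustration; only the class names and the add_to_graph method come from the package itself.

# Hypothetical sketch; check the core package for the real API.
from core import Implementation, Component, Parameter  # import path is a guess

normalizer_params = [
    Parameter('mode', 'min-max'),   # name and default value (assumed signature)
    Parameter('min', 0.0),
    Parameter('max', 1.0),
]

normalizer_impl = Implementation(
    name='Normalizer',
    algorithm='Normalization',      # the algorithm it implements
    parameters=normalizer_params,
    inputs=['TabularDataset'],
    outputs=['NormalizedTabularDataset'],
)

minmax_component = Component(
    name='MinMaxScaler',
    implementation=normalizer_impl,
    exposed_parameters=normalizer_params[1:],            # min and max
    overridden_parameters={normalizer_params[0]: 'min-max'},
)

# Both classes serialize themselves into the ontology via add_to_graph():
#   normalizer_impl.add_to_graph(graph)
#   minmax_component.add_to_graph(graph)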

The subclasses of the base classes in the knime package are all defined in the knime_implementation.py file, and they add the KNIME-specific information necessary to translate the RDF triples into KNIME workflows.

The knime package also contains a number of ready-made implementations and components, which can be used as a reference.

The knime_miner.py script can be used to generate a skeleton of the implementations available in KNIME. It takes as input a JSON file containing the information of the KNIME Nodes (nodeDocumentation.json), and creates a hierarchy of Python packages. This can be used as a starting point to define the implementations in the ontology.

However, the public documentation doesn't provide much of the necessary information, especially for the parameters, so the following steps are recommended to define the implementations and components:

  1. Open KNIME and create a workflow with the desired nodes.
  2. Save the workflow as a KNIME workflow (.knwf file).
  3. Decompress the KNIME workflow (the .knwf is a zip file).
  4. Check the config file of the nodes you want to define (there is a directory for each node with a settings.xml file inside).
  5. Define the parameters. There has to be a Parameter for every leaf tag in the model tag (see the sketch after this list).
  6. Define the components. There has to be at least one Component for every Implementation, specifying which parameters are exposed and which are overridden.
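
For example, a node whose settings.xml contained the (made-up) model section below would need one Parameter per leaf entry. The XML layout mimics KNIME's real nested <config>/<entry> format, but the keys, values, and the Parameter signature are illustrative.

# Hypothetical example; the Parameter signature is a guess.
from core import Parameter  # import path is a guess
#
# settings.xml excerpt:
#
#   <config key="model">
#       <entry key="number_of_rows" type="xlong" value="100"/>
#       <config key="sampling">
#           <entry key="seed" type="xint" value="42"/>
#       </config>
#   </config>
#
# One Parameter per leaf entry under the model config:
params = [
    Parameter('number_of_rows', 100),   # leaf directly under model
    Parameter('seed', 42),              # leaf nested in model/sampling
]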

Pipeline Generator

The pipeline generator can be used to generate workflows using the ontology and some user input.
It has to be run from the pipeline_generator directory.

cd pipeline_generator
python3 pipeline_generator.py

It will ask for the intent name (which can be any name you choose), the dataset name (which must be an existing annotated dataset), and the problem name (which must be an existing problem). It will also ask for a folder in which to store the generated workflows.

Introduce the intent name [DescriptionIntent]:  
Introduce the data name [titanic.csv]: 
Introduce the problem name [Description]: 
Introduce the folder to save the workflows:

You can use the default values for the first three questions for a quick example.

Pipeline Translator

The pipeline translator will translate the ontology-represented workflows into KNIME workflows.
It has to be run from the pipeline_translator directory.

cd pipeline_translator
python3 pipeline_translator.py

It will ask for a source directory (which must contain the ontology-represented workflows) and a destination directory, where the translated workflows will be stored. It will also ask whether you want to keep the KNIME workflows in the folder format or not.
The folder format is simply the decompressed .knwf file. If you are testing or debugging the translation, it makes the generated workflows easier to inspect (otherwise, you can still decompress the workflows yourself).

Source folder:
Destination folder:
Keep workflows in folder format? [Y/n]:

You can also use the translator in non-interactive mode by passing the source and destination folders as parameters.

python3 pipeline_translator.py <source_folder> <destination_folder>
python3 pipeline_translator.py --keep <source_folder> <destination_folder>

Demo

The demo is a web application that allows the user to generate workflows using the ontology, as well as giving more fine-grained control over the generation process.

To run it, make sure you have all the dependencies installed (see Requirements), and run the following commands.

The backend must be run from the project root directory.

flask --app ./demo/demo_api/api.py run

The frontend must be run from the demo_web directory.

cd demo/demo_web
npm run dev

Note that the demo uses slightly modified versions of the pipeline generator and translator, which can be found in the demo_api directory.