- Python (3.11)
- List of packages in `requirements.txt` (install with `pip install -r requirements.txt`)
- PySHACL package available at the DTIM PySHACL repo (`requirements.txt` already has the correct link)
- Node.js (for the demo)
- List of packages in `package.json` (install with `npm install`)
The project is organized in the following directories:

- `common/`: Common code for ontology and workflow generation, essentially namespace definition and base graph generation.
- `dataset_annotator/`: Code for annotating datasets with ontology terms. Check usage in the section below.
- `demo/`: Code for the demo. Check usage in the section below.
- `experiment_lab/`: Code for running the experiments. Check usage in the section below.
- `ontologies/`: Ontology used in the project. Divided into three files.
- `ontology_populator/`: Code for generating the ontology. Check usage in the section below.
- `pipeline_generator/`: Code for generating workflows. Check usage in the section below.
- `pipeline_translator/`: Code for translating ontology workflows into KNIME workflows. Check usage in the section below.
Utility script to annotate CSV datasets with ontology terms. Reads all the CSV files in the `datasets` directory and outputs the annotated datasets in the `annotated_datasets` directory.

Must be run from the `dataset_annotator` directory.

```bash
cd dataset_annotator
python3 main.py
```
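The script's input/output convention is roughly the following (a minimal sketch; the actual annotation logic lives in `main.py`, and the `annotate` helper and output format below are hypothetical):

```python
from pathlib import Path

DATASETS_DIR = Path("datasets")
ANNOTATED_DIR = Path("annotated_datasets")

def annotate(csv_path: Path) -> str:
    # Hypothetical placeholder: the real logic in main.py maps the
    # dataset's contents to ontology terms.
    return csv_path.read_text()

# Every CSV in datasets/ gets an annotated counterpart in annotated_datasets/.
ANNOTATED_DIR.mkdir(exist_ok=True)
for csv_file in DATASETS_DIR.glob("*.csv"):
    (ANNOTATED_DIR / csv_file.name).write_text(annotate(csv_file))
```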
Scripts to execute the complexity experiments. Contains two scripts:

- `fake_cbox_generator.py`: Creates a series of CBoxes representing different scenarios. The parameters of the scenarios (number of components, number of requirements per component, and number of components per requirement) can be modified from inside the script (see the sketch after this list). Stores the generated CBoxes in the `fake_cboxes` directory.

```bash
cd experiment_lab
python3 fake_cbox_generator.py
```

- `experiment_runner.py`: Runs the experiments and outputs the results in the `results` directory.

```bash
cd experiment_lab
python3 experiment_runner.py
```
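The scenario parameters of `fake_cbox_generator.py` correspond to three knobs; the constant names and values below are hypothetical, but they illustrate what can be tuned:

```python
# Hypothetical names and values: the actual constants inside
# fake_cbox_generator.py may differ, but they control these three knobs.
NUM_COMPONENTS = [10, 100, 1000]         # number of components in the CBox
REQUIREMENTS_PER_COMPONENT = [1, 2, 4]   # requirements attached to each component
COMPONENTS_PER_REQUIREMENT = [1, 2, 4]   # components satisfying each requirement
```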
Scripts and classes to generate the TBox and the CBox.

The TBox can be generated with the `tbox_generator.py` script, which stores it in the `ontologies` directory (you can pass an alternative destination path as a parameter to the script).

```bash
cd ontology_populator
python3 tbox_generator.py
```

The CBox can be generated with the `cbox_generator.py` script, which stores it in the `ontologies` directory (you can pass an alternative destination path as a parameter to the script).

```bash
cd ontology_populator
python3 cbox_generator.py
```
It adds the following elements:
- Problems: Specified in the script along with their hierarchy
- Algorithms: Specified in the script along with the problem they solve
- Models: Specified in the script
- Shapes: Defined in the script
- Implementations: Defined in the `implementations` directory
- Components: Defined in the `implementations` directory
- Parameters: Defined in the `implementations` directory
- Transformations: Defined in the `implementations` directory
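In rdflib terms, the generator does something along these lines (a sketch only; the real namespace and base graph code live in the `common/` package, and the URIs below are assumptions):

```python
from rdflib import Graph, Namespace, RDF, RDFS

# Hypothetical namespace: the real one is defined in the common/ package.
NS = Namespace("http://example.org/ontology#")

g = Graph()

# A problem and its place in the problem hierarchy.
g.add((NS.Clustering, RDF.type, NS.Problem))
g.add((NS.Clustering, RDFS.subClassOf, NS.DataAnalysis))

# An algorithm, linked to the problem it solves.
g.add((NS.KMeans, RDF.type, NS.Algorithm))
g.add((NS.KMeans, NS.solves, NS.Clustering))

g.serialize("cbox_sketch.ttl", format="turtle")
```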
Implementations and their related entities (namely components, parameters, and transformations) are defined in the `implementations` directory. The directory contains the `core` package, defining the base classes, and the `knime` package, which contains subclasses of the base ones adding KNIME-specific behaviour, and specifies all the implementations (and related entities) available.

The classes in the `core` package are:

- `Implementation`: Base class for implementations. Contains all the information related to the implementation (name, algorithm, parameters, inputs, and outputs), and is responsible for creating the RDF triples for the implementation (`add_to_graph` method).
- `Component`: Base class for components. Contains all the information related to the component (name, implementation, exposed parameters, and overridden parameters), and is responsible for creating the RDF triples for the component (`add_to_graph` method).
- `Parameter`: Base class for parameters. Contains all the information related to the parameter.
- `Transformation`: Base class for transformations. Contains all the information related to the transformation (query and language). It also comes with two specialized transformations: `CopyTransformation` and `LoaderTransformation`.
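Continuing the sketch above (reusing `NS` and `g`), wiring an implementation and a component together might look roughly like this; the import path and constructor signatures are assumptions, so check the actual classes in the `core` package:

```python
# Sketch only: argument names and the import path are assumptions.
from implementations.core import Implementation, Component, Parameter

k_param = Parameter(name="k")  # hypothetical parameter

kmeans_impl = Implementation(
    name="KMeans",
    algorithm=NS.KMeans,           # the algorithm it implements
    parameters=[k_param],
    inputs=[NS.TabularDataset],    # hypothetical input/output specifications
    outputs=[NS.ClusteredDataset],
)

kmeans_component = Component(
    name="KMeans (k=3)",
    implementation=kmeans_impl,
    exposed_parameters=[],              # parameters left for the user to set
    overridden_parameters={k_param: 3}, # parameters fixed by the component
)

# Both classes know how to serialize themselves into the ontology graph.
kmeans_impl.add_to_graph(g)
kmeans_component.add_to_graph(g)
```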
The subclasses of the base classes defined in the `knime` package are all located in the `knime_implementation.py` file, and add the KNIME-specific information necessary to translate the RDF triples into KNIME workflows. The `knime` package also contains several already-defined implementations and components, which can be checked for reference.
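As an illustration of what "KNIME-specific information" means, such a subclass presumably carries something like the node's factory class, which is how KNIME identifies nodes; continuing the previous sketch, with the field names below being assumptions:

```python
# Sketch: the actual class in knime_implementation.py may carry
# different or additional fields.
class KnimeImplementation(Implementation):
    def __init__(self, knime_node_factory: str, knime_bundle: str, **kwargs):
        super().__init__(**kwargs)
        # KNIME identifies a node by its factory class and the bundle that
        # provides it; this is the kind of information the translator needs
        # to emit a runnable workflow.
        self.knime_node_factory = knime_node_factory
        self.knime_bundle = knime_bundle
```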
The `knime_miner.py` script can be used to generate a skeleton of the implementations available in KNIME. It takes as input a JSON file containing the information of the KNIME nodes (`nodeDocumentation.json`), and creates a hierarchy of Python packages. This can be used as a starting point to define the implementations in the ontology.

However, the public documentation doesn't provide much of the necessary information, especially for the parameters, so to define the implementations and components, the following steps are recommended:
- Open KNIME and create a workflow with the desired nodes.
- Save the workflow as a KNIME workflow (`.knwf` file).
- Decompress the KNIME workflow (the `.knwf` is a zip file).
- Check the config file of the nodes you want to define (there is a directory for each node with a `settings.xml` file inside).
- Define the parameters. There has to be a `Parameter` for every leaf tag in the `model` tag (see the sketch after this list).
- Define the components. There has to be at least one `Component` for every `Implementation`, specifying which parameters are exposed and which are overridden.
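To enumerate those leaf tags quickly, a small script like the following can help, assuming the usual KNIME XMLConfig layout where parameters appear as `entry` elements under the `model` config:

```python
import xml.etree.ElementTree as ET

# KNIME settings.xml files use the XMLConfig <config>/<entry> structure;
# each entry under the "model" config is a candidate Parameter.
NS = {"k": "http://www.knime.org/2008/09/XMLConfig"}

tree = ET.parse("settings.xml")
model = tree.getroot().find("k:config[@key='model']", NS)
for entry in model.iter(f"{{{NS['k']}}}entry"):
    print(entry.get("key"), entry.get("type"), entry.get("value"))
```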
The pipeline generator can be used to generate workflows using the ontology and some user input. It has to be run from the `pipeline_generator` directory.

```bash
cd pipeline_generator
python3 pipeline_generator.py
```

It will ask for the intent name (which can be whatever you want), the dataset name (which must be an annotated existing dataset), and the problem name (which must be an existing problem). It will also ask for a folder to store the generated workflows.

```
Introduce the intent name [DescriptionIntent]:
Introduce the data name [titanic.csv]:
Introduce the problem name [Description]:
Introduce the folder to save the workflows:
```

You can use the default values for the first three questions for a quick example.
The pipeline translator will translate the ontology-represented workflows into KNIME workflows. It has to be run from the `pipeline_translator` directory.

```bash
cd pipeline_translator
python3 pipeline_translator.py
```

It will ask for a source directory (which must contain the ontology-represented workflows) and a destination directory, where the translated workflows will be stored. It will also ask whether you want to keep the KNIME workflows in the folder format or not. The folder format is just the `.knwf` file decompressed. If you are testing or debugging the translation, it will make it easier to check the generated workflows (you can still just decompress the workflow yourself).

```
Source folder:
Destination folder:
Keep workflows in folder format? [Y/n]:
```
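Since a `.knwf` file is just a zip archive, you can also unpack one yourself:

```python
import zipfile

# A .knwf file is a regular zip archive; extracting it yields the same
# "folder format" the translator offers to keep.
with zipfile.ZipFile("workflow.knwf") as knwf:
    knwf.extractall("workflow")
```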
You can also use the translator in non-interactive mode, by passing the source and destination folders as parameters.

```bash
python workflow_translator.py <source_folder> <destination_folder>
python workflow_translator.py --keep <source_folder> <destination_folder>
```
The Demo is a web application that allows the user to generate workflows using the ontology, as well as giving more fine-grained control over the generation process. To run it, make sure you have all the dependencies installed (see Requirements), and run the following commands.
The backend must be run from the project root directory.

```bash
flask --app ./demo/demo_api/api.py run
```

The frontend must be run from the `demo_web` directory.

```bash
cd demo/demo_web
npm run dev
```
Note that the demo uses slightly modified versions of the pipeline generator and translator, which can be found in the `demo_api` directory.