Research and implementation of an ETL process for a curated and up-to-date public and open-source data catalog of frequently used datasets in investigative journalism.
Uses prefect.io for ftm (FollowTheMoney) pipeline processing.
investigraph
Requires Python 3.11 or later.
pip install investigraph
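Because investigraph needs Python 3.11 or later, a quick interpreter check before installing can prevent a confusing failure. The following is a minimal sketch; `python_supported` is a hypothetical helper and not part of investigraph.

```python
import sys

# Minimum interpreter version stated in this README.
MINIMUM_PYTHON = (3, 11)


def python_supported(version_info=sys.version_info, minimum=MINIMUM_PYTHON) -> bool:
    """Hypothetical helper: True if version_info is at least `minimum`."""
    # Only (major, minor) matter here; micro and release level are ignored.
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    status = "OK" if python_supported() else "too old"
    print(f"Python {sys.version_info[0]}.{sys.version_info[1]}: {status} for investigraph")
```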
There is a dedicated repo for example datasets built with investigraph.
Use docker-compose.yml for local development and testing, and docker-compose.prod.yml as a starting point for a production setup. More instructions here
Install the app and its dependencies (use a virtualenv):
pip install investigraph
Or, e.g. when using poetry:
poetry add investigraph
After installation, the investigraph command should be available:
investigraph --help
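If `investigraph --help` fails with "command not found", the usual cause is that the virtualenv holding the package is not active. A minimal sketch of checking this from Python, using only the standard library; `command_available` is a hypothetical helper, not part of investigraph.

```python
import shutil


def command_available(name: str = "investigraph") -> bool:
    """Hypothetical helper: True if an executable called `name` is on PATH."""
    # shutil.which returns the full path of the executable, or None if not found.
    return shutil.which(name) is not None


if __name__ == "__main__":
    if command_available():
        print("investigraph CLI found at", shutil.which("investigraph"))
    else:
        print("investigraph not on PATH; is the virtualenv activated?")
```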
Quickly run a local dataset definition:
investigraph run -c ./path/to/config.yml
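The command above runs one dataset definition at a time. If you keep several local definitions, a small wrapper can call the same command for each of them in sequence. This is a minimal sketch, not part of investigraph: `build_run_command`, `run_configs`, and the `datasets/*/config.yml` directory layout are assumptions for illustration, and only the `investigraph run -c <path>` invocation comes from this README.

```python
import subprocess
from pathlib import Path


def build_run_command(config: Path) -> list[str]:
    """Hypothetical helper: the documented CLI call as an argv list."""
    return ["investigraph", "run", "-c", str(config)]


def run_configs(configs: list[Path]) -> dict[str, int]:
    """Hypothetical helper: run each definition in turn, map path -> exit code."""
    results: dict[str, int] = {}
    for config in configs:
        # check=False so one failing dataset does not stop the remaining ones
        completed = subprocess.run(build_run_command(config), check=False)
        results[str(config)] = completed.returncode
    return results


if __name__ == "__main__":
    # Assumed layout: one config.yml per dataset directory under ./datasets
    configs = sorted(Path("datasets").glob("*/config.yml"))
    for path, code in run_configs(configs).items():
        status = "ok" if code == 0 else f"failed (exit code {code})"
        print(f"{path}: {status}")
```

Running definitions one after another keeps the output readable; note that `subprocess.run` raises `FileNotFoundError` if the investigraph executable is not on PATH.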
View the prefect dashboard:
make server
This package uses poetry for packaging and dependency management, so install it first.
Clone the investigraph repository to a local destination.
Within the root directory, run
poetry install --with dev
This installs a few development dependencies, including pre-commit which needs to be registered:
poetry run pre-commit install
Before each commit, this checks for correct code formatting (isort, black) and runs some other useful checks (see .pre-commit-config.yaml).
Run the tests:
make test
Media Tech Lab Bayern batch #3