Tokern Lineage Engine is a fast and easy-to-use application to collect, visualize and analyze column-level data lineage in databases, data warehouses and data lakes in AWS and GCP.
Tokern Lineage helps you browse column-level data lineage:
- visually, using kedro-viz
- programmatically, by analyzing lineage graphs with the powerful networkx graph library

- Demo of Tokern Lineage App
- Check out an example data lineage notebook.
- Check out the post on using data lineage for cost control for an example of how data lineage can be used in production.
To try the demo, download the docker-compose file from the GitHub repository.
# in a new directory run
wget https://raw.githubusercontent.com/tokern/data-lineage/master/install-manifests/docker-compose/catalog-demo.yml -O docker-compose.yml
# or run
curl https://raw.githubusercontent.com/tokern/data-lineage/master/install-manifests/docker-compose/catalog-demo.yml -o docker-compose.yml
Run docker-compose
docker-compose up -d
Check that the containers are running.
docker ps
CONTAINER ID   IMAGE                            COMMAND   CREATED       STATUS       PORTS                  NAMES
3f4e77845b81   tokern/data-lineage-viz:latest   ...       4 hours ago   Up 4 hours   0.0.0.0:8000->80/tcp   tokern-data-lineage-visualizer
1e1ce4efd792   tokern/data-lineage:latest       ...       5 days ago    Up 5 days                           tokern-data-lineage
38be15bedd39   tokern/demodb:latest             ...       2 weeks ago   Up 2 weeks                          tokern-demodb
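The check above can also be scripted. The following is a minimal sketch, not part of the project: the helper name `missing_containers` is hypothetical, and the expected container names are taken from the sample output above, so adjust them if your compose file names containers differently.

```shell
#!/bin/sh
# Expected container names, copied from the sample `docker ps` output.
EXPECTED="tokern-data-lineage-visualizer tokern-data-lineage tokern-demodb"

# Hypothetical helper: reads running container names on stdin (one per
# line) and prints every expected name that is absent.
# Prints nothing when all expected containers are running.
missing_containers() {
    running=$(cat)
    for name in $EXPECTED; do
        # -x matches whole lines only, so "tokern-data-lineage" does not
        # match "tokern-data-lineage-visualizer"; -F disables regex.
        printf '%s\n' "$running" | grep -qxF "$name" || printf '%s\n' "$name"
    done
}

# Usage (requires a running Docker daemon):
#   docker ps --format '{{.Names}}' | missing_containers
```

An empty result means all three containers are up. Any name printed is a container to inspect with `docker logs`.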
Try out Tokern Lineage App
Head to http://localhost:8000/ to open the Tokern Lineage app.
To install Tokern Lineage Engine itself, download tokern-lineage-engine.yml.
# in a new directory run
wget https://raw.githubusercontent.com/tokern/data-lineage/master/install-manifests/docker-compose/tokern-lineage-engine.yml
# or run
curl https://raw.githubusercontent.com/tokern/data-lineage/master/install-manifests/docker-compose/tokern-lineage-engine.yml -o tokern-lineage-engine.yml
Run docker-compose
docker-compose -f tokern-lineage-engine.yml up -d
If you want to use an external Postgres database, change the following parameters in tokern-lineage-engine.yml:
- CATALOG_HOST
- CATALOG_USER
- CATALOG_PASSWORD
- CATALOG_DB
You can also override the default values using environment variables.
CATALOG_HOST=... CATALOG_USER=... CATALOG_PASSWORD=... CATALOG_DB=... docker-compose -f ... up -d
For more advanced usage of environment variables with docker-compose, refer to the docker-compose docs.
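When overriding values from the shell, it is easy to leave one of the four variables unset. The sketch below guards against that. The helper name `check_catalog_env` is hypothetical and not part of Tokern; only the four variable names come from the list above.

```shell
#!/bin/sh
# Hypothetical helper: prints the name of each CATALOG_* variable that is
# unset or empty, and returns non-zero if any are missing.
check_catalog_env() {
    status=0
    for var in CATALOG_HOST CATALOG_USER CATALOG_PASSWORD CATALOG_DB; do
        # Indirect lookup of the variable named in $var (POSIX-compatible).
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            printf '%s\n' "$var"
            status=1
        fi
    done
    return "$status"
}

# Usage: export the four variables, then start the engine only when all
# of them are present.
#   check_catalog_env && docker-compose -f tokern-lineage-engine.yml up -d
```

The variables must be exported (or given on the same command line as `docker-compose`) for the compose process to see them.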
Pro-tip
If you want to connect to a database on the host machine, set
CATALOG_HOST: host.docker.internal # For Mac or Windows
#OR
CATALOG_HOST: 172.17.0.1 # Linux
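The choice between the two addresses can be made from the output of `uname -s`. This is a rough sketch under assumptions: the function name `host_catalog_address` is hypothetical, `172.17.0.1` is only the gateway of Docker's default bridge network (custom networks use other addresses), and environments such as WSL report `Linux` and may need a different value.

```shell
#!/bin/sh
# Hypothetical helper: maps a kernel name (as printed by `uname -s`) to
# the host address from the tip above. With no argument it inspects the
# current machine.
host_catalog_address() {
    case "${1:-$(uname -s)}" in
        Linux*)
            # Assumption: gateway of Docker's default bridge network.
            echo "172.17.0.1" ;;
        Darwin*|MINGW*|MSYS*|CYGWIN*)
            # Mac or Windows, as in the tip above.
            echo "host.docker.internal" ;;
        *)
            echo "unknown platform: ${1:-$(uname -s)}" >&2
            return 1 ;;
    esac
}

# Usage:
#   CATALOG_HOST=$(host_catalog_address) docker-compose -f tokern-lineage-engine.yml up -d
```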
Supported technologies:
- Postgres
- AWS Redshift
- Snowflake
- SparkSQL
- Presto
For advanced usage, please refer to the data-lineage documentation.
Please take this survey if you are a user or considering using data-lineage. Responses will help us prioritize features better.