This repository aims to collect and curate a list of projects related to both Python and semantic technologies (RDF, OWL, SPARQL, Reasoning, ...). It is inspired by collections like awesome lists. The list might be incomplete and biased, due to the limited knowledge of its authors. Improvements are very welcome. Feel free to file an issue or a pull request. Every section is alphabetically sorted.
Furthermore, this repository might serve as a crystallization point for a community interested in such projects – and how they might productively interact. See this discussion for more information.
-
Bioregistry - The Bioregistry
- docs: https://bioregistry.readthedocs.io
- website: https://bioregistry.io/
- features:
- Open source (and CC 0) repository of prefixes, their associated metadata, and mappings to external registries' prefixes
- Standardization of prefixes and CURIEs
- Interconversion between CURIEs and IRIs
- Generation of context-specific prefix maps for usage in RDF, LinkML, SSSOM, OWL, etc.
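The interconversion between CURIEs and IRIs mentioned above can be illustrated with a minimal sketch. It uses only the standard library and a small hand-written prefix map; `PREFIX_MAP`, `curie_to_iri` and `iri_to_curie` are hypothetical names chosen for this example and are not the Bioregistry API.

```python
# Hypothetical prefix map for illustration; a real one would come from a registry.
PREFIX_MAP = {
    "chebi": "http://purl.obolibrary.org/obo/CHEBI_",
    "go": "http://purl.obolibrary.org/obo/GO_",
}


def curie_to_iri(curie, prefix_map=PREFIX_MAP):
    """Expand a CURIE such as 'go:0008150' into a full IRI, or return None."""
    prefix, _, local_id = curie.partition(":")
    base = prefix_map.get(prefix.lower())
    if base is None or not local_id:
        return None
    return base + local_id


def iri_to_curie(iri, prefix_map=PREFIX_MAP):
    """Compress an IRI into a CURIE, preferring the longest matching base IRI."""
    matches = [(base, prefix) for prefix, base in prefix_map.items()
               if iri.startswith(base)]
    if not matches:
        return None
    base, prefix = max(matches, key=lambda pair: len(pair[0]))
    return f"{prefix}:{iri[len(base):]}"


iri = curie_to_iri("go:0008150")
print(iri)
print(iri_to_curie(iri))
```

Preferring the longest matching base is one possible way to resolve overlapping prefixes; other strategies are conceivable.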
-
brickschema – Brick Ontology Python package
- Brick is an open-source effort to standardize semantic descriptions of the physical, logical and virtual assets in buildings and the relationships between them.
- docs: https://brickschema.readthedocs.io/en/latest/
- website: https://brickschema.org/
- features:
- basic inference with different reasoners
- web based interaction (by means of Yasgui)
- Translations from different formats (Haystack, VBIS)
-
Cooking with Python and KBpedia
- Tutorial series on "how to pick tools and then use Python for using and manipulating the KBpedia knowledge graph"
- Material in the form of Jupyter Notebooks
- accompanying Python package cowpoke
-
CubicWeb a framework to build semantic web applications
- website: https://www.cubicweb.org
- docs: https://cubicweb.readthedocs.io/en/latest/
- features:
- An engine driven by the explicit data model of the application
- RQL, an intuitive query language close to the business vocabulary
- An architecture that separates data selection and visualisation
- Data security by design
- An efficient data storage
-
Eddy - graphical ontology editor
- website: https://www.obdasystems.com/eddy
- features:
- graphical ontology editing
- uses bespoke Graphol format but has an OWL2 export
- visualization built on PyQt5
- literature references:
-
fastobo-py: Python bindings for fastobo (rust library to parse OBO 1.4)
- features:
- load, edit and serialize ontologies in the OBO 1.4 format
-
FunOwl – functional OWL syntax for Python
- features:
- provide a pythonic API that follows the OWL functional model for constructing OWL
-
Gastrodon - puts RDF data on your fingertips in Pandas; gateway to matplotlib, scikit-learn and other visualization tools.
- features:
- interpolate variables into SPARQL queries
- access local RDFlib graphs and remote SPARQL protocol endpoints
- convert SPARQL result set to pandas dataframes
- understandable error messages
- input/output graphs in Turtle form
- conversion between RDF collections and Python collections
- Sphinx domain to incorporate RDF data into documentation
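To make "conversion between RDF collections and Python collections" more concrete: an RDF collection is a linked list built from rdf:first and rdf:rest, terminated by rdf:nil. The following standard-library sketch converts in both directions using plain tuples as triples. The function names and the blank-node labels are assumptions for this example and do not reflect Gastrodon's API.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_FIRST, RDF_REST, RDF_NIL = RDF + "first", RDF + "rest", RDF + "nil"


def list_to_collection(items):
    """Encode a Python list as (head_node, triples) using rdf:first/rdf:rest."""
    if not items:
        return RDF_NIL, []
    # Blank-node labels are made up for this sketch.
    nodes = [f"_:b{i}" for i in range(len(items))]
    triples = []
    for i, item in enumerate(items):
        triples.append((nodes[i], RDF_FIRST, item))
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF_NIL
        triples.append((nodes[i], RDF_REST, rest))
    return nodes[0], triples


def collection_to_list(head, triples):
    """Walk the rdf:rest chain from head and collect the rdf:first values."""
    first = {s: o for s, p, o in triples if p == RDF_FIRST}
    rest = {s: o for s, p, o in triples if p == RDF_REST}
    items, node = [], head
    while node != RDF_NIL:
        items.append(first[node])
        node = rest[node]
    return items


head, triples = list_to_collection(["a", "b", "c"])
print(collection_to_list(head, triples))
```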
-
gizmos – Utilities for ontology development
- features:
- modules for "export", "extract", "tree"-rendering
-
Jabberwocky – a toolkit for ontologies
- features:
- text mining using an ontology's terms and synonyms
- TF-IDF for synonym curation, then adding those synonyms to an ontology
-
kglab - Graph Data Science
- docs: https://derwen.ai/docs/kgl/
- tutorial: https://derwen.ai/docs/kgl/tutorial/
- features:
- an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries
- perspective: there are several "camps" of graph technologies, with little discussion between them
- focus on supporting "Hybrid AI" approaches that combine two or more graph technologies with other ML work
- PyData stack – e.g., Pandas, scikit-learn, etc. – allows for graph work within data science workflows
- scale-out tools – e.g., RAPIDS, Arrow/Parquet, Dask – provide for scaling graph computation (not necessarily databases)
- graph algorithm libraries include NetworkX, iGraph, cuGraph – plus related visualization libraries in PyVis, Cairo, etc.
- W3C libraries in Python also lacked full integration: RDFlib, pySHACL, OWL-RL, etc.
- pslpython provides for probabilistic soft logic, working with uncertainty in probabilistic graphs
- additional integration paths and examples show how to work with deep learning (PyG)
- import paths from graph databases, such as Neo4j
- import paths from note-taking tools, such as Roam Research
- usage in MkRefs to add semantic features into MkDocs so that open source projects can federate bibliographies, shared glossaries, etc.
- kglab team provides hands-on workshops at technology conferences for people to gain experience with these different graph approaches
-
KGX - Library for building and exchanging knowledge graphs
- docs: https://kgx.readthedocs.io/
- features:
- Load graphs into an in-memory model to facilitate data integration, validation, and graph operations
- Provides an easy way to bring data into Biolink Model, a high-level data model for biomedical knowledge graphs
- The core data structure is a Property Graph (PG), represented internally using a networkx.MultiDiGraph
- Supports various input and output formats, including:
- RDF serializations
- SPARQL endpoints
- Neo4j endpoints
- CSV/TSV and JSON
- OWL
- OBOGraph JSON format
- SSSOM
-
LangChain's GraphSparqlQAChain – A LangChain module for making RDF and OWL accessible via natural language
- docs: https://python.langchain.com/docs/use_cases/graph/graph_sparql_qa
- features:
- Generates SPARQL SELECT and UPDATE queries from natural language
- Runs the generated queries against local files, endpoints, or triple stores
- Returns natural language responses
-
LinkML – Linked Open Data Modeling Language
- features:
- A high level simple way of specifying data models, optionally enhanced with semantic annotations
- A python framework for compiling these data models to json-ld, json-schema, shex, shacl, owl, sql-ddl
- A python framework for data conversion and validation, as well as generated Python dataclasses
-
Macleod – Ontology development environment for Common Logic (CL)
- features:
- Translating a CLIF file to formats supported by FOL reasoners
- Extracting an OWL approximation of a CLIF ontology
- Verifying (non-trivial) logical consistency of a CLIF ontology
- Proving theorems/lemmas, such as properties of concepts and relations or competency questions
- GUI (alpha state)
-
Morph-KGC – System to create RDF and RDF-star knowledge graphs from heterogeneous sources with R2RML, RML and RML-star
- docs: https://morph-kgc.readthedocs.io
- features:
- support for relational databases, tabular files (e.g. CSV, Excel, Parquet) and hierarchical files (XML and JSON)
- generates RDF and RDF-star knowledge graphs by running through the command line or as a library
- integrates with RDFlib and Oxigraph to load the generated RDF directly to those libraries
-
nxontology – NetworkX-based library for representing ontologies
- features:
- load ontologies into a networkx.DiGraph or MultiDiGraph from .obo, .json, or .owl formats (powered by pronto / fastobo)
- compute information content scores for nodes and semantic similarity scores for node pairs
-
obonet – read OBO-formatted ontologies into NetworkX
- features:
- Load an .obo file into a networkx.MultiDiGraph
- Users should try nxontology first, as a more general purpose successor to this project
-
OnToology – System for collaborative ontology development process
- docs: http://ontoology.linkeddata.es/stepbystep
- live version: http://ontoology.linkeddata.es/
- citable reference: https://doi.org/10.1016/j.websem.2018.09.003
-
OntoPilot – software for ontology development and deployment
- docs: https://github.com/stuckyb/ontopilot/wiki
- features:
- support end users in ontology development, documentation and maintenance
- convert spreadsheet data (one entity per row) to owl files
- call a reasoner before triple-store insertion
-
ontospy – Python library and command-line interface for inspecting and visualizing RDF models
- docs: http://lambdamusic.github.io/Ontospy/
- features:
- extract and print out any ontology-related information
- convert different OWL syntax variants
- generate html documentation for an ontology
-
ontor – Python library for manipulating and visualizing OWL ontologies
- features:
- tool set based on owlready2 and networkx
-
owlready2 – ontology oriented programming in Python
- docs: https://owlready2.readthedocs.io/en/latest/index.html
- features:
- parse owl files (RDF/XML or OWL/XML)
- parse SWRL rules
- call reasoner (via java)
- literature references:
- Lamy, JB: Owlready: Ontology-oriented programming in Python with automatic classification and high level constructs for biomedical ontologies. Artificial Intelligence In Medicine 2017;80:11-28
- Lamy, JB: Ontologies with Python, Apress, 2020
- accompanying material: https://github.com/Apress/ontologies-w-python
-
Oxrdflib – Oxrdflib provides rdflib stores using pyoxigraph (rust-based)
- could be used as drop-in replacements of the rdflib default ones
-
pronto: library to parse, browse, create, and export ontologies
- features:
- supports several ontology languages and formats
- docs: https://pronto.readthedocs.io/en/latest/api.html
-
pyfactxx – Python bindings for FaCT++ OWL 2 C++ reasoner
- features:
- well-optimized reasoner for SROIQ(D) description logic, with additional improvements
- rdflib integration
- easy cross-platform installation
-
PyFuseki – Library that interacts with Jena Fuseki (SPARQL server)
-
PyKEEN (Python KnowlEdge EmbeddiNgs) – Python package to train and evaluate knowledge graph embedding models
- features:
- 44 Models
- 37 Datasets
- 5 Inductive Datasets
- support for multi-modal information
-
PyLD - A JSON-LD processor written in Python
- conforms:
- JSON-LD 1.1, W3C Candidate Recommendation, 2019-12-12 or newer
- JSON-LD 1.1 Processing Algorithms and API, W3C Candidate Recommendation, 2019-12-12 or newer
- JSON-LD 1.1 Framing, W3C Candidate Recommendation, 2019-12-12 or newer
-
pyLoDStorage – python library to interchange data between SPARQL-, JSON and SQL-endpoints
- features:
- Integration of tabulate library
- QueryManager class for handling named queries
- Basic data structure: lists of dicts (thus: "LoD")
- docs: https://wiki.bitplan.com/index.php/PyLoDStorage
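The "list of dicts" structure can be sketched with the standard library alone. The example below flattens a response in the W3C SPARQL 1.1 Query Results JSON Format into one dict per result row. The sample data and the function name `bindings_to_lod` are assumptions for illustration, not part of pyLoDStorage.

```python
import json

# Made-up sample response in the SPARQL 1.1 Query Results JSON Format.
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["city", "population"]},
  "results": {"bindings": [
    {"city": {"type": "uri", "value": "http://example.org/Berlin"},
     "population": {"type": "literal", "value": "3700000"}},
    {"city": {"type": "uri", "value": "http://example.org/Atlantis"}}
  ]}
}
"""


def bindings_to_lod(results):
    """Flatten SPARQL JSON results into a list of dicts (one dict per row)."""
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        # Variables that are unbound in a row are mapped to None.
        rows.append({var: binding[var]["value"] if var in binding else None
                     for var in variables})
    return rows


for row in bindings_to_lod(json.loads(SAMPLE_RESPONSE)):
    print(row)
```

Values stay strings in this sketch; converting typed literals (for example integers) would need the datatype field of each binding.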
-
PyOBO
- docs: https://pyobo.readthedocs.io
- features:
- Provides unified, high-level access to names, descriptions, synonyms, xrefs, hierarchies, properties, relationships, etc. in ontologies from many sources listed in the Bioregistry
- Converts databases into OWL and OBO ontologies
- Wrapper around ROBOT for using Java tooling to convert between OBO and OWL
- Internal DSL for generating OBO ontology
-
Pyoxigraph – Python graph database library implementing the SPARQL standard.
- built on top of Oxigraph using PyO3
- docs: https://oxigraph.org/pyoxigraph/stable/index.html
- two stores with SPARQL 1.1 capabilities: in-memory and disk-based
-
PyRes
- resolution-based theorem provers for first-order logic
- focus on good comprehensibility of the code
- Literature: Teaching Automated Theorem Proving by Example
-
pystardog
- Python bindings for the Stardog Knowledge Graph platform
-
Quit Store – workspace for distributed collaborative Linked Data knowledge engineering ("Quads in Git")
- features:
- read and write RDF Datasets
- create multiple branches of the Dataset
- literature references:
- Decentralized Collaborative Knowledge Management using Git by Natanael Arndt, Patrick Naumann, Norman Radtke, Michael Martin, and Edgard Marx in Journal of Web Semantics, 2018 [@sciencedirect] [@arXiv]
-
RaiseWikibase – A tool for speeding up multilingual knowledge graph construction with Wikibase
- fast inserts into a Wikibase instance: creates up to a million entities and wikitexts per hour
- docs: https://ub-mannheim.github.io/RaiseWikibase/
- ships with docker-compose.yml for Wikibase (Database, PHP-code)
- publication: https://link.springer.com/chapter/10.1007%2F978-3-030-80418-3_11
-
Reasonable – An OWL 2 RL reasoner with reasonable performance
- written in Rust with Python-Bindings (via pyo3)
-
ROBOT – Java tool for automating ontology workflows with several reasoners (ELK, HermiT, ...) and Python interface
- General docs: https://robot.obolibrary.org/
- Python interfaces: https://robot.obolibrary.org/python
- Docs on reasoning: https://robot.obolibrary.org/reason
-
rdflib – Python package for working with RDF
- docs: https://rdflib.readthedocs.io/
- graphical package overview: https://rdflib.dev/
- features:
- parsers and serializers for RDF/XML, NTriples, Turtle, JSON-LD and more
- a graph interface which can be backed by any one of a number of store implementations
- store implementations for in-memory storage and persistent storage
- a SPARQL 1.1 implementation – supporting SPARQL 1.1 Queries and Update statements
-
rdflib-endpoint – Python package for easily deploying SPARQL endpoints for RDFLib Graphs
- features:
- exposing machine learning models or any other logic implemented in Python through a SPARQL endpoint, using custom functions
- serving local RDF files using the command line interface
-
serd – Python serd module, providing bindings for Serd, a lightweight C library for working with RDF data
-
sparqlfun
- LinkML based SPARQL template library and execution engine
- modularized core library of SPARQL templates
- Fully FAIR description of templates
- Rich expressive language for modeling templates
- uses LinkML as base language
- optional python bindings / object model using LinkML
- supports both SELECT and CONSTRUCT
- optional export to TSV, JSON, YAML, RDF
- extensive endpoint metadata
-
SPARQL kernel for Jupyter
- features:
- sending queries to a SPARQL endpoint
- fetching and presenting the results in a notebook
-
SPARQLing Unicorn QGIS Plugin – QGIS plugin which adds a GeoJSON layer from SPARQL endpoint queries
- docs: https://sparqlunicorn.github.io/sparqlunicornGoesGIS/
- QGIS plugin page: https://plugins.qgis.org/plugins/sparqlunicorn/
- features:
- Querying geospatial vector layers from SPARQL endpoints
- Conversion of geoformats (GeoJSON, SHP, KML, GML, etc.) to geospatial RDF
- Conversion of RDF geodata (GeoSPARQL-formatted) from one coordinate reference system to another
- SHACL validation of geospatial RDF graphs including validation of geoliteral (WKT, GML) contents
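As a rough illustration of converting a geoformat to geospatial RDF, the sketch below turns a single GeoJSON Point feature into a Turtle snippet with a GeoSPARQL WKT literal. It is not the plugin's code: the function name, the subject IRI and the sample feature are assumptions, and only Point geometries are handled.

```python
def point_feature_to_turtle(feature, subject_iri):
    """Serialize a GeoJSON Point feature as Turtle with a geo:wktLiteral."""
    geometry = feature["geometry"]
    if geometry["type"] != "Point":
        raise ValueError("this sketch only handles Point geometries")
    # GeoJSON stores coordinates as (longitude, latitude).
    lon, lat = geometry["coordinates"][:2]
    wkt = f"POINT({lon} {lat})"
    return (
        "@prefix geo: <http://www.opengis.net/ont/geosparql#> .\n"
        f"<{subject_iri}> geo:hasGeometry [\n"
        f'    geo:asWKT "{wkt}"^^geo:wktLiteral\n'
        "] .\n"
    )


# Made-up sample feature and subject IRI.
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [13.4, 52.5]},
    "properties": {},
}
print(point_feature_to_turtle(feature, "http://example.org/place/1"))
```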
-
SPARQLWrapper – A wrapper for a remote SPARQL endpoint
- docs: https://sparqlwrapper.readthedocs.io/en/latest/index.html
- features:
- Creating a query invocation
- Optionally converting the result into a more manageable format
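What "creating a query invocation" amounts to at the protocol level can be sketched with the standard library: under the SPARQL 1.1 Protocol, a query can be sent as an HTTP GET with the query text in the `query` parameter. The helper name and the endpoint URL below are assumptions for illustration, and this is not SPARQLWrapper's API. The request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_select_request(endpoint, query):
    """Build (but do not send) a SPARQL 1.1 Protocol GET request.

    Assumes the endpoint URL has no query string of its own.
    """
    url = f"{endpoint}?{urlencode({'query': query})}"
    # Ask the server for results in the SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Hypothetical endpoint; replace with a real SPARQL endpoint before sending.
request = build_select_request(
    "https://example.org/sparql",
    "SELECT * WHERE { ?s ?p ?o } LIMIT 1",
)
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would actually send the query; a wrapper library additionally takes care of result parsing and error handling.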
-
WikidataIntegrator – Library for reading and writing to Wikidata/Wikibase
- features:
- high integration with the Wikidata SPARQL endpoint
- Athene DL reasoner in pure python
- "[C]urrent version is a beta and only supports ALC. But it can easily be extended by adding tableau rules."
- Last update: 2017
- cwm
- Self description: "[cwm is a] forward chaining semantic reasoner that can be used for querying, checking, transforming and filtering information".
- Created in 2000 by Tim Berners-Lee and Dan Connolly, see w3.org
- air-reasoner
- Self description: "Reasoner for the AIR policy language, based on cwm"
- based on cwm
- Last update: 2013
- FuXi
- Self description: "An OWL / N3-based in-memory, logic reasoning system for RDF"
- based on cwm
- Last update: 2013
- see also http://code.google.com/p/python-dlp/wiki/FuXi http://code.google.com/p/fuxi/source/browse/ (hg-repo)
- pysumo
- Ontology IDE for the Suggested Upper Merged Ontology (SUMO)
- Docs: https://pysumo.readthedocs.io/
- Last update: 2015
- ontology – A curated list of ontology things (with some python-related entries)
- awesome-semantic-web#python – Python section of awesome list for semantic-web-related projects
- github-semantic-web-python – github project search with topic=semantic-web and language=python
- "Graph Thinking" – Talk by Paco Nathan (@ceteri) at PyData Global 2021; slides, video
- Hydra Ecosystem - Semantically Linked REST APIs
- docs: https://www.hydraecosystem.org/
- tutorials: the stack has three major layers (server, client, GUI); each repo has its own README
- features:
- deploy a server automatically from API Documentation (JSON-LD and W3C Hydra)
- client automatically reads the documentation and provides access to endpoints
- GUI allows visualization of the network generated by the servers and external resources
- a parser for OpenAPI specs translation
- notes:
- under development, experimental
- part of Google Summer of Code
- Pywikibot
- Library to interact with Wikidata and Wikimedia API
- see also: https://www.wikidata.org/wiki/Wikidata:Creating_a_bot#Pywikibot
- semantic – Python library for extracting semantic information from text, such as dates and numbers
- Solving Einstein Puzzle – jupyter notebook demonstrating how to use owlready2 to solve a logic puzzle
- W3C-Link-List1 – link list "SemanticWebTools", section "Python_Developers" (wiki page)
- might be outdated
- W3C-Link-List2 – list of tools usable from, or with, Python (wiki page)
- wikidata-mayors
- Python code to ask Wikidata for European mayors and where they were born
- Article: https://towardsdatascience.com/where-do-mayors-come-from-querying-wikidata-with-python-and-sparql-91f3c0af22e2
- yamlpyowl – read a YAML-specified ontology into Python by means of owlready2 (experimental)
- Notebook that generates quiz questions from Wikidata