MkRefs

The MkRefs plugin for MkDocs generates reference Markdown pages from a knowledge graph, based on the kglab project.

No graph database is required; however, let us know if you'd like to use one in particular.

There are several planned use cases for the MkRefs plugin, including:

  • biblio – semantic bibliography entries, generated from RDF
  • glossary – semantic glossary entries, generated from RDF
  • apidocs – semantic apidocs supporting the Diátaxis grammar for documentation, generated as RDF from Python source code
  • depend – semantic dependency graph for Python libraries, generated as RDF from setup.py
  • index – semantic search index, generated as RDF from MkDocs content

Only the apidocs, biblio, and glossary components have been added to MkRefs so far, although the other mentioned components exist in separate projects and are being integrated.

Contributing Code

We welcome people getting involved as contributors to this open source project!

For detailed instructions please see: CONTRIBUTING.md

Semantic Versioning

Before MkRefs reaches release v1.0.0, the types and classes may undergo substantial changes, and the project is not guaranteed to have a consistent API.

Even so, we'll try to minimize breaking changes. We'll also be sure to provide careful notes.

See: changelog.txt

MkRefs, for semantic references

Why does this matter?

A key takeaway is that many software engineering aspects of open source projects involve graphs, so a knowledge graph can serve as an integral part of an open source repository. Moreover, by using a semantic representation (RDF), projects that integrate with each other can share (i.e., federate) common resources, for example to share definitions or analyze mutual dependencies.

Installation

To install the plugin using pip:

python3 -m pip install mkrefs

Then add the plugin into the mkdocs.yml file:

plugins:
  - mkrefs

In addition, the following configuration parameter is expected:

  • mkrefs_config – YAML configuration file for MkRefs; e.g., mkrefs.yml

API Docs

An apidocs parameter within the configuration file expects four required sub-parameters:

  • page – name of the generated Markdown page, e.g., ref.md
  • template – a Jinja2 template to generate Markdown, e.g., ref.jinja
  • package – name of the package being documented
  • git – base URL for source modules in Git, e.g., https://github.com/DerwenAI/mkrefs/blob/main

There is an optional includes parameter, as a list of class definitions to include. If this is used, then all other classes get ignored.
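
As a rough illustration of the parameters above, the sketch below models an apidocs section as a Python dictionary and checks it for the four required sub-parameters. The check_apidocs helper is hypothetical and is not part of MkRefs; how the plugin itself validates its configuration may well differ.

```python
# Hypothetical helper, not part of MkRefs: checks an `apidocs` config section.
REQUIRED_APIDOCS_KEYS = ("page", "template", "package", "git")


def check_apidocs(section: dict) -> list:
    """Return the names of required sub-parameters missing from `section`."""
    return [key for key in REQUIRED_APIDOCS_KEYS if key not in section]


# Example values taken from the parameter descriptions above.
apidocs = {
    "page": "ref.md",
    "template": "ref.jinja",
    "package": "mkrefs",
    "git": "https://github.com/DerwenAI/mkrefs/blob/main",
    # Optional: restrict output to these classes; all others get ignored.
    "includes": ["MkRefsPlugin", "PackageDoc"],
}

missing = check_apidocs(apidocs)
print(missing or "apidocs section looks complete")
```

A check like this can be handy in a pre-build script, since a missing sub-parameter is easier to diagnose before MkDocs starts rendering.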

See the source code in this repo for examples of how to format Markdown within docstrings. Specifically, see the parameter documentation per method or function, which differs slightly from pre-existing frameworks.

Note that the name of the generated Markdown page for the apidocs must appear in the nav section of your mkdocs.yml configuration file. See the structure used in this repo for an example.

Best Practices: RDF representation

You can use this library outside of MkDocs, i.e., calling it programmatically, to generate an RDF graph to represent your package API reference:

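# assumption: PackageDoc is importable from the top-level mkrefs package
from mkrefs import PackageDoc

package_name = "mkrefs"
git_url = "https://github.com/DerwenAI/mkrefs/blob/main"
includes = [ "MkRefsPlugin", "PackageDoc" ]

# collect API metadata for the package, limited to the listed classes
pkg_doc = PackageDoc(package_name, git_url, includes)
pkg_doc.build()

# export the API reference as an RDF graph
kg = pkg_doc.get_rdf()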

The PackageDoc.get_rdf() method returns an RDF graph as an instance of a kglab.KnowledgeGraph object. For more details, see https://derwen.ai/docs/kgl/

Bibliography

A biblio parameter within the configuration file expects four required sub-parameters:

  • graph – an RDF graph represented as a Turtle (TTL) file, e.g., mkrefs.ttl
  • page – name of the generated Markdown page, e.g., biblio.md
  • template – a Jinja2 template to generate Markdown, e.g., biblio.jinja
  • queries – SPARQL queries used to extract bibliography data from the knowledge graph

See the mkrefs.ttl file for an example bibliography represented in RDF. This comes from the documentation for the pytextrank open source project.

In the example RDF, the bibo vocabulary represents bibliographic entries, and the FOAF vocabulary represents authors. This also uses two custom OWL relations from the derwen vocabulary:

  • derw:citeKey – citekey used to identify a bibliography entry within the documentation
  • derw:openAccess – open access URL for a bibliography entry (if any)

The queries parameter has three required SPARQL queries:

  • entry – to select the identifiers for all of the bibliography entries
  • entry_author – a mapping to identify author links for each bibliography entry
  • entry_publisher – the publisher link for each bibliography entry (if any)
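
To make the shape of these three queries more concrete, here is a minimal sketch of what they might look like, held as Python strings. Only derw:citeKey comes from the vocabulary described above; the use of bibo:authorList and dct:publisher is an assumption made for illustration, and the prefixes are assumed to be bound in the graph already. The queries shipped in this repo's mkrefs.yml may differ.

```python
# Illustrative SPARQL queries; prefixes are assumed to be bound in the graph.
queries = {
    # One row per bibliography entry, keyed by its citekey.
    "entry": """
        SELECT ?entry ?citeKey
        WHERE { ?entry derw:citeKey ?citeKey }
    """,
    # Assumption: authors are stored as an RDF list under bibo:authorList.
    "entry_author": """
        SELECT ?entry ?author
        WHERE { ?entry bibo:authorList/rdf:rest*/rdf:first ?author }
    """,
    # Assumption: the publisher link uses dct:publisher; optional per entry.
    "entry_publisher": """
        SELECT ?entry ?publisher
        WHERE { ?entry dct:publisher ?publisher }
    """,
}

for name, text in queries.items():
    print(name, "->", " ".join(text.split()))
```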

Note that the name of the generated Markdown page for the bibliography must appear in the nav section of your mkdocs.yml configuration file. See the structure used in this repo for an example.

You may use any valid RDF representation for a bibliography. Just be sure to change the three SPARQL queries and the Jinja2 template accordingly.

While this example uses an adaptation of the MLA Citation Style, feel free to modify the Jinja2 template to generate whatever bibliographic style you need.

Best Practices: constructing bibliographies

As much as possible, bibliography entries should use the conventions at https://www.bibsonomy.org/ for their citation keys.

Journal abbreviations should use ISO 4 standards, for example from https://academic-accelerator.com/Journal-Abbreviation/System

Links to online versions of cited works should use DOI for persistent identifiers.

When available, open access URLs should be listed as well.

What is going on here?

For example, with the bibliography use case, when the plugin runs:

  1. It parses its configuration file to identify the target Markdown page to generate and the Jinja2 template
  2. The plugin also loads an RDF graph from the indicated TTL file
  3. Three SPARQL queries are run to identify the unique entities to extract from the graph
  4. The graph is serialized as JSON-LD
  5. The author, publisher, and bibliography entry entities are used to denormalize the graph into a JSON data object
  6. The JSON is rendered using the Jinja2 template to generate the Markdown
  7. The Markdown page is parsed and rendered by MkDocs as HTML, etc.
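
Steps 5 and 6 above can be sketched with the standard library alone. This is a simplified illustration rather than the plugin's actual code: the rows are hypothetical placeholders standing in for SPARQL results, the denormalize and render helpers are invented for this example, and string.Template stands in for Jinja2 so that the sketch has no dependencies.

```python
import json
from string import Template

# Hypothetical rows standing in for the results of the three SPARQL queries.
entry_rows = [("ex:entry1", "doe2021example")]
author_rows = [("ex:entry1", "Jane Doe"), ("ex:entry1", "John Roe")]
publisher_rows = [("ex:entry1", "Example Press")]


def denormalize(entry_rows, author_rows, publisher_rows):
    """Fold per-entity rows into one nested record per bibliography entry."""
    records = {
        entry_id: {"citeKey": cite_key, "authors": [], "publisher": None}
        for entry_id, cite_key in entry_rows
    }
    for entry_id, author in author_rows:
        records[entry_id]["authors"].append(author)
    for entry_id, publisher in publisher_rows:
        records[entry_id]["publisher"] = publisher
    return records


# string.Template stands in for Jinja2 to keep the sketch dependency-free.
ENTRY_TEMPLATE = Template("## $citeKey\n$authors. $publisher.\n")


def render(records):
    """Render each record as a Markdown fragment, sorted by citekey."""
    parts = []
    for record in sorted(records.values(), key=lambda r: r["citeKey"]):
        parts.append(ENTRY_TEMPLATE.substitute(
            citeKey=record["citeKey"],
            authors=", ".join(record["authors"]),
            publisher=record["publisher"] or "",
        ))
    return "\n".join(parts)


records = denormalize(entry_rows, author_rows, publisher_rows)
markdown = render(records)
print(json.dumps(records, indent=2))
print(markdown)
```

Keeping the denormalized data as plain JSON-style dictionaries means the template only needs simple lookups, which is what makes it practical to swap in a different citation style.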

Glossary

A glossary parameter within the configuration file expects four required sub-parameters:

  • graph – an RDF graph represented as a Turtle (TTL) file, e.g., mkrefs.ttl
  • page – name of the generated Markdown page, e.g., glossary.md
  • template – a Jinja2 template to generate Markdown, e.g., glossary.jinja
  • queries – SPARQL queries used to extract glossary data from the knowledge graph

See the mkrefs.ttl file for an example glossary represented in RDF. This example RDF comes from documentation for the pytextrank open source project.

In the example RDF, the cito vocabulary represents citations to locally represented bibliographic entries. The skos vocabulary provides support for taxonomy features, e.g., semantic relations among glossary entries. This example RDF also uses a definition from the derwen vocabulary:

  • derw:Topic – a skos:Concept used to represent glossary entries

The queries parameter has five required SPARQL queries:

  • entry – to select the identifiers for all of the glossary entries
  • entry_syn – a mapping of synonyms (if any)
  • entry_ref – a mapping of external references (if any)
  • entry_cite – citations to the local bibliography citekeys (if any)
  • entry_hyp – a mapping of hypernyms (if any)
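
As one example of what such a mapping enables, the sketch below follows hypernym links upward from a glossary entry. The rows are hypothetical stand-ins for entry_hyp query results, the hypernym_chain helper is not part of MkRefs, and the sketch assumes each entry has at most one hypernym.

```python
# Hypothetical rows standing in for results of the entry_hyp query:
# (glossary entry, its hypernym)
hyp_rows = [
    ("textrank", "graph algorithm"),
    ("graph algorithm", "algorithm"),
]


def hypernym_chain(term, hyp_rows):
    """Follow hypernym links upward from `term`, guarding against cycles."""
    parent = dict(hyp_rows)  # assumes at most one hypernym per entry
    chain, seen = [], {term}
    while term in parent and parent[term] not in seen:
        term = parent[term]
        chain.append(term)
        seen.add(term)
    return chain


chain = hypernym_chain("textrank", hyp_rows)
print(" > ".join(["textrank"] + chain))
```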

Note that the name of the generated Markdown page for the glossary must appear in the nav section of your mkdocs.yml configuration file. See the structure used in this repo for an example.

You may use any valid RDF representation for a glossary. Just be sure to change the SPARQL queries and the Jinja2 template accordingly.

Usage

The standard way to generate documentation with MkDocs is:

mkdocs build

If you'd prefer to generate reference pages programmatically using Python scripts, see the code for usage of the MkRefsPlugin class, plus some utility functions:

  • load_kg()
  • render_apidocs()
  • render_biblio()
  • render_glossary()

There are also command line entry points provided, which can be helpful during dev/test cycles on the semantic representation of your content:

mkrefs apidocs docs/mkrefs.yml
mkrefs biblio docs/mkrefs.yml
mkrefs glossary docs/mkrefs.yml

Caveats

While the MkDocs utility is astoundingly useful, its documentation (and coding style) leaves much room for improvement. The documentation for developing plugins does not closely match what happens when the code executes.

Consequently, the MkRefs project is an attempt to reverse-engineer the code from many other MkDocs plugins, while documenting its observed event sequence, required parameters, limitations and workarounds, etc.

Two issues persist, where you will see warnings even though the MkRefs code is handling configuration as recommended:

WARNING -  Config value: 'mkrefs_config'. Warning: Unrecognised configuration name: mkrefs_config 

and

INFO    -  The following pages exist in the docs directory, but are not included in the "nav" configuration:
  - biblio.md
  - glossary.md
  - ref.md

For now, you can simply ignore both of these warnings. Meanwhile, we'll work on eliminating them.

Feature roadmap

Let us know if you need features to parse and generate BibTeX.

License and Copyright

Source code for MkRefs plus its logo, documentation, and examples are released under the MIT license, which is succinct and simplifies use in commercial applications.

All materials herein are Copyright © 2021 Derwen, Inc.

Acknowledgements

Many thanks to our open source sponsors; and to our contributors: @ceteri

This plugin code is based on the marvelous examples in https://github.com/byrnereese/mkdocs-plugin-template with kudos to @byrnereese, and also many thanks to @louisguitton, @dmccreary, and @LarrySwanson for their inspiration and insights.