Melanie Courtot edited this page Jul 7, 2016 · 7 revisions

Mapping ontology annotations to a slim (subset)

Introduction

Given a GO slim file and a current ontology (in one or more files), map2slim maps a gene association file (containing annotations to the full GO) to the terms in the GO slim.

The script can be used in two ways: to create a new gene association file containing the most pertinent GO slim accessions, or in count mode, where it reports distinct gene product counts for each slim term.

The association file format is described here:

http://geneontology.org/page/go-annotation-file-formats
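
As a rough illustration of count mode, the sketch below counts distinct gene products per slim term from GAF-style lines. It assumes the tab-delimited GAF layout (column 1 is the DB, column 2 the DB Object ID, column 5 the GO ID, and comment lines start with `!`). The `count_gene_products` function, the `term_to_slim` mapping, and the `EX:` identifiers are hypothetical placeholders, and the sample lines are truncated to five columns for brevity. The actual script may count differently.

```python
from collections import defaultdict

def count_gene_products(gaf_lines, term_to_slim):
    """Count distinct gene products annotated to each slim term.

    term_to_slim maps a full-ontology ID to the set of slim IDs it maps to
    (assumed to have been computed beforehand).
    """
    products = defaultdict(set)
    for line in gaf_lines:
        # Skip blank lines and GAF comment/header lines.
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue
        gene_product = (cols[0], cols[1])  # DB + DB Object ID
        go_id = cols[4]                    # annotated term
        for slim_id in term_to_slim.get(go_id, ()):
            products[slim_id].add(gene_product)
    return {slim_id: len(gps) for slim_id, gps in products.items()}

# Hypothetical mapping and annotations (placeholder IDs, not real GO terms).
term_to_slim = {"EX:5": {"EX:2", "EX:3"}, "EX:6": {"EX:3"}}
gaf_lines = [
    "!gaf-version: 2.1",
    "DB\tGP1\tsym1\t\tEX:5",
    "DB\tGP1\tsym1\t\tEX:6",
    "DB\tGP2\tsym2\t\tEX:6",
]
counts = count_gene_products(gaf_lines, term_to_slim)
```

Here gene product GP1 is counted once for EX:3 even though two of its annotations map there, which is the sense of "distinct" assumed in this sketch.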

Background

GO is a Directed Acyclic Graph (DAG), not a tree. This means that there is often more than one path from a GO term up to the root Gene_Ontology node. These paths may pass through different terms in the slim ontology, so one annotation can map to multiple slim terms.

GO also uses multiple relations (object properties), and depending on which GO file you use with map2slim, different relations will be considered for slimming purposes. We recommend using the go-basic version of the ontology, which contains:

  • subClassOf (is a)
  • part of
  • regulates (+ positively and negatively regulates)

You can also use the full version of GO and filter out the relationships you do not want to consider.
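
To make the idea of relation filtering concrete, here is a minimal sketch that keeps only the edges whose relation is in an allowed set. It is not how OWLTools does the filtering; the `(child, relation, parent)` triple representation, the `filter_edges` helper, and the `EX:` identifiers are assumptions made for illustration.

```python
# Each edge is a hypothetical (child, relation, parent) triple.
EDGES = [
    ("EX:5", "is_a", "EX:2"),
    ("EX:5", "part_of", "EX:3"),
    ("EX:6", "regulates", "EX:3"),
    ("EX:7", "has_part", "EX:4"),
]

# Relations to keep when walking up the graph (an illustrative choice).
ALLOWED = {"is_a", "part_of"}

def filter_edges(edges, allowed_relations):
    """Return only the edges whose relation is in allowed_relations."""
    return [edge for edge in edges if edge[1] in allowed_relations]

kept = filter_edges(EDGES, ALLOWED)
```

Only the retained edges would then contribute paths to the root, so the choice of relations directly changes which slim terms an annotation maps to.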

Example

In a hypothetical example, blue circles show terms in the GO slim and yellow circles show terms in the full ontology. The full ontology subsumes the slim, so the blue terms are also in the ontology.

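[Figure: hypothetical ontology graph; slim terms 1-4 shown as blue circles, full-ontology terms 5-10 as yellow circles]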

  GO ID  MAPS TO SLIM ID        ALL SLIM ANCESTORS
  =====  ===============        ==================
  5      2+3                    2,3,1
  6      3 only                 3,1
  7      4 only                 4,3,1
  8      3 only                 3,1
  9      4 only                 4,3,1
  10     2+3                    2,3,1

The 2nd column shows the most pertinent ID(s) in the slim, i.e. the direct mapping. The 3rd column shows all ancestors in the slim.

Note in particular the mapping of ID 9: although this has two paths to the root through the slim, via 3 and via 4, 3 is discarded because it is an ancestor of 4 and is therefore redundant.

On the other hand, 10 maps to both 2 and 3 because these are both the first slim ID in the two valid paths to the root, and neither subsumes the other.

The algorithm used is:

  • to map any one term in the full ontology: find all valid paths through to the root node in the full ontology

  • for each path, take the first slim term encountered in the path

  • discard any redundant slim terms in this set, i.e. slim terms that are ancestors of other slim terms in the set
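
The three steps above can be sketched in Python as follows. This is an illustrative reading of the algorithm, not the map2slim implementation. The `PARENTS` graph is a hypothetical set of edges chosen so that the results agree with the example table; the real figure may use different edges, and all function names are assumptions.

```python
# Hypothetical child -> parents edges, chosen to reproduce the example table.
PARENTS = {
    1: [], 2: [1], 3: [1], 4: [3],
    5: [2, 3], 6: [3], 7: [4], 8: [3], 9: [4, 8], 10: [5],
}
SLIM = {1, 2, 3, 4}

def paths_to_root(term, parents):
    """Step 1: enumerate every path from term up to the root."""
    if not parents[term]:
        return [[term]]
    return [[term] + rest
            for parent in parents[term]
            for rest in paths_to_root(parent, parents)]

def ancestors(term, parents):
    """All terms reachable by following parent edges (excluding term)."""
    seen, stack = set(), list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen

def map_to_slim(term, parents, slim):
    """Return the most pertinent slim terms for term."""
    # Step 2: take the first slim term encountered on each path.
    firsts = set()
    for path in paths_to_root(term, parents):
        for node in path:
            if node in slim:
                firsts.add(node)
                break
    # Step 3: drop slim terms that are ancestors of another term in the set.
    redundant = set()
    for node in firsts:
        redundant |= ancestors(node, parents) & firsts
    return firsts - redundant

def all_slim_ancestors(term, parents, slim):
    """Third table column: every slim term above term."""
    return ancestors(term, parents) & slim

direct = {term: map_to_slim(term, PARENTS, SLIM) for term in range(5, 11)}
```

With this graph, term 9 first collects both 3 and 4, and 3 is then dropped because it is an ancestor of 4. Enumerating all paths is exponential in the worst case, so a real implementation would likely use a graph traversal with memoisation instead.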

Using OWLTools Command-line

OWLTools provides a dedicated option for map2slim (--map2slim). The general workflow is as follows:

  1. Load the ontology

    OWLTools can load local ontology files or PURLs.

  2. Load the GAF

    OWLTools expects Gene Annotation Files (GAFs) as local files. Use: --gaf FILE

  3. Select subset

    There are two options to define the relevant subset:

    • use an existing subset: --subset NAME

    OR

    • use a custom set of identifiers: --idfile FILE

      The id file is expected to contain a single identifier per line.

  4. Save modified GAF

    Set the output file for the mapped annotations using --write-gaf FILE

Example command lines:

  • using a custom slim from an id file:

      owltools go.obo --gaf annotations.gaf --map2slim --idfile slim.terms --write-gaf annotations.mapped.gaf

  • using an existing slim:

      owltools go.obo --gaf annotations.gaf --map2slim --subset goslim_pombe --write-gaf annotations.mapped.gaf
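
If the command needs to be assembled from a script, a small helper along the following lines may be convenient. It only uses the options shown on this page; the `build_map2slim_command` function is a hypothetical helper, and the sketch assumes the `owltools` executable is on the PATH.

```python
import shlex

def build_map2slim_command(ontology, gaf, output, subset=None, idfile=None):
    """Assemble an owltools map2slim command line as a list of arguments."""
    # Exactly one way of selecting the slim must be given.
    if (subset is None) == (idfile is None):
        raise ValueError("specify exactly one of subset or idfile")
    cmd = ["owltools", ontology, "--gaf", gaf, "--map2slim"]
    if subset is not None:
        cmd += ["--subset", subset]
    else:
        cmd += ["--idfile", idfile]
    cmd += ["--write-gaf", output]
    return cmd

cmd = build_map2slim_command(
    "go.obo", "annotations.gaf", "annotations.mapped.gaf",
    subset="goslim_pombe",
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to execute the mapping.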

General information about getting and using OWLTools can be found at https://github.com/owlcollab/owltools/wiki/Install-OWLTools