
About gene pair statistics

Jean-François edited this page Dec 15, 2020 · 14 revisions

Tree reconciliation with RapGreen produces a rooted and annotated tree, but also a complete statistics .tsv file specifying the evolutionary events separating each pair of genes in the tree. An example is provided here:

https://github.com/SouthGreenPlatform/rap-green/blob/master/example_files/example_rapgreen_statistics.tsv

Each line describes the relationship between one pair of genes, as follows:

  • GENE1 and GENE2 are the identifiers of each gene of the considered pair.

  • kVALUE corresponds to the number of leaves under the last common ancestor of the considered gene pair.

  • DIST corresponds to the total evolutionary distance between the two genes, calculated as the sum of all branch lengths on the path between them.

  • SPEC is the number of speciation nodes between the two genes.

  • T-DUP is the number of topological duplications between the two genes. A topological duplication is inferred at a node G of the gene tree when the species tree and the gene tree are incongruent at G, provided that no species is represented under both children of G.

  • I-DUP is the number of intersection duplications between the two genes. An intersection duplication is inferred at a node G when at least one species is represented under both of its children.

  • U-DUP is the number of recent duplications between two genes of the same species. Such a duplication is not counted under I-DUP, although it is in fact an intersection duplication with only one species under the duplication node.

  • ORTHO is true if the two considered genes diverged at a speciation event (i.e. they are orthologs).

  • ULTRAP is true if the two considered genes diverged at a recent duplication (i.e. they are ultraparalogs, also called inparalogs).

  • FSCORE (experimental) is intended to estimate the functional conservation between the two genes under consideration. It is initialized at 1 (functional equivalence), then multiplied by a fixed factor for each type of event separating the two genes, and by a factor weighted by the total evolutionary distance.
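
The FSCORE description above can be illustrated with a minimal sketch. This is not RapGreen's implementation: the per-event factors, the distance weight, the choice of applying a factor once per event occurrence, and the exponential form of the distance term are all assumptions made for illustration, since this page does not specify them. The function name `fscore_sketch` is hypothetical as well.

```python
import math

# Hypothetical factors: the values actually used by RapGreen are not given on this page.
EVENT_FACTORS = {"SPEC": 0.95, "T-DUP": 0.80, "I-DUP": 0.80, "U-DUP": 0.90}
DIST_WEIGHT = 0.1  # hypothetical weight applied to the DIST column


def fscore_sketch(counts, dist, factors=EVENT_FACTORS, dist_weight=DIST_WEIGHT):
    """Illustrative FSCORE-like value for one gene pair.

    counts maps an event column name (SPEC, T-DUP, ...) to its count;
    dist is the total evolutionary distance (DIST).
    """
    score = 1.0  # 1 means functional equivalence
    for event, factor in factors.items():
        # Assumption: the factor is applied once per occurrence of the event.
        score *= factor ** counts.get(event, 0)
    # Assumption: the distance term decays exponentially with DIST.
    score *= math.exp(-dist_weight * dist)
    return score
```

With these assumed values, a pair with no separating events and zero distance keeps a score of 1, and every additional event or unit of distance lowers it.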

NB: genes that are neither orthologous nor ultraparalogous are simply paralogous.
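
The ORTHO and ULTRAP columns, together with the note above, are enough to label each pair. The following is a minimal sketch of that rule. It assumes that the file has a header row using the column names listed on this page and that boolean values are written as `true`/`false`; neither is confirmed here, so check the example file first. The helper names `classify_pair` and `read_pairs` and the sample rows are made up for illustration.

```python
import csv
import io


def classify_pair(row):
    """Return 'ortholog', 'ultraparalog' or 'paralog' for one statistics row."""
    def is_true(value):
        # Assumption: booleans are written as the text "true" / "false".
        return value.strip().lower() == "true"

    if is_true(row["ORTHO"]):
        return "ortholog"
    if is_true(row["ULTRAP"]):
        return "ultraparalog"
    # Neither orthologous nor ultraparalogous: simply paralogous.
    return "paralog"


def read_pairs(handle):
    """Read a tab-separated statistics file and label every gene pair."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [(row["GENE1"], row["GENE2"], classify_pair(row)) for row in reader]


# Made-up rows, reduced to the columns needed by the sketch.
sample = (
    "GENE1\tGENE2\tORTHO\tULTRAP\n"
    "geneA\tgeneB\ttrue\tfalse\n"
    "geneA\tgeneC\tfalse\ttrue\n"
    "geneB\tgeneC\tfalse\tfalse\n"
)
pairs = read_pairs(io.StringIO(sample))
```

To use it on a real file, pass an open file handle instead of the `io.StringIO` object, for example `open("example_rapgreen_statistics.tsv", newline="")`.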