sabifo4 edited this page Nov 1, 2024 · 6 revisions

🔧 CODEML

We are still working on a more interactive tutorial to navigate the settings and usage of the PAML program CODEML. In the meantime, you can consult the PAML documentation (PDF) for details on the options you can set in the control file to run the program. The resources and tutorials listed below also provide guidelines and practical examples for running CODEML; we highly recommend you check them out!

Estimating the non-synonymous to synonymous rate ratio of protein-coding genes


RESOURCES AND CITATIONS


The protocol paper above (Jeffares et al. 2014) includes the theoretical and practical details you need to estimate the value of $\omega$ for all protein-coding genes in a genome. As shown in their Fig. 1, the protocol guides users through a possible workflow of data preparation: gathering sequences, ortholog assignment, alignment, optional post-alignment filtering, and tree construction. Please note that, depending on the type of data you are analysing, you may want to follow another workflow and/or use other programs that have become available since the publication of Jeffares et al. 2014. The protocol then illustrates how to run CODEML to estimate the value of $\omega$ (as well as $d_{N}$ and $d_{S}$) and how to run likelihood ratio tests (LRTs) for positive selection. The authors also show how to test for adaptive selection in their supplementary material.
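The LRTs mentioned above compare the log-likelihood of a null model, $\ell_{0}$, with that of an alternative model, $\ell_{1}$: the statistic $2\Delta\ell = 2(\ell_{1} - \ell_{0})$ is compared against a $\chi^{2}$ distribution whose degrees of freedom equal the difference in the number of free parameters between the two models. The Python sketch below is only an illustration of that arithmetic, not part of the protocol: the function names are hypothetical, the log-likelihood values are made up, and the $\chi^{2}$ tail probability is computed with the standard library so that no extra packages are assumed.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail probability P(X > stat) for a chi-square with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if stat <= 0:
        return 1.0
    y = stat / 2.0
    if df % 2 == 0:
        # Even df: closed-form finite sum.
        terms = sum(y ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-y) * terms
    # Odd df: complementary error function plus a finite correction sum.
    extra = sum(
        y ** (j - 0.5) / math.gamma(j + 0.5)
        for j in range(1, (df - 1) // 2 + 1)
    )
    return math.erfc(math.sqrt(y)) + math.exp(-y) * extra


def likelihood_ratio_test(lnL_null, lnL_alt, df):
    """Return (2 * delta lnL, p-value) for two nested models."""
    # Small negative differences can arise from optimisation noise.
    stat = max(2.0 * (lnL_alt - lnL_null), 0.0)
    return stat, chi2_sf(stat, df)


if __name__ == "__main__":
    # Made-up log-likelihoods; df = 2 is assumed for illustration only.
    stat, p_value = likelihood_ratio_test(lnL_null=-1000.0, lnL_alt=-995.0, df=2)
    print(f"2*delta_lnL = {stat:.3f}, p = {p_value:.6f}")
```

In practice, you would replace the made-up values with the `lnL` values reported in the CODEML output files for the two models you are comparing, and set `df` according to the models being tested.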

Detecting positive selection


RESOURCES AND CITATIONS


If you are looking for step-by-step guidelines that guide you through the usage of CODEML to test for positive selection, this is the protocol you have been looking for! Specifically, you will learn how to run the following models:

  • Homogeneous model: all alignment sites and taxa have evolved under the same selective pressure. This model, also known as the M0 model, assumes that $\omega$ is constant across all sites and lineages.
  • Site models assume that different (amino acid or codon) sites are under different selective pressures and have different $\omega$ values. Positive selection is detected when a subset of sites in the protein-coding gene have $\omega > 1$.
  • Branch models assume that $\omega$ varies among branches of the phylogeny, and positive selection is detected along specific lineages if $\omega > 1$ for those branches.
  • Branch-site models assume that $\omega$ varies among branches of the phylogeny and across sites of the gene, and positive selection is detected if a subset of sites for specific branches of the phylogeny have $\omega > 1$.
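
In the control file, the four model classes above are mainly distinguished by the `model` and `NSsites` options (plus `fix_omega` and `omega` for the branch-site null model). The Python sketch below writes a minimal control file for each case. It is a hedged illustration rather than the protocol's own setup: the option names are those described in the PAML documentation, but the helper `make_ctl`, the file names, and the starting values are hypothetical placeholders, and you should check every setting against the documentation and the tutorial before running an analysis.

```python
# Options that differ between the model classes listed above.
# Starting values for omega are arbitrary placeholders.
MODEL_SETTINGS = {
    "M0": {"model": 0, "NSsites": "0", "fix_omega": 0, "omega": 0.5},
    # Several site models can be fitted in one run (M0, M1a, M2a, M7, M8).
    "site": {"model": 0, "NSsites": "0 1 2 7 8", "fix_omega": 0, "omega": 0.5},
    # Branch and branch-site models need labelled branches (e.g. "#1")
    # in the tree file.
    "branch": {"model": 2, "NSsites": "0", "fix_omega": 0, "omega": 0.5},
    "branch-site": {"model": 2, "NSsites": "2", "fix_omega": 0, "omega": 0.5},
    # Null model for the branch-site test: omega fixed to 1.
    "branch-site-null": {"model": 2, "NSsites": "2", "fix_omega": 1, "omega": 1},
}


def make_ctl(model_name, seqfile="aln.phy", treefile="tree.nwk", outfile=None):
    """Return the text of a CODEML control file for one model class."""
    if model_name not in MODEL_SETTINGS:
        raise KeyError(f"unknown model: {model_name}")
    if outfile is None:
        outfile = f"out_{model_name}.txt"
    options = {
        "seqfile": seqfile,    # hypothetical alignment file name
        "treefile": treefile,  # hypothetical tree file name
        "outfile": outfile,
        "noisy": 3,
        "verbose": 1,
        "runmode": 0,          # user-supplied tree
        "seqtype": 1,          # codon sequences
        "CodonFreq": 2,        # F3x4 codon frequencies
        "clock": 0,
        "icode": 0,            # universal genetic code
        "fix_kappa": 0,
        "kappa": 2,
        "cleandata": 0,
        **MODEL_SETTINGS[model_name],
    }
    return "\n".join(f"{key:>10} = {value}" for key, value in options.items()) + "\n"


if __name__ == "__main__":
    print(make_ctl("branch-site"))
```

Keeping the shared options in one place and only varying the model-specific ones makes it easier to see that the null and alternative models of a test differ in exactly the settings you intend.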

You can navigate the positive-selection GitHub repository to follow a step-by-step tutorial that goes from data collection and filtering to the usage of CODEML to detect positive selection under the four models mentioned above. We suggest you first try to run CODEML with the examples in the GitHub repository while going through the paper, which may help you better understand the workflow of this type of analysis with CODEML.
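
If you prefer to script your runs while working through the examples, CODEML can be launched with the control file as its argument. The Python sketch below is a minimal wrapper under stated assumptions: the helper `run_codeml` is hypothetical, the executable is assumed to be called `codeml` and to be available on your `PATH`, and the program is run from the directory that contains the control file so that relative file names inside it resolve as expected.

```python
import shutil
import subprocess
from pathlib import Path


def run_codeml(ctl_path, binary="codeml"):
    """Run CODEML on one control file, from the directory that contains it."""
    ctl = Path(ctl_path).resolve()
    if not ctl.is_file():
        raise FileNotFoundError(f"control file not found: {ctl}")
    exe = shutil.which(binary)
    if exe is None:
        raise RuntimeError(f"'{binary}' was not found on PATH")
    # check=True raises CalledProcessError if CODEML exits with an error.
    return subprocess.run(
        [exe, ctl.name],
        cwd=ctl.parent,
        check=True,
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # "codeml.ctl" is a placeholder name for your own control file.
    result = run_codeml("codeml.ctl")
    print(result.stdout[-500:])
```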
