
MCMCtree

sabifo4 edited this page Nov 1, 2024 · 6 revisions


We are still working on a more interactive tutorial on the settings and usage of the dating program MCMCtree. In the meantime, you can consult the PAML documentation (PDF) for details on the settings you can enable in the control file to run the program. You may also want to consult the resources and tutorials listed below, which give users guidelines and practical examples for running MCMCtree. We highly recommend you check them out!

Bayesian timetree inference using the approximate likelihood calculation


RESOURCES AND CITATIONS


The approximate likelihood calculation implemented in MCMCtree (dos Reis and Yang 2011) has been found to be 1000x faster than the exact likelihood calculation (Battistuzzi et al. 2011) and has been widely used in Bayesian node-dating analyses of phylogenomic data. For instance, it has been used to date the evolution of mammals (Meredith et al. 2011; dos Reis et al. 2012; Álvarez-Carretero et al. 2022), birds (Jarvis et al. 2014; Stiller et al. 2024), metazoans (dos Reis et al. 2015), and plants (Barba-Montoya et al. 2018; Morris et al. 2018), and to study the origin of life on Earth (Betts et al. 2018; Moody et al. 2024).

The article cited above (dos Reis and Yang 2011) gives the theoretical and technical background needed to understand how the approximate likelihood calculation was validated and implemented in MCMCtree.
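As a rough illustration of the idea in dos Reis and Yang (2011): the log-likelihood l of the branch lengths b is replaced by a second-order Taylor expansion around their maximum-likelihood estimates (MLEs) b̂,

l(b) ≈ l(b̂) + gᵀ(b − b̂) + ½ (b − b̂)ᵀ H (b − b̂),

where g and H are the gradient and Hessian of l at b̂. In MCMCtree these quantities are computed in a preliminary run with `usedata = 3` in the control file and then read by an approximate-likelihood run with `usedata = 2`. The Python sketch below only evaluates the quadratic form. The function name and its arguments are hypothetical, it is not MCMCtree's code, and it works on untransformed branch lengths, whereas the paper also examines transformations of branch lengths that improve the approximation. The gradient term matters only when an MLE sits on a boundary, such as a zero-length branch; at an interior maximum it is zero.

```python
def approx_loglik(b, b_hat, loglik_hat, grad, hess):
    """Second-order Taylor approximation of the log-likelihood around the MLEs.

    b          : branch lengths at which to evaluate the approximation
    b_hat      : MLEs of the branch lengths
    loglik_hat : log-likelihood at the MLEs
    grad, hess : gradient vector and Hessian matrix at the MLEs
    All names are hypothetical; this is not MCMCtree's implementation.
    """
    n = len(b_hat)
    if not (len(b) == len(grad) == len(hess) == n):
        raise ValueError("b, b_hat, grad and hess must have matching dimensions")
    d = [bi - bhi for bi, bhi in zip(b, b_hat)]  # deviation from the MLEs
    linear = sum(gi * di for gi, di in zip(grad, d))  # g^T d
    quadratic = sum(d[i] * hess[i][j] * d[j] for i in range(n) for j in range(n))  # d^T H d
    return loglik_hat + linear + 0.5 * quadratic


# Illustrative numbers only: the second branch is assumed to have an MLE of zero,
# so its gradient entry is non-zero.
b_hat = [0.10, 0.00]
grad = [0.0, -5.0]
hess = [[-400.0, 10.0], [10.0, -900.0]]
print(approx_loglik([0.12, 0.01], b_hat, -1234.5, grad, hess))
```

Once g and H are stored, each MCMC iteration only needs this cheap quadratic form instead of a full pass over the alignment, which is where the speed-up comes from.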

For examples of how to use this approximation with phylogenomic datasets, users may want to read Bayesian Molecular Clock Dating Using Genome-Scale Datasets, which includes the code snippets and instructions required to run MCMCtree with the example data provided in the divtime GitHub repository maintained by Mario dos Reis. Users can also follow the protocol Dating Microbial Evolution with MCMCtree for an example of how to run MCMCtree with microbial datasets; its example data can be accessed in the microdiv GitHub repository, also maintained by Mario dos Reis.

Bayesian model selection analyses


RESOURCES AND CITATIONS

Important

Remember to cite the mcmc3r R package if you use it (see the tutorial).


The paper cited above introduced Bayesian model selection in PAML. Both thermodynamic integration (e.g., Gelman and Meng 1998; Lartillot and Philippe 2006; Lepage et al. 2007) and the stepping-stone approach (Xie et al. 2011) have been implemented in the mcmc3r R package. Mario dos Reis has written a detailed tutorial explaining how to use the functions in the mcmc3r R package to generate the file structure and input files that MCMCtree needs to sample from the power posteriors. Bayesian model selection with the mcmc3r R package and MCMCtree has already been used to find the best-fitting relaxed-clock model (e.g., dos Reis et al. 2018; Álvarez-Carretero et al. 2019; McGowen et al. 2020) and the best-fitting tree topology (Perri et al. 2021).
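Both estimators combine log-likelihood values sampled from power posteriors, p_β(θ) ∝ L(θ)^β p(θ), run at a grid of β values from 0 (the prior) to 1 (the posterior). Stepping stones multiply ratios of normalising constants between neighbouring β values, each estimated from samples of the lower-β power posterior; thermodynamic integration computes log m = ∫₀¹ E_β[log L] dβ. The Python sketch below is a hypothetical illustration of these two formulas, not the mcmc3r code: the function names and input layout are assumptions, and the integral is approximated with the trapezoidal rule for brevity (other quadrature rules can be used instead).

```python
import math


def log_mean_exp(values):
    """log(mean(exp(values))), computed stably by factoring out the maximum."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def _check_grid(betas, loglik_samples):
    """Both estimators below need a grid running from beta = 0 to beta = 1."""
    if len(betas) != len(loglik_samples) or len(betas) < 2:
        raise ValueError("need one sample set per beta and at least two betas")
    if betas[0] != 0.0 or betas[-1] != 1.0:
        raise ValueError("betas must start at 0.0 and end at 1.0")


def stepping_stone_log_ml(betas, loglik_samples):
    """Stepping-stone estimate (Xie et al. 2011) of the log marginal likelihood.

    betas          : increasing powers from 0.0 to 1.0
    loglik_samples : loglik_samples[k] holds log-likelihood values sampled
                     from the power posterior run at betas[k]
    """
    _check_grid(betas, loglik_samples)
    total = 0.0
    for k in range(1, len(betas)):
        step = betas[k] - betas[k - 1]
        # Ratio z(beta_k) / z(beta_{k-1}), estimated with samples at beta_{k-1}.
        total += log_mean_exp([step * ll for ll in loglik_samples[k - 1]])
    return total


def thermodynamic_log_ml(betas, loglik_samples):
    """Thermodynamic-integration estimate using the trapezoidal rule over beta."""
    _check_grid(betas, loglik_samples)
    means = [sum(s) / len(s) for s in loglik_samples]
    return sum(
        0.5 * (betas[k] - betas[k - 1]) * (means[k] + means[k - 1])
        for k in range(1, len(betas))
    )
```

To compare two models (for example, two relaxed-clock models), the log Bayes factor is the difference between their estimated log marginal likelihoods.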

Bayesian timetree inference with continuous morphological data


RESOURCES AND CITATIONS

Important

Remember to cite the mcmc3r R package if you use it (see the tutorial).


The dating program MCMCtree can also be used to infer species divergence times from quantitative morphological data, such as geometric morphometric (GMM) data. Sandra Álvarez-Carretero and Mario dos Reis wrote a detailed tutorial explaining how to parse the data in R, how to generate the morphological alignment and format it for MCMCtree, and how to enable the analysis of GMM data in MCMCtree.
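To see why continuous characters carry information about divergence times, consider a deliberately simplified case: two species that split t time units ago, with each character evolving independently by Brownian motion at a known rate σ². The difference between the two species' values for a character is then normally distributed with mean 0 and variance 2σ²t, because each lineage contributes variance σ²t. The Python sketch below evaluates that likelihood. It is a hypothetical toy (function names, independence of characters and the known rate are all assumptions), not the model MCMCtree implements; refer to the tutorial for that.

```python
import math


def bm_pair_loglik(t, sigma2, traits_a, traits_b):
    """Log-likelihood of divergence time t for two species under Brownian motion.

    Each character difference d = a - b is assumed to be N(0, 2 * sigma2 * t),
    with characters treated as independent (a simplifying assumption).
    """
    if t <= 0 or sigma2 <= 0:
        raise ValueError("t and sigma2 must be positive")
    var = 2.0 * sigma2 * t  # both lineages evolve for time t since the split
    loglik = 0.0
    for a, b in zip(traits_a, traits_b):
        d = a - b
        loglik += -0.5 * math.log(2.0 * math.pi * var) - d * d / (2.0 * var)
    return loglik


def bm_pair_mle_time(sigma2, traits_a, traits_b):
    """Closed-form maximiser of bm_pair_loglik: t = mean(d^2) / (2 * sigma2)."""
    sq = [(a - b) ** 2 for a, b in zip(traits_a, traits_b)]
    return sum(sq) / (len(sq) * 2.0 * sigma2)


# Two species, two characters (illustrative values only).
print(bm_pair_mle_time(0.5, [0.0, 0.0], [1.0, -1.0]))  # 1.0
```

Only the product σ²t enters this likelihood, so rate and time cannot be separated from the character data alone. As with molecular data, fossil calibrations or other prior information on node ages are what make absolute times estimable.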

Bayesian sequential subtree (BSS) approach


RESOURCES AND CITATIONS


Sometimes, approximating the likelihood calculation (dos Reis and Yang 2011) may not be enough to infer evolutionary timelines with large phylogenomic datasets within a reasonable amount of time. In such instances, the BSS approach can be of help!

This Bayesian sequential approach was first validated and applied by Álvarez-Carretero et al. (2022) to infer a species-level timeline of mammal evolution with phylogenomic data of 4,705 mammal species. Sandra Álvarez-Carretero wrote a step-by-step tutorial that guides users from data filtering to timetree inference, available in the mammals_dating GitHub repository that she maintains. There you will find in-house scripts, input/output files, control files, plots, and intermediate files: everything you may need to reproduce the results reported in the study!
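In a sequential approach such as BSS, posterior node ages estimated in one step are used to calibrate the next. The Python sketch below shows one simple way to summarise posterior samples of a node age for that purpose: it fits a skew-normal density by the method of moments and writes it as a quoted MCMCtree-style `SN(location, scale, shape)` calibration. The function names are hypothetical, and the method of moments is a simple stand-in chosen for brevity; the density family and fitting procedure used in the published analysis are documented in the mammals_dating repository, and the exact calibration syntax should be checked against the PAML manual.

```python
import math
import statistics


def fit_skew_normal_moments(samples):
    """Method-of-moments skew-normal fit (location xi, scale omega, shape alpha).

    A skew-normal cannot reproduce |skewness| above ~0.995, so the sample
    skewness is clipped just below that limit before inverting the moments.
    """
    n = len(samples)
    if n < 3:
        raise ValueError("need at least three samples")
    mean = statistics.fmean(samples)
    m2 = sum((x - mean) ** 2 for x in samples) / n
    if m2 == 0:
        raise ValueError("samples have zero variance")
    m3 = sum((x - mean) ** 3 for x in samples) / n
    gamma = max(-0.99, min(0.99, m3 / m2 ** 1.5))  # clipped sample skewness

    # Invert the skewness formula for delta = alpha / sqrt(1 + alpha^2).
    k = abs(gamma) ** (2.0 / 3.0)
    c = ((4.0 - math.pi) / 2.0) ** (2.0 / 3.0)
    delta = math.copysign(math.sqrt((math.pi / 2.0) * k / (k + c)), gamma)

    alpha = delta / math.sqrt(1.0 - delta ** 2)
    omega = statistics.stdev(samples) / math.sqrt(1.0 - 2.0 * delta ** 2 / math.pi)
    xi = mean - omega * delta * math.sqrt(2.0 / math.pi)
    return xi, omega, alpha


def to_mcmctree_sn(xi, omega, alpha):
    """Format the fit as a quoted SN(...) label for a Newick tree file."""
    return f"'SN({xi:.4f},{omega:.4f},{alpha:.4f})'"


# Illustrative posterior samples of one node age (hypothetical values).
print(to_mcmctree_sn(*fit_skew_normal_moments([1.0, 2.0, 3.0, 4.0, 5.0])))
```

A skew-normal (or heavier-tailed skew-t) is a natural choice here because posterior node-age densities are often asymmetric, which a plain normal fit would miss.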

Cross-bracing (or equality constraints)


RESOURCES AND CITATIONS

If you use or adapt the LUCA-divtimes protocol (including the in-house scripts) for your analyses, please also cite the following:

Sandra Álvarez-Carretero. (2024). sabifo4/LUCA-divtimes: v1.0.1 (LUCAdivtimes-v1.0.1). Zenodo. https://doi.org/10.5281/zenodo.12731583


All the timetree inference analyses carried out in this study have been thoroughly documented by Sandra Álvarez-Carretero in a reproducible workflow in the LUCA-divtimes GitHub repository, which she actively maintains. Once users clone this repository, they can navigate the file structure and go through all the README files, which give extensive documentation and justifications for each analysis. Users can also find in-house scripts to run PAML programs on an HPC cluster (some settings may need to be adapted to the job scheduler used) and to carry out MCMC diagnostics in R. The repository also explains how to enable cross-bracing in MCMCtree and provides in-house scripts to calibrate the tree topology in MCMCtree format.
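The repository's own scripts for calibrating the tree topology are not reproduced here, but the general idea of such a step can be sketched: start from a Newick tree in which the nodes to calibrate carry placeholder labels, then replace each placeholder with a quoted MCMCtree-style calibration. In the Python sketch below, the `CAL<number>` placeholder convention, the function name and the calibration values are all hypothetical. The `B(lower,upper)` soft-bound notation follows the PAML manual, while the cross-bracing notation is documented in the LUCA-divtimes repository and is not shown here.

```python
import re

# Hypothetical placeholder convention: node labels such as CAL1, CAL2, ...
PLACEHOLDER = re.compile(r"\bCAL\d+\b")


def calibrate_newick(newick, calibrations):
    """Replace placeholder node labels with quoted MCMCtree-style calibrations.

    Raises KeyError if a placeholder has no calibration, so a missing entry is
    caught before the tree file is written instead of silently surviving.
    """
    def substitute(match):
        label = match.group(0)
        if label not in calibrations:
            raise KeyError(f"no calibration given for placeholder {label}")
        return "'" + calibrations[label] + "'"

    return PLACEHOLDER.sub(substitute, newick)


tree = "((human,chimp)CAL1,gorilla)CAL2;"
cals = {"CAL1": "B(0.06,0.08)", "CAL2": "B(0.12,0.16)"}  # illustrative bounds
print(calibrate_newick(tree, cals))
# ((human,chimp)'B(0.06,0.08)',gorilla)'B(0.12,0.16)';
```

Keeping calibrations in a separate table and generating the tree file from it makes it easy to swap calibration strategies without hand-editing Newick strings.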