Reference-free Orthology-free Annotation-free DIscordance aware Estimation of Species tree (ROADIES)

Introduction

Welcome to the official repository of ROADIES, a novel pipeline for inferring phylogenetic species trees directly from raw genomic assemblies. ROADIES offers a fully automated, scalable, and easy-to-use solution, eliminating manual steps and allowing flexible control over the trade-off between accuracy and runtime.

🟡 For a detailed overview of ROADIES' features and configuration options, please visit our Wiki.

🟡 If you encounter issues while running the pipeline, please refer to this page for common errors and troubleshooting tips.

Figure: ROADIES Pipeline Stages

Quick Install

Follow any one of the options below to install ROADIES on your system.

Option 1: Install via Bioconda (Recommended)

Step 1: Install Conda (if not installed):

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod +x Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh

export PATH="$HOME/miniconda3/bin:$PATH"
source ~/.bashrc

Step 2: Configure Conda channels:

conda config --add channels defaults

conda config --add channels bioconda

conda config --add channels conda-forge

Verify the installation by running `conda` in your terminal.

Step 3: Create and activate a custom environment:

conda create -n roadies_env python=3.9 ete3 seaborn

conda activate roadies_env

Step 4: Install ROADIES:

conda install roadies=0.1.10

Step 5: Locate the installed files:

cd $CONDA_PREFIX/ROADIES

The contents of the repository are located in this ROADIES folder.

Step 6: Install PASTA and its tools from source by running the following commands:

git clone https://github.com/smirarab/pasta.git
git clone https://github.com/smirarab/sate-tools-linux.git
cd pasta
python3 setup.py develop --user

Also, in the align.smk file (inside the workflow/rules directory of the ROADIES repository), please replace any instance of:

pasta.py with python pasta/run_pasta.py
run_seqtools.py with python pasta/run_seqtools.py
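The two substitutions above can also be scripted. The following is a minimal sketch, not part of ROADIES: the helper names `patch_rules` and `patch_file` are hypothetical, and it assumes the commands appear in align.smk as bare `pasta.py` and `run_seqtools.py`. The lookbehind matters because `run_pasta.py` itself contains `pasta.py`, so a naive search-and-replace would corrupt the file if applied twice.

```python
import re
from pathlib import Path

# Match only the bare command names. The lookbehind rejects matches that are
# preceded by a word character, slash, dot, or hyphen, so already-patched
# text such as "pasta/run_pasta.py" is left untouched on a second run.
PATTERNS = [
    (re.compile(r"(?<![\w/.-])pasta\.py\b"), "python pasta/run_pasta.py"),
    (re.compile(r"(?<![\w/.-])run_seqtools\.py\b"), "python pasta/run_seqtools.py"),
]


def patch_rules(text):
    """Return text with both command names replaced (hypothetical helper)."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text


def patch_file(path):
    """Patch a rules file in place, keeping a .bak copy of the original."""
    p = Path(path)
    original = p.read_text()
    patched = patch_rules(original)
    if patched != original:
        p.with_suffix(p.suffix + ".bak").write_text(original)
        p.write_text(patched)
    return patched != original


if __name__ == "__main__":
    target = Path("workflow/rules/align.smk")
    if target.exists():
        print("patched" if patch_file(target) else "nothing to change")
```

Run it from the ROADIES directory. Checking the resulting file by hand is still advisable, since the exact form of the commands in align.smk may differ from what this sketch assumes.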

After completing these steps, you are ready to follow the Quick Start section to run the pipeline. First, return to the main ROADIES directory:

cd $CONDA_PREFIX/ROADIES

Option 2: Install via DockerHub

If you would like to install ROADIES using DockerHub, follow these steps:

Step 1: Pull the ROADIES image from DockerHub:

docker pull ang037/roadies:latest

Step 2: Launch a container:

docker run -it ang037/roadies:latest

These commands will launch the Docker container in interactive mode, with the roadies_env environment activated and the working directory set to the ROADIES repository containing all necessary files. Once you are able to access the ROADIES repository, refer to the Quick Start to run the pipeline.

Option 3: Install via Local Docker Build

Step 1: Clone the ROADIES repository:

git clone https://github.com/TurakhiaLab/ROADIES.git

cd ROADIES

Step 2: Build and run the Docker container:

docker build -t roadies_image .

docker run -it roadies_image

Once you are able to access the ROADIES repository, refer to the Quick Start instructions to run the pipeline.

Option 4: Install via Source Script

Step 1: Install the following dependencies (requires sudo access):

Java Runtime Environment (Version 1.7 or higher)
Python (Version 3.9 or higher)
wget and unzip commands
GCC (Version 11.4 or higher)
cmake (Download here: https://cmake.org/download/)
Boost library (Download here: https://boostorg.jfrog.io/artifactory/main/release/1.82.0/source/)
zlib (Download here: http://www.zlib.net/)

For Ubuntu, you can install these dependencies with:

sudo apt-get install -y wget unzip make g++ python3 python3-pip python3-setuptools git default-jre libgomp1 libboost-all-dev cmake
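Before running the installation script, it can help to confirm that the command-line dependencies are on your PATH. The following is a rough sketch, not part of ROADIES: the function name `missing_tools` is hypothetical, and the executable names in `REQUIRED` are assumptions about how these tools are typically named on Linux. It does not check versions, and it does not check the Boost or zlib libraries.

```python
import shutil

# Assumed executable names for the dependencies listed above.
REQUIRED = ["java", "python3", "wget", "unzip", "gcc", "g++", "make", "cmake", "git"]


def missing_tools(tools=REQUIRED):
    """Return the tools from the list that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```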

Step 2: Clone the repository:

git clone https://github.com/TurakhiaLab/ROADIES.git

cd ROADIES

Step 3: Run the installation script:

chmod +x roadies_env.sh

source roadies_env.sh

After a successful setup (indicated by the "Setup complete" message), the roadies_env environment will be activated. Proceed to Quick Start.

Note: If you encounter issues with the Boost library, add its path to $CPLUS_LIBRARY_PATH and save it in ~/.bashrc.

Quick Start

After installing using one of the options mentioned in Quick Install, you're ready to run ROADIES! To get started:

Step 1: Download the test dataset (11 Drosophila genomes). Run this step from the main repository directory:

mkdir -p test/test_data && cat test/input_genome_links.txt | xargs -I {} sh -c 'wget -O test/test_data/$(basename {}) {}'

This saves the datasets in a separate test/test_data folder within the repository.
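On systems without wget or xargs, the same download can be approximated in Python. This is a minimal sketch under two assumptions: that test/input_genome_links.txt holds one URL per line, and that the file name should be taken from the URL path. The function names `plan_downloads` and `download_all` are hypothetical.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlretrieve


def plan_downloads(links_text, out_dir="test/test_data"):
    """Map each non-empty URL line to its destination path in out_dir."""
    plan = []
    for line in links_text.splitlines():
        url = line.strip()
        if not url:
            continue
        # Use the last component of the URL path as the local file name.
        name = os.path.basename(urlparse(url).path)
        plan.append((url, os.path.join(out_dir, name)))
    return plan


def download_all(links_file="test/input_genome_links.txt", out_dir="test/test_data"):
    """Download every listed genome, skipping files that already exist."""
    os.makedirs(out_dir, exist_ok=True)
    with open(links_file) as handle:
        for url, dest in plan_downloads(handle.read(), out_dir):
            if not os.path.exists(dest):
                urlretrieve(url, dest)


if __name__ == "__main__":
    if os.path.exists("test/input_genome_links.txt"):
        download_all()
```

Unlike the shell command, this version skips files that are already present, which makes it safe to rerun after an interrupted download.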

Step 2: Run the ROADIES pipeline

IMPORTANT: ROADIES by default runs multiple iterations for generating highly accurate trees. For quick testing, use `--noconverge` to run a single iteration.

Full run (multiple iterations) - Default

python run_roadies.py --cores 16

OR

Quick test run (one iteration)

python run_roadies.py --cores 16 --noconverge

Step 3: Access final species tree

Default mode: The final species tree (in Newick format) for each iteration is saved in a separate converge_files/iteration_<iteration_number> folder. The tree from the latest iteration is the most confident and accurate.

If --noconverge is used: The final species tree (in Newick format) is saved as roadies.nwk in a separate output_files folder.
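Because the latest iteration holds the preferred tree, a small helper can locate that folder. This is a sketch, not part of ROADIES: the function name `latest_iteration` is hypothetical, and it relies only on the iteration_<iteration_number> naming described above. It compares iteration numbers numerically, since a plain alphabetical sort would place iteration_10 before iteration_2.

```python
import re
from pathlib import Path


def latest_iteration(base_dir="converge_files"):
    """Return the iteration_<N> folder with the largest N, or None if absent."""
    pattern = re.compile(r"iteration_(\d+)$")
    found = []
    for child in Path(base_dir).iterdir():
        match = pattern.match(child.name)
        if child.is_dir() and match:
            found.append((int(match.group(1)), child))
    if not found:
        return None
    # Pick by numeric iteration index, not by string order.
    return max(found, key=lambda item: item[0])[1]


if __name__ == "__main__":
    if Path("converge_files").is_dir():
        print(latest_iteration("converge_files"))
```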

NOTE: ROADIES outputs unrooted trees by default. You can reroot trees on your own or use the provided `reroot.py` script in `workflow/scripts/` (given a reference rooted species tree as input).

Running ROADIES on your own data

If you want to run ROADIES with your own datasets, follow these steps:

Step 1: Specify Input Dataset:

Edit the config.yaml file (found in the config folder of the ROADIES directory).
Update the GENOMES field with paths to your .fa or .fa.gz genome assemblies. Ensure all input genomic assemblies are in .fa or .fa.gz format and named according to the species' name (e.g., Aardvark.fa).
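The naming rules above can be checked before a run. The following sketch assumes that all assemblies sit in a single directory; the function names `species_name` and `check_genome_dir` are hypothetical and not part of ROADIES. It reports files with an unsupported extension and species names that occur more than once (for example, Aardvark.fa alongside Aardvark.fa.gz).

```python
from pathlib import Path

# Check the longer suffix first so "X.fa.gz" is not mistaken for "X.fa".
VALID_SUFFIXES = (".fa.gz", ".fa")


def species_name(filename):
    """Return the species name implied by the file name, or None if invalid."""
    name = Path(filename).name
    for suffix in VALID_SUFFIXES:
        if name.endswith(suffix) and len(name) > len(suffix):
            return name[: -len(suffix)]
    return None


def check_genome_dir(genome_dir):
    """Return (species -> files, invalid file names, duplicated species)."""
    good, bad = {}, []
    for path in sorted(Path(genome_dir).iterdir()):
        if not path.is_file():
            continue
        species = species_name(path.name)
        if species is None:
            bad.append(path.name)
        else:
            good.setdefault(species, []).append(path.name)
    duplicates = {sp: files for sp, files in good.items() if len(files) > 1}
    return good, bad, duplicates
```

For example, `species_name("Aardvark.fa.gz")` returns `"Aardvark"`, while `species_name("genome.fasta")` returns `None`.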

IMPORTANT: Each file must contain only one species. If needed, split multi-species files with:

faSplit byname <input_file.fa> <output_dir>/

Step 2: Configure Other Parameters:

Modify other parameters in config.yaml as needed.
Refer to detailed settings on the Wiki.

Step 3: Run the Pipeline:

python run_roadies.py --cores 16

Modes of operation: ROADIES supports multiple modes of operation (fast, balanced, accurate) by controlling the accuracy-runtime tradeoff. Use any one of the following commands to select a mode (accurate mode is the default):

python run_roadies.py --cores 16 --mode accurate

python run_roadies.py --cores 16 --mode balanced

python run_roadies.py --cores 16 --mode fast
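When launching ROADIES from another script, the mode can be validated before the run starts. This is a minimal sketch using only the `--cores` and `--mode` flags shown above; the function names `roadies_command` and `run_roadies` are hypothetical, and the sketch assumes it is run from the main ROADIES directory.

```python
import subprocess
import sys

# The three modes named in this README.
MODES = ("accurate", "balanced", "fast")


def roadies_command(cores, mode="accurate"):
    """Build the argument list for run_roadies.py after basic validation."""
    if mode not in MODES:
        raise ValueError("mode must be one of: " + ", ".join(MODES))
    if int(cores) < 1:
        raise ValueError("cores must be a positive integer")
    # sys.executable keeps the run inside the currently active environment.
    return [sys.executable, "run_roadies.py", "--cores", str(int(cores)), "--mode", mode]


def run_roadies(cores, mode="accurate"):
    """Run the pipeline and raise CalledProcessError if it exits with an error."""
    subprocess.run(roadies_command(cores, mode), check=True)
```

Passing the command as a list avoids shell quoting issues, and a mistyped mode name fails immediately instead of after the pipeline has started.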

The final unrooted species tree (in Newick format) for each iteration is saved in a separate ALL_OUT_DIR/iteration_<iteration_number> folder, where ALL_OUT_DIR is configured in config/config.yaml. The tree from the latest iteration is the most confident and accurate.

For contributing to the code or for SLURM cluster usage, refer to the Wiki.

Citing ROADIES

If you use ROADIES in your research or publications, please cite the following paper:

A. Gupta, S. Mirarab, & Y. Turakhia, Accurate, scalable, and fully automated inference of species trees from raw genome assemblies using ROADIES, Proc. Natl. Acad. Sci. U.S.A. 122 (19) e2500553122, https://doi.org/10.1073/pnas.2500553122 (2025).

Accessing ROADIES output files

The output files with the gene trees and species trees generated by ROADIES in the manuscript are deposited on Dryad. To access them, please refer to the following:

Gupta, Anshu; Mirarab, Siavash; Turakhia, Yatish (2024). Accurate, scalable, and fully automated inference of species trees from raw genome assemblies using ROADIES [Dataset]. Dryad. https://doi.org/10.5061/dryad.tht76hf73.
