
Coarse-grained parallelization with RAxML-NG v0.9 and below (obsolete)


IMPORTANT: Starting with v1.0, RAxML-NG features easy-to-use, "integrated" coarse-grained parallelization. For analyzing multiple MSAs at once (e.g., gene tree inference for ASTRAL), please also have a look at our ParGenes pipeline, which implements coarse-grained parallelization with dynamic load balancing. Hence, "the hard way" of coarse-grained parallelization described below is now obsolete, but we keep this tutorial here for educational purposes.

Coarse-grained parallelization can be easily emulated by starting multiple RAxML-NG instances with distinct random seeds. For instance, let's assume we want to run an "all-in-one" analysis on the dataset described above, using a server with 16 CPU cores. As the scaling plots above show, fine-grained parallelization across all 16 cores is very inefficient for this dataset. We will therefore use fine-grained parallelization with 2 cores per tree search, which means we can run 16 / 2 = 8 RAxML-NG instances in parallel. First, we will infer 24 ML trees, using 12 random and 12 parsimony-based starting trees. Hence, each RAxML-NG instance will run searches from 24 / 8 = 3 starting trees. Here is a sample SLURM script for doing this:

#!/bin/bash
#SBATCH -N 1                   # use one node
#SBATCH -n 8                   # 8 tasks = 8 RAxML-NG instances
#SBATCH -B 2:8:1               # node layout: 2 sockets x 8 cores x 1 thread = 16 cores
#SBATCH --threads-per-core=1   # do not use hyperthreading
#SBATCH --cpus-per-task=2      # 2 CPU cores per RAxML-NG instance
#SBATCH -t 02:00:00            # time limit: 2 hours
 
# Instances 1-4: each runs searches from 3 parsimony-based starting trees
for i in `seq 1 4`;
do
  srun -N 1 -n 1 --exclusive raxml-ng --search --msa ali.fa.raxml.rba --tree pars{3} --prefix CT$i --seed $i --threads 2 &
done

# Instances 5-8: each runs searches from 3 random starting trees
for i in `seq 5 8`;
do
  srun -N 1 -n 1 --exclusive raxml-ng --search --msa ali.fa.raxml.rba --tree rand{3} --prefix CT$i --seed $i --threads 2 &
done

wait   # block until all 8 background instances have finished

Of course, this script has to be adapted to your specific cluster configuration and/or job submission system. You can also use GNU parallel, or start multiple RAxML-NG instances directly from the command line. Please pay attention to the ampersand (&) at the end of each RAxML-NG command line: it is essential here, because without it the RAxML-NG instances will run one after another rather than in parallel. Furthermore, we add the --exclusive flag to tell srun that raxml-ng instances must be assigned to distinct CPU cores (this is the default behavior with some SLURM configurations, but not all).
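For reference, here is what the same analysis looks like when started directly from the command line on the 16-core server, without a job scheduler (a sketch: the commands are identical to the SLURM version above, just without srun):

# Instances 1-4 start from parsimony trees, instances 5-8 from random trees
for i in `seq 1 4`; do
  raxml-ng --search --msa ali.fa.raxml.rba --tree pars{3} --prefix CT$i --seed $i --threads 2 &
done
for i in `seq 5 8`; do
  raxml-ng --search --msa ali.fa.raxml.rba --tree rand{3} --prefix CT$i --seed $i --threads 2 &
done
wait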

Once the job has finished, we can check the likelihoods:

$ grep "Final LogLikelihood" CT*.raxml.log | sort -k 3

CT7.raxml.log:Final LogLikelihood: -30621.004116
CT6.raxml.log:Final LogLikelihood: -30621.537107
CT2.raxml.log:Final LogLikelihood: -30621.699234
CT3.raxml.log:Final LogLikelihood: -30622.534482
CT1.raxml.log:Final LogLikelihood: -30622.783250
CT5.raxml.log:Final LogLikelihood: -30623.020351
CT4.raxml.log:Final LogLikelihood: -30623.378857
CT8.raxml.log:Final LogLikelihood: -30623.963471

and pick the best-scoring tree (CT7.raxml.bestTree in our case):

$ ln -s CT7.raxml.bestTree best.tre
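With only 8 instances this is easy to do by eye; for larger runs, a small shell snippet along the following lines (our sketch, not part of the original tutorial; the variable name best is arbitrary) can automate the selection. Sorting numerically with -g makes the comparison robust even when log-likelihood magnitudes differ:

# Find the log file with the highest (least negative) final log-likelihood
best=$(grep "Final LogLikelihood" CT*.raxml.log | sort -g -r -k 3 | head -n 1 | cut -d: -f1)
# Link the corresponding .bestTree file (CT7.raxml.bestTree in the example above)
ln -s ${best%.log}.bestTree best.tre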

The same trick can be applied to bootstrapping. For simplicity, let's infer 8 * 15 = 120 replicate trees:

# Each of the 8 instances computes 15 bootstrap replicates (8 * 15 = 120 in total)
for i in `seq 1 8`;
do
  raxml-ng --bootstrap --msa ali.fa.raxml.rba --bs-trees 15 --prefix CB$i --seed $i --threads 2 &
done

wait
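If you run this loop inside a SLURM batch script rather than directly on a server, prepend the same srun -N 1 -n 1 --exclusive to the raxml-ng command as in the tree search script above, so that the instances again get pinned to distinct cores.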

Now we can simply concatenate all replicate tree files (*.raxml.bootstraps) and then proceed with the bootstrap convergence check and the branch support calculation as usual:

$ cat CB*.raxml.bootstraps > allbootstraps

$ raxml-ng --bsconverge --bs-trees allbootstraps --prefix CS --seed 2 --threads 1

$ raxml-ng --support --tree best.tre --bs-trees allbootstraps --prefix CS2 --threads 1
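Note that we use a distinct prefix (CS2) for the support run, since RAxML-NG refuses to overwrite the result files of the earlier CS run unless --redo is specified. The tree annotated with branch support values will be written to CS2.raxml.support.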

There are two things to keep in mind when using this type of coarse-grained parallelization. First, memory consumption grows proportionally to the number of RAxML-NG instances running in parallel; in our case, the per-instance memory estimate printed by the --parse command must be multiplied by 8. Second, correct thread pinning (1 thread per CPU core) is crucial for achieving optimal performance. Hence, we recommend checking the thread allocation, e.g. by running htop after your first script submission.
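For example, the per-instance estimate can be obtained as follows (a sketch: the GTR+G model is a placeholder here, since the model actually used above is already stored in the binary ali.fa.raxml.rba file):

# Prints the estimated memory requirement per RAxML-NG instance;
# multiply this value by 8 (the number of concurrent instances) to get the total RAM demand
$ raxml-ng --parse --msa ali.fa --model GTR+G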