Parallelization

Introduction

RAxML-NG supports three levels of parallelism: CPU instruction (vectorization), intra-node (multithreading) and inter-node (MPI). Unlike RAxML/ExaML, a single RAxML-NG executable handles all three levels, and they can be configured at run time (MPI support is optional and must be enabled at compile time).

As of v.0.7.0 beta, RAxML-NG only supports fine-grained parallelization across alignment sites. This is the same parallelization approach that has been used in RAxML-PTHREADS and ExaML, and conceptually different from coarse-grained parallelization across tree searches or tree moves as implemented in RAxML-MPI or IQTree-MPI, respectively. With fine-grained parallelization, the number of CPU cores that can be efficiently utilized is limited by the alignment "width" (=number of sites). For instance, using 20 cores on a single-gene protein alignment with 300 sites would be suboptimal, and using 100 cores would most probably result in a huge slowdown. In order to prevent the waste of CPU time and energy, RAxML-NG will warn you -- or, in extreme cases, even refuse to run -- if you try to assign too few alignment sites per core.
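To make the two examples above concrete, here is a minimal sketch that computes how many alignment sites each core receives when the sites are split evenly. The helper name `sites_per_core` is hypothetical, and the sketch deliberately does not encode the threshold at which RAxML-NG warns or refuses to run, since that value is not stated here.

```shell
#!/bin/sh
# Sites available to each core when an alignment is split evenly
# (integer division). Hypothetical helper, not part of RAxML-NG.
sites_per_core() {
    sites=$1
    cores=$2
    echo $((sites / cores))
}

sites_per_core 300 20     # prints 15
sites_per_core 300 100    # prints 3
```

With only a handful of sites per core, each thread has very little work to do between synchronization points, which is why adding cores can slow the run down.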

If you want to utilize many CPU cores for analyzing a small ("single-gene") alignment, please have a look at our ParGenes pipeline which implements coarse-grained parallelization and dynamic load balancing. Alternatively, coarse-grained parallelization can be easily emulated by starting multiple RAxML-NG instances with distinct random seeds. For instance, one can infer 1000 bootstrap trees by calling RAxML-NG five times with --bs-trees 200 (e.g., on five distinct compute nodes). Then, all resulting *.raxml.bootstraps files can be simply concatenated. TODO: tutorial link
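The five-run bootstrap example could be sketched as follows. This is a dry run that only prints the commands, so each one can be submitted to a different node. It assumes the `--bootstrap` command and the `--seed` and `--prefix` options behave as I understand them from the RAxML-NG command-line help (check `raxml-ng -h` for your version); `ali.fa`, `GTR+G`, the seed values and the `bs_run` prefixes are placeholders.

```shell
#!/bin/sh
# Sketch: split 1000 bootstrap replicates into 5 independent runs.
RUNS=5
TREES_PER_RUN=200

# Print one command per run; each run gets its own seed and output prefix
# so that replicates differ and output files do not overwrite each other.
bs_commands() {
    i=1
    while [ "$i" -le "$RUNS" ]; do
        echo "raxml-ng --bootstrap --msa ali.fa --model GTR+G --bs-trees $TREES_PER_RUN --seed $i --prefix bs_run$i"
        i=$((i + 1))
    done
}

bs_commands

# After all runs have finished, merge the replicate files, e.g.:
#   cat bs_run*.raxml.bootstraps > all.bootstraps
```

Using the loop counter as the seed is just one simple way to guarantee distinct seeds; any set of different integers would do.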

Multithreading

By default, RAxML-NG will start as many threads as there are CPU cores available in your system. Most modern CPUs employ so-called hyperthreading technology, which makes each physical core appear as two logical cores to software. For instance, on my laptop with an Intel i7-3520M processor, RAxML-NG will detect 4 (logical) cores and use 4 threads by default, even though this CPU has only 2 physical cores. Hyperthreading can be beneficial for some programs, but RAxML-NG performs best with one thread per physical core. Thus, to be on the safe side, I would recommend setting the number of threads manually with the --threads option.
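One way to find the number of physical cores is sketched below. It assumes a Linux system with `lscpu` from util-linux, whose parseable output lists one `core,socket` pair per logical CPU, so hyperthreads show up as repeated pairs. The helper name `count_physical_cores` is hypothetical, and `ali.fa` and `GTR+G` are placeholders. The script prints the resulting command rather than running it.

```shell
#!/bin/sh
# Count physical cores from `lscpu -p=Core,Socket` output read on stdin.
# Lines starting with '#' are header comments; every other line is a
# "core,socket" pair, repeated once per hyperthread.
count_physical_cores() {
    grep -v '^#' | sort -u | wc -l | tr -d ' '
}

# Assumption: lscpu is available (Linux); the block is skipped otherwise.
if command -v lscpu >/dev/null 2>&1; then
    THREADS=$(lscpu -p=Core,Socket | count_physical_cores)
    # Print the command instead of running it.
    echo "raxml-ng --msa ali.fa --model GTR+G --threads $THREADS"
fi
```

On the laptop described above, this should report 2 rather than 4, since both logical CPUs of a core share the same core and socket IDs.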

MPI

So how many threads/cores should I use?

Vector instructions

RAxML-NG will automatically detect the best set of vector instructions available on your CPU, and use the respective computational kernels to achieve optimal performance. On modern Intel CPUs, autodetection seems to work pretty well, so most probably you don't need to worry about this. However, you can force RAxML-NG to use a specific set of vector instructions with the --simd option, e.g.

raxml-ng --msa ali.fa --model GTR+G --simd sse

to use SSE3 kernels, or

raxml-ng --msa ali.fa --model GTR+G --simd none

to use non-vectorized (scalar) kernels. This option can be useful for debugging, since different vectorized kernels may yield slightly different likelihood values for numerical reasons.
