Parallelization
RAxML-NG supports three levels of parallelism: CPU instruction (vectorization), intra-node (multithreading) and inter-node (MPI). Unlike with RAxML/ExaML, a single RAxML-NG executable can handle all parallelism levels, which can be configured at run time (MPI support is optional and should be enabled at compile time).
As of v.0.7.0 beta, RAxML-NG only supports fine-grained parallelization across alignment sites. This is the same parallelization approach that has been used in RAxML-PTHREADS and ExaML, and it is conceptually different from coarse-grained parallelization across tree searches or tree moves as implemented in RAxML-MPI or IQTree-MPI, respectively. With fine-grained parallelization, the number of CPU cores that can be efficiently utilized is limited by the alignment "width" (= number of sites). For instance, using 20 cores on a single-gene protein alignment with 300 sites would be suboptimal, and using 100 cores would most probably result in a huge slowdown. To prevent wasting CPU time and energy, RAxML-NG will warn you (or, in extreme cases, even refuse to run) if you try to assign too few alignment sites per core.
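To make the sites-per-core limit concrete, here is a minimal back-of-the-envelope sketch. The function name and the minimum-sites-per-thread value are my own placeholders for illustration; they are not the rule RAxML-NG applies internally, so treat the result as a rough upper bound only.

```shell
#!/bin/sh
# Rough upper bound on useful threads for per-site parallelization.
#   $1 = number of alignment sites
#   $2 = minimum sites per thread (hypothetical, user-chosen value;
#        NOT the threshold RAxML-NG uses for its warnings)
max_useful_threads() {
    sites=$1
    min_per_thread=$2
    n=$((sites / min_per_thread))   # integer division
    # Always allow at least one thread.
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    echo "$n"
}

# The 300-site alignment from the text, with an assumed minimum of 100 sites per thread.
max_useful_threads 300 100
```

With any reasonable minimum, a 300-site alignment ends up far below 20 threads, which is the point of the example in the text.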
If you want to utilize many CPU cores for analyzing a small ("single-gene") alignment, please have a look at our ParGenes pipeline, which implements coarse-grained parallelization and dynamic load balancing. Alternatively, coarse-grained parallelization can be easily emulated by starting multiple RAxML-NG instances with distinct random seeds. For instance, one can infer 1000 bootstrap trees by calling RAxML-NG five times with --bs-trees 200 (e.g., on five distinct compute nodes). Then, all resulting *.raxml.bootstraps files can be simply concatenated. TODO: tutorial link
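The five-batch bootstrap run described above could be set up roughly as follows. This is only a sketch: the helper function, the alignment name ali.fa, the model, the seed values and the bs1..bs5 prefixes are placeholders I picked for illustration. The function just prints the commands, so you can paste each line into a job script for a different node.

```shell
#!/bin/sh
# Print one raxml-ng bootstrap command per batch, each with a distinct
# random seed and output prefix (all names and seeds are placeholders).
#   $1 = alignment, $2 = model, $3 = number of batches, $4 = trees per batch
bootstrap_commands() {
    msa=$1
    model=$2
    batches=$3
    trees_per_batch=$4
    i=1
    while [ "$i" -le "$batches" ]; do
        echo "raxml-ng --bootstrap --msa $msa --model $model --bs-trees $trees_per_batch --seed $i --prefix bs$i"
        i=$((i + 1))
    done
}

# 5 batches x 200 trees = 1000 bootstrap trees
bootstrap_commands ali.fa GTR+G 5 200

# After all five runs have finished, merge the per-batch results:
#   cat bs*.raxml.bootstraps > all.bootstraps
```

The distinct --seed values matter here: with identical seeds, the batches would produce the same replicates instead of 1000 independent ones.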
By default, RAxML-NG will start as many threads as there are CPU cores available in your system. Most modern CPUs employ so-called hyperthreading technology, which makes each physical core appear as two logical cores to software. For instance, on my laptop with an Intel i7-3520M processor, RAxML-NG will detect 4 (logical) cores and use 4 threads by default, even though this CPU has only 2 physical cores. Hyperthreading can be beneficial for some programs, but RAxML-NG shows the best performance with one thread per physical core. Thus, I would recommend using the --threads option to set the number of threads manually, to be on the safe side.
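If you are not sure how many physical cores a machine has, something like the following sketch may help on Linux. It assumes the lscpu tool from util-linux is installed; the helper function name is my own, and ali.fa and GTR+G are placeholders.

```shell
#!/bin/sh
# Count physical cores from `lscpu -p=Core,Socket` output read on stdin.
# Lines starting with '#' are comments. Every other line is
# "core_id,socket_id", one line per logical CPU, so the number of
# unique pairs equals the number of physical cores.
count_physical_cores() {
    grep -v '^#' | sort -u | wc -l | tr -d ' '
}

# Usage on Linux (lscpu may be missing on other systems):
#   cores=$(lscpu -p=Core,Socket | count_physical_cores)
#   raxml-ng --msa ali.fa --model GTR+G --threads "$cores"
```

On a hyperthreaded CPU like the laptop mentioned above, lscpu lists each core/socket pair twice, so the function reports 2 rather than 4.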
RAxML-NG will automatically detect the best set of vector instructions available on your CPU, and use the respective computational kernels to achieve optimal performance. On modern Intel CPUs, autodetection seems to work pretty well, so most probably you don't need to worry about this. However, you can force RAxML-NG to use a specific set of vector instructions with the --simd option, e.g.
raxml-ng --msa ali.fa --model GTR+G --simd sse
to use SSE3 kernels, or
raxml-ng --msa ali.fa --model GTR+G --simd none
to use non-vectorized (scalar) kernels. This option might be useful for debugging, since distinct vectorized kernels might yield slightly different likelihood values for numerical reasons.
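For such a debugging comparison, one option is to score the same fixed tree once per kernel and compare the reported likelihoods. The sketch below only prints the commands. The helper function, the file names ali.fa and start.tree, and the eval_ prefixes are placeholders I made up, and using --evaluate with --tree for this purpose is my suggestion rather than a prescribed procedure.

```shell
#!/bin/sh
# Print one raxml-ng --evaluate command per SIMD setting, so that the
# same tree is scored by each kernel (file names are placeholders).
#   $1 = alignment, $2 = tree file, remaining args = --simd values
simd_eval_commands() {
    msa=$1
    tree=$2
    shift 2
    for simd in "$@"; do
        echo "raxml-ng --evaluate --msa $msa --model GTR+G --tree $tree --simd $simd --prefix eval_$simd"
    done
}

# The two --simd values shown above.
simd_eval_commands ali.fa start.tree none sse
```

After running the printed commands, the likelihoods can be compared in eval_none.raxml.log and eval_sse.raxml.log. Small differences in the last digits are expected, while larger ones would be worth reporting.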