Problem about different results with different threads #121

liugui · 2017-04-24T02:10:59Z

I am using BWA (version 0.7.15-r1140) with the BWA-MEM algorithm. Here is my command:

```shell
bwa mem -t 2 reference_file fastq1.fq.gz fastq2.fq.gz > result1.sam
```

This works well. I then increased the number of threads with this command:

```shell
bwa mem -t 10 reference_file fastq1.fq.gz fastq2.fq.gz > result2.sam
```

This also works well, but when I compare result1.sam with result2.sam, they are different! I also tested with -t 6 and -t 16, and all the results differ from each other. However, when I run twice with the same number of threads, the results are identical. So BWA-MEM appears to give different results with different numbers of threads.

Then I read the source code and found this:

```c
kt_for(opt->n_threads, worker1, &w, (opt->flag&MEM_F_PE)? n>>1 : n); // find mapping positions
for (i = 0; i < opt->n_threads; ++i) smem_aux_destroy(w.aux[i]);
free(w.aux);
if (opt->flag&MEM_F_PE) { // infer insert sizes if not provided
    if (pes0) memcpy(pes, pes0, 4 * sizeof(mem_pestat_t)); // if pes0 != NULL, set the insert-size distribution as pes0
    else mem_pestat(opt, bns->l_pac, n, w.regs, pes); // otherwise, infer the insert size distribution from data
}
kt_for(opt->n_threads, worker2, &w, (opt->flag&MEM_F_PE)? n>>1 : n); // pair mates and generate alignments
```

That is, BWA-MEM uses n_threads threads (e.g. -t 6 gives n_threads = 6) to find mapping positions, but only a single thread to run mem_pestat, which computes avg (the mean insert size) and std (the standard deviation of the insert size). These statistics are important for pairing the reads. BWA-MEM reads its input in batches of about 10,000,000 bp per thread, and mem_pestat is run once per batch, so:

With -t 2, BWA estimates avg and std from batches of 2 x 10,000,000 bp.
With -t 10, BWA estimates avg and std from batches of 10 x 10,000,000 bp.
With -t 16, BWA estimates avg and std from batches of 16 x 10,000,000 bp.

I think this is why the results differ with different numbers of threads.

Is there anything wrong with my reasoning? If it is correct, how can I evaluate the difference? Which fields of the SAM records (such as RNAME or POS) does it change? If it is wrong, what is the real reason for the difference?

Any reply will be much appreciated!


lh3 · 2017-04-25T02:37:52Z

You haven't done anything wrong. The bwa-mem result does change with the number of threads. Use a large -K like -K 10000000 if you prefer stable results regardless of the number of threads in use.

lh3 closed this as completed Apr 25, 2017

tnguyensanger mentioned this issue Jul 29, 2020: How to make bwa short read alignment deterministic for testing malariagen/pipelines#41 (Closed)

d-cameron mentioned this issue Feb 17, 2021: BWA Results Vary Based on Thread Number Sydney-Informatics-Hub/Fastq-to-BAM#1 (Open)

DarioS mentioned this issue Mar 4, 2022: Does increasing memory cause a difference in BWA results? #341 (Open)

holtgrewe mentioned this issue Mar 4, 2022: Make BWA results stable for thread count bihealth/snappy-pipeline#112 (Open)

pmenzel mentioned this issue Apr 29, 2022: make bwa-mem output reproducible upon changing number of threads jodyphelan/pathogen-profiler#9 (Closed)

teepean mentioned this issue Oct 6, 2023: Difference in mapping quality and alignment for bwa mem2 compared to bwa mem bwa-mem2/bwa-mem2#246 (Open)
