You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When I use BWA(version 0.7.15-r1140), I select the BWA-MEM algorithm and here is my command:
bwa mem -t 2 reference_file fastq1.fq.gz fastq2.fq.gz > result1.sam
This works well and then I use this command to improve the threads:
bwa mem -t 10 reference_file fastq1.fq.gz fastq2.fq.gz > result2.sam
This also works well but when I compare the result1.sam with result2.sam, they are different! I also test with -t 6, -t 16, and all the results are different. However, when I run with the same threads twice, the results are identical. So I found that BWA-MEM will get different results with different threads.
Then I read the source code and found this:
kt_for(opt->n_threads, worker1, &w, (opt->flag&MEM_F_PE)? n>>1 : n); // find mapping positions
for (i = 0; i < opt->n_threads; ++i) smem_aux_destroy(w.aux[i]);
free(w.aux);
if (opt->flag&MEM_F_PE) { // infer insert sizes if not provided
if (pes0) memcpy(pes, pes0, 4 * sizeof(mem_pestat_t)); // if pes0 != NULL, set the insert-size distribution as pes0
else mem_pestat(opt, bns->l_pac, n, w.regs, pes); // otherwise, infer the insert size distribution from data
}
kt_for(opt->n_threads, worker2, &w, (opt->flag&MEM_F_PE)? n>>1 : n);
That is, BWA-MEM use n_threads(such as -t 6, n = 6) to find mapping positions, but use only 1 thread to execute the function mem_pestat to calculate avg(average of the insert size) and std(standard deviation of the insert size), which are important to find pair information. According to BWA, every thread will process around 10000000bp, so:
If I use -t 2, BWA will calculate avg and std with 2 x 10000000bp
If I use -t 10, BWA will calculate avg and std with 10 x 10000000bp
If I use -t 16, BWA will calculate avg and std with 16 x 10000000bp
So I know why the results are different with different threads.
I wonder to know if there is anything wrong with my opinion? If it's correct, I want to know how to evaluate the difference? The difference will change which filed of the SAM record( such as RNAME or POS)? If it's wrong, I want to know the real reason to make the difference.
Any reply will be much appreciated!
The text was updated successfully, but these errors were encountered:
You haven't done anything wrong. The bwa-mem result does change with the number of threads. Use a large -K like -K 10000000 if you prefer stable results regardless of the number of threads in use.
When I use BWA(version 0.7.15-r1140), I select the BWA-MEM algorithm and here is my command:
bwa mem -t 2 reference_file fastq1.fq.gz fastq2.fq.gz > result1.sam
This works well and then I use this command to improve the threads:
bwa mem -t 10 reference_file fastq1.fq.gz fastq2.fq.gz > result2.sam
This also works well but when I compare the result1.sam with result2.sam, they are different! I also test with
-t 6, -t 16
, and all the results are different. However, when I run with the same threads twice, the results are identical. So I found that BWA-MEM will get different results with different threads.Then I read the source code and found this:
That is, BWA-MEM use n_threads(such as -t 6, n = 6) to find mapping positions, but use only 1 thread to execute the function
mem_pestat
to calculateavg
(average of the insert size) andstd
(standard deviation of the insert size), which are important to find pair information. According to BWA, every thread will process around 10000000bp, so:If I use -t 2, BWA will calculate avg and std with 2 x 10000000bp
If I use -t 10, BWA will calculate avg and std with 10 x 10000000bp
If I use -t 16, BWA will calculate avg and std with 16 x 10000000bp
So I know why the results are different with different threads.
I wonder to know if there is anything wrong with my opinion? If it's correct, I want to know how to evaluate the difference? The difference will change which filed of the SAM record( such as RNAME or POS)? If it's wrong, I want to know the real reason to make the difference.
Any reply will be much appreciated!
The text was updated successfully, but these errors were encountered: