This software sorts and marks duplicate reads in aligned high-throughput sequencing data, taking SAM files as input. mpiMarkDup
relies on the Message Passing Interface (MPI) standard to parallelise the sorting over multiple cores and nodes of high-performance computing clusters.
The program sorts SAM files and marks duplicates.
The program requires a number of cores that is a power of two.
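The power-of-two constraint can be checked before launching a job. The following is a minimal sketch; the helper name `is_power_of_two` and the `NP` variable are illustrative and are not part of mpiMarkDup.

```shell
# Hypothetical helper (not part of mpiMarkDup): succeeds when $1 is a power of two.
is_power_of_two() {
    n=$1
    # A positive integer is a power of two when exactly one bit is set,
    # which means n AND (n - 1) equals zero.
    [ "$n" -gt 0 ] && [ $(( n & (n - 1) )) -eq 0 ]
}

NP=8  # assumed number of MPI processes
if is_power_of_two "$NP"; then
    echo "$NP cores: ok"
else
    echo "$NP cores: not a power of two"
fi
```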
The SAM files must contain at least one read group.
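One way to verify the read group requirement is to look for an `@RG` line in the SAM header. The sketch below assumes a plain-text SAM file; the helper name `has_read_group` is hypothetical and not provided by mpiMarkDup.

```shell
# Hypothetical helper (not part of mpiMarkDup): succeeds when the SAM header
# contains at least one @RG line. Only header lines (starting with '@') are
# scanned; awk stops at the first alignment record.
has_read_group() {
    awk '/^@RG\t/ { found = 1; exit }
         !/^@/    { exit }
         END      { exit !found }' "$1"
}

# Runs only when SAM is set in the environment.
if [ -n "${SAM:-}" ]; then
    has_read_group "$SAM" || echo "no @RG line found in $SAM" >&2
fi
```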
The SAM files must have mate information fixed.
This program has only been tested on short paired-end reads from whole-genome sequencing.
It works best when duplicates are evenly distributed along the entire genome.
The total memory needed is approximately 2 times the SAM file size.
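As a rough illustration of this rule of thumb, the sketch below doubles the size of the input file. The helper name `estimate_memory_bytes` is hypothetical, and the result is only the approximate total stated above, not a measured requirement.

```shell
# Hypothetical helper (not part of mpiMarkDup): prints roughly twice the
# size of the SAM file, in bytes.
estimate_memory_bytes() {
    size=$(wc -c < "$1")
    echo $(( size * 2 ))
}

# Runs only when SAM is set in the environment.
if [ -n "${SAM:-}" ]; then
    echo "approximate total memory needed: $(estimate_memory_bytes "$SAM") bytes"
fi
```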
It produces a BGZF file for each chromosome present in the header; these files are
compatible with samtools.
The program marks duplicates among discordant reads and marks optical duplicates.
Discordant reads are also marked in a separate BGZF file.
Example command line:
mpirun numactl --interleave=all $MPIMD $SAM $OUTPUT -q 0 -d 1000 -v 4
OUTPUT is the output directory
-d is the optical distance
-v is the log level
-q is the quality
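The command above can be wrapped in a small script that sets the variables explicitly. This is a sketch under assumptions: every path is a placeholder, and passing the process count with `mpirun -n` is an addition for illustration (on many clusters the scheduler supplies it instead).

```shell
# All paths and values below are placeholders; adjust them to your site.
MPIMD=${MPIMD:-/path/to/mpiMarkDup}    # assumed location of the binary
SAM=${SAM:-/path/to/sample.sam}        # input SAM file
OUTPUT=${OUTPUT:-/path/to/output_dir}  # output directory
NP=${NP:-8}                            # number of MPI processes (a power of two)

# Prints the command instead of running it; remove "echo" to launch the job.
launch() {
    echo mpirun -n "$NP" numactl --interleave=all \
        "$MPIMD" "$SAM" "$OUTPUT" -q 0 -d 1000 -v 4
}

launch
```

Printing the command first makes it easy to review the arguments before committing cluster resources.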
Contacts: frederic.jarlier@curie.fr