Execution

How to Run cgpCaVEManWrapper

Option 2 provides more control over the individual steps in the process. For CaVEMan, option 2 is preferred because option 1 results in much longer runtimes.

Single-host execution requires a reasonable level of resources. Minimum specification (based on WGS at 30x coverage):

NB: The code will recover from a failure and restart from the last successfully completed section.

For users with access to a compute farm it is possible to break down the execution into the component parts (see Overview).

There are 9 steps:

One job per line in splitList. Each job builds a profile of its section of the genome using various covariates.
Command suffixed with -process mstep -index N
N is between 1 and the number of lines in splitList
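
The one-job-per-line pattern above can be scripted when submitting to a compute farm. The following is a minimal sketch, not part of cgpCaVEManWrapper: the helper names `count_split_lines` and `indexed_commands` are hypothetical, and the base command is assumed to be whatever full command line you would otherwise run, with the `-process` and `-index` suffix appended as described above.

```python
import shlex
from typing import List


def count_split_lines(split_list_path: str) -> int:
    """Count the non-empty lines in a splitList file (hypothetical helper)."""
    with open(split_list_path) as handle:
        return sum(1 for line in handle if line.strip())


def indexed_commands(base_command: List[str], process: str, n_jobs: int) -> List[str]:
    """Return one shell-quoted command per index, from 1 to n_jobs inclusive.

    base_command is assumed to hold the wrapper invocation and its usual
    options; only the '-process <step> -index N' suffix is added here.
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    return [
        shlex.join(base_command + ["-process", process, "-index", str(index)])
        for index in range(1, n_jobs + 1)
    ]
```

For example, `indexed_commands(base, "mstep", count_split_lines("splitList"))` yields one command string per line of splitList, each of which can be handed to a scheduler as a separate job. The same helper applies to any other step that runs once per line in splitList.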

One job per line in splitList. Each job uses the profile built in the mstep, combined with sequence data and copy number, to assign a probability to each possible genotype at each position.
Command suffixed with -process estep -index N
N is between 1 and the number of lines in splitList
Positions where germline mutation probabilities total more than the -snp-cutoff are output to <split_section>.snps.vcf
Positions where somatic mutation probabilities total more than the -mut-cutoff are output to <split_section>.muts.vcf
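
The two cutoff rules above can be illustrated with a small sketch. This is not CaVEMan's implementation: the function name `classify_position`, the genotype labels, and the cutoff values in the usage note are illustrative assumptions. It only shows the described logic, in which probabilities are summed per class and compared against the relevant cutoff.

```python
from typing import Dict, List


def classify_position(
    germline_probs: Dict[str, float],
    somatic_probs: Dict[str, float],
    snp_cutoff: float,
    mut_cutoff: float,
) -> List[str]:
    """Return which outputs a position would be written to.

    'snps' stands for <split_section>.snps.vcf and 'muts' for
    <split_section>.muts.vcf. A position is reported only when the summed
    probability is strictly greater than the cutoff.
    """
    outputs = []
    if sum(germline_probs.values()) > snp_cutoff:
        outputs.append("snps")
    if sum(somatic_probs.values()) > mut_cutoff:
        outputs.append("muts")
    return outputs
```

With illustrative cutoffs of 0.7 for `-snp-cutoff` and 0.5 for `-mut-cutoff`, a position whose germline genotype probabilities sum to 0.75 and whose somatic probabilities sum to 0.125 would appear only in the snps output.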

Applies filters to the T_vs_N.muts.vcf.gz file. Produces T_vs_N.muts.flagged.vcf.gz
Command suffixed with -process flag -index 1

Flagging is described in further detail on the cgpCaVEManPostProcessing wiki