
Additional pipeline features and details


Managing computing resources

The config file paragone.config can be edited to suit the computing resources available on your local machine, and it can also be used to specify resource requests when running the pipeline and submitting jobs via a scheduling system such as SLURM. Note that this is a feature of Nextflow and applies to all Nextflow scripts, as described in the Nextflow configuration documentation (https://www.nextflow.io/docs/latest/config.html). This is achieved by defining profiles in the config file; the paragone.config file provided defines the profiles standard and slurm_singularity.
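For orientation, profiles in a Nextflow config file are declared inside a top-level profiles block. The outline below is a minimal sketch of that structure only; the executor = 'slurm' line is an assumption about how the SLURM profile selects its scheduler, and the real paragone.config contains the full per-process settings shown further down this page:

    profiles {

        standard {
            process {
                // per-process resource settings for local runs go here
            }
        }

        slurm_singularity {
            process {
                executor = 'slurm'  // assumed: submit each task as a SLURM job
                // per-process resource settings for cluster runs go here
            }
        }
    }

The desired profile is then selected on the command line when launching the pipeline, for example (assuming the config file is passed explicitly with -c):

    nextflow run paragone.nf -c paragone.config -profile slurm_singularity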

By default, if you run the paragone.nf script without specifying a profile (e.g. via the parameter -profile slurm_singularity), Nextflow will use the profile standard. If you open paragone.config in a text editor and find the definition of the standard profile (the block beginning standard {), you'll see that it's possible to specify resources for each Nextflow process defined in the paragone.nf script. For example:

    standard {
        process {
            withName: CHECK_AND_ALIGN {
                cpus = { 2 }
                memory = { 2.GB * task.attempt }
                errorStrategy = { task.exitStatus in 137..143 ? 'retry' : 'terminate' }
                maxRetries = 3
            }

            // ...(some processes not shown)...

            withName: FINAL_ALIGNMENTS {
                cpus = { 2 }
                memory = { 2.GB * task.attempt }
                errorStrategy = { task.exitStatus in 137..143 ? 'retry' : 'terminate' }
                maxRetries = 3
            }
        }
    }

Here, you can see that specific resources are allocated to the processes CHECK_AND_ALIGN and FINAL_ALIGNMENTS. As you might expect (and can confirm by opening the paragone.nf script in a text editor), these processes execute the ParaGone checking and alignment steps, respectively. The closure memory = { 2.GB * task.attempt } requests 2 GB on the first attempt, 4 GB on the second, and so on; the errorStrategy line retries a task only when it exits with a code in the range 137–143 (signal-related exits, such as out-of-memory kills), which is why the memory request grows with each retry. If you are editing the standard profile to better suit the resources of your local machine, the main values to change are the number of cpus and the memory (RAM).
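For instance, if your local machine has 8 CPUs and 16 GB of RAM, you might edit the heaviest processes along these lines (the numbers below are illustrative placeholders, not recommended values from paragone.config):

    standard {
        process {
            withName: CHECK_AND_ALIGN {
                cpus = { 4 }                        // use up to 4 of the machine's 8 CPUs
                memory = { 4.GB * task.attempt }    // 4 GB first try, 8 GB on retry
                errorStrategy = { task.exitStatus in 137..143 ? 'retry' : 'terminate' }
                maxRetries = 2
            }
        }
    }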

If you look at the slurm_singularity profile:

    slurm_singularity {
        process {
            withName: CHECK_AND_ALIGN {
                cpus = { 2 }
                memory = { 2.GB * task.attempt }
                errorStrategy = { task.exitStatus in 137..143 ? 'retry' : 'terminate' }
                maxRetries = 3
                time = '24h'
            }

            // ...etc
        }
    }

...you'll see there's an extra important parameter: time. Most HPC scheduling systems require the user to specify a desired wall-time; you might need a bit of trial and error to work out appropriate time requests for your given dataset and the wall-time limits of your HPC. Other options can also be specified, as described in the Nextflow process directives documentation (https://www.nextflow.io/docs/latest/process.html).
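As with memory, the time request can be scaled up on each retry, and scheduler-specific settings can be passed through to SLURM. For example (the queue name and account string below are placeholders, not values from paragone.config):

    withName: CHECK_AND_ALIGN {
        cpus = { 2 }
        memory = { 2.GB * task.attempt }
        time = { 24.hour * task.attempt }           // 24 h first try, 48 h on retry
        errorStrategy = { task.exitStatus in 137..143 ? 'retry' : 'terminate' }
        maxRetries = 3
        queue = 'general'                           // placeholder partition name
        clusterOptions = '--account=my-account'     // placeholder SLURM account flag
    }

Here queue and clusterOptions are standard Nextflow process directives; anything your cluster's submit command (e.g. sbatch) accepts can be passed via clusterOptions.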