BiTeN is a short pipeline written in Nextflow, intended as a template for Nextflow pipeline development.
Nextflow is a free, open-source software project that facilitates the execution of a computational workflow consisting of a series of interconnected steps/tasks. Nextflow can be used in many ways; this repository offers one specific example of how a bioinformatician can organize their code to be executed with Nextflow.
The pipeline and the whole repository (README, contributing guide, etc.) can be used as a template for Nextflow pipeline projects. Comments in the pipeline's code help the user better understand the different usages.
The pipeline template performs the following steps:
- Handling parameters, file input and help (gzipped and non-gzipped files, paired and unpaired input reads, etc.)
- QC
- Alignment
- file conversion (sam2bam)
- file sorting (samtools_sort)
├── # Documentation that gives users a detailed description of the project, with guidelines on how to use it.
├── LICENSE # License of your project. Licenses are important for open-source projects because they set the legal terms and conditions for using, distributing, and modifying the software.
├── # Provides potential contributors with a short guide on how they can help with your project.
├── img # Folder containing the images used by the README.
├── # The main Nextflow executable file used to run your pipeline. It contains the logic of your pipeline.
├── modules/ # Contains components that can be included in workflows. Think of them as functions in a programming language. Modules were introduced in DSL2. Having one module file per tool is encouraged.
│   ├── # A module file containing the processes (the basic processing primitive used to execute a user script) related to the bowtie2 tool.
│   ├── # A module file containing the processes related to the fastqc tool.
│   ├── # A module file containing the processes related to the samtools tool.
│   └── # A template module file.
├── subworkflows/ # Contains workflow components that can be included in other workflows, typically by the main workflow.
├── nextflow.config # Configuration file. Nextflow has multiple ways to handle configuration. Parameters, profiles, etc. can be defined in this file.
├── ressources/ # Contains configuration files that define the different resources, i.e. computing and tools.
│   ├── computing/ # Contains configuration files that define the computing resources loaded via profiles.
│   │   ├── hpc.config # An HPC configuration that defines the computing resources on an HPC cluster (CPU, timeout, RAM per process/label, and other information such as parallelisation and scheduler).
│   │   └── local.config # A local configuration that defines the computing resources on a local machine (CPU, timeout, RAM per process/label).
│   └── softwares.config # A software configuration that defines where Nextflow has to fetch the container of each tool.
└── test # Folder containing a test data set.
    ├── reads.fastq.gz
    └── genome.fa
- Nextflow Documentation: the official Nextflow documentation, which is very well written. Do not hesitate to use the search bar extensively!
- Basic pipeline example. Do not hesitate to look at other examples.
- Nextflow Training Fundamentals: the official training module to learn the fundamentals.
- Nextflow Training Advanced: the official training module for advanced users.
- Nextflow Cheat Sheet: a nice Nextflow cheat sheet made by @dabrlu.
- Software Carpentry Nextflow training: a high-quality course made by the Software Carpentry.
- Bioinformatics Workshop on Tools for Reproducible Research - Nextflow: a course based on NBIS material to learn the Nextflow basics.
- Nextflow Slack
- Nf-core: a community effort to collect a curated set of analysis pipelines built using Nextflow.
- Seqera Community
The prerequisites to run the pipeline are:
- The BiTeN repository
- Nextflow >= 22.04.0
- Docker or Singularity
# clone the workflow repository
git clone
# Move into it
cd BiTeN
Via conda
See here
conda create -n nextflow
conda activate nextflow
conda install nextflow
See here
Nextflow runs on most POSIX systems (Linux, macOS, etc.) and can typically be installed by running these commands:
# Make sure Java 11 or later is installed on your computer by using the command:
java -version
# Install Nextflow by entering this command in your terminal (it creates a file nextflow in the current dir):
curl -s | bash
# Add the Nextflow binary to your user's PATH:
mv nextflow ~/bin/
# OR system-wide installation:
# sudo mv nextflow /usr/local/bin
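The Java requirement mentioned above can be checked from a script before installing Nextflow. The following is a minimal sketch, assuming `java -version` prints a line containing `version "..."`; the function names `java_major_version` and `java_is_at_least` are hypothetical helpers written for this illustration and are not part of BiTeN or Nextflow.

```shell
# Hypothetical helper: extract the major Java version from a version string.
# Handles the legacy scheme ("1.8.0_292" -> 8) and the modern one ("17.0.2" -> 17).
java_major_version() {
    version=$1
    case "$version" in
        1.*) version=${version#1.} ;;    # legacy "1.x" scheme: drop the leading "1."
    esac
    printf '%s\n' "${version%%[!0-9]*}"  # keep only the leading digits
}

# Hypothetical helper: succeed if the java found on PATH is at least the given major version.
java_is_at_least() {
    required=$1
    command -v java >/dev/null 2>&1 || return 1
    # java -version writes to stderr; the version is the first quoted token.
    raw=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    [ -n "$raw" ] && [ "$(java_major_version "$raw")" -ge "$required" ]
}

if java_is_at_least 11; then
    echo "Java is recent enough for Nextflow"
else
    echo "Java 11 or later was not found on PATH"
fi
```

The version parsing is kept separate from the call to `java` so that it can be exercised on plain strings, without Java being installed.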
To run the workflow you will need a container platform: Docker or Singularity.
Please follow the instructions at the Docker website
Please follow the instructions at the Singularity website
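To find out which container platform is already available on a machine, a small check along these lines may help. It is only a sketch: the function name `pick_container_profile` is hypothetical, and it assumes the executables are called `docker` and `singularity` and that the matching pipeline profiles carry the same names.

```shell
# Hypothetical helper: print the name of the first container platform found on PATH.
# Returns a non-zero status when neither docker nor singularity is available.
pick_container_profile() {
    if command -v docker >/dev/null 2>&1; then
        echo docker
    elif command -v singularity >/dev/null 2>&1; then
        echo singularity
    else
        return 1
    fi
}

if profile=$(pick_container_profile); then
    echo "Container platform found, suggested profile: $profile"
else
    echo "Neither docker nor singularity was found on PATH" >&2
fi
```

Docker is tried first here only as an arbitrary choice; swap the two branches if Singularity should take precedence on your system.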
You can first check the available options and parameters by running:
nextflow run --help
To run the workflow you must select a profile according to the container platform you want to use:
- singularity, a profile using Singularity to run the containers
- docker, a profile using Docker to run the containers
The command will look like this:
nextflow run -profile docker <rest of parameters>
Another profile is available (/!\ work in progress):
- slurm, to add if your system has a slurm executor (local by default)
Using the slurm profile gives a command like this one:
nextflow run -profile docker,slurm <rest of parameters>
Test data are included in the BiTeN repository in the test folder.
A typical command to run a test on single-end data will look like this:
nextflow run -profile local,docker --genome test/genome.fa --reads test --single_end true
On success you should get a message looking like this:
BiTeN Pipeline execution summary
Completed at : 2024-03-07T21:40:23.180547+01:00
UUID : e2a131e3-3652-4c90-b3ad-78f758c06070
Duration : 8.4s
Success : true
Exit Status : 0
Error report : -
| Parameter | Comment |
| --- | --- |
| --help | Prints the help section. |
| --reads | Path to the directory containing the reads. |
| --pattern_reads | Pattern to match the read files. For single-end data it would look like `*.fastq.gz`. For paired-end data it would look like `*_{R1,R2}_001.fastq.gz` or `*{1,2}.fastq.gz`. |
| --single_end | Boolean indicating whether the data are single-end or paired-end. |
| --stranded | Boolean indicating whether the data are stranded. |
| --genome | Path to the genome file in FASTA format. |
| --bowtie2_options | Parameter to tune the behaviour of the bowtie2 aligner. |
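Before launching a run, it can be useful to preview which files in the reads directory a given pattern would select. The sketch below does this with plain `find`; it is an illustration and not how the pipeline itself resolves `--pattern_reads`. The function name `preview_reads` is hypothetical, and because `find -name` does not understand brace alternatives such as `{R1,R2}`, each mate has to be checked with its own pattern.

```shell
# Hypothetical helper: list the files directly inside a directory whose names
# match a shell-style pattern (wildcards * and ?; no brace alternatives).
preview_reads() {
    reads_dir=$1
    pattern=$2
    find "$reads_dir" -maxdepth 1 -type f -name "$pattern" | sort
}
```

For the single-end test data, `preview_reads test '*.fastq.gz'` should list the bundled read file. For paired-end data, calling it once with `'*_R1_001.fastq.gz'` and once with `'*_R2_001.fastq.gz'` shows whether every forward file has a matching reverse file.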
Jacques Dainat (@Juke34), Juliette Hayer (@jhayer), Mahesh Binzer-Panchal (@mahesh-panchal)