A Next Generation Sequencing Consensus-based HLA Typing Workflow
The sub-workflow (blue box) consists of three steps:
A: Bowtie2 Alignment to IMGT HLA reference (generates .sam)
B: Mapped reads extraction with samtools (generates .fastq.gz)
C: HLA typing with HLA-HD (generates .txt and .json)
RAM requirements depend on input file size:
For WGS results with 30x coverage: min RAM = 2 GB
For WGS results with 100x coverage: min RAM = 30 GB
You can run the workflow (CWL v1.0 or v1.2) on any cloud platform that supports CWL execution (e.g. Cavatica).
You can also run consHLA on a Cromwell instance that uses the Azure backend.
Please use Cromwell version 79 or earlier, because CWL is no longer supported after version 79. In addition, Cromwell only supports CWL v1.0; the consHLA workflows compatible with Cromwell are under ./cwl/v1.0.
Since CWL v1.0 does not support conditional execution of workflow steps, consHLA in CWL v1.0 is split into two modes:
./cwl/v1.0/consHLA WGS contains the consHLA workflow that accepts two NGS inputs (germline and tumour WGS). Workflow dependencies are zipped.
./cwl/v1.0/consHLA WGS and RNA-seq contains the consHLA workflow that accepts three NGS inputs (germline and tumour WGS, and tumour RNA-seq). Workflow dependencies are zipped.
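As a rough sketch of a Cromwell launch for one of the v1.0 modes, the helper below only prints the command line so that it can be reviewed before anything is run. It assumes Cromwell is started from a local jar in `run` mode. The function name and the jar, workflow, inputs, and dependency zip names in the usage line are hypothetical placeholders, and backend configuration (for example for Azure) is deployment-specific and not shown.

```shell
# Print (do not execute) a Cromwell command for a CWL v1.0 workflow.
# Arguments: <cromwell jar> <workflow .cwl> <inputs .yml> <dependencies .zip>
# All four paths are placeholders supplied by the caller.
cromwell_cmd() {
  if [ "$#" -ne 4 ]; then
    echo "usage: cromwell_cmd <jar> <workflow.cwl> <inputs.yml> <deps.zip>" >&2
    return 2
  fi
  # Single quotes keep paths containing spaces intact when the line is pasted.
  printf "java -jar '%s' run '%s' --type CWL --type-version v1.0 --inputs '%s' --imports '%s'\n" \
    "$1" "$2" "$3" "$4"
}
```

For example, `cromwell_cmd cromwell-79.jar workflow.cwl inputs.yml dependencies.zip` prints a single command line that can be inspected and then pasted into a shell.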
You will need a running Docker daemon.
Running a .cwl workflow requires a CWL runner; here we use cwltool. Install it following these instructions. cwltool usage is shown below:
cwltool --basedir . ./cwl/v1.2/consHLA.cwl ./sample_input.yml
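Before launching a run, it can help to confirm that the prerequisites named above are installed. The following is a minimal sketch; the function name `check_prereqs` is hypothetical and not part of consHLA.

```shell
# Report which of the named commands are missing from PATH.
# Returns 0 when all are present, 1 otherwise.
check_prereqs() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok:      $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

A typical call would be `check_prereqs docker cwltool`. Having the `docker` client on PATH does not guarantee that the daemon is running; `docker info` is one way to check that it is reachable.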
You can run all or part of the consHLA workflow by specifying the relevant .cwl file and supplying a matching input.yml.
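When preparing an input file for a given .cwl file, cwltool itself can validate the document and print a skeleton of the inputs it expects. The sketch below wraps its `--validate` and `--make-template` options; the function name and the output file name in the usage line are hypothetical.

```shell
# Validate a CWL file and write an input template for it.
# Arguments: <path to .cwl> <output template .yml>
make_input_template() {
  cwl_file="$1"
  template="$2"
  if ! command -v cwltool >/dev/null 2>&1; then
    echo "cwltool not found on PATH" >&2
    return 1
  fi
  if [ ! -f "$cwl_file" ]; then
    echo "no such CWL file: $cwl_file" >&2
    return 1
  fi
  # --validate checks the document without running it;
  # --make-template prints a skeleton input object to stdout.
  cwltool --validate "$cwl_file" &&
    cwltool --make-template "$cwl_file" > "$template"
}
```

For example, `make_input_template ./cwl/v1.2/consHLA.cwl input_template.yml` writes a template that can be filled in with your own file paths and then passed to cwltool as the input file.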
*_sample1_hla.json: HLA alleles typed from tumour WGS
*_sample2_hla.json: HLA alleles typed from germline WGS
*_sample3_hla.json: HLA alleles typed from tumour RNAseq (optional)
*_[three|two]Sample_hla.consensus.clinSig.[json|txt]: Consensus HLA alleles for clinically significant genes
*_[three|two]Sample_hla.consensus.[json|txt]: Consensus HLA alleles for all genes
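To check that a finished run produced the JSON files listed above, a small helper along these lines could be used. It is a sketch only: the function name is hypothetical, and the `two`/`three` argument is an assumption taken from the `[three|two]Sample` part of the file name patterns.

```shell
# Check an output directory for the expected consHLA JSON files.
# Arguments: <output dir> <two|three>  (number of NGS inputs used)
check_consHLA_outputs() {
  dir="$1"
  mode="$2"
  case "$mode" in
    two|three) ;;
    *) echo "mode must be 'two' or 'three'" >&2; return 2 ;;
  esac
  # Suffixes follow the file name patterns listed above.
  set -- "_sample1_hla.json" "_sample2_hla.json" \
         "_${mode}Sample_hla.consensus.json" \
         "_${mode}Sample_hla.consensus.clinSig.json"
  if [ "$mode" = "three" ]; then
    set -- "$@" "_sample3_hla.json"
  fi
  status=0
  for suffix in "$@"; do
    # ls fails when the glob matches nothing.
    if ls "$dir"/*"$suffix" >/dev/null 2>&1; then
      echo "found:   *$suffix"
    else
      echo "missing: *$suffix"
      status=1
    fi
  done
  return "$status"
}
```

Calling `check_consHLA_outputs ./results two` lists each expected pattern as found or missing and returns a non-zero status if any file is absent; `./results` here stands for wherever your runner wrote its outputs.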
Publicly available NGS data for two cell lines, COLO829 and HCC1954, were used to demonstrate consHLA functionality. Download the files below to validate your consHLA installation. The expected output is provided in ./sample_output.
- COLO829 tumour WGS link
- COLO829 germline WGS link
- COLO829 tumour RNAseq link
- HCC1954 tumour WGS link
- HCC1954 germline WGS link
Runtime was tested with 30x WGS and RNAseq data with 180M reads on an Amazon EC2 c5.4xlarge instance with 16 CPUs, 32 GB of RAM, and 1024 GB of attached storage.
We would like to acknowledge Luminesce Alliance – Innovation for Children’s Health for its contribution and support. Luminesce Alliance is a not-for-profit cooperative joint venture between the Sydney Children’s Hospitals Network, the Children’s Medical Research Institute, and the Children’s Cancer Institute. It has been established with the support of the NSW Government to coordinate and integrate paediatric research. Luminesce Alliance is also affiliated with the University of Sydney and the University of New South Wales Sydney.
consHLA is a wrapper around HLA-HD and is released under the MIT open source software license. For commercial use of consHLA, please contact the author of HLA-HD to obtain a commercial license.