Miniprot p2 #34

Merged
merged 12 commits into from
Dec 1, 2022
78 changes: 71 additions & 7 deletions README.md
@@ -22,11 +22,48 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool

The version 1 pipeline will be made up of the following steps:

- INPUT_READ

- The reading of the input yaml and conversion into channels for the sub-workflows.


- GENERATE_GENOME
Generate .genome for the input genome.

- Generate .genome for the input genome.
- Uses SAMTOOLS FAIDX.


- GENERATE_ALIGNMENT
Generate .as files from BLAST alignment results of input genome against set datasets.

- Peptides will run through pep_alignment.nf
- Uses Miniprot.

- CDNA, RNA and CDS will run through nuc_alignment.nf
- Uses Minimap2.


- INSILICO DIGEST

- Generates a map of enzymatic digests using 3 Bionano enzymes.
- Uses Bionano software.


- SELFCOMP

- Identifies regions of self-complementary sequence.
- Uses Mummer.

- SYNTENY

- Generates syntenic alignments against other high-quality genomes.
- Uses Minimap2.


- ANCESTRAL ELEMENT ANALYSIS
- Lepidopteran Element Analysis
- Uses BUSCO and custom Python scripts to parse ancestral lepidopteran genes.
- This will eventually have a number of clade specific sub-workflows.
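
As a rough illustration of the GENERATE_GENOME step listed above: a `.genome` file is assumed here to be the bedtools-style two-column table of sequence name and length, which can be read off the first two columns of the `.fai` index written by `samtools faidx`. The function name `fai_to_genome` and the sample values are hypothetical, and the pipeline's own module may do additional processing.

```shell
#!/usr/bin/env bash
# Sketch only: derive a two-column genome file (name<TAB>length) from a .fai index.
# Assumes the .fai column layout written by `samtools faidx`:
# name, length, offset, linebases, linewidth.
fai_to_genome() {
    local fai="$1"
    cut -f1,2 "$fai"
}

# Example with a small hand-written index (values are made up for illustration).
tmp_fai="$(mktemp)"
printf 'scaffold_1\t1000\t12\t60\t61\nscaffold_2\t250\t1040\t60\t61\n' > "$tmp_fai"
fai_to_genome "$tmp_fai"
rm -f "$tmp_fai"
```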


## Quick Start

@@ -52,7 +89,12 @@ The version 1 pipeline will be made up of the following steps:
<!-- TODO nf-core: Update the example "typical command" below used to run the pipeline -->

```console
nextflow run nf-core/treeval --input samplesheet.csv --outdir <OUTDIR> --genome GRCh37 -profile <docker/singularity/podman/shifter/charliecloud/conda/institute>
nextflow run main.nf -profile singularity --input treeval.yaml
```

LSF-specific run:
```console
echo "nextflow run main.nf -profile singularity --input treeval.yaml" | bsub -Is -tty -e error -o out -n 10 -q normal -M10000 -R'select[mem>10000] rusage[mem=10000] span[hosts=1]'
```
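
The LSF command above repeats the memory figure in three places (`-M`, `select[mem>...]` and `rusage[mem=...]`). A minimal sketch of a wrapper that keeps them in sync is shown here; `treeval_bsub_cmd` is a hypothetical helper, not part of the pipeline, and it only prints the submission line rather than running it.

```shell
#!/usr/bin/env bash
# Sketch only: print (do not run) an LSF submission line built from one memory value.
# treeval_bsub_cmd is a hypothetical helper; the flags mirror the README example.
treeval_bsub_cmd() {
    local mem="${1:-10000}"      # reused for -M, select[mem>...] and rusage[mem=...]
    local cores="${2:-10}"       # value passed to -n
    local queue="${3:-normal}"   # value passed to -q
    printf '%s\n' "echo \"nextflow run main.nf -profile singularity --input treeval.yaml\" | bsub -Is -tty -e error -o out -n ${cores} -q ${queue} -M${mem} -R'select[mem>${mem}] rusage[mem=${mem}] span[hosts=1]'"
}

# With no arguments this prints the same line as the README example.
treeval_bsub_cmd
```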

## Documentation
@@ -61,11 +103,14 @@ The nf-core/treeval pipeline comes with documentation about the pipeline [usage]

## Credits

nf-core/treeval was originally written by Damon-Lee Pointon, Yumi Sims and William Eagles.
nf-core/treeval was originally written by Damon-Lee Pointon (@DLBPointon), Yumi Sims (@yumisims) and William Eagles (@weaglesBio).

We thank the following people for their extensive assistance in the development of this pipeline:

<!-- TODO nf-core: If applicable, make list of people who have also contributed -->
@muffato
@gq1
@ksenia-krasheninnikova
@priyanka-surana

## Contributions and Support

@@ -75,10 +120,29 @@ For further information or help, don't hesitate to get in touch on the [Slack `#

## Citations

<!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. -->
<!-- If you use nf-core/treeval for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) -->

<!-- TODO nf-core: Add bibliography of tools and data used in your pipeline -->
### Tools

BedTools

Bionano CMAP

BUSCO

Minimap2

Miniprot

Mummer

Python3

Samtools

TABIX

UCSC

An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.

3 changes: 3 additions & 0 deletions assets/treeval_test.yaml
@@ -1,4 +1,5 @@
assembly:
sizeClass: '' # S if genome <= 4Gb, else L
level: scaffold
sample_id: nxOscDoli1
classT: nematode
@@ -17,3 +18,5 @@ self_comp:
synteny:
synteny_genome_path: "/nfs/team135/dp24/treeval_testdata/synteny_data"
outdir: "NEEDS TESTING"
intron:
size: "50k"
12 changes: 12 additions & 0 deletions conf/base.config
@@ -18,13 +18,25 @@ process {
maxRetries = 1
maxErrors = '-1'

withName:SAMTOOLS_MERGE {
memory = { check_max( 50.GB * task.attempt, 'memory') }
}

// 20GB * (task attempt * 2) = 40GB, 80GB, 120GB
withName:MUMMER {
cpus = { check_max( 12 * task.attempt, 'cpus' ) }
memory = { check_max( 20.GB * Math.ceil( task.attempt * 2 ), 'memory' ) }
time = { check_max( 2.h * task.attempt, 'time' ) }
}

// Process-specific resource requirements
// NOTE - Please try and re-use the labels below as much as possible.
// These labels are used and recognised by default in DSL2 files hosted on nf-core/modules.
// If possible, it would be nice to keep the same label naming convention when
// adding in your local modules too.
// TODO nf-core: Customise requirements for specific processes.
// See https://www.nextflow.io/docs/latest/config.html#config-process-selectors

withLabel:process_low {
cpus = { check_max( 2 * task.attempt, 'cpus' ) }
memory = { check_max( 12.GB * task.attempt, 'memory' ) }
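
The retry scaling in the `withName:MUMMER` selector above can be checked with a little arithmetic. The sketch below reproduces `20.GB * (task.attempt * 2)` for the first three attempts; `mummer_mem_gb` is a hypothetical helper and the 128 GB cap is an assumed stand-in for whatever limit `check_max` enforces. With `maxRetries = 1` as shown in this hunk, probably only the first two values would ever be requested.

```shell
#!/usr/bin/env bash
# Sketch only: reproduce the arithmetic of the MUMMER memory request per attempt.
# mummer_mem_gb is a hypothetical helper; max_gb stands in for the check_max cap.
mummer_mem_gb() {
    local attempt="$1"
    local max_gb="${2:-128}"             # assumed cap, not taken from the pipeline config
    local want=$(( 20 * attempt * 2 ))   # 20 GB * (attempt * 2)
    if (( want > max_gb )); then
        echo "$max_gb"
    else
        echo "$want"
    fi
}

for attempt in 1 2 3; do
    echo "attempt ${attempt}: $(mummer_mem_gb "$attempt") GB"
done
```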
33 changes: 15 additions & 18 deletions conf/modules.config
@@ -26,20 +26,12 @@ process {
]
}

withName: FILTER_BLAST {
ext.args = 90.00
withName: MINIPROT_ALIGN {
ext.args = "-u --gff -j 1"
}

withName: BLAST_MAKEBLASTDB {
ext.args = '-dbtype nucl'
}

withName: BLAST_TBLASTN {
ext.args = '-outfmt 6 -task tblastn -evalue 0.001 -qcov_hsp_perc 60 -max_target_seqs 1'
}

withName: BLAST_BLASTN {
ext.args = '-outfmt 6 -task blastn -evalue 0.001'
withName: '.*:.*:.*:NUC_ALIGNMENTS:BEDTOOLS_BAMTOBED' {
ext.args = "-bed12"
}

withName: '.*:.*:INSILICO_DIGEST:UCSC_BEDTOBIGBED' {
@@ -53,16 +45,21 @@ process {
}

withName: '.*:.*:SELFCOMP:UCSC_BEDTOBIGBED' {
ext.args = { " -type=bed3+6 -extraIndex=name,qStart,qEnd" }
ext.prefix = { "${meta.id}" }
ext.args = { " -type=bed3+6 -extraIndex=name,qStart,qEnd" }
ext.prefix = { "${meta.id}" }
}

withName: '.*:.*:SYNTENY:MINIMAP2_ALIGN' {
ext.args = '-t 8 -x asm10'
ext.prefix = { "${meta.id}_synteny_${reference.getName().tokenize('.')[0]}" }
}

withName: MINIMAP2_ALIGN {
ext.args = '-t 8 -x asm10'
ext.prefix = { "${meta.id}_synteny_${reference.getName().tokenize('.')[0]}" }
withName: '.*:.*:.*:NUC_ALIGNMENTS:MINIMAP2_ALIGN' {
ext.args = "-ax splice"
ext.prefix = { "${meta.id}_alignment_${reference.getName().tokenize('.')[0]}" }
}

withName : MUMMER {
ext.args = "-n -b -c -L -l 400"
}
}
}
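
To make the `ext.prefix` closures above easier to read: `reference.getName().tokenize('.')[0]` takes the reference file name and keeps the text before the first dot. A rough shell equivalent is sketched below for file names that do not start with a dot; `synteny_prefix` and the example path are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: mimic "${meta.id}_synteny_${reference.getName().tokenize('.')[0]}".
# synteny_prefix is a hypothetical helper for illustration.
synteny_prefix() {
    local sample_id="$1"
    local reference="$2"
    local name="${reference##*/}"   # file name without directories, like getName()
    local stem="${name%%.*}"        # text before the first dot, like tokenize('.')[0]
    echo "${sample_id}_synteny_${stem}"
}

# The path below is made up; only the file name matters.
synteny_prefix nxOscDoli1 /data/synteny/ExampleGenome.fasta.gz
```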
27 changes: 24 additions & 3 deletions modules.json
@@ -18,17 +18,32 @@
"minimap2/align": {
"git_sha": "1a5a9e7b4009dcf34e6867dd1a5a1d9a718b027b"
},
"mummer": {
"git_sha": "233fa70811a03a4cecb2ece483b5c8396e2cee1d"
},
"nf-core/bedtools/sort": {
"git_sha": "4bb1d4e362a38642e877afe41aaf58ded9e56c86"
},
"nf-core/samtools/merge": {
"git_sha": "5e34754d42cd2d5d248ca8673c0a53cdf5624905"
},
"nf-core/tabix/bgziptabix": {
"git_sha": "5e34754d42cd2d5d248ca8673c0a53cdf5624905"
},
"samtools/faidx": {
"git_sha": "3eb99152cedbb7280258858e5df08478a4670696"
},
"ucsc/bedtobigbed": {
"git_sha": "90aef30f432332bdf0ce9f4b9004aa5d5c4960bb"
},
"mummer": {
"git_sha": "233fa70811a03a4cecb2ece483b5c8396e2cee1d"
}
},
"sanger-tol/nf-core-modules": {
"": {
"git_sha": "5f56c87e361ae0a90b1e54f574ab48c58988f5f7"
},
"bedtools/bamtobed": {
"git_sha": "90aef30f432332bdf0ce9f4b9004aa5d5c4960bb"
},
"blast/tblastn": {
"git_sha": "a6effa038df6248bad2329c6ab104c5ae9d556c5"
},
@@ -41,6 +56,12 @@
"makecmap/renamecmapids": {
"git_sha": "461204ac9e8bae621a24a493db3ec9f8274b7757"
},
"miniprot/align": {
"git_sha": "f7c0161a1375840fca96f1bab27d3e9a8d423b06"
},
"miniprot/index": {
"git_sha": "5f56c87e361ae0a90b1e54f574ab48c58988f5f7"
},
"selfcomp/mapids": {
"git_sha": "6f5a750a30268d30ab55ad2ae6e8af48b0d29d1b"
},
32 changes: 32 additions & 0 deletions modules/local/bedtools_bed_sort.nf
@@ -0,0 +1,32 @@
process BEDTOOLS_BED_SORT {
tag "${meta.id}"
label "process_medium"

def version = '2.30.0--hc088bd4_0'

if (params.enable_conda) {
exit 1, "Conda environments cannot be used when using the BEDTOOLS_BED_SORT process. Please use docker or singularity containers."
}
container "quay.io/biocontainers/bedtools:${version}"

input:
tuple val( meta ), path( merged_bam )

output:
tuple val( meta ), file( "*.bed" ), emit: sorted_bed
path "versions.yml", emit: versions

script:
"""
bamToBed \\
-i $merged_bam \\
-bed12 | \\
bedtools sort \\
-i stdin > ${meta.id}.sorted.bed

cat <<-END_VERSIONS > versions.yml
"${task.process}":
BEDTOOLS_BED_SORT: $version
END_VERSIONS
"""
}
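
For readers unfamiliar with the second half of the pipe in BEDTOOLS_BED_SORT: by default `bedtools sort` orders records by chromosome and then by start position. The sketch below shows the same ordering with coreutils `sort` on a tiny made-up BED fragment. It is an illustration of the ordering only, not a replacement for the module, and `sort_bed` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch only: order BED records by chromosome, then numerically by start,
# which is the default ordering documented for `bedtools sort`.
# sort_bed is a hypothetical helper that reads BED text on stdin.
sort_bed() {
    LC_ALL=C sort -t $'\t' -k1,1 -k2,2n
}

# Made-up three-column records, deliberately out of order.
printf 'chr2\t50\t90\nchr1\t300\t400\nchr1\t20\t80\n' | sort_bed
```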
1 change: 1 addition & 0 deletions modules/local/chunkfasta.nf
@@ -1,5 +1,6 @@
process CHUNKFASTA {
tag "${meta.id}"
label "process_medium"

if (params.enable_conda) {
exit 1, "Conda environments cannot be used when using the chunkfasta process. Please use docker or singularity containers."
15 changes: 15 additions & 0 deletions modules/local/concat_gff.nf
@@ -0,0 +1,15 @@
process CONCAT_GFF {
tag "${meta.id} - ${meta.type}"
label "process_medium"

input:
tuple val(meta), file(input_files)

output:
tuple val(meta), file ("${meta.id}-${meta.type}-all.gff"), emit: concat_gff

script:
"""
cat $input_files > $meta.id-$meta.type-all.gff
"""
}
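
CONCAT_GFF joins its inputs with a plain `cat`, so any header lines in the individual files are carried over as they are. If each input starts with its own `##gff-version` line (an assumption about the inputs, not something this diff confirms), a variant that keeps only the first such line could look like the sketch below. `concat_gff_dedup_header` is a hypothetical name and this is not what the module currently does.

```shell
#!/usr/bin/env bash
# Sketch only: concatenate GFF files, keeping just the first "##gff-version" line.
# concat_gff_dedup_header is a hypothetical helper, not part of the pipeline.
concat_gff_dedup_header() {
    awk '/^##gff-version/ { if (seen++) next } { print }' "$@"
}

# Two made-up single-record inputs, each with its own header line.
a="$(mktemp)"; b="$(mktemp)"
printf '##gff-version 3\nchr1\tminiprot\tmRNA\t1\t90\t.\t+\t.\tID=MP000001\n' > "$a"
printf '##gff-version 3\nchr2\tminiprot\tmRNA\t5\t50\t.\t-\t.\tID=MP000002\n' > "$b"
concat_gff_dedup_header "$a" "$b"
rm -f "$a" "$b"
```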
2 changes: 1 addition & 1 deletion modules/local/csv_generator.nf
@@ -1,6 +1,6 @@
process CSV_GENERATOR {
tag "${ch_org}"
label 'process_small'
label 'process_low'

input:
val ch_org
2 changes: 1 addition & 1 deletion modules/local/filter_blast.nf
@@ -1,6 +1,6 @@
process FILTER_BLAST {
tag "${meta.id} - ${meta.type}"
label "process_small"
label "process_low"

def version = '0.004-c1'

2 changes: 1 addition & 1 deletion modules/local/generate_genome_file.nf
@@ -1,6 +1,6 @@
process GENERATE_GENOME_FILE {
tag "${meta.id}"
label "process_small"
label "process_low"

input:
tuple val( meta ), path( fai )
2 changes: 1 addition & 1 deletion modules/local/get_synteny_genomes.nf
@@ -1,6 +1,6 @@
process GET_SYNTENY_GENOMES {
tag "${assembly_classT}"
label "process_small"
label "process_low"

input:
val ( synteny_path )
17 changes: 17 additions & 0 deletions modules/local/merge_bam.nf
@@ -0,0 +1,17 @@
process MERGE_BAM {
tag " ${meta.id} "
label "process_medium"

container ""

script:
"""
samtools merge merged.bam *.bam

cat <<-END_VERSIONS > versions.yml
"${task.process}":
MERGE_BAM: $version
END_VERSIONS
"""

}
29 changes: 29 additions & 0 deletions modules/local/minimap_samtools.nf
@@ -0,0 +1,29 @@
process MINIMAP_SAMTOOLS {
tag "${meta.id}"
label "process_medium"

def version = '2.22_1.12'

container "niemasd/minimap2_samtools:${version}"

input:
tuple val( ref_meta ), file( ref )
tuple val( meta ), file( nuc_file )

output:
tuple val( meta ), file( "${nuc_file}.bam" ), emit: partial_alignment
path "versions.yml", emit: versions

script:
def intron_size = task.ext.args ?: '50k'
"""
minimap2 -ax splice $ref $nuc_file -G $intron_size | samtools view -Sb -T $ref - > ${nuc_file}.bam

cat <<-END_VERSIONS > versions.yml
"${task.process}":
MINIMAP_SAMTOOLS: $version
END_VERSIONS
"""
}

//USE THE OFFICIAL MINIMAP MODULE
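
The line `def intron_size = task.ext.args ?: '50k'` in MINIMAP_SAMTOOLS falls back to `50k` when no `ext.args` value is configured. The sketch below shows the same fallback with shell parameter expansion and prints the resulting command line instead of running it. `minimap_splice_cmd` and the file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: build (but do not run) the minimap2 | samtools line with a default intron size.
# minimap_splice_cmd is a hypothetical helper mirroring the module's script block.
minimap_splice_cmd() {
    local ref="$1"
    local query="$2"
    local intron="${3:-50k}"   # used when the third argument is unset or empty
    printf '%s\n' "minimap2 -ax splice ${ref} ${query} -G ${intron} | samtools view -Sb -T ${ref} - > ${query}.bam"
}

minimap_splice_cmd ref.fa cdna.fa        # falls back to -G 50k
minimap_splice_cmd ref.fa cdna.fa 200k   # explicit value
```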