Update subworkflow for deepvariant to version 1.8.0 #7473

Open
wants to merge 3 commits into
base: master
15 changes: 1 addition & 14 deletions modules/nf-core/deepvariant/README.md
@@ -32,20 +32,7 @@ These module subcommands incorporate the individual steps of the DeepVariant pip
## makeexamples

This process imports the data used for calling, and thus decides what information is available to the
deep neural network. It's important to import the correct channels for the model you want to use.

The script `run_deepvariant` (not used in the subworkflow) does this automatically. You can refer to
the implementation in the DeepVariant repo:

https://github.com/google/deepvariant/blob/bf9ed7e6de97cf6c8381694cb996317a740625ad/scripts/run_deepvariant.py#L367

For WGS and WES models you need to enable the `insert_size` channel. Specify the following in the config:

```
withName: "DEEPVARIANT_MAKEEXAMPLES" {
ext.args = '--channels "insert_size"'
}
```
deep neural network. It's important to use the correct settings at each step for the model you want to use. The script [`run_deepvariant.py`](https://github.com/google/deepvariant/blob/r1.8/scripts/run_deepvariant.py) sets these automatically. To find the flags needed for a given model, run `run_deepvariant.py` with `--dry_run=true`, which prints the command used for each step, as described in the [PacBio model case study](https://github.com/google/deepvariant/blob/r1.8/docs/deepvariant-pacbio-model-case-study.md).
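
For example, the flags that a dry run reports for the WGS model can be passed to this step through `ext.args`. The fragment below is only a sketch: it reuses the values from this module's test configuration, assumes the model paths of the 1.8.0 container (`/opt/models/wgs` and `/opt/smallmodels/wgs`), and other model types will likely need a different set of flags.

```groovy
process {
    withName: "DEEPVARIANT_MAKEEXAMPLES" {
        // Assumption: these values are copied from the module's test config
        // for the WGS model; check a dry run before using them for other models.
        ext.args = [
            '--checkpoint "/opt/models/wgs"',
            '--call_small_model_examples',
            '--small_model_indel_gq_threshold "30"',
            '--small_model_snp_gq_threshold "25"',
            '--small_model_vaf_context_window_size "51"',
            '--trained_small_model_path "/opt/smallmodels/wgs"'
        ].join(' ')
    }
}
```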

## callvariants

2 changes: 1 addition & 1 deletion modules/nf-core/deepvariant/callvariants/main.nf
@@ -4,7 +4,7 @@ process DEEPVARIANT_CALLVARIANTS {
label 'process_high'

//Conda is not supported at the moment
container "nf-core/deepvariant:1.6.1"
container "docker.io/google/deepvariant:1.8.0"

input:
tuple val(meta), path(make_examples_tfrecords)
14 changes: 7 additions & 7 deletions modules/nf-core/deepvariant/callvariants/tests/main.nf.test.snap
@@ -2,14 +2,14 @@
"versions": {
"content": [
[
"versions.yml:md5,5ff99ffba1e56e4e919d3dfc2d0f3cbb"
"versions.yml:md5,384f8c54b3d1b03f7bdb583cb3c93e5c"
]
],
"meta": {
"nf-test": "0.9.0",
"nextflow": "24.04.4"
"nextflow": "24.10.3"
},
"timestamp": "2024-08-09T16:38:47.927241"
"timestamp": "2025-01-08T10:33:45.081424542"
},
"homo_sapiens-wgs-call_variants_tfrecords-filenames": {
"content": [
@@ -34,7 +34,7 @@
]
],
"1": [
"versions.yml:md5,5ff99ffba1e56e4e919d3dfc2d0f3cbb"
"versions.yml:md5,384f8c54b3d1b03f7bdb583cb3c93e5c"
],
"call_variants_tfrecords": [
[
@@ -46,14 +46,14 @@
]
],
"versions": [
"versions.yml:md5,5ff99ffba1e56e4e919d3dfc2d0f3cbb"
"versions.yml:md5,384f8c54b3d1b03f7bdb583cb3c93e5c"
]
}
],
"meta": {
"nf-test": "0.9.0",
"nextflow": "24.04.4"
"nextflow": "24.10.3"
},
"timestamp": "2024-08-13T21:07:17.335788301"
"timestamp": "2025-01-08T10:07:27.993998742"
}
}
Original file line number Diff line number Diff line change
@@ -6,6 +6,6 @@ process {
}
process {
withName: "DEEPVARIANT_MAKEEXAMPLES" {
ext.args = '--channels "insert_size"'
ext.args = '--checkpoint "/opt/models/wgs" --call_small_model_examples --small_model_indel_gq_threshold "30" --small_model_snp_gq_threshold "25" --small_model_vaf_context_window_size "51" --trained_small_model_path "/opt/smallmodels/wgs"'
}
}
2 changes: 1 addition & 1 deletion modules/nf-core/deepvariant/main.nf
@@ -16,7 +16,7 @@ process DEEPVARIANT {
tag "$meta.id"
label 'process_high'

container "nf-core/deepvariant:1.6.1"
container "docker.io/google/deepvariant:1.8.0"

input:
tuple val(meta), path(input), path(index), path(intervals)
3 changes: 2 additions & 1 deletion modules/nf-core/deepvariant/makeexamples/main.nf
@@ -3,7 +3,7 @@ process DEEPVARIANT_MAKEEXAMPLES {
label 'process_high'

//Conda is not supported at the moment
container "nf-core/deepvariant:1.6.1"
container "docker.io/google/deepvariant:1.8.0"

input:
tuple val(meta), path(input), path(index), path(intervals)
@@ -15,6 +15,7 @@ process DEEPVARIANT_MAKEEXAMPLES {
output:
tuple val(meta), path("${prefix}.examples.tfrecord-*-of-*.gz{,.example_info.json}"), emit: examples
tuple val(meta), path("${prefix}.gvcf.tfrecord-*-of-*.gz"), emit: gvcf
tuple val(meta), path("${prefix}_call_variant_outputs.examples.tfrecord-*-of-*.gz", arity: "0..*"), emit: small_model_calls
path "versions.yml", emit: versions

when:
9 changes: 9 additions & 0 deletions modules/nf-core/deepvariant/makeexamples/meta.yml
@@ -88,6 +88,15 @@ output:
type: list
description: |
Tuple containing sample metadata and the GVCF data in tfrecord format
- small_model_calls:
- meta:
type: list
description: |
Tuple containing sample metadata
- '${prefix}_call_variant_outputs.examples.tfrecord-*-of-*.gz", arity: "0..*':
type: list
description: |
Optional variant calls from the small model, if enabled, in tfrecord format
- versions:
- versions.yml:
type: file
14 changes: 7 additions & 7 deletions modules/nf-core/deepvariant/makeexamples/tests/main.nf.test
@@ -46,13 +46,13 @@ nextflow_process {
{ assert process.out.examples.get(0).get(0) == [ id:'test', single_end:false ] },
{ assert process.out.gvcf.get(0).get(0) == [ id:'test', single_end:false ] },
{ assert process.out.examples.get(0).get(1).size() == 4 },
{ assert snapshot( // Check examples (tfrecord / json) file name list
{ assert snapshot( // Check examples (tfrecord / json) file name list
file(process.out.examples.get(0).get(1).get(0)).name,
file(process.out.examples.get(0).get(1).get(1)).name,
file(process.out.examples.get(0).get(1).get(2)).name,
file(process.out.examples.get(0).get(1).get(3)).name,
).match("test1-exaamples-filenames")},

{ assert process.out.gvcf.get(0).get(0) == [ id:'test', single_end:false ] },
{ assert process.out.gvcf.get(0).get(1).size() == 2 },
{ assert snapshot( // Check gvcf file name list
@@ -154,7 +154,7 @@ nextflow_process {
{ assert process.out.examples.get(0).get(0) == [ id:'test', single_end:false ] },
// The test is always run with 2 cpus
{ assert process.out.examples.get(0).get(1).size() == 4 },
{ assert snapshot( // Check examples (tfrecord / json) file name list
{ assert snapshot( // Check examples (tfrecord / json) file name list
file(process.out.examples.get(0).get(1).get(0)).name,
file(process.out.examples.get(0).get(1).get(1)).name,
file(process.out.examples.get(0).get(1).get(2)).name,
@@ -173,7 +173,7 @@ nextflow_process {
}

test("stub") {

options "-stub"

when {
@@ -208,13 +208,13 @@ nextflow_process {
assertAll(
{ assert process.success },
{ assert process.out.examples.get(0).get(1).size() == 4 },
{ assert snapshot( // Check examples (tfrecord / json) file name list
{ assert snapshot( // Check examples (tfrecord / json) file name list
file(process.out.examples.get(0).get(1).get(0)).name,
file(process.out.examples.get(0).get(1).get(1)).name,
file(process.out.examples.get(0).get(1).get(2)).name,
file(process.out.examples.get(0).get(1).get(3)).name,
).match("test4-examples-filenames")},

{ assert process.out.gvcf.get(0).get(0) == [ id:'test', single_end:false ] },
{ assert process.out.gvcf.get(0).get(1).size() == 2 },
{ assert snapshot( // Check gvcf file name list
@@ -225,4 +225,4 @@ nextflow_process {
}
}

}
}
80 changes: 40 additions & 40 deletions modules/nf-core/deepvariant/makeexamples/tests/main.nf.test.snap
@@ -10,6 +10,18 @@
},
"timestamp": "2024-09-04T16:09:47.885995"
},
"test3-versions": {
"content": [
[
"versions.yml:md5,2bfe7f3902fb3d9e2dc1d97dc6347c9c"
]
],
"meta": {
"nf-test": "0.9.0",
"nextflow": "24.10.3"
},
"timestamp": "2025-01-08T10:34:31.031697972"
},
"test2-examples-filenames": {
"content": [
"test.examples.tfrecord-00000-of-00002.gz",
@@ -26,14 +38,27 @@
"test2-versions": {
"content": [
[
"versions.yml:md5,842dca9323f25aa3cfd67789d18e7e33"
"versions.yml:md5,2bfe7f3902fb3d9e2dc1d97dc6347c9c"
]
],
"meta": {
"nf-test": "0.9.0",
"nextflow": "24.10.3"
},
"timestamp": "2025-01-08T10:34:17.998740352"
},
"test1-exaamples-filenames": {
"content": [
"test.examples.tfrecord-00000-of-00002.gz",
"test.examples.tfrecord-00000-of-00002.gz.example_info.json",
"test.examples.tfrecord-00001-of-00002.gz",
"test.examples.tfrecord-00001-of-00002.gz.example_info.json"
],
"meta": {
"nf-test": "0.9.0",
"nextflow": "24.04.4"
},
"timestamp": "2024-08-09T16:39:28.960959"
"timestamp": "2024-09-04T16:09:47.874585"
},
"test4-examples-filenames": {
"content": [
@@ -51,14 +76,25 @@
"test1-versions": {
"content": [
[
"versions.yml:md5,842dca9323f25aa3cfd67789d18e7e33"
"versions.yml:md5,2bfe7f3902fb3d9e2dc1d97dc6347c9c"
]
],
"meta": {
"nf-test": "0.9.0",
"nextflow": "24.10.3"
},
"timestamp": "2025-01-08T10:34:04.940271042"
},
"test3-gvcf-filenames": {
"content": [
"test.gvcf.tfrecord-00000-of-00002.gz",
"test.gvcf.tfrecord-00001-of-00002.gz"
],
"meta": {
"nf-test": "0.9.0",
"nextflow": "24.04.4"
},
"timestamp": "2024-08-09T16:39:13.57526"
"timestamp": "2024-09-04T16:10:17.714443"
},
"test3-examples-filenames": {
"content": [
@@ -94,41 +130,5 @@
"nextflow": "24.04.4"
},
"timestamp": "2024-09-04T16:10:27.423442"
},
"test3-versions": {
"content": [
[
"versions.yml:md5,842dca9323f25aa3cfd67789d18e7e33"
]
],
"meta": {
"nf-test": "0.9.0",
"nextflow": "24.04.4"
},
"timestamp": "2024-08-09T16:39:44.83616"
},
"test1-exaamples-filenames": {
"content": [
"test.examples.tfrecord-00000-of-00002.gz",
"test.examples.tfrecord-00000-of-00002.gz.example_info.json",
"test.examples.tfrecord-00001-of-00002.gz",
"test.examples.tfrecord-00001-of-00002.gz.example_info.json"
],
"meta": {
"nf-test": "0.9.0",
"nextflow": "24.04.4"
},
"timestamp": "2024-09-04T16:09:47.874585"
},
"test3-gvcf-filenames": {
"content": [
"test.gvcf.tfrecord-00000-of-00002.gz",
"test.gvcf.tfrecord-00001-of-00002.gz"
],
"meta": {
"nf-test": "0.9.0",
"nextflow": "24.04.4"
},
"timestamp": "2024-09-04T16:10:17.714443"
}
}
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
process {
withName: "DEEPVARIANT_MAKEEXAMPLES" {
ext.args = '--channels "insert_size"'
cpus = 2 // The number of output files is determined by cpus - keep it the same for tests
ext.args = '--checkpoint "/opt/models/wgs" --call_small_model_examples --small_model_indel_gq_threshold "30" --small_model_snp_gq_threshold "25" --small_model_vaf_context_window_size "51" --trained_small_model_path "/opt/smallmodels/wgs"'
}
}
22 changes: 20 additions & 2 deletions modules/nf-core/deepvariant/postprocessvariants/main.nf
@@ -3,10 +3,10 @@ process DEEPVARIANT_POSTPROCESSVARIANTS {
label 'process_medium'

//Conda is not supported at the moment
container "nf-core/deepvariant:1.6.1"
container "docker.io/google/deepvariant:1.8.0"

input:
tuple val(meta), path(variant_calls_tfrecord_files), path(gvcf_tfrecords)
tuple val(meta), path(variant_calls_tfrecord_files), path(gvcf_tfrecords), path(small_model_calls), path(intervals)
tuple val(meta2), path(fasta)
tuple val(meta3), path(fai)
tuple val(meta4), path(gzi)
@@ -30,6 +30,7 @@ process DEEPVARIANT_POSTPROCESSVARIANTS {
def args = task.ext.args ?: ''
prefix = task.ext.prefix ?: "${meta.id}"

def regions = intervals ? "--regions ${intervals}" : ""
def variant_calls_tfrecord_name = variant_calls_tfrecord_files[0].name.replaceFirst(/-\d{5}-of-\d{5}/, "")

def gvcf_matcher = gvcf_tfrecords[0].baseName =~ /^(.+)-\d{5}-of-(\d{5})$/
@@ -41,6 +42,22 @@ process DEEPVARIANT_POSTPROCESSVARIANTS {
// Reconstruct the logical name - ${tfrecord_name}.examples.tfrecord@${task.cpus}.gz
def gvcf_tfrecords_logical_name = "${gvcf_tfrecord_name}@${gvcf_shardCount}.gz"

// The following block determines whether the small model was used, and if so, adds the variant calls from it
// to the argument --small_model_cvo_records.
def small_model_arg = ""
def small_model_calls_copy = small_model_calls // Create a copy of the process-level variable so it can be used inside the if{}
if (small_model_calls_copy) {
def small_model_matcher = (small_model_calls_copy[0].baseName =~ /^(.+)-\d{5}-of-(\d{5})$/)
if (!small_model_matcher.matches()) {
throw new IllegalArgumentException("tfrecord baseName '" + small_model_calls_copy[0].baseName + "' doesn't match the expected pattern")
}
def small_model_tfrecord_name = small_model_matcher[0][1]
def small_model_shardCount = small_model_matcher[0][2]
// Reconstruct the logical name. The shard count keeps its zero padding. Example: test_call_variant_outputs.examples.tfrecord@00012.gz
def small_model_tfrecords_logical_name = "${small_model_tfrecord_name}@${small_model_shardCount}.gz"
small_model_arg = "--small_model_cvo_records ${small_model_tfrecords_logical_name}"
}

"""
/opt/deepvariant/bin/postprocess_variants \\
${args} \\
@@ -49,6 +66,7 @@ process DEEPVARIANT_POSTPROCESSVARIANTS {
--outfile "${prefix}.vcf.gz" \\
--nonvariant_site_tfrecord_path "${gvcf_tfrecords_logical_name}" \\
--gvcf_outfile "${prefix}.g.vcf.gz" \\
${regions} ${small_model_arg} \\
--cpus $task.cpus

cat <<-END_VERSIONS > versions.yml
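
The shard-name handling added in this hunk can be tried outside Nextflow. The plain Groovy sketch below mirrors the regular expression from the diff. `logicalShardName` and `smallModelArg` are hypothetical helper names introduced here for illustration (they are not part of the module), and base names are passed as plain strings rather than staged files. Note that the captured shard count keeps its zero padding.

```groovy
// Hypothetical helper: rebuild the "logical" sharded name (name@count.gz)
// from the base name of a single shard, e.g. "x.tfrecord-00000-of-00002".
String logicalShardName(String baseName) {
    def matcher = (baseName =~ /^(.+)-\d{5}-of-(\d{5})$/)
    if (!matcher.matches()) {
        throw new IllegalArgumentException(
            "tfrecord baseName '${baseName}' doesn't match the expected pattern")
    }
    def name = matcher.group(1)        // everything before the shard suffix
    def shardCount = matcher.group(2)  // five-digit, zero-padded shard count
    return "${name}@${shardCount}.gz"
}

// Hypothetical helper: build the optional postprocess_variants argument.
// An empty list (small model not enabled) produces no argument at all.
String smallModelArg(List<String> shardBaseNames) {
    if (!shardBaseNames) {
        return ""
    }
    return "--small_model_cvo_records ${logicalShardName(shardBaseNames[0])}"
}

assert logicalShardName("test_call_variant_outputs.examples.tfrecord-00000-of-00002") == "test_call_variant_outputs.examples.tfrecord@00002.gz"
assert smallModelArg([]) == ""
```

An empty list yields an empty string, which corresponds to the case where the small model was not enabled and the optional `small_model_calls` output matched no files.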
9 changes: 9 additions & 0 deletions modules/nf-core/deepvariant/postprocessvariants/meta.yml
@@ -31,6 +31,15 @@ input:
description: |
Sharded tfrecord file from DEEPVARIANT_MAKEEXAMPLES with the coverage information used for GVCF output
pattern: "*.gz"
- small_model_calls:
type: file
description: |
Sharded tfrecord file from DEEPVARIANT_MAKEEXAMPLES with variant calls from the small model
pattern: "*.gz"
- intervals:
type: file
description: Interval file for targeted regions
pattern: "*.bed"
- - meta2:
type: map
description: |