The runtime and accuracy reported on this page were generated using n2-standard-96 GCP instances, which have the following configuration:
GCP instance type: n2-standard-96
CPUs: 96 vCPUs
Memory: 384GiB
GPUs: 0
WGS (Illumina): runtime is measured on HG003 (all chromosomes) and averaged over 5 runs.
Stage | Time |
---|---|
make_examples | 54m58.62s |
call_variants | 38m45.29s |
postprocess_variants (with gVCF) | 8m22.88s |
vcf_stats_report (optional) | 5m37.52s |
total | 113m11.70s (1h53m11.70s) |
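The total rows give each duration in minutes and seconds, followed by the same value in hours, minutes, and seconds. A minimal shell sketch of that conversion, assuming the `<minutes>m<seconds>s` format used in these tables (the `to_hms` function name is hypothetical and not part of DeepVariant):

```bash
# Rewrite a "<minutes>m<seconds>s" duration in the "<h>h<m>m<s>s" form
# shown in parentheses on the total rows.
to_hms() {
  awk -v d="$1" 'BEGIN {
    split(d, a, "[ms]")          # "113m11.70s" -> a[1]="113", a[2]="11.70"
    mins = a[1] + 0
    printf "%dh%dm%ss\n", int(mins / 60), mins % 60, a[2]
  }'
}

to_hms 113m11.70s   # prints 1h53m11.70s
```

The seconds field is passed through as a string, so its two-decimal formatting is preserved exactly as it appears in the table.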
hap.py results on HG003 (all chromosomes, using the NIST v4.2.1 truth set); HG003 was held out during training.
Type | TRUTH.TP | TRUTH.FN | QUERY.FP | METRIC.Recall | METRIC.Precision | METRIC.F1_Score |
---|---|---|---|---|---|---|
INDEL | 501653 | 2848 | 1289 | 0.994355 | 0.997541 | 0.995945 |
SNP | 3306740 | 20756 | 4386 | 0.993762 | 0.998676 | 0.996213 |
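hap.py computes recall as TRUTH.TP / (TRUTH.TP + TRUTH.FN) and F1 as the harmonic mean of precision and recall. Precision is computed from QUERY.TP, which these tables do not show, so it cannot be recomputed from the columns above. The `recall` and `f1` helpers below are hypothetical names in a small sketch for sanity-checking a row:

```bash
# Recompute Recall and F1 for one row of a hap.py table.
#   Recall = TRUTH.TP / (TRUTH.TP + TRUTH.FN)
#   F1     = 2 * Precision * Recall / (Precision + Recall)
recall() {
  awk -v tp="$1" -v fn="$2" 'BEGIN { printf "%.6f\n", tp / (tp + fn) }'
}

f1() {
  # Takes METRIC.Precision and METRIC.Recall as reported by hap.py.
  awk -v p="$1" -v r="$2" 'BEGIN { printf "%.6f\n", 2 * p * r / (p + r) }'
}

recall 501653 2848      # WGS INDEL row: prints 0.994355
f1 0.997541 0.994355    # WGS INDEL row: prints 0.995945
```

Both values match the WGS INDEL row above, which is a quick way to confirm that a table was copied correctly.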
WES (Illumina): runtime is measured on HG003 (all chromosomes) and averaged over 5 runs.
Stage | Time |
---|---|
make_examples | 3m17.64s |
call_variants | 0m56.36s |
postprocess_variants (with gVCF) | 0m39.27s |
vcf_stats_report (optional) | 0m4.93s |
total | 5m26.00s |
hap.py results on HG003 (all chromosomes, using the NIST v4.2.1 truth set); HG003 was held out during training.
Type | TRUTH.TP | TRUTH.FN | QUERY.FP | METRIC.Recall | METRIC.Precision | METRIC.F1_Score |
---|---|---|---|---|---|---|
INDEL | 1020 | 31 | 7 | 0.970504 | 0.993327 | 0.981783 |
SNP | 24984 | 295 | 60 | 0.98833 | 0.997604 | 0.992946 |
In release 1.8.0, we updated the PacBio test data from HG003 Sequel II to the latest Revio data with SPRQ chemistry, to showcase performance on the updated platform and chemistry. The numbers reported here were generated using the BAM at:
gs://deepvariant/pacbio-case-study-testdata/HG003.SPRQ.pacbio.GRCh38.nov2024.bam
PacBio (Revio, SPRQ chemistry): runtime is measured on HG003 (all chromosomes) and averaged over 5 runs.
Stage | Time |
---|---|
make_examples | 31m51.00s |
call_variants | 34m49.62s |
postprocess_variants (with gVCF) | 5m28.59s |
vcf_stats_report (optional) | 5m36.49s |
total | 86m50.09s (1h26m50.09s) |
Starting from v1.4.0, users no longer need to phase the BAMs first and only need to run DeepVariant once.
hap.py results on HG003 (all chromosomes, using the NIST v4.2.1 truth set); HG003 was held out during training.
Type | TRUTH.TP | TRUTH.FN | QUERY.FP | METRIC.Recall | METRIC.Precision | METRIC.F1_Score |
---|---|---|---|---|---|---|
INDEL | 500955 | 3546 | 3373 | 0.992971 | 0.993555 | 0.993263 |
SNP | 3321825 | 5670 | 4263 | 0.998296 | 0.99872 | 0.998508 |
ONT R10.4: runtime is measured on HG003 reads (all chromosomes) and averaged over 5 runs.
Stage | Time |
---|---|
make_examples | 53m25.60s |
call_variants | 55m24.86s |
postprocess_variants (with gVCF) | 7m17.83s |
vcf_stats_report (optional) | 6m30.29s |
total | 127m56.44s (2h7m56.44s) |
hap.py results on HG003 (all chromosomes, using the NIST v4.2.1 truth set); HG003 was held out during training.
Type | TRUTH.TP | TRUTH.FN | QUERY.FP | METRIC.Recall | METRIC.Precision | METRIC.F1_Score |
---|---|---|---|---|---|---|
INDEL | 452010 | 52491 | 40289 | 0.895955 | 0.920501 | 0.908062 |
SNP | 3321452 | 6032 | 3942 | 0.998187 | 0.998815 | 0.998501 |
Hybrid (PacBio + Illumina): runtime is measured on HG003 (all chromosomes) and averaged over 5 runs.
Stage | Time |
---|---|
make_examples | 71m52.43s |
call_variants | 51m42.37s |
postprocess_variants (with gVCF) | 4m6.13s |
vcf_stats_report (optional) | 5m18.39s |
total | 151m34.49s (2h31m34.49s) |
hap.py results on HG003 (all chromosomes, using the NIST v4.2.1 truth set); HG003 was held out while training the hybrid model.
Type | TRUTH.TP | TRUTH.FN | QUERY.FP | METRIC.Recall | METRIC.Precision | METRIC.F1_Score |
---|---|---|---|---|---|---|
INDEL | 503109 | 1392 | 2636 | 0.997241 | 0.995022 | 0.99613 |
SNP | 3324179 | 3316 | 2049 | 0.999003 | 0.999384 | 0.999194 |
The DeepVariant VCFs, gVCFs, and hap.py evaluation outputs are available at:
gs://deepvariant/case-study-outputs
You can also inspect them in a web browser here: https://42basepairs.com/browse/gs/deepvariant/case-study-outputs
For simplicity and consistency, we report runtime on a CPU instance with 96 vCPUs. This is NOT the fastest or cheapest configuration.
Use `gcloud compute ssh` to log in to the newly created instance.
Download the case study script, then run it with the model preset for the case study you want:
```bash
# Get the script.
curl -O https://raw.githubusercontent.com/google/deepvariant/r1.8/scripts/inference_deepvariant.sh

# WGS
bash inference_deepvariant.sh --model_preset WGS

# WES
bash inference_deepvariant.sh --model_preset WES

# PacBio
bash inference_deepvariant.sh --model_preset PACBIO

# ONT_R104
bash inference_deepvariant.sh --model_preset ONT_R104

# Hybrid
bash inference_deepvariant.sh --model_preset HYBRID_PACBIO_ILLUMINA
```
Runtime metrics are taken from the log output after each stage of DeepVariant, and each runtime number reported above is the average of 5 runs. The accuracy metrics come from the hap.py summary.csv output file. The runs are deterministic, so all 5 runs produced the same output.
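A sketch of how per-stage averages could be computed from several run logs. It assumes, without confirmation from this page, that each stage's log contains a bash `time` line such as `real	54m58.620s`, and that every log lists the stages in the same order. `mean_stage_times` is a hypothetical helper, not part of DeepVariant:

```bash
# Print the per-stage mean of every "real <minutes>m<seconds>s" line
# across all log files given as arguments (assumed log format).
mean_stage_times() {
  awk '
    $1 == "real" {
      split($2, t, "[ms]")          # "54m58.620s" -> t[1]="54", t[2]="58.620"
      stage = ++seen[FILENAME]      # 1st, 2nd, ... "real" line in this file
      sum[stage] += t[1] * 60 + t[2]
      runs[stage]++
      if (stage > nstages) nstages = stage
    }
    END {
      for (s = 1; s <= nstages; s++) {
        m = int(sum[s] / runs[s] * 100 + 0.5) / 100   # mean seconds, rounded to 0.01 s
        printf "stage %d: %dm%.2fs\n", s, int(m / 60), m - 60 * int(m / 60)
      }
    }' "$@"
}
```

For example, `mean_stage_times run1.log run2.log run3.log run4.log run5.log` (hypothetical file names) prints one averaged line per stage, in the order the stages appear in the logs.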