Add MechPredict plugin #772

Open
wants to merge 23 commits into
base: main

ainefairbrother
Collaborator

@ainefairbrother ainefairbrother commented Feb 5, 2025

JIRA ticket: ENSVAR-6662

Description

This PR adds the MechPredict plugin, which annotates missense variants with one of three predicted gene-level mechanisms:

  • Dominant-negative (DN)
  • Gain-of-function (GOF)
  • Loss-of-function (LOF)

MechPredict does this by reading in gene-level probabilities predicted by an external model and assigning the most likely mechanism based on empirically derived cut-offs described in the related manuscript. For example, if gene A has the probability values DN = 0.2, GOF = 0.3 and LOF = 0.9, the returned interpretation is "gene_predicted_as_associated_with_loss_of_function_mechanism".
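
The assignment step can be illustrated with a minimal sketch. This is not the plugin's code: it assumes a hypothetical tab-separated layout (gene, pDN, pGOF, pLOF), uses a placeholder cut-off of 0.5 for every mechanism (the real cut-offs are those in the manuscript and are not reproduced here), and assumes that the highest probability passing its cut-off wins. `classify_mechanism` is a made-up helper name, and the short labels stand in for the full interpretation strings.

```shell
# Sketch only: placeholder cut-offs, NOT the values from the manuscript.
# Reads "gene <TAB> pDN <TAB> pGOF <TAB> pLOF" on stdin (assumed layout).
classify_mechanism() {
    awk -F'\t' -v dn_cut=0.5 -v gof_cut=0.5 -v lof_cut=0.5 '
    {
        best = "none"; best_p = -1
        # keep the mechanism with the highest probability that passes its cut-off
        if ($2 + 0 >= dn_cut  && $2 + 0 > best_p) { best = "DN";  best_p = $2 + 0 }
        if ($3 + 0 >= gof_cut && $3 + 0 > best_p) { best = "GOF"; best_p = $3 + 0 }
        if ($4 + 0 >= lof_cut && $4 + 0 > best_p) { best = "LOF"; best_p = $4 + 0 }
        print $1 "\t" best
    }'
}

# Example from the description: DN = 0.2, GOF = 0.3, LOF = 0.9
printf 'geneA\t0.2\t0.3\t0.9\n' | classify_mechanism
```

With the example values only LOF passes the placeholder cut-off, so the sketch labels gene A as LOF, matching the interpretation quoted above.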

Notes

  • New VEP fields added by plugin
    • MechPredict_pDN: Numeric
    • MechPredict_pGOF: Numeric
    • MechPredict_pLOF: Numeric
    • MechPredict_interpretation: Character
  • The plugin only annotates transcript-variant pairs with missense_variant as the consequence. This is because the methods used by the authors to generate the predictions were optimised to assess missense mutations, the most common protein-altering mutations.
  • The plugin reads in MechPredict_input.tsv, which can be generated using the instructions in the module's header.
  • There is a known exception found during testing:
    • The 'test with 50 missense variants - should annotate all' test annotates only 49 variants. I believe this is due to VEP's most severe consequence functionality: if a variant-transcript pair has more than one consequence, VEP assigns the more severe one.
    • As such, in the affected case, start_lost is assigned over missense_variant, so missense_variant is removed as a consequence and the pair is not annotated by MechPredict.
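
One way to find the record that is left unannotated is to list the data lines of the output VCF that lack an interpretation string. The following is a sketch under assumptions: `list_unannotated` is a hypothetical helper, and it relies, as the checks in the Testing section do, on every annotated record containing the substring `_mechanism`.

```shell
# Hypothetical helper: print CHROM, POS, REF and ALT for VCF data lines
# that carry no MechPredict interpretation (no "_mechanism" substring).
list_unannotated() {
    grep -v "^#" "$1" | grep -v "_mechanism" | cut -f 1,2,4,5
}

# Usage (the path is a placeholder):
# list_unannotated MechPredict_test_missense_out.vcf
```

The consequence reported for any listed record can then be checked against the start_lost explanation given above.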

Testing

Test with 50 missense variants - should annotate all

# run vep with MechPredict
./vep --input_file /hps/software/users/ensembl/variation/fairbrot/data/test-data/clinvar_20210102_missense_50.vcf.gz \
--output_file /hps/software/users/ensembl/variation/fairbrot/MechPredict/MechPredict_test_missense_out.vcf \
--format vcf \
--vcf \
--dir_plugins /hps/software/users/ensembl/variation/fairbrot/VEP_plugins \
--plugin MechPredict,file=/nfs/production/flicek/ensembl/variation/data/MechPredict/MechPredict_input.tsv \
--offline \
--cache \
--cache_version 113 \
--dir_cache /nfs/production/flicek/ensembl/variation/data/VEP/tabixconverted \
--assembly GRCh38 \
--fasta /nfs/production/flicek/ensembl/variation/data/Homo_sapiens.GRCh38.dna.toplevel.fa.gz

# check output - count the data lines that carry a MechPredict interpretation
# (expect 49 rather than 50; see the known exception in Notes)
grep -v "^#" /hps/software/users/ensembl/variation/fairbrot/MechPredict/MechPredict_test_missense_out.vcf | \
    grep -c "_mechanism"

Test with 50 intron variants - should annotate none

# run vep with MechPredict
./vep --input_file /hps/software/users/ensembl/variation/fairbrot/data/test-data/clinvar_20210102_intron_50.vcf.gz \
--output_file /hps/software/users/ensembl/variation/fairbrot/MechPredict/MechPredict_test_intron_out.vcf \
--format vcf \
--vcf \
--dir_plugins /hps/software/users/ensembl/variation/fairbrot/VEP_plugins \
--plugin MechPredict,file=/nfs/production/flicek/ensembl/variation/data/MechPredict/MechPredict_input.tsv \
--offline \
--cache \
--cache_version 113 \
--dir_cache /nfs/production/flicek/ensembl/variation/data/VEP/tabixconverted \
--assembly GRCh38 \
--fasta /nfs/production/flicek/ensembl/variation/data/Homo_sapiens.GRCh38.dna.toplevel.fa.gz

# check output - count the data lines that carry a MechPredict interpretation
# (expect 0: intron variants should not be annotated)
grep -v "^#" /hps/software/users/ensembl/variation/fairbrot/MechPredict/MechPredict_test_intron_out.vcf | \
    grep -c "_mechanism"
