Are multi-nucleotide and complex variants ignored? #129

Open
rebber opened this issue Jan 18, 2024 · 2 comments

Comments


rebber commented Jan 18, 2024

Hi,

We use somaticseq only to merge variants from Mutect2 and HMF Tools SAGE (the latter as "arbitrary" VCFs); the classification module is currently not used. However, we were missing some multi-nucleotide variants (MNVs) in the somaticseq output, so I looked into the somaticseq code to see how they are handled. It seems that any variant in the input VCFs whose REF and ALT are both longer than 1 base is ignored.

I see the following division into SNVs and indels, both in modify_ssMuTect2.py and splitVcf.py (used to prepare arbitrary VCFs):

if len(vcf_i.refbase) == 1 and len(vcf_i.altbase) == 1:
    snv_out.write( new_line + '\n' )
elif len(vcf_i.refbase) == 1 or len(vcf_i.altbase) == 1:
    indel_out.write( new_line + '\n' )

Any other variant, i.e. one with len(vcf_i.refbase) > 1 and len(vcf_i.altbase) > 1, is skipped.
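
The branching above can be reproduced in isolation to see which REF/ALT combinations fall through. This is a minimal sketch, not somaticseq code: `classify_variant` is a hypothetical helper that only mirrors the two length checks quoted above, and the example alleles are made up.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Mirror the length-based branching quoted above (hypothetical helper)."""
    if len(ref) == 1 and len(alt) == 1:
        return "snv"
    elif len(ref) == 1 or len(alt) == 1:
        return "indel"
    # Neither branch matches when both alleles are longer than one base.
    return "skipped"


# Made-up alleles: an SNV, an insertion, a deletion, an MNV, a complex variant.
examples = [("A", "T"), ("A", "ATG"), ("ATG", "A"), ("AC", "GT"), ("ACG", "TA")]

for ref, alt in examples:
    print(ref, alt, classify_variant(ref, alt))
```

With these checks, both the MNV (`AC` to `GT`) and the complex variant (`ACG` to `TA`) end up in the "skipped" case.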

Is it a correct observation that MNVs and complex variants are ignored? What was the reasoning behind setting it up like this? Is there any way to work around it?

We do not want to miss these types of variants, and will have to look into other tools if we can't avoid this behaviour with somaticseq.

Best regards
Rebecka

Contributor

litaifang commented Jan 22, 2024

Yeah, MNVs and complex variants are a limitation because they just haven't been our focus.
For VarDict, MNVs are parsed into multiple SNVs.
I can do similar things for MuTect2 and other outputs, i.e., parse MNVs into multiple SNVs, and parse complex variants into SNVs and indels. If you can give me examples of complex variants and MNVs from those outputs, I can incorporate them.
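
As an illustration of what parsing an MNV into multiple SNVs could look like, here is a minimal sketch. It is an assumption about the general approach, not the code somaticseq uses for VarDict: `split_mnv` is a hypothetical function, it handles only equal-length REF/ALT, and it does not cover complex variants.

```python
def split_mnv(chrom: str, pos: int, ref: str, alt: str):
    """Split an equal-length MNV into per-base SNV records (hypothetical helper).

    Returns a list of (chrom, pos, ref_base, alt_base) tuples, one per
    position where REF and ALT differ. Positions are 1-based, as in VCF.
    """
    if len(ref) != len(alt):
        raise ValueError("REF and ALT must have equal length for an MNV")
    return [
        (chrom, pos + offset, ref_base, alt_base)
        for offset, (ref_base, alt_base) in enumerate(zip(ref, alt))
        if ref_base != alt_base  # matching bases are not variants
    ]


# Made-up record: ACG to TCA at chr1:100 differs at the first and third base.
for record in split_mnv("chr1", 100, "ACG", "TCA"):
    print(record)
```

Splitting this way keeps every changed base, but the resulting records no longer say that the changes were called together as one variant.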

Author

rebber commented Jan 26, 2024

Thanks for a quick reply!
Primarily, we want to keep MNVs and complex variants together so that VEP annotates them properly. We will therefore look into another solution for merging variants from different callers.
