Interpretation of allele codes for multiallelic variants that have been split into separate lines #1764

Closed
astro-geno opened this issue Jul 28, 2022 · 7 comments

@astro-geno

Hi,

Just a quick question/clarification:

Let's say a multiallelic variant has two ALT alleles: ALT1 and ALT2. Before splitting this multiallelic variant, allele code '0' refers to the REF allele only. After splitting this multiallelic variant using 'bcftools norm -m -', the single record for this variant in the VCF has been split into two records, one for ALT1 and one for ALT2. In these two post-split records, allele code '0' now refers to any allele other than the single ALT allele defined in the newly split record, and doesn't simply mean REF allele, correct? In other words, in the post-split record for ALT1, allele code '1' refers to the ALT1 allele and allele code '0' collectively refers to REF or ALT2 (i.e., anything other than ALT1)? I just want to be sure I understand this correctly.

Thanks very much!
Aaron

@pd3
Member

pd3 commented Aug 12, 2022

Yes. Another way to look at it is to interpret non-zero indexes as string substitutions that modify the REF allele into the ALT allele, while 0 means no change.
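To make the recoding concrete, here is a minimal Python sketch of how a genotype at a two-ALT site maps onto the two post-split records. It is a simplified model of the behavior described in this thread, not bcftools' actual implementation; the function name `split_gt` is hypothetical and only the GT field is handled.

```python
import re

def split_gt(gt, n_alt):
    """Recode one multiallelic GT string into one GT string per ALT allele.

    Simplified model (assumption): in the record for ALT k, allele index k
    becomes 1 and every other called allele (REF or another ALT) becomes 0.
    """
    # Split on '/' or '|' but keep the separators so phasing is preserved.
    tokens = re.split(r"([/|])", gt)
    records = []
    for alt in range(1, n_alt + 1):
        out = []
        for i, tok in enumerate(tokens):
            if i % 2 == 1 or tok == ".":
                out.append(tok)      # separator or missing allele: unchanged
            elif int(tok) == alt:
                out.append("1")      # this record's own ALT allele
            else:
                out.append("0")      # REF or any other ALT collapses to 0
        records.append("".join(out))
    return records

for gt in ["0/1", "1/2", "2|2"]:
    print(gt, "->", split_gt(gt, 2))
# 0/1 -> ['0/1', '0/0']
# 1/2 -> ['1/0', '0/1']
# 2|2 -> ['0|0', '1|1']
```

Note how the `1/2` sample shows up as `1/0` in the ALT1 record: the `0` there stands for ALT2, not REF.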

@pd3 pd3 closed this as completed Aug 12, 2022
@astro-geno
Author

Thanks!

@benostendorf

Would it be worth considering adding an option to set non-REF and non-ALT1 alleles to NA/. instead of REF in this situation?
In my opinion there are situations where you'd like to compare ALT1 to REF rather than ALT1 to all non-ALT1.

@pd3
Member

pd3 commented Sep 20, 2022

@benostendorf Do you mean literally set the GT field to be NA/.? It would be best if you provided a specific test case demonstrating the behavior you'd like.

@pd3
Member

pd3 commented Oct 11, 2022

> Would it be worth considering adding an option to set non-REF and non-ALT1 alleles to NA/. instead of REF in this situation? In my opinion there are situations where you'd like to compare ALT1 to REF rather than ALT1 to all non-ALT1.

Just added a new option --multi-overlap, which is analogous to --atom-overlap and lets you choose between the ref (0) and the missing (.) allele for the other ALT alleles: d984ce9
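The difference between the two settings can be sketched in Python as follows. This is an illustrative model of the choice described above, not the bcftools code; the function name `split_gt_overlap` and its `overlap` parameter are hypothetical stand-ins for the `0` and `.` values of the option.

```python
import re

def split_gt_overlap(gt, n_alt, overlap="0"):
    """Split a multiallelic GT into per-ALT GTs.

    overlap="0": other ALT alleles are recoded as the ref allele (0).
    overlap=".": other ALT alleles are recoded as missing (.).
    Assumption: a true REF allele (0) stays 0 in both modes.
    """
    if overlap not in ("0", "."):
        raise ValueError("overlap must be '0' or '.'")
    tokens = re.split(r"([/|])", gt)  # keep '/' and '|' separators
    records = []
    for alt in range(1, n_alt + 1):
        out = []
        for i, tok in enumerate(tokens):
            if i % 2 == 1 or tok in (".", "0"):
                out.append(tok)       # separator, missing, or REF: unchanged
            elif int(tok) == alt:
                out.append("1")       # this record's own ALT allele
            else:
                out.append(overlap)   # a different ALT allele
        records.append("".join(out))
    return records

print(split_gt_overlap("1/2", 2, "0"))  # ['1/0', '0/1']
print(split_gt_overlap("1/2", 2, "."))  # ['1/.', './1']
print(split_gt_overlap("0/2", 2, "."))  # ['0/.', '0/1']
```

With the missing-allele setting, a `0` in the ALT1 record can only mean REF, which is what makes an ALT1-versus-REF comparison possible.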

@gernophil

Am I correct that this option is not integrated in the latest release (1.16) yet? If so, when will this happen?

@pd3
Member

pd3 commented Dec 8, 2022

@gernophil That's correct, you need the latest GitHub version for that; see http://samtools.github.io/bcftools/howtos/install.html for how to install it. I can't give a time estimate at the moment.
