remove restriction of 10 alleles max in multiallelic sites #11

akotlar · 2024-01-04T02:18:43Z

Simplifies makeHetHomozygotes and replaces rune-by-rune reading with string-by-string reading, which allows an unlimited number of alleles. This is useful because dbSNP VCF files contain many submissions with > 9 alleles.

Previously we used a fast parsing strategy, which took a string like "0|1" and read it as runes (ASCII code points) '0', '|', and '1'. This let us avoid splitting on "|", and also avoid splitting on ":" when there is metadata about the genotypes (e.g. GT:AD).

Now, we do the necessary splitting, and evaluate each allele as a string, which may be composed of multiple runes, allowing genotypes like "100|0" to be evaluated (for a hypothetical site with >= 100 alternate alleles).
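For illustration, the splitting described above might look roughly like the following. This is a minimal sketch, not the code in main.go: `splitGenotype` and `hasAllele` are hypothetical names, and the sketch assumes the genotype is the first ":"-separated item of the sample field and does not handle missing calls such as "./.".

```go
package main

import (
	"fmt"
	"strings"
)

// splitGenotype is a hypothetical helper (not the function in main.go).
// It drops any metadata after the first ':' (e.g. the AD part of GT:AD)
// and splits the genotype on '|' (phased) or '/' (unphased).
func splitGenotype(field string) []string {
	gt, _, _ := strings.Cut(field, ":")
	return strings.FieldsFunc(gt, func(r rune) bool {
		return r == '|' || r == '/'
	})
}

// hasAllele compares whole allele strings, so "10" never matches "1".
func hasAllele(field, alleleNum string) bool {
	for _, allele := range splitGenotype(field) {
		if allele == alleleNum {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(splitGenotype("100|0:7,5")) // [100 0]
	fmt.Println(hasAllele("100|0", "100"))  // true
	fmt.Println(hasAllele("10/0", "1"))     // false
}
```

Because each allele is compared as a full string, the number of digits in an allele index no longer matters.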

We still keep a fast path for biallelic sites, entirely avoiding the need to do any processing besides exact match on the most common genotypes (0|0, 0/0, 1|0, 1/0, 0/1, 0|1).
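A fast path of this kind could be sketched as an exact-match switch. The function name `biallelicFastPath` and the returned labels are assumptions for illustration only; the sketch also assumes the input is the bare genotype string, with anything else falling through to the general parser.

```go
package main

import "fmt"

// biallelicFastPath is a hypothetical sketch of an exact-match fast path.
// ok is false when the genotype is not one of the listed common forms,
// in which case the caller would fall back to the general splitting logic.
func biallelicFastPath(gt string) (call string, ok bool) {
	switch gt {
	case "0|0", "0/0":
		return "ref", true
	case "1|0", "1/0", "0/1", "0|1":
		return "het", true
	}
	return "", false
}

func main() {
	fmt.Println(biallelicFastPath("0/1"))  // het true
	fmt.Println(biallelicFastPath("10|0")) // falls through: ok is false
}
```

A switch over constant strings needs no splitting or allocation, which is the point of keeping the common cases separate.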

Addresses #7

wingolab

I see there's a test case for a multiallelic site with > 10 alleles. Logic seems fine.

FYI, I needed to do the following to test it locally.

go mod init github.com/bystrogenomics/bystro-vcf
go mod tidy
go test -v ./...

cristinaetrv · 2024-01-08T22:46:24Z

main.go

-				continue
-			}
-
-			// We don't support haploid genotypes very well; I will count such sites


Is the new version going to represent haploid genotypes as haploid or still homozygous?

It will still be homozygous. In the for loop, each genotype value adds 1 to the genotype count (gtCount). If that genotype value matches alleleNum (the allele being split out of the decomposed multiallelic site, which will be the only ALT allele on that line in the output of bystro-vcf), then the alt count (altCount) also increases by 1.

bystro-vcf/main.go

Line 1017 in dfb9065

gtCount++

In the haploid case, if the only genotype value matches alleleNum, that site will be called homozygous/hemizygous, since altCount == 1 and gtCount == 1.
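The counting rule described in this thread can be sketched as follows. This is an illustration under assumptions, not the code at the referenced line: `classify` and its return labels are hypothetical, and only the gtCount/altCount comparison is taken from the discussion.

```go
package main

import "fmt"

// classify is a hypothetical sketch of the counting rule discussed above.
// Every genotype value increments gtCount; values equal to alleleNum also
// increment altCount. The call follows from comparing the two counts.
func classify(alleles []string, alleleNum string) string {
	gtCount, altCount := 0, 0
	for _, allele := range alleles {
		gtCount++
		if allele == alleleNum {
			altCount++
		}
	}
	switch {
	case altCount == 0:
		return "absent" // this ALT allele is not carried by the sample
	case altCount == gtCount:
		return "hom" // includes the haploid case: altCount == 1, gtCount == 1
	default:
		return "het"
	}
}

func main() {
	fmt.Println(classify([]string{"12"}, "12"))       // hom
	fmt.Println(classify([]string{"12", "0"}, "12"))  // het
	fmt.Println(classify([]string{"12", "12"}, "12")) // hom
}
```

Under this rule a haploid genotype is never heterozygous, which matches the answer given above.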

akotlar · 2024-01-09T05:02:28Z

I see there's a test case for a multiallelic site with > 10 alleles. Logic seems fine.

FYI, I needed to do the following to test it locally.
go mod init github.com/bystrogenomics/bystro-vcf
go mod tidy
go test -v ./...

Ah, I forgot to add the go.mod and go.sum, thanks.

cristinaetrv

Lgtm!

remove restriction of 10 alleles max in multiallelic sites

ce5f673

akotlar requested a review from cristinaetrv January 4, 2024 02:19

fix haploid with allele > 9, add test

1d1b7c5

akotlar requested a review from wingolab January 4, 2024 02:41

akotlar added 2 commits January 7, 2024 11:50

improve strictness of bi-allelic check, add clarifying comments

441c957

rune comparison is 15-20x faster than string, even with casts

dfb9065

wingolab previously approved these changes Jan 8, 2024

cristinaetrv reviewed Jan 8, 2024

add go.mod and go.sum

2c25734

akotlar dismissed wingolab’s stale review via 2c25734 January 9, 2024 04:59

cristinaetrv approved these changes Jan 9, 2024

akotlar merged commit 6c736a2 into master Jan 9, 2024

akotlar deleted the feature/more-than-10-alleles branch January 9, 2024 23:59

akotlar mentioned this pull request Jan 13, 2024

Review performance optimization of makeHetHomozygotes #7

Closed
