Mykrobe overcounts homopol deletions when making probes #148
For context, we have this example:
"Isoniazid": {
"predict": "R",
"called_by": {
"katG_GC1037G-GC2155074C": {
"variant": null,
"genotype": [
1,
1
],
"genotype_likelihoods": [
-711.0227088947968,
-417.9483717274669
],
"info": {
"coverage": {
"reference": {
"percent_coverage": 100.0,
"median_depth": 7,
"min_non_zero_depth": 6,
"kmer_count": 175,
"klen": 21
},
"alternate": {
"percent_coverage": 100.0,
"median_depth": 15,
"min_non_zero_depth": 13,
"kmer_count": 253,
"klen": 18
}
},
"expected_depths": [
24
],
"contamination_depths": [],
"filter": [],
"conf": 293
},
"_cls": "Call.VariantCall"
},
"katG_CC1038C-GG2155073G": {
"variant": null,
"genotype": [
1,
1
],
"genotype_likelihoods": [
-727.5071540961122,
-377.76816030583126
],
"info": {
"coverage": {
"reference": {
"percent_coverage": 100.0,
"median_depth": 7,
"min_non_zero_depth": 6,
"kmer_count": 160,
"klen": 21
},
"alternate": {
"percent_coverage": 100.0,
"median_depth": 15,
"min_non_zero_depth": 13,
"kmer_count": 253,
"klen": 18
}
},
"expected_depths": [
24
],
"contamination_depths": [],
"filter": [],
"conf": 350
},
"_cls": "Call.VariantCall"
},
"katG_CC1039C-GG2155072G": {
"variant": null,
"genotype": [
1,
1
],
"genotype_likelihoods": [
-729.808268296947,
-372.51398695693916
],
"info": {
"coverage": {
"reference": {
"percent_coverage": 100.0,
"median_depth": 7,
"min_non_zero_depth": 6,
"kmer_count": 158,
"klen": 21
},
"alternate": {
"percent_coverage": 100.0,
"median_depth": 15,
"min_non_zero_depth": 13,
"kmer_count": 253,
"klen": 18
}
},
"expected_depths": [
24
],
"contamination_depths": [],
"filter": [],
"conf": 357
},
"_cls": "Call.VariantCall"
}
}
},
The three mutations in the JSON above could conceivably be the same deletion... or a 2bp deletion, I guess. Given GC1037G, I guess it is probably a single 1bp deletion at 1038. Although I'm a bit confused about how the mutation names work: which of the two bases is the one at the stated position? If I pull out these three bases, with one flanking base on each side, from the katG reference, I get G GCC T. So if the position describes the first base, then 1039 should read CT1039C, right? But if the position describes the second base, it's the same problem, right?
Hmmm, wonder why I got a different sequence.
The probe generator makes one probe for each 1bp and 2bp deletion in katG and pncA. But if you make all the 1bp deletions in a homopolymer, they are essentially the same.
E.g. deleting one A from AAA could happen in 3 places and give the same sequence.
This leads to annoying cases where we report more than one mutation as detected, when it is basically the same thing detected two or more times.
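The AAA example can be checked directly. The sketch below is not Mykrobe's probe generator; `delete_bases` and `group_equivalent_deletions` are hypothetical helpers, and positions are 0-based string indices rather than gene coordinates. It groups every candidate deletion by the sequence it produces, so deletions that are indistinguishable at the sequence level end up in the same group.

```python
from collections import defaultdict


def delete_bases(seq: str, start: int, length: int = 1) -> str:
    """Return seq with `length` bases removed, starting at 0-based index `start`."""
    return seq[:start] + seq[start + length:]


def group_equivalent_deletions(seq: str, length: int = 1) -> dict:
    """Map each resulting sequence to the list of deletion starts that produce it."""
    groups = defaultdict(list)
    for start in range(len(seq) - length + 1):
        groups[delete_bases(seq, start, length)].append(start)
    return dict(groups)


# A homopolymer AAA with one flanking base on each side (illustrative sequence).
ref = "GAAAT"
groups = group_equivalent_deletions(ref)
for alt, starts in groups.items():
    print(alt, starts)
```

Here the three deletions inside the A run all yield `GAAT`, so they fall into a single group. One possible workaround along these lines, stated as an assumption rather than a description of the current code, would be to keep one probe per group, or to report all names in a group as a single call.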
Workaround suggestions