-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Locus spec improvements #75
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@egor-dolzhenko I have completed my review. Besides a few comments below, one last question I have is that: this doesn't seem to be backwards compatible with the previous catalog format, does it? It requires a "LocusType" field to be present in the catalog.
{ | ||
public: | ||
CnvLocusSpecification( | ||
RegionId locusId, LocusType locusType, CnvLocusSubtype locusSubtype, ContigCopyNumber contigCopyNumber, | ||
GenomicRegion locusLocation, GenotyperParameters genotyperParams) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to keep locusLocation, at least for cnvLocusSpec. This is because the coordinates we give for the target and baseline variants are the coordinates used for counting. The true coordinates for the CNV variant (encoded as the locus here) can be different from the target or the baseline (say, in the example of an overlapping CNV). So we will need to specify them in the user input.
@@ -194,7 +194,7 @@ void CnvVariantJsonWriter::visit(CnvVariantFindings& cnvFindings) | |||
record_.clear(); | |||
record_["VariantId"] = locusSpec_.locusId(); | |||
record_["VariantType"] = "CNV"; | |||
record_["ReferenceRegion"] = encode(contigInfo_, locusSpec_.locusLocation()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is where we need the true coordinates of the CNV, which is taken from the catalog input.
|
||
const CnvLocusSubtype& locusSubtype() const { return locusSubtype_; } | ||
std::vector<GenomicRegion> regionsWithReads() const; | ||
const CnvType& locusSubtype() const { return cnvType_; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename locusSubtype to cnvType?
{ | ||
using VariantTuple = std::tuple<int32_t, int64_t, int64_t, LocusIdAndVariantId>; | ||
std::vector<VariantTuple> tuples; | ||
|
||
for (const auto& locusIdAndFindings : sampleFindings_) | ||
{ | ||
const string& locusId = locusIdAndFindings.first; | ||
const LocusSpecification& locusSpec = *(regionCatalog_.at(locusId)); | ||
// TO DO: add CNVs | ||
if (locusSpec.locusType() == LocusType::kGraph) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CNVVariantFindings are actually one per locus and go by locus id, so looking for locus id in variant ids will not be successful. I guess we can return a locusId-locusId pair for CNVs here, since by now we only need locus-level information, not variant-level information.
After discussion with @egor-dolzhenko, we decided to add an "OutputVariants" field to the catalog so variant findings can still go by variant instead of locus. Will merge this branch now. |
This pull request aims to streamline locus description and catalog loading process.