
Expose bcf_get_value_count on the record object #669


Open
multimeric opened this issue Apr 20, 2018 · 4 comments

@multimeric

bcf_get_value_count seems to calculate how many values a given INFO/FMT field should have on a given record. This would be useful when editing a VCF record without knowing how many values each field should have. Could this be exposed on the VariantRecord object, e.g. as VariantRecord#get_value_count(type: str, id: str)?
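For reference, the VCF specification ties a field's expected count to the Number code in its header: an integer is a fixed count, A means one value per ALT allele, R one per allele including REF, G one per possible genotype, and . an unspecified length. Below is a minimal pure-Python sketch of that rule. It is not pysam's internal bcf_get_value_count; the name expected_value_count and the default diploid ploidy are assumptions for illustration.

```python
from math import comb
from typing import Optional, Union


def expected_value_count(number: Union[int, str], n_alleles: int, ploidy: int = 2) -> Optional[int]:
    """Return how many values a field should have, following the VCF Number codes.

    n_alleles counts REF plus all ALT alleles (hypothetical helper, not pysam API).
    Returns None for Number='.' (length not determined by the record).
    """
    if isinstance(number, int):
        return number  # fixed count declared in the header
    if number == "A":
        return n_alleles - 1  # one value per ALT allele
    if number == "R":
        return n_alleles  # one value per allele, including REF
    if number == "G":
        # one value per possible genotype: C(n_alleles + ploidy - 1, ploidy)
        return comb(n_alleles + ploidy - 1, ploidy)
    if number == ".":
        return None  # variable length
    raise ValueError(f"unknown Number code: {number!r}")
```

For example, at a biallelic site (two alleles) a diploid Number=G field such as PL has expected_value_count("G", 2) == 3 values.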

I'll probably try writing this myself at some point, but I wanted to keep track of it as an issue.

@bioinformed
Member

Good idea! It will be a few weeks before I have time to work on this, but it will be extremely useful.

@multimeric
Author

Any chance of merging this?

@bioinformed
Member

Having to specify the type of attribute is too clunky. It would be better if we added:

rec.info.value_count(tag)
rec.samples[i].value_count(tag)

with some attention paid to encoding variable-length fields as None?
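A rough sketch of the proposed interface follows, using hypothetical stand-in classes InfoView and SampleView backed by plain dicts rather than pysam's real objects. It illustrates that the object the method is called on (INFO or a sample) decides which header definition applies, so no type argument is needed, and that Number='.' comes back as None. Assuming diploid for INFO fields with Number=G is also an assumption of this sketch.

```python
from math import comb
from typing import Dict, List, Optional, Union

Number = Union[int, str]


def _count(number: Number, n_alleles: int, ploidy: int) -> Optional[int]:
    """Map a VCF Number code to a value count for one record."""
    if isinstance(number, int):
        return number  # fixed count from the header
    per_code = {
        "A": n_alleles - 1,  # one per ALT allele
        "R": n_alleles,  # one per allele, including REF
        "G": comb(n_alleles + ploidy - 1, ploidy),  # one per possible genotype
    }
    return per_code.get(number)  # '.' (variable length) or an unknown code -> None


class InfoView:
    """Hypothetical stand-in for rec.info; counts come from INFO header definitions."""

    def __init__(self, info_numbers: Dict[str, Number], alleles: List[str]):
        self._numbers = info_numbers
        self._alleles = alleles

    def value_count(self, tag: str) -> Optional[int]:
        # INFO fields have no sample ploidy, so this sketch assumes diploid for Number=G
        return _count(self._numbers[tag], len(self._alleles), ploidy=2)


class SampleView:
    """Hypothetical stand-in for rec.samples[i]; counts come from FORMAT header definitions."""

    def __init__(self, format_numbers: Dict[str, Number], alleles: List[str], ploidy: int):
        self._numbers = format_numbers
        self._alleles = alleles
        self._ploidy = ploidy

    def value_count(self, tag: str) -> Optional[int]:
        return _count(self._numbers[tag], len(self._alleles), self._ploidy)


alleles = ["A", "C", "G"]  # REF plus two ALTs
info = InfoView({"DP": 1, "AF": "A", "NOTE": "."}, alleles)
sample = SampleView({"DP": 1, "AD": "R", "PL": "G"}, alleles, ploidy=2)
print(info.value_count("AF"), info.value_count("NOTE"))  # 2 None
print(sample.value_count("AD"), sample.value_count("PL"))  # 3 6
```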

Specifically, I'd also expect value_count(FMT/GT) to always be None, rather than returning the current cardinality or attempting to match other samples in the same record or in other records. All of those other use cases can be handled by testing len(gt).
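Under this proposal, the per-sample method would simply decline to report a count for GT, and callers needing the current cardinality or a comparison with other samples would use len() on the genotype. The sketch below assumes genotypes are tuples of allele indices with None for a missing allele (the shape pysam returns for sample['GT']); sample_value_count and ploidy_matches are hypothetical names, and non-integer Number codes are deliberately left unimplemented.

```python
from typing import Dict, Optional, Sequence, Tuple, Union

# Assumed genotype shape: allele indices, None for a missing allele, e.g. (0, 1) or (0, None)
Genotype = Tuple[Optional[int], ...]


def sample_value_count(tag: str, format_numbers: Dict[str, Union[int, str]]) -> Optional[int]:
    """Hypothetical per-sample value_count showing only the proposed GT rule."""
    if tag == "GT":
        # Proposed: never report a count for GT; its length is the sample's ploidy
        return None
    number = format_numbers[tag]
    if isinstance(number, int):
        return number
    # A/R/G/'.' handling is outside the scope of this sketch
    raise NotImplementedError(f"Number={number!r} not handled in this sketch")


def ploidy_matches(gt: Genotype, others: Sequence[Genotype]) -> bool:
    """The 'match other samples' use case, handled with len(gt) as suggested."""
    return all(len(other) == len(gt) for other in others)


gt = (0, 1)
print(sample_value_count("GT", {"GT": 1, "DP": 1}))  # None
print(len(gt))  # 2: the current cardinality
print(ploidy_matches(gt, [(1, 1), (0, None)]))  # True
```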

Does this make sense?

@multimeric
Author

Totally agree with putting value_count on the sample/info objects. I forgot that they are actually C extension objects rather than plain dictionaries, so they can have methods.

I'm not sure I follow regarding GT, though. It is effectively a field with a Number of R (although its values are separated by | or / rather than by commas). You can get the length of the field with len(gt), but I see no reason not to let this function cover all fields in a general way.
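If GT were covered by a general count function, its value count could be read straight from the call string: alleles are separated by / (unphased) or | (phased), so the count is the number of separated entries, i.e. the sample's ploidy. A minimal sketch, assuming VCF 4.2-style GT strings; gt_string_value_count is a hypothetical name, not an existing pysam function.

```python
import re


def gt_string_value_count(gt: str) -> int:
    """Count the allele entries in a VCF GT string such as '0/1', '1|2' or '.'."""
    # GT alleles are separated by '/' (unphased) or '|' (phased) instead of commas,
    # so the number of entries equals the sample's ploidy
    return len(re.split(r"[/|]", gt))


for call in ["0/1", "1|2", "./.", "0"]:
    print(call, gt_string_value_count(call))
```

Both '0/1' and '1|2' give 2, while a haploid call such as '0' gives 1.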


3 participants