
Enable JSON <-> YAML, JSON <-> binary conversion? #16

Open
julesjacobsen opened this issue Oct 27, 2021 · 8 comments

Comments

@julesjacobsen
Collaborator

Currently the converter only handles JSON. Might be an idea to offer conversion of other formats too.

@pnrobinson
Collaborator

@julesjacobsen not sure this is absolutely needed? If we stick with JSON that will mean we encourage people to use JSON as the primary format?

@pnrobinson
Collaborator

@julesjacobsen see the new class DefaultPhenopacketIngestor. We could add some methods to this class, such as public fromYamlFile(...), and a constructor DefaultPhenopacketIngestor(Message message). Thoughts?

@pnrobinson
Collaborator

@ielis is this issue closable? I think this is supported for some operations.

@ielis
Collaborator

ielis commented Nov 29, 2022

In principle, yes. Each command that reads or writes a phenopacket accepts/produces a phenopacket, family, or cohort in any of these formats.
The commands have the -f | --format option for the input data. The convert command has the --output-format option for choosing the, well, output format.

We do not have a command solely for format conversion (something similar to cat sample.bam | samtools view -S > file.sam). Implementing the command is a no-brainer, since we already have all the nuts and bolts. I just need a use case.

@andrewpatto

> @julesjacobsen not sure this is absolutely needed? If we stick with JSON that will mean we encourage people to use JSON as the primary format?

Just revisiting this: is JSON the primary format for phenopackets? Is this written down somewhere?

I am trying to do some dataset sharing (à la EGA) and was considering placing a phenopacket alongside each individual's genomic artifacts. But I was assuming it would need to be a protobuf file with some known file suffix, like pxf, to count as a primary-format Phenopacket.

e.g.

ABC.bam
ABC.vcf
ABC.pxf

And so, to that end, I was going to store some v2 JSON or YAML phenopackets for ease of editing and then convert them to protobuf using the CLI tool. (So this is my +1 for the general feature of being able to convert between formats with just the CLI tool, which is currently not possible: convert requires the input to be in v1 format.)

But if JSON is the primary way we expect phenopackets to be exchanged in the wild, then I can skip protobuf entirely.

Are there any suggested file naming conventions to let people know a file is a phenopacket (in JSON)?

@andrewpatto

I should add that I am starting by hand-crafting some examples for a demonstration of how this would all work, hence the hand editing of JSON or YAML.

Obviously, for a real system I would be translating from some clinical source like an EHR or REDCap, so I guess I would do that using the Java library and easily output whatever format I wanted.

I think the broader question still stands: if I have unlimited choice here, what is the primary "phenopacket" file format, and how should I name the files to make this clear?

@pnrobinson
Collaborator

Hi Andrew, there could be a lossless conversion between protobuf (binary), JSON, YAML, XML, SQL ... so there really isn't a primary format. My guess is that almost everybody would prefer JSON because of the tooling available for it.

@andrewpatto

In which case, having a tool that seamlessly converts between the formats might be useful: if I get a batch of phenopackets in protobuf but would prefer them in JSON, I can just run the CLI tool to convert them (rather than dusting off my Java and writing a small snippet using the library to do the same).


4 participants