This issue was moved to a discussion.

should we store stats as YAML (or json) #79

Closed
sergpolly opened this issue Jan 31, 2020 · 4 comments

@sergpolly
Member

that's how we store stats now:

total_mapped    2189618376
total_nodups    1753432070
cis     1533122797
...
pair_types/WW   88884
pair_types/MU   404456330
...
cis_1kb+        998076606
cis_2kb+        836035718
...
chrom_freq/chr1/chr1    137125332
chrom_freq/chr1/chr10   1791283
...

That is hard to parse, and I believe YAML would serve us just fine. Should we switch?
would be useful for #78

@Phlya
Member

Phlya commented Jan 31, 2020 via email

@sergpolly
Member Author

things like pair_types:

...
pair_types/WW   88884
pair_types/MU   404456330
...

imply nested structure - i.e. I would want to parse it as

stats = {..., "pair_types": {"WW": 88884, "MU": 404456330}, ...}

I'm not sure pandas would help with that

Also, the MultiQC developers don't want to rely on pandas for whatever reason. pandas isn't the smallest dependency, I guess.
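
The nested parsing described above can be sketched with the standard library alone, no pandas required. This is only an illustration, not the parser pairtools actually uses: the function name `parse_flat_stats` is hypothetical, and it assumes every line is a key and an integer count separated by whitespace, with `/` marking nesting levels in the key.

```python
def parse_flat_stats(lines):
    """Build a nested dict from 'key<whitespace>value' lines.

    Assumption: keys use '/' as the nesting separator and
    every value is an integer count.
    """
    stats = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, value = line.split()
        *parents, leaf = key.split("/")
        node = stats
        for part in parents:
            # descend, creating intermediate dicts as needed
            node = node.setdefault(part, {})
        node[leaf] = int(value)
    return stats


example = [
    "total_mapped\t2189618376",
    "pair_types/WW\t88884",
    "pair_types/MU\t404456330",
    "chrom_freq/chr1/chr10\t1791283",
]
stats = parse_flat_stats(example)
```

With this input, `stats["pair_types"]` comes out as a plain dict keyed by pair type, and `chrom_freq` becomes a dict of dicts keyed by chromosome.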

@sergpolly
Member Author

This is how we currently parse a typical stats file in pairtools: https://github.com/mirnylab/pairtools/blob/d1ddf9c39a336662f7fc725fa5a70ec68df9ba95/pairtools/pairtools_stats.py#L263

With standard YAML, which is great for storing nested dicts and various small lists, it would simply look like:

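import yaml

# yaml.load() on a bare string would parse the file name itself, so open the file and use safe_load
with open("sample.nodups.stats.yml") as f:
    stats_dict = yaml.safe_load(f)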

and here is the ultimate goal:
https://multiqc.info/
https://multiqc.info/examples/hi-c/multiqc_report.html

@agalitsyna
Member

I switched the pairtools stats output to YAML in version 1.0.0: https://github.com/open2c/pairtools/pull/117/files#diff-e4b8770efd538564222d48d69b00ed2c5012a76b35c926f1aba227fe45db2309

I had to guess the best way to convert some fields, e.g. reporting chromosome pairs as slash-separated keys instead of a separate dict for each chromosome:

chrom_freq:
  chr1/chr1: 3
  chr1/chr2: 1
  chr2/chr3: 1

But this is minor and you may change it in the future.
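
If a consumer prefers one dict per chromosome, the slash-separated keys can be regrouped after loading. A minimal sketch, assuming the `chrom_freq` block has already been loaded into a plain dict and that chromosome names never contain `/`; the variable names are illustrative only:

```python
# chrom_freq as it would look after loading the YAML snippet above
chrom_freq = {"chr1/chr1": 3, "chr1/chr2": 1, "chr2/chr3": 1}

nested = {}
for key, count in chrom_freq.items():
    # assumption: exactly one '/' per key, separating the two chromosomes
    chrom1, chrom2 = key.split("/")
    nested.setdefault(chrom1, {})[chrom2] = count
```

Here `nested` ends up as `{"chr1": {"chr1": 3, "chr2": 1}, "chr2": {"chr3": 1}}`, so either layout can be recovered from the other.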

@open2c open2c locked and limited conversation to collaborators Apr 20, 2022
@agalitsyna agalitsyna converted this issue into discussion #129 Apr 20, 2022
