granularity #209

parlar · 2020-01-28T16:37:12Z

Hi,

I have not actually used chanjo for coverage reports, but as I recall it reports "completeness" at the transcript or gene level.

Storing coverage data is a tricky business, since including overly detailed (per-base) information would quickly eat up a lot of space. However, somewhat more granularity could still be useful for assessing sequencing quality in different regions.

I have two questions.

Do you think it would be feasible to provide completeness info at the exon level? I ran some quick tests with a WGS dataset using all exons for all Ensembl transcripts and 4 completeness levels. The resulting data table came to 12 MB compressed and 63 MB uncompressed. Admittedly quite a lot, but it could be reduced significantly if, for example, CCDS were used instead. The size would also be reduced by using an SQL database, provided the data is sufficiently normalized.
In my mind, however, it would be a good thing if coverage data could be included directly in the scout system. In that case it would be convenient to store the data in MongoDB, although that prevents the use of JOINs and normalized data.
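
To make the idea concrete, here is a minimal sketch of what I mean by exon-level completeness at several levels: the fraction of bases in one exon covered at or above each depth cutoff. The function name `exon_completeness`, the four cutoffs (10, 20, 50, 100) and the example depths are all made up for illustration; this is not chanjo's implementation or the exact setup from my test.

```python
from typing import Dict, Sequence


def exon_completeness(
    depths: Sequence[int],
    thresholds: Sequence[int] = (10, 20, 50, 100),  # assumed cutoffs, illustration only
) -> Dict[int, float]:
    """Return, per threshold, the fraction of bases with depth >= threshold."""
    n = len(depths)
    if n == 0:
        # An empty exon has no covered bases; report 0.0 rather than dividing by zero.
        return {t: 0.0 for t in thresholds}
    return {t: sum(d >= t for d in depths) / n for t in thresholds}


# Hypothetical per-base read depths for a single 8 bp exon.
depths = [5, 12, 25, 60, 110, 30, 18, 9]
row = exon_completeness(depths)
print(row)  # {10: 0.75, 20: 0.5, 50: 0.25, 100: 0.125}
```

One such row per exon (an exon identifier plus four fractions) is roughly what the table in my test contained, which is why the size grows with the number of exons and levels rather than with the number of bases.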

Do you have any thoughts on this?

adrosenbaum · 2020-02-03T09:33:05Z

Hello,

Our plan during the spring is to do some development on chanjo, and one goal would be to facilitate storage of more granular data, e.g. exons.

Did you run the test using MongoDB or an SQL database? We have also been thinking of using MongoDB as a backend; however, we would first need to assess its performance against SQL.

moonso · 2020-04-20T11:58:52Z

Hi @parlar, there is an open PR for a MongoDB backend in #202; however, it is tricky to say at the moment when we will have time to make progress on it. We are hiring some people now and hope to start developing chanjo again soon. Unfortunately, we cannot give any time frame right now.

parlar · 2020-04-21T06:24:38Z

Thanks!


moonso added the question label Apr 20, 2020
