granularity #209
Hello, our plan for the spring is to do some development on chanjo, and one goal is to facilitate storage of more granular data, e.g. exons. Did you run the test using MongoDB or an SQL database? We have also been thinking of using MongoDB as a backend, but we would first need to assess its performance against SQL.
Thanks!
On Mon, Apr 20, 2020 at 1:59 PM Måns Magnusson wrote:
Hi @parlar, there is a PR for using a MongoDB backend open in #202. However, it is tricky to say at the moment when we will have time to get anywhere with it. We are hiring some people now and hope to start developing chanjo again soon. Unfortunately, we cannot give any time frame right now.
--
Pär Larsson, PhD
Clinical scientist, Bioinformatician
Laboratory Medicine, Clinical Genetics / Pathology
Umeå University Hospital
901 87 Umeå
par.g.larsson@vll.se
par.larsson@medbio.umu.se
+46 90 785 2802
Hi,
I have not actually used chanjo for coverage reports, but as I recall it provides reports of "completeness" at the transcript or gene level.
Storing coverage data is a tricky business, since including overly detailed information (per base) would quickly eat up a lot of space. However, somewhat more granularity might still be useful for assessing the sequencing quality in different regions.
I have two questions.
Do you think it would be feasible to provide completeness info at the exon level? I ran some quick tests on a WGS dataset, using all exons for all Ensembl transcripts and 4 completeness levels. The resulting data table came to 12 MB compressed and 63 MB uncompressed. Admittedly that is quite a lot, but it could be reduced significantly if, for example, CCDS were used instead. The size would also be reduced by using an SQL database if the data is sufficiently normalized.
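To make the idea concrete, here is a minimal sketch of what I mean by exon-level completeness: the fraction of bases in an exon covered at or above each cutoff. The function name, the four thresholds, and the depth values are hypothetical and only for illustration; this is not chanjo's API, and the cutoffs are not necessarily the ones from my test.

```python
from typing import Dict, Sequence

# Hypothetical cutoffs, chosen only for this illustration.
THRESHOLDS = (10, 20, 50, 100)


def exon_completeness(
    depths: Sequence[int], thresholds: Sequence[int] = THRESHOLDS
) -> Dict[int, float]:
    """Return, per threshold, the fraction of bases with depth >= threshold."""
    n_bases = len(depths)
    if n_bases == 0:
        # An empty exon has no covered bases by definition here.
        return {t: 0.0 for t in thresholds}
    return {t: sum(d >= t for d in depths) / n_bases for t in thresholds}


# Made-up per-base depths for a 10 bp exon.
example = exon_completeness([5, 12, 25, 60, 150, 8, 30, 30, 30, 30])
print(example)
```

With four thresholds, each exon contributes four numbers per sample instead of one value per base, which is why the table stays relatively small.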
In my mind, however, it would be a good thing if coverage data could be included directly in the scout system. In that case it would also be convenient to store the data in MongoDB, although that prevents the use of JOINs and normalized data.
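A rough sketch of the trade-off I have in mind, using Python's built-in `sqlite3` for the normalized variant and a plain dict for what a MongoDB-style document might look like. All table, column, and field names are assumptions made for this example; they are taken from neither chanjo nor scout.

```python
import sqlite3

# Normalized variant: exon coordinates are stored once and joined in on demand.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE exon (
        exon_id   TEXT PRIMARY KEY,
        chrom     TEXT NOT NULL,
        start_pos INTEGER NOT NULL,
        end_pos   INTEGER NOT NULL
    );
    CREATE TABLE completeness (
        sample_id TEXT NOT NULL,
        exon_id   TEXT NOT NULL REFERENCES exon(exon_id),
        threshold INTEGER NOT NULL,
        fraction  REAL NOT NULL,
        PRIMARY KEY (sample_id, exon_id, threshold)
    );
    """
)
conn.execute("INSERT INTO exon VALUES (?, ?, ?, ?)", ("exon_1", "1", 1000, 1200))
conn.executemany(
    "INSERT INTO completeness VALUES (?, ?, ?, ?)",
    [("sample_A", "exon_1", 10, 0.8), ("sample_A", "exon_1", 20, 0.7)],
)
rows = conn.execute(
    """
    SELECT e.chrom, e.start_pos, e.end_pos, c.threshold, c.fraction
    FROM completeness AS c
    JOIN exon AS e ON e.exon_id = c.exon_id
    WHERE c.sample_id = ?
    ORDER BY c.threshold
    """,
    ("sample_A",),
).fetchall()

# Denormalized variant: one self-contained document per sample and exon,
# with the exon coordinates repeated in every document.
document = {
    "sample_id": "sample_A",
    "exon_id": "exon_1",
    "chrom": "1",
    "start": 1000,
    "end": 1200,
    "completeness": {"10": 0.8, "20": 0.7},
}
```

The document form repeats the coordinates for every sample, which costs space, but each record can be read without a join.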
Do you have any thoughts on this?