About gray lines... #51
Comments
At the most basic level, grey links denote the start and end of link bundles. Even the large regions have grey lines at the start and end points of contiguous regions (though they may be hard to see). In theory, the tiny grey links could have colour in them too, but they are so small that the colour is not rendered. The grey lines are always rendered for every region, even a very small one, according to the link settings. Interpreting these links really depends on the link settings (
In terms of text-formatted information to analyze links statistically, that's a little tougher. Mostly I'm not sure how informative it is as a metric, as links are both joined together and filtered based on upstream settings. However, if you really want to, you can parse the
Let me know if that makes sense.
Appreciate your response, @JustinChu! Thanks for the tip; it now makes sense to me that those links could indeed be affected by mixing raw sequencing data. For context, the first plot depicts our genome built with our own PE data and public ONT data (from the SRA database), while the second one uses only our PE data. So the ONT+PE-built scaffolds suggest that the extra data may reveal some unique genomic events (even though we are mixing genomic data from two different plant lines of the same species) or misassemblies, compared with the reference genome. The second plot doesn't show the same, because reference-based scaffolding was performed using only PE-derived contigs of very short length (the maximum was approx. 150k), so the genome is nearly the same as the reference. Thanks again for the help!
Interesting. So you have been doing reference-guided scaffolding. If that is the case, it makes sense why the results are so much cleaner on the second plot. Indeed, the long-read scaffolded assembly could show real variation/translocations. The algorithm JupiterPlot uses to line up the scaffolds is essentially the same as a reference-guided scaffolder. In its original conception, in an ideal case, you don't want to use JupiterPlot with the same reference you scaffolded with. However, if your goal is to show that the long reads helped reduce the bias of the reference-guided assembly, I think you could make that argument. I would encourage looking into more de novo assembly methods in your assembly process, and additionally including reference-free metrics of assembly correctness (e.g. tools that only use the reads to measure misassemblies/inconsistencies).
Yeah, we've already done the last thing you suggest, but we needed a compact graphical depiction of the raw numerical results, just for internal purposes, to show people of other specializations what those metrics mean. We also attempted all types of de novo assembly (with practically all de novo assemblers based on short reads), but none of them satisfied our goal except MaSuRCA (because PE data alone is insufficient for adequate plant genome assembly, and we don't have any kind of TGS machine in our lab to date), so ref-based scaffolding is just a slightly better option for our purposes. Thanks a lot for your help and for the useful information and tips!
Greetings, I couldn't find any official contact, e.g. a mailing list, so I'm writing my question here... First, thank you very much for such user-friendly software! Appreciate it!
Second... Our junior bioinformatics team has been attempting to assemble a large Wild Soybean genome with two different strategies, so JupiterPlot was exactly one of the tools we needed to assess them. All genomes were assembled into contigs, reference-scaffolded, and then compared to that same reference; only the subsets of data used differ between them. I'd like to know how I should interpret the gray lines in those plots: are they local misassemblies, or just genomic events/features? Can we say that the different subsets of data (used in the different attempts) have a significant impact on the dramatic differences between the two plots? Is there any text-formatted information on how those lines are produced, so that we can analyse them statistically?
Thanks!
P.S. The parameters that were changed are g=10000 and ng=80.

