Assess_assembly Documentation #30

Open
nhartwic opened this issue Mar 9, 2019 · 5 comments

@nhartwic

nhartwic commented Mar 9, 2019

It would be really nice if the documentation explained the output files, specifically these two table types:

Percentage Errors

name     mean    q10     q50     q90
err_ont  1.609%  0.731%  1.197%  3.830%
err_bal  1.621%  0.734%  1.204%  3.915%
iden     0.406%  0.109%  0.247%  1.152%
del      0.434%  0.196%  0.290%  1.017%
ins      0.783%  0.379%  0.557%  1.756%

Q Scores

name     mean   q10    q50    q90
err_ont  17.94  21.36  19.22  14.17
err_bal  17.90  21.34  19.19  14.07
iden     23.91  29.61  26.07  19.39
del      23.63  27.07  25.37  19.93
ins      21.07  24.22  22.54  17.55

"ins" and "del" are straightforward. What are "iden", "err_ont", and "err_bal" errors. How are these q scores being computed and what do they represent in this context?

@changhan1110

I have the same questions. If you have any answers, please let me know.

@cjw85
Member

cjw85 commented Oct 29, 2019

iden measures the proportion of aligned (non-indel) bases that are not "identical" to their reference base; in other words, it is the substitution rate.
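
As an illustration, here is a minimal sketch of that substitution rate for one alignment. It assumes the CIGAR string uses the extended `=` (match) and `X` (mismatch) operations so that mismatches can be counted without the NM tag or the sequences. The helpers `cigar_counts` and `substitution_rate` are hypothetical and not part of assess_assembly.

```python
import re

# Matches one CIGAR element, e.g. "50=" or "3I", as (length, operation).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_counts(cigar: str) -> dict:
    """Sum the number of bases covered by each CIGAR operation."""
    counts = {}
    for length, op in CIGAR_RE.findall(cigar):
        counts[op] = counts.get(op, 0) + int(length)
    return counts


def substitution_rate(cigar: str) -> float:
    """Fraction of aligned (non-indel) columns that are mismatches.

    Assumes the extended CIGAR alphabet ('=' match, 'X' mismatch);
    insertions ('I') and deletions ('D') are excluded from both counts.
    """
    c = cigar_counts(cigar)
    matches, mismatches = c.get("=", 0), c.get("X", 0)
    return mismatches / (matches + mismatches)


# 98 identical bases and 2 substitutions; the 3 inserted and
# 2 deleted bases do not enter this rate.
rate = substitution_rate("50=1X30=3I18=1X2D")
print(f"{100 * rate:.3f}%")  # 2.000%
```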

err_ont and err_bal both measure the total error (substitutions, insertions, and deletions) contained within alignments. They differ in the divisor used: the former divides the error count by the alignment length, while the latter divides it by the reference span. We use the former exclusively (hence the "ont" suffix); the latter was added because some users prefer it.
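
To make the divisor difference concrete, here is a sketch built on two assumptions that the description above does not spell out: "alignment length" counts every alignment column (matches, mismatches, insertions and deletions), and "reference span" counts only the columns that consume reference bases (matches, mismatches and deletions). Under these assumptions the alignment length is never shorter than the reference span, so err_ont never exceeds err_bal, which is consistent with the table in the original post (1.609% vs 1.621%). The `error_rates` helper is hypothetical.

```python
import re

# Matches one CIGAR element, e.g. "50=" or "3I", as (length, operation).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_counts(cigar: str) -> dict:
    """Sum the number of bases covered by each CIGAR operation."""
    counts = {}
    for length, op in CIGAR_RE.findall(cigar):
        counts[op] = counts.get(op, 0) + int(length)
    return counts


def error_rates(cigar: str) -> tuple:
    """Return (err_ont, err_bal) as fractions for one alignment.

    Assumed (hypothetical) divisors:
      alignment length = '=' + 'X' + 'I' + 'D'  (every alignment column)
      reference span   = '=' + 'X' + 'D'        (reference bases consumed)
    """
    c = cigar_counts(cigar)
    sub, ins, dele = c.get("X", 0), c.get("I", 0), c.get("D", 0)
    aligned = c.get("=", 0) + sub          # non-indel columns
    errors = sub + ins + dele              # total error count
    err_ont = errors / (aligned + ins + dele)
    err_bal = errors / (aligned + dele)
    return err_ont, err_bal


# 7 errors (2 substitutions, 3 insertions, 2 deletions) over
# 105 alignment columns and a 102-base reference span.
err_ont, err_bal = error_rates("50=1X30=3I18=1X2D")
print(f"{100 * err_ont:.3f}% {100 * err_bal:.3f}%")  # 6.667% 6.863%
```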

The Q scores are simply the log transform (-10*log10[p]) of the error rates p, taken as fractions rather than percentages; equivalently, -10*log10[1 - a] for an accuracy a.
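
The transform can be checked against the two tables in the original post. The sketch below uses a hypothetical `qscore` helper (not part of assess_assembly) to convert the mean percentage errors into Q scores; the results agree with the Q Scores table to within the rounding of the percentages. Because the transform is decreasing, a low error percentile (q10) maps to a high Q score, which is why the Q Scores table runs from high to low across q10, q50 and q90.

```python
import math


def qscore(error_rate: float) -> float:
    """Phred-scaled quality for an error rate given as a fraction (0.01 = 1%)."""
    return -10 * math.log10(error_rate)


# Mean error rates (in %) copied from the "Percentage Errors" table.
percent_errors = {
    "err_ont": 1.609,
    "err_bal": 1.621,
    "iden": 0.406,
    "del": 0.434,
    "ins": 0.783,
}

for name, pct in percent_errors.items():
    # Divide by 100 to turn the percentage into a fraction first.
    print(f"{name:8s} {qscore(pct / 100):.2f}")
```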

@bktorrevillas

One further question regarding columns 3 and 4 of assm_stats.txt: what is the difference between coverage and ref_coverage, and what do these values indicate? Thanks very much. Great package!

@cjw85
Member

cjw85 commented Mar 4, 2021

coverage measures the proportion of the assembly contig that is covered by the alignment, whereas ref_coverage measures the same for the reference sequence.
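
A minimal sketch of both quantities for a single alignment. It assumes coverage counts the contig (query) bases consumed by the CIGAR string and ref_coverage counts the reference bases it consumes, each divided by the full length of that sequence and returned as a fraction. The `coverages` helper is hypothetical; the real tool may report percentages or combine several alignments per contig differently.

```python
import re

# Matches one CIGAR element, e.g. "80M" or "10S", as (length, operation).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def coverages(cigar: str, contig_length: int, reference_length: int) -> tuple:
    """Return (coverage, ref_coverage) as fractions for one alignment.

    coverage:     contig bases inside the alignment / contig length
    ref_coverage: reference bases inside the alignment / reference length
    Soft/hard clips ('S', 'H') are unaligned contig ends and count toward neither.
    """
    query_aligned = ref_aligned = 0
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=XI":
            query_aligned += n  # operation consumes contig (query) bases
        if op in "M=XDN":
            ref_aligned += n    # operation consumes reference bases
    return query_aligned / contig_length, ref_aligned / reference_length


# A 100-base contig with 10 clipped bases at each end,
# aligned to a 400-base reference.
cov, ref_cov = coverages("10S80M10S", contig_length=100, reference_length=400)
print(f"coverage={cov:.2f} ref_coverage={ref_cov:.2f}")  # coverage=0.80 ref_coverage=0.20
```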

@bktorrevillas

Thank you @cjw85, cheers!

4 participants