Correlation distances in kmerdb #95
MatthewRalston started this conversation in Show and tell
The formula for the Pearson correlation coefficient is a standard one. It is essentially a summation of very small float64 deviations, and the coefficients do converge toward 1 when I correlate related datasets or species, depending on the resolution afforded by the profile's choice of k. This float64 summation (the exact data type, possibly something wider, can be checked in the Cython source) is tricky and ugly, but it is how individual implementations arrive at their coefficients using the correct formulation:

r = ssxy/(np.sqrt(ssxx*ssyy))
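For illustration, here is a minimal pure-Python sketch of that formula. It assumes the usual definitions ssxx = sum((x_i - mean_x)^2), ssyy = sum((y_i - mean_y)^2) and ssxy = sum((x_i - mean_x)*(y_i - mean_y)) over two k-mer count profiles of equal length. This is not kmerdb's Cython implementation, and the function name `pearson_r` is hypothetical. Python floats are IEEE float64, and `math.fsum` is used here as one way to keep the many tiny deviations from being lost in the summation; whether kmerdb does anything similar is not stated in this thread.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Hypothetical helper: Pearson r = ssxy / sqrt(ssxx * ssyy) in float64."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    n = len(x)
    # Two-pass approach: center each profile first. This avoids the
    # cancellation-prone one-pass form sum(x*y) - n*mean_x*mean_y.
    mean_x = math.fsum(x) / n
    mean_y = math.fsum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    # math.fsum tracks exact partial sums, so small terms are not swamped
    # by large ones the way they can be with a naive running total.
    ssxy = math.fsum(a * b for a, b in zip(dx, dy))
    ssxx = math.fsum(a * a for a in dx)
    ssyy = math.fsum(b * b for b in dy)
    if ssxx == 0.0 or ssyy == 0.0:
        raise ValueError("r is undefined when a profile is constant")
    return ssxy / math.sqrt(ssxx * ssyy)


if __name__ == "__main__":
    counts_a = [10, 0, 3, 7, 1]
    counts_b = [20, 0, 6, 14, 2]  # exactly 2x counts_a, so r should be ~1.0
    print(pearson_r(counts_a, counts_b))
    print(pearson_r([1, 2, 3], [3, 2, 1]))  # perfectly anti-correlated
```

Centering before multiplying is the main design choice: it computes the same quantity as the expanded textbook formula but is much less sensitive to rounding when the counts are large and the deviations are small.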
Note that several correlation coefficients are available for interpreting k-mer signature similarities.
The first, and perhaps the most useful, is the Spearman rank correlation coefficient, implemented through SciPy. The second is the Pearson correlation coefficient itself: not R² or any of its adjusted derivatives, but the true Pearson coefficient. Its implementation is custom Cython code and may be inspected in the source. Other coefficients from SciPy and statsmodels are also available as distances.