Correlation distances in kmerdb #95

MatthewRalston · 2022-12-06T14:31:54Z

MatthewRalston
Dec 6, 2022
Maintainer

Several correlation coefficients are available for interpreting k-mer signature similarities.

The first, and perhaps most useful, is the Spearman correlation coefficient, implemented through SciPy. The second is the Pearson correlation coefficient: not R² or any of its adjusted derivatives, but the true Pearson r. The implementation is custom Cython and may be inspected in the source. Other SciPy and statsmodels coefficients are available as distances.
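For anyone who wants to compare the two coefficients outside of kmerdb, here is a minimal sketch that calls SciPy directly. The two count vectors are made-up values, and this is not how kmerdb itself invokes SciPy; it only assumes both profiles use the same k and the same k-mer ordering.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical k-mer count vectors for two profiles.
# Assumption: same k, and index i refers to the same k-mer in both.
profile_a = np.array([12, 0, 7, 33, 5, 18, 2, 9], dtype=np.float64)
profile_b = np.array([10, 1, 9, 30, 4, 20, 3, 8], dtype=np.float64)

# Spearman correlates the ranks of the counts, so it is less sensitive
# to a handful of very abundant k-mers than Pearson is.
rho, rho_pvalue = spearmanr(profile_a, profile_b)

# Pearson correlates the raw counts.
r, r_pvalue = pearsonr(profile_a, profile_b)

print(f"Spearman rho = {rho:.4f}")
print(f"Pearson r    = {r:.4f}")
```

Both calls return a (statistic, p-value) pair, so the p-value can simply be ignored if only the coefficient is wanted as a similarity.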

MatthewRalston · 2022-12-06T14:55:51Z

MatthewRalston
Dec 6, 2022
Maintainer Author

The formula for the correlation coefficient is a very common one. It is essentially a summation of very small float64 deviations, and the coefficient does indeed converge to 1 when I correlate related datasets or species, depending on the resolution of k afforded by the profile. This float64 summation (or possibly a wider type; the actual data type is in the Cython source) is tricky and ugly, but it is how individual implementations produce their coefficients from the correct formulation.

r = ssxy/(np.sqrt(ssxx*ssyy))
In kmerdb/distances.pyx
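To make the terms in that expression concrete, here is a rough pure-Python sketch of the same formula. It is not the Cython code in kmerdb/distances.pyx; the function name `pearson_r` is hypothetical, and ssxy, ssxx and ssyy are assumed to be the usual sums of products and squares of deviations from the mean. `math.fsum` is used only because the sums involve many small floating-point terms.

```python
import math

def pearson_r(x, y):
    """Pearson r as ssxy / sqrt(ssxx * ssyy) (illustrative sketch only)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = math.fsum(x) / n
    mean_y = math.fsum(y) / n
    # Sums of deviation products; fsum limits accumulated rounding error.
    ssxy = math.fsum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ssxx = math.fsum((a - mean_x) ** 2 for a in x)
    ssyy = math.fsum((b - mean_y) ** 2 for b in y)
    if ssxx == 0.0 or ssyy == 0.0:
        raise ValueError("correlation is undefined for a constant vector")
    return ssxy / math.sqrt(ssxx * ssyy)

# Hypothetical count vectors: perfectly proportional, then perfectly reversed.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson_r([1, 2, 3], [3, 2, 1]))
```

The guard against a zero denominator matters for k-mer profiles, since a profile with identical counts for every k-mer has no variance to correlate.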
