Perf tweaks #1

Closed
wants to merge 41 commits into from
41 commits
23ad7ae  PERF: more direct use of numpy (wasade, Jul 11, 2016)
2a5d4fe  Adding license (mortonjt, Jul 16, 2016)
2721bf4  Merge pull request #9 from mortonjt/license.txt (mortonjt, Jul 16, 2016)
b979d41  Last touches before release (mortonjt, Jul 16, 2016)
1be0fe7  Merge pull request #10 from mortonjt/license.txt (mortonjt, Jul 16, 2016)
362fe9e  ENH: Adding util functions (mortonjt, Jul 19, 2016)
a1b9cef  STY: pep8 (mortonjt, Jul 19, 2016)
2908c93  STY: Flake8 to the rescue (mortonjt, Jul 19, 2016)
e2e2e63  STY: Addressing @antgonza and @josenavas comments (mortonjt, Jul 19, 2016)
d6a037b  STY: pep8 flake8 (mortonjt, Jul 19, 2016)
38b64c8  GPL license (mortonjt, Jul 19, 2016)
1cbc6ad  Adding changelog (mortonjt, Jul 19, 2016)
f79d9a6  Merge pull request #13 from mortonjt/license.txt (antgonza, Jul 19, 2016)
d197605  DOC: Changing method name (mortonjt, Jul 19, 2016)
2ded931  STY: clean up pep8/flake8 (mortonjt, Jul 19, 2016)
a54d051  Merge branch 'master' of https://github.com/biocore/gneiss into util (mortonjt, Jul 20, 2016)
477d196  DOC: Updating changelog (mortonjt, Jul 20, 2016)
e0c7505  ENH: Adding in niche sorting algorithm (mortonjt, Jul 25, 2016)
3d9ded6  TST: Adding in length and nan tests (mortonjt, Jul 25, 2016)
4c8d41d  DOC: typo (mortonjt, Jul 25, 2016)
371cc16  ENH: Adding inplace option (mortonjt, Jul 25, 2016)
6d70cf9  STY: Code dereplication (mortonjt, Jul 26, 2016)
9003db8  DOC: cleaning up docstrings (mortonjt, Jul 26, 2016)
b049ced  pep8 (mortonjt, Jul 26, 2016)
ff3df22  Merge pull request #12 from mortonjt/util (antgonza, Jul 26, 2016)
0ed7267  Merge branch 'master' of https://github.com/biocore/gneiss into niche… (mortonjt, Jul 26, 2016)
ac1832d  FIX: Fixing ordering (mortonjt, Jul 26, 2016)
42221f5  pep8 (mortonjt, Jul 26, 2016)
23acedf  Adding @josenavas and @antgonza comments (mortonjt, Jul 27, 2016)
940d7e8  TST: Adding callable error (mortonjt, Jul 27, 2016)
7ee169c  DOC: Adding raises section (mortonjt, Jul 27, 2016)
0168b42  DOC: Clarifying formulat in `mean_niche_estimator` (mortonjt, Jul 27, 2016)
c9e2373  DOC: Correcting formula (mortonjt, Jul 27, 2016)
8f2d1a7  Merge pull request #16 from mortonjt/nichesort (antgonza, Jul 27, 2016)
c51b215  Updating changelog (mortonjt, Jul 28, 2016)
22cbcc3  Merge pull request #19 from mortonjt/nichesort (josenavas, Jul 28, 2016)
6a247a7  Typo in ipython notebook (mortonjt, Jul 29, 2016)
f13954d  Fixing typo in ipython notebook (mortonjt, Jul 29, 2016)
452d62c  Adding another test (mortonjt, Aug 4, 2016)
ecc88d2  Merge branch 'master' of https://github.com/biocore/gneiss into perf_… (mortonjt, Aug 4, 2016)
475b96f  Adding additional test cases (mortonjt, Aug 4, 2016)
10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,10 @@
+# gneiss changelog
+
+## Version 0.0.2 (changes since 0.0.2 go here)
+
+### Features
+* Adding in a niche sorting algorithm `gneiss.sort.niche_sort` that can generate a band table given a gradient [#16](https://github.com/biocore/gneiss/pull/16)
+* Adding in utility functions for handling feature tables, metadata, and trees. [#12](https://github.com/biocore/gneiss/pull/12)
+* Adding GPL license.
+
+### Bug fixes
674 changes: 674 additions & 0 deletions COPYING.txt

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions gneiss/__init__.py
@@ -1,12 +1,12 @@
 # ----------------------------------------------------------------------------
 # Copyright (c) 2016--, gneiss development team.
 #
-# Distributed under the terms of the Modified BSD License.
+# Distributed under the terms of the GPLv3 License.
 #
 # The full license is in the file COPYING.txt, distributed with this software.
 # ----------------------------------------------------------------------------

 from __future__ import absolute_import, division, print_function


-__version__ = "0.0.1"
+__version__ = "0.0.2"
134 changes: 79 additions & 55 deletions gneiss/balances.py
@@ -10,25 +10,37 @@
 def _balance_basis(tree_node):
     """ Helper method for calculating balance basis
     """
-    counts, n_tips = _count_matrix(tree_node)
-    counts = OrderedDict([(x, counts[x])
-                          for x in counts.keys() if not x.is_tip()])
-    nds = counts.keys()
-    r = np.array([counts[n]['r'] for n in nds])
-    s = np.array([counts[n]['l'] for n in nds])
-    k = np.array([counts[n]['k'] for n in nds])
-    t = np.array([counts[n]['t'] for n in nds])
+    # TODO: use recarray
+    # col 0 -> right counts
+    # col 1 -> left counts
+    # col 2 -> k
+    # col 3 -> t
+    r_idx = 0
+    l_idx = 1
+    k_idx = 2
+    t_idx = 3
+
+    counts, n_tips, n_nodes = _count_matrix(tree_node)
+    r = counts[:, r_idx]
+    s = counts[:, l_idx]
+    k = counts[:, k_idx]
+    t = counts[:, t_idx]

     a = np.sqrt(s / (r*(r+s)))
     b = -1*np.sqrt(r / (s*(r+s)))

     basis = np.zeros((n_tips-1, n_tips))
-    for i in range(len(nds)):
-        basis[i, :] = np.array([0]*k[i] + [a[i]]*r[i] + [b[i]]*s[i] + [0]*t[i])
-    # Make sure that the basis is in level order
-    basis = basis[:, ::-1]
-    nds = list(nds)
-    return basis, nds
+    for i in np.arange(n_nodes - n_tips, dtype=int):
+        v = basis[i]
+
+        k_i = n_tips - k[i]
+        r_i = k_i - r[i]
+        s_i = r_i - s[i]
+
+        v[r_i:k_i] = a[i]
+        v[s_i:r_i] = b[i]
+
+    return basis, [n for n in tree_node.levelorder() if not n.is_tip()]


 def balance_basis(tree_node):
@@ -90,51 +102,63 @@ def balance_basis(tree_node):


 def _count_matrix(treenode):
-    n_tips = 0
-    nodes = list(treenode.levelorder(include_self=True))
-    # fill in the Ordered dictionary. Note that the
-    # elements of this Ordered dictionary are
-    # dictionaries.
-    counts = OrderedDict()
-    columns = ['k', 'r', 'l', 't', 'tips']
-    for n in nodes:
-        if n not in counts:
-            counts[n] = {}
-            for c in columns:
-                counts[n][c] = 0
-
-    # fill in r and l. This is done in reverse level order.
-    for n in nodes[::-1]:
+    node_count = 0
+    for n in treenode.postorder(include_self=True):
+        node_count += 1
         if n.is_tip():
-            counts[n]['tips'] = 1
-            n_tips += 1
-        elif len(n.children) == 2:
-            lchild = n.children[0]
-            rchild = n.children[1]
-            counts[n]['r'] = counts[rchild]['tips']
-            counts[n]['l'] = counts[lchild]['tips']
-            counts[n]['tips'] = counts[n]['r'] + counts[n]['l']
+            n._tip_count = 1
         else:
-            raise ValueError("Not a strictly bifurcating tree!")
-
-    # fill in k and t
-    for n in nodes:
-        if n.parent is None:
-            counts[n]['k'] = 0
-            counts[n]['t'] = 0
-            continue
-        elif n.is_tip():
+            try:
+                left, right = n.children
+            except:
+                raise ValueError("Not a strictly bifurcating tree!")
+            n._tip_count = left._tip_count + right._tip_count
+
+    # TODO: use recarray
+    # col 0 -> right counts
+    # col 1 -> left counts
+    # col 2 -> k
+    # col 3 -> t
+    r_idx = 0
+    l_idx = 1
+    k_idx = 2
+    t_idx = 3
+    counts = np.zeros((node_count, 4), dtype=int)
+
+    for i, n in enumerate(treenode.levelorder(include_self=True)):
+        if n.is_tip():
             continue
-        # left or right child
-        # left = 0, right = 1
-        child_idx = 'l' if n.parent.children[0] != n else 'r'
-        if child_idx == 'l':
-            counts[n]['t'] = counts[n.parent]['t'] + counts[n.parent]['l']
-            counts[n]['k'] = counts[n.parent]['k']
+
+        n._lo_idx = i
+        node_counts = counts[i]
+
+        node_counts[r_idx] = 1 if n.is_tip() else n.children[1]._tip_count
+        node_counts[l_idx] = 1 if n.is_tip() else n.children[0]._tip_count
+
+        if n.is_root():
+            k = 0
+            t = 0
         else:
-            counts[n]['k'] = counts[n.parent]['k'] + counts[n.parent]['r']
-            counts[n]['t'] = counts[n.parent]['t']
-    return counts, n_tips
+            parent_counts = counts[n.parent._lo_idx]
+            if n is n.parent.children[0]:
+                #t = parent_counts[t_idx] + parent_counts[l_idx]
+                #k = parent_counts[k_idx]
+
+                k = parent_counts[k_idx] + parent_counts[r_idx]
+                t = parent_counts[t_idx]
+            else:
+                #k = parent_counts[k_idx] + parent_counts[r_idx]
+                #t = parent_counts[t_idx]
+
+                k = parent_counts[k_idx]
+                t = parent_counts[t_idx] + parent_counts[l_idx]
+
+        node_counts[k_idx] = k
+        node_counts[t_idx] = t
+
+        counts[i] = node_counts
+
+    return counts, treenode._tip_count, node_count


 def _attach_balances(balances, tree):
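The coefficients `a` and `b` in `_balance_basis` are chosen so that each basis row sums to zero and has unit length. The following is a minimal sketch of one row, assuming only NumPy. `balance_row` is a hypothetical helper, not part of gneiss, and the reading of `k` as an offset counted from the right-hand end of the row is an assumption taken from the slicing in the new loop.

```python
import numpy as np


def balance_row(n_tips, r, s, k):
    """Hypothetical helper: build one basis row the way the new loop does.

    r, s: tip counts under the right and left child.
    k: offset from the right-hand end of the row (assumption read off the
       slicing in the diff, not a documented gneiss convention).
    """
    a = np.sqrt(s / (r * (r + s)))
    b = -np.sqrt(r / (s * (r + s)))
    row = np.zeros(n_tips)
    k_i = n_tips - k
    r_i = k_i - r
    s_i = r_i - s
    row[r_i:k_i] = a  # r entries receive the positive coefficient
    row[s_i:r_i] = b  # s entries receive the negative coefficient
    return row


row = balance_row(n_tips=5, r=2, s=3, k=0)
print(row)
# r*a + s*b = 0 and r*a**2 + s*b**2 = 1, so both checks should hold.
print(np.isclose(row.sum(), 0.0), np.isclose(row @ row, 1.0))
```

Writing into slices of a preallocated row, as the new loop does, avoids building a Python list per node, which appears to be the point of the "more direct use of numpy" commit.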
8 changes: 8 additions & 0 deletions gneiss/layouts.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,11 @@
+# ----------------------------------------------------------------------------
+# Copyright (c) 2016--, gneiss development team.
+#
+# Distributed under the terms of the Modified BSD License.
+#
+# The full license is in the file COPYING.txt, distributed with this software.
+# ----------------------------------------------------------------------------
+
 from ete3 import faces, AttrFace, CircleFace, BarChartFace


116 changes: 116 additions & 0 deletions gneiss/sort.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,116 @@
+# ----------------------------------------------------------------------------
+# Copyright (c) 2016--, gneiss development team.
+#
+# Distributed under the terms of the GPLv3 License.
+#
+# The full license is in the file COPYING.txt, distributed with this software.
+# ----------------------------------------------------------------------------
+import numpy as np
+import pandas as pd
+from functools import partial
+from gneiss.util import match
+
+
+def mean_niche_estimator(abundances, gradient):
+    r""" Estimates the mean niche of an organism.
+
+    Calculates the mean niche of an organism along a gradient.
+    This is done by calculating the expected value of an organism
+    across the gradient.
+
+    Specifically, this function calculates the following
+
+    .. math::
+        E[g | x] =
+        \sum\limits_{i=1}^N g_i \frac{x_i}{\sum\limits_{j=1}^N x_j}
+
+    Where :math:`N` is the number of samples, :math:`x_i` is the proportion of
+    species :math:`x` in sample :math:`i`, :math:`g_i` is the gradient value
+    at sample :math:`i`.
+
+    Parameters
+    ----------
+    abundances : pd.Series, np.float
+        Vector of fraction abundances of an organism over
+        a list of samples.
+    gradient : pd.Series, np.float
+        Vector of numerical gradient values.
+
+    Returns
+    -------
+    np.float :
+        The mean gradient that the organism lives in.
+
+    Raises
+    ------
+    ValueError:
+        If the length of `abundances` is not the same length as `gradient`.
+    ValueError:
+        If `gradient` contains nans.
+    """
+    len_abundances = len(abundances)
+    len_gradient = len(gradient)
+    if len_abundances != len_gradient:
+        raise ValueError("Length of `abundances` (%d) doesn't match the length"
+                         " of the `gradient` (%d)" % (len_abundances,
+                                                      len_gradient))
+    if np.any(pd.isnull(gradient)):
+        raise ValueError("`gradient` cannot have any nans.")
+
+    # normalizes the proportions of the organism across all of the
+    # samples to add to 1.
+    v = abundances / abundances.sum()
+    m = np.dot(gradient, v)
+    return m
+
+
+def niche_sort(table, gradient, niche_estimator=mean_niche_estimator):
+    """ Sort the table according to estimated niches.
+
+    Sorts the table by samples along the gradient
+    and otus by their estimated niche along the gradient.
+
+    Parameters
+    ----------
+    table : pd.DataFrame
+        Contingency table where samples are rows and
+        features (i.e. OTUs) are columns.
+    gradient : pd.Series
+        Vector of numerical gradient values.
+    niche_estimator : function, optional
+        A function that takes in two pandas series and returns an ordered
+        object. The ability for the object to be ordered is critical, since
+        this will allow the table to be sorted according to this ordering.
+        By default, `mean_niche_estimator` will be used.
+
+    Returns
+    -------
+    pd.DataFrame :
+        Sorted table according to the gradient of the samples, and the niches
+        of the organisms along that gradient.
+
+    Raises
+    ------
+    ValueError :
+        Raised if `niche_estimator` is not a function.
+    """
+    if not callable(niche_estimator):
+        raise ValueError("`niche_estimator` is not a function.")
+
+    table, gradient = match(table, gradient)
+
+    niche_estimator = partial(niche_estimator,
+                              gradient=gradient)
+
+    # normalizes feature abundances to sum to 1, for each sample.
+    # (i.e. scales values in each row to sum to 1).
+    normtable = table.apply(lambda x: x/x.sum(), axis=1)
+
+    # calculates estimated niche for each feature
+    est_niche = normtable.apply(niche_estimator, axis=0)
+    gradient = gradient.sort_values()
+    est_niche = est_niche.sort_values()
+
+    table = table.reindex(index=gradient.index,
+                          columns=est_niche.index)
+    return table
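The sorting procedure in `niche_sort` can be sketched on toy data without importing gneiss. This is an illustration under stated assumptions, not the library's implementation: the data are made up, `mean_niche` is a hypothetical stand-in for `mean_niche_estimator`, and `gneiss.util.match` is assumed to align the gradient with the table's sample IDs, which is replaced here by a plain `reindex`.

```python
import numpy as np
import pandas as pd


def mean_niche(abundances, gradient):
    """Abundance-weighted mean of the gradient (the docstring's formula)."""
    v = abundances / abundances.sum()
    return float(np.dot(gradient.to_numpy(), v.to_numpy()))


# Toy data, made up for illustration: 3 samples (rows) x 3 features (columns).
table = pd.DataFrame(
    {"o1": [0.0, 1.0, 9.0], "o2": [9.0, 1.0, 0.0], "o3": [1.0, 8.0, 1.0]},
    index=["s1", "s2", "s3"],
)
gradient = pd.Series({"s1": 3.0, "s2": 1.0, "s3": 2.0})

# Assumed stand-in for gneiss.util.match: align gradient to the table rows.
gradient = gradient.reindex(table.index)

# Scale each sample (row) to sum to 1, then estimate a niche per feature.
normtable = table.div(table.sum(axis=1), axis=0)
est_niche = normtable.apply(lambda col: mean_niche(col, gradient), axis=0)

# Order samples by gradient value and features by estimated niche.
sorted_table = table.reindex(index=gradient.sort_values().index,
                             columns=est_niche.sort_values().index)
print(sorted_table)
```

With these numbers the estimated niches are 1.3 for `o3`, 1.9 for `o1` and 2.8 for `o2`, so the rows come out as `s2, s3, s1` and the columns as `o3, o1, o2`, which places the large counts along the diagonal band that the changelog entry describes.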
1 change: 1 addition & 0 deletions gneiss/tests/data/large_tree2.nwk

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions gneiss/tests/data/small_tree.nwk
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
+(O0:0.316212845735,(O1:0.0544249673249,((((O8:6.17283950617e-05,O9:6.17283950617e-05):0.000396077412446,(O6:0.00015943877551,O7:0.00015943877551):0.000298367031998):0.00183128228143,(O4:0.000555555555556,O5:0.000555555555556):0.00173353253338):0.0105194882737,(O2:0.00347222222222,O3:0.00347222222222):0.00933635414042):0.0416163909622):0.26178787841);