[python] to_dataframe does not produce sparse data frames #808
Comments
Interesting. So, unlike Do you have a fix by chance?
In [1]: import biom
In [2]: print(biom.example_table)
# Constructed from biom file
#OTU ID S1 S2 S3
O1 0.0 1.0 2.0
O2 3.0 4.0 5.0
In [3]: biom.example_table.to_dataframe()
Out[3]:
S1 S2 S3
O1 0.0 1.0 2.0
O2 3.0 4.0 5.0
In [4]: biom.example_table.to_dataframe().info()
<class 'pandas.core.sparse.frame.SparseDataFrame'>
Index: 2 entries, O1 to O2
Data columns (total 3 columns):
S1 2 non-null float64
S2 2 non-null float64
S3 2 non-null float64
dtypes: float64(3)
memory usage: 64.0+ bytes
In [5]: biom.example_table.to_dataframe(dense=True)
Out[5]:
S1 S2 S3
O1 0.0 1.0 2.0
O2 3.0 4.0 5.0
In [6]: biom.example_table.to_dataframe(dense=True).to_sparse()
Out[6]:
S1 S2 S3
O1 0.0 1.0 2.0
O2 3.0 4.0 5.0
In [7]: biom.example_table.to_dataframe(dense=True).to_sparse().info
Out[7]:
<bound method DataFrame.info of S1 S2 S3
O1 0.0 1.0 2.0
O2 3.0 4.0 5.0>
In [8]: biom.example_table.to_dataframe(dense=True).to_sparse().info()
<class 'pandas.core.sparse.frame.SparseDataFrame'>
Index: 2 entries, O1 to O2
Data columns (total 3 columns):
S1 2 non-null float64
S2 2 non-null float64
S3 2 non-null float64
dtypes: float64(3)
memory usage: 64.0+ bytes
In [9]: biom.example_table.to_dataframe(dense=True).to_sparse().density
Out[9]: 1.0
I think you would just have to set
That would be wonderful, thank you!
Hi,
I noticed that the pandas.SparseDataFrame returned by Table.to_dataframe is not really sparse. For instance, for the American Gut data:
This is basically the memory use of the full table including zeros. Also, the densities of the original table and the SparseDataFrame are pretty different (~0% vs 100%).
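To make the density observation concrete, here is a minimal sketch of how the chosen fill value changes what pandas counts as "stored". It rests on an assumption this thread does not confirm: that the 100% density comes from NaN, rather than 0, being treated as the fill value, so every zero is stored explicitly. It also uses the current pandas sparse API (pd.SparseDtype and the .sparse accessor), since SparseDataFrame was removed in pandas 1.0. The small table is invented for illustration and is not the American Gut data.

```python
import numpy as np
import pandas as pd

# Hypothetical 2 x 3 count table with mostly zero entries (illustration only).
dense = pd.DataFrame(
    [[0.0, 1.0, 0.0], [0.0, 0.0, 5.0]],
    index=["O1", "O2"],
    columns=["S1", "S2", "S3"],
)

# Fill value NaN: only NaN entries are left out, so every zero is stored.
sparse_nan = dense.astype(pd.SparseDtype("float64", np.nan))

# Fill value 0: zeros are left out, only the two non-zero entries are stored.
sparse_zero = dense.astype(pd.SparseDtype("float64", 0.0))

# Fraction of explicitly stored values, averaged over columns.
density_nan = sparse_nan.sparse.density
density_zero = sparse_zero.sparse.density

print(density_nan, density_zero)
```

If the assumption holds, a count table with many zeros would only benefit from sparse storage when 0 is the fill value; comparing memory_usage() on the two frames is one way to check this on real data.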