NA ambiguous in recursive_dbscan #349
Comments
I'm getting the same error in several metagenomes - 4 out of 5 metagenomes failed with this message. The fifth one had no markers and so was similarly killed at the binning step. The work-around suggested seems reasonable to me, but I'm still curious how a completeness of NA pops up in the first place? |
I spent a lot of time today debugging the PRs: Autometa/autometa/binning/recursive_dbscan.py Lines 190 to 191 in 3ae76dc
It seems like the "failed to recover clusters" error only occurs after this modification, so I think it might be masking the real issue (i.e. the NAs are a clue that something changed upstream?). It's probably going to take comparing the intermediate results/DataFrames when using both pandas 1.5 and 2.1.
I rebased dev onto main and created a new branch that has the biopython changes and pandas pinned to 1.5; feel free to work off that branch.
Sort of related: there is a tests/environment.yml that the unit tests run on (if using the Makefile). IMO this needs to go away, and the tests should only pull from the main ./autometa-env.yml file and then pip install the pytest things within the make command. |
related: #350 |
I am working with autometa 2.2.2 and have the same error with autometa-binning. Conda list gives this:
Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge |
There are some general issues throughout Autometa (I don't know how pervasive) where recent changes to Pandas could cause problems. The issue mentioned here appears to occur when a recursive dbscan iteration comes up with no clusters. A fix is in progress, and a no-promises fix can be installed in the interim via pip:
For devs: Part of the issue is that Pandas changed how NAs are handled, and this project isn't the only one that's had issues: https://pandas.pydata.org/docs/user_guide/missing_data.html#na-in-a-boolean-context
I found at least one case where division by 0 produces np.nan, and these values are then mixed in a dataframe with pd.NA, which may cause issues. The whole code base may need to be checked. |
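To make the difference between the two kinds of missing value concrete, here is a minimal sketch (not Autometa code; `describe_comparison` is a hypothetical helper written only for this illustration). It shows that a `>=` comparison against np.nan quietly evaluates to False, while the same comparison against pd.NA yields pd.NA, which raises the TypeError from this issue as soon as it is used in an `if`.

```python
import numpy as np
import pandas as pd


def describe_comparison(value, threshold=0.0):
    """Report what `if value >= threshold:` would do (hypothetical helper)."""
    try:
        # The conditional expression forces the comparison result into a bool,
        # exactly as an `if` statement would.
        return "true" if value >= threshold else "false"
    except TypeError as err:
        return f"TypeError: {err}"


# np.nan: the comparison is simply False, so the branch is silently skipped.
print(describe_comparison(np.nan))

# pd.NA: the comparison propagates NA, which cannot be coerced to a bool.
print(describe_comparison(pd.NA))

# pd.isna recognizes both kinds of missing value.
print(pd.isna(np.nan), pd.isna(pd.NA))
```

This is one reason a column that mixes np.nan and pd.NA can behave inconsistently: the same line of code either skips quietly or crashes, depending on which missing-value marker reaches it.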
@imonteroo, just wanted to reach out because you seem to be actively using Autometa. No promises, but you can try the interim install in the comment above. |
@chasemc Thank you for your advice. You are right when you say that I am actively using Autometa, and I do not use any cluster. That could be the problem. Unfortunately, the error persists after installing hotfix-pandas-na. Well, it is a bit different:
autometa-binning --kmers /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.bacteria.kmers.embedded.tsv --coverages /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.coverages.tsv --gc-content /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.gc.content.tsv --markers /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.markers.tsv --clustering-method dbscan --completeness 20 --purity 95 --cov-stddev-limit 25 --gc-stddev-limit 5 --taxonomy /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.taxonomy.tsv --output-binning /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.binning.tsv --output-main /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.main.tsv --starting-rank superkingdom --rank-filter superkingdom --rank-name-filter bacteria
[06/11/2024 11:10:26 AM DEBUG] autometa.binning.utilities: Reading/merging 4 contig annotation files
[06/11/2024 11:10:26 AM DEBUG] autometa.binning.utilities: merged annotations shape: (13923, 15)
[06/11/2024 11:10:26 AM DEBUG] autometa.binning.utilities: superkingdom filtered to bacteria taxonomy. shape: (5959, 15)
[06/11/2024 11:10:26 AM INFO] root: Selected clustering method: dbscan
[06/11/2024 11:10:26 AM INFO] autometa.binning.recursive_dbscan: Using dbscan clustering method
[06/11/2024 11:10:26 AM DEBUG] autometa.binning.recursive_dbscan: Using ranks: superkingdom, phylum, class, order, family, genus, species
[06/11/2024 11:10:26 AM INFO] autometa.binning.recursive_dbscan: Examining superkingdom: 1 unique taxa (5,959 contigs)
[06/11/2024 11:10:26 AM DEBUG] autometa.binning.recursive_dbscan: Examining taxonomy: superkingdom : bacteria : (5959, 15)
Traceback (most recent call last):
File "/media/microviable/d/miniconda3/envs/autometa_env/bin/autometa-binning", line 10, in <module>
sys.exit(main())
^^^^^^
File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 882, in main
main_out = taxon_guided_binning(
^^^^^^^^^^^^^^^^^^^^^
File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 660, in taxon_guided_binning
clusters_df = get_clusters(
^^^^^^^^^^^^^
File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 495, in get_clusters
clustered_df, unclustered_df = clusterer(
^^^^^^^^^^
File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 190, in recursive_dbscan
if median_completeness >= best_median:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "missing.pyx", line 419, in pandas._libs.missing.NAType.__bool__
TypeError: boolean value of NA is ambiguous |
It looks like the Make sure to do the e.g.
conda activate /media/microviable/d/miniconda3/envs/autometa_env
pip install git+https://github.com/KwanLab/Autometa.git@hotfix-pandas-na
|
I did, but with no improvement:
(autometa_env) microviable@microviable:~$ pip install git+https://github.com/KwanLab/Autometa.git@hotfix-pandas-na
Collecting git+https://github.com/KwanLab/Autometa.git@hotfix-pandas-na
Cloning https://github.com/KwanLab/Autometa.git (to revision hotfix-pandas-na) to /tmp/pip-req-build-12x6zl4f
Running command git clone --filter=blob:none --quiet https://github.com/KwanLab/Autometa.git /tmp/pip-req-build-12x6zl4f
Running command git checkout -b hotfix-pandas-na --track origin/hotfix-pandas-na
Switched to a new branch 'hotfix-pandas-na'
Branch 'hotfix-pandas-na' set up to track remote branch 'hotfix-pandas-na' from 'origin'.
Resolved https://github.com/KwanLab/Autometa.git to commit f7f99ea7d9c644e7fd963a5b00e7b3a3618de1c1
Preparing metadata (setup.py) ... done
(autometa_env) microviable@microviable:~$ autometa-binning --kmers /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.bacteria.kmers.embedded.tsv --coverages /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.coverages.tsv --gc-content /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.gc.content.tsv --markers /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.markers.tsv --clustering-method dbscan --completeness 20 --purity 95 --cov-stddev-limit 25 --gc-stddev-limit 5 --taxonomy /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.taxonomy.tsv --output-binning /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.binning.tsv --output-main /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.main.tsv --starting-rank superkingdom --rank-filter superkingdom --rank-name-filter bacteria
[06/12/2024 09:09:52 AM DEBUG] autometa.binning.utilities: Reading/merging 4 contig annotation files
[06/12/2024 09:09:52 AM DEBUG] autometa.binning.utilities: merged annotations shape: (13923, 15)
[06/12/2024 09:09:52 AM DEBUG] autometa.binning.utilities: superkingdom filtered to bacteria taxonomy. shape: (5959, 15)
[06/12/2024 09:09:52 AM INFO] root: Selected clustering method: dbscan
[06/12/2024 09:09:52 AM INFO] autometa.binning.recursive_dbscan: Using dbscan clustering method
[06/12/2024 09:09:52 AM DEBUG] autometa.binning.recursive_dbscan: Using ranks: superkingdom, phylum, class, order, family, genus, species
[06/12/2024 09:09:52 AM INFO] autometa.binning.recursive_dbscan: Examining superkingdom: 1 unique taxa (5,959 contigs)
[06/12/2024 09:09:52 AM DEBUG] autometa.binning.recursive_dbscan: Examining taxonomy: superkingdom : bacteria : (5959, 15)
Traceback (most recent call last):
File "/media/microviable/d/miniconda3/envs/autometa_env/bin/autometa-binning", line 10, in <module>
sys.exit(main())
^^^^^^
File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 882, in main
main_out = taxon_guided_binning(
^^^^^^^^^^^^^^^^^^^^^
File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 660, in taxon_guided_binning
clusters_df = get_clusters(
^^^^^^^^^^^^^
File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 495, in get_clusters
clustered_df, unclustered_df = clusterer(
^^^^^^^^^^
File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 190, in recursive_dbscan
if median_completeness >= best_median:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "missing.pyx", line 419, in pandas._libs.missing.NAType.__bool__
TypeError: boolean value of NA is ambiguous |
My bad, the package version isn't bumped in the branch yet so you need to add If the install is successful
should show
and not:
|
Thank you so much. It works |
It should be fixed in the latest update v2.2.3 #361 |
Autometa/autometa/binning/recursive_dbscan.py
Line 190 in 5e3250c
I am getting this 'NA' error -
if I protect it. I think this will work, but I am still testing. I assume getting an NA value means the iteration should just be skipped anyway?
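One way to "protect" a comparison like the one at line 190 is to test for missing values first. The sketch below only illustrates that idea and is not the patch that went into the hotfix branch or v2.2.3. The helper name `is_new_best` is hypothetical, and treating a missing median as "no improvement" is an assumption, not something confirmed in this thread.

```python
import pandas as pd


def is_new_best(median_completeness, best_median):
    """Hypothetical guard: True only if both values are present and the
    candidate median is at least as large as the current best."""
    if pd.isna(median_completeness) or pd.isna(best_median):
        # Assumption: a missing median means there is nothing to compare,
        # so this round is treated as "no improvement" and skipped.
        return False
    return bool(median_completeness >= best_median)


print(is_new_best(pd.NA, 50.0))  # missing candidate, skipped without raising
print(is_new_best(75.0, 50.0))   # candidate beats the current best
print(is_new_best(25.0, 50.0))   # candidate is worse than the current best
```

A guard like this avoids the crash, but it also hides why the value was missing in the first place, so it is worth logging or inspecting the iteration that produced the NA.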