At least one NUCmer comparison failed. Please investigate (exiting) #445
Comments
Ah, in your case I think this is the key line in the error output:
There was a problem with the bioconda mummer package on macOS bioconda/bioconda-recipes#28209 - since resolved. But you seem to be on Linux? Anyway, try getting mummer installed manually first... |
(Looking at this prompted me to suggest #446, but that's macOS specific) |
I did see that, but it did not make its way through my synapses. You are right of course. Still, how do I find out which version is the right one? "Anyway, try getting mummer installed manually first..." It is and it was. conda list | grep mummer |
When you run |
nucmer -h USAGE: nucmer [options] DESCRIPTION: MANDATORY: OPTIONS:
nucmer -V |
That rules out the same issue as was the case for macOS, where the call to the underlying Perl binary was hardcoded to an unavailable path within the Now that we know |
Yes, there is. A directory for each genome containing empty (0 bytes) files of all other genomes. |
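A quick way to confirm which comparisons produced no output is to list the zero-byte files under the results directory. This is only a minimal sketch: the function name `find_empty_files` is hypothetical, and the directory you pass in (for example the `genomes/results` output directory used in this report) is an assumption about where the per-genome output ends up.

```python
from pathlib import Path


def find_empty_files(results_dir):
    """Return the sorted paths of all zero-byte regular files under results_dir."""
    root = Path(results_dir)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.stat().st_size == 0
    )


# Example use (the directory name is an assumption taken from the report):
# for path in find_empty_files("genomes/results"):
#     print(path)
```

If every output file is empty, as described here, the problem is more likely with the nucmer call itself than with any particular pair of genomes.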
How closely-related do you expect the genomes to be? If they are too distantly-related, then |
This might be the reason. Most genomes come from Alignment-free genome distance estimation against the NCBI RefSeq representative genome database or 16S rRNA MegaBlast. I picked the ones which are most closely related. They are probably quite closely related but the quality of the alignment might be the issue. I do have a contig the same size of the genome though. I'll retry with a smaller subset. |
I tested 6 genomes from NCBI and got the same error as above. All of them were from taxon 1390. No test strain. This taxon is the most common among the assemblies I'm using. I can't download the whole taxon because of issue 444. |
Have you tried one of the documented examples, https://widdowquinn.github.io/pyani/#walkthrough-a-first-analysis or https://github.com/widdowquinn/pyani/tree/master/tests/test_input/subcmd_anim from the test suite, where we know the nucmer comparisons work? |
Sorry, still the same error:
pyani anim -i subcmd_anim -o subcmd_anim/results -v -l test1_anim.log --name "test 1" |
Progress of a kind. It doesn't seem to be the genomes themselves (an outlier in the dataset can cause nucmer failures), but more likely a problem with nucmer on your system. Have you ever used Does this work for you?:
The paths will differ depending on where your files are. For me:
❯ nucmer --version
nucmer
NUCmer (NUCleotide MUMmer) version 3.1
❯ which nucmer
/Users/peterjc/miniforge3/bin/nucmer |
When I run your code from the pyani_env:
nucmer -p /tmp/test-case --maxmatch GCF_000011745.1_ASM1174v1_genomic.fna GCF_000043285.1_ASM4328v1_genomic.fna
USAGE: /opt/miniforge3/envs/pyani_env/opt/mummer-3.23/aux_bin/prenuc [options]
Try '/opt/miniforge3/envs/pyani_env/opt/mummer-3.23/aux_bin/prenuc -h' for more information.
The nucmer.error file contains
I also tried with the environment deactivated and with the direct path to nucmer, but with the same result. |
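A manual check like the one above can also be scripted, so that the exit code and the error text are captured together. The following is a rough sketch rather than anything pyani itself does: the function names are hypothetical, and the only nucmer options used are `-p` and `--maxmatch`, exactly as in the command quoted in this thread.

```python
import shutil
import subprocess


def build_nucmer_command(prefix, reference, query, nucmer_exe="nucmer"):
    """Build the argument list for a single pairwise nucmer run."""
    return [nucmer_exe, "-p", str(prefix), "--maxmatch", str(reference), str(query)]


def run_nucmer(prefix, reference, query, nucmer_exe="nucmer"):
    """Run nucmer once and return (exit code, captured stderr text)."""
    if shutil.which(nucmer_exe) is None:
        raise FileNotFoundError(f"{nucmer_exe} was not found on PATH")
    command = build_nucmer_command(prefix, reference, query, nucmer_exe)
    # Passing a list (no shell) means spaces in arguments are not re-split by
    # Python; anything nucmer does internally is outside this script's control.
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    return result.returncode, result.stderr
```

A non-zero exit code together with a usage message in the captured stderr would match the failure reported here.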
Progress. It looks like the
❯ file /Users/peterjc/miniforge3/envs/pyani-plus_py312/opt/mummer-3.23/aux_bin/prenuc
/Users/peterjc/miniforge3/envs/pyani-plus_py312/opt/mummer-3.23/aux_bin/prenuc: Mach-O 64-bit executable arm64
❯ /Users/peterjc/miniforge3/envs/pyani-plus_py312/opt/mummer-3.23/aux_bin/prenuc -h
USAGE: /Users/peterjc/miniforge3/envs/pyani-plus_py312/opt/mummer-3.23/aux_bin/prenuc [options] <reference>
-h display help information
Input is one multi-fasta sequence file.
Output is to stdout, and it consists of each sequence in the
FASTA file appended together with all the headers removed. A
new generic header is inserted at the beginning of the file to
adhere to FASTA standards. An `x' is placed at the end of all
sequences so that no MUMs will span two different sequences.
If you don't get something similar, the first thing I would try to fix this is reinstalling mummer:
|
Reinstalling mummer does not fix it. This is what conda installed |
And what does trying to run |
Sorry, I forgot to say that I get something similar, but reinstalling did not fix it.
file /opt/miniforge3/envs/pyani_env/opt/mummer-3.23/aux_bin/prenuc
/opt/miniforge3/envs/pyani_env/opt/mummer-3.23/aux_bin/prenuc -h
USAGE: /opt/miniforge3/envs/pyani_env/opt/mummer-3.23/aux_bin/prenuc [options]
-h display help information
Input is one multi-fasta sequence file. |
That matches our x86-64 Linux machine:
$ file ~/miniforge3/envs/pyani-plus_py312/opt/mummer-3.23/aux_bin/prenuc
/home/pjacock/miniforge3/envs/pyani-plus_py312/opt/mummer-3.23/aux_bin/prenuc: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, not stripped
$ ~/miniforge3/envs/pyani-plus_py312/opt/mummer-3.23/aux_bin/prenuc -h
USAGE: /home/pjacock/miniforge3/envs/pyani-plus_py312/opt/mummer-3.23/aux_bin/prenuc [options] <reference>
-h display help information
Input is one multi-fasta sequence file.
Output is to stdout, and it consists of each sequence in the
FASTA file appended together with all the headers removed. A
new generic header is inserted at the beginning of the file to
adhere to FASTA standards. An `x' is placed at the end of all
sequences so that no MUMs will span two different sequences.
I am running out of ideas, but one thing which I know trips up mummer is atypical characters in paths and filenames. For example, spaces break it - and things like accented characters, emoji, or even some punctuation are also potential trouble. Does your home directory, project directory, or system temp directory have any spaces etc? |
It does. I took this project over from a Windows user. Will retry after removing spaces. |
After removing spaces, it completed the test data without problems. Thanks a lot, I really appreciate your time and effort. Now back to my original dataset. |
My own input data also finished without problems. |
Hurray. It would be possible for pyANI to work around this limitation, but it would be rather a lot of work. As a stopgap, checking for spaces at the start and aborting with a clear message would be much better for the user experience. |
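The kind of up-front check suggested here could look roughly like the sketch below. It is not pyani code: the function names are hypothetical, and treating any whitespace or non-ASCII character as unsafe is a deliberately cautious assumption based on the path problems described in this thread.

```python
def find_unsafe_paths(paths):
    """Return (path, offending characters) for each path that may trip up mummer.

    Assumption: whitespace and non-ASCII characters are treated as unsafe.
    Pass absolute paths so that parent directories are checked as well.
    """
    problems = []
    for path in paths:
        text = str(path)
        bad = sorted({ch for ch in text if ch.isspace() or not ch.isascii()})
        if bad:
            problems.append((text, bad))
    return problems


def abort_if_unsafe(paths):
    """Exit with a clear message if any path contains unsafe characters."""
    problems = find_unsafe_paths(paths)
    if problems:
        lines = [f"  {path!r} contains {bad!r}" for path, bad in problems]
        raise SystemExit(
            "Paths with spaces or unusual characters can break nucmer:\n"
            + "\n".join(lines)
        )
```

In practice the input directory, the output directory, and the system temporary directory (for example from `tempfile.gettempdir()`) would all be worth checking before any comparisons are started.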
FWIW we do handle this better in the new version of |
Sorry for bringing up old issues. This seems related to issue #413.
The fix from that issue does not work anymore.
mamba install mummer=3.23=h589c0e0_12 -y
Looking for: ['mummer==3.23=h589c0e0_12']
warning libmamba Cache file "/opt/miniforge3/pkgs/cache/497deca9.json" was modified by another program
warning libmamba Cache file "/opt/miniforge3/pkgs/cache/09cdf8bf.json" was modified by another program
warning libmamba Cache file "/opt/miniforge3/pkgs/cache/ffeee55f.json" was modified by another program
bioconda/linux-64 (check zst) Checked 0.1s
warning libmamba Cache file "/opt/miniforge3/pkgs/cache/2a957770.json" was modified by another program
bioconda/noarch (check zst) Checked 0.0s
bioconda/linux-64 5.0MB @ 5.7MB/s 0.9s
bioconda/noarch 4.7MB @ 3.7MB/s 1.3s
conda-forge/noarch 18.7MB @ 9.4MB/s 2.0s
conda-forge/linux-64 41.4MB @ 16.7MB/s 2.6s
Pinned packages:
Could not solve for environment specs
The following package could not be installed
└─ mummer ==3.23 h589c0e0_12 does not exist (perhaps a typo or a missing channel).
but this could also be something entirely different.
I have a bunch of genomes I want to compare.
So, here is what I did:
I created the database
pyani createdb -v -l pyANI_create_db.log
indexed my genomes.
pyani index -i genomes
and then I ran
pyani anim -i genomes -o genomes/results -v -l Stammbewerung_anim.log --name "Stammbewertung anim"
--labels genomes/labels.txt --classes genomes/classes.txt
Except for the test_strain.fasta file, I downloaded all genomes manually from NCBI. They came up in other analyses as related to my test strain. The assembly of the test strain was done with Flye in Galaxy.
pyani anim -i genomes -o genomes/results -v -l Stammbewerung_anim.log --name "Stammbewertung anim" --labels genomes/labels.txt --classes genomes/classes.txt
[INFO] [pyani.scripts.pyani_script]: Processed arguments: Namespace(citation=False, classes=PosixPath('genomes/classes.txt'), dbpath=PosixPath('.pyani/pyanidb'), debug=False, disable_tqdm=False, filter_exe=PosixPath('delta-filter'), func=<function subcmd_anim at 0x739c0036b280>, indir=PosixPath('genomes'), jobprefix='PYANI', labels=PosixPath('genomes/labels.txt'), logfile=PosixPath('Stammbewerung_anim.log'), maxmatch=False, name='Stammbewertung anim', nofilter=False, nucmer_exe=PosixPath('nucmer'), outdir=PosixPath('genomes/results'), recovery=False, scheduler='multiprocessing', sgeargs=None, sgegroupsize=10000, verbose=True, version=False, workers=None)
[INFO] [pyani.scripts.pyani_script]: command-line: /opt/miniforge3/envs/pyani_env/bin/pyani anim -i genomes -o genomes/results -v -l Stammbewerung_anim.log --name Stammbewertung anim --labels genomes/labels.txt --classes genomes/classes.txt
[INFO] [pyani.scripts.pyani_script]: pyani version: 0.3.0-alpha
[INFO] [pyani.scripts.pyani_script]: CITATION INFO
[INFO] [pyani.scripts.pyani_script]: If you use pyani in your work, please cite the following publication:
[INFO] [pyani.scripts.pyani_script]: Pritchard, L., Glover, R. H., Humphris, S., Elphinstone, J. G.,
[INFO] [pyani.scripts.pyani_script]: & Toth, I.K. (2016) 'Genomics and taxonomy in diagnostics for
[INFO] [pyani.scripts.pyani_script]: food security: soft-rotting enterobacterial plant pathogens.'
[INFO] [pyani.scripts.pyani_script]: Analytical Methods, 8(1), 12–24. http://doi.org/10.1039/C5AY02550H
[INFO] [pyani.scripts.pyani_script]: DEPENDENCIES
[INFO] [pyani.scripts.pyani_script]: The authors of pyani gratefully acknowledge its dependence on
[INFO] [pyani.scripts.pyani_script]: the following bioinformatics software:
[INFO] [pyani.scripts.pyani_script]: MUMmer3: S. Kurtz, A. Phillippy, A.L. Delcher, M. Smoot, M. Shumway,
[INFO] [pyani.scripts.pyani_script]: C. Antonescu, and S.L. Salzberg (2004), 'Versatile and open software
[INFO] [pyani.scripts.pyani_script]: for comparing large genomes' Genome Biology 5:R12
[INFO] [pyani.scripts.pyani_script]: BLAST+: Camacho C., Coulouris G., Avagyan V., Ma N., Papadopoulos J.,
[INFO] [pyani.scripts.pyani_script]: Bealer K., & Madden T.L. (2008) 'BLAST+: architecture and applications.'
[INFO] [pyani.scripts.pyani_script]: BMC Bioinformatics 10:421.
[INFO] [pyani.scripts.pyani_script]: BLAST: Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J.,
[INFO] [pyani.scripts.pyani_script]: Zhang, Z., Miller, W. & Lipman, D.J. (1997) 'Gapped BLAST and PSI-BLAST:
[INFO] [pyani.scripts.pyani_script]: a new generation of protein database search programs.' Nucleic Acids Res.
[INFO] [pyani.scripts.pyani_script]: 25:3389-3402
[INFO] [pyani.scripts.pyani_script]: Biopython: Cock PA, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A,
[INFO] [pyani.scripts.pyani_script]: Friedberg I, Hamelryck T, Kauff F, Wilczynski B and de Hoon MJL
[INFO] [pyani.scripts.pyani_script]: (2009) Biopython: freely available Python tools for computational
[INFO] [pyani.scripts.pyani_script]: molecular biology and bioinformatics. Bioinformatics, 25, 1422-1423
[INFO] [pyani.scripts.pyani_script]: fastANI: Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis K, and
[INFO] [pyani.scripts.pyani_script]: Aluru S (2018) 'High throughput ANI analysis of 90K prokaryotic
[INFO] [pyani.scripts.pyani_script]: genomes reveals clear species boundaries.' Nature Communications 9, 5114
[INFO] [pyani.scripts.pyani_script]: Checking for database file: .pyani/pyanidb
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Running ANIm analysis
[INFO] [pyani.scripts.subcommands.subcmd_anim]: MUMMer nucmer version: Linux_3.1 (/opt/miniforge3/envs/pyani_env/bin/nucmer)
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Analysis name: Stammbewertung anim
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCA_000008425.1_ASM842v1_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCA_000015785.2_ASM1578v2_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCA_000262045.1_KCTC_13613_01_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCF_000009045.1_ASM904v1_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCF_000195515.1_ASM19551v1_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCF_000204275.1_ASM20427v1_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCF_000221645.1_ASM22164v1_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCF_000747705.1_ASM74770v1_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCF_001587435.1_B425_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCF_001672615.1_ASM167261v1_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCF_001687185.1_ASM168718v1_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCF_001705195.1_ASM170519v1_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCF_001866745.1_ASM186674v1_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/test_strain.fasta.md5.
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Generating ANIm command-lines
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Compiling genomes for comparison
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Compiling pairwise comparisons (this can take time for large datasets)...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [00:00<00:00, 441505.68it/s]
[INFO] [pyani.scripts.subcommands.subcmd_anim]: ...total pairwise comparisons to be performed: 182
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Checking database for existing comparison data...
[INFO] [pyani.scripts.subcommands.subcmd_anim]: ...after check, still need to run 182 comparisons
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Creating NUCmer jobs for ANIm
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 182/182 [00:00<00:00, 24607.16it/s]
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Results not found for 182 comparisons; 182 new jobs built.
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Running jobs with multiprocessing
[ERROR] [pyani.scripts.subcommands.subcmd_anim]: At least one NUCmer comparison failed. Please investigate (exiting)
Traceback (most recent call last):
File "/opt/miniforge3/envs/pyani_env/bin/pyani", line 10, in
sys.exit(run_main())
File "/opt/miniforge3/envs/pyani_env/lib/python3.8/site-packages/pyani/scripts/pyani_script.py", line 143, in run_main
returnval = args.func(args)
File "/opt/miniforge3/envs/pyani_env/lib/python3.8/site-packages/pyani/scripts/subcommands/subcmd_anim.py", line 296, in subcmd_anim
run_anim_jobs(joblist, args)
File "/opt/miniforge3/envs/pyani_env/lib/python3.8/site-packages/pyani/scripts/subcommands/subcmd_anim.py", line 401, in run_anim_jobs
raise PyaniException("Multiprocessing run failed in ANIm")
pyani.PyaniException: Multiprocessing run failed in ANIm
Let me know if you need anything else.
Regards
Christoph