At least one NUCmer comparison failed. Please investigate (exiting) #445

ChristophKnapp · 2025-01-21T09:00:50Z

Sorry for bringing up old issues. This seems related to this issue.

The fix from this issue does not work anymore.

mamba install mummer=3.23=h589c0e0_12 -y

Looking for: ['mummer==3.23=h589c0e0_12']

warning libmamba Cache file "/opt/miniforge3/pkgs/cache/497deca9.json" was modified by another program
warning libmamba Cache file "/opt/miniforge3/pkgs/cache/09cdf8bf.json" was modified by another program
warning libmamba Cache file "/opt/miniforge3/pkgs/cache/ffeee55f.json" was modified by another program
bioconda/linux-64 (check zst) Checked 0.1s
warning libmamba Cache file "/opt/miniforge3/pkgs/cache/2a957770.json" was modified by another program
bioconda/noarch (check zst) Checked 0.0s
bioconda/linux-64 5.0MB @ 5.7MB/s 0.9s
bioconda/noarch 4.7MB @ 3.7MB/s 1.3s
conda-forge/noarch 18.7MB @ 9.4MB/s 2.0s
conda-forge/linux-64 41.4MB @ 16.7MB/s 2.6s

Pinned packages:

python 3.8.*

Could not solve for environment specs
The following package could not be installed
└─ mummer ==3.23 h589c0e0_12 does not exist (perhaps a typo or a missing channel).

but this could also be something entirely different.

I have a bunch of genomes I want to compare.

Here is what I did.

First, I created the database:

pyani createdb -v -l pyANI_create_db.log

Then I indexed my genomes:

pyani index -i genomes

and then I ran:

pyani anim -i genomes -o genomes/results -v -l Stammbewerung_anim.log --name "Stammbewertung anim" --labels genomes/labels.txt --classes genomes/classes.txt

Except for the test_strain.fasta file, I downloaded all genomes manually from NCBI. Other analyses indicated that they are related to my test strain. The test strain was assembled with Flye in Galaxy.

pyani anim -i genomes -o genomes/results -v -l Stammbewerung_anim.log --name "Stammbewertung anim" --labels genomes/labels.txt --classes genomes/classes.txt
[INFO] [pyani.scripts.pyani_script]: Processed arguments: Namespace(citation=False, classes=PosixPath('genomes/classes.txt'), dbpath=PosixPath('.pyani/pyanidb'), debug=False, disable_tqdm=False, filter_exe=PosixPath('delta-filter'), func=<function subcmd_anim at 0x739c0036b280>, indir=PosixPath('genomes'), jobprefix='PYANI', labels=PosixPath('genomes/labels.txt'), logfile=PosixPath('Stammbewerung_anim.log'), maxmatch=False, name='Stammbewertung anim', nofilter=False, nucmer_exe=PosixPath('nucmer'), outdir=PosixPath('genomes/results'), recovery=False, scheduler='multiprocessing', sgeargs=None, sgegroupsize=10000, verbose=True, version=False, workers=None)
[INFO] [pyani.scripts.pyani_script]: command-line: /opt/miniforge3/envs/pyani_env/bin/pyani anim -i genomes -o genomes/results -v -l Stammbewerung_anim.log --name Stammbewertung anim --labels genomes/labels.txt --classes genomes/classes.txt
[INFO] [pyani.scripts.pyani_script]: pyani version: 0.3.0-alpha
[INFO] [pyani.scripts.pyani_script]: CITATION INFO
[INFO] [pyani.scripts.pyani_script]: If you use pyani in your work, please cite the following publication:
[INFO] [pyani.scripts.pyani_script]: Pritchard, L., Glover, R. H., Humphris, S., Elphinstone, J. G.,
[INFO] [pyani.scripts.pyani_script]: & Toth, I.K. (2016) 'Genomics and taxonomy in diagnostics for
[INFO] [pyani.scripts.pyani_script]: food security: soft-rotting enterobacterial plant pathogens.'
[INFO] [pyani.scripts.pyani_script]: Analytical Methods, 8(1), 12–24. http://doi.org/10.1039/C5AY02550H
[INFO] [pyani.scripts.pyani_script]: DEPENDENCIES
[INFO] [pyani.scripts.pyani_script]: The authors of pyani gratefully acknowledge its dependence on
[INFO] [pyani.scripts.pyani_script]: the following bioinformatics software:
[INFO] [pyani.scripts.pyani_script]: MUMmer3: S. Kurtz, A. Phillippy, A.L. Delcher, M. Smoot, M. Shumway,
[INFO] [pyani.scripts.pyani_script]: C. Antonescu, and S.L. Salzberg (2004), 'Versatile and open software
[INFO] [pyani.scripts.pyani_script]: for comparing large genomes' Genome Biology 5:R12
[INFO] [pyani.scripts.pyani_script]: BLAST+: Camacho C., Coulouris G., Avagyan V., Ma N., Papadopoulos J.,
[INFO] [pyani.scripts.pyani_script]: Bealer K., & Madden T.L. (2008) 'BLAST+: architecture and applications.'
[INFO] [pyani.scripts.pyani_script]: BMC Bioinformatics 10:421.
[INFO] [pyani.scripts.pyani_script]: BLAST: Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J.,
[INFO] [pyani.scripts.pyani_script]: Zhang, Z., Miller, W. & Lipman, D.J. (1997) 'Gapped BLAST and PSI-BLAST:
[INFO] [pyani.scripts.pyani_script]: a new generation of protein database search programs.' Nucleic Acids Res.
[INFO] [pyani.scripts.pyani_script]: 25:3389-3402
[INFO] [pyani.scripts.pyani_script]: Biopython: Cock PA, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A,
[INFO] [pyani.scripts.pyani_script]: Friedberg I, Hamelryck T, Kauff F, Wilczynski B and de Hoon MJL
[INFO] [pyani.scripts.pyani_script]: (2009) Biopython: freely available Python tools for computational
[INFO] [pyani.scripts.pyani_script]: molecular biology and bioinformatics. Bioinformatics, 25, 1422-1423
[INFO] [pyani.scripts.pyani_script]: fastANI: Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis K, and
[INFO] [pyani.scripts.pyani_script]: Aluru S (2018) 'High throughput ANI analysis of 90K prokaryotic
[INFO] [pyani.scripts.pyani_script]: genomes reveals clear species boundaries.' Nature Communications 9, 5114
[INFO] [pyani.scripts.pyani_script]: Checking for database file: .pyani/pyanidb
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Running ANIm analysis
[INFO] [pyani.scripts.subcommands.subcmd_anim]: MUMMer nucmer version: Linux_3.1 (/opt/miniforge3/envs/pyani_env/bin/nucmer)
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Analysis name: Stammbewertung anim
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCA_000008425.1_ASM842v1_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCA_000015785.2_ASM1578v2_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCA_000262045.1_KCTC_13613_01_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCF_000009045.1_ASM904v1_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCF_000195515.1_ASM19551v1_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCF_000204275.1_ASM20427v1_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCF_000221645.1_ASM22164v1_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCF_000747705.1_ASM74770v1_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCF_001587435.1_B425_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCF_001672615.1_ASM167261v1_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCF_001687185.1_ASM168718v1_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCF_001705195.1_ASM170519v1_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCF_001866745.1_ASM186674v1_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/test_strain.fasta.md5.
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Generating ANIm command-lines
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Compiling genomes for comparison
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Compiling pairwise comparisons (this can take time for large datasets)...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [00:00<00:00, 441505.68it/s]
[INFO] [pyani.scripts.subcommands.subcmd_anim]: ...total pairwise comparisons to be performed: 182
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Checking database for existing comparison data...
[INFO] [pyani.scripts.subcommands.subcmd_anim]: ...after check, still need to run 182 comparisons
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Creating NUCmer jobs for ANIm
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 182/182 [00:00<00:00, 24607.16it/s]
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Results not found for 182 comparisons; 182 new jobs built.
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Running jobs with multiprocessing
[ERROR] [pyani.scripts.subcommands.subcmd_anim]: At least one NUCmer comparison failed. Please investigate (exiting)
Traceback (most recent call last):
File "/opt/miniforge3/envs/pyani_env/bin/pyani", line 10, in <module>
sys.exit(run_main())
File "/opt/miniforge3/envs/pyani_env/lib/python3.8/site-packages/pyani/scripts/pyani_script.py", line 143, in run_main
returnval = args.func(args)
File "/opt/miniforge3/envs/pyani_env/lib/python3.8/site-packages/pyani/scripts/subcommands/subcmd_anim.py", line 296, in subcmd_anim
run_anim_jobs(joblist, args)
File "/opt/miniforge3/envs/pyani_env/lib/python3.8/site-packages/pyani/scripts/subcommands/subcmd_anim.py", line 401, in run_anim_jobs
raise PyaniException("Multiprocessing run failed in ANIm")
pyani.PyaniException: Multiprocessing run failed in ANIm

Let me know if you need anything else.

Regards

Christoph

peterjc · 2025-01-21T09:21:00Z

Ah, in your case I think this is the key line in the error output:

mummer ==3.23 h589c0e0_12 does not exist (perhaps a typo or a missing channel).

There was a problem with the bioconda mummer package on macOS (bioconda/bioconda-recipes#28209), which has since been resolved. But you seem to be on Linux?

Anyway, try getting mummer installed manually first...

peterjc · 2025-01-21T09:24:57Z

(Looking at this prompted me to suggest #446, but that's macOS specific)

ChristophKnapp · 2025-01-21T09:42:15Z

I did see that, but it did not make its way through my synapses. You are right of course. Still, how do I find out which version is the right one?

"Anyway, try getting mummer installed manually first..."

It is and it was.

conda list | grep mummer
mummer 3.23 pl5321h503566f_21 bioconda

widdowquinn · 2025-01-21T09:54:25Z

When you run nucmer -h what does it return?

ChristophKnapp · 2025-01-21T09:56:20Z

nucmer -h

USAGE: nucmer  [options]  <Reference>  <Query>

DESCRIPTION:
nucmer generates nucleotide alignments between two mutli-FASTA input
files. The out.delta output file lists the distance between insertions
and deletions that produce maximal scoring alignments between each
sequence. The show-* utilities know how to read this format.

MANDATORY:
Reference Set the input reference multi-FASTA filename
Query Set the input query multi-FASTA filename

OPTIONS:
--mum Use anchor matches that are unique in both the reference
and query
--mumcand Same as --mumreference
--mumreference Use anchor matches that are unique in in the reference
but not necessarily unique in the query (default behavior)
--maxmatch Use all anchor matches regardless of their uniqueness

-b|breaklen     Set the distance an alignment extension will attempt to
                extend poor scoring regions before giving up (default 200)
--[no]banded    Enforce absolute banding of dynamic programming matrix
                based on diagdiff parameter EXPERIMENTAL (default no)
-c|mincluster   Sets the minimum length of a cluster of matches (default 65)
--[no]delta     Toggle the creation of the delta file (default --delta)
--depend        Print the dependency information and exit
-D|diagdiff     Set the maximum diagonal difference between two adjacent
                anchors in a cluster (default 5)
-d|diagfactor   Set the maximum diagonal difference between two adjacent
                anchors in a cluster as a differential fraction of the gap
                length (default 0.12)
--[no]extend    Toggle the cluster extension step (default --extend)
-f
--forward       Use only the forward strand of the Query sequences
-g|maxgap       Set the maximum gap between two adjacent matches in a
                cluster (default 90)
-h
--help          Display help information and exit
-l|minmatch     Set the minimum length of a single match (default 20)
-o
--coords        Automatically generate the original NUCmer1.1 coords
                output file using the 'show-coords' program
--[no]optimize  Toggle alignment score optimization, i.e. if an alignment
                extension reaches the end of a sequence, it will backtrack
                to optimize the alignment score instead of terminating the
                alignment at the end of the sequence (default --optimize)
-p|prefix       Set the prefix of the output files (default "out")
-r
--reverse       Use only the reverse complement of the Query sequences
--[no]simplify  Simplify alignments by removing shadowed clusters. Turn
                this option off if aligning a sequence to itself to look
                for repeats (default --simplify)
-V
--version       Display the version information and exit

nucmer -V
nucmer
NUCmer (NUCleotide MUMmer) version 3.1
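
For readers who want to script this availability check, here is a minimal sketch using only the Python standard library. The function names `locate_tools` and `missing_tools` are hypothetical and are not part of pyani; the executable names `nucmer` and `delta-filter` are taken from the `nucmer_exe` and `filter_exe` values in the log above. Finding a tool on PATH only shows that it is installed, not that it runs correctly.

```python
import shutil
from typing import Dict, Iterable, List, Optional


def locate_tools(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each executable name to its full path on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names: Iterable[str]) -> List[str]:
    """Return the executable names that could not be found on PATH."""
    return [name for name, path in locate_tools(names).items() if path is None]
```

Calling `locate_tools(["nucmer", "delta-filter"])` inside the activated environment should report the paths that pyani will pick up by default.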

widdowquinn · 2025-01-21T10:04:17Z

That rules out the same issue as was the case for macOS, where the call to the underlying Perl binary was hardcoded to an unavailable path within the nucmer script itself.

Now that we know nucmer is available and not immediately broken itself, a likely issue is that the nucmer comparisons are not writing output. Is there any output in the output directory at all?

ChristophKnapp · 2025-01-21T10:06:39Z

Yes, there is. A directory for each genome containing empty (0 bytes) files of all other genomes.
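
A quick way to confirm this symptom is to list every zero-byte file below the output directory. The sketch below is an illustration using only the standard library; `find_empty_files` is a hypothetical helper, not a pyani function.

```python
from pathlib import Path
from typing import List


def find_empty_files(outdir: str) -> List[Path]:
    """Return every regular file under outdir whose size is 0 bytes, sorted by path."""
    root = Path(outdir)
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.stat().st_size == 0
    )
```

For the run in this thread, `find_empty_files("genomes/results")` would be the call to try. If every comparison output is empty, the problem is more likely in how nucmer is being run than in any single genome pair.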

widdowquinn · 2025-01-21T10:09:23Z

How closely-related do you expect the genomes to be? If they are too distantly-related, then nucmer will not find any homologous regions, and that may give an empty output .filter file.

ChristophKnapp · 2025-01-21T10:18:13Z

This might be the reason. I picked most of the genomes from the results of an alignment-free genome distance estimation against the NCBI RefSeq representative genome database, or from a 16S rRNA MegaBLAST search, choosing the ones that are most closely related.

They are probably quite closely related, but the quality of the alignment might be the issue. I do have a contig the same size as the genome, though.

I'll retry with a smaller subset.

ChristophKnapp · 2025-01-21T12:12:35Z

I tested 6 genomes from NCBI and got the same error as above. All of them were from taxon 1390, with no test strain included. This taxon is the most common among the assemblies I'm using. I can't download the whole taxon because of issue 444.

peterjc · 2025-01-21T12:26:24Z

Have you tried one of the documented examples (https://widdowquinn.github.io/pyani/#walkthrough-a-first-analysis) or the input from the test suite (https://github.com/widdowquinn/pyani/tree/master/tests/test_input/subcmd_anim), where we know the nucmer comparisons work?

ChristophKnapp · 2025-01-21T12:39:20Z

Sorry, still the same error

pyani anim -i subcmd_anim -o subcmd_anim/results -v -l test1_anim.log --name "test 1" --labels subcmd_anim/labels.txt --classes subcmd_anim/classes.txt
[INFO] [pyani.scripts.pyani_script]: Processed arguments: Namespace(citation=False, classes=PosixPath('subcmd_anim/classes.txt'), dbpath=PosixPath('.pyani/pyanidb'), debug=False, disable_tqdm=False, filter_exe=PosixPath('delta-filter'), func=<function subcmd_anim at 0x71dba3d6b280>, indir=PosixPath('subcmd_anim'), jobprefix='PYANI', labels=PosixPath('subcmd_anim/labels.txt'), logfile=PosixPath('test1_anim.log'), maxmatch=False, name='test 1', nofilter=False, nucmer_exe=PosixPath('nucmer'), outdir=PosixPath('subcmd_anim/results'), recovery=False, scheduler='multiprocessing', sgeargs=None, sgegroupsize=10000, verbose=True, version=False, workers=None)
[INFO] [pyani.scripts.pyani_script]: command-line: /opt/miniforge3/envs/pyani_env/bin/pyani anim -i subcmd_anim -o subcmd_anim/results -v -l test1_anim.log --name test 1 --labels subcmd_anim/labels.txt --classes subcmd_anim/classes.txt
[INFO] [pyani.scripts.pyani_script]: pyani version: 0.3.0-alpha
[INFO] [pyani.scripts.pyani_script]: CITATION INFO
[INFO] [pyani.scripts.pyani_script]: If you use pyani in your work, please cite the following publication:
[INFO] [pyani.scripts.pyani_script]: Pritchard, L., Glover, R. H., Humphris, S., Elphinstone, J. G.,
[INFO] [pyani.scripts.pyani_script]: & Toth, I.K. (2016) 'Genomics and taxonomy in diagnostics for
[INFO] [pyani.scripts.pyani_script]: food security: soft-rotting enterobacterial plant pathogens.'
[INFO] [pyani.scripts.pyani_script]: Analytical Methods, 8(1), 12–24. http://doi.org/10.1039/C5AY02550H
[INFO] [pyani.scripts.pyani_script]: DEPENDENCIES
[INFO] [pyani.scripts.pyani_script]: The authors of pyani gratefully acknowledge its dependence on
[INFO] [pyani.scripts.pyani_script]: the following bioinformatics software:
[INFO] [pyani.scripts.pyani_script]: MUMmer3: S. Kurtz, A. Phillippy, A.L. Delcher, M. Smoot, M. Shumway,
[INFO] [pyani.scripts.pyani_script]: C. Antonescu, and S.L. Salzberg (2004), 'Versatile and open software
[INFO] [pyani.scripts.pyani_script]: for comparing large genomes' Genome Biology 5:R12
[INFO] [pyani.scripts.pyani_script]: BLAST+: Camacho C., Coulouris G., Avagyan V., Ma N., Papadopoulos J.,
[INFO] [pyani.scripts.pyani_script]: Bealer K., & Madden T.L. (2008) 'BLAST+: architecture and applications.'
[INFO] [pyani.scripts.pyani_script]: BMC Bioinformatics 10:421.
[INFO] [pyani.scripts.pyani_script]: BLAST: Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J.,
[INFO] [pyani.scripts.pyani_script]: Zhang, Z., Miller, W. & Lipman, D.J. (1997) 'Gapped BLAST and PSI-BLAST:
[INFO] [pyani.scripts.pyani_script]: a new generation of protein database search programs.' Nucleic Acids Res.
[INFO] [pyani.scripts.pyani_script]: 25:3389-3402
[INFO] [pyani.scripts.pyani_script]: Biopython: Cock PA, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A,
[INFO] [pyani.scripts.pyani_script]: Friedberg I, Hamelryck T, Kauff F, Wilczynski B and de Hoon MJL
[INFO] [pyani.scripts.pyani_script]: (2009) Biopython: freely available Python tools for computational
[INFO] [pyani.scripts.pyani_script]: molecular biology and bioinformatics. Bioinformatics, 25, 1422-1423
[INFO] [pyani.scripts.pyani_script]: fastANI: Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis K, and
[INFO] [pyani.scripts.pyani_script]: Aluru S (2018) 'High throughput ANI analysis of 90K prokaryotic
[INFO] [pyani.scripts.pyani_script]: genomes reveals clear species boundaries.' Nature Communications 9, 5114
[INFO] [pyani.scripts.pyani_script]: Checking for database file: .pyani/pyanidb
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Running ANIm analysis
[INFO] [pyani.scripts.subcommands.subcmd_anim]: MUMMer nucmer version: Linux_3.1 (/opt/miniforge3/envs/pyani_env/bin/nucmer)
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Analysis name: test 1
[INFO] [pyani.pyani_files]: Checking for hashfile: subcmd_anim/GCF_000011745.1_ASM1174v1_genomic.fna.md5.
[WARNING] [pyani.pyani_files]: Hashfile subcmd_anim/GCF_000011745.1_ASM1174v1_genomic.fna.md5 does not exist...
[WARNING] [pyani.pyani_files]: ... trying subcmd_anim/GCF_000011745.1_ASM1174v1_genomic.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: subcmd_anim/GCF_000043285.1_ASM4328v1_genomic.fna.md5.
[WARNING] [pyani.pyani_files]: Hashfile subcmd_anim/GCF_000043285.1_ASM4328v1_genomic.fna.md5 does not exist...
[WARNING] [pyani.pyani_files]: ... trying subcmd_anim/GCF_000043285.1_ASM4328v1_genomic.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: subcmd_anim/GCF_000185985.2_ASM18598v2_genomic.fna.md5.
[WARNING] [pyani.pyani_files]: Hashfile subcmd_anim/GCF_000185985.2_ASM18598v2_genomic.fna.md5 does not exist...
[WARNING] [pyani.pyani_files]: ... trying subcmd_anim/GCF_000185985.2_ASM18598v2_genomic.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: subcmd_anim/GCF_000331065.1_ASM33106v1_genomic.fna.md5.
[WARNING] [pyani.pyani_files]: Hashfile subcmd_anim/GCF_000331065.1_ASM33106v1_genomic.fna.md5 does not exist...
[WARNING] [pyani.pyani_files]: ... trying subcmd_anim/GCF_000331065.1_ASM33106v1_genomic.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: subcmd_anim/GCF_000973505.1_ASM97350v1_genomic.fna.md5.
[WARNING] [pyani.pyani_files]: Hashfile subcmd_anim/GCF_000973505.1_ASM97350v1_genomic.fna.md5 does not exist...
[WARNING] [pyani.pyani_files]: ... trying subcmd_anim/GCF_000973505.1_ASM97350v1_genomic.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: subcmd_anim/GCF_000973545.1_ASM97354v1_genomic.fna.md5.
[WARNING] [pyani.pyani_files]: Hashfile subcmd_anim/GCF_000973545.1_ASM97354v1_genomic.fna.md5 does not exist...
[WARNING] [pyani.pyani_files]: ... trying subcmd_anim/GCF_000973545.1_ASM97354v1_genomic.md5.
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Generating ANIm command-lines
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Compiling genomes for comparison
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Compiling pairwise comparisons (this can take time for large datasets)...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 340078.70it/s]
[INFO] [pyani.scripts.subcommands.subcmd_anim]: ...total pairwise comparisons to be performed: 30
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Checking database for existing comparison data...
[INFO] [pyani.scripts.subcommands.subcmd_anim]: ...after check, still need to run 30 comparisons
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Creating NUCmer jobs for ANIm
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 18685.64it/s]
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Results not found for 30 comparisons; 30 new jobs built.
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Running jobs with multiprocessing
[ERROR] [pyani.scripts.subcommands.subcmd_anim]: At least one NUCmer comparison failed. Please investigate (exiting)
Traceback (most recent call last):
File "/opt/miniforge3/envs/pyani_env/bin/pyani", line 10, in <module>
sys.exit(run_main())
File "/opt/miniforge3/envs/pyani_env/lib/python3.8/site-packages/pyani/scripts/pyani_script.py", line 143, in run_main
returnval = args.func(args)
File "/opt/miniforge3/envs/pyani_env/lib/python3.8/site-packages/pyani/scripts/subcommands/subcmd_anim.py", line 296, in subcmd_anim
run_anim_jobs(joblist, args)
File "/opt/miniforge3/envs/pyani_env/lib/python3.8/site-packages/pyani/scripts/subcommands/subcmd_anim.py", line 401, in run_anim_jobs
raise PyaniException("Multiprocessing run failed in ANIm")
pyani.PyaniException: Multiprocessing run failed in ANIm
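
A note on the job counts in the two logs: 182 comparisons for 14 genomes and 30 comparisons for 6 genomes are both consistent with one job per ordered pair of distinct genomes, that is n × (n − 1). This reading is inferred from the numbers in the logs, not from the pyani source. A small sketch of the arithmetic, with hypothetical function names:

```python
from itertools import permutations
from typing import Iterable, List, Tuple


def count_ordered_pairs(n_genomes: int) -> int:
    """Number of ordered (query, subject) pairs, excluding self-comparisons."""
    return n_genomes * (n_genomes - 1)


def ordered_pairs(names: Iterable[str]) -> List[Tuple[str, str]]:
    """List every ordered pair of distinct genome names."""
    return list(permutations(names, 2))
```

Since every one of those jobs failed in both runs, the failure does not look specific to any particular genome pair.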

peterjc · 2025-01-21T12:53:03Z

Progress of a kind. It doesn't seem to be the genomes themselves (an outlier in the dataset can cause nucmer failures), but more likely a problem with nucmer on your system. Have you ever used nucmer yourself?

Does this work for you?:

❯ cd tests/test_input/subcmd_anim
❯ nucmer -p /tmp/test-case --maxmatch GCF_000011745.1_ASM1174v1_genomic.fna GCF_000043285.1_ASM4328v1_genomic.fna
1: PREPARING DATA
2,3: RUNNING mummer AND CREATING CLUSTERS
# reading input file "/tmp/test-case.ntref" of length 791655
# construct suffix tree for sequence of length 791655
# (maximum reference length is 2305843009213693948)
# (maximum query length is 18446744073709551615)
# process 7916 characters per dot
#....................................................................................................
# CONSTRUCTIONTIME /Users/peterjc/miniforge3/opt/mummer-3.23/mummer /tmp/test-case.ntref 0.07
# reading input file "/Users/peterjc/repositories/pyani/tests/test_input/subcmd_anim/GCF_000043285.1_ASM4328v1_genomic.fna" of length 705557
# matching query-file "/Users/peterjc/repositories/pyani/tests/test_input/subcmd_anim/GCF_000043285.1_ASM4328v1_genomic.fna"
# against subject-file "/tmp/test-case.ntref"
# COMPLETETIME /Users/peterjc/miniforge3/opt/mummer-3.23/mummer /tmp/test-case.ntref 0.22
# SPACE /Users/peterjc/miniforge3/opt/mummer-3.23/mummer /tmp/test-case.ntref 1.45
4: FINISHING DATA

❯ head /tmp/test-case.delta
/Users/peterjc/repositories/pyani/tests/test_input/subcmd_anim/GCF_000011745.1_ASM1174v1_genomic.fna /Users/peterjc/repositories/pyani/tests/test_input/subcmd_anim/GCF_000043285.1_ASM4328v1_genomic.fna
NUCMER
>NC_007292.1 NC_005061.1 791654 705557
67950 69177 59703 60930 196 196 0
808
-8
16
-8
90
-19

The paths will differ depending on where your files are. For me:

❯ nucmer --version
nucmer
NUCmer (NUCleotide MUMmer) version 3.1

❯ which nucmer
/Users/peterjc/miniforge3/bin/nucmer

ChristophKnapp · 2025-01-21T13:20:13Z

When I run your command from within pyani_env, I get:

nucmer -p /tmp/test-case --maxmatch GCF_000011745.1_ASM1174v1_genomic.fna GCF_000043285.1_ASM4328v1_genomic.fna
1: PREPARING DATA

USAGE: /opt/miniforge3/envs/pyani_env/opt/mummer-3.23/aux_bin/prenuc  [options]  <reference>

Try '/opt/miniforge3/envs/pyani_env/opt/mummer-3.23/aux_bin/prenuc -h' for more information.
ERROR: prenuc returned non-zero

The nucmer.error file contains
20250121|141052| 47631| ERROR: prenuc returned non-zero

I also tried with the environment deactivated and with the direct path to nucmer but with the same result.
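
Running one comparison by hand is what exposed the real error message here, which the pyani log did not show. The same step can be scripted; the sketch below is an illustration only. The `-p` and `--maxmatch` flags are the ones used in the manual command above, while `build_nucmer_command` and `run_once` are hypothetical helper names.

```python
import subprocess
from typing import List


def build_nucmer_command(reference: str, query: str, prefix: str,
                         nucmer_exe: str = "nucmer") -> List[str]:
    """Build the argument list for one nucmer run, mirroring the manual command."""
    return [nucmer_exe, "-p", prefix, "--maxmatch", reference, query]


def run_once(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run a command, capturing stdout and stderr as text, without raising on failure."""
    return subprocess.run(cmd, capture_output=True, text=True, check=False)


# Example use (assumes nucmer is on PATH and the two files exist):
#   result = run_once(build_nucmer_command("ref.fna", "query.fna", "/tmp/test-case"))
#   if result.returncode != 0:
#       print(result.stdout, result.stderr)
```

Printing both captured streams on a non-zero return code gives the tool's own message rather than the generic "At least one NUCmer comparison failed".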

peterjc · 2025-01-21T13:29:23Z

Progress. It looks like the prenuc part of mummer used by nucmer is broken. On my system:

❯ file /Users/peterjc/miniforge3/envs/pyani-plus_py312/opt/mummer-3.23/aux_bin/prenuc
/Users/peterjc/miniforge3/envs/pyani-plus_py312/opt/mummer-3.23/aux_bin/prenuc: Mach-O 64-bit executable arm64

❯ /Users/peterjc/miniforge3/envs/pyani-plus_py312/opt/mummer-3.23/aux_bin/prenuc -h

USAGE: /Users/peterjc/miniforge3/envs/pyani-plus_py312/opt/mummer-3.23/aux_bin/prenuc  [options]  <reference>

-h     display help information

  Input is one multi-fasta sequence file.
  Output is to stdout, and it consists of each sequence in the
FASTA file appended together with all the headers removed. A
new generic header is inserted at the beginning of the file to
adhere to FASTA standards. An `x' is placed at the end of all
sequences so that no MUMs will span two different sequences.

If you don't get something similar, the first thing I would try to fix this is reinstalling mummer:

❯ conda uninstall mummer
...
❯ conda install mummer
...

ChristophKnapp · 2025-01-21T13:35:43Z

Reinstalling mummer does not fix it.

This is what conda installed
mummer bioconda/linux-64::mummer-3.23-pl5321h503566f_21
perl conda-forge/linux-64::perl-5.32.1-7_hd590300_perl5

peterjc · 2025-01-21T13:41:09Z

And what does trying to run prenuc directly reveal?

ChristophKnapp · 2025-01-21T13:41:30Z

Sorry, I forgot to say that I get something similar, but reinstalling did not fix it.

file /opt/miniforge3/envs/pyani_env/opt/mummer-3.23/aux_bin/prenuc
/opt/miniforge3/envs/pyani_env/opt/mummer-3.23/aux_bin/prenuc: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, not stripped

/opt/miniforge3/envs/pyani_env/opt/mummer-3.23/aux_bin/prenuc -h

USAGE: /opt/miniforge3/envs/pyani_env/opt/mummer-3.23/aux_bin/prenuc  [options]  <reference>

-h display help information

Input is one multi-fasta sequence file.
Output is to stdout, and it consists of each sequence in the
FASTA file appended together with all the headers removed. A
new generic header is inserted at the beginning of the file to
adhere to FASTA standards. An `x' is placed at the end of all
sequences so that no MUMs will span two different sequences.

peterjc · 2025-01-21T13:46:49Z

That matches our x86-64 Linux machine:

$ file ~/miniforge3/envs/pyani-plus_py312/opt/mummer-3.23/aux_bin/prenuc
/home/pjacock/miniforge3/envs/pyani-plus_py312/opt/mummer-3.23/aux_bin/prenuc: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, not stripped

$ ~/miniforge3/envs/pyani-plus_py312/opt/mummer-3.23/aux_bin/prenuc -h

USAGE: /home/pjacock/miniforge3/envs/pyani-plus_py312/opt/mummer-3.23/aux_bin/prenuc  [options]  <reference>

-h     display help information

  Input is one multi-fasta sequence file.
  Output is to stdout, and it consists of each sequence in the
FASTA file appended together with all the headers removed. A
new generic header is inserted at the beginning of the file to
adhere to FASTA standards. An `x' is placed at the end of all
sequences so that no MUMs will span two different sequences.

I am running out of ideas, but one thing which I know trips up mummer is atypical characters in paths and filenames. For example, spaces break it - and things like accented characters, emoji, or even some punctuation are also potential trouble.

Does your home directory, project directory, or system temp directory have any spaces etc?
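
One way to answer this question without inspecting each path by eye is sketched below. It flags whitespace and non-ASCII characters only; that rule is an assumption made for the sketch, since exactly which punctuation upsets mummer is not stated here. The names `risky_characters` and `check_locations` are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def risky_characters(path: str) -> List[str]:
    """Return the distinct whitespace or non-ASCII characters in path,
    in order of first appearance."""
    found: List[str] = []
    for ch in path:
        if (ch.isspace() or ord(ch) > 127) and ch not in found:
            found.append(ch)
    return found


def check_locations() -> Dict[str, List[str]]:
    """Check the home, current, and temp directories; report only those
    whose paths contain risky characters."""
    candidates = [str(Path.home()), str(Path.cwd()), tempfile.gettempdir()]
    report: Dict[str, List[str]] = {}
    for location in candidates:
        bad = risky_characters(location)
        if bad:
            report[location] = bad
    return report
```

Running `check_locations()` from the project directory returns an empty dictionary when none of the three locations contains a flagged character.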

ChristophKnapp · 2025-01-21T13:49:30Z

It does. I took this project over from a Windows user. I will retry after removing the spaces.

ChristophKnapp · 2025-01-21T13:58:48Z

After removing spaces, it completed the test data without problems.

Thanks a lot, I really appreciate your time and effort.

Now back to my original dataset.

ChristophKnapp · 2025-01-21T14:08:21Z

My own input data also finished without problems.

peterjc · 2025-01-21T14:23:55Z

Hurray. It would be possible for pyANI to work around this limitation, but it would be rather a lot of work.

As a stopgap, checking for spaces up front and aborting with a clear message would be much better for the user experience.
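
A minimal sketch of what such an up-front check could look like is given below. It is not how pyani implements anything; `UnsafePathError`, `paths_with_spaces`, and `preflight` are hypothetical names, and the check covers only literal spaces in the output directory and in the files directly inside the input directory.

```python
from pathlib import Path
from typing import Iterable, List


class UnsafePathError(ValueError):
    """Hypothetical exception for paths that are likely to break nucmer."""


def paths_with_spaces(paths: Iterable[Path]) -> List[Path]:
    """Return the paths whose absolute form contains a space."""
    return [p for p in paths if " " in str(Path(p).absolute())]


def preflight(indir: str, outdir: str) -> None:
    """Raise before any jobs are built if the output directory or an
    input file has a space anywhere in its absolute path."""
    input_files = sorted(p for p in Path(indir).iterdir() if p.is_file())
    offenders = paths_with_spaces([Path(outdir)] + input_files)
    if offenders:
        listing = "\n".join(f"  {p}" for p in offenders)
        raise UnsafePathError(
            "These paths contain spaces, which nucmer may not handle:\n" + listing
        )
```

Checking the absolute path matters because, as in this thread, the space can sit in a parent directory rather than in the genome filenames themselves.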

widdowquinn · 2025-01-21T14:31:07Z

FWIW we do handle this better in the new version of pyani that we're developing (as @peterjc's comment suggests ;) )

peterjc closed this as completed Jan 21, 2025

peterjc mentioned this issue Jan 21, 2025

Spaces in input filenames/paths cause cryptic failures #447

Open
