At least one NUCmer comparison failed. Please investigate (exiting) #445

Closed
ChristophKnapp opened this issue Jan 21, 2025 · 24 comments

@ChristophKnapp

ChristophKnapp commented Jan 21, 2025

Sorry for bringing up an old issue. This seems related to issue #413, but the fix from that issue no longer works:

mamba install mummer=3.23=h589c0e0_12 -y

Looking for: ['mummer==3.23=h589c0e0_12']

warning libmamba Cache file "/opt/miniforge3/pkgs/cache/497deca9.json" was modified by another program
warning libmamba Cache file "/opt/miniforge3/pkgs/cache/09cdf8bf.json" was modified by another program
warning libmamba Cache file "/opt/miniforge3/pkgs/cache/ffeee55f.json" was modified by another program
bioconda/linux-64 (check zst) Checked 0.1s
warning libmamba Cache file "/opt/miniforge3/pkgs/cache/2a957770.json" was modified by another program
bioconda/noarch (check zst) Checked 0.0s
bioconda/linux-64 5.0MB @ 5.7MB/s 0.9s
bioconda/noarch 4.7MB @ 3.7MB/s 1.3s
conda-forge/noarch 18.7MB @ 9.4MB/s 2.0s
conda-forge/linux-64 41.4MB @ 16.7MB/s 2.6s

Pinned packages:

  • python 3.8.*

Could not solve for environment specs
The following package could not be installed
└─ mummer ==3.23 h589c0e0_12 does not exist (perhaps a typo or a missing channel).

However, the cause could also be something entirely different.

I have a set of genomes I want to compare. Here is what I did.

I created the database:

pyani createdb -v -l pyANI_create_db.log

Then I indexed my genomes:

pyani index -i genomes

and then I ran:

pyani anim -i genomes -o genomes/results -v -l Stammbewerung_anim.log --name "Stammbewertung anim"
--labels genomes/labels.txt --classes genomes/classes.txt

Except for the test_strain.fasta file, I downloaded all genomes manually from NCBI. They came up in other analyses as related to my test strain. The test strain was assembled with Flye in Galaxy.

pyani anim -i genomes -o genomes/results -v -l Stammbewerung_anim.log --name "Stammbewertung anim" --labels genomes/labels.txt --classes genomes/classes.txt
[INFO] [pyani.scripts.pyani_script]: Processed arguments: Namespace(citation=False, classes=PosixPath('genomes/classes.txt'), dbpath=PosixPath('.pyani/pyanidb'), debug=False, disable_tqdm=False, filter_exe=PosixPath('delta-filter'), func=<function subcmd_anim at 0x739c0036b280>, indir=PosixPath('genomes'), jobprefix='PYANI', labels=PosixPath('genomes/labels.txt'), logfile=PosixPath('Stammbewerung_anim.log'), maxmatch=False, name='Stammbewertung anim', nofilter=False, nucmer_exe=PosixPath('nucmer'), outdir=PosixPath('genomes/results'), recovery=False, scheduler='multiprocessing', sgeargs=None, sgegroupsize=10000, verbose=True, version=False, workers=None)
[INFO] [pyani.scripts.pyani_script]: command-line: /opt/miniforge3/envs/pyani_env/bin/pyani anim -i genomes -o genomes/results -v -l Stammbewerung_anim.log --name Stammbewertung anim --labels genomes/labels.txt --classes genomes/classes.txt
[INFO] [pyani.scripts.pyani_script]: pyani version: 0.3.0-alpha
[INFO] [pyani.scripts.pyani_script]: CITATION INFO
[INFO] [pyani.scripts.pyani_script]: If you use pyani in your work, please cite the following publication:
[INFO] [pyani.scripts.pyani_script]: Pritchard, L., Glover, R. H., Humphris, S., Elphinstone, J. G.,
[INFO] [pyani.scripts.pyani_script]: & Toth, I.K. (2016) 'Genomics and taxonomy in diagnostics for
[INFO] [pyani.scripts.pyani_script]: food security: soft-rotting enterobacterial plant pathogens.'
[INFO] [pyani.scripts.pyani_script]: Analytical Methods, 8(1), 12–24. http://doi.org/10.1039/C5AY02550H
[INFO] [pyani.scripts.pyani_script]: DEPENDENCIES
[INFO] [pyani.scripts.pyani_script]: The authors of pyani gratefully acknowledge its dependence on
[INFO] [pyani.scripts.pyani_script]: the following bioinformatics software:
[INFO] [pyani.scripts.pyani_script]: MUMmer3: S. Kurtz, A. Phillippy, A.L. Delcher, M. Smoot, M. Shumway,
[INFO] [pyani.scripts.pyani_script]: C. Antonescu, and S.L. Salzberg (2004), 'Versatile and open software
[INFO] [pyani.scripts.pyani_script]: for comparing large genomes' Genome Biology 5:R12
[INFO] [pyani.scripts.pyani_script]: BLAST+: Camacho C., Coulouris G., Avagyan V., Ma N., Papadopoulos J.,
[INFO] [pyani.scripts.pyani_script]: Bealer K., & Madden T.L. (2008) 'BLAST+: architecture and applications.'
[INFO] [pyani.scripts.pyani_script]: BMC Bioinformatics 10:421.
[INFO] [pyani.scripts.pyani_script]: BLAST: Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J.,
[INFO] [pyani.scripts.pyani_script]: Zhang, Z., Miller, W. & Lipman, D.J. (1997) 'Gapped BLAST and PSI-BLAST:
[INFO] [pyani.scripts.pyani_script]: a new generation of protein database search programs.' Nucleic Acids Res.
[INFO] [pyani.scripts.pyani_script]: 25:3389-3402
[INFO] [pyani.scripts.pyani_script]: Biopython: Cock PA, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A,
[INFO] [pyani.scripts.pyani_script]: Friedberg I, Hamelryck T, Kauff F, Wilczynski B and de Hoon MJL
[INFO] [pyani.scripts.pyani_script]: (2009) Biopython: freely available Python tools for computational
[INFO] [pyani.scripts.pyani_script]: molecular biology and bioinformatics. Bioinformatics, 25, 1422-1423
[INFO] [pyani.scripts.pyani_script]: fastANI: Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis K, and
[INFO] [pyani.scripts.pyani_script]: Aluru S (2018) 'High throughput ANI analysis of 90K prokaryotic
[INFO] [pyani.scripts.pyani_script]: genomes reveals clear species boundaries.' Nature Communications 9, 5114
[INFO] [pyani.scripts.pyani_script]: Checking for database file: .pyani/pyanidb
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Running ANIm analysis
[INFO] [pyani.scripts.subcommands.subcmd_anim]: MUMMer nucmer version: Linux_3.1 (/opt/miniforge3/envs/pyani_env/bin/nucmer)
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Analysis name: Stammbewertung anim
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCA_000008425.1_ASM842v1_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCA_000015785.2_ASM1578v2_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCA_000262045.1_KCTC_13613_01_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCF_000009045.1_ASM904v1_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCF_000195515.1_ASM19551v1_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCF_000204275.1_ASM20427v1_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCF_000221645.1_ASM22164v1_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCF_000747705.1_ASM74770v1_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCF_001587435.1_B425_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCF_001672615.1_ASM167261v1_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCF_001687185.1_ASM168718v1_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCF_001705195.1_ASM170519v1_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/GCF_001866745.1_ASM186674v1_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: genomes/test_strain.fasta.md5.
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Generating ANIm command-lines
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Compiling genomes for comparison
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Compiling pairwise comparisons (this can take time for large datasets)...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [00:00<00:00, 441505.68it/s]
[INFO] [pyani.scripts.subcommands.subcmd_anim]: ...total pairwise comparisons to be performed: 182
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Checking database for existing comparison data...
[INFO] [pyani.scripts.subcommands.subcmd_anim]: ...after check, still need to run 182 comparisons
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Creating NUCmer jobs for ANIm
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 182/182 [00:00<00:00, 24607.16it/s]
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Results not found for 182 comparisons; 182 new jobs built.
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Running jobs with multiprocessing
[ERROR] [pyani.scripts.subcommands.subcmd_anim]: At least one NUCmer comparison failed. Please investigate (exiting)
Traceback (most recent call last):
File "/opt/miniforge3/envs/pyani_env/bin/pyani", line 10, in <module>
sys.exit(run_main())
File "/opt/miniforge3/envs/pyani_env/lib/python3.8/site-packages/pyani/scripts/pyani_script.py", line 143, in run_main
returnval = args.func(args)
File "/opt/miniforge3/envs/pyani_env/lib/python3.8/site-packages/pyani/scripts/subcommands/subcmd_anim.py", line 296, in subcmd_anim
run_anim_jobs(joblist, args)
File "/opt/miniforge3/envs/pyani_env/lib/python3.8/site-packages/pyani/scripts/subcommands/subcmd_anim.py", line 401, in run_anim_jobs
raise PyaniException("Multiprocessing run failed in ANIm")
pyani.PyaniException: Multiprocessing run failed in ANIm

Let me know if you need anything else.

Regards

Christoph

@peterjc
Collaborator

peterjc commented Jan 21, 2025

Ah, in your case I think this is the key line in the error output:

mummer ==3.23 h589c0e0_12 does not exist (perhaps a typo or a missing channel).

There was a problem with the bioconda mummer package on macOS (bioconda/bioconda-recipes#28209), which has since been resolved. But you seem to be on Linux?

Anyway, try getting mummer installed manually first...

@peterjc
Collaborator

peterjc commented Jan 21, 2025

(Looking at this prompted me to suggest #446, but that's macOS specific)

@ChristophKnapp
Author

ChristophKnapp commented Jan 21, 2025

I did see that, but it did not make its way through my synapses. You are right of course. Still, how do I find out which version is the right one?

"Anyway, try getting mummer installed manually first..."

It is and it was.

conda list | grep mummer
mummer 3.23 pl5321h503566f_21 bioconda
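
One way to find out which builds a channel currently offers is to ask conda for its search results as JSON and list the version/build pairs; a pinned build string such as h589c0e0_12 can only be installed if it appears in that list. The following is a minimal sketch, assuming `conda search --json` returns a mapping from package name to a list of records with `version` and `build` fields. `parse_builds` and `list_builds_via_conda` are hypothetical helper names, not part of pyani or conda.

```python
import json
import subprocess
from typing import List, Tuple


def parse_builds(search_json: str, package: str) -> List[Tuple[str, str]]:
    """Return sorted, de-duplicated (version, build) pairs for `package`."""
    records = json.loads(search_json).get(package, [])
    return sorted({(rec["version"], rec["build"]) for rec in records})


def list_builds_via_conda(package: str = "mummer", channel: str = "bioconda"):
    """Query the channel with `conda search` (needs conda on PATH and network access)."""
    result = subprocess.run(
        ["conda", "search", "--json", "-c", channel, package],
        capture_output=True, text=True, check=True,
    )
    return parse_builds(result.stdout, package)


# Offline illustration: a hand-written record using the build that
# `conda list` reported above (not real search output).
sample = json.dumps({"mummer": [
    {"version": "3.23", "build": "pl5321h503566f_21", "channel": "bioconda"},
]})
available = parse_builds(sample, "mummer")
print(available)
# Is the pinned build from the failed install among the available ones?
print(("3.23", "h589c0e0_12") in available)
```

On a machine with conda available, calling `list_builds_via_conda()` would replace the hand-written sample with the channel's actual contents.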

@widdowquinn
Owner

widdowquinn commented Jan 21, 2025

When you run nucmer -h what does it return?

@ChristophKnapp
Author

nucmer -h

USAGE: nucmer [options]

DESCRIPTION:
nucmer generates nucleotide alignments between two mutli-FASTA input
files. The out.delta output file lists the distance between insertions
and deletions that produce maximal scoring alignments between each
sequence. The show-* utilities know how to read this format.

MANDATORY:
Reference Set the input reference multi-FASTA filename
Query Set the input query multi-FASTA filename

OPTIONS:
--mum Use anchor matches that are unique in both the reference
and query
--mumcand Same as --mumreference
--mumreference Use anchor matches that are unique in in the reference
but not necessarily unique in the query (default behavior)
--maxmatch Use all anchor matches regardless of their uniqueness

-b|breaklen     Set the distance an alignment extension will attempt to
                extend poor scoring regions before giving up (default 200)
--[no]banded    Enforce absolute banding of dynamic programming matrix
                based on diagdiff parameter EXPERIMENTAL (default no)
-c|mincluster   Sets the minimum length of a cluster of matches (default 65)
--[no]delta     Toggle the creation of the delta file (default --delta)
--depend        Print the dependency information and exit
-D|diagdiff     Set the maximum diagonal difference between two adjacent
                anchors in a cluster (default 5)
-d|diagfactor   Set the maximum diagonal difference between two adjacent
                anchors in a cluster as a differential fraction of the gap
                length (default 0.12)
--[no]extend    Toggle the cluster extension step (default --extend)
-f
--forward       Use only the forward strand of the Query sequences
-g|maxgap       Set the maximum gap between two adjacent matches in a
                cluster (default 90)
-h
--help          Display help information and exit
-l|minmatch     Set the minimum length of a single match (default 20)
-o
--coords        Automatically generate the original NUCmer1.1 coords
                output file using the 'show-coords' program
--[no]optimize  Toggle alignment score optimization, i.e. if an alignment
                extension reaches the end of a sequence, it will backtrack
                to optimize the alignment score instead of terminating the
                alignment at the end of the sequence (default --optimize)
-p|prefix       Set the prefix of the output files (default "out")
-r
--reverse       Use only the reverse complement of the Query sequences
--[no]simplify  Simplify alignments by removing shadowed clusters. Turn
                this option off if aligning a sequence to itself to look
                for repeats (default --simplify)
-V
--version       Display the version information and exit

nucmer -V
nucmer
NUCmer (NUCleotide MUMmer) version 3.1

@widdowquinn
Owner

widdowquinn commented Jan 21, 2025

That rules out the same issue as was the case for macOS, where the call to the underlying Perl binary was hardcoded to an unavailable path within the nucmer script itself.

Now that we know nucmer is available and not immediately broken itself, a likely issue is that the nucmer comparisons are not writing output. Is there any output in the output directory at all?

@ChristophKnapp
Author

ChristophKnapp commented Jan 21, 2025

Yes, there is. A directory for each genome containing empty (0 bytes) files of all other genomes.
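
For anyone wanting to confirm the same symptom, a short script can list every zero-byte file below the output directory. This is a minimal sketch: `find_empty_files` is a hypothetical helper, and the demo directory and file names are made up for illustration.

```python
import tempfile
from pathlib import Path
from typing import List


def find_empty_files(outdir: Path) -> List[Path]:
    """Return every zero-byte regular file below `outdir`, sorted by path."""
    return sorted(
        p for p in outdir.rglob("*") if p.is_file() and p.stat().st_size == 0
    )


# Self-contained demo on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "genome_A").mkdir()
    (root / "genome_A" / "genome_B.filter").write_text("")          # empty output
    (root / "genome_A" / "genome_C.filter").write_text("NUCMER\n")  # non-empty output
    empty_names = [p.name for p in find_empty_files(root)]

print(empty_names)
```

For the run in this issue, the call would be `find_empty_files(Path("genomes/results"))`.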

@widdowquinn
Owner

widdowquinn commented Jan 21, 2025

How closely-related do you expect the genomes to be? If they are too distantly-related, then nucmer will not find any homologous regions, and that may give an empty output .filter file.

@ChristophKnapp
Author

ChristophKnapp commented Jan 21, 2025

This might be the reason. Most of the genomes were picked using alignment-free genome distance estimation against the NCBI RefSeq representative genome database, or 16S rRNA MegaBLAST. I chose the hits that are most closely related.

They are probably quite closely related, but the quality of the alignment might be the issue. I do have a contig the same size as the genome, though.

I'll retry with a smaller subset.

@ChristophKnapp
Author

ChristophKnapp commented Jan 21, 2025

I tested 6 genomes from NCBI and got the same error as above. All of them were from taxon 1390, with no test strain included. This taxon is the most common among the assemblies I'm using. I can't download the whole taxon because of issue #444.

@peterjc
Collaborator

peterjc commented Jan 21, 2025

Have you tried one of the documented examples, https://widdowquinn.github.io/pyani/#walkthrough-a-first-analysis or https://github.com/widdowquinn/pyani/tree/master/tests/test_input/subcmd_anim from the test suite, where we know the nucmer comparisons work?

@ChristophKnapp
Author

Sorry, still the same error

pyani anim -i subcmd_anim -o subcmd_anim/results -v -l test1_anim.log --name "test 1"
--labels subcmd_anim/labels.txt --classes subcmd_anim/classes.txt
[INFO] [pyani.scripts.pyani_script]: Processed arguments: Namespace(citation=False, classes=PosixPath('subcmd_anim/classes.txt'), dbpath=PosixPath('.pyani/pyanidb'), debug=False, disable_tqdm=False, filter_exe=PosixPath('delta-filter'), func=<function subcmd_anim at 0x71dba3d6b280>, indir=PosixPath('subcmd_anim'), jobprefix='PYANI', labels=PosixPath('subcmd_anim/labels.txt'), logfile=PosixPath('test1_anim.log'), maxmatch=False, name='test 1', nofilter=False, nucmer_exe=PosixPath('nucmer'), outdir=PosixPath('subcmd_anim/results'), recovery=False, scheduler='multiprocessing', sgeargs=None, sgegroupsize=10000, verbose=True, version=False, workers=None)
[INFO] [pyani.scripts.pyani_script]: command-line: /opt/miniforge3/envs/pyani_env/bin/pyani anim -i subcmd_anim -o subcmd_anim/results -v -l test1_anim.log --name test 1 --labels subcmd_anim/labels.txt --classes subcmd_anim/classes.txt
[INFO] [pyani.scripts.pyani_script]: pyani version: 0.3.0-alpha
[INFO] [pyani.scripts.pyani_script]: CITATION INFO
[INFO] [pyani.scripts.pyani_script]: If you use pyani in your work, please cite the following publication:
[INFO] [pyani.scripts.pyani_script]: Pritchard, L., Glover, R. H., Humphris, S., Elphinstone, J. G.,
[INFO] [pyani.scripts.pyani_script]: & Toth, I.K. (2016) 'Genomics and taxonomy in diagnostics for
[INFO] [pyani.scripts.pyani_script]: food security: soft-rotting enterobacterial plant pathogens.'
[INFO] [pyani.scripts.pyani_script]: Analytical Methods, 8(1), 12–24. http://doi.org/10.1039/C5AY02550H
[INFO] [pyani.scripts.pyani_script]: DEPENDENCIES
[INFO] [pyani.scripts.pyani_script]: The authors of pyani gratefully acknowledge its dependence on
[INFO] [pyani.scripts.pyani_script]: the following bioinformatics software:
[INFO] [pyani.scripts.pyani_script]: MUMmer3: S. Kurtz, A. Phillippy, A.L. Delcher, M. Smoot, M. Shumway,
[INFO] [pyani.scripts.pyani_script]: C. Antonescu, and S.L. Salzberg (2004), 'Versatile and open software
[INFO] [pyani.scripts.pyani_script]: for comparing large genomes' Genome Biology 5:R12
[INFO] [pyani.scripts.pyani_script]: BLAST+: Camacho C., Coulouris G., Avagyan V., Ma N., Papadopoulos J.,
[INFO] [pyani.scripts.pyani_script]: Bealer K., & Madden T.L. (2008) 'BLAST+: architecture and applications.'
[INFO] [pyani.scripts.pyani_script]: BMC Bioinformatics 10:421.
[INFO] [pyani.scripts.pyani_script]: BLAST: Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J.,
[INFO] [pyani.scripts.pyani_script]: Zhang, Z., Miller, W. & Lipman, D.J. (1997) 'Gapped BLAST and PSI-BLAST:
[INFO] [pyani.scripts.pyani_script]: a new generation of protein database search programs.' Nucleic Acids Res.
[INFO] [pyani.scripts.pyani_script]: 25:3389-3402
[INFO] [pyani.scripts.pyani_script]: Biopython: Cock PA, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A,
[INFO] [pyani.scripts.pyani_script]: Friedberg I, Hamelryck T, Kauff F, Wilczynski B and de Hoon MJL
[INFO] [pyani.scripts.pyani_script]: (2009) Biopython: freely available Python tools for computational
[INFO] [pyani.scripts.pyani_script]: molecular biology and bioinformatics. Bioinformatics, 25, 1422-1423
[INFO] [pyani.scripts.pyani_script]: fastANI: Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis K, and
[INFO] [pyani.scripts.pyani_script]: Aluru S (2018) 'High throughput ANI analysis of 90K prokaryotic
[INFO] [pyani.scripts.pyani_script]: genomes reveals clear species boundaries.' Nature Communications 9, 5114
[INFO] [pyani.scripts.pyani_script]: Checking for database file: .pyani/pyanidb
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Running ANIm analysis
[INFO] [pyani.scripts.subcommands.subcmd_anim]: MUMMer nucmer version: Linux_3.1 (/opt/miniforge3/envs/pyani_env/bin/nucmer)
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Analysis name: test 1
[INFO] [pyani.pyani_files]: Checking for hashfile: subcmd_anim/GCF_000011745.1_ASM1174v1_genomic.fna.md5.
[WARNING] [pyani.pyani_files]: Hashfile subcmd_anim/GCF_000011745.1_ASM1174v1_genomic.fna.md5 does not exist...
[WARNING] [pyani.pyani_files]: ... trying subcmd_anim/GCF_000011745.1_ASM1174v1_genomic.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: subcmd_anim/GCF_000043285.1_ASM4328v1_genomic.fna.md5.
[WARNING] [pyani.pyani_files]: Hashfile subcmd_anim/GCF_000043285.1_ASM4328v1_genomic.fna.md5 does not exist...
[WARNING] [pyani.pyani_files]: ... trying subcmd_anim/GCF_000043285.1_ASM4328v1_genomic.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: subcmd_anim/GCF_000185985.2_ASM18598v2_genomic.fna.md5.
[WARNING] [pyani.pyani_files]: Hashfile subcmd_anim/GCF_000185985.2_ASM18598v2_genomic.fna.md5 does not exist...
[WARNING] [pyani.pyani_files]: ... trying subcmd_anim/GCF_000185985.2_ASM18598v2_genomic.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: subcmd_anim/GCF_000331065.1_ASM33106v1_genomic.fna.md5.
[WARNING] [pyani.pyani_files]: Hashfile subcmd_anim/GCF_000331065.1_ASM33106v1_genomic.fna.md5 does not exist...
[WARNING] [pyani.pyani_files]: ... trying subcmd_anim/GCF_000331065.1_ASM33106v1_genomic.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: subcmd_anim/GCF_000973505.1_ASM97350v1_genomic.fna.md5.
[WARNING] [pyani.pyani_files]: Hashfile subcmd_anim/GCF_000973505.1_ASM97350v1_genomic.fna.md5 does not exist...
[WARNING] [pyani.pyani_files]: ... trying subcmd_anim/GCF_000973505.1_ASM97350v1_genomic.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: subcmd_anim/GCF_000973545.1_ASM97354v1_genomic.fna.md5.
[WARNING] [pyani.pyani_files]: Hashfile subcmd_anim/GCF_000973545.1_ASM97354v1_genomic.fna.md5 does not exist...
[WARNING] [pyani.pyani_files]: ... trying subcmd_anim/GCF_000973545.1_ASM97354v1_genomic.md5.
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Generating ANIm command-lines
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Compiling genomes for comparison
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Compiling pairwise comparisons (this can take time for large datasets)...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 340078.70it/s]
[INFO] [pyani.scripts.subcommands.subcmd_anim]: ...total pairwise comparisons to be performed: 30
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Checking database for existing comparison data...
[INFO] [pyani.scripts.subcommands.subcmd_anim]: ...after check, still need to run 30 comparisons
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Creating NUCmer jobs for ANIm
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 18685.64it/s]
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Results not found for 30 comparisons; 30 new jobs built.
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Running jobs with multiprocessing
[ERROR] [pyani.scripts.subcommands.subcmd_anim]: At least one NUCmer comparison failed. Please investigate (exiting)
Traceback (most recent call last):
File "/opt/miniforge3/envs/pyani_env/bin/pyani", line 10, in <module>
sys.exit(run_main())
File "/opt/miniforge3/envs/pyani_env/lib/python3.8/site-packages/pyani/scripts/pyani_script.py", line 143, in run_main
returnval = args.func(args)
File "/opt/miniforge3/envs/pyani_env/lib/python3.8/site-packages/pyani/scripts/subcommands/subcmd_anim.py", line 296, in subcmd_anim
run_anim_jobs(joblist, args)
File "/opt/miniforge3/envs/pyani_env/lib/python3.8/site-packages/pyani/scripts/subcommands/subcmd_anim.py", line 401, in run_anim_jobs
raise PyaniException("Multiprocessing run failed in ANIm")
pyani.PyaniException: Multiprocessing run failed in ANIm
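
The comparison counts in the two logs (182 for 14 genomes, 30 for 6) are consistent with n × (n − 1), that is, every ordered pair of distinct genomes. The sketch below only checks that arithmetic; it is an assumption about how the job list is built, not pyani's own code, and the genome names are placeholders.

```python
from itertools import permutations


def ordered_pairs(genomes):
    """All (query, subject) pairs of distinct genomes, in both directions."""
    return list(permutations(genomes, 2))


for n in (14, 6):
    names = [f"genome_{i}" for i in range(n)]  # placeholder names
    print(n, len(ordered_pairs(names)))
```

Since every job failed in both runs, the failure is unlikely to depend on any particular pair of genomes.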

@peterjc
Collaborator

peterjc commented Jan 21, 2025

Progress of a kind. It doesn't seem to be the genomes themselves (an outlier in the dataset can cause nucmer failures), but more likely a problem with nucmer on your system. Have you ever used nucmer yourself?

Does this work for you?:

❯ cd tests/test_input/subcmd_anim
❯ nucmer -p /tmp/test-case --maxmatch GCF_000011745.1_ASM1174v1_genomic.fna GCF_000043285.1_ASM4328v1_genomic.fna
1: PREPARING DATA
2,3: RUNNING mummer AND CREATING CLUSTERS
# reading input file "/tmp/test-case.ntref" of length 791655
# construct suffix tree for sequence of length 791655
# (maximum reference length is 2305843009213693948)
# (maximum query length is 18446744073709551615)
# process 7916 characters per dot
#....................................................................................................
# CONSTRUCTIONTIME /Users/peterjc/miniforge3/opt/mummer-3.23/mummer /tmp/test-case.ntref 0.07
# reading input file "/Users/peterjc/repositories/pyani/tests/test_input/subcmd_anim/GCF_000043285.1_ASM4328v1_genomic.fna" of length 705557
# matching query-file "/Users/peterjc/repositories/pyani/tests/test_input/subcmd_anim/GCF_000043285.1_ASM4328v1_genomic.fna"
# against subject-file "/tmp/test-case.ntref"
# COMPLETETIME /Users/peterjc/miniforge3/opt/mummer-3.23/mummer /tmp/test-case.ntref 0.22
# SPACE /Users/peterjc/miniforge3/opt/mummer-3.23/mummer /tmp/test-case.ntref 1.45
4: FINISHING DATA

❯ head /tmp/test-case.delta
/Users/peterjc/repositories/pyani/tests/test_input/subcmd_anim/GCF_000011745.1_ASM1174v1_genomic.fna /Users/peterjc/repositories/pyani/tests/test_input/subcmd_anim/GCF_000043285.1_ASM4328v1_genomic.fna
NUCMER
>NC_007292.1 NC_005061.1 791654 705557
67950 69177 59703 60930 196 196 0
808
-8
16
-8
90
-19

The paths will differ depending on where your files are. For me:

nucmer --version
nucmer
NUCmer (NUCleotide MUMmer) version 3.1
which nucmer
/Users/peterjc/miniforge3/bin/nucmer

@ChristophKnapp
Author

When I run your code from the pyani_env

nucmer -p /tmp/test-case --maxmatch GCF_000011745.1_ASM1174v1_genomic.fna GCF_000043285.1_ASM4328v1_genomic.fna
1: PREPARING DATA

USAGE: /opt/miniforge3/envs/pyani_env/opt/mummer-3.23/aux_bin/prenuc [options]

Try '/opt/miniforge3/envs/pyani_env/opt/mummer-3.23/aux_bin/prenuc -h' for more information.
ERROR: prenuc returned non-zero

The nucmer.error file contains
20250121|141052| 47631| ERROR: prenuc returned non-zero

I also tried with the environment deactivated and with the direct path to nucmer but with the same result.

@peterjc
Collaborator

peterjc commented Jan 21, 2025

Progress. It looks like the prenuc part of mummer used by nucmer is broken. On my system:

file /Users/peterjc/miniforge3/envs/pyani-plus_py312/opt/mummer-3.23/aux_bin/prenuc
/Users/peterjc/miniforge3/envs/pyani-plus_py312/opt/mummer-3.23/aux_bin/prenuc: Mach-O 64-bit executable arm64
/Users/peterjc/miniforge3/envs/pyani-plus_py312/opt/mummer-3.23/aux_bin/prenuc -h

USAGE: /Users/peterjc/miniforge3/envs/pyani-plus_py312/opt/mummer-3.23/aux_bin/prenuc  [options]  <reference>

-h     display help information

  Input is one multi-fasta sequence file.
  Output is to stdout, and it consists of each sequence in the
FASTA file appended together with all the headers removed. A
new generic header is inserted at the beginning of the file to
adhere to FASTA standards. An `x' is placed at the end of all
sequences so that no MUMs will span two different sequences.

If you don't get something similar, the first thing I would try to fix this is reinstalling mummer:

❯ conda uninstall mummer
...
❯ conda install mummer
...

@ChristophKnapp
Author

Reinstalling mummer does not fix it.

This is what conda installed
mummer bioconda/linux-64::mummer-3.23-pl5321h503566f_21
perl conda-forge/linux-64::perl-5.32.1-7_hd590300_perl5

@peterjc
Collaborator

peterjc commented Jan 21, 2025

And what does trying to run prenuc directly reveal?

@ChristophKnapp
Author

Sorry, I forgot to say that I do get something similar, but reinstalling did not fix it.

file /opt/miniforge3/envs/pyani_env/opt/mummer-3.23/aux_bin/prenuc
/opt/miniforge3/envs/pyani_env/opt/mummer-3.23/aux_bin/prenuc: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, not stripped

/opt/miniforge3/envs/pyani_env/opt/mummer-3.23/aux_bin/prenuc -h

USAGE: /opt/miniforge3/envs/pyani_env/opt/mummer-3.23/aux_bin/prenuc [options]

-h display help information

Input is one multi-fasta sequence file.
Output is to stdout, and it consists of each sequence in the
FASTA file appended together with all the headers removed. A
new generic header is inserted at the beginning of the file to
adhere to FASTA standards. An `x' is placed at the end of all
sequences so that no MUMs will span two different sequences.

@peterjc
Collaborator

peterjc commented Jan 21, 2025

That matches our x86-64 Linux machine:

$ file ~/miniforge3/envs/pyani-plus_py312/opt/mummer-3.23/aux_bin/prenuc
/home/pjacock/miniforge3/envs/pyani-plus_py312/opt/mummer-3.23/aux_bin/prenuc: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, not stripped
$ ~/miniforge3/envs/pyani-plus_py312/opt/mummer-3.23/aux_bin/prenuc -h

USAGE: /home/pjacock/miniforge3/envs/pyani-plus_py312/opt/mummer-3.23/aux_bin/prenuc  [options]  <reference>

-h     display help information

  Input is one multi-fasta sequence file.
  Output is to stdout, and it consists of each sequence in the
FASTA file appended together with all the headers removed. A
new generic header is inserted at the beginning of the file to
adhere to FASTA standards. An `x' is placed at the end of all
sequences so that no MUMs will span two different sequences.

I am running out of ideas, but one thing which I know trips up mummer is atypical characters in paths and filenames. For example, spaces break it - and things like accented characters, emoji, or even some punctuation are also potential trouble.

Does your home directory, project directory, or system temp directory have any spaces etc?
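
A plausible reason spaces are so disruptive: if a wrapper script pastes a path into a command string without quoting it, the shell splits the path at each space and the helper program receives the wrong number of arguments. That would match prenuc printing its usage message earlier in this thread, but the mechanism is an assumption here, not something confirmed from the MUMmer source. The sketch uses Python's `shlex` to mimic shell word splitting; the paths are made up.

```python
import shlex

ref_ok = "/data/project/ref.fna"
ref_spaced = "/data/my project/ref.fna"

# Unquoted interpolation: the space splits one path into two arguments.
naive_ok = shlex.split(f"prenuc {ref_ok}")
naive_spaced = shlex.split(f"prenuc {ref_spaced}")

# Quoting keeps the path intact as a single argument.
quoted_spaced = shlex.split(f"prenuc {shlex.quote(ref_spaced)}")

print(naive_ok)
print(naive_spaced)
print(quoted_spaced)
```

In the unquoted case the program is handed two partial paths instead of one file name, neither of which exists.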

@ChristophKnapp
Author

It does. I took this project over from a Windows user. I will retry after removing the spaces.

@ChristophKnapp
Author

After removing spaces, it completed the test data without problems.

Thanks a lot, I really appreciate your time and effort.

Now back to my original dataset.

@ChristophKnapp
Author

My own input data also finished without problems.

@peterjc
Collaborator

peterjc commented Jan 21, 2025

Hurray. It would be possible for pyANI to work around this limitation, but it would be rather a lot of work.

As a stopgap, checking the paths for spaces at startup and aborting with a clear message would be much better for the user experience.
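
Such a startup check could look roughly like the sketch below. It is not pyani's implementation: `SAFE_CHARS`, `problem_paths`, and `check_paths_or_abort` are hypothetical names, and the allow-list of characters is a conservative assumption rather than a documented MUMmer rule.

```python
import string
from pathlib import Path
from typing import Iterable, List

# Assumed allow-list: ASCII letters, digits, and a few common path characters.
SAFE_CHARS = set(string.ascii_letters + string.digits + "._-/\\:~")


def problem_paths(paths: Iterable[Path]) -> List[str]:
    """Return the paths that contain any character outside SAFE_CHARS."""
    return [str(p) for p in paths if set(str(p)) - SAFE_CHARS]


def check_paths_or_abort(paths: Iterable[Path]) -> None:
    """Raise ValueError with a readable message if any path looks risky."""
    bad = problem_paths(paths)
    if bad:
        raise ValueError(
            "These paths contain spaces or unusual characters that MUMmer "
            "may not handle: " + ", ".join(repr(b) for b in bad)
        )


# A clean relative path passes; one with a space is reported.
print(problem_paths([Path("genomes/results"), Path("my project/genomes")]))
```

In practice each path would need to be resolved to its absolute form first (for example with `Path.resolve()`), because the offending space may sit in a parent directory such as the home or project directory rather than in the relative path given on the command line.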

@widdowquinn
Owner

FWIW we do handle this better in the new version of pyani that we're developing (as @peterjc's comment suggests ;) )

3 participants