
Inquiry on error #388

Closed
songmj86 opened this issue Jan 19, 2025 · 2 comments · Fixed by #389

Comments
@songmj86

Hi

I am currently running Vamb with the following commands, and I got this error:

conda activate vamb ; vamb bin default -h ;
vamb bin default --outdir $Output_dir --fasta Concatenate.fna.gz --abundance_tsv abundance.tsv
conda deactivate ;

I think the issue is associated with the abundance file (attached):
abundance.zip

Do you have any idea how to resolve this issue?

Thanks!

2025-01-19 11:15:50.947 | INFO | Starting Vamb version 4.1.4.dev150+g8fa3280
2025-01-19 11:15:50.948 | INFO | Random seed is 59081373319474722
2025-01-19 11:15:50.948 | INFO | Invoked with CLI args: '/opt/anaconda3/envs/vamb/bin/vamb bin default --outdir /data/MJ/Gaya_JY/3_MAG/3_Vamb/2_vamb --fasta /data/Concatenate.fna.gz --abundance_tsv /data/MJ/Gaya_JY/3_MAG/3_Vamb/1_strobealign/abundance.tsv'
2025-01-19 11:15:50.948 | INFO | Loading TNF
2025-01-19 11:15:50.948 | INFO | Minimum sequence length: 2000
2025-01-19 11:15:50.948 | INFO | Loading data from FASTA file /data/Concatenate.fna.gz

2025-01-19 11:17:35.312 | WARNING | The minimum sequence length has been set to 2000, but 199872 sequences fell below this threshold and was filtered away.
Better results are obtained if the sequence file is filtered to the minimum sequence length before mapping.

2025-01-19 11:17:35.313 | INFO | Kept 2082809749 bases in 407197 sequences
2025-01-19 11:17:35.313 | INFO | Processed TNF in 104.36 seconds.

2025-01-19 11:17:35.313 | INFO | Loading depths
2025-01-19 11:17:35.313 | INFO | Reference hash: 0801662903dd7c81887c69e225a00dac
2025-01-19 11:17:35.313 | INFO | Parsing abundance from TSV at "/data/abundance.tsv"
2025-01-19 11:17:36.348 | ERROR | An error has been caught in function 'main', process 'MainProcess' (2919898), thread 'MainThread' (23445510059840):
Traceback (most recent call last):

File "/opt/anaconda3/envs/vamb/bin/vamb", line 8, in
sys.exit(main())
│ │ └ <function main at 0x1550fc9ce700>
│ └ <built-in function exit>
└ <module 'sys' (built-in)>

File "/home/sandia/softwares/vamb/vamb/main.py", line 2399, in main
run(runner, opt.common.general)
│ │ │ │ └ <vamb.main.GeneralOptions object at 0x1550fc9c9a80>
│ │ │ └ <vamb.main.BinnerCommonOptions object at 0x1550fceea3c0>
│ │ └ <vamb.main.BinDefaultOptions object at 0x1550fceea510>
│ └ functools.partial(<function run_bin_default at 0x1550fc9cdb20>, <vamb.main.BinDefaultOptions object at 0x1550fceea510>)
└ <function run at 0x1550fc9cc9a0>

File "/home/sandia/softwares/vamb/vamb/main.py", line 701, in run
runner()
└ functools.partial(<function run_bin_default at 0x1550fc9cdb20>, <vamb.main.BinDefaultOptions object at 0x1550fceea510>)

File "/home/sandia/softwares/vamb/vamb/main.py", line 1309, in run_bin_default
composition, abundance = load_composition_and_abundance(
└ <function load_composition_and_abundance at 0x1550fc9cd1c0>

File "/home/sandia/softwares/vamb/vamb/main.py", line 1007, in load_composition_and_abundance
abundance = calc_abundance(
└ <function calc_abundance at 0x1550fc9cd120>

File "/home/sandia/softwares/vamb/vamb/main.py", line 980, in calc_abundance
abundance = vamb.parsebam.Abundance.from_tsv(
│ │ │ └ <classmethod(<function Abundance.from_tsv at 0x1552496edd00>)>
│ │ └ <class 'vamb.parsebam.Abundance'>
│ └ <module 'vamb.parsebam' from '/home/sandia/softwares/vamb/vamb/parsebam.py'>
└ <module 'vamb' from '/home/sandia/softwares/vamb/vamb/__init__.py'>

File "/home/sandia/softwares/vamb/vamb/parsebam.py", line 260, in from_tsv
raise ValueError(

ValueError: Too many rows in abundance TSV file "/data/abundance.tsv", expected 407198, got at least 407198

@jakobnissen
Member

Hey, and sorry for being late on this.
The issue is that the default minimum sequence length is 2000, so nearly 200,000 contigs are removed when parsing the k-mers.
When reading the abundance file, the sequences are not filtered by length, so Vamb complains that too many sequences are found.
The workaround is to manually filter your sequences to a minimum length of 2000 before mapping. I will look at the code and see if it's possible to automatically apply the filter when reading the abundance file.
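For reference, here is a minimal sketch of that length filter in plain Python. The filenames are taken from the commands earlier in this thread, and the `filter_fasta` helper is hypothetical, not part of Vamb:

```python
import gzip

MIN_LEN = 2000  # Vamb's default minimum sequence length

def filter_fasta(in_path, out_path, min_len=MIN_LEN):
    """Copy only FASTA records with at least min_len bases."""
    with gzip.open(in_path, "rt") as infile, gzip.open(out_path, "wt") as outfile:
        header, seq = None, []

        def flush():
            # Write the buffered record if it passes the length threshold.
            if header is not None and sum(map(len, seq)) >= min_len:
                print(header, *seq, sep="\n", file=outfile)

        for line in infile:
            line = line.rstrip()
            if line.startswith(">"):
                flush()  # emit the previous record before starting a new one
                header, seq = line, []
            elif line:
                seq.append(line)
        flush()  # emit the final record

# Filter the contig catalogue, then redo the mapping on the filtered file.
filter_fasta("Concatenate.fna.gz", "Concatenate.min2000.fna.gz")
```

If seqkit is available, something like `seqkit seq -m 2000 Concatenate.fna.gz` should do the same thing. Either way, the mapping (and hence abundance.tsv) has to be regenerated afterwards so that its rows match the filtered catalogue.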

@songmj86
Author

Oh, I got it. Thanks for the notice!!
