
Inquiry on error #388

Closed
songmj86 opened this issue Jan 19, 2025 · 2 comments · Fixed by #389

Comments
@songmj86

Hi

I am currently running Vamb with the following commands, and I got this error:

conda activate vamb ; vamb bin default -h ;
vamb bin default --outdir $Output_dir --fasta Concatenate.fna.gz --abundance_tsv abundance.tsv
conda deactivate ;

I think the issue is associated with the abundance file (attached):
abundance.zip

Do you have any idea how to resolve this issue?

Thanks!

2025-01-19 11:15:50.947 | INFO | Starting Vamb version 4.1.4.dev150+g8fa3280
2025-01-19 11:15:50.948 | INFO | Random seed is 59081373319474722
2025-01-19 11:15:50.948 | INFO | Invoked with CLI args: '/opt/anaconda3/envs/vamb/bin/vamb bin default --outdir /data/MJ/Gaya_JY/3_MAG/3_Vamb/2_vamb --fasta /data/Concatenate.fna.gz --abundance_tsv /data/MJ/Gaya_JY/3_MAG/3_Vamb/1_strobealign/abundance.tsv'
2025-01-19 11:15:50.948 | INFO | Loading TNF
2025-01-19 11:15:50.948 | INFO | Minimum sequence length: 2000
2025-01-19 11:15:50.948 | INFO | Loading data from FASTA file /data/Concatenate.fna.gz

2025-01-19 11:17:35.312 | WARNING | The minimum sequence length has been set to 2000, but 199872 sequences fell below this threshold and was filtered away.
Better results are obtained if the sequence file is filtered to the minimum sequence length before mapping.

2025-01-19 11:17:35.313 | INFO | Kept 2082809749 bases in 407197 sequences
2025-01-19 11:17:35.313 | INFO | Processed TNF in 104.36 seconds.

2025-01-19 11:17:35.313 | INFO | Loading depths
2025-01-19 11:17:35.313 | INFO | Reference hash: 0801662903dd7c81887c69e225a00dac
2025-01-19 11:17:35.313 | INFO | Parsing abundance from TSV at "/data/abundance.tsv"
2025-01-19 11:17:36.348 | ERROR | An error has been caught in function 'main', process 'MainProcess' (2919898), thread 'MainThread' (23445510059840):
Traceback (most recent call last):

File "/opt/anaconda3/envs/vamb/bin/vamb", line 8, in
sys.exit(main())
│ │ └ <function main at 0x1550fc9ce700>
│ └ <built-in function exit>
└ <module 'sys' (built-in)>

File "/home/sandia/softwares/vamb/vamb/main.py", line 2399, in main
run(runner, opt.common.general)
│ │ │ │ └ <vamb.main.GeneralOptions object at 0x1550fc9c9a80>
│ │ │ └ <vamb.main.BinnerCommonOptions object at 0x1550fceea3c0>
│ │ └ <vamb.main.BinDefaultOptions object at 0x1550fceea510>
│ └ functools.partial(<function run_bin_default at 0x1550fc9cdb20>, <vamb.main.BinDefaultOptions object at 0x1550fceea510>)
└ <function run at 0x1550fc9cc9a0>

File "/home/sandia/softwares/vamb/vamb/main.py", line 701, in run
runner()
└ functools.partial(<function run_bin_default at 0x1550fc9cdb20>, <vamb.main.BinDefaultOptions object at 0x1550fceea510>)

File "/home/sandia/softwares/vamb/vamb/main.py", line 1309, in run_bin_default
composition, abundance = load_composition_and_abundance(
└ <function load_composition_and_abundance at 0x1550fc9cd1c0>

File "/home/sandia/softwares/vamb/vamb/main.py", line 1007, in load_composition_and_abundance
abundance = calc_abundance(
└ <function calc_abundance at 0x1550fc9cd120>

File "/home/sandia/softwares/vamb/vamb/main.py", line 980, in calc_abundance
abundance = vamb.parsebam.Abundance.from_tsv(
│ │ │ └ <classmethod(<function Abundance.from_tsv at 0x1552496edd00>)>
│ │ └ <class 'vamb.parsebam.Abundance'>
│ └ <module 'vamb.parsebam' from '/home/sandia/softwares/vamb/vamb/parsebam.py'>
└ <module 'vamb' from '/home/sandia/softwares/vamb/vamb/__init__.py'>

File "/home/sandia/softwares/vamb/vamb/parsebam.py", line 260, in from_tsv
raise ValueError(

ValueError: Too many rows in abundance TSV file "/data/abundance.tsv", expected 407198, got at least 407198

@jakobnissen
Member

Hey, and sorry for being late on this.
The issue is that the default minimum sequence length is 2000, so nearly 200,000 contigs are removed when parsing the k-mers.
When reading the abundance file, the sequences are not filtered by length, so Vamb complains that too many sequences are found.
The workaround is to manually filter your sequences to a minimum length of 2000 before mapping. I will look at the code and see if it's possible to automatically apply the filter when reading the abundance file.
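For reference, here is a minimal sketch of that length filter in plain Python. The filenames are taken from the commands earlier in this thread, and the `filter_fasta` helper is hypothetical, not part of Vamb:

```python
import gzip

MIN_LEN = 2000  # Vamb's default minimum sequence length

def filter_fasta(in_path, out_path, min_len=MIN_LEN):
    """Copy only FASTA records with at least min_len bases."""
    with gzip.open(in_path, "rt") as infile, gzip.open(out_path, "wt") as outfile:
        header, seq = None, []

        def flush():
            # Write the buffered record if it passes the length threshold.
            if header is not None and sum(map(len, seq)) >= min_len:
                print(header, *seq, sep="\n", file=outfile)

        for line in infile:
            line = line.rstrip()
            if line.startswith(">"):
                flush()  # emit the previous record before starting a new one
                header, seq = line, []
            elif line:
                seq.append(line)
        flush()  # emit the final record

# Filter the contig catalogue, then redo the mapping on the filtered file.
filter_fasta("Concatenate.fna.gz", "Concatenate.min2000.fna.gz")
```

If seqkit is available, something like `seqkit seq -m 2000 Concatenate.fna.gz` should do the same thing. Either way, the mapping (and hence abundance.tsv) has to be regenerated afterwards so that its rows match the filtered catalogue.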

@songmj86
Author

Oh, I got it. Thanks for the notice!!
