Hello!
I encountered an issue while using sawfish v0.12.7 to perform joint calling on 140 HiFi samples with ~30X sequencing depth. Below is the error message I received:

[2024-12-12][10:49:27][sawfish][INFO] Merging SV haplotypes across samples
[2024-12-12][13:33:18][sawfish][INFO] Finished merging SV haplotypes across samples
[E::hts_idx_load3] Could not load local index file '/home/align/sample54/sample54.pbmm2.sorted.bam.bai': Too many open files
thread 'main' panicked at src/worker_thread_data.rs:17:69:
called `Result::unwrap()` on an `Err` value: BamInvalidIndex { target: "/home/align/sample54/sample54.pbmm2.sorted.bam" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
I have double-checked that the BAM and BAI files exist and seem to be in good condition.
From the user guide, I noticed that sawfish has been tested successfully on merging data for 47 HPRC samples, but it mentions challenges with larger datasets. Could this issue be due to the large number of samples I am processing? If so, do you have any recommendations or strategies to address this?
Any suggestions or workarounds to resolve this would be greatly appreciated!
Best wishes
In general, we haven't written sawfish to scale well beyond pedigree-like sample counts at this point. The joint-call step's runtime scales non-linearly with sample count, so 140 samples may be challenging from the runtime perspective alone.
Given this approach to scalability, the current scheme does not scale particularly well with file handles either: it will open n_threads * n_samples BAM file handles (plus miscellaneous others).
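To make that concrete (illustrative numbers, not taken from this log): at 16 worker threads, 16 * 140 = 2240 BAM handles, already well above the 1024 soft limit that many Linux systems default to. You can check the limit currently enforced for your shell with:

ulimit -n    # current soft (enforced) open-file limit; often 1024 by default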
There are many ways we could improve this scalability; it isn't clear yet where this will land among our upcoming feature priorities, and it isn't being worked on immediately. As a quick workaround, if you'd still like to try this larger joint-sample analysis, I'd suggest raising the open-file limit before running sawfish with something like this:
ulimit -n 100000
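Note that ulimit applies only to the current shell and its children, so it has to run in the same shell session or job script that launches sawfish. A minimal sketch of the invocation (the joint-call arguments are elided; fill in your own):

( ulimit -n 100000 && sawfish joint-call ... )   # subshell scopes the raised limit to this one run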
I'll check whether I can make a similar change programmatically at sawfish startup to help temporarily work around the high file-handle usage.
...this will only help if your system's hard limit already allows going higher, but in that case having sawfish set the limit directly means you won't need to run a separate ulimit command.
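For reference, the soft limit can only be raised as far as the hard limit without elevated privileges, whether that's done via ulimit or programmatically. You can inspect both limits, and a request above the hard limit is rejected by the shell:

ulimit -Sn         # soft limit: the value actually enforced on the process
ulimit -Hn         # hard limit: the ceiling an unprivileged process can raise to
ulimit -n 100000   # typically fails with a "cannot modify limit" error if 100000 exceeds the hard limit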