"Too Many Open Files" Error When Using Sawfish for Joint Calling #9

Closed
Zoeyoungxy opened this issue Dec 13, 2024 · 3 comments

@Zoeyoungxy

Hello!
I encountered an issue while using sawfish v0.12.7 to perform joint calling on 140 HiFi samples with ~30X sequencing depth. Below is the error message I received:
[2024-12-12][10:49:27][sawfish][INFO] Merging SV haplotypes across samples
[2024-12-12][13:33:18][sawfish][INFO] Finished merging SV haplotypes across samples
[E::hts_idx_load3] Could not load local index file '/home/align/sample54/sample54.pbmm2.sorted.bam.bai': Too many open files
thread 'main' panicked at src/worker_thread_data.rs:17:69:
called `Result::unwrap()` on an `Err` value: BamInvalidIndex { target: "/home/align/sample54/sample54.pbmm2.sorted.bam" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
I have double-checked that the BAM and BAI files exist and seem to be in good condition.
From the user guide, I noticed that sawfish has been tested successfully on merging data for 47 HPRC samples, but it mentions challenges with larger datasets. Could this issue be due to the large number of samples I am processing? If so, do you have any recommendations or strategies to address this?
Any suggestions or workarounds to resolve this would be greatly appreciated!

Best wishes

@ctsa
Member

ctsa commented Dec 13, 2024

Thanks for reporting this.

In general, we haven't written sawfish to scale very well beyond pedigree-like sample counts at this point. The joint-call step's runtime scales non-linearly with sample count, so 140 samples may be challenging from the runtime perspective alone.

Given this approach to scalability, the current scheme does not scale particularly well with file handles either: it will open n_threads * n_samples BAM file handles (plus miscellaneous others).
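
To put that in concrete terms, here is a rough back-of-envelope sketch (not sawfish code; the thread count below is just an assumed placeholder) of why 140 samples can exceed a typical default soft limit of 1024 open files:

```rust
// Hypothetical estimate only: n_threads is a placeholder for however many
// worker threads the run uses; 140 is the sample count from this issue.
fn main() {
    let n_threads: u64 = 32;
    let n_samples: u64 = 140;
    let bam_handles = n_threads * n_samples;
    // ~4480 BAM handles here, plus miscellaneous others (index files,
    // output, logs), which is far above a common default soft limit of 1024.
    println!("approx. open BAM file handles: {bam_handles}");
}
```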

There are many ways we could improve this scalability. It isn't yet clear where this will land among our upcoming feature priorities, but it is not being worked on immediately. As a quick workaround, if you'd still like to try this larger joint-sample analysis, I'd suggest raising the open-file limit before running sawfish with something like this:

ulimit -n 100000

I'll check and see if I can apply a similar change programmatically at sawfish startup to help temporarily work around the high file-handle usage.
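
For illustration only, here is a minimal sketch of what such a startup change could look like, assuming the `libc` crate as a dependency. This is not sawfish's actual implementation, and note that an unprivileged process can only raise its soft limit up to the existing hard limit:

```rust
// Sketch only, not sawfish code: raise the process's soft open-file limit
// to its hard limit at startup. Assumes the `libc` crate is available.
fn raise_open_file_limit() -> std::io::Result<()> {
    unsafe {
        let mut lim = libc::rlimit { rlim_cur: 0, rlim_max: 0 };
        if libc::getrlimit(libc::RLIMIT_NOFILE, &mut lim) != 0 {
            return Err(std::io::Error::last_os_error());
        }
        // Without elevated privileges, the soft limit can only be raised
        // as far as the current hard limit.
        lim.rlim_cur = lim.rlim_max;
        if libc::setrlimit(libc::RLIMIT_NOFILE, &lim) != 0 {
            return Err(std::io::Error::last_os_error());
        }
    }
    Ok(())
}
```

A call like this would need to run once early in startup, before any per-sample BAM readers are opened.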

@ctsa
Member

ctsa commented Dec 13, 2024

I went ahead and included an open-file-limit modification in the latest minor update here:

https://github.com/PacificBiosciences/sawfish/releases/tag/v0.12.8

...this will only help if your system's hard limit already allows the limit to go higher, but in that case the setting applied directly in sawfish means you won't need to run a separate ulimit command.

@Zoeyoungxy
Author

Thanks for your patience. The guidance really resolved my problem.

-Zoey

@ctsa closed this as completed Dec 18, 2024