Silent OOM bug with large Hi-C Datasets #186

Open
ignacio3437 opened this issue Dec 11, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@ignacio3437

Description of the bug

The FQ2HIC-RUNASSEMBLYVIZ module fails with a Java OutOfMemoryError, but the pipeline continues. The resulting .hic file is unusable and far too small (200 kB). The Hi-C FASTQ files were 8.5 GB each.

Command used and terminal output

MMC commands here:
https://pfr-powerplant.s3.ap-southeast-2.amazonaws.com/output/genomic/plant/Actinidia/chinensis/KBC-pangenome/Red9/curated-scaffolds/assemblyqc_params/

Relevant files

FQ2HIC-RUNASSEMBLYVIZ stdout.autosave:

```
Picked up _JAVA_OPTIONS: -Djava.util.prefs.userRoot=user_prefs -Duser.home=user_home -Xms4g -Xmx4g
Dec 11, 2024 7:14:46 AM java.util.prefs.FileSystemPreferences$1 run
INFO: Created user preferences directory.
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
  at java.base/java.util.Arrays.copyOf(Arrays.java:3689)
  at java.base/java.util.ArrayList.grow(ArrayList.java:238)
  at java.base/java.util.ArrayList.grow(ArrayList.java:243)
  at java.base/java.util.ArrayList.add(ArrayList.java:486)
  at java.base/java.util.ArrayList.add(ArrayList.java:499)
  at com.google.common.base.Splitter.splitToList(Splitter.java:422)
  at juicebox.tools.utils.original.AsciiPairIterator.advance(AsciiPairIterator.java:93)
  at juicebox.tools.utils.original.AsciiPairIterator.next(AsciiPairIterator.java:194)
  at juicebox.tools.utils.original.Preprocessor.writeBody(Preprocessor.java:387)
  at juicebox.tools.utils.original.Preprocessor.preprocess(Preprocessor.java:283)
  at juicebox.tools.clt.old.PreProcessing.run(PreProcessing.java:108)
  at juicebox.tools.HiCTools.main(HiCTools.java:86)
```

System information

This was run on AWS using MMC.
Outputs here:
https://ap-southeast-2.console.aws.amazon.com/s3/buckets/pfr-powerplant?prefix=output/genomic/plant/Actinidia/chinensis/KBC-pangenome/Red9/assemblyqc/&region=ap-southeast-2&bucketType=general

@ignacio3437 added the bug label (Something isn't working) on Dec 11, 2024
@GallVp
Member

GallVp commented Dec 11, 2024

Hi @ignacio3437

Thank you for the bug report. This is not a system-level out-of-memory (OOM) kill; rather, the Java heap is exhausted. By default, RUNASSEMBLYVISUALIZER is labelled process_single, which gives it 6 GB of memory. The Java heap size is set to 80% of the task memory (6 GB), truncated to a whole number of GB, here:

```groovy
if ( !task.memory ) { error '[RUNASSEMBLYVISUALIZER] Available memory not known. Specify process memory requirements to fix this.' }
def avail_mem = (task.memory.giga*0.8).intValue()
"""
assembly_tag=\$(echo $sample_id_on_tag | sed 's/.*\\.on\\.//g')
file_name="${agp_assembly_file}"
mkdir user_home
export _JAVA_OPTIONS="-Djava.util.prefs.userRoot=user_prefs -Duser.home=user_home -Xms${avail_mem}g -Xmx${avail_mem}g"
```
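
As a worked check of the `avail_mem` formula with the 6 GB that process_single provides, the value passed to `-Xms`/`-Xmx` comes out at 4 GB, which matches the `-Xms4g -Xmx4g` reported in the `_JAVA_OPTIONS` line of the log:

$$
\mathrm{avail\_mem} = \lfloor 0.8 \times 6 \rfloor = \lfloor 4.8 \rfloor = 4\ \text{GB}
$$

Because `intValue()` drops the fractional part, about 2 GB of the 6 GB task allocation sits outside the Java heap, and the heap itself cannot grow past 4 GB no matter how large the input is.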

Essentially, we need to increase the task memory for RUNASSEMBLYVISUALIZER. Perhaps we should allocate memory based on the size of the sorted_links_txt_file input:

```groovy
tuple val(sample_id_on_tag), path(agp_assembly_file), path(sorted_links_txt_file)
```

This approach is also being experimented with elsewhere: https://github.com/nf-core/modules/pull/6628/files
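
A minimal sketch of the size-based idea, written as a custom-config override rather than a change to the pipeline. It assumes that a dynamic `memory` closure in a config file can read the module input `sorted_links_txt_file` (config closures are evaluated per task, which is how `task.attempt` becomes available). The 6 GB floor and the factor of 2 GB per GB of links are hypothetical placeholders that would need calibrating against real runs; they are not values taken from the pipeline or from the linked nf-core PR.

```groovy
// Hypothetical custom-config sketch (e.g. in mmc.config); not part of the pipeline.
process {
    withName: RUNASSEMBLYVISUALIZER {
        memory = {
            // Size of the module input `sorted_links_txt_file` in decimal GB, rounded up.
            def links_gb = Math.ceil(sorted_links_txt_file.size() / 1e9d) as int
            // Assumed heuristic: at least 6 GB (the process_single default),
            // otherwise 2 GB per GB of links; multiplied by the attempt number on retries.
            def mem_gb = Math.max(6, 2 * links_gb) * task.attempt
            mem_gb.GB
        }
    }
}
```

Tying the override to an input name couples the config to the module's current signature, so a fixed value (as below) is simpler until a sizing rule is validated.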

For now, the easiest solution is to add the following lines to your custom config (mmc.config) for large datasets:

```groovy
process {
    withName: RUNASSEMBLYVISUALIZER {
        memory = { 16.GB * task.attempt }
    }
}
```
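
Applying the same 80% rule (truncated to whole GB) to the `16.GB * task.attempt` override gives the heap sizes below. Note that `task.attempt` only increases if Nextflow retries the task, which depends on the configured `errorStrategy`; since the heap failure in this issue did not stop the pipeline, the task may never be retried, so the first-attempt value is the one to plan around.

$$
\begin{aligned}
\text{attempt } 1:&\quad \lfloor 0.8 \times 16 \rfloor = \lfloor 12.8 \rfloor = 12\ \text{GB} \\
\text{attempt } 2:&\quad \lfloor 0.8 \times 32 \rfloor = \lfloor 25.6 \rfloor = 25\ \text{GB} \\
\text{attempt } 3:&\quad \lfloor 0.8 \times 48 \rfloor = \lfloor 38.4 \rfloor = 38\ \text{GB}
\end{aligned}
$$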

Thus, the problem can be resolved without changing the pipeline codebase.

@GallVp self-assigned this on Dec 11, 2024