Hello everyone!
I've been running Picard on RNA-seq BAM files, and I first ran one sample per step to make sure everything works.
When I run the same steps through the `parallel` command, I hit a few issues.
I'm on Arch Linux with 32 threads (16 cores) and 256 GB of RAM; disk space is fine as well.
Some MarkDuplicates and FixMateInformation jobs never finish, even though they start.
For MarkDuplicates, some jobs don't produce a proper BAM file (some output BAMs shrank from ~5 GB to ~160 MB, yet the program's log shows it finishing too soon without any errors). For FixMateInformation, I had to rerun about 20% of the samples each time until everything finished (again, no errors reported).
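These symptoms (jobs that stall or end early with no Java error, plus truncated output BAMs) would be consistent with the Linux OOM killer terminating JVMs. That is only a hypothesis, but the arithmetic below uses the `maxMemory` figure from the logs; if this is GNU parallel, its default `-j` runs one job per CPU thread, i.e. 32 at once here:

```shell
# Hypothesis check: can 32 concurrent MarkDuplicates JVMs fit in 256 GiB?
# The per-JVM ceiling is taken from the log line "maxMemory: 28631367680".
per_job_gib=$(( 28631367680 / 1024 / 1024 / 1024 ))   # ~26 GiB heap ceiling per job
echo "worst case across 32 jobs: $(( per_job_gib * 32 )) GiB"
echo "jobs that fit in 256 GiB (~2 GiB non-heap overhead each): $(( 256 / (per_job_gib + 2) ))"
```

If this is the cause, `dmesg | grep -i 'out of memory'` should show killed `java` processes, and capping concurrency (e.g. `parallel -j 8`) together with an explicit heap (`java -Xmx24g -jar picard.jar MarkDuplicates ...`) would keep the total under physical RAM.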
Even though 55 BAM files were generated, I only got 24 TXT metrics files.
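One detail worth checking first: the failing sample's `--METRICS_FILE` points at `/home/picard/mar_duplicates_txt` (note the missing "k"), while the working sample writes to `/home/picard/mark_duplicates_txt`, so some metrics may simply have landed in a second directory. Either way, a small helper (a sketch; the directory layout is assumed from the log paths above) can list which output BAMs have no matching TXT so only those get rerun:

```shell
# Hypothetical helper: report output BAMs whose metrics TXT is missing.
missing_metrics() {
    local bam_dir=$1 txt_dir=$2 bam txt
    for bam in "$bam_dir"/*.markdups.bam; do
        [ -e "$bam" ] || continue                          # skip an empty glob
        txt="$txt_dir/$(basename "$bam" .bam).txt"
        [ -e "$txt" ] || echo "missing metrics: $(basename "$bam")"
    done
}

# e.g.: missing_metrics /home/picard/mark_duplicates /home/picard/mark_duplicates_txt
```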
Here is the script output for a sample that failed:
[Sun Apr 17 13:44:49 GMT-03:00 2022] MarkDuplicates --INPUT /home/picard/mate_information/96_FRAS202421986-1a_1.fqAligned.sortedByCoord.out.addOrReplace.fixedmate.bam --OUTPUT /home/picard/mark_duplicates/96_FRAS202421986-1a_1.fqAligned.sortedByCoord.out.addOrReplace.fixedmate.markdups.bam --METRICS_FILE /home/picard/mar_duplicates_txt/96_FRAS202421986-1a_1.fqAligned.sortedByCoord.out.addOrReplace.fixedmate.markdups.txt --MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP 50000 --MAX_FILE_HANDLES_FOR_READ_ENDS_MAP 8000 --SORTING_COLLECTION_SIZE_RATIO 0.25 --TAG_DUPLICATE_SET_MEMBERS false --REMOVE_SEQUENCING_DUPLICATES false --TAGGING_POLICY DontTag --CLEAR_DT true --DUPLEX_UMI false --ADD_PG_TAG_TO_READS true --REMOVE_DUPLICATES false --ASSUME_SORTED false --DUPLICATE_SCORING_STRATEGY SUM_OF_BASE_QUALITIES --PROGRAM_RECORD_ID MarkDuplicates --PROGRAM_GROUP_NAME MarkDuplicates --READ_NAME_REGEX <optimized capture of last three ':' separated fields as numeric values> --OPTICAL_DUPLICATE_PIXEL_DISTANCE 100 --MAX_OPTICAL_DUPLICATE_SET_SIZE 300000 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Sun Apr 17 13:44:49 GMT-03:00 2022] Executing as gabriel.gama@tcg on Linux 5.13.13-arch1-1 amd64; OpenJDK 64-Bit Server VM 1.8.0_292-b10; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: Version:2.26.0
INFO 2022-04-17 13:44:49 MarkDuplicates Start of doWork freeMemory: 2036971400; totalMemory: 2058354688; maxMemory: 28631367680
INFO 2022-04-17 13:44:49 MarkDuplicates Reading input file and constructing read end information.
INFO 2022-04-17 13:44:49 MarkDuplicates Will retain up to 103736839 data points before spilling to disk.
INFO 2022-04-17 13:44:56 MarkDuplicates Read 1,000,000 records. Elapsed time: 00:00:06s. Time for last 1,000,000: 6s. Last read position: 1:12,013,314
It finishes with:
INFO 2022-04-17 14:36:41 MarkDuplicates Traversing fragment information and detecting duplicates.
INFO 2022-04-17 14:36:45 MarkDuplicates Sorting list of duplicate records.
INFO 2022-04-17 14:36:50 MarkDuplicates After generateDuplicateIndexes freeMemory: 24194363144; totalMemory: 31524388864; maxMemory: 31524388864
INFO 2022-04-17 14:36:50 MarkDuplicates Marking 72796758 records as duplicates.
INFO 2022-04-17 14:36:50 MarkDuplicates Found 362973 optical duplicate clusters.
INFO 2022-04-17 14:36:50 MarkDuplicates Reads are assumed to be ordered by: coordinate
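Note that this failing run stops right after "Marking ... records as duplicates" and never reaches the "Writing complete" / "Closed outputs" lines that the successful run below prints, so its output BAM was probably never finalized. One quick way to spot such files (a sketch; `samtools quickcheck` does this and more if samtools is installed) is to check for the fixed 28-byte BGZF EOF block that every properly closed BAM ends with:

```shell
# A valid BAM ends with this fixed 28-byte BGZF EOF marker (from the SAM
# format specification); a writer killed mid-run never appends it.
BGZF_EOF="1f8b08040000000000ff0600424302001b0003000000000000000000"

check_bam_eof() {
    [ "$(tail -c 28 "$1" | od -An -tx1 | tr -d ' \n')" = "$BGZF_EOF" ]
}

# e.g.: for f in /home/picard/mark_duplicates/*.bam; do
#           check_bam_eof "$f" || echo "truncated or unfinished: $f"
#       done
```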
One sample that did work:
[Sun Apr 17 14:16:14 GMT-03:00 2022] MarkDuplicates --INPUT /home/picard/mate_information/9_FRAS202372575-2r_1.fqAligned.sortedByCoord.out.addOrReplace.fixedmate.bam --OUTPUT /home/picard/mark_duplicates/9_FRAS202372575-2r_1.fqAligned.sortedByCoord.out.addOrReplace.fixedmate.markdups.bam --METRICS_FILE /home/picard/mark_duplicates_txt/9_FRAS202372575-2r_1.fqAligned.sortedByCoord.out.addOrReplace.fixedmate.markdups.txt --MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP 50000 --MAX_FILE_HANDLES_FOR_READ_ENDS_MAP 8000 --SORTING_COLLECTION_SIZE_RATIO 0.25 --TAG_DUPLICATE_SET_MEMBERS false --REMOVE_SEQUENCING_DUPLICATES false --TAGGING_POLICY DontTag --CLEAR_DT true --DUPLEX_UMI false --ADD_PG_TAG_TO_READS true --REMOVE_DUPLICATES false --ASSUME_SORTED false --DUPLICATE_SCORING_STRATEGY SUM_OF_BASE_QUALITIES --PROGRAM_RECORD_ID MarkDuplicates --PROGRAM_GROUP_NAME MarkDuplicates --READ_NAME_REGEX <optimized capture of last three ':' separated fields as numeric values> --OPTICAL_DUPLICATE_PIXEL_DISTANCE 100 --MAX_OPTICAL_DUPLICATE_SET_SIZE 300000 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Sun Apr 17 14:16:14 GMT-03:00 2022] Executing as gabriel.gama@tcg on Linux 5.13.13-arch1-1 amd64; OpenJDK 64-Bit Server VM 1.8.0_292-b10; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: Version:2.26.0
INFO 2022-04-17 14:16:14 MarkDuplicates Start of doWork freeMemory: 2036970664; totalMemory: 2058354688; maxMemory: 28631367680
INFO 2022-04-17 14:16:14 MarkDuplicates Reading input file and constructing read end information.
INFO 2022-04-17 14:16:14 MarkDuplicates Will retain up to 103736839 data points before spilling to disk.
INFO 2022-04-17 14:16:21 MarkDuplicates Read 1,000,000 records. Elapsed time: 00:00:06s. Time for last 1,000,000: 6s. Last read position: 1:25,484,661
INFO 2022-04-17 14:16:21 MarkDuplicates Tracking 37883 as yet unmatched pairs. 37875 records in RAM.
INFO 2022-04-17 14:16:26 MarkDuplicates Read 2,000,000 records. Elapsed time: 00:00:11s. Time for last 1,000,000: 5s. Last read position: 1:66,121,764
INFO 2022-04-17 14:16:26 MarkDuplicates Tracking 410 as yet unmatched pairs. 299 records in RAM.
INFO 2022-04-17 14:16:31 MarkDuplicates Read 3,000,000 records. Elapsed time: 00:00:16s. Time for last 1,000,000: 4s. Last read position: 1:116,625,993
INFO 2022-04-17 14:16:31 MarkDuplicates Tracking 138 as yet unmatched pairs. 18 records in RAM.
INFO 2022-04-17 14:16:36 MarkDuplicates Read 4,000,000 records. Elapsed time: 00:00:21s. Time for last 1,000,000: 5s. Last read position: 1:161,191,238
It finishes with:
INFO 2022-04-17 15:01:54 MarkDuplicates Writing complete. Closing input iterator.
INFO 2022-04-17 15:01:54 MarkDuplicates Duplicate Index cleanup.
INFO 2022-04-17 15:01:54 MarkDuplicates Getting Memory Stats.
INFO 2022-04-17 15:01:55 MarkDuplicates Before output close freeMemory: 27596498792; totalMemory: 27786215424; maxMemory: 28631367680
INFO 2022-04-17 15:01:55 MarkDuplicates Closed outputs. Getting more Memory Stats.
INFO 2022-04-17 15:01:55 MarkDuplicates After output close freeMemory: 27830331240; totalMemory: 28020047872;