Error in map_reads_STAR #135

Open
MiracleEaTu opened this issue Feb 7, 2025 · 6 comments

Comments

@MiracleEaTu

MiracleEaTu commented Feb 7, 2025

Hi spacemake developer,

I repeatedly get an error when running openst & spacemake (v0.8.0) and have not been able to get past it.

The error is related to reading unaligned_bc_tagged.polyA_adapter_trimmed.bam using pysam. The full error message is shown below:

[Fri Feb  7 00:00:37 2025]
rule map_reads_STAR:
    input: projects/test/processed_data/tutorial/illumina/complete_data/unaligned_bc_tagged.polyA_adapter_trimmed.bam, species_data/human/genome/star_index/SAindex, human.genome.genomeLoad.done, species_data/human/genome/annotation.gtf, human.genome.genomeLoad.done
    output: projects/test/processed_data/tutorial/illumina/complete_data/genome.STAR.bam, projects/test/processed_data/tutorial/illumina/complete_data/star.genome.Log.final.out
    jobid: 3
    wildcards: project_id=test, sample_id=tutorial, ref_name=genome
    threads: 16

INFO	2025-02-07 00:00:39	TagReadWithGeneFunction	

********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** 
https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
**********    TagReadWithGeneFunction -I /dev/stdin -O projects/test/processed_data/tutorial/illumina/complete_data/genome.STAR.bam -ANNOTATIONS_FILE species_data/human/genome/annotation.gtf
**********


Feb 07, 2025 12:00:39 AM com.intel.gkl.NativeLibraryLoader load
INFO: Loading libgkl_compression.so from jar:file:/vast/palmer/scratch/ya-chi_ho/qg66/OpenST/spacemake_20250204/dropseq-3.0.0/lib/picard-3.1.0.jar!/com/intel/gkl/native/libgkl_compression.so
[Fri Feb 07 00:00:39 EST 2025] TagReadWithGeneFunction INPUT=/dev/stdin OUTPUT=projects/test/processed_data/tutorial/illumina/complete_data/genome.STAR.bam ANNOTATIONS_FILE=species_data/human/genome/annotation.gtf    GENE_NAME_TAG=gn GENE_STRAND_TAG=gs GENE_FUNCTION_TAG=gf READ_FUNCTION_TAG=XF USE_STRAND_INFO=true PCT_REQUIRED_OVERLAP=0.0 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=LENIENT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Fri Feb 07 00:00:39 EST 2025] Executing on Linux 4.18.0-477.74.1.el8_8.x86_64 amd64; OpenJDK 64-Bit Server VM 21.0.2+13-LTS; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: 3.0.0

error type 1:
EXITING because of FATAL ERROR in reads input: short read sequence line: 0
Read Name=@SRR27331456.1
Read Sequence=""
DEF_readNameLengthMax=50000
DEF_readSeqLengthMax=650

Feb 07 00:01:29 ...... FATAL ERROR, exiting

error type 2:
terminate called after throwing an instance of 'std::out_of_range'
  what():  basic_string::at: __n (which is 3) >= this->size() (which is 2)
~/.conda/envs/spacemake/bin/STAR: line 8: 2737561 Aborted                 (core dumped) "${cmd}" "$@"

Same traceback and downstream message:
Traceback (most recent call last):
  File "~/.conda/envs/spacemake/lib/python3.10/site-packages/spacemake/snakemake/scripts/splice_bam_header.py", line 106, in <module>
    mbam = pysam.AlignmentFile(args.in_bam, "rb")
  File "pysam/libcalignmentfile.pyx", line 751, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 956, in pysam.libcalignmentfile.AlignmentFile._open
ValueError: file does not contain alignment data
INFO	2025-02-07 00:01:32	GTFParser	Seen many non-increasing record positions. Printing Read-names as well.
INFO	2025-02-07 00:01:33	GTFParser	read       100,000 GTF records.  Elapsed time: 00:00:01s.  Time for last 100,000:    1s.  Last read position: chr1:108,939,636
INFO	2025-02-07 00:01:34	GTFParser	read       200,000 GTF records.  Elapsed time: 00:00:02s.  Time for last 100,000:    1s.  Last read position: chr1:228,480,521
INFO	2025-02-07 00:01:35	GTFParser	read       300,000 GTF records.  Elapsed time: 00:00:03s.  Time for last 100,000:    1s.  Last read position: chr2:131,239,278
INFO	2025-02-07 00:01:37	GTFParser	read       400,000 GTF records.  Elapsed time: 00:00:05s.  Time for last 100,000:    1s.  Last read position: chr3:20,161,009
INFO	2025-02-07 00:01:38	GTFParser	read       500,000 GTF records.  Elapsed time: 00:00:06s.  Time for last 100,000:    1s.  Last read position: chr3:177,934,102
INFO	2025-02-07 00:01:39	GTFParser	read       600,000 GTF records.  Elapsed time: 00:00:07s.  Time for last 100,000:    1s.  Last read position: chr4:149,429,922
INFO	2025-02-07 00:01:40	GTFParser	read       700,000 GTF records.  Elapsed time: 00:00:08s.  Time for last 100,000:    1s.  Last read position: chr5:151,052,165
INFO	2025-02-07 00:01:41	GTFParser	read       800,000 GTF records.  Elapsed time: 00:00:09s.  Time for last 100,000:    1s.  Last read position: chr6:116,616,624
INFO	2025-02-07 00:01:42	GTFParser	read       900,000 GTF records.  Elapsed time: 00:00:10s.  Time for last 100,000:    1s.  Last read position: chr7:102,070,339
INFO	2025-02-07 00:01:44	GTFParser	read     1,000,000 GTF records.  Elapsed time: 00:00:12s.  Time for last 100,000:    1s.  Last read position: chr8:119,244,295
INFO	2025-02-07 00:01:45	GTFParser	read     1,100,000 GTF records.  Elapsed time: 00:00:13s.  Time for last 100,000:    1s.  Last read position: chr9:135,753,937
INFO	2025-02-07 00:01:46	GTFParser	read     1,200,000 GTF records.  Elapsed time: 00:00:14s.  Time for last 100,000:    1s.  Last read position: chr10:132,834,041
INFO	2025-02-07 00:01:47	GTFParser	read     1,300,000 GTF records.  Elapsed time: 00:00:15s.  Time for last 100,000:    1s.  Last read position: chr11:101,130,689
INFO	2025-02-07 00:01:48	GTFParser	read     1,400,000 GTF records.  Elapsed time: 00:00:16s.  Time for last 100,000:    1s.  Last read position: chr12:71,861,405
INFO	2025-02-07 00:01:49	GTFParser	read     1,500,000 GTF records.  Elapsed time: 00:00:18s.  Time for last 100,000:    1s.  Last read position: chr14:24,214,492
INFO	2025-02-07 00:01:51	GTFParser	read     1,600,000 GTF records.  Elapsed time: 00:00:19s.  Time for last 100,000:    1s.  Last read position: chr15:61,921,947
INFO	2025-02-07 00:01:52	GTFParser	read     1,700,000 GTF records.  Elapsed time: 00:00:20s.  Time for last 100,000:    1s.  Last read position: chr16:66,738,773
INFO	2025-02-07 00:01:53	GTFParser	read     1,800,000 GTF records.  Elapsed time: 00:00:21s.  Time for last 100,000:    1s.  Last read position: chr17:47,733,236
INFO	2025-02-07 00:01:54	GTFParser	read     1,900,000 GTF records.  Elapsed time: 00:00:22s.  Time for last 100,000:    1s.  Last read position: chr19:4,621,194
INFO	2025-02-07 00:01:55	GTFParser	read     2,000,000 GTF records.  Elapsed time: 00:00:23s.  Time for last 100,000:    1s.  Last read position: chr19:55,694,726
INFO	2025-02-07 00:01:57	GTFParser	read     2,100,000 GTF records.  Elapsed time: 00:00:25s.  Time for last 100,000:    1s.  Last read position: chr22:25,231,604
INFO	2025-02-07 00:01:58	GTFParser	read     2,200,000 GTF records.  Elapsed time: 00:00:26s.  Time for last 100,000:    1s.  Last read position: chrX:132,069,478
[Fri Feb 07 00:01:58 EST 2025] org.broadinstitute.dropseqrna.metrics.TagReadWithGeneFunction done. Elapsed time: 1.32 minutes.
Runtime.totalMemory()=2149580800
Waiting at most 3 seconds for missing files.
MissingOutputException in line 269 of ~/.conda/envs/spacemake/lib/python3.10/site-packages/spacemake/snakemake/mapping.smk:
Job Missing files after 3 seconds:
projects/test/processed_data/tutorial/illumina/complete_data/star.genome.Log.final.out
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Job id: 3 completed successfully, but some output files are missing. 3
  File "~/.conda/envs/spacemake/lib/python3.10/site-packages/snakemake/executors/__init__.py", line 584, in handle_job_success
  File "~/.conda/envs/spacemake/lib/python3.10/site-packages/snakemake/executors/__init__.py", line 252, in handle_job_success
Removing output files of failed job map_reads_STAR since they might be corrupted:
projects/test/processed_data/tutorial/illumina/complete_data/genome.STAR.bam
Job failed, going on with independent jobs.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2025-02-06T231041.757553.snakemake.log
CAUGHT A SPACEMAKE ERROR ERROR: SpacemakeError
an error occurred while snakemake() ran
ERROR: SpacemakeError
an error occurred while snakemake() ran

I am using Dropseq-3.0.0 with Java-21.0.2.
The spacemake config commands are as follows (the sample was downloaded from the openST GEO repository):

spacemake config add_species --name human --reference genome --sequence ../OpenST/spacemake/human_genome/GRCh38.primary_assembly.genome.fa --annotation ../OpenST/spacemake/human_genome/gencode.v47.basic.annotation.gtf

spacemake projects add_sample --project-id test --sample-id tutorial --R1 ../SRR27331456_1.fastq.gz --R2 ../SRR27331456_2.fastq.gz --species human --puck openst --puck-barcode-file ../fc_1_coords/*.txt.gz --run-mode openst --barcode-flavor openst --map-strategy STAR:genome:final

spacemake run --cores 16 --keep-going

Please advise on how to overcome this issue.

Best,
Qijie

@MiracleEaTu
Author

Update:
After downgrading to spacemake v0.7.8, I no longer get the map_reads_STAR error.

@nukappa
Member

nukappa commented Feb 10, 2025

Hi there, thanks for opening the issue and for the update. We'll investigate this.

@marvin-jens
Member

Hi Qijie,

The error is triggered by STAR, which complains about a read with an empty sequence (0 bytes). The crash of STAR then also crashes the splice_bam_header.py script, which produces the additional error messages.

EXITING because of FATAL ERROR in reads input: short read sequence line: 0
Read Name=@SRR27331456.1
Read Sequence=""
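
To tell a genuine BAM problem apart from the follow-on pysam failure, one can check whether a file on disk starts like a BAM at all. The sketch below is only an illustration: `looks_like_bam` is a hypothetical helper, not part of spacemake or pysam, and it assumes the data is a regular file rather than a pipe. It relies on BAM being BGZF (gzip-compatible) data whose decompressed content begins with the magic bytes `BAM\1`.

```python
import gzip

# Magic bytes at the start of decompressed BAM data, per the SAM/BAM specification.
BAM_MAGIC = b"BAM\x01"


def looks_like_bam(path):
    """Return True if `path` decompresses to data starting with the BAM magic.

    Hypothetical helper for illustration only. An empty file, a plain-text
    file, or a truncated gzip stream all yield False.
    """
    try:
        with gzip.open(path, "rb") as handle:
            return handle.read(4) == BAM_MAGIC
    except (OSError, EOFError):
        # Not gzip/BGZF data, or the compressed stream ends too early.
        return False
```

If STAR aborts before writing any output, the downstream script presumably receives nothing resembling BAM data, which would be consistent with the "file does not contain alignment data" message.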

Can you confirm that indeed there is such a read in the FASTQ input data? What does the fastq record for SRR27331456.1 look like?
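
One way to answer this is to pull the record out of the FASTQ file by name. The following is a minimal sketch, assuming standard four-line FASTQ records; `find_fastq_record` is a hypothetical helper written for this thread, not a spacemake function.

```python
import gzip
from itertools import islice


def find_fastq_record(path, read_id):
    """Return (header, sequence, quality) for the first record named `read_id`.

    Assumes four lines per record. Returns None if the read is not found.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            record = [line.rstrip("\n") for line in islice(handle, 4)]
            if len(record) < 4:
                return None  # end of file (or a truncated last record)
            header, seq, _plus, qual = record
            # The read name is the first whitespace-delimited token after '@'.
            fields = header[1:].split()
            if fields and fields[0] == read_id:
                return header, seq, qual
```

For the sample in this issue, a call might look like `find_fastq_record("../SRR27331456_2.fastq.gz", "SRR27331456.1")`, using the file name from the `add_sample` command above. An empty string in the second element of the result would mean the read is already empty in the input.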

@marvin-jens
Member

Is it this read?

@SRR27331456.1 A00643:620:HFM7YDSX5:1:1101:1316:1016 length=128
TTGCTCTGTGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATAAAAAA
+SRR27331456.1 A00643:620:HFM7YDSX5:1:1101:1316:1016 length=128

This read is almost entirely polyA, so very little of it is left after trimming. It should not survive the adapter and polyA trimming and should never end up being fed into STAR. I suspect that the behavior of DropSeqTools may have changed: we use 2.5.1, not 3.0.0. Could you perhaps give it a try with DropSeqTools 2.5.1?
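
To illustrate how little of this read remains, here is a rough sketch of polyA trimming. It is not the trimming that spacemake performs: `trim_polya_tail` and its `min_run` threshold are assumptions made for this example, and the rule (cut at the first run of at least `min_run` consecutive A's) is deliberately crude.

```python
def trim_polya_tail(seq, min_run=10):
    """Cut `seq` at the first run of at least `min_run` consecutive A's.

    Crude illustrative heuristic; real trimmers tolerate mismatches and
    also handle adapters.
    """
    start = seq.find("A" * min_run)
    return seq if start == -1 else seq[:start]


# Sequence of SRR27331456.1 as posted in this thread.
READ = "TTGCTCTGTGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATAAAAAA"

remaining = trim_polya_tail(READ)
print(remaining, len(remaining))
```

Under this heuristic only the 10-nt prefix `TTGCTCTGTG` is left. The sketch does not reproduce the fully empty sequence that STAR reported; it only shows why a read like this is expected to be dropped by a minimum-length filter.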

When you say it runs with spacemake 0.7.8, have you tested 0.7.9 as well? Did you use the same version of DropSeqTools?

Best,
-Marvin

@MiracleEaTu
Author

MiracleEaTu commented Feb 10, 2025

Hi @marvin-jens ,

Thank you for the guidance!
I was using Dropseqtools 3.0.0, and yes, that is the read in question; I agree that the read length is the problem here.
I also got a different error at the same step with our own sequencing data:

error type 2:
terminate called after throwing an instance of 'std::out_of_range'
  what():  basic_string::at: __n (which is 3) >= this->size() (which is 2)
~/.conda/envs/spacemake/bin/STAR: line 8: 2737561 Aborted                 (core dumped) "${cmd}" "$@"

It might be because of the Dropseqtools version. I am now testing Dropseqtools 2.5.1 with spacemake 0.8.0 and Java 21.0.2.
Will keep you posted!
BTW, is there any specific Java version that works better for Dropseqtools?
Thank you!!

Qijie

@MiracleEaTu
Author

Hi @marvin-jens ,

I tried spacemake 0.7.8, 0.7.9 and 0.8.0 with Dropseqtools 2.5.1 and Java 11.0.15.
Version 0.8.0 cannot get past the rule map_reads_STAR and fails with the following error message:

[Wed Feb 12 09:54:12 EST 2025] Executing on Linux 4.18.0-477.74.1.el8_8.x86_64 amd64; OpenJDK 64-Bit Server VM 11.0.15-internal+0-adhoc..src; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.5.1(680c2ea_1642084299)
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
/project/conda_env/openst-spacemake-0210/bin/STAR: line 8: 3833077 Aborted                 (core dumped) "${cmd}" "$@"
Traceback (most recent call last):
  File "/project/conda_env/openst-spacemake-0210/lib/python3.10/site-packages/spacemake/snakemake/scripts/splice_bam_header.py", line 106, in <module>
    mbam = pysam.AlignmentFile(args.in_bam, "rb")
  File "pysam/libcalignmentfile.pyx", line 751, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 956, in pysam.libcalignmentfile.AlignmentFile._open
ValueError: file does not contain alignment data

Versions 0.7.9 and 0.7.8 sometimes worked, but sometimes failed at the same rule with the following message:

Executing on Linux 4.18.0-477.74.1.el8_8.x86_64 amd64; OpenJDK 64-Bit Server VM 11.0.16+8; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.5.1(680c2ea_1642084299)

EXITING because of FATAL ERROR: waited too long for the other job to finish loading the genomeSuccess
SOLUTION: remove the shared memory chunk by running STAR with --genomeLoad Remove, and restart STAR
Feb 11 23:44:42 ...... FATAL ERROR, exiting
Traceback (most recent call last):
  File "project/conda_env/openst-spacemake_new2/lib/python3.10/site-packages/spacemake/snakemake/scripts/splice_bam_header.py", line 106, in <module>
    mbam = pysam.AlignmentFile(args.in_bam, "rb")
  File "pysam/libcalignmentfile.pyx", line 751, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 956, in pysam.libcalignmentfile.AlignmentFile._open
ValueError: file does not contain alignment data
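
The SOLUTION line in the STAR message above can be sketched as a small wrapper. This is only an illustration under assumptions: the helper names are hypothetical, and the index directory `species_data/human/genome/star_index` is inferred from the rule input shown in the first log. `--genomeLoad Remove` and `--genomeDir` are STAR's own options.

```python
import shlex
import subprocess


def star_remove_shared_genome_cmd(genome_dir, star="STAR"):
    """Build the STAR command that removes a genome from shared memory."""
    return [star, "--genomeLoad", "Remove", "--genomeDir", genome_dir]


def remove_shared_genome(genome_dir, star="STAR", dry_run=True):
    """Return the command line; run it as well unless `dry_run` is True."""
    cmd = star_remove_shared_genome_cmd(genome_dir, star)
    if not dry_run:
        # Raises CalledProcessError if STAR exits with a non-zero status.
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)


# Dry run: only shows what would be executed.
print(remove_shared_genome("species_data/human/genome/star_index"))
```

With `dry_run=False` the command is actually executed, after which spacemake can be restarted so that the genome is loaded into shared memory again.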

Do you have any suggestions on this? Or could you please share your environment with me? (I also suspect that anndata, pandas, etc. need to be at different versions to run the script.)
Many thanks!

Qijie
