bcftools mpileup handling of corrupted/truncated data/streaming errors #2177
Comments
Re your note (1), this behaviour comes from the days when many tools producing BAM files did not write EOF markers, so treating the marker's absence as a hard error would have been counter-productive. Perhaps that behaviour could be revisited now, especially for CRAM files, which probably defined an EOF marker block right from the beginning. It would be interesting to add testing of The good news is that propagating
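For readers who want to detect a missing EOF marker themselves before running a tool, here is a minimal sketch in Python. It assumes a BGZF-compressed input such as BAM, where the SAM/BAM specification defines the EOF marker as a fixed 28-byte empty BGZF block. CRAM uses a different EOF container, so this check does not apply to CRAM files. The function name `has_bgzf_eof` is hypothetical and is not part of samtools, bcftools, or htslib.

```python
import os

# The 28-byte empty BGZF block that the SAM/BAM specification uses as the EOF marker.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00"
    "1b 00 03 00 00 00 00 00 00 00 00 00"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker block.

    Hypothetical helper for illustration; it only inspects the last 28 bytes
    and says nothing about whether the rest of the file is intact.
    """
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as fh:
        fh.seek(-len(BGZF_EOF), os.SEEK_END)
        return fh.read() == BGZF_EOF
```

A wrapper script could call `has_bgzf_eof("no_eof.bam")` and refuse to continue when it returns False, rather than relying on the warning alone.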
@jmarshall You are quite right:
samtools mpileup truncated_corrupted.cram
[mpileup] 1 samples in 1 input files
samtools mpileup: error reading from input file
returns 1
samtools mpileup truncated_corrupted.bam
[W::bam_hdr_read] EOF marker is absent. The input is probably truncated
[mpileup] 1 samples in 1 input files
[E::bgzf_read_block] Failed to read BGZF block data at offset 289804 expected 9098 bytes; hread returned 8902
[E::bgzf_read] Read block operation failed with error 4 after 0 of 4 bytes
samtools mpileup: error reading from input file
returns 1
Glad to hear it is an easy fix!
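As a defensive measure in a pipeline, one could check both the return code and the stderr log of each command. The sketch below is an illustration only: the helper names `run_checked` and `looks_failed` are hypothetical, and treating any `[E::` line as a failure is an assumption based on the htslib-style log lines shown above, not documented tool behaviour.

```python
import subprocess


def run_checked(cmd):
    """Run `cmd` (a list of arguments), discard stdout, return (returncode, stderr)."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
    )
    return proc.returncode, proc.stderr


def looks_failed(returncode, stderr):
    """Treat a non-zero exit code or any '[E::' log line as a failure.

    Assumption: errors are logged with an '[E::' prefix, as in the
    samtools mpileup output quoted in this thread.
    """
    return returncode != 0 or "[E::" in stderr


# Example call, assuming samtools is on PATH and the file exists:
# rc, err = run_checked(["samtools", "mpileup", "truncated_corrupted.bam"])
# if looks_failed(rc, err):
#     raise SystemExit("input could not be read completely")
```

Checking stderr as well as the exit code catches the case where a tool logs a read error but still exits with 0.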
Thank you for the report, this is now fixed in 57b9072, the command
bcftools mpileup doesn't seem to check whether there were errors while reading the read data in. This means that bcftools mpileup can appear to have run successfully even though there may have been an underlying data reading problem. As an example, I created 3 BAMs and CRAMs as follows:
good.{bam,cram}: these are a small valid BAM/CRAM.
no_eof.{bam,cram}: these are good.{bam,cram} but with the EOF markers removed from the end of the file.
truncated_corrupted.{bam,cram}: these are good.{bam,cram} but with a random set of bytes (including the EOF bytes) removed from the end of the file.
I then ran these files through both bcftools mpileup and samtools view, using the following script.
This results in the following output:
A few things to note.
samtools view and bcftools mpileup will print a warning, but return a code of 0, indicating success. This feels a little counterintuitive to me (I would expect an error return code here), but I certainly see the argument for this behavior.
My main concern is that this mimics the behavior we would see if a totally valid CRAM or BAM was being streamed from a Google bucket, and the Google endpoint decided to return a 429 (or 404, or some other error code) at some point in the middle of streaming the data. I think that in this scenario bcftools would output a valid pileup file, with no indication that anything was amiss.
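The test inputs described in this report (a copy without the EOF marker, and a copy with a further random number of trailing bytes removed) could be produced along the following lines. This is a sketch under stated assumptions, not the script used in the report: the function names are hypothetical, and the default `eof_len=28` is the BGZF EOF block size, which applies to BAM. A CRAM file would need the length of its own EOF container passed in instead.

```python
import random
from pathlib import Path


def write_without_tail(src, dst, n_bytes):
    """Copy `src` to `dst`, dropping the last `n_bytes` bytes."""
    data = Path(src).read_bytes()
    if not 0 <= n_bytes <= len(data):
        raise ValueError("n_bytes must be between 0 and the file size")
    Path(dst).write_bytes(data[: len(data) - n_bytes])


def make_truncated_inputs(good, no_eof, truncated, eof_len=28, seed=0):
    """Write a no-EOF copy and a randomly truncated copy of `good`.

    Assumption: the EOF marker occupies the final `eof_len` bytes of `good`.
    Returns the number of bytes removed from the truncated copy.
    """
    size = Path(good).stat().st_size
    if size < eof_len + 2:
        raise ValueError("input too small to truncate beyond the EOF marker")
    # Remove only the EOF marker.
    write_without_tail(good, no_eof, eof_len)
    # Remove the EOF marker plus a random number of extra bytes,
    # always leaving at least one byte in the output.
    extra = random.Random(seed).randint(1, size - eof_len - 1)
    write_without_tail(good, truncated, eof_len + extra)
    return eof_len + extra


# Example call with the file names used in this report:
# make_truncated_inputs("good.bam", "no_eof.bam", "truncated_corrupted.bam")
```

Fixing the seed keeps the truncation point reproducible, which makes it easier to compare tool behaviour across versions.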