[solved] empty output after count_snps() with BD Rhapsody bam files #32
Comments
Hi @leqi0001
As a next step, here is the default parse_read, which you can use with a custom UMI tag. Returning None means the read will be ignored. It is possible that some of the filters (the nhits tag, maybe?) don't work for your data; in that case you should remove that check.

def parse_read(read: AlignedRead, umi_tag="UB", nhits_tag="NH", score_tag="AS",
               score_diff_max = 8, mapq_threshold = 20,
               # max. 2 edits --^
               p_misaligned_default = 0.01) -> Optional[Tuple[float, int]]:
    """
    returns None if read should be ignored.
    Read still can be ignored if it is not in the barcode list
    """
    if read.get_tag(score_tag) <= len(read.seq) - score_diff_max:
        # too many edits
        return None
    if read.get_tag(nhits_tag) > 1:
        # multi-mapped
        return None
    if not read.has_tag(umi_tag):
        # does not have molecule barcode
        return None
    if read.mapq < mapq_threshold:
        # this one should not be triggered because of NH, but just in case
        return None
    p_misaligned = p_misaligned_default  # default value
    ub = hash_string(read.get_tag(umi_tag))
    return p_misaligned, ub
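For data where the UMI lives in a different tag (MA in this issue) and the NH tag may be absent, a relaxed variant might look like the sketch below. It is an illustration only, not the library's code and not the contents of any pull request: `parse_read_relaxed` and `FakeRead` are hypothetical names, `zlib.crc32` is a stand-in for the library's `hash_string`, and the alignment-score check is left out for brevity. The read object only needs `has_tag`, `get_tag`, and `mapping_quality`, which pysam reads provide.

```python
import zlib
from typing import Optional, Tuple


def parse_read_relaxed(read, umi_tag="MA", nhits_tag="NH",
                       mapq_threshold=20,
                       p_misaligned_default=0.01) -> Optional[Tuple[float, int]]:
    """Return None if the read should be ignored (hypothetical variant)."""
    if not read.has_tag(umi_tag):
        # no molecule barcode
        return None
    # apply the multi-mapping filter only when the aligner wrote the tag
    if read.has_tag(nhits_tag) and read.get_tag(nhits_tag) > 1:
        return None
    if read.mapping_quality < mapq_threshold:
        return None
    # str() guards against tag values stored as integers;
    # crc32 is an assumed stand-in for the library's hash_string
    umi = zlib.crc32(str(read.get_tag(umi_tag)).encode())
    return p_misaligned_default, umi


class FakeRead:
    """Minimal stand-in for a pysam read, for demonstration only."""

    def __init__(self, tags, mapping_quality):
        self._tags = tags
        self.mapping_quality = mapping_quality

    def has_tag(self, name):
        return name in self._tags

    def get_tag(self, name):
        return self._tags[name]
```

A read without the NH tag passes here instead of raising an error, which is the main difference from the default function above.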
Agreed, that's weird. How about iterating over the BAM file and checking that we can read the 'CB' tag?

import pysam
handler = BarcodeHandler.from_file(barcode_filename, tag="CB")
samfile = pysam.AlignmentFile("path-to-your.bam", "rb")
for i, read in enumerate(samfile.fetch('chr1', 100, 120)):  # select some good region
    if i > 1000:
        break
    print(i, handler.get_barcode_index(read))
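A related check is to look at the Python type of the tag values themselves. The sketch below is a hypothetical helper (`tally_tag_types` and `StubRead` are not part of any library) that counts the types seen for one tag; it relies only on the `has_tag` and `get_tag` methods that pysam reads expose, so with real data the iterable could be something like `samfile.fetch(...)`.

```python
from collections import Counter


def tally_tag_types(reads, tag="CB", limit=1000):
    """Count the Python types of `tag` values over the first `limit` reads."""
    counts = Counter()
    for i, read in enumerate(reads):
        if i >= limit:
            break
        if read.has_tag(tag):
            counts[type(read.get_tag(tag)).__name__] += 1
        else:
            counts["missing"] += 1
    return counts


class StubRead:
    """Tiny stand-in for a pysam read, for demonstration only."""

    def __init__(self, tags):
        self._tags = tags

    def has_tag(self, name):
        return name in self._tags

    def get_tag(self, name):
        return self._tags[name]


demo_reads = [StubRead({"CB": 12345}), StubRead({"CB": "12345"}), StubRead({})]
demo_counts = tally_tag_types(demo_reads)
```

If the tally and the barcode list disagree on type (for example `int` on one side and `str` on the other), no barcode will match.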
I can confirm that if I force the barcodes to be read as strings, it works without any errors.
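One plausible way integer barcodes lead to empty output is a plain key mismatch: the integer 101 and the string "101" are different dictionary keys, so every lookup misses silently. The sketch below shows the effect and the string coercion that avoids it; `build_barcode_index` and `lookup_barcode` are hypothetical helpers for illustration, not the library's API.

```python
def build_barcode_index(barcodes):
    """Map each barcode, coerced to str, to its position in the list."""
    return {str(bc): i for i, bc in enumerate(barcodes)}


def lookup_barcode(index, tag_value):
    """Return the barcode position, or None if the barcode is unknown."""
    return index.get(str(tag_value))


# barcodes parsed as integers, e.g. by a reader that infers column types
int_barcodes = [101, 102, 103]

# naive index keyed by the raw values: a string tag never matches
naive_index = {bc: i for i, bc in enumerate(int_barcodes)}
naive_hit = naive_index.get("102")

# coerced index: both sides are compared as strings
safe_index = build_barcode_index(int_barcodes)
safe_hit = lookup_barcode(safe_index, "102")
```

Coercing on both sides keeps the lookup correct regardless of which side produced integers.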
Glad you found the reason and the solution, good job! It didn't cross my mind that barcodes could be read as an integer type. If you have a
Thank you so much for your help! Let me make sure the whole pipeline works before I post the code.
I just created a pull request with small modifications. It's the first time I've made a pull request. Hopefully I didn't mess anything up.
The suggested solution for Rhapsody is to use the parse_read from PR #33, which was just merged. I've renamed the issue; let's keep it open so that others can find this.
Hi,
I'm having trouble finding SNPs overlapping with reads using BAM files generated with the BD Rhapsody pipeline. I specified the UMI tag (MA) manually, and the cell barcode tag is the same (CB). However, the count_snps() step found 0 SNPs anywhere. It's not due to chromosome naming or anything like that. The biggest difference is that these BAM files have integers as cell barcodes instead of bases. Your suggestions are appreciated!
Le
My code is like this: