-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
marginPhase error: Assertion `seqLen >= rProbs->length + rProbs->refStart' failed. #12
Comments
FYI, just discovered that another program that I run in parallel failed because I was running out of disk space. Maybe that can be the cause. Will keep you updated EDIT update: nope, the error is still there. thanks! If it helps:
|
Can confirm, got the same error using ONT data was similarly run on CentOS |
I found a workaround. The solution is to add about a hundred lines of just N's to the reference FASTA file and then marginPhase works. Note that already the example reference file of marginPhase, i.e. hg19.chr3.9mb.fa starts with such a block. |
Sorry for the trouble, what you're seeing looks like a mismatch between the
read coordinates (presumably coming from the bam) and the reference
sequences, as provided by the input fasta. This could happen if you did not
have the same reference assembly in the bam and reference fasta. Hope that
helps.
PS, you might want to switch off debug flags once you've got it running
fast and enable compile optimization. It will be a lot faster.
…On Fri, Nov 23, 2018 at 4:03 AM klucek ***@***.***> wrote:
I found a workaround. The solution is to add about a hundred lines of just
N's to the reference FASTA file and then marginPhase works. Note that
already the example reference file of marginPhase, i.e. hg19.chr3.9mb.fa
starts with such a block.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#12 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAkOIIRRnNMJjWi-ny3l04rWXIEewS15ks5ux-QUgaJpZM4Yns-m>
.
--
Benedict
(calendar invites: bpaten@ucsc.edu,
appointments: Tina Bernard <tibernar@ucsc.edu>)
|
Hi,
thanks for the reply, I was actually about to write you more about this
to add some background, the data that I have is Oxford Nanopore amplicon sequences of two highly variable genes that are linked to self-incompatibility in the plant Arabidopsis lyrata. I generated for each individual an assembly using CANU and identified for each gene 7 major haplotypes that differ in their length due to different deletions in the introns. I thus generated a fake reference genome using these major contigs and aligned my ONT data with bwa mem. Because each individual is expected to be heterozygous (similar to MHC), I was hoping to use marginPhase.
I have to note that the sequences in the BAM file still include the multiplex barcodes, which is not true for the reference. So even though I use the reference against which I have aligned my data, the reads in the BAM file are longer.
Another question - marginPhase seems to downsample reads to a coverage of 64 - given that I have amplicons, I basically have a read-depth of >6000x. Do you think it could change something allowing for a higher depth?
Thanks a lot
Best
Kay
… On 23 Nov 2018, at 17:24, Benedict Paten ***@***.***> wrote:
Sorry for the trouble, what you're seeing looks like a mismatch between the
read coordinates (presumably coming from the bam) and the reference
sequences, as provided by the input fasta. This could happen if you did not
have the same reference assembly in the bam and reference fasta. Hope that
helps.
PS, you might want to switch off debug flags once you've got it running
fast and enable compile optimization. It will be a lot faster.
On Fri, Nov 23, 2018 at 4:03 AM klucek ***@***.***> wrote:
> I found a workaround. The solution is to add about a hundred lines of just
> N's to the reference FASTA file and then marginPhase works. Note that
> already the example reference file of marginPhase, i.e. hg19.chr3.9mb.fa
> starts with such a block.
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <#12 (comment)>,
> or mute the thread
> <https://github.com/notifications/unsubscribe-auth/AAkOIIRRnNMJjWi-ny3l04rWXIEewS15ks5ux-QUgaJpZM4Yns-m>
> .
>
--
Benedict
(calendar invites: ***@***.***,
appointments: Tina Bernard ***@***.***>)
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub <#12 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AUNm5cRM19o1AVft9799Bk_48e8Gg2qWks5uyCEugaJpZM4Yns-m>.
|
Hi @benedictpaten, thanks for the reply. well, for my original run, I really believe that the reference fed as assembly (input fasta) and as mapping reference for the reads (input bam) was exactly the same (since I only have one). I am then not sure where the problem comes from. The mapping was generated with minimap2 (mode long-reads to reference), but I will also try to feed the mapping by bwa-mem. I'll also see if @klucek's workaround (to add Ns to the contig sequence) turns out helpful in any way. Keeping you updated. Thanks, |
Hi Amina, Kay -
Bit confused about who is doing what (sorry to not read in detail), but two
points:
- If the assembly provided and the bam that the reads were mapped to is
different then all bets are off. Padding might be a solution to at least
avoid a seg fault (this is C code - no bounds checking), but if the
assembly and the ref differs in base composition, it will produce odd
results.
- For coverage beyond 64 depth, the thing to do is to generate the two
haplotypes from the vcf and then map the reads to the haplotypes, picking
for each read the haplotype that has the best score.
It is easy to do this within marginPhase too, and I'll add as a feature if
I can.
…On Mon, Nov 26, 2018 at 2:35 AM Amina Echchiki ***@***.***> wrote:
Hi @benedictpaten <https://github.com/benedictpaten>, thanks for the
reply.
well, for my original run, I really believe that the reference fed as
assembly (input fasta) and as mapping reference for the reads (input bam)
was exactly the same (since I only have one).
I am then not sure where the problem comes from. The mapping was generated
with minimap2 (mode long-reads to reference), but I will also try to feed
the mapping by bwa-mem. I'll also see if @klucek
<https://github.com/klucek>'s workaround (to add Ns to the contig
sequence) turns out helpful in any way.
Keeping you updated.
Thanks,
Amina
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#12 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAkOIBlIP0nykCdwwP93n-06YLPE0OB8ks5uy8PlgaJpZM4Yns-m>
.
--
Benedict
(calendar invites: bpaten@ucsc.edu,
appointments: Tina Bernard <tibernar@ucsc.edu>)
|
Hi Benedict,
I finally got my hands on this again and I guess I have some additional insights - the problem was indeed that the reads were longer than my “reference genome”. This is because I work with amplicons for a region that is not covered in the real reference genome. My fake reference is a consensus of individual assemblies obtained with CANU. The solution to this was hard clipping of the initial alignment followed by a realignment and marginPhase works perfectly.
Attached is an example - the two haplotypes differ among other by a 500bp deletion which marginPhase gets mostly resolved. A question though - because I have amplicons sequenced, I have depths of up to 1000x, yet marginPhase has a hard limit of 64 reads. Do you think increasing this would improve things in my case? To do so, I suppose I can just increase the number in the partitions.c file (line 18).
Thanks a lot
Best
Kay
… On 27 Nov 2018, at 22:10, Benedict Paten ***@***.*** ***@***.***>> wrote:
Hi Amina, Kay -
Bit confused about who is doing what (sorry to not read in detail), but two
points:
- If the assembly provided and the bam that the reads were mapped to is
different then all bets are off. Padding might be a solution to at least
avoid a seg fault (this is C code - no bounds checking), but if the
assembly and the ref differs in base composition, it will produce odd
results.
- For coverage beyond 64 depth, the thing to do is to generate the two
haplotypes from the vcf and then map the reads to the haplotypes, picking
for each read the haplotype that has the best score.
It is easy to do this within marginPhase too, and I'll add as a feature if
I can.
On Mon, Nov 26, 2018 at 2:35 AM Amina Echchiki ***@***.*** ***@***.***>>
wrote:
> Hi @benedictpaten <https://github.com/benedictpaten <https://github.com/benedictpaten>>, thanks for the
> reply.
>
> well, for my original run, I really believe that the reference fed as
> assembly (input fasta) and as mapping reference for the reads (input bam)
> was exactly the same (since I only have one).
>
> I am then not sure where the problem comes from. The mapping was generated
> with minimap2 (mode long-reads to reference), but I will also try to feed
> the mapping by bwa-mem. I'll also see if @klucek
> <https://github.com/klucek <https://github.com/klucek>>'s workaround (to add Ns to the contig
> sequence) turns out helpful in any way.
>
> Keeping you updated.
>
> Thanks,
> Amina
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <#12 (comment) <#12 (comment)>>,
> or mute the thread
> <https://github.com/notifications/unsubscribe-auth/AAkOIBlIP0nykCdwwP93n-06YLPE0OB8ks5uy8PlgaJpZM4Yns-m <https://github.com/notifications/unsubscribe-auth/AAkOIBlIP0nykCdwwP93n-06YLPE0OB8ks5uy8PlgaJpZM4Yns-m>>
> .
>
--
Benedict
(calendar invites: ***@***.*** ***@***.***>,
appointments: Tina Bernard ***@***.*** ***@***.***>>)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub <#12 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AUNm5UfvfURaiD4BRwYqLlZzNTIC6MUvks5uzapdgaJpZM4Yns-m>.
|
Great. It is not trivial to increase the max allowable depth (because we
use 64 bit integers to model bipartitions), but the simple solution is to
realign each read > 64x coverage to each haplotype and then pick the
haplotype that fits best. Does that make sense?
…On Fri, Jan 11, 2019 at 12:23 AM klucek ***@***.***> wrote:
Hi Benedict,
I finally got my hands on this again and I guess I have some additional
insights - the problem was indeed that the reads were longer than my
“reference genome”. This is because I work with amplicons for a region that
is not covered in the real reference genome. My fake reference is a
consensus of individual assemblies obtained with CANU. The solution to this
was hard clipping of the initial alignment followed by a realignment and
marginPhase works perfectly.
Attached is an example - the two haplotypes differ among other by a 500bp
deletion which marginPhase gets mostly resolved. A question though -
because I have amplicons sequenced, I have depths of up to 1000x, yet
marginPhase has a hard limit of 64 reads. Do you think increasing this
would improve things in my case? To do so, I suppose I can just increase
the number in the partitions.c file (line 18).
Thanks a lot
Best
Kay
> On 27 Nov 2018, at 22:10, Benedict Paten ***@***.***
***@***.***>> wrote:
>
> Hi Amina, Kay -
>
> Bit confused about who is doing what (sorry to not read in detail), but
two
> points:
>
> - If the assembly provided and the bam that the reads were mapped to is
> different then all bets are off. Padding might be a solution to at least
> avoid a seg fault (this is C code - no bounds checking), but if the
> assembly and the ref differs in base composition, it will produce odd
> results.
>
> - For coverage beyond 64 depth, the thing to do is to generate the two
> haplotypes from the vcf and then map the reads to the haplotypes, picking
> for each read the haplotype that has the best score.
>
> It is easy to do this within marginPhase too, and I'll add as a feature
if
> I can.
>
> On Mon, Nov 26, 2018 at 2:35 AM Amina Echchiki ***@***.***
***@***.***>>
> wrote:
>
> > Hi @benedictpaten <https://github.com/benedictpaten <
https://github.com/benedictpaten>>, thanks for the
> > reply.
> >
> > well, for my original run, I really believe that the reference fed as
> > assembly (input fasta) and as mapping reference for the reads (input
bam)
> > was exactly the same (since I only have one).
> >
> > I am then not sure where the problem comes from. The mapping was
generated
> > with minimap2 (mode long-reads to reference), but I will also try to
feed
> > the mapping by bwa-mem. I'll also see if @klucek
> > <https://github.com/klucek <https://github.com/klucek>>'s workaround
(to add Ns to the contig
> > sequence) turns out helpful in any way.
> >
> > Keeping you updated.
> >
> > Thanks,
> > Amina
> >
> > —
> > You are receiving this because you were mentioned.
> > Reply to this email directly, view it on GitHub
> > <
#12 (comment)
<
#12 (comment)
>>,
> > or mute the thread
> > <
https://github.com/notifications/unsubscribe-auth/AAkOIBlIP0nykCdwwP93n-06YLPE0OB8ks5uy8PlgaJpZM4Yns-m
<
https://github.com/notifications/unsubscribe-auth/AAkOIBlIP0nykCdwwP93n-06YLPE0OB8ks5uy8PlgaJpZM4Yns-m
>>
> > .
> >
>
>
> --
> Benedict
> (calendar invites: ***@***.*** ***@***.***>,
> appointments: Tina Bernard ***@***.*** ***@***.***
>>)
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub <
#12 (comment)>,
or mute the thread <
https://github.com/notifications/unsubscribe-auth/AUNm5UfvfURaiD4BRwYqLlZzNTIC6MUvks5uzapdgaJpZM4Yns-m
>.
>
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#12 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAkOIG4micWAQBCwMpu8ctZyBcBanpbpks5vCEn7gaJpZM4Yns-m>
.
--
Benedict
(calendar invites: bpaten@ucsc.edu,
appointments: Tina Bernard <tibernar@ucsc.edu>)
|
Ive run into the same issue as described above using ONT data. |
Hi @benedictpaten
do you know what would cause error
Assertion seqLen >= rProbs->length + rProbs->refStart' failed.
?here is the execution log:
thanks
Amina
The text was updated successfully, but these errors were encountered: