marginPhase error: Assertion `seqLen >= rProbs->length + rProbs->refStart' failed. #12

aechchiki · 2018-11-17T23:58:09Z

do you know what would cause error Assertion seqLen >= rProbs->length + rProbs->refStart' failed. ?

here is the execution log:

aechchik@dee-serv07:/scratch/beegfs/monthly/aechchik/amphioxus/mp$ ../../build/marginPhase /scratch/beegfs/monthly/aechchik/amphioxus/mp/reads_to_haploADref.bam  /scratch/beegfs/monthly/aechchik/amphioxus/hm2/amphio_A_ref_D.fa marginPhase/params/params.pacbio.json
Set log level to INFO
> Parsing model parameters from file: marginPhase/params/params.pacbio.json
> Parsing input reads from file: /scratch/beegfs/monthly/aechchik/amphioxus/mp/reads_to_haploADref.bam
        Created 9724438 profile sequences
> Parsing prior probabilities on positions from reference sequences: /scratch/beegfs/monthly/aechchik/amphioxus/hm2/amphio_A_ref_D.fa
marginPhase: /scratch/beegfs/monthly/aechchik/amphioxus/mp/marginPhase/impl/referencePriorProbs.c:171: createReferencePriorProbabilities: Assertion `seqLen >= rProbs->length + rProbs->refStart' failed.
Aborted

thanks
Amina

The text was updated successfully, but these errors were encountered:

aechchiki · 2018-11-18T00:20:47Z

FYI, just discovered that another program that I run in parallel failed because I was running out of disk space. Maybe that can be the cause.

Will keep you updated

EDIT update: nope, the error is still there. thanks!

If it helps:

$ ./marginPhase/build/marginPhase -v
usage: marginPhase <BAM> <REFERENCE_FASTA> <PARAMS> [options]
Version: 1.0.0

$ uname -a
Linux dee-serv07.vital-it.ch 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

klucek · 2018-11-20T12:45:25Z

Can confirm, got the same error using ONT data
./marginPhase/build/marginPhase /scratch/test/sizealleles/bams/Lib2_BC06.bam /scratch/test/sizealleles/ref/sizealleles.fasta ./marginPhase/params/params.nanopore.json
Set log level to INFO
> Parsing model parameters from file: ./marginPhase/params/params.nanopore.json
> Parsing input reads from file: /scratch/test/sizealleles/bams/Lib2_BC06.bam Created 36512 profile sequences
> Parsing prior probabilities on positions from reference sequences: /scratch/test/sizealleles/ref/sizealleles.fasta
marginPhase: ./marginPhase/impl/referencePriorProbs.c:171: createReferencePriorProbabilities: Assertion seqLen >= rProbs->length + rProbs->refStart' failed.``
Aborted

was similarly run on CentOS

klucek · 2018-11-23T12:03:32Z

I found a workaround. The solution is to add about a hundred lines of just N's to the reference FASTA file and then marginPhase works. Note that already the example reference file of marginPhase, i.e. hg19.chr3.9mb.fa starts with such a block.

benedictpaten · 2018-11-23T16:24:14Z

Sorry for the trouble, what you're seeing looks like a mismatch between the read coordinates (presumably coming from the bam) and the reference sequences, as provided by the input fasta. This could happen if you did not have the same reference assembly in the bam and reference fasta. Hope that helps. PS, you might want to switch off debug flags once you've got it running fast and enable compile optimization. It will be a lot faster.

…

On Fri, Nov 23, 2018 at 4:03 AM klucek ***@***.***> wrote: I found a workaround. The solution is to add about a hundred lines of just N's to the reference FASTA file and then marginPhase works. Note that already the example reference file of marginPhase, i.e. hg19.chr3.9mb.fa starts with such a block. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#12 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAkOIIRRnNMJjWi-ny3l04rWXIEewS15ks5ux-QUgaJpZM4Yns-m> .

-- Benedict (calendar invites: bpaten@ucsc.edu, appointments: Tina Bernard <tibernar@ucsc.edu>)

klucek · 2018-11-23T16:29:24Z

Hi, thanks for the reply, I was actually about to write you more about this to add some background, the data that I have is Oxford Nanopore amplicon sequences of two highly variable genes that are linked to self-incompatibility in the plant Arabidopsis lyrata. I generated for each individual an assembly using CANU and identified for each gene 7 major haplotypes that differ in their length due to different deletions in the introns. I thus generated a fake reference genome using these major contigs and aligned my ONT data with bwa mem. Because each individual is expected to be heterozygous (similar to MHC), I was hoping to use marginPhase. I have to note that the sequences in the BAM file still include the multiplex barcodes, which is not true for the reference. So even though I use the reference against which I have aligned my data, the reads in the BAM file are longer. Another question - marginPhase seems to downsample reads to a coverage of 64 - given that I have amplicons, I basically have a read-depth of >6000x. Do you think it could change something allowing for a higher depth? Thanks a lot Best Kay

…

On 23 Nov 2018, at 17:24, Benedict Paten ***@***.***> wrote: Sorry for the trouble, what you're seeing looks like a mismatch between the read coordinates (presumably coming from the bam) and the reference sequences, as provided by the input fasta. This could happen if you did not have the same reference assembly in the bam and reference fasta. Hope that helps. PS, you might want to switch off debug flags once you've got it running fast and enable compile optimization. It will be a lot faster. On Fri, Nov 23, 2018 at 4:03 AM klucek ***@***.***> wrote: > I found a workaround. The solution is to add about a hundred lines of just > N's to the reference FASTA file and then marginPhase works. Note that > already the example reference file of marginPhase, i.e. hg19.chr3.9mb.fa > starts with such a block. > > — > You are receiving this because you were mentioned. > Reply to this email directly, view it on GitHub > <#12 (comment)>, > or mute the thread > <https://github.com/notifications/unsubscribe-auth/AAkOIIRRnNMJjWi-ny3l04rWXIEewS15ks5ux-QUgaJpZM4Yns-m> > . > -- Benedict (calendar invites: ***@***.***, appointments: Tina Bernard ***@***.***>) — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#12 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AUNm5cRM19o1AVft9799Bk_48e8Gg2qWks5uyCEugaJpZM4Yns-m>.

aechchiki · 2018-11-26T10:35:17Z

Hi @benedictpaten, thanks for the reply.

well, for my original run, I really believe that the reference fed as assembly (input fasta) and as mapping reference for the reads (input bam) was exactly the same (since I only have one).

I am then not sure where the problem comes from. The mapping was generated with minimap2 (mode long-reads to reference), but I will also try to feed the mapping by bwa-mem. I'll also see if @klucek's workaround (to add Ns to the contig sequence) turns out helpful in any way.

Keeping you updated.

Thanks,
Amina

benedictpaten · 2018-11-27T21:10:52Z

Hi Amina, Kay - Bit confused about who is doing what (sorry to not read in detail), but two points: - If the assembly provided and the bam that the reads were mapped to is different then all bets are off. Padding might be a solution to at least avoid a seg fault (this is C code - no bounds checking), but if the assembly and the ref differs in base composition, it will produce odd results. - For coverage beyond 64 depth, the thing to do is to generate the two haplotypes from the vcf and then map the reads to the haplotypes, picking for each read the haplotype that has the best score. It is easy to do this within marginPhase too, and I'll add as a feature if I can.

…

On Mon, Nov 26, 2018 at 2:35 AM Amina Echchiki ***@***.***> wrote: Hi @benedictpaten <https://github.com/benedictpaten>, thanks for the reply. well, for my original run, I really believe that the reference fed as assembly (input fasta) and as mapping reference for the reads (input bam) was exactly the same (since I only have one). I am then not sure where the problem comes from. The mapping was generated with minimap2 (mode long-reads to reference), but I will also try to feed the mapping by bwa-mem. I'll also see if @klucek <https://github.com/klucek>'s workaround (to add Ns to the contig sequence) turns out helpful in any way. Keeping you updated. Thanks, Amina — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#12 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAkOIBlIP0nykCdwwP93n-06YLPE0OB8ks5uy8PlgaJpZM4Yns-m> .

-- Benedict (calendar invites: bpaten@ucsc.edu, appointments: Tina Bernard <tibernar@ucsc.edu>)

klucek · 2019-01-11T08:23:22Z

Hi Benedict, I finally got my hands on this again and I guess I have some additional insights - the problem was indeed that the reads were longer than my “reference genome”. This is because I work with amplicons for a region that is not covered in the real reference genome. My fake reference is a consensus of individual assemblies obtained with CANU. The solution to this was hard clipping of the initial alignment followed by a realignment and marginPhase works perfectly. Attached is an example - the two haplotypes differ among other by a 500bp deletion which marginPhase gets mostly resolved. A question though - because I have amplicons sequenced, I have depths of up to 1000x, yet marginPhase has a hard limit of 64 reads. Do you think increasing this would improve things in my case? To do so, I suppose I can just increase the number in the partitions.c file (line 18). Thanks a lot Best Kay

…

On 27 Nov 2018, at 22:10, Benedict Paten ***@***.*** ***@***.***>> wrote: Hi Amina, Kay - Bit confused about who is doing what (sorry to not read in detail), but two points: - If the assembly provided and the bam that the reads were mapped to is different then all bets are off. Padding might be a solution to at least avoid a seg fault (this is C code - no bounds checking), but if the assembly and the ref differs in base composition, it will produce odd results. - For coverage beyond 64 depth, the thing to do is to generate the two haplotypes from the vcf and then map the reads to the haplotypes, picking for each read the haplotype that has the best score. It is easy to do this within marginPhase too, and I'll add as a feature if I can. On Mon, Nov 26, 2018 at 2:35 AM Amina Echchiki ***@***.*** ***@***.***>> wrote: > Hi @benedictpaten <https://github.com/benedictpaten <https://github.com/benedictpaten>>, thanks for the > reply. > > well, for my original run, I really believe that the reference fed as > assembly (input fasta) and as mapping reference for the reads (input bam) > was exactly the same (since I only have one). > > I am then not sure where the problem comes from. The mapping was generated > with minimap2 (mode long-reads to reference), but I will also try to feed > the mapping by bwa-mem. I'll also see if @klucek > <https://github.com/klucek <https://github.com/klucek>>'s workaround (to add Ns to the contig > sequence) turns out helpful in any way. > > Keeping you updated. > > Thanks, > Amina > > — > You are receiving this because you were mentioned. > Reply to this email directly, view it on GitHub > <#12 (comment) <#12 (comment)>>, > or mute the thread > <https://github.com/notifications/unsubscribe-auth/AAkOIBlIP0nykCdwwP93n-06YLPE0OB8ks5uy8PlgaJpZM4Yns-m <https://github.com/notifications/unsubscribe-auth/AAkOIBlIP0nykCdwwP93n-06YLPE0OB8ks5uy8PlgaJpZM4Yns-m>> > . > -- Benedict (calendar invites: ***@***.*** ***@***.***>, appointments: Tina Bernard ***@***.*** ***@***.***>>) — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#12 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AUNm5UfvfURaiD4BRwYqLlZzNTIC6MUvks5uzapdgaJpZM4Yns-m>.

benedictpaten · 2019-01-11T17:25:42Z

Great. It is not trivial to increase the max allowable depth (because we use 64 bit integers to model bipartitions), but the simple solution is to realign each read > 64x coverage to each haplotype and then pick the haplotype that fits best. Does that make sense?

…

On Fri, Jan 11, 2019 at 12:23 AM klucek ***@***.***> wrote: Hi Benedict, I finally got my hands on this again and I guess I have some additional insights - the problem was indeed that the reads were longer than my “reference genome”. This is because I work with amplicons for a region that is not covered in the real reference genome. My fake reference is a consensus of individual assemblies obtained with CANU. The solution to this was hard clipping of the initial alignment followed by a realignment and marginPhase works perfectly. Attached is an example - the two haplotypes differ among other by a 500bp deletion which marginPhase gets mostly resolved. A question though - because I have amplicons sequenced, I have depths of up to 1000x, yet marginPhase has a hard limit of 64 reads. Do you think increasing this would improve things in my case? To do so, I suppose I can just increase the number in the partitions.c file (line 18). Thanks a lot Best Kay > On 27 Nov 2018, at 22:10, Benedict Paten ***@***.*** ***@***.***>> wrote: > > Hi Amina, Kay - > > Bit confused about who is doing what (sorry to not read in detail), but two > points: > > - If the assembly provided and the bam that the reads were mapped to is > different then all bets are off. Padding might be a solution to at least > avoid a seg fault (this is C code - no bounds checking), but if the > assembly and the ref differs in base composition, it will produce odd > results. > > - For coverage beyond 64 depth, the thing to do is to generate the two > haplotypes from the vcf and then map the reads to the haplotypes, picking > for each read the haplotype that has the best score. > > It is easy to do this within marginPhase too, and I'll add as a feature if > I can. > > On Mon, Nov 26, 2018 at 2:35 AM Amina Echchiki ***@***.*** ***@***.***>> > wrote: > > > Hi @benedictpaten <https://github.com/benedictpaten < https://github.com/benedictpaten>>, thanks for the > > reply. > > > > well, for my original run, I really believe that the reference fed as > > assembly (input fasta) and as mapping reference for the reads (input bam) > > was exactly the same (since I only have one). > > > > I am then not sure where the problem comes from. The mapping was generated > > with minimap2 (mode long-reads to reference), but I will also try to feed > > the mapping by bwa-mem. I'll also see if @klucek > > <https://github.com/klucek <https://github.com/klucek>>'s workaround (to add Ns to the contig > > sequence) turns out helpful in any way. > > > > Keeping you updated. > > > > Thanks, > > Amina > > > > — > > You are receiving this because you were mentioned. > > Reply to this email directly, view it on GitHub > > < #12 (comment) < #12 (comment) >>, > > or mute the thread > > < https://github.com/notifications/unsubscribe-auth/AAkOIBlIP0nykCdwwP93n-06YLPE0OB8ks5uy8PlgaJpZM4Yns-m < https://github.com/notifications/unsubscribe-auth/AAkOIBlIP0nykCdwwP93n-06YLPE0OB8ks5uy8PlgaJpZM4Yns-m >> > > . > > > > > -- > Benedict > (calendar invites: ***@***.*** ***@***.***>, > appointments: Tina Bernard ***@***.*** ***@***.*** >>) > — > You are receiving this because you were mentioned. > Reply to this email directly, view it on GitHub < #12 (comment)>, or mute the thread < https://github.com/notifications/unsubscribe-auth/AUNm5UfvfURaiD4BRwYqLlZzNTIC6MUvks5uzapdgaJpZM4Yns-m >. > — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#12 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAkOIG4micWAQBCwMpu8ctZyBcBanpbpks5vCEn7gaJpZM4Yns-m> .

-- Benedict (calendar invites: bpaten@ucsc.edu, appointments: Tina Bernard <tibernar@ucsc.edu>)

da-i · 2021-12-14T13:05:54Z

Ive run into the same issue as described above using ONT data.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

marginPhase error: Assertion `seqLen >= rProbs->length + rProbs->refStart' failed. #12

marginPhase error: Assertion `seqLen >= rProbs->length + rProbs->refStart' failed. #12

aechchiki commented Nov 17, 2018 •

edited

Loading

aechchiki commented Nov 18, 2018 •

edited

Loading

klucek commented Nov 20, 2018

klucek commented Nov 23, 2018

benedictpaten commented Nov 23, 2018 via email

klucek commented Nov 23, 2018 via email

aechchiki commented Nov 26, 2018

benedictpaten commented Nov 27, 2018 via email

klucek commented Jan 11, 2019 via email

benedictpaten commented Jan 11, 2019 via email

da-i commented Dec 14, 2021

marginPhase error: Assertion `seqLen >= rProbs->length + rProbs->refStart' failed. #12

marginPhase error: Assertion `seqLen >= rProbs->length + rProbs->refStart' failed. #12

Comments

aechchiki commented Nov 17, 2018 • edited Loading

aechchiki commented Nov 18, 2018 • edited Loading

klucek commented Nov 20, 2018

klucek commented Nov 23, 2018

benedictpaten commented Nov 23, 2018 via email

klucek commented Nov 23, 2018 via email

aechchiki commented Nov 26, 2018

benedictpaten commented Nov 27, 2018 via email

klucek commented Jan 11, 2019 via email

benedictpaten commented Jan 11, 2019 via email

da-i commented Dec 14, 2021

aechchiki commented Nov 17, 2018 •

edited

Loading

aechchiki commented Nov 18, 2018 •

edited

Loading