help interpreting output #7

Open
RichardCorbett opened this issue Jun 30, 2017 · 7 comments

@RichardCorbett

Hi Kevin,
I was looking at some GIAB data this morning and found the link to your tool. I gave it a whirl with this command:

vgraph repmatch --include-regions GIAB/HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_nosomaticdel.bed --reference /home/pubseq/genomes/Homo_sapiens/GRCh37/1000genomes/bwa_ind/genome/GRCh37-lite.fa GIAB/HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_PGandRTGphasetransfer.vcf.gz gsc/GSC.vcf.gz > out.txt

in which the output file contained these match lines:

    107 MATCH== TYPE=H
      2 MATCH=. TYPE=N
3429981 MATCH== TYPE=T
 176428 MATCH=X TYPE=H

I think I can guess what the bottom two lines represent, but I was wondering if you could explain all 4 lines? If there is a better way to quantify a match I'd be happy to know that as well.

thanks,
Richard

@bioinformed
Owner

Hi Richard,

Thanks for asking.

  • Type=T represents a trivial match, where the two superloci are identical in genomic coordinates, alleles, and genotypes, i.e. there is no need to invoke the full power of the haplotype matcher.
  • Type=H is where the haplotype matcher is needed.
  • Match="=" marks superloci that match.
  • Match="X" marks superloci that don't match.
  • Type=N (shown with Match="." in your output) marks nocalls, typically due to out-of-spec VCF records that overlap, as are occasionally generated by GATK.
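As a rough illustration of how the four summary lines above could be rolled up, here is a minimal Python sketch. It assumes the pre-counted line format shown in the question (`<count> MATCH=<code> TYPE=<code>`); the function names `tally` and `summarize` are hypothetical and not part of vgraph. Note that the counts are superloci, not individual variants.

```python
import re
from collections import Counter

# Summary lines as pasted in the question, e.g. "3429981 MATCH== TYPE=T".
# This line format is an assumption taken from the issue text, not a vgraph spec.
LINE_RE = re.compile(r"^\s*(\d+)\s+MATCH=(\S)\s+TYPE=(\S)\s*$")


def tally(lines):
    """Sum the counts for each (match, type) pair found in the summary lines."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m:  # silently skip anything that is not a summary line
            counts[(m.group(2), m.group(3))] += int(m.group(1))
    return counts


def summarize(counts):
    """Collapse the tallies into matched / mismatched / nocall superlocus totals."""
    matched = sum(n for (match, _), n in counts.items() if match == "=")
    mismatched = sum(n for (match, _), n in counts.items() if match == "X")
    nocall = sum(n for (_, typ), n in counts.items() if typ == "N")
    compared = matched + mismatched
    return {
        "matched": matched,
        "mismatched": mismatched,
        "nocall": nocall,
        "match_fraction": matched / compared if compared else None,
    }


example = [
    "107     MATCH== TYPE=H",
    "  2     MATCH=. TYPE=N",
    "3429981 MATCH== TYPE=T",
    "176428 MATCH=X TYPE=H",
]
summary = summarize(tally(example))
print(summary)
```

For the numbers in this thread, trivial and haplotype matches together give 3,430,088 matching superloci against 176,428 mismatches, with 2 nocalls excluded from the fraction.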

@RichardCorbett
Author

Perfect. Many thanks.

@RichardCorbett
Author

One more question: how would you recommend counting the variants called uniquely in my set or in the GIAB set?

@bioinformed
Owner

I have been working on a wrapper around vgraph that does much more detailed accounting. I'll see if I can share it, as it was developed as part of my day job.

@RichardCorbett
Author

Thanks. Any word on permission to share your code?

@bioinformed
Owner

I've asked and am waiting for an answer. I expect to hear back by the end of next week.

@RichardCorbett
Author

Many thanks. I'm not up against a deadline or anything; I just wanted to try it out.
