How to get a vg file that can be used as input for sequenceTubeMap #388

Open
ld9866 opened this issue Jan 24, 2024 · 2 comments

Comments

@ld9866

ld9866 commented Jan 24, 2024

Dear developer:
We built a pangenome graph with Minigraph-Cactus, and we now want to convert it with vg so that we can visualize it in sequenceTubeMap.
To get the vg file, we ran "vg construct -r reference.fa -v chr2.vcf.gz > chr2.vg", where chr2.vcf.gz comes from Minigraph-Cactus.
But when we run sequenceTubeMap/script/prepare_vg.sh on that file, we get this warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for bd73c23e6ebc2deb2c1d5e5be95374f2b2ce5367 at chr2:9 missing/empty! Was the variant skipped during construction?

We do not know how to solve this problem. We also tried the example in the vg folder, and at the step "vg index x.vg -v x.vcf.gz -x x.vg.xg --gbwt-name x.gbwt" we got the same kind of warnings as with the Minigraph-Cactus result.
Error:
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for bd73c23e6ebc2deb2c1d5e5be95374f2b2ce5367 at x:9 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 5b70e6015d6cc1755fe821abb642a7ac72055833 at x:10 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 63b24cce9605adfadaa2d1168646b3ac722d2833 at x:14 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 58140fbc706f4076680095c8dcf6cbe1e5d80509 at x:34 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 8c38474f679b54ca250ffd521a9844eadb09642e at x:39 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 55df8b817a480aa353fbcccdd4207cc45483bc63 at x:52 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 15cff0c46efeb562a3125ed4186e5249820cf62b at x:58 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for dabba4a16c6a43642e90ae10769b046a9c4aa4eb at x:100 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for a94cd8cc4e3b353aaf29161418962f96c40399de at x:103 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 5899689e351a42b8d9fff229b1e71ed199296de3 at x:122 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] suppressing further missing variant warnings
warning: [HaplotypeIndexer::parse_vcf] Found 75/0 variants in phasing VCF but not in graph! Do your graph and VCF match?

@ld9866
Author

ld9866 commented Jan 24, 2024

Besides, we would also like to understand how the vg file should be generated. The exampleData folder contains an example x.vg, and when we use that provided file everything works. But if we regenerate it with "vg construct -a -r x.fa x.vcf.gz > x.vg", the file size is different (801 vs 6506), which I think may be the reason the variants are not found.
Could you help us?
Thank you!

@adamnovak
Member

I think your vg construct -r reference.fa -v chr2.vcf.gz > chr2.vg command should have a -a option if you want to use haplotypes from the samples in the VCF file:

vg construct -r reference.fa -v chr2.vcf.gz -a > chr2.vg

If you don't use -a when making the graph, it doesn't embed the information in the graph that lets vg index look up variants in it later to make the haplotypes for the .gbwt file.
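
As a rough sketch of how the two steps fit together, the wrapper below runs the vg construct command with -a and then the vg index command quoted earlier in this thread. The function name build_with_haplotypes and the DRY_RUN switch are made up for illustration, and the vg index flags are simply the ones from the thread, so they may differ in other vg versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a graph with -a, then index it with haplotypes.
# The vg commands are copied from this thread; the wrapper itself is an assumption.

# build_with_haplotypes REF.fa VARIANTS.vcf.gz PREFIX
build_with_haplotypes() {
    local ref="$1" vcf="$2" prefix="$3"

    # With DRY_RUN=1, only print the commands so they can be checked first.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "vg construct -r $ref -v $vcf -a > $prefix.vg"
        echo "vg index $prefix.vg -v $vcf -x $prefix.vg.xg --gbwt-name $prefix.gbwt"
        return 0
    fi

    # -a embeds the per-variant information that vg index needs later.
    vg construct -r "$ref" -v "$vcf" -a > "$prefix.vg" || return 1

    # Index command as quoted in the thread (flags may vary by vg version).
    vg index "$prefix.vg" -v "$vcf" -x "$prefix.vg.xg" --gbwt-name "$prefix.gbwt"
}

# Example: DRY_RUN=1 build_with_haplotypes reference.fa chr2.vcf.gz chr2
```

Running it once with DRY_RUN=1 is a cheap way to confirm that the graph and the index step use the same VCF before starting a long construction.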

If you don't want to use haplotypes from the VCF samples, try using a different filename for the vg file so the prepare_vg.sh script doesn't try to read the VCF:

vg construct -r reference.fa -v chr2.vcf.gz > chr2-graph.vg

Then you won't get a .gbwt file.
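
To check in advance whether a given graph filename would be paired with a VCF, something like the following could be used. This is only a sketch: it assumes the script looks for a file named like the graph with .vg replaced by .vcf.gz (as in the x.vg / x.vcf.gz example), which I have not verified against prepare_vg.sh itself, and would_read_vcf is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical check, assuming the VCF is matched as "<graph minus .vg>.vcf.gz".

# would_read_vcf GRAPH.vg
# Succeeds (exit 0) if a same-named VCF sits next to the graph file.
would_read_vcf() {
    local graph="$1"
    local base="${graph%.vg}"   # strip the trailing .vg
    [ -e "${base}.vcf.gz" ]
}

# Example:
#   would_read_vcf chr2.vg       && echo "chr2.vcf.gz would be picked up"
#   would_read_vcf chr2-graph.vg || echo "no matching VCF, so no .gbwt"
```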

I don't think the different file size you get if you try to regenerate x.vg is important; the x.vg file we include in exampleData was generated a long time ago and might not be in exactly the same format as vg construct will output if you run it today.
