Diablo G25 is a tool that converts a VCF file into 23andMe format and calculates admixture scores from it (implemented with admix).
Currently, the best way to install Diablo G25 is to git clone the repository and install the dependencies from the provided yml file:
git clone https://github.com/TheSergeyPixel/Diablo_G25
conda env create -f /path/to/cloned/repo/diablo_g25.yml
You can also git clone the repository and manually install the required packages as follows:
git clone https://github.com/TheSergeyPixel/Diablo_G25
conda install "pandas>=1.5.1"
conda install "numpy>=1.23.4"
pip install admix
We are currently working on creating a conda package.
Diablo G25 requires a basic gzipped VCF file (for example, HaplotypeCaller + GenotypeGVCFs output) as input.
Note that the VCF has to be annotated with rsIDs in the ID column (e.g. with GATK VariantAnnotator).
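Before running the tool, it can be useful to check that the ID column really contains rsIDs. The snippet below is a minimal sketch and not part of Diablo G25; the function name count_rsids is hypothetical. It assumes a standard tab-separated VCF in which the ID is the third column.

```python
import gzip


def count_rsids(vcf_gz_path):
    """Return (records_with_rsid, total_records) for a gzipped VCF file."""
    with_rsid = 0
    total = 0
    with gzip.open(vcf_gz_path, "rt") as handle:
        for line in handle:
            # Skip meta-information lines and the column header line
            if line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue  # not a well-formed record
            total += 1
            # The ID column may hold several identifiers separated by ';'
            if any(item.startswith("rs") for item in fields[2].split(";")):
                with_rsid += 1
    return with_rsid, total
```

If the first number is zero or much smaller than the second, the VCF most likely still needs rsID annotation.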
The output is always generated as a tsv file. Run main.py from the downloaded repository as follows:
python main.py -i /path/to/vcf/file.vcf -o /desired/output/directory/output.tsv -m model_name
The -i and -o arguments are always required.
A 23andMe-style tsv file with all genotypes will be generated in the directory of the output file.
For the -m option, enter the name of any model provided by admix.
If -m is not provided, the K36 model will be used by default.
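To illustrate what a 23andMe-style genotype line looks like, here is a minimal sketch of converting a single VCF record into the four tab-separated 23andMe columns (rsid, chromosome, position, genotype). This is not the code used in main.py; the function name vcf_record_to_23andme is hypothetical, and the sketch assumes single-nucleotide variants and a GT entry in the FORMAT column.

```python
import re


def vcf_record_to_23andme(line, sample_index=0):
    """Convert one VCF data line into a 23andMe-style tab-separated line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, rsid, ref, alt = fields[:5]
    # Allele 0 is REF, alleles 1..n are the comma-separated ALT alleles
    alleles = [ref] + alt.split(",")
    format_keys = fields[8].split(":")
    sample_values = fields[9 + sample_index].split(":")
    genotype = sample_values[format_keys.index("GT")]
    # GT uses '/' (unphased) or '|' (phased); '.' marks a missing call
    calls = [
        "-" if index == "." else alleles[int(index)]
        for index in re.split(r"[/|]", genotype)
    ]
    # 23andMe files list chromosomes without the 'chr' prefix
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    return "\t".join([rsid, chrom, pos, "".join(calls)])
```

For example, a heterozygous A/G call at rs123 on chr1, position 100, becomes a line with the fields rs123, 1, 100 and AG.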
After you have obtained the scores for the model you chose, you can convert them into G25 scores by visiting the Allelocator calculator and performing the following steps:
- Paste your result from the Diablo G25 output into the Calculator results field
- Choose the model you used in the Linear regression matrix field
- The Simulated G25 coordinates field will automatically generate your simulated G25 scores
- You can then proceed to the Vahaduo admixture calculator to estimate admixture proportions and calculate Euclidean distances.
When you open the Vahaduo admixture calculator, paste your G25 coordinates into the target field
and G25 populations (which can be downloaded from another Vahaduo tool) into the source field.
Then choose the desired options and run the tool on the single tab if you have one
sample (line) in your target field, or on the multi tab if you have multiple samples.
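For readers who want to see what the Euclidean distance between a target and source populations means, the sketch below computes it directly. It only shows the standard distance formula and does not reproduce Vahaduo's implementation. The names euclidean_distance and rank_sources are hypothetical, and any coordinates you pass in should come from your own G25 results.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two coordinate vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("coordinate vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_sources(target, sources):
    """Return (name, distance) pairs sorted from closest to farthest.

    `sources` maps a population name to its coordinate vector.
    """
    distances = [
        (name, euclidean_distance(target, coords))
        for name, coords in sources.items()
    ]
    return sorted(distances, key=lambda pair: pair[1])
```

A smaller distance means the source population lies closer to the target in G25 coordinate space.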
We have also provided a video to visualize the aforementioned steps: