genotypeCollector

This repository does the following:

1_ Downloads the .fastq files for each isolate from the European Nucleotide Archive (ENA).

2_ Finds the SNPs for each isolate by aligning reads with bwa-mem and then calling variants with samtools and GATK.

3_ Creates a binary table in which each row represents an isolate and each column represents a SNP.

dataCollector.py

This script reads the isolate IDs from the "FILE_NAME".txt file and downloads the .fastq files of each isolate from the ENA database.
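The download step can be sketched as follows. This is a minimal illustration, not the code in dataCollector.py: it assumes the isolate IDs are ENA run accessions and that the ENA Portal API `filereport` endpoint is queried for the `fastq_ftp` field. The function names and the `out_dir` parameter are hypothetical.

```python
import csv
import io
import os
import urllib.parse
import urllib.request

# Assumption: the ENA Portal API file report is used to look up FASTQ locations.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(run_accession):
    """Build the file report query URL for one run accession."""
    query = urllib.parse.urlencode({
        "accession": run_accession,
        "result": "read_run",
        "fields": "fastq_ftp",
        "format": "tsv",
    })
    return f"{ENA_FILEREPORT}?{query}"


def parse_fastq_links(tsv_text):
    """Extract FASTQ URLs from the tab-separated file report.

    The fastq_ftp column holds semicolon-separated paths without a scheme,
    so "ftp://" is prepended here.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    links = []
    for row in reader:
        for path in (row.get("fastq_ftp") or "").split(";"):
            if path:
                links.append("ftp://" + path)
    return links


def download_isolate(run_accession, out_dir="."):
    """Download every FASTQ file listed for one accession (hypothetical helper)."""
    with urllib.request.urlopen(filereport_url(run_accession)) as resp:
        tsv_text = resp.read().decode("utf-8")
    for url in parse_fastq_links(tsv_text):
        target = os.path.join(out_dir, url.rsplit("/", 1)[-1])
        urllib.request.urlretrieve(url, target)
```

Calling `download_isolate(isolate_id)` once per line of the ID file would fetch the files for every isolate. Paired-end runs typically list two files, one per read direction.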

SNP.py

This script takes the isolate IDs from the "FILE_NAME".txt file, together with their .fastq files, and finds the SNPs of each isolate using bwa-mem, samtools, and GATK. It then keeps the SNPs common to both approaches (samtools and GATK) and stores them in ("Final_" + id + ".vcf").
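The step of keeping only the SNPs found by both callers can be illustrated as a set intersection. The sketch below is an assumption about how such a comparison might be done, not the logic in SNP.py: it treats two calls as the same SNP when chromosome, position, reference base, and alternate base all match, and the function names are hypothetical.

```python
BASES = {"A", "C", "G", "T"}


def read_snps(vcf_lines):
    """Collect (chrom, pos, ref, alt) tuples for single-base substitutions."""
    snps = set()
    for line in vcf_lines:
        # Skip blank lines and VCF header lines.
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # A VCF record may list several comma-separated alternate alleles.
        for allele in alt.split(","):
            if ref in BASES and allele in BASES:
                snps.add((chrom, int(pos), ref, allele))
    return snps


def common_snps(samtools_lines, gatk_lines):
    """Return the SNPs reported by both callers, sorted by position."""
    return sorted(read_snps(samtools_lines) & read_snps(gatk_lines))
```

Both arguments can be open file handles for the two VCF files. Indels and other multi-base records are ignored by this sketch.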

If the code fails to find the SNPs of a particular isolate, its id is written to "missIdForSNP.txt".

Calling the preprocessing() function produces the ("Final_" + id + "_table.csv") file, which contains "pos, ref, alt" for each SNP of that isolate.
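As a rough illustration of the kind of conversion preprocessing() performs, the sketch below reduces VCF records to a CSV with the columns pos, ref, and alt. It is not the repository's implementation: the helper names are hypothetical, and the exact column order and header of the real output file are assumptions.

```python
import csv


def vcf_to_rows(vcf_lines):
    """Keep only position, reference allele, and alternate allele per record."""
    rows = []
    for line in vcf_lines:
        # Skip blank lines and VCF header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # VCF columns: CHROM, POS, ID, REF, ALT, ...
        rows.append({"pos": int(fields[1]), "ref": fields[3], "alt": fields[4]})
    return rows


def write_table(rows, handle):
    """Write the rows as CSV with a pos, ref, alt header."""
    writer = csv.DictWriter(handle, fieldnames=["pos", "ref", "alt"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
```

With an open VCF file as `vcf_lines` and an output file opened with `newline=""` as `handle`, this writes one line per SNP.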

tableCreatorGit.py

This script creates a binary table. Each row represents an isolate and each column represents a SNP.
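A binary table of this shape can be sketched as below. The example assumes that a cell is 1 when the isolate carries the SNP and 0 otherwise, and that a SNP is identified by its (pos, ref, alt) triple. The function name and the input layout are hypothetical and are not taken from tableCreatorGit.py.

```python
def build_binary_table(snps_by_isolate):
    """Build an isolates x SNPs presence/absence table.

    snps_by_isolate maps an isolate id to the set of (pos, ref, alt)
    tuples found for that isolate.
    """
    # One column per distinct SNP seen in any isolate, in sorted order.
    columns = sorted(set().union(*snps_by_isolate.values()))
    column_index = {snp: j for j, snp in enumerate(columns)}
    isolates = sorted(snps_by_isolate)

    table = []
    for isolate in isolates:
        row = [0] * len(columns)
        for snp in snps_by_isolate[isolate]:
            row[column_index[snp]] = 1
        table.append(row)
    return isolates, columns, table
```

For example, if isolate `iso_A` has SNPs at positions 100 and 250 and `iso_B` only at 250, the table has two rows and two columns, and only the cell for `iso_B` at position 100 is 0.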

sparseMatrix.py

This script stores the table produced by tableCreatorGit.py in a sparse format to reduce its size.
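The idea behind a sparse encoding is to record only the positions of the 1s, since most cells of a SNP table are 0. The sketch below uses a simple per-row list of column indices. This is an assumption chosen for illustration: the README does not say which sparse format sparseMatrix.py uses, and the function names are hypothetical.

```python
def to_sparse(table):
    """Encode a dense 0/1 table as its shape plus the 1-columns of each row."""
    n_cols = len(table[0]) if table else 0
    rows = [[j for j, value in enumerate(row) if value] for row in table]
    return {"shape": (len(table), n_cols), "rows": rows}


def to_dense(sparse):
    """Rebuild the dense 0/1 table from the sparse encoding."""
    n_rows, n_cols = sparse["shape"]
    table = [[0] * n_cols for _ in range(n_rows)]
    for i, ones in enumerate(sparse["rows"]):
        for j in ones:
            table[i][j] = 1
    return table
```

The saving grows with the share of zeros: a row with three SNPs out of thousands of columns is stored as three integers instead of thousands of cells.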

Citation

If you find the content of this repository useful, please cite us:

https://dl.acm.org/doi/abs/10.1145/3459930.3469534


AmirHoseinSafari/Genotype-collector-and-SNP-dataset-creator
