vcf_to_tsv 🧬

Converts a VCF (Variant Call Format) file into a tab-separated values (.tsv) file.

Its compilation and functionality have been verified on the following operating systems:

  • macOS 🍏
  • Linux 🐧

Download and Compilation 💾

>>> git clone https://github.com/alexcoppe/vcf_to_tsv
>>> cd vcf_to_tsv
>>> make

After compilation, move the generated executable vcf_to_tsv to a directory listed in the $PATH variable. To see these directories, run echo $PATH.

Run the software 🏃‍♂️

This software transforms an uncompressed VCF file to a tab-separated values (tsv) file. It also works with VCFs generated by SnpEff and ANNOVAR.
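Because the tool expects an uncompressed VCF, a gzip-compressed file (for example a .vcf.gz) has to be decompressed first. The following is a minimal Python sketch of that step; the helper name decompress_vcf and the file names are hypothetical and not part of vcf_to_tsv.

```python
import gzip
import shutil


def decompress_vcf(gz_path, out_path):
    """Write an uncompressed copy of a gzip-compressed VCF (hypothetical helper)."""
    # Stream the data in chunks so large VCFs are not loaded into memory at once.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```

The resulting file can then be passed to vcf_to_tsv as its first argument.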

To run it, you need two arguments: the VCF file and a text file specifying the desired fields. Refer to the table below for guidance on creating this file.

For a SnpEff-annotated VCF, the tool currently writes each transcript reported by SnpEff on a separate row.

Starting character  What you get
None                get a field from the VCF
:                   get a subfield from the INFO field added by SnpEff
;                   get a specific subfield from the INFO field
|                   get a specific subfield from the Genotype fields
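The prefix rules in the table can be sketched in Python as follows. This is only an illustration of the convention, not the tool's actual code; the function name classify_field and the kind labels are hypothetical.

```python
def classify_field(spec):
    """Return (kind, name) for one line of the wanted-fields file."""
    # Hypothetical labels for the three prefixes described in the table.
    prefixes = {
        ":": "snpeff_info_subfield",
        ";": "info_subfield",
        "|": "genotype_subfield",
    }
    spec = spec.strip()
    if spec and spec[0] in prefixes:
        return prefixes[spec[0]], spec[1:]
    # No prefix: a plain VCF field.
    return "vcf_field", spec


for line in [":hgvs_c", "position", ";gnomAD_genome_AMR", "|AD"]:
    print(classify_field(line))
```

Each line of the file is classified independently, so the order of the lines can directly determine the order of the output columns.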

Example of a text file specifying the desired fields and subfields:

:hgvs_c
position
;gnomAD_genome_AMR
|AD

Launching the program with the above text file:

vcf_to_tsv a_vcf_file_path.vcf wanted_fields.txt

Output:

n.-3702C>T      157370625       0.0020  14,1    31,5
n.*1931C>T      157370625       0.0020  14,1    31,5
n.-3707C>T      157370630       0       15,1    33,4
...

Currently, the software supports only VCFs with 1 or 2 genotype fields.
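As an illustration of what a genotype subfield such as |AD refers to, the sketch below reads the FORMAT column of a VCF record and picks the matching value from each sample column. It assumes that "genotype fields" means the per-sample columns of the VCF; the function name and the example record are made up for this sketch and do not come from the tool.

```python
def genotype_subfield(vcf_line, key):
    """Return the value of `key` (e.g. "AD") for every sample column."""
    cols = vcf_line.rstrip("\n").split("\t")
    format_keys = cols[8].split(":")   # 9th column: FORMAT, e.g. GT:AD
    idx = format_keys.index(key)
    values = []
    for sample in cols[9:]:            # 10th column onward: one per sample
        parts = sample.split(":")
        # VCF allows trailing subfields to be dropped; report "." if missing.
        values.append(parts[idx] if idx < len(parts) else ".")
    return values


# Invented two-sample record, for illustration only.
record = "chr1\t100\t.\tC\tT\t.\tPASS\t.\tGT:AD\t0/0:14,1\t0/1:31,5"
print(genotype_subfield(record, "AD"))
```

With two sample columns the function returns two values, which corresponds to the two trailing columns in the output shown above.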

The table below lists all the subfields added by SnpEff along with the corresponding subfield names used in vcf_to_tsv (listed in the first column).

Subfield in vcf_to_tsv  Subfield in SnpEff                        Explanation
:allele                 Allele (or ALT)                           The alternative allele
:annotation             Annotation (a.k.a. effect)                Annotated using Sequence Ontology terms
:putative_impact        Putative_impact                           A simple estimation of putative impact / deleteriousness: {HIGH, MODERATE, LOW, MODIFIER}
:gene_name              Gene Name                                 Common gene name (HGNC)
:gene_id                Gene ID                                   Gene ID
:feature_type           Feature type                              Which type of feature is in the next field
:feature_id             Feature ID                                Depends on the annotation
:transcript_biotype     Transcript biotype                        The bare minimum is at least a description of whether the transcript is {"Coding", "Noncoding"}. Whenever possible, use ENSEMBL biotypes
:rank                   Rank / total                              Exon or intron rank / total number of exons or introns
:hgvs_c                 HGVS.c                                    Variant using HGVS notation (DNA level)
:hgvs_p                 HGVS.p                                    If the variant is coding, this field describes the variant using HGVS notation (protein level)
:cdna_position          cDNA_position / cDNA_len                  Position in cDNA and transcript's cDNA length (one-based)
:cds_position           CDS_position / CDS_len                    Position and number of coding bases (one-based, including START and STOP codons)
:protein_position       Protein_position / Protein_len            Position and number of AA (one-based, including START, but not STOP)
:distance_to_feature    Distance to feature                       All items in this field are optional; see the SnpEff page for details
:errors                 Errors, Warnings or Information messages  Errors, warnings or informative messages that can affect annotation accuracy
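The subfields in the table appear in this order inside the pipe-separated ANN entry that SnpEff writes to the INFO column, with one comma-separated entry per transcript. The Python sketch below shows how such an entry could be mapped to the names in the first column, one row per transcript. It is not the tool's implementation; the function name ann_rows and the example INFO string (gene and transcript identifiers included) are invented for illustration.

```python
ANN_NAMES = [
    "allele", "annotation", "putative_impact", "gene_name", "gene_id",
    "feature_type", "feature_id", "transcript_biotype", "rank",
    "hgvs_c", "hgvs_p", "cdna_position", "cds_position",
    "protein_position", "distance_to_feature", "errors",
]


def ann_rows(info):
    """Yield one dict per transcript annotation found in an INFO string."""
    for item in info.split(";"):
        if item.startswith("ANN="):
            for entry in item[4:].split(","):
                values = entry.split("|")
                # Pad short entries so every name gets a value.
                values += [""] * (len(ANN_NAMES) - len(values))
                yield dict(zip(ANN_NAMES, values))


# Invented INFO string with two transcript annotations.
info = (
    "DP=15;ANN="
    "T|upstream_gene_variant|MODIFIER|GENE1|ID1|transcript|TX1|lncRNA||n.-3702C>T|||||3702|,"
    "T|downstream_gene_variant|MODIFIER|GENE1|ID1|transcript|TX2|lncRNA||n.*1931C>T|||||1931|"
)
for row in ann_rows(info):
    print(row["hgvs_c"], row["distance_to_feature"])
```

Because each transcript yields its own dict, a variant annotated on two transcripts produces two output rows, matching the one-row-per-transcript behaviour described earlier.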