
2. Assessing the Fitness Cost of Genes for Growth on Nutrient-Rich Media

rsalamza edited this page Sep 14, 2020 · 1 revision

We used a modified variant of the d-value (dVal) metric (Valentino et al. 2014), the ratio of observed to expected Tn insertions, together with two alternative criteria to confidently classify genes into three fitness-cost categories for growth in a nutrient-rich environment. Genes with a dVal < 0.1 were regarded as potentially fitness costly, and all others as non-essential. Potentially fitness costly genes that received additional support, from either a statistical analysis or homology to genes known to be important for fitness in other Gram-positive pathogens, were further categorized as either critical to cell fitness (dVal < 0.01) or important to cell fitness (0.01 ≤ dVal < 0.1) in nutrient-rich media.

Because replication bias in the E. faecalis MMH594 Tn mutant library affected not only the abundance of mutants along the chromosome but also the rate of saturation, we used a permutation-based approach to rigorously assess whether genes were significantly depleted in transposon insertions relative to their local chromosomal context.

Genes regarded as potentially fitness costly but lacking statistical support were further examined for orthology to fitness-important genes from S. pneumoniae and S. aureus, two Gram-positive pathogens of the class Bacilli whose genomes were similarly screened by Tn-Seq on nutrient-rich media. Potentially fitness critical or potentially fitness important genes (dVal < 0.1) with such orthologs were promoted in confidence.
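The decision rules above can be summarized in a short Python sketch. It reflects one reading of the scheme: the q-value cutoff of 0.05, the hypothetical `GeneEvidence` record, and the "potentially ..." labels kept for genes without statistical or orthology support are illustrative assumptions, not details taken from the pipeline.

```python
from dataclasses import dataclass
from typing import Optional

# dVal thresholds come from the text; SIGNIFICANCE_Q is an assumed cutoff.
CRITICAL_DVAL = 0.01
IMPORTANT_DVAL = 0.1
SIGNIFICANCE_Q = 0.05  # hypothetical FDR cutoff, not stated in the source


@dataclass
class GeneEvidence:
    """Hypothetical per-gene evidence record used only for this sketch."""
    dval: float
    q_value: Optional[float]      # BH-adjusted permutation p-value, if computed
    has_fitness_ortholog: bool    # ortholog of a fitness gene in S. pneumoniae / S. aureus


def classify(gene: GeneEvidence) -> str:
    """Assign a fitness category under one reading of the described scheme."""
    if gene.dval >= IMPORTANT_DVAL:
        return "non-essential"
    tier = "critical" if gene.dval < CRITICAL_DVAL else "important"
    supported = (
        (gene.q_value is not None and gene.q_value < SIGNIFICANCE_Q)
        or gene.has_fitness_ortholog
    )
    # Without statistical or orthology support, the call stays tentative.
    return tier if supported else f"potentially {tier}"
```

For example, a gene with dVal 0.05, no significant permutation result, but an ortholog among the fitness-important S. aureus genes would be reported as "important".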

Calculating the Modified dVal Metric for All Genes

The dVal metric was previously developed to assess the fitness of genes in Staphylococcus aureus from Tn-Seq experiments that used a mutant library with a saturation rate similar to ours (Valentino et al. 2014). We modified the metric to:

  1. Only consider insertions within the middle 80% of each gene, because insertions landing in the flanking ends of a gene may not substantially disrupt its function.
  2. Ignore insertions within intergenic regions, so that the primary mode of the dVal distribution centers around 1 and the distribution is less sparse.
  3. Base the expected count for each gene on the number of viable insertion sites (TA dinucleotides) it harbors rather than on its length.

usage: ModifiedDvalCalculator.py [-h] -i WIG_FILE -o OUTPUT
                                 [-m MASKED_TA_SITES] -f FASTA -g GFF [-e]
                                 [-t TRIM] [-u UPSTREAM]

Program to compute the modified dVal for each gene, a simple metric which
simultaneously normalizes for sequencing depth and TA count (instead of gene
length) and quantifies fitness.

optional arguments:
  -h, --help            show this help message and exit
  -i WIG_FILE, --wig_file WIG_FILE
                        Location of wig file with insertion information.
  -o OUTPUT, --output OUTPUT
                        Location and prefix of output file with Dval stats
                        from aggregated counts.
  -m MASKED_TA_SITES, --masked_TA_sites MASKED_TA_SITES
                        File with coordinates of masked TA sites.
  -f FASTA, --fasta FASTA
                        Fasta file of genome assembly corresponding to GFF
  -g GFF, --gff GFF     GFF file from Vesper/Calhoun. Gene ID needs to be in
                        aliases identified by the key "ID".
  -e, --exon_only       Calculate Dval accounting for only insertions which
                        lie on genes.
  -t TRIM, --trim TRIM  Middle proportion of genes to consider. For instance,
                        0.8 specifies that the first and last 10 percent of
                        bases in the gene will be ignored.
  -u UPSTREAM, --upstream UPSTREAM
                        Include specified number of bases upstream as part of
                        gene to test if including promoter regions maintains
                        essentiality. Should generally not be used in
                        combination with trim. Default is 0.
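As a rough illustration of the three modifications, the following Python sketch computes a modified dVal from per-site insertion counts. It is not the ModifiedDvalCalculator.py implementation: it assumes insertion counts have already been parsed from the wig file into a dictionary keyed by 1-based TA position, that gene coordinates are 1-based and inclusive, and that the expected count distributes the total genic insertions across genes in proportion to the TA sites in their trimmed regions (an assumed normalization). All function names are hypothetical. Because TA is its own reverse complement, scanning one strand finds every TA site. Masked TA sites (`-m`) and the `-u` upstream option are not handled.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Tuple

Gene = Tuple[str, int, int]  # (gene_id, start, end), 1-based inclusive (assumed)


def ta_sites(seq: str) -> List[int]:
    """Sorted 1-based positions of the T in every TA dinucleotide."""
    seq = seq.upper()
    return [i + 1 for i in range(len(seq) - 1) if seq[i:i + 2] == "TA"]


def trimmed_bounds(start: int, end: int, trim: float = 0.8) -> Tuple[float, float]:
    """Keep the middle `trim` fraction of a gene (0.8 drops 10% at each end)."""
    margin = (end - start + 1) * (1 - trim) / 2
    return start + margin, end - margin


def modified_dval(genes: List[Gene], seq: str, counts: Dict[int, int],
                  trim: float = 0.8) -> Dict[str, float]:
    """Observed / expected insertions per gene, normalized by TA-site count."""
    sites = ta_sites(seq)
    per_gene = {}
    for gene_id, start, end in genes:
        lo, hi = trimmed_bounds(start, end, trim)
        gene_sites = sites[bisect_left(sites, lo):bisect_right(sites, hi)]
        observed = sum(counts.get(p, 0) for p in gene_sites)
        per_gene[gene_id] = (observed, len(gene_sites))

    # Intergenic insertions are ignored: totals come from genic TA sites only.
    total_obs = sum(obs for obs, _ in per_gene.values())
    total_ta = sum(n for _, n in per_gene.values())

    dvals = {}
    for gene_id, (observed, n_ta) in per_gene.items():
        # Expected counts scale with the gene's TA-site count, not its length.
        expected = total_obs * n_ta / total_ta if total_ta and n_ta else 0.0
        # Genes with no TA sites in their trimmed region get NaN.
        dvals[gene_id] = observed / expected if expected else float("nan")
    return dvals
```

Sorting the TA positions once and slicing them with `bisect` keeps each per-gene lookup logarithmic in the number of sites, which matters for a whole-genome scan.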

A Simulation-Based Permutation Test to Statistically Assess Tn Depletion

To account for replication bias, we developed a simulation-based permutation test. For each gene, let n be the number of TA sites in the middle 80% of the gene. Using the gene's local saturation rate as the success probability, we drew the number of saturated sites among these n sites from a binomial distribution, and assigned each site simulated as saturated an insertion count sampled at random from the local distribution of observed insertion counts. The observed total of insertions in the gene's mid-region was then compared against the totals from 100,000 such simulations to calculate an empirical one-sided p-value, and the p-values were adjusted for multiple testing using the Benjamini-Hochberg procedure. The approach was inspired by the permutation test developed by Zhang et al. 2012 for assessing the fitness of Mycobacterium tuberculosis genes.
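A minimal Python sketch of the simulation for a single gene follows. It assumes the caller supplies the insertion counts at every TA site in the gene's surrounding window (zeros included, gene excluded), estimates the local saturation rate as the fraction of those sites with at least one insertion, and applies an add-one correction to the empirical p-value; these choices and the function names are assumptions, not details of GenicPerMutant.py. The binomial draw is written as n Bernoulli trials so the example needs only the standard library.

```python
import random
from typing import Sequence


def simulate_gene_sum(n_sites: int, saturation: float,
                      local_counts: Sequence[int], rng: random.Random) -> int:
    """One simulated insertion total for a gene with n_sites TA sites."""
    # Binomial draw: how many of the n sites receive at least one insertion.
    n_hit = sum(rng.random() < saturation for _ in range(n_sites))
    # Each saturated site borrows a count from the local nonzero insertion counts.
    return sum(rng.choice(local_counts) for _ in range(n_hit))


def depletion_p_value(observed_sum: int, n_sites: int,
                      local_site_counts: Sequence[int],
                      n_sims: int = 100_000, seed: int = 0) -> float:
    """Empirical one-sided p-value for Tn depletion relative to local context.

    local_site_counts: insertion counts at every TA site in the surrounding
    window (zeros included), assumed to exclude the gene itself.
    """
    nonzero = [c for c in local_site_counts if c > 0]
    if not nonzero:
        return 1.0  # an empty window gives no basis for calling depletion
    saturation = len(nonzero) / len(local_site_counts)
    rng = random.Random(seed)
    at_or_below = sum(
        simulate_gene_sum(n_sites, saturation, nonzero, rng) <= observed_sum
        for _ in range(n_sims)
    )
    # Add-one correction keeps the p-value strictly positive (a choice made here).
    return (at_or_below + 1) / (n_sims + 1)
```

At 100,000 simulations per gene this pure-Python loop is slow; a vectorized version (for example with NumPy) would be the practical choice for a whole genome.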

usage: GenicPerMutant.py [-h] -i WIG_FILE -a GFF -o OUTPUT -d GENE -w WINDOW
                         -g GENOME_LENGTH [-p PERMUTATION]

Program to simulate localized permutation of Tn mutant abundances in gene 
relative to local surroundings to estimate probability that the gene is 
depleted in mutants relative to expectations.

optional arguments:
  -h, --help            show this help message and exit
  -i WIG_FILE, --wig_file WIG_FILE
                        Location of wig file with insertion information.
  -a GFF, --gff GFF     gff annotation
  -o OUTPUT, --output OUTPUT
                        Location and prefix of output file with Dval stats
                        from aggregated counts.
  -d GENE, --gene GENE  gene id
  -w WINDOW, --window WINDOW
                        The window size.
  -g GENOME_LENGTH, --genome_length GENOME_LENGTH
                        Length of genome
  -p PERMUTATION, --permutation PERMUTATION
                        permutations
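The Benjamini-Hochberg adjustment applied to the per-gene p-values can be sketched in a few lines of Python. This is the standard step-up procedure, written here for illustration rather than claimed as the exact code used; libraries such as statsmodels (`multipletests` with `method="fdr_bh"`) provide an equivalent.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return BH-adjusted p-values (q-values) in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every adjusted value at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For instance, the p-values `[0.01, 0.04, 0.03, 0.005]` adjust to approximately `[0.02, 0.04, 0.04, 0.02]`.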

Along the same lines, we developed MultiGenePermutation.py, a slightly altered version of GenicPerMutant.py, for assessing Tn depletion in an arbitrary region that need not correspond to a single gene, relative to its local surrounding context.