nneune/template_nextstrain

Nextstrain Template

This repository provides a template for a Nextstrain analysis of <your_virus>. You can run either a shorter build restricted to specific proteins or a full genome build.

For those unfamiliar with Nextstrain or needing installation guidance, please refer to the Nextstrain documentation.

Enhancing the Analysis

The data for this analysis is available from NCBI Virus. Instructions for downloading sequences are provided under Sequences.

Repository Organization

This repository includes the following directories and files:

  • scripts: Custom Python scripts called by the snakefile.
  • snakefile: The entire computational pipeline, managed using Snakemake. Snakemake documentation can be found here.
  • ingest: Contains Python scripts and the snakefile for automatic downloading of <your_virus> sequences and metadata.
  • protein_xy: Sequences and configuration files for the specific protein_xy run.
  • whole_genome: Sequences and configuration files for the whole genome run.

Configuration Files

The config, protein_xy/config, and whole_genome/config directories contain the required configuration files:

  • colors.tsv: Color scheme
  • geo_regions.tsv: Geographical locations
  • lat_longs.tsv: Latitude and longitude coordinates
  • dropped_strains.txt: Accessions to exclude during augur filter
  • clades_genome.tsv: Manual clade labels for the Nextstrain tree (see documentation here)
  • reference_sequence.gb: Reference sequence (add manually)
  • auspice_config.json: Auspice configuration file; it must be present in every data folder
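
Before running a build, it can help to confirm that the files listed above are in place. The following is a minimal sketch, not part of the repository: the helper name `missing_config_files` is hypothetical, and it assumes every config directory is expected to hold every listed file, which may not hold for your setup.

```python
from pathlib import Path

# File names taken from the list above; trim or extend for your own build.
REQUIRED_CONFIG_FILES = [
    "colors.tsv",
    "geo_regions.tsv",
    "lat_longs.tsv",
    "dropped_strains.txt",
    "clades_genome.tsv",
    "reference_sequence.gb",
    "auspice_config.json",
]


def missing_config_files(config_dir):
    """Return the required file names that are absent from config_dir."""
    config_dir = Path(config_dir)
    return [
        name for name in REQUIRED_CONFIG_FILES
        if not (config_dir / name).is_file()
    ]


if __name__ == "__main__":
    # Directory names as described in this README (run from the repo root).
    for directory in ("config", "protein_xy/config", "whole_genome/config"):
        missing = missing_config_files(directory)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{directory}: {status}")
```

Run from the repository root, this prints one status line per config directory.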

The reference sequence used is XYZ, accession number, sampled in 19XX.

Quickstart

Setup

Nextstrain Environment

Install the Nextstrain environment by following these instructions.

Running a Build

Activate the Nextstrain environment:

micromamba activate nextstrain

To perform a build, run:

snakemake --cores 9 all

For specific builds:

  • protein_xy build:
snakemake auspice/<your_virus>_protein_xy.json --cores 9
  • Whole genome build:
snakemake auspice/<your_virus>_whole-genome.json --cores 9
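
The target names above follow the pattern auspice/<your_virus>_<build>.json. As an illustration only, the sketch below composes the same commands from a virus name and a build name. The function names and the "my_virus" placeholder are hypothetical and do not appear in the repository.

```python
import shlex


def build_target(virus, build):
    """Return the Auspice JSON path that Snakemake is asked to produce."""
    return f"auspice/{virus}_{build}.json"


def snakemake_command(virus, build, cores=9):
    """Return the snakemake invocation for one build as an argument list."""
    return ["snakemake", build_target(virus, build), "--cores", str(cores)]


if __name__ == "__main__":
    # "my_virus" stands in for <your_virus>; replace it with your own name.
    for build in ("protein_xy", "whole-genome"):
        print(shlex.join(snakemake_command("my_virus", build)))
```

Note that the whole genome target is written with a hyphen (whole-genome), while the directory uses an underscore (whole_genome).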

Visualizing the Build

To visualize the build, use Auspice:

auspice view --datasetDir auspice

Each Auspice instance needs its own port. To run two visualizations simultaneously, set a different port before starting the second one:

export PORT=4001
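
If you are unsure which port is available, a short script can look for one. This is a sketch under assumptions: it takes 4000 as the starting point (assumed here to be the Auspice default), and the helper names are hypothetical.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing accepts connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds (port in use).
        return probe.connect_ex((host, port)) != 0


def first_free_port(start=4000, attempts=20):
    """Return the first free port in [start, start + attempts)."""
    for port in range(start, start + attempts):
        if port_is_free(port):
            return port
    raise RuntimeError(f"no free port found from {start} in {attempts} attempts")


if __name__ == "__main__":
    # Prints a shell line you can paste before `auspice view`.
    print(f"export PORT={first_free_port()}")
```

The printed line can be run in the shell before starting the second `auspice view`.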

Ingest

For more information on how to run the ingest, please refer to the README in the ingest folder.

Sequences

Sequences can be downloaded manually or automatically.

  1. Manual Download: Visit NCBI Virus, search for <your_virus> or Taxid XXXXXX, and download the sequences.
  2. Automated Download: The ingest functionality, included in the main snakefile, handles automatic downloading.

The ingest pipeline is based on the Nextstrain RSV ingest workflow. Running the ingest pipeline produces data/metadata.tsv and data/sequences.fasta.
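
After an ingest run, it can be useful to check that the two output files describe the same records. The sketch below compares the IDs in data/metadata.tsv with the FASTA headers in data/sequences.fasta. It is not part of the ingest workflow: the function names are hypothetical, and the ID column name "accession" is an assumption that you should adjust to match your metadata.

```python
import csv


def fasta_ids(path):
    """Return the set of record IDs (first word after '>') in a FASTA file."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def metadata_ids(path, id_column="accession"):
    """Return the set of IDs in a tab-separated metadata file.

    id_column is an assumed column name, not confirmed by the repository.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row[id_column] for row in reader}


def compare_ids(metadata_path, fasta_path, id_column="accession"):
    """Return (IDs only in metadata, IDs only in FASTA), both sorted."""
    meta = metadata_ids(metadata_path, id_column)
    seqs = fasta_ids(fasta_path)
    return sorted(meta - seqs), sorted(seqs - meta)


if __name__ == "__main__":
    only_meta, only_fasta = compare_ids("data/metadata.tsv", "data/sequences.fasta")
    print(f"in metadata but not in FASTA: {len(only_meta)}")
    print(f"in FASTA but not in metadata: {len(only_fasta)}")
```

Two empty lists mean every metadata row has a matching sequence and vice versa.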

Acknowledgments
