Nextstrain Template

This repository provides a Nextstrain analysis template for <your_virus>. You can run either a shorter build restricted to a specific protein or a whole genome build.

If you are new to Nextstrain or need installation guidance, refer to the Nextstrain documentation.

Enhancing the Analysis

The data for this analysis is available from NCBI Virus. Instructions for downloading sequences are provided under Sequences.

Repository Organization

This repository includes the following directories and files:

scripts: Custom Python scripts called by the snakefile.
snakefile: The entire computational pipeline, managed using Snakemake. See the Snakemake documentation for details.
ingest: Contains Python scripts and the snakefile for automatic downloading of <your_virus> sequences and metadata.
protein_xy: Sequences and configuration files for the specific protein_xy run.
whole_genome: Sequences and configuration files for the whole genome run.

Configuration Files

The config, protein_xy/config, and whole_genome/config directories contain necessary configuration files:

colors.tsv: Color scheme
geo_regions.tsv: Geographical locations
lat_longs.tsv: Latitude and longitude coordinates
dropped_strains.txt: Accessions to exclude during augur filter
clades_genome.tsv: Manual clade labels for the Nextstrain tree (see the Nextstrain documentation)
reference_sequence.gb: Reference sequence (add manually)
auspice_config.json: Auspice configuration file (required in every data folder)
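
Before starting a build, it can help to confirm that the files listed above are actually in place. The following is a minimal sketch, not part of this repository: the helper name `missing_config_files` is hypothetical, and it assumes that every config directory needs every listed file, which may not hold for your setup (trim the list per directory as needed).

```python
from pathlib import Path

# File names come from the list above. Treating all of them as required in
# every config directory is an assumption made for this sketch.
REQUIRED_FILES = [
    "colors.tsv",
    "geo_regions.tsv",
    "lat_longs.tsv",
    "dropped_strains.txt",
    "clades_genome.tsv",
    "reference_sequence.gb",
    "auspice_config.json",
]


def missing_config_files(config_dir, required=REQUIRED_FILES):
    """Return the names in `required` that are not files inside `config_dir`."""
    config_dir = Path(config_dir)
    return [name for name in required if not (config_dir / name).is_file()]


if __name__ == "__main__":
    # Directory names as given in this README.
    for directory in ("config", "protein_xy/config", "whole_genome/config"):
        missing = missing_config_files(directory)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{directory}: {status}")
```

Run from the repository root, the script prints one line per config directory and names any file it could not find.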

The reference sequence used is XYZ, accession number, sampled in 19XX.

Quickstart

Setup

Nextstrain Environment

Install the Nextstrain environment by following the installation instructions in the Nextstrain documentation.

Running a Build

Activate the Nextstrain environment:

micromamba activate nextstrain

To run all builds, use:

snakemake --cores 9 all

For specific builds:

protein_xy build:

snakemake auspice/<your_virus>_protein_xy.json --cores 9

Whole genome build:

snakemake auspice/<your_virus>_whole-genome.json --cores 9
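
The two commands above follow one naming pattern for the target file, auspice/<your_virus>_<build>.json. As a rough sketch of that pattern, the hypothetical helpers below assemble the same commands; `my_virus` is a placeholder and the functions are illustrative, not part of the snakefile. Note that the whole genome target is spelled with a hyphen (`whole-genome`) while the directory uses an underscore (`whole_genome`).

```python
import shlex


def auspice_target(virus, build):
    """Return the Auspice JSON path that Snakemake is asked to produce."""
    return f"auspice/{virus}_{build}.json"


def snakemake_command(virus, build, cores=9):
    """Assemble the argument list for a single-build Snakemake call."""
    return ["snakemake", auspice_target(virus, build), "--cores", str(cores)]


if __name__ == "__main__":
    # "my_virus" stands in for <your_virus>; replace it with your own name.
    for build in ("protein_xy", "whole-genome"):
        print(shlex.join(snakemake_command("my_virus", build)))
```

The returned list can be passed directly to `subprocess.run` if you prefer to launch builds from a Python script.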

Visualizing the Build

To visualize the build, use Auspice:

auspice view --datasetDir auspice

To run two visualizations at the same time, set a different port before starting the second Auspice instance:

export PORT=4001
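
If you are unsure which port is still available, a small check like the one below can help pick a value for PORT. This is only a sketch under assumptions: the function names are hypothetical, and the starting port of 4000 reflects what is assumed here to be the Auspice default on the local machine.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(start=4000, attempts=20):
    """Return the first free port in the range [start, start + attempts)."""
    for port in range(start, start + attempts):
        if port_is_free(port):
            return port
    raise RuntimeError(f"no free port found from {start} to {start + attempts - 1}")


if __name__ == "__main__":
    # Prints a candidate value to use with `export PORT=...`.
    print(first_free_port())
```

The check is a snapshot: another process could claim the port between the check and the moment Auspice starts, so treat the result as a suggestion.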

Ingest

For more information on how to run the ingest, please refer to the README in the ingest folder.

Sequences

Sequences can be downloaded manually or automatically.

Manual Download: Visit NCBI Virus, search for <your_virus> or Taxid XXXXXX, and download the sequences.
Automated Download: The ingest functionality, included in the main snakefile, handles automatic downloading.

The ingest pipeline is based on the Nextstrain RSV ingest workflow. Running the ingest pipeline produces data/metadata.tsv and data/sequences.fasta.
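
After an ingest run, a quick consistency check between the two output files can catch records that have metadata but no sequence, or the reverse. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the metadata has an `accession` column whose values match the first word of each FASTA header. Adjust `id_column` if your metadata uses a different identifier.

```python
import csv


def fasta_ids(fasta_path):
    """Collect record IDs (first word after '>') from a FASTA file."""
    ids = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                header = line[1:].split()
                if header:
                    ids.add(header[0])
    return ids


def metadata_ids(metadata_path, id_column="accession"):
    """Collect the values of `id_column` from a tab-separated metadata file."""
    with open(metadata_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row[id_column] for row in reader}


def compare_ingest_outputs(metadata_path, fasta_path, id_column="accession"):
    """Report IDs that appear in only one of the two ingest outputs."""
    meta = metadata_ids(metadata_path, id_column)
    seqs = fasta_ids(fasta_path)
    return {
        "missing_sequence": sorted(meta - seqs),  # in metadata, not in FASTA
        "missing_metadata": sorted(seqs - meta),  # in FASTA, not in metadata
    }
```

Calling `compare_ingest_outputs("data/metadata.tsv", "data/sequences.fasta")` from the repository root returns two sorted lists; both are empty when the files agree.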
