Getting Started

Getting Started with the DIGS Tool

This guide outlines the steps required to set up and run the DIGS tool, describes the input data components, and indicates what to expect in terms of setup time and computational requirements. DIGS performs similarity-based searches in whole genome sequence data.

Input Data Components

Before running DIGS, you'll need to prepare the following key data components:

Target Database (TDb):
- A collection of whole genome sequence or transcriptome assemblies that will serve as the target for similarity searches.
- The target database should contain the sequences you aim to analyze and is a critical part of the screening process.
Query Sequences (Probes):
- Input sequences that will be used to perform similarity searches against the Target Database.
- These sequences should be carefully chosen to match the type of genetic features or viral elements you are investigating.
Reference Sequence Library (RSL):
- Represents the genetic diversity associated with the genome feature(s) under investigation.
- The reference library is used to contextualize the results of the similarity searches and validate the detected sequences.
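
Before a screen, it can be useful to sanity-check the probe and reference files. The sketch below is not part of DIGS. It assumes the sequences are stored as FASTA text, which this guide does not state explicitly, and the function name and example sequences are purely illustrative. It counts the records and the total number of residues in a FASTA stream.

```python
from io import StringIO


def summarize_fasta(handle):
    """Return (number_of_records, total_residues) for a FASTA text stream."""
    records = 0
    residues = 0
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            records += 1  # each header line starts a new record
        elif records == 0:
            raise ValueError("sequence data found before the first '>' header")
        else:
            residues += len(line)
    return records, residues


# Hypothetical two-probe example; real probes would be read from a file,
# e.g. with open("probes.fasta") as handle: summarize_fasta(handle)
example = StringIO(">probe_1\nMKTAYIAK\nQRQISF\n>probe_2\nMSLLTEVET\n")
print(summarize_fasta(example))
```

A file with zero records, or with sequence data before the first header, is usually a sign that something went wrong during data preparation.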

How Long Will It Take to Set Up DIGS and Run a Screen?

The time required to set up and run DIGS depends on several factors, including your platform, computational resources, and the size of your datasets. Here's an outline of what to expect at each stage of the process.

Step 1: Installing the Required Components for DIGS

Time Estimate: Installation should take only a few minutes, particularly for experienced bioinformaticians working on Linux/UNIX systems.
Details:
- The DIGS tool requires several widely used bioinformatics components, including Perl, BLAST, and MySQL.
- Most bioinformatics servers will already have these programs installed. If installation is necessary, it should be straightforward on Linux/UNIX operating systems.
- Mac users: Installing DIGS on a Macintosh computer can be less predictable. Specifically, the Perl library DBD::mysql does not come pre-installed and may require additional configuration. For guidance, refer to the Mac installation instructions.
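
A quick way to see whether the required programs are already available is to look for them on the PATH. The following is a minimal sketch, not an official DIGS check. The executable names (`perl`, `blastn`, `mysql`) are assumptions about a typical installation and may differ on your system.

```python
import shutil

# Assumed executable names; adjust these to match your installation.
REQUIRED_TOOLS = ("perl", "blastn", "mysql")


def find_missing(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = find_missing()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All listed tools were found on PATH.")
```

This only checks executables. To test whether the Perl database driver is installed, running `perl -MDBD::mysql -e 1` from a terminal should exit without an error message if the module can be loaded.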

Step 2: Setting Up Your DIGS Screen

Time Estimate: Variable. This step usually takes longer than installation, and how much longer depends on the complexity of your research question and the amount of data preparation required.
Details:
- This stage involves selecting and formatting your probes, reference sequences, and target database.
- It is crucial to carefully plan which target genomes you will screen and what kind of sequences you are searching for.
- Spend time framing the research question you aim to address to ensure you select the most relevant targets and queries.

Step 3: Creating a Control File

Time Estimate: Only a few minutes.
Details:
- The control file defines the parameters for the DIGS screening process and must follow the format that DIGS expects.
- Refer to the control file guide for detailed instructions on creating this file.

Step 4: Running the Screen

Time Estimate: Hours to days, depending on the size of your datasets and computational resources.
Details:
- DIGS performs similarity search-based screening, which can be computationally intensive and time-consuming.
- The length of time required to run a screen will depend on factors like the size of the target database and the abundance or scarcity of matches for the query sequences.
- For long-running screens on a server, consider running the process in the background so that a closed terminal or dropped connection does not interrupt it. Detailed instructions on running DIGS in the background are provided elsewhere in the DIGS documentation.
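
One general way to keep a long-running process alive after you log out is to start it in its own session and send its output to a log file. The sketch below illustrates that idea with Python's standard `subprocess` module. It is not the method described in the DIGS documentation, the helper name `launch_detached` is hypothetical, and the command shown is a placeholder rather than a real DIGS invocation.

```python
import os
import subprocess
import sys
import tempfile


def launch_detached(command, log_path):
    """Start command in its own session, appending stdout/stderr to log_path."""
    log = open(log_path, "ab")
    try:
        process = subprocess.Popen(
            command,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # detach from the terminal's session (POSIX)
        )
    finally:
        log.close()  # the child keeps its own copy of the file descriptor
    return process


# Placeholder command: substitute the actual DIGS command line for your setup.
placeholder = [sys.executable, "-c", "print('screen would run here')"]
log_file = os.path.join(tempfile.gettempdir(), "digs_screen_example.log")
proc = launch_detached(placeholder, log_file)
print("Started process", proc.pid, "logging to", log_file)
```

Because the process no longer writes to the terminal, the log file becomes the place to look for progress messages and errors.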

Monitoring and Managing Screen Progress

DIGS provides real-time updates on the progress of the screen, indicating how many queries have been executed.
Screen duration varies with the complexity of the search and the computational power available, so early progress is only a rough guide to the total run time.
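
If you know how many queries the screen will run in total, the progress count can be turned into a rough estimate of the time remaining. The sketch below is only a back-of-the-envelope calculation. It assumes every query takes about the same time on average, which will not hold when matches are unevenly distributed, and the function name and the numbers are illustrative.

```python
def estimate_remaining_seconds(completed, total, elapsed_seconds):
    """Estimate remaining run time from the average time per completed query."""
    if total <= 0 or not 0 < completed <= total:
        raise ValueError("need 0 < completed <= total")
    seconds_per_query = elapsed_seconds / completed
    return (total - completed) * seconds_per_query


# Hypothetical progress: 150 of 600 queries finished after 30 minutes.
remaining = estimate_remaining_seconds(150, 600, 30 * 60)
print(f"Roughly {remaining / 3600:.1f} hours remaining")
```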

Summary of Setup and Runtime Expectations

Installation: Fast and straightforward on Linux/UNIX systems; may require additional steps on Macintosh.
Data Preparation: Requires careful planning to ensure relevant probes, references, and targets are selected.
Control File: Quick setup, guiding how DIGS performs its searches.
Screening: Can range from a few hours to several days depending on data size and system capabilities.

By understanding these steps and planning your data preparation carefully, you'll be able to use the DIGS tool efficiently to uncover the viral 'fossil record' hidden within genome sequences.
