Paper Draft
GLUE (Genes Linked by Underlying Evolution) is a flexible software system for virus genomics that provides tools for storing, managing, and analyzing genetic data on infectious agents. As advances in DNA sequencing transform biological research, making efficient use of the resulting data has become a central challenge. GLUE aims to unlock the information contained in molecular sequence data by enabling comparative analysis of genes and genomes.
The Data Deluge: The rapid accumulation of molecular sequence data presents both opportunities and challenges for researchers. With billions of bases generated in a single experiment at low cost, there is unprecedented potential for advancing knowledge in the field.
Virus Databases:
- Virus databases enable the examination of viral properties and epidemic patterns by combining genomic data with other associated information.
- Sequencing data is essential for understanding evolutionary histories and tracking viral replication programs.
Early Examples: Influenza A virus and HIV-1 were among the first highly sequenced viruses, serving as proving grounds for comparative and phylogenetic approaches.
- Influenza A Virus: Sequence analysis initially focused on epidemiological questions, such as the rates and pathways of spread, and later informed vaccine design.
- HIV-1: Contributed to the establishment of databases such as HIVdb (Stanford) and the Los Alamos HIV database.
- Challenges: Many species-focused virus databases have been developed, but they are often not maintained, leaving significant gaps in resources for viruses such as measles and RSV.
Comparative Analysis: Essential tools include:
- Pairwise and multiple sequence alignments.
- Pattern discovery for mutations and motifs.
- Phylogenetic tree reconstruction.
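As an illustration of the first two tools in this list, the following is a minimal sketch of a global pairwise alignment (the textbook Needleman-Wunsch algorithm) followed by a simple scan for substitutions. It is not GLUE's own implementation; the function names and the scoring values (`match`, `mismatch`, `gap`) are assumptions chosen for the example.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences; return (aligned_a, aligned_b, score)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First column and first row: a prefix aligned entirely against gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the dynamic-programming table.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[len(a)][len(b)]


def substitutions(aligned_ref, aligned_query):
    """List (ref_position, ref_base, query_base) for mismatched columns.

    ref_position is 1-based and counted in ungapped reference coordinates.
    """
    found = []
    ref_pos = 0
    for r, q in zip(aligned_ref, aligned_query):
        if r != "-":
            ref_pos += 1
        if r != "-" and q != "-" and r != q:
            found.append((ref_pos, r, q))
    return found


# Toy sequences, invented for illustration.
ref_row, query_row, total = needleman_wunsch("GATTACA", "GATGACA")
print(ref_row, query_row, total, substitutions(ref_row, query_row))
```

The table fill is quadratic in sequence length, which is fine for a demonstration but is one reason production tools rely on heuristics or on precomputed reference alignments for large datasets.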
Unique Challenges of Viruses:
- Viruses have greater diversity and higher mutation rates than other organisms, necessitating tailored systems for their study.
- The capacity for rapid evolution presents both challenges and opportunities for real-time tracking of viral epidemics.
GLUE distinguishes between the software engine and GLUE projects, which encapsulate datasets related to specific viral groups. This design allows for effective interaction with project data through a user-friendly programmatic interface.
GLUE employs a model-driven architecture that defines a data schema supporting diverse virus sequence data resources. Key characteristics include:
- Storage of both data and analysis configurations in a relational database.
- Simplified implementations of higher-level logic through standard database mechanisms (structured queries, relational joins, paging, and caching).
- High-quality multiple sequence alignments (MSAs) are critical in virus genomics, requiring significant effort to create, especially for distantly related sequences.
- GLUE prioritizes MSAs by treating them as first-class data objects, streamlining the management and analysis processes associated with them.
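The idea of keeping sequences and alignments together in a relational database can be sketched with Python's built-in `sqlite3` module. The table and column names below (`sequence`, `alignment`, `alignment_member`, `aligned_row`) are hypothetical and are not GLUE's actual schema; the sketch only shows how an alignment stored as its own table can be queried with a join and paged with `LIMIT`/`OFFSET`.

```python
import sqlite3

# In-memory database with a toy, hypothetical schema (not GLUE's real one).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sequence (
    sequence_id TEXT PRIMARY KEY,
    source      TEXT NOT NULL,
    nucleotides TEXT NOT NULL
);
CREATE TABLE alignment (
    alignment_name TEXT PRIMARY KEY,
    description    TEXT
);
-- Each row links one sequence to one alignment and stores its gapped form.
CREATE TABLE alignment_member (
    alignment_name TEXT NOT NULL REFERENCES alignment(alignment_name),
    sequence_id    TEXT NOT NULL REFERENCES sequence(sequence_id),
    aligned_row    TEXT NOT NULL,
    PRIMARY KEY (alignment_name, sequence_id)
);
""")

# Invented example data.
conn.executemany("INSERT INTO sequence VALUES (?, ?, ?)", [
    ("seq1", "example", "ATGGCA"),
    ("seq2", "example", "ATGCA"),
    ("seq3", "example", "ATGGCT"),
])
conn.execute("INSERT INTO alignment VALUES (?, ?)", ("AL_EXAMPLE", "toy alignment"))
conn.executemany("INSERT INTO alignment_member VALUES (?, ?, ?)", [
    ("AL_EXAMPLE", "seq1", "ATGGCA"),
    ("AL_EXAMPLE", "seq2", "ATG-CA"),
    ("AL_EXAMPLE", "seq3", "ATGGCT"),
])


def alignment_page(conn, alignment_name, page, page_size):
    """Return one page of (sequence_id, source, aligned_row) for an alignment."""
    return conn.execute(
        """
        SELECT s.sequence_id, s.source, m.aligned_row
        FROM alignment_member AS m
        JOIN sequence AS s ON s.sequence_id = m.sequence_id
        WHERE m.alignment_name = ?
        ORDER BY s.sequence_id
        LIMIT ? OFFSET ?
        """,
        (alignment_name, page_size, page * page_size),
    ).fetchall()


for row in alignment_page(conn, "AL_EXAMPLE", page=0, page_size=2):
    print(row)
```

Because the alignment is an ordinary table, filtering, joining to sequence metadata, and paging all come from the database rather than from custom file-parsing code.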
GLUE can be deployed within standard web servers, facilitating machine-to-machine interactions via web services. This capability supports the creation of interactive public websites and programmatic services, enhancing the integration of GLUE into broader computational infrastructures.
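A machine-to-machine call of this kind might look like the sketch below, which builds (but does not send) an HTTP POST request carrying a JSON-encoded command. Everything specific here is a placeholder assumed for illustration: the base URL, the `/project/<name>` path, the project name, and the shape of the JSON payload are not taken from GLUE's documented web service interface.

```python
import json
import urllib.request


def build_command_request(base_url, project, command):
    """Build a POST request that carries a JSON command for a named project.

    The URL layout and payload structure are hypothetical placeholders.
    """
    body = json.dumps(command).encode("utf-8")
    url = f"{base_url.rstrip('/')}/project/{project}"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )


# Hypothetical endpoint and command, for illustration only.
req = build_command_request(
    "http://localhost:8080/example-ws",
    "example_project",
    {"list": {"sequence": {}}},
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; a real client would then decode the JSON response and handle HTTP errors.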
As genomic data continue to accumulate, the need for robust, well-maintained databases and tools like GLUE will only grow. Efforts to industrialize virus analysis as a service, along with the potential monetization of these services, may become increasingly relevant.
GLUE offers a unified environment for virus genomics that supports both research and public health monitoring. Its modular architecture and data-centric design help manage the complexities inherent in viral sequence data and provide a foundation for future bioinformatics tools.
GLUE by Robert J. Gifford Lab.
For questions, issues, or feedback, please open an issue on the GitHub repository.