
Docker Installation for GLUE on Windows

Robert J. Gifford edited this page Oct 21, 2024 · 4 revisions

The table below lists the sequence entries file and the complete and incomplete genome isolate lists for each influenza virus species:

| Influenza Virus Species | Sequence Entries File | Complete Genome Isolates | Incomplete Genome Isolates |
| --- | --- | --- | --- |
| Influenza A virus | Sequence Entries | Complete Genome Isolates | Incomplete Genome Isolates |
| Influenza B virus | [Sequence Entries](https://github.com/giffordlabcvr/Flu-GLUE/blob/main/filter/db/gb_entries/ibv-all-combined.tsv) | Complete Genome Isolates | Incomplete Genome Isolates |
| Influenza C virus | Sequence Entries | Complete Genome Isolates | Incomplete Genome Isolates |
| Influenza D virus | Sequence Entries | Complete Genome Isolates | Incomplete Genome Isolates |

### A Data-Centric Approach to Virus Genomics

The advances in DNA sequencing technology over the past decades have revolutionized how we study viruses. But despite the wealth of genomic data now available, the resources for storing, analyzing, and sharing this data have often been developed in a fragmented and inefficient way.

In my lab, we've been working to address these issues for several years. I'm excited to introduce GLUE, a bioinformatics software environment designed to usher in a new era of stability and reusability in the development of virus sequence data resources.

At its core, a virus sequence data resource is more than just a collection of sequences. It's a scalable software system that encapsulates expertise from virology, allowing for comprehensive genomic analyses. However, managing these resources (storing, manipulating, and extracting meaningful insights from viral sequence datasets) has historically been complicated and time-consuming. Because non-standardized approaches are in widespread use, interoperability between datasets is limited, which leads to a great deal of duplicated effort.

A key goal of GLUE is to develop standardized bioinformatic approaches to viral sequence data, streamlining processes and eliminating unnecessary redundancy.

The GLUE Framework

GLUE was created as a generalized computational framework that supports virus sequence analysis across any virus group. The development of this framework has been informed by our previous work with HIV-1 and hepatitis B virus (HBV) databases and pipelines.

At its foundation, GLUE integrates a minimal set of widely used, freely available software tools to handle core bioinformatics tasks, including sequence alignment, phylogenetic reconstruction, and relational database management. This foundation allows GLUE to be easily replicated or extended, using either our online distributions or different components that follow the same guiding principles.

Why Do Viruses Need Their Own System?

While GLUE could, in theory, be applied to any species, viruses are particularly well suited to benefit from a system like this. Their extreme diversity, greater than that of all other organisms combined, and their rapid mutation rates make viruses both fascinating and formidable pathogens.

The ability to track viral evolution in real time, particularly during outbreaks, is crucial for public health efforts. This makes viral genomic data highly valuable but also challenging to manage, especially for fast-evolving pathogens. GLUE is built to handle these challenges, allowing us to organize, analyze, and share viral sequence data efficiently.

What Makes GLUE Unique?

One of GLUE's standout features is its separation of concerns. The GLUE software package itself (the "engine") is distinct from GLUE "projects," which contain datasets and analysis tools specific to a virus or virus group. These projects can be loaded into the GLUE engine to interact with project data through a simple programmatic interface, which can be used in traditional bioinformatics pipelines or integrated into web resources.
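
The engine/project split described above can be pictured with a small sketch. This is not GLUE's actual interface: the `Engine` and `Project` classes, their methods, and the demo project names are all hypothetical, chosen only to show generic logic living in one place while virus-specific data lives in another.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical stand-in for a project: virus-specific data only."""
    name: str
    sequences: dict[str, str] = field(default_factory=dict)  # sequence ID -> nucleotides


class Engine:
    """Hypothetical stand-in for the engine: generic logic, no virus-specific data."""

    def __init__(self) -> None:
        self._projects: dict[str, Project] = {}

    def load(self, project: Project) -> None:
        """Register a project so its data can be queried through the engine."""
        self._projects[project.name] = project

    def sequence_lengths(self, project_name: str) -> dict[str, int]:
        """Run the same analysis code against whichever project is named."""
        project = self._projects[project_name]
        return {seq_id: len(seq) for seq_id, seq in project.sequences.items()}


# Two made-up projects loaded into one engine.
engine = Engine()
engine.load(Project("flu_demo", {"seq1": "ATGGCA", "seq2": "ATG"}))
engine.load(Project("hbv_demo", {"seqA": "CTCCACCA"}))

flu_lengths = engine.sequence_lengths("flu_demo")
hbv_lengths = engine.sequence_lengths("hbv_demo")
```

The point of the design is that `sequence_lengths` never needs to know which virus it is working on, so a pipeline or web service written against the engine can be reused across projects.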

GLUE's flexibility extends across computing environments. It can be deployed as a local installation, as a web-based service, or as a component of a broader computing infrastructure within an organization. By employing a minimal set of high-quality, cross-platform software components, GLUE facilitates collaboration, allowing virus researchers to build, share, and enhance resources in cloud-based repositories like GitHub.

Alignments at the Heart of GLUE

Alignments are a cornerstone of virus sequence data resources, and GLUE places them at the center of its strategy for organizing data. The framework's core schema aims to capture as much nucleotide homology as possible among sequences, integrating this critical information into a unified data structure. By doing so, GLUE helps researchers leverage alignments to unlock deeper insights into viral evolution and genomic variation.
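
To make the idea of recording nucleotide homology in a data structure concrete, here is a minimal sketch that stores an alignment as ungapped segments and uses them to translate coordinates. The segment representation, the `AlignedSegment` and `member_to_ref` names, and the example coordinates are assumptions for illustration; the text above does not specify how GLUE's core schema stores alignments.

```python
from typing import NamedTuple, Optional


class AlignedSegment(NamedTuple):
    """An ungapped block of homology; coordinates are 1-based and inclusive."""
    ref_start: int
    ref_end: int
    member_start: int
    member_end: int


def member_to_ref(segments: list[AlignedSegment], member_pos: int) -> Optional[int]:
    """Return the reference position homologous to member_pos, or None if unaligned."""
    for seg in segments:
        if seg.member_start <= member_pos <= seg.member_end:
            return seg.ref_start + (member_pos - seg.member_start)
    return None


# Hypothetical alignment: the member sequence carries a 3-nucleotide insertion
# (member positions 11-13) relative to the reference.
segments = [
    AlignedSegment(ref_start=1, ref_end=10, member_start=1, member_end=10),
    AlignedSegment(ref_start=11, ref_end=20, member_start=14, member_end=23),
]

mapped = member_to_ref(segments, 15)    # falls in the second segment
unmapped = member_to_ref(segments, 12)  # falls inside the insertion
```

Once homology is stored this way, any annotation defined on the reference (a gene boundary, for example) can be looked up for every aligned sequence without realigning.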

Extensibility and Data-Centric Design

GLUE is designed to be adaptable. Its model-driven, data-centric architecture defines a data schema and set of functions that support the common needs of diverse virus sequence data resources. All the information required for processing viral sequences (data, analysis configurations, and more) is stored in a relational database, ensuring the consistent application of high-level logic and cross-cutting concerns like referential integrity and data export.
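
As a rough illustration of what referential integrity buys in a relational design, the sketch below uses Python's built-in `sqlite3` module. SQLite is used only because it ships with Python, and the two-table schema (`sequence`, `alignment_member`) is a hypothetical simplification, not GLUE's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical schema: an alignment member must point at an existing sequence.
conn.executescript("""
CREATE TABLE sequence (
    sequence_id TEXT PRIMARY KEY,
    nucleotides TEXT NOT NULL
);
CREATE TABLE alignment_member (
    alignment_name TEXT NOT NULL,
    sequence_id    TEXT NOT NULL REFERENCES sequence(sequence_id),
    PRIMARY KEY (alignment_name, sequence_id)
);
""")

conn.execute("INSERT INTO sequence VALUES (?, ?)", ("seq1", "ATGGCA"))
conn.execute("INSERT INTO alignment_member VALUES (?, ?)", ("demo_alignment", "seq1"))

# Adding a member that refers to a sequence that does not exist is refused.
try:
    conn.execute(
        "INSERT INTO alignment_member VALUES (?, ?)",
        ("demo_alignment", "missing_seq"),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

member_count = conn.execute("SELECT COUNT(*) FROM alignment_member").fetchone()[0]
```

Because the database itself rejects the dangling reference, every tool that writes to it gets the same guarantee without reimplementing the check.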

This approach simplifies the deployment of GLUE-based resources. Because everything a resource needs lives in the database, installing it on a new system is as simple as copying the database contents, which keeps all required data and analysis functionality intact.
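
The "copy the database" idea can be sketched with `sqlite3` as well. This is a toy stand-in under stated assumptions: the database engine, the `sequence` table, and its rows are all hypothetical, and a real deployment would use whatever export and import tooling its database provides.

```python
import sqlite3

# Source database holding a hypothetical resource.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE sequence (sequence_id TEXT PRIMARY KEY, nucleotides TEXT NOT NULL)"
)
source.executemany(
    "INSERT INTO sequence VALUES (?, ?)",
    [("seq1", "ATGGCA"), ("seq2", "ATG")],
)
source.commit()

# "New system": an empty database that receives a full copy.
target = sqlite3.connect(":memory:")
source.backup(target)  # copies schema and rows in one call

copied_rows = target.execute(
    "SELECT sequence_id, nucleotides FROM sequence ORDER BY sequence_id"
).fetchall()
```

Nothing besides the database contents had to be transferred for the copy to be queryable, which is the property the paragraph above relies on.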

What's Next?

Our team is already building online resources for endogenous viral elements (EVEs), which will be an important addition to GLUE's growing ecosystem. EVEs are viral sequences integrated into host genomes, offering a window into ancient virus-host interactions. These resources will further extend GLUE's capacity to support a wide range of virus-related studies, from genomics to evolutionary biology.

