Project Database

DIGS project databases have a core schema consisting of five tables. Only three of these are relevant to the typical DIGS user; the remaining two (shown in grey in the schema diagram) are used internally by the DIGS tool and can be ignored under ordinary circumstances.

DIGS screening database schema. Circles to the left of field names indicate fields that can be used as keys to link across tables in the core schema, or to ancillary tables added for specific projects. The blue and pink shaded backgrounds indicate the combinations of fields that uniquely identify (i) target genome files and (ii) the reference and probe sequences used for screening, respectively.

`searches_performed` table

Records the details of BLAST searches that have been performed in this DIGS project.

Field	Type	Description
record_ID	INT	Automatically incremented primary key
probe_ID	VARCHAR	Unique ID of probe sequence
probe_name	VARCHAR	Name of probe sequence
probe_gene	VARCHAR	Name of probe sequence gene
target_id	VARCHAR	Unique identifier of the TDb file
organism	VARCHAR	Name of the organism (Latin binomial) from which TDb was generated
target_datatype	VARCHAR	Data type of the TDb file
target_version	VARCHAR	Version details of the TDb file
target_name	VARCHAR	Name of TDb file
Timestamp	TIMESTAMP	Timestamp of the table entry
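
As a rough illustration of how these fields might be used, the sketch below builds a composite key for each recorded search and checks whether a given probe/target pair has already been screened. The choice of key fields, the sample row, and the helper names `search_key` and `already_searched` are assumptions made for illustration; the schema diagram defines the authoritative field combinations, and none of this is DIGS's own code.

```python
from typing import Dict, Iterable, Set, Tuple

# Assumed key fields (hypothetical choice; check the schema diagram).
TARGET_FIELDS = ("organism", "target_datatype", "target_version", "target_name")
PROBE_FIELDS = ("probe_name", "probe_gene")

SearchKey = Tuple[str, ...]


def search_key(row: Dict[str, str]) -> SearchKey:
    """Build a composite (target + probe) key from one searches_performed row."""
    return tuple(row[field] for field in TARGET_FIELDS + PROBE_FIELDS)


def already_searched(rows: Iterable[Dict[str, str]]) -> Set[SearchKey]:
    """Collect the keys of all searches recorded so far."""
    return {search_key(row) for row in rows}


# Hypothetical row, shaped like a searches_performed record.
performed = [
    {
        "organism": "Homo_sapiens",
        "target_datatype": "wgs",
        "target_version": "example_build",
        "target_name": "chr1.fa",
        "probe_name": "example_probe",
        "probe_gene": "env",
    }
]

done = already_searched(performed)
candidate = dict(performed[0])                   # same probe, same target
other = dict(performed[0], probe_gene="pol")     # same target, different gene

print(search_key(candidate) in done)
print(search_key(other) in done)
```

Using a set of tuples keeps the membership test cheap even when many searches have been recorded.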

`digs_results` table

Contains the sequences extracted from the loci specified in the BLAST_results table, together with the results of the second round of paired BLAST, in which extracted sequences are 'genotyped' by BLAST comparison to the reference sequence library (RSL).

Field	Type	Description
record_ID	INT	Automatically incremented primary key
organism	VARCHAR	Organism name (Latin binomial)
target_datatype	VARCHAR	Genome data type
target_version	VARCHAR	Genome build version details
target_name	VARCHAR	Name of genome data file containing the BLAST hit
probe_type	VARCHAR	Type of probe sequence (amino acid or nucleotide)
extract_start	INT	Start position of the extracted sequence
extract_end	INT	End position of the extracted sequence
scaffold	VARCHAR	Name of scaffold/contig/chromosome containing the BLAST hit
orientation	ENUM	Orientation of the BLAST hit relative to the probe
assigned_name	VARCHAR	Name of closest matching sequence in RSL
assigned_gene	VARCHAR	Name of gene of closest matching sequence in RSL
bit_score	FLOAT	Bit score of the best match from reverse BLAST
identity	FLOAT	Percentage identity of the best match from reverse BLAST
e_value_num	FLOAT	Coefficient of the expect (e) value for the best match from reverse BLAST
e_value_exp	INT	Exponent (base 10) of the expect (e) value for the best match from reverse BLAST
subject_start	INT	5’ (start) position of reverse BLAST alignment in the RSL sequence
subject_end	INT	3’ (end) position of reverse BLAST alignment in the RSL sequence
query_start	INT	5’ (start) position of reverse BLAST alignment in the probe sequence
query_end	INT	3’ (end) position of reverse BLAST alignment in the probe sequence
mismatches	INT	Number of mismatches in alignment from reverse BLAST
gap_openings	INT	Number of gap openings in alignment from reverse BLAST
sequence_length	INT	Length of the extracted sequence
sequence	TEXT	Text string of the extracted sequence
timestamp	TIMESTAMP	Timestamp of the table entry
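
Because the expect value is stored in two columns, it has to be recombined before it can be compared with a threshold. The sketch below assumes the two fields hold the coefficient and exponent of BLAST's scientific notation, so that `2e-50` (meaning 2 × 10^-50) is stored as `e_value_num = 2` and `e_value_exp = -50`. The function names are hypothetical and are not part of DIGS.

```python
from typing import Tuple


def evalue_to_parts(evalue_text: str) -> Tuple[float, int]:
    """Split a BLAST-style e-value string into (coefficient, exponent).

    Values written without an exponent (e.g. '0.003') get exponent 0.
    """
    text = evalue_text.strip().lower()
    if "e" not in text:
        return float(text), 0
    coefficient, exponent = text.split("e")
    return float(coefficient), int(exponent)


def evalue_from_parts(e_value_num: float, e_value_exp: int) -> float:
    """Recombine the two stored fields into one number (num * 10**exp)."""
    return e_value_num * 10.0 ** e_value_exp


def passes_threshold(e_value_num: float, e_value_exp: int, max_evalue: float) -> bool:
    """True if the recombined e-value is at or below the given threshold."""
    return evalue_from_parts(e_value_num, e_value_exp) <= max_evalue


num, exp = evalue_to_parts("2e-50")
print(num, exp)
print(passes_threshold(num, exp, max_evalue=1e-10))
```

Storing the exponent as a separate integer makes it possible to sort or filter on it directly in SQL without parsing text.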

`active_set` table

This table is used to process data generated by DIGS and is refreshed each time a new similarity search is performed. It combines the hit coordinates from a new round of DIGS with the coordinates of previously extracted loci: both sets of values are entered into the active_set table. Coordinates from previously extracted loci can be identified by their extract_id values; new BLAST hits have no extract_id. The combined results are sorted by scaffold and subject start, and the sorted list is then processed to determine how the new hits correspond to previously extracted loci, so that each hit can be handled accordingly.

Field	Type	Description
record_id	INT	Automatically incremented primary key
extract_id	INT	Identifier of the previously extracted locus (empty for new BLAST hits)
organism	VARCHAR	Organism name (Latin binomial)
target_data_type	VARCHAR	Genome data type
target_version	VARCHAR	Genome build version details
target_name	VARCHAR	Name of genome data file containing BLAST hit
scaffold	VARCHAR	Name of scaffold/contig/chromosome containing BLAST hit
extract_start	INT	Start position of extracted sequence
extract_end	INT	End position of extracted sequence
sequence_length	INT	Length of the extracted sequence
sequence	TEXT	Text string of the extracted sequence
assigned_name	VARCHAR	Name of the best matching reference sequence from 2nd BLAST
assigned_gene	VARCHAR	Gene annotation of the best matching reference sequence from 2nd BLAST
bitscore	FLOAT	Bit score of the BLAST hit
identity	FLOAT	Percentage identity of BLAST hit
evalue_num	FLOAT	Coefficient of the expect (e) value for the BLAST hit
evalue_exp	INT	Exponent (base 10) of the expect (e) value for the BLAST hit
align_len	INT	Length of the BLAST hit alignment
mismatches	INT	Number of mismatches in the BLAST hit alignment
gap_openings	INT	Number of gap openings in the BLAST hit alignment
orientation	ENUM	Orientation of the BLAST hit relative to the probe
subject_start	INT	5’ (start) position of BLAST hit within the reference sequence
subject_end	INT	3’ (end) position of BLAST hit within the reference sequence
query_start	INT	5’ (start) position of the BLAST hit within the probe sequence
query_end	INT	3’ (end) position of the BLAST hit within the probe sequence
timestamp	TIMESTAMP	Timestamp of the table entry
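
The combine-and-sort step described for the active_set table can be sketched as follows. This is a minimal illustration, not DIGS's implementation: it assumes that coordinates are normalised so that start is not greater than end, that hits on the same scaffold with overlapping ranges belong to the same locus, and that orientation and gene assignment can be ignored. The `Hit` class, the function names, and the three outcome labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    scaffold: str
    subject_start: int
    subject_end: int
    extract_id: Optional[int] = None  # None marks a new BLAST hit


def cluster_hits(hits: List[Hit]) -> List[List[Hit]]:
    """Sort by scaffold and subject start, then group overlapping hits."""
    ordered = sorted(hits, key=lambda h: (h.scaffold, h.subject_start))
    clusters: List[List[Hit]] = []
    current_end = 0
    for hit in ordered:
        same_scaffold = bool(clusters) and clusters[-1][0].scaffold == hit.scaffold
        if same_scaffold and hit.subject_start <= current_end:
            clusters[-1].append(hit)
            current_end = max(current_end, hit.subject_end)
        else:
            clusters.append([hit])
            current_end = hit.subject_end
    return clusters


def classify(cluster: List[Hit]) -> str:
    """Label a cluster by whether it mixes new hits and extracted loci."""
    n_extracted = sum(h.extract_id is not None for h in cluster)
    n_new = len(cluster) - n_extracted
    if n_extracted == 0:
        return "new locus"
    if n_new == 0:
        return "unchanged locus"
    return "update existing locus"


# Hypothetical active set: two previously extracted loci and two new hits.
active_set = [
    Hit("chr1", 2000, 2400),
    Hit("chr1", 100, 500, extract_id=1),
    Hit("chr2", 50, 300, extract_id=2),
    Hit("chr1", 450, 800),
]

labels = [classify(c) for c in cluster_hits(active_set)]
print(labels)
```

Sorting first means each hit only needs to be compared with the cluster immediately before it, so the whole pass is a single linear scan after the sort.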
