Robert J. Gifford edited this page Jun 23, 2024 · 8 revisions

DIGS project databases have a core schema consisting of five tables. Only three of these are relevant to the typical DIGS user; the remaining two (indicated in grey) are used internally by the DIGS tool and can be ignored under ordinary circumstances.

DIGS screening database schema. Circles to the left of field names indicate fields that can be used as keys to link across tables in the core schema, or to ancillary tables added for specific projects. The blue and pink shaded backgrounds indicate the combinations of fields that uniquely identify (i) target genome files and (ii) the reference and probe sequences used for screening, respectively.

searches_performed table

Records the details of BLAST searches that have been performed in this DIGS project.

| Field | Type | Description |
|---|---|---|
| record_ID | INT | Automatically incremented primary key |
| probe_ID | VARCHAR | Unique ID of probe sequence |
| probe_name | VARCHAR | Name of probe sequence |
| probe_gene | VARCHAR | Name of probe sequence gene |
| target_id | VARCHAR | Unique identifier of the TDb file |
| organism | VARCHAR | Name of the organism (Latin binomial) from which the TDb was generated |
| target_datatype | VARCHAR | Data type of the TDb file |
| target_version | VARCHAR | Version details of the TDb file |
| target_name | VARCHAR | Name of the TDb file |
| Timestamp | TIMESTAMP | Timestamp of the table entry |
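
As an illustration of how this table might be consulted, the sketch below checks whether a given probe has already been searched against a given target. It is not taken from the DIGS code base: it uses an in-memory SQLite database as a stand-in for the project database, the rows are invented, and the helper name `already_searched` is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the project database (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE searches_performed (
        record_ID       INTEGER PRIMARY KEY AUTOINCREMENT,
        probe_ID        TEXT,
        probe_name      TEXT,
        probe_gene      TEXT,
        target_id       TEXT,
        organism        TEXT,
        target_datatype TEXT,
        target_version  TEXT,
        target_name     TEXT,
        Timestamp       TEXT DEFAULT CURRENT_TIMESTAMP
    )
    """
)

# Hypothetical rows, invented purely for illustration.
rows = [
    ("probeA_gag", "probeA", "gag", "target_001",
     "Examplus primus", "wgs", "build1", "genome1.fa"),
    ("probeA_pol", "probeA", "pol", "target_001",
     "Examplus primus", "wgs", "build1", "genome1.fa"),
]
conn.executemany(
    "INSERT INTO searches_performed "
    "(probe_ID, probe_name, probe_gene, target_id, organism, "
    "target_datatype, target_version, target_name) "
    "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    rows,
)


def already_searched(conn, probe_id, target_id):
    """Return True if this probe/target pair has a row in searches_performed."""
    cur = conn.execute(
        "SELECT 1 FROM searches_performed "
        "WHERE probe_ID = ? AND target_id = ? LIMIT 1",
        (probe_id, target_id),
    )
    return cur.fetchone() is not None


print(already_searched(conn, "probeA_gag", "target_001"))
print(already_searched(conn, "probeA_env", "target_001"))
```

The lookup keys on `probe_ID` and `target_id` only; a project that identifies targets by the combination of organism, data type, version and file name would extend the `WHERE` clause accordingly.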

digs_results table

Contains the extracted sequences of the loci specified in the BLAST_results table, together with the results of the second round of paired BLAST, in which extracted sequences are 'genotyped' by BLAST comparison to the reference sequence library (RSL).

| Field | Type | Description |
|---|---|---|
| record_ID | INT | Automatically incremented primary key |
| organism | VARCHAR | Organism name (Latin binomial) |
| target_datatype | VARCHAR | Genome data type |
| target_version | VARCHAR | Genome build version details |
| target_name | VARCHAR | Name of genome data file containing the BLAST hit |
| probe_type | VARCHAR | Type of probe sequence (amino acid or nucleotide) |
| extract_start | INT | Start position of the extracted sequence |
| extract_end | INT | End position of the extracted sequence |
| scaffold | VARCHAR | Name of scaffold/contig/chromosome containing the BLAST hit |
| orientation | ENUM | Orientation of the BLAST hit relative to the probe |
| assigned_name | VARCHAR | Name of closest matching sequence in RSL |
| assigned_gene | VARCHAR | Name of gene of closest matching sequence in RSL |
| bit_score | FLOAT | Bit score of the best match from reverse BLAST |
| identity | FLOAT | Percentage identity of the best match from reverse BLAST |
| e_value_num | FLOAT | Coefficient of the expect (e) value for the best match from reverse BLAST |
| e_value_exp | INT | Exponent (base 10) of the expect (e) value for the best match from reverse BLAST |
| subject_start | INT | 5’ (start) position of reverse BLAST alignment in the RSL sequence |
| subject_end | INT | 3’ (end) position of reverse BLAST alignment in the RSL sequence |
| query_start | INT | 5’ (start) position of reverse BLAST alignment in the probe sequence |
| query_end | INT | 3’ (end) position of reverse BLAST alignment in the probe sequence |
| mismatches | INT | Number of mismatches in alignment from reverse BLAST |
| gap_openings | INT | Number of gap openings in alignment from reverse BLAST |
| sequence_length | INT | Length of the extracted sequence |
| sequence | TEXT | Text string of the extracted sequence |
| timestamp | TIMESTAMP | Timestamp of the table entry |
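
Because the expect value is split across two fields, it has to be recombined before it can be compared with a threshold. The sketch below shows one way to do this, assuming `e_value_num` and `e_value_exp` hold the coefficient and the signed power-of-ten exponent of BLAST's scientific notation (for example, 2e-45 stored as 2.0 and -45). The function names and the example hits are hypothetical; if a project stores only the magnitude of the exponent, the sign would need to be restored first.

```python
import math


def combine_evalue(e_value_num, e_value_exp):
    """Rebuild an e-value from its coefficient and power-of-ten exponent."""
    return e_value_num * 10.0 ** e_value_exp


def log10_evalue(e_value_num, e_value_exp):
    """log10 of the e-value; avoids floating-point underflow for tiny values."""
    if e_value_num == 0:
        return float("-inf")  # an e-value of exactly zero
    return math.log10(e_value_num) + e_value_exp


# Hypothetical digs_results rows, reduced to the fields used here.
hits = [
    {"assigned_name": "refA", "e_value_num": 2.0, "e_value_exp": -45},
    {"assigned_name": "refB", "e_value_num": 3.5, "e_value_exp": -4},
]

# Keep hits whose e-value is at most 1e-10.
strong = [
    h["assigned_name"]
    for h in hits
    if log10_evalue(h["e_value_num"], h["e_value_exp"]) <= -10
]
print(strong)
```

Comparing in log space is a design choice of this sketch: it keeps very small e-values distinguishable even when the recombined float would round to zero.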

active_set table

This table is used to process data generated by DIGS and is refreshed each time a new similarity search is performed. It combines the hit coordinates from a new round of DIGS with the coordinates of previously extracted loci; both sets of values are entered into the active_set table. Rows from previously extracted loci can be identified by their extract_id values, whereas new BLAST hits have no extract_id. The combined rows are sorted by scaffold and subject start, and the sorted list is then processed to determine how new hits correspond to previously extracted loci and to respond accordingly.

| Field | Type | Description |
|---|---|---|
| record_id | INT | Automatically incremented primary key |
| extract_id | INT | ID of the previously extracted locus; empty for new BLAST hits |
| organism | VARCHAR | Organism name (Latin binomial) |
| target_data_type | VARCHAR | Genome data type |
| target_version | VARCHAR | Genome build version details |
| target_name | VARCHAR | Name of genome data file containing the BLAST hit |
| scaffold | VARCHAR | Name of scaffold/contig/chromosome containing the BLAST hit |
| extract_start | INT | Start position of the extracted sequence |
| extract_end | INT | End position of the extracted sequence |
| sequence_length | INT | Length of the extracted sequence |
| sequence | TEXT | Text string of the extracted sequence |
| assigned_name | VARCHAR | Name of the best matching reference sequence from the second BLAST |
| assigned_gene | VARCHAR | Gene annotation of the best matching reference sequence from the second BLAST |
| bitscore | FLOAT | Bit score of the BLAST hit |
| identity | FLOAT | Percentage identity of the BLAST hit |
| evalue_num | FLOAT | Coefficient of the expect (e) value for the BLAST hit |
| evalue_exp | INT | Exponent (base 10) of the expect (e) value for the BLAST hit |
| align_len | INT | Length of the BLAST hit alignment |
| mismatches | INT | Number of mismatches in the BLAST hit alignment |
| gap_openings | INT | Number of gap openings in the BLAST hit alignment |
| orientation | ENUM | Orientation of the BLAST hit relative to the probe |
| subject_start | INT | 5’ (start) position of the BLAST hit within the reference sequence |
| subject_end | INT | 3’ (end) position of the BLAST hit within the reference sequence |
| query_start | INT | 5’ (start) position of the BLAST hit within the probe sequence |
| query_end | INT | 3’ (end) position of the BLAST hit within the probe sequence |
| timestamp | TIMESTAMP | Timestamp of the table entry |
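
The sort-and-compare step described for the active_set table can be sketched roughly as follows. This is an illustration under stated assumptions, not the DIGS implementation: it treats `subject_start` and `subject_end` as coordinates on the scaffold with start no greater than end, defines "corresponds" as simple range overlap, and uses a hypothetical `ActiveRow` class and invented status labels. Rows with an `extract_id` are previously extracted loci; rows without one are new hits.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ActiveRow:
    scaffold: str
    subject_start: int
    subject_end: int
    extract_id: Optional[int] = None  # None marks a new BLAST hit


def cluster_rows(rows: List[ActiveRow]) -> List[List[ActiveRow]]:
    """Sort by scaffold and start, then group rows whose ranges overlap."""
    ordered = sorted(rows, key=lambda r: (r.scaffold, r.subject_start))
    clusters: List[List[ActiveRow]] = []
    cluster_end = 0
    for row in ordered:
        same_scaffold = bool(clusters) and clusters[-1][0].scaffold == row.scaffold
        if same_scaffold and row.subject_start <= cluster_end:
            clusters[-1].append(row)
            cluster_end = max(cluster_end, row.subject_end)
        else:
            clusters.append([row])
            cluster_end = row.subject_end
    return clusters


def classify(clusters: List[List[ActiveRow]]) -> List[Tuple[str, List[int], int]]:
    """Label each cluster by how its new hits relate to existing loci."""
    report = []
    for cluster in clusters:
        old_ids = [r.extract_id for r in cluster if r.extract_id is not None]
        n_new = sum(1 for r in cluster if r.extract_id is None)
        if n_new == 0:
            status = "unchanged"          # existing locus, no new hit nearby
        elif old_ids:
            status = "overlaps_existing"  # new hit(s) overlap extracted loci
        else:
            status = "novel"              # new hit(s) with no existing locus
        report.append((status, old_ids, n_new))
    return report


# Hypothetical rows: two existing loci (extract_id 7 and 8) and two new hits.
rows = [
    ActiveRow("scaf1", 100, 500, extract_id=7),
    ActiveRow("scaf1", 450, 800),
    ActiveRow("scaf1", 2000, 2300),
    ActiveRow("scaf2", 10, 90, extract_id=8),
]
for entry in classify(cluster_rows(rows)):
    print(entry)
```

How each status is then handled (for example, whether an overlapping hit extends an existing locus) is left open here, since the text above does not specify it.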