Project Database
DIGS project databases have a core schema consisting of five tables. Only three of these core tables are relevant to the typical DIGS user; the remaining two (indicated in grey) are used internally by the DIGS tool and can be ignored under ordinary circumstances.
DIGS screening database schema. Circles to the left of field names indicate fields that can be used as keys to link across tables in the core schema, or to ancillary tables added for specific projects. The shaded blue and pink background indicates the combination of fields that uniquely identifies (i) target genome files and (ii) reference and probe sequences used for screening, respectively.
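The idea of a combination of fields acting as a key can be sketched in Python. This is only an illustration: the choice of `organism`, `target_datatype`, `target_version` and `target_name` as the fields that identify a target genome file is an assumption based on the field descriptions in the tables on this page, and the helper names `target_key` and `group_by_target` are hypothetical.

```python
from collections import defaultdict

# Assumption: these four fields together identify one target genome file.
TARGET_KEY_FIELDS = ("organism", "target_datatype", "target_version", "target_name")


def target_key(row):
    """Return the composite key of a row (a dict of column name -> value)."""
    return tuple(row[field] for field in TARGET_KEY_FIELDS)


def group_by_target(rows):
    """Group rows from any table that carries the key fields by target file."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[target_key(row)].append(row)
    return dict(grouped)
```

Rows fetched from different tables can then be linked by comparing their `target_key` values, in the same way that a SQL join on the four columns would.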
Records the details of BLAST searches that have been performed in this DIGS project.
Field | Type | Description |
---|---|---|
record_ID | INT | Automatically incremented primary key |
probe_ID | VARCHAR | Unique ID of probe sequence |
probe_name | VARCHAR | Name of probe sequence |
probe_gene | VARCHAR | Name of probe sequence gene |
target_id | VARCHAR | Unique identifier of the TDb file |
organism | VARCHAR | Name of the organism (Latin binomial) from which TDb was generated |
target_datatype | VARCHAR | Data type of the TDb file |
target_version | VARCHAR | Version details of the TDb file |
target_name | VARCHAR | Name of TDb file |
Timestamp | TIMESTAMP | Timestamp of the table entry |
Contains the extracted sequences of the loci specified in the BLAST_results table, together with the results of the second round of paired BLAST, in which extracted sequences are 'genotyped' by BLAST comparison to the reference library.
Field | Type | Description |
---|---|---|
record_ID | INT | Automatically incremented primary key |
organism | VARCHAR | Organism name (Latin binomial) |
target_datatype | VARCHAR | Genome data type |
target_version | VARCHAR | Genome build version details |
target_name | VARCHAR | Name of genome data file containing the BLAST hit |
probe_type | VARCHAR | Type of probe sequence (amino acid or nucleotide) |
extract_start | INT | Start position of the extracted sequence |
extract_end | INT | End position of the extracted sequence |
scaffold | VARCHAR | Name of scaffold/contig/chromosome containing the BLAST hit |
orientation | ENUM | Orientation of the BLAST hit relative to the probe |
assigned_name | VARCHAR | Name of closest matching sequence in RSL |
assigned_gene | VARCHAR | Name of gene of closest matching sequence in RSL |
bit_score | FLOAT | Bit score of the best match from reverse BLAST |
identity | FLOAT | Percentage identity of the best match from reverse BLAST |
e_value_num | FLOAT | Coefficient of the expect (e) value for the best match from reverse BLAST |
e_value_exp | INT | Exponent (base 10) of the expect (e) value for the best match from reverse BLAST |
subject_start | INT | 5’ (start) position of reverse BLAST alignment in the RSL sequence |
subject_end | INT | 3’ (end) position of reverse BLAST alignment in the RSL sequence |
query_start | INT | 5’ (start) position of reverse BLAST alignment in the probe sequence |
query_end | INT | 3’ (end) position of reverse BLAST alignment in the probe sequence |
mismatches | INT | Number of mismatches in alignment from reverse BLAST |
gap_openings | INT | Number of gap openings in alignment from reverse BLAST |
sequence_length | INT | Length of the extracted sequence |
sequence | TEXT | Text string of the extracted sequence |
timestamp | TIMESTAMP | Timestamp of the table entry |
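Because the expect value is stored in two columns, it has to be recombined before it can be compared with a threshold. The sketch below assumes the usual scientific notation of BLAST output (coefficient × 10^exponent) and that `e_value_exp` holds the signed exponent. If a database stores only the magnitude of the exponent, pass `exponent_is_magnitude=True`. The function name and that flag are hypothetical, not part of DIGS.

```python
def evalue_from_parts(e_value_num, e_value_exp, exponent_is_magnitude=False):
    """Recombine the coefficient and exponent columns into a single float.

    Assumption: the expect value equals e_value_num * 10 ** e_value_exp.
    """
    exponent = -abs(e_value_exp) if exponent_is_magnitude else e_value_exp
    return e_value_num * 10.0 ** exponent


def passes_threshold(row, max_evalue=1e-5):
    """True if the row's expect value is at or below max_evalue."""
    return evalue_from_parts(row["e_value_num"], row["e_value_exp"]) <= max_evalue
```

For example, a row with `e_value_num = 2.0` and `e_value_exp = -50` represents an expect value of 2 × 10^-50 and passes the default threshold.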
This table is used to process data generated by DIGS and is refreshed each time a new similarity search is performed. It combines the coordinates of new hits from a round of DIGS with the coordinates of previously extracted loci; both sets of values are entered into the active_set table. Previously extracted loci can be identified by their extract_id values, whereas new BLAST hits have no extract_id. The combined results are sorted by scaffold and subject start, and the sorted list is then processed to determine how new hits correspond to previously extracted loci and to respond accordingly.
Field | Type | Description |
---|---|---|
record_id | INT | Automatically incremented primary key |
extract_id | INT | ID of the previously extracted locus (empty for new BLAST hits) |
organism | VARCHAR | Organism name (Latin binomial) |
target_data_type | VARCHAR | Genome data type |
target_version | VARCHAR | Genome build version details |
target_name | VARCHAR | Name of genome data file containing BLAST hit |
scaffold | VARCHAR | Name of scaffold/contig/chromosome containing BLAST hit |
extract_start | INT | Start position of extracted sequence |
extract_end | INT | End position of extracted sequence |
sequence_length | INT | Length of the extracted sequence |
sequence | TEXT | Text string of the extracted sequence |
assigned_name | VARCHAR | Name of the best matching reference sequence from 2nd BLAST |
assigned_gene | VARCHAR | Gene annotation of the best matching reference sequence from 2nd BLAST |
bitscore | FLOAT | Bit score of the BLAST hit |
identity | FLOAT | Percentage identity of BLAST hit |
evalue_num | FLOAT | Coefficient of the expect (e) value for the BLAST hit |
evalue_exp | INT | Exponent (base 10) of the expect (e) value for the BLAST hit |
align_len | INT | Length of the BLAST hit alignment |
mismatches | INT | Number of mismatches in the BLAST hit alignment |
gap_openings | INT | Number of gap openings in the BLAST hit alignment |
orientation | ENUM | Orientation of the BLAST hit relative to the probe |
subject_start | INT | 5’ (start) position of BLAST hit within the reference sequence |
subject_end | INT | 3’ (end) position of BLAST hit within the reference sequence |
query_start | INT | 5’ (start) position of the BLAST hit within the probe sequence |
query_end | INT | 3’ (end) position of the BLAST hit within the probe sequence |
timestamp | TIMESTAMP | Timestamp of the table entry |
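The way the active_set table combines new hits with previously extracted loci can be sketched as follows. This is a minimal illustration and not the DIGS implementation: it assumes that every row carries `scaffold`, `subject_start` and `subject_end`, that start is never greater than end, and that rows overlapping on the same scaffold belong together. The function names are hypothetical.

```python
def build_active_set(previous_loci, new_hits):
    """Combine both sets of rows and sort them by scaffold and subject start.

    Previously extracted loci keep their extract_id; new hits get None.
    """
    rows = [dict(locus) for locus in previous_loci]
    rows += [dict(hit, extract_id=None) for hit in new_hits]
    rows.sort(key=lambda r: (r["scaffold"], r["subject_start"]))
    return rows


def cluster_overlapping(active_set):
    """Walk the sorted rows once, grouping rows that overlap on a scaffold."""
    clusters = []
    for row in active_set:
        last = clusters[-1] if clusters else None
        if (last is not None
                and last["scaffold"] == row["scaffold"]
                and row["subject_start"] <= last["end"]):
            last["rows"].append(row)
            last["end"] = max(last["end"], row["subject_end"])
        else:
            clusters.append({"scaffold": row["scaffold"],
                             "end": row["subject_end"],
                             "rows": [row]})
    return clusters


def describe(cluster):
    """Label a cluster: 'new' locus, 'update' of a known locus, or 'unchanged'."""
    known = [r for r in cluster["rows"] if r["extract_id"] is not None]
    has_new = any(r["extract_id"] is None for r in cluster["rows"])
    if not has_new:
        return "unchanged"
    return "update" if known else "new"
```

Sorting first means a single pass is enough to find overlaps, since any row that overlaps a cluster must start before the cluster's current end.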