Querying the GLUE Database

GLUE allows powerful queries to be run on the stored data, using a few simple concepts and command patterns. These can be used simply to extract data subsets, to ask simple research questions, or as part of a more complex analysis. To understand this section it is useful to be familiar with the core schema and schema extension sections. We will also use the example project for illustration so it will be useful to have this installed.

GLUE "where clauses"
Property paths to specify table columns
Property paths in FreeMarker templates

GLUE "where clauses"

Many GLUE commands have the same basic structure:

Start from some particular set of candidate objects of the same type (i.e. in the same table)
(Optionally) select a subset of candidate objects
Perform some operation on the selected objects

GLUE "where" clauses are used in step 2 to select a subset of objects, using a logical filter. A command with this structure is the amino-acid frequency command. This command is executed in alignment mode, so a specific Alignment object is the data context.

Mode path: /
GLUE> project example
OK
Mode path: /project/example
GLUE> alignment AL_MASTER
OK
Mode path: /project/example/alignment/AL_MASTER
GLUE> amino-acid frequency -c -w "sequence.source.name = 'ncbi-hev-examples'" -r REF_MASTER_M73218 -f MT -l 56 66
+=======+===========+============+============+
| codon | aminoAcid | numMembers | pctMembers |
+=======+===========+============+============+
| 56    | V         | 10         | 100.00     |
| 57    | F         | 10         | 100.00     |
| 58    | R         | 10         | 100.00     |
| 59    | P         | 10         | 100.00     |
| 60    | E         | 10         | 100.00     |
| 61    | V         | 10         | 100.00     |
| 62    | L         | 7          | 70.00      |
| 62    | F         | 3          | 30.00      |
| 63    | W         | 10         | 100.00     |
| 64    | S         | 1          | 10.00      |
| 64    | N         | 9          | 90.00      |
| 65    | H         | 10         | 100.00     |
| 66    | P         | 10         | 100.00     |
+=======+===========+============+============+

In this case the Alignment AL_MASTER is the data context, and the command operates as follows:

The candidate object set consists of all AlignmentMembers which belong to Alignment AL_MASTER, or (since -c was used) one of its descendent Alignments in the alignment tree
Select a subset of these AlignmentMembers which pass the where clause filter specified by -w "sequence.source.name = 'ncbi-hev-examples'"
Amongst the selected AlignmentMembers, compute the frequency of amino acid residues in the MT (Methyltransferase) genome region at ORF1 codon positions 56 to 66 inclusive (as specified by the -r, -f and -l options)

All the objects identified in step 1 will be tested in step 2 against the where clause; those which pass this filter will then contribute to the computation in step 3.

The where clause string sequence.source.name = 'ncbi-hev-examples' is supplied in double quotes in order to allow space and single quote characters to be used within the string.

The first part of the where clause is sequence.source.name. This is a property path. Starting from the type of the candidate object, in this case AlignmentMember, the property path traverses named relational links specified by the data schema in order to arrive at a data value. In this case the all links are specified in the core schema; the relevant schema fragment is shown. However any or all of them could also be specified by a schema extension.

Start from the AlignmentMember candidate object type
Traverse the sequence relational link to the target Sequence object
Traverse the source relational link to the target Source object
Access the value for the field name on the Source object

The where clause expression uses the binary operator '=' to compare the Source name with the literal string 'ncbi-hev-examples', so that only AlignmentMembers based on Sequences belongs to Source ncbi-hev-examples are selected. It is also possible to use parentheses and logical operators such as 'and', 'not' and 'or' to join sub clauses together. Here are some where clause patterns you could use in the example project:

Candidate object type	Where clause	Meaning	Note	Example command
Sequence	`length >= 7200 and length < 7300`	The field value for `length` is at least 7200 and less than 7300	Numeric comparison operators may be used for INTEGER fields such as `length`, DOUBLE or DATE fields	list sequence
Sequence	`length between 7200 and 7300`	The field value for `length` is at least 7200 and at most 7300	The `between ... and` numeric comparison operator is for testing values within an inclusive range	list sequence
m49_country	`is_ldc = true or is_lldc = true`	Country is classified as least developed country or landlocked developing country	BOOLEAN fields such as `is_ldc` and `is_lldc` are tested for value true	list custom-table-row
m49_country	`m49_region.id = 'europe' and m49_sub_region.id != 'northern_europe'`	Country region is Europe but sub-region is not Northern Europe	The `!=` comparison operator, which may be applied to field values of any type, is used for disequality, and the `and` logical operator conjoins two sub-clauses	list custom-table-row
Sequence	`gb_create_date > #gluedate(01-Apr-2009)`	First submitted to GenBank after 1st April 2009	The `#gluedate` syntax specifies a concrete date, then numeric comparison operators may be used	list sequence
Sequence	`m49_country.id in ('CHN', 'JPN')`	Country of origin China or Japan	The `in` syntax allows comparison with any one of a fixed set of strings	list sequence
Sequence	`sequenceID like 'FJ%'`	The `sequenceID` field value starts with FJ	The `like` operator my be used for String fields such as `sequenceID`. If so, the '%' character acts as a wildcard, matching any number of characters.	list sequence
ReferenceSequence	`alignmentsWhereRefSequence.name != null`	ReferenceSequence is the constraining reference sequence of at least one Alignment	The `alignmentsWhereRefSequence` link is specified in one of the core schema diagrams. When traversing from ReferenceSequence to Alignment the link allows multiple targets, so the where clause is effectingly requiring that there exists at least one link target	list reference
Sequence	`source.name = 'ncbi-hev-examples' and m49_country+.id = null`	Within Source ncbi-hev-examples but M.49 country is undefined	The keyword `null` can be used to test for an undefined value in the database. In this case we must also use the outer join operator `+` on link `m49_country` since by default sub-clauses which traverse to a non-existent link target will return false. The operator allows such sub-clauses to evaluate to null	list sequence

For those interested in implementation details, where clause strings are actually Apache Cayenne object path expressions.

Property paths to specify table columns

The property path concept introduced above has other uses in GLUE. One example use is to configure the result columns of various "list ..." commands such as list custom-table-row, list sequence or list member. Such commands output a table where each row is an instance of a specific given object type. Each such command has a default set of columns, for example list member will give alignment.name, sequence.source.name and sequence.sequenceID. Note that these column headings are themselves property paths starting at the AlignmentMember object type. When invoking this command, you can supply your own property paths, overriding the columns in the result table, for example:

GLUE> alignment AL_MASTER
OK
Mode path: /project/example/alignment/AL_MASTER
GLUE> list member --recursive --whereClause "sequence.source.name = 'ncbi-hev-examples'" sequence.sequenceID alignment.name sequence.host_species
+=====================+================+=======================+
| sequence.sequenceID | alignment.name | sequence.host_species |
+=====================+================+=======================+
| AB481226            | AL_3e          | -                     |
| AB591734            | AL_3a          | Herpestes javanicus   |
| AF444003            | AL_1b          | -                     |
| FJ705359            | AL_3c          | Sus scrofa            |
| FJ763142            | AL_4a          | Homo sapiens          |
| FJ998015            | AL_3e          | Sus scrofa            |
| JF443717            | AL_1c          | Homo sapiens          |
| JQ013791            | AL_3ra         | Oryctolagus cuniculus |
| JX855794            | AL_4b          | Sus scrofa            |
| KP294371            | AL_3i          | Sus scrofa            |
+=====================+================+=======================+
AlignmentMembers found: 10

Property paths in FreeMarker templates

Some GLUE modules use FreeMarker templates to assemble text strings. In some cases the model for the template may be a GLUE data object. Property paths may then be used to access the database from within the FreeMarker template.

A simple example is fastaAlignmentExporter. The ID string for each alignment row is formed using a FreeMarker template. The default behaviour is to use the following FreeMarker template to create an ID with three sections:

${alignment.name}.${sequence.source.name}.${sequence.sequenceID}

Each of the three sections is created by evaluating a property path from the AlignmentMember object that contributed the alignment row.

GLUE by Robert J. Gifford Lab.

For questions, issues, or feedback, please open an issue on the GitHub repository.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Querying the GLUE Database

GLUE "where clauses"

Property paths to specify table columns

Property paths in FreeMarker templates

GLUE USER GUIDE

Overview

GLUE Concepts

GLUE User Guide

Reference Documentation

Gifford Lab GLUE Projects

Clone this wiki locally