Understanding the image: track by track
PlasmidID output is a series of images, one for each plasmid that passed the mapping and clustering filters. The summary image and the individual images are packed with information. This guide will help you decipher them track by track:
The outer blue ring represents the reference plasmid from the database. It carries two size indicators: a constant one every 2000 bp and a relative one every 10% of the plasmid length. This plasmid is similar to the one in the sample but does not have to be identical, so think of it as a SCAFFOLD.
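To make the two size indicators concrete, here is a minimal sketch that computes where both kinds of tick would fall for a plasmid of a given length. The function name `size_ticks` and its parameters are hypothetical and are not taken from the PlasmidID code; only the 2000 bp and 10% spacings come from the description above.

```python
def size_ticks(plasmid_length, absolute_step=2000, relative_step=0.10):
    """Return (absolute, relative) tick positions in bp.

    Hypothetical helper for illustration only, not part of PlasmidID.
    """
    # Constant indicator: one tick every `absolute_step` bp.
    absolute = list(range(0, plasmid_length, absolute_step))
    # Relative indicator: one tick every `relative_step` of the length.
    n_relative = round(1 / relative_step)
    relative = [round(i * plasmid_length / n_relative) for i in range(n_relative)]
    return absolute, relative


absolute, relative = size_ticks(9500)
print(absolute)  # ticks every 2000 bp
print(relative)  # ticks every 10% of 9500 bp
```

On a short plasmid the relative ticks are denser than the constant ones, and on a long plasmid the opposite holds, which is why having both helps when comparing images of plasmids with different sizes.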
A histogram of the number of reads mapped at each position of the reference plasmid. Grey axis lines are spaced every 50 reads. The graph is colored green where more than 200 reads map to a position, orange where fewer than 20 map, and red where no coverage was detected. Unmapped regions (RED) are differences between the reference and the actual plasmid in the sample.
- TIP: Check for highly repetitive sequences, such as insertion sequences (IS), in this track.
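The color rules of the coverage track can be sketched as a small classifier, together with a helper that lists the uncovered (red) stretches of a per-position depth list. Both function names are hypothetical, and the guide does not say which color is used between 20 and 200 reads, so the sketch returns the placeholder `"default"` for that range.

```python
def coverage_color(depth):
    """Map a per-position read depth to the color described in this guide."""
    if depth == 0:
        return "red"      # no coverage detected
    if depth < 20:
        return "orange"   # fewer than 20 reads
    if depth > 200:
        return "green"    # more than 200 reads
    return "default"      # assumption: color for 20-200 reads is not stated


def uncovered_regions(depths):
    """Return 0-based, half-open (start, end) runs where depth is zero."""
    regions = []
    start = None
    for pos, depth in enumerate(depths):
        if depth == 0 and start is None:
            start = pos                    # a zero-coverage run begins
        elif depth != 0 and start is not None:
            regions.append((start, pos))   # the run ended just before pos
            start = None
    if start is not None:                  # run extends to the last position
        regions.append((start, len(depths)))
    return regions


depths = [5, 0, 0, 250, 0]
print([coverage_color(d) for d in depths])
print(uncovered_regions(depths))
```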
These layers show the Prokka annotation results for the contigs and for the plasmid database. Each purple box (contigs) or grey box (plasmid database) indicates a predicted named gene or CDS, with its identity tag placed over the sequence. Genes drawn over the line and in a darker shade are on the + strand; the rest are on the – strand.
Additionally, hits against specific databases are displayed. For example, ARG and incompatibility groups are highlighted in red and yellow, respectively.
- TIP: Sometimes assembled contigs are not long enough to be annotated. Grey boxes (reference annotation) can hint at the presence of a gene in the sample even when the purple box is missing.
- TIP2: If the reference and contig annotations are on different strands, the contig has been assembled in the opposite direction.
- TIP3: The purple annotation is drawn closer to the reference plasmid (blue ring) even though it refers to the contigs. This is because the contig annotation is more important; the grey boxes are a way to double-check.
- TIP4: You can annotate against as many databases as you want by filling in the annotation config file 😉.
This layer represents local alignments of contig sequences that match a plasmid region. Contigs are automatically colored according to their number, which is also displayed above the contig. Alignments shorter than 500 bp are not displayed, since they are overrepresented in repetitive regions and do not help to resolve the contig distribution.
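The 500 bp display cutoff amounts to a simple length filter over the alignments. The following is a minimal sketch under stated assumptions: the `Alignment` record, its field names, and the 0-based half-open coordinates are hypothetical and do not reflect PlasmidID's internal data format.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical record: one local alignment of a contig on the reference."""
    contig: str
    ref_start: int  # 0-based, inclusive
    ref_end: int    # 0-based, exclusive


def displayed_alignments(alignments, min_length=500):
    """Keep only alignments that are at least `min_length` bp long."""
    return [a for a in alignments if a.ref_end - a.ref_start >= min_length]


alignments = [
    Alignment("contig_1", 0, 4200),
    Alignment("contig_7", 4100, 4350),   # 250 bp, e.g. a short repeat hit
    Alignment("contig_3", 6000, 6500),   # exactly 500 bp, still shown
]
for a in displayed_alignments(alignments):
    print(a.contig, a.ref_end - a.ref_start)
```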
This inner layer represents the full length of the contigs that have at least 20% of their size homologous to the reference plasmid. This is the most important track because it depicts the plasmid structure: the filters applied (no repeated contigs and a minimum length percentage) and the full contig information make this track the closest to the actual plasmid in the sample.
The minimum contig size is set to 20%, meaning that homology with the largest contigs, normally chromosome-derived, will not generate a spiral in this track, as the image shows. This way, only contigs with a proper alignment length are considered for plasmid reconstruction.
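The 20% rule can be illustrated by computing, for each contig, the fraction of its length covered by alignments to the reference plasmid. This is only a sketch: the function name, the interval representation (0-based, half-open, in contig coordinates), and the merging of overlapping alignments are assumptions, not a description of how PlasmidID computes the value.

```python
def homologous_fraction(contig_length, aligned_intervals):
    """Fraction of the contig covered by at least one alignment.

    `aligned_intervals` holds (start, end) pairs in contig coordinates;
    overlapping intervals are counted once.
    """
    covered = 0
    current_end = 0
    for start, end in sorted(aligned_intervals):
        start = max(start, current_end)  # skip bases already counted
        if end > start:
            covered += end - start
            current_end = end
    return covered / contig_length


MIN_FRACTION = 0.20  # the 20% threshold described above

# A plasmid-sized contig that mostly matches the reference.
plasmid_contig = homologous_fraction(12_000, [(0, 4_000), (3_000, 9_000)])
# A large, chromosome-like contig sharing only a short region.
chromosome_contig = homologous_fraction(3_000_000, [(10_000, 15_000)])

print(plasmid_contig >= MIN_FRACTION)     # kept in this track
print(chromosome_contig >= MIN_FRACTION)  # filtered out
```

Because the fraction is relative to the contig's own length, a long chromosomal contig that shares a few kilobases with the plasmid falls far below the threshold, while a plasmid-sized contig with the same shared region would pass.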
Contigs are positioned according to their best match, so no contig appears more than once. This track is useful for detecting misassemblies and determining the contig layout.
- TIP: SMRT sequencing assemblies of small sequences need to be circularized; unfortunately, many of the submitted plasmids are not. If there is a repeated region at the beginning and end of the plasmid sequence, this track will not take it into account.