akutils format_database

alk224 edited this page May 24, 2016 · 2 revisions

akutils format_database - format a database for a specific primer set

Usage (order is important!!):

akutils format_database  <input_fasta> <input_taxonomy> <input_primers> <read_length> <output_directory> <input_phylogeny>  
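Because the arguments are positional, it can help to assemble the command programmatically so the order never drifts. The following is a minimal sketch, not part of akutils: the helper name `build_format_database_cmd` and its parameter names are hypothetical, and it assumes only the argument order shown in the usage line above, with the phylogeny appended last when one is supplied.

```python
import shlex


def build_format_database_cmd(fasta, taxonomy, primers, read_length,
                              out_dir, phylogeny=None):
    """Return an argv list in the order given by the usage line.

    Hypothetical helper for illustration; akutils itself does not ship it.
    """
    cmd = [
        "akutils", "format_database",
        fasta,             # <input_fasta>
        taxonomy,          # <input_taxonomy>
        primers,           # <input_primers>
        str(read_length),  # <read_length>
        out_dir,           # <output_directory>
    ]
    if phylogeny is not None:
        cmd.append(phylogeny)  # <input_phylogeny>, only when supplied
    return cmd


cmd = build_format_database_cmd(
    "greengenes_97repset.fasta", "greengenes_97tax.txt",
    "515-806.txt", 150, "16S_v4_db",
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where akutils is installed.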

This script takes an input reference fasta, an associated taxonomy file already formatted for use with QIIME, and a pair of primers (degenerate OK), and produces four pairs of fasta and taxonomy files as outputs:

  1. read1 (for use with read 1 only)
  2. read2 (for use with read 2 only)
  3. amplicon (for use with joined data)
  4. composite (combines the other three outputs for maximum completeness)
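For readers unfamiliar with degenerate primers (the "degenerate OK" noted above), the sketch below shows how a primer containing IUPAC ambiguity codes stands for several concrete sequences. It is an illustration only and makes no claim about how akutils or Primer Prospector match primers internally; the function name `primer_to_regex` and the short primer `GCMGCY` are made up for this example.

```python
import re

# Standard IUPAC nucleotide ambiguity codes, written as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def primer_to_regex(primer):
    """Compile a 5'->3' primer with IUPAC codes into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


# Made-up primer: M is A or C, Y is C or T, so it covers four sequences.
pattern = primer_to_regex("GCMGCY")
print(pattern.pattern)
```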

Tree file:
If a phylogeny (tree file) is supplied, a correspondingly filtered tree is produced for each pair of taxonomy and fasta files.

Primers file:
<input_primers> must be formatted for Primer Prospector and contain no more than two primers. Example:


Notes: Primer suffixes (f and r) are essential. Primer sequences are supplied 5' to 3'. Database fasta must be correctly oriented with respect to primer direction. Input file names can contain only ONE "." character, immediately preceding the file extension, or this workflow will fail.
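The single-"." rule in the notes is easy to check before launching a run. This is a minimal sketch under stated assumptions: the helper name `has_single_dot` is hypothetical, and it inspects only the file name (not the directories in the path), since the notes do not say whether dots elsewhere in the path matter.

```python
import os


def has_single_dot(path):
    """True if the file name has exactly one '.', placed before the extension."""
    name = os.path.basename(path)
    stem, dot, ext = name.partition(".")
    # Require a non-empty stem, a dot, and an extension with no further dots.
    return dot == "." and stem != "" and ext != "" and "." not in ext


inputs = ["greengenes_97repset.fasta", "greengenes_97tax.txt", "515-806.txt"]
bad = [p for p in inputs if not has_single_dot(p)]
if bad:
    print("Rename before running:", ", ".join(bad))
else:
    print("All input file names look OK")
```

A name such as `gg.97.fasta` would be flagged by this check, as would a file with no extension at all.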


akutils format_database greengenes_97repset.fasta greengenes_97tax.txt 515-806.txt 150 16S_v4_db  

The example takes the representative sequences and associated taxonomy file for greengenes97 and uses the primer file 515-806.txt to produce a set of database files for use with v4 amplicons generated with 515f and 806r in paired-end 2x150 mode. Output is placed in a directory called 16S_v4_db.

Please cite Primer Prospector if you find this utility useful.

PrimerProspector: de novo design and taxonomic analysis of PCR primers (2011). William A. Walters, J. Gregory Caporaso, Christian L. Lauber, Donna Berg-Lyons, Noah Fierer, and Rob Knight. Bioinformatics 27(8):1159-1161.
