bbduk_search_for_primers.py README
Requirements: python3, python-dev (sudo apt-get install python-dev)
bbduk.sh (can be installed using the conda command):
conda install -c bioconda/label/cf201901 bbmap
Alternatively, bbduk.sh can be installed locally by downloading the source code from:
https://github.com/BioInfoTools/BBMap/blob/master/sh/bbduk.sh
search-primers-in-reads.py is a python3 script that identifies, within a sequencing reads dataset, the subset of reads containing a given pair of PCR (polymerase chain reaction) primers.
The input file with primers must have a .txt extension.
Example: Primers.txt file
Primers format: each line of Primers.txt must contain the primer pair name followed by the two primer sequences, as shown below: <primer_name> <forward_primer> <reverse_primer>
Example: Ft-M22_6bp_73bp_2u CTGCTATATTTAGACAAAGTGA TCTGAAAGTGCTTGTTGTTGAT
Note: use a tab (\t) as the delimiter in Primers.txt.
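The tab-delimited format described above can be checked before running the search. The following is a minimal sketch, not the parser used by search-primers-in-reads.py; the function name parse_primers is hypothetical, and it assumes exactly three tab-separated fields per line.

```python
import csv
import io


def parse_primers(handle):
    """Yield (name, forward, reverse) tuples from a tab-delimited primers file.

    Hypothetical helper for illustration; not part of search-primers-in-reads.py.
    """
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines so a trailing newline does not raise an error.
        if not row or not any(field.strip() for field in row):
            continue
        if len(row) != 3:
            raise ValueError(
                f"expected 3 tab-separated fields, got {len(row)}: {row!r}"
            )
        name, forward, reverse = (field.strip() for field in row)
        yield name, forward.upper(), reverse.upper()


# The example line from this README, with tabs as delimiters.
example = "Ft-M22_6bp_73bp_2u\tCTGCTATATTTAGACAAAGTGA\tTCTGAAAGTGCTTGTTGTTGAT\n"
primers = list(parse_primers(io.StringIO(example)))
print(primers)
```

A line delimited with spaces instead of tabs is read as a single field and raises ValueError, which makes the most common formatting mistake visible early.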
- Script "search-primers-in-reads.py" will produce one fastq file for each fastq.gz input file and each primer pair listed in Primers.txt.
- Script "Fastq_to_fasta.py" https://github.com/Vladislav-Shevtsov/Fastq-to-fasta- will convert all fastq files associated with each fastq.gz input into files containing all reads for all primer pairs, in fasta format.
- Script "Sort-files-by-pattern" https://github.com/Vladislav-Shevtsov/Sort-files-by-pattern will help sort the fasta files into folders named after the corresponding loci.
- Script "Fa_to_one" https://github.com/Vladislav-Shevtsov/Fa_to_one will combine all loci located in the folders into a single fasta file.
search-primers-in-reads.py takes 3 arguments: the path to the input directory, the path to the output directory, and the path to the file with primers.
The following command structure will generate a .sh bash file:
python3 search-primers-in-reads.py [path/to/input/folder] [path/to/output/folder] [path/to/Primers.txt]
Example:
python3 search-primers-in-reads.py ./input_data ./output_data Primers.txt
#This will generate the bash file
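To give an idea of what a line in such a generated bash file could look like, the sketch below builds one bbduk.sh command for a single primer sequence. It is an illustration under assumptions, not the code of search-primers-in-reads.py: the function name bbduk_command and the file names are hypothetical, k is assumed to equal the primer length, and how the forward and reverse primers are combined is left to the actual script. The parameters in=, outm=, literal=, k= and hdist= are existing bbduk.sh options.

```python
import shlex


def bbduk_command(reads, out_fastq, primer, hdist=2):
    """Return a bbduk.sh command line that keeps reads matching one primer.

    Hypothetical helper for illustration only.
    """
    args = [
        "bbduk.sh",
        f"in={reads}",           # input reads (fastq or fastq.gz)
        f"outm={out_fastq}",     # reads that match the primer are written here
        f"literal={primer}",     # primer sequence given directly on the command line
        f"k={len(primer)}",      # assumption: kmer length equals the primer length
        f"hdist={hdist}",        # number of mismatches allowed
    ]
    # Quote each argument so unusual file names stay safe in a shell script.
    return " ".join(shlex.quote(arg) for arg in args)


# Hypothetical file names; the primer is the forward primer from Primers.txt.
cmd = bbduk_command(
    "./input_data/sample.fastq.gz",
    "./output_data/sample_Ft-M22.fastq",
    "CTGCTATATTTAGACAAAGTGA",
)
print(cmd)
```

Comparing a line like this with the contents of the generated bbduk_search_primers.sh is a quick way to confirm which options the script actually uses.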
The following command will execute the generated bbduk_search_primers.sh bash file:
./bbduk_search_primers.sh
#This will run bash file
By default, 2 mismatches are allowed; this can be modified in the code by changing the value of hdist, for example hdist=3.
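The effect of the mismatch limit can be illustrated with a small pure-Python sketch. It does not reproduce how bbduk.sh works internally; it only shows what "at most hdist mismatches" means for a primer compared against every window of a read. The function names hamming and contains_primer are hypothetical.

```python
def hamming(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def contains_primer(read, primer, hdist=2):
    """Return True if any window of the read is within hdist mismatches of the primer."""
    k = len(primer)
    return any(
        hamming(read[i:i + k], primer) <= hdist
        for i in range(len(read) - k + 1)
    )


primer = "CTGCTATATTTAGACAAAGTGA"
# Same primer with two substitutions (positions 0 and 10), flanked by extra bases.
read = "GG" + "ATGCTATATTGAGACAAAGTGA" + "CC"

print(contains_primer(read, primer, hdist=2))
print(contains_primer(read, primer, hdist=1))
```

With two substitutions in the primer site, the read is accepted at hdist=2 and rejected at hdist=1. Raising hdist recovers more divergent reads at the cost of more spurious matches.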
#For any questions, please open an issue on GitHub or email me at shevtsovvladislav111@gmail.com
#written by Shevtsov V.