Scripts, notes and documents about the BIND annotation workflow (BIND: ab initio gene predictions by BRAKER combined with gene predictions INferred Directly from alignment of RNA-Seq evidence to the genome). Published December 20, 2021
Note: I recommend that you review the challenges document/scripts.
Step 1: Download data from SRA
Step 2: Run fastqc, trim reads and concatenate reads together
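The concatenation part of this step can be sketched in Python as below. This is a minimal illustration, not the repo's own script: the function name and the file names in the usage note are hypothetical. It relies on the fact that several gzip files joined end to end still form a valid gzip stream, so trimmed `.fastq.gz` files can be appended without decompressing them.

```python
import shutil
from pathlib import Path


def concatenate_files(inputs, output):
    """Append each input file to `output` byte for byte, in the order given.

    Works for plain FASTQ and for gzip-compressed FASTQ, because a series
    of gzip members is itself a valid gzip stream.
    """
    with open(output, "wb") as out_handle:
        for path in inputs:
            with open(path, "rb") as in_handle:
                # Stream in chunks so large read files are never held in memory.
                shutil.copyfileobj(in_handle, out_handle)
    return Path(output)
```

A call might look like `concatenate_files(["sampleA_1.trimmed.fastq.gz", "sampleB_1.trimmed.fastq.gz"], "all_1.fastq.gz")`. For paired-end data, the R1 and R2 files should be concatenated in the same sample order so that mates stay in sync.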
Step 3: Ensure fasta header names in reference genome do not have the following items:
- Trailing white space
- White space between numbers or letters
- A colon
- An underscore
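The four header rules above can be checked before alignment with a short script. The following is a sketch under the assumption that the rules apply to the whole header line after the `>`. The function names are hypothetical and the code only reports problems, it does not rewrite the genome.

```python
import re


def header_problems(header):
    """Return the Step 3 rules that one FASTA header line violates."""
    name = header.rstrip("\r\n")
    if name.startswith(">"):
        name = name[1:]
    problems = []
    if name != name.rstrip():
        problems.append("trailing white space")
    # White space with non-space characters on both sides, e.g. "chr 1".
    if re.search(r"\S\s+\S", name):
        problems.append("internal white space")
    if ":" in name:
        problems.append("colon")
    if "_" in name:
        problems.append("underscore")
    return problems


def check_fasta(lines):
    """Map each offending header (without line ending) to its problems."""
    report = {}
    for line in lines:
        if line.startswith(">"):
            problems = header_problems(line)
            if problems:
                report[line.rstrip("\r\n")] = problems
    return report
```

Because `check_fasta` accepts any iterable of lines, an open file handle for the reference genome can be passed directly, e.g. `with open("genome.fa") as handle: report = check_fasta(handle)` (the file name is a placeholder).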
Step 4: Align reads to a reference genome & generate a BAM file
- Examples of alignment programs
- STAR
- HISAT2
- TopHat2
- If using both paired-end (PE) and single-end (SE) reads: merge the PE and SE BAM files and sort the combined BAM file
- If the combined BAM file is larger than 100 GB, it is recommended to split the BAM file by chromosome
- Processing a large BAM file could take weeks to finish. See challenges.
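The merge, sort and per-chromosome split described above can be sketched as a list of `samtools` commands. This is an illustration under stated assumptions, not the repo's own script: the function name, output naming scheme and thread count are hypothetical, and the chromosome names must match the headers of the reference genome. The function only builds the commands, so they can be inspected before anything is run.

```python
def bam_commands(pe_bam, se_bam, chromosomes, prefix="combined", threads=4):
    """Build samtools commands to merge, sort, index and split BAM files."""
    merged = f"{prefix}.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    commands = [
        # Merge the paired-end and single-end alignments into one file.
        ["samtools", "merge", "-@", str(threads), merged, pe_bam, se_bam],
        # Coordinate-sort the merged file.
        ["samtools", "sort", "-@", str(threads), "-o", sorted_bam, merged],
        # Region queries in the split step need an index.
        ["samtools", "index", sorted_bam],
    ]
    for chrom in chromosomes:
        # One BAM file per chromosome.
        commands.append(
            ["samtools", "view", "-b", "-o", f"{prefix}.{chrom}.bam", sorted_bam, chrom]
        )
    return commands
```

Each command list can then be executed in order, for example with `subprocess.run(cmd, check=True)`, on a machine where `samtools` is installed.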
Step 5: Run genome-guided transcriptome assembly
- Examples of genome-guided transcriptome assembly programs
- Cufflinks
- CLASS2
- Strawberry
- StringTie
- Note: Trinity is not recommended because it generates too many small, incomplete transcripts
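One way to see whether an assembler produces many short transcripts is to tabulate transcript lengths from its GTF output. The sketch below is not part of the BIND scripts. The function names are hypothetical and the 300 bp cutoff is an arbitrary value chosen for illustration. It assumes a standard nine-column GTF in which exon records carry a `transcript_id` attribute.

```python
import re
from collections import defaultdict

_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcript_lengths(gtf_lines):
    """Sum exon lengths per transcript_id from an iterable of GTF lines."""
    lengths = defaultdict(int)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = _TRANSCRIPT_ID.search(fields[8])
        if match:
            # GTF coordinates are 1-based and inclusive.
            lengths[match.group(1)] += int(fields[4]) - int(fields[3]) + 1
    return dict(lengths)


def fraction_shorter_than(lengths, min_length=300):
    """Fraction of transcripts whose spliced length is below min_length."""
    if not lengths:
        return 0.0
    short = sum(1 for n in lengths.values() if n < min_length)
    return short / len(lengths)
```

Running both functions on the GTF from each assembler gives a rough, comparable number for how many short transcripts each one contributes.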
Step 6: Run Portcullis for splice junctions analysis
Step 7: Mikado pipeline & Quality control of Mikado pipeline output
Step 8: BRAKER & Filter TE sequences
- NOTE: There are multiple ways to run BRAKER. I recommend reviewing the BRAKER2 documentation and the BRAKER2 tutorial
- Manual about BRAKER with RNA-Seq data
- Scripts and notes in this repo about running BRAKER with RNA-Seq data only
- Manual about BRAKER with RNA-Seq and Protein data
- Scripts and notes in this repo about running BRAKER with RNA-Seq and protein data
Step 9: Merge & Filter -- Mikado output & BRAKER output
Tool | Purpose |
---|---|
SRA Toolkit (v ) | SRA access |
FastQC (v ) | Quality Control |
Trimmomatic (v ) | Quality Control |
STAR (v ) | Alignment |
SAMtools (v ) | BAM file processing |
Strawberry (v1.1.1) | Transcript Assembly |
CLASS2 (v ) | Transcript Assembly |
StringTie (v ) | Transcript Assembly |
Cufflinks (v ) | Transcript Assembly |
Portcullis (v ) | Splice junctions |
Mikado (v ) | Direct Inference prediction |
TransDecoder (v ) | CDS prediction |
TEsorter (v ) | Identify retrotransposons |
kallisto (v ) | Quality Control |
RepeatMasker (v ) | Hardmask genome |
BRAKER (v ) | Ab initio prediction |
GeneMark (v ) | Ab initio prediction |