
Scripts, notes and documents about the BIND annotation workflow (BIND: ab initio gene predictions by BRAKER combined with gene predictions INferred Directly from alignment of RNA-Seq evidence to the genome.)


PeanutBase/BIND_annotation


BIND annotation

BIND = BRAKER combined with gene predictions INferred Directly

Published December 20, 2021


Overview of Steps

Note: I recommend reviewing the challenges document and scripts.

Step 1: Download data from SRA
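
A minimal sketch of this step. It assumes the SRA Toolkit's `prefetch` and `fasterq-dump` are on `PATH`. The code only builds and prints the command lines so they can be checked before running (for example with `subprocess.run(cmd, check=True)`). The accession `SRR000001` and the output directory are placeholders. `--split-3` is one reasonable choice for paired-end runs, not a setting documented by this workflow.

```python
import shlex
from typing import List


def sra_download_commands(accessions: List[str], outdir: str = "fastq",
                          threads: int = 4) -> List[List[str]]:
    """Build prefetch + fasterq-dump command lines for each SRA run accession."""
    commands: List[List[str]] = []
    for accession in accessions:
        # prefetch downloads the .sra archive for the run
        commands.append(["prefetch", accession])
        # fasterq-dump converts it to FASTQ; --split-3 writes the mates of a pair
        # to _1/_2 files and any unpaired reads to a separate file
        commands.append(["fasterq-dump", "--split-3", "-O", outdir,
                         "-e", str(threads), accession])
    return commands


if __name__ == "__main__":
    # SRR000001 is a placeholder accession, not one used by the workflow
    for cmd in sra_download_commands(["SRR000001"]):
        print(shlex.join(cmd))
```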

Step 2: Run FastQC, trim reads, and concatenate the reads
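
Gzipped FASTQ files can be concatenated without decompressing them, because a gzip file may hold several gzip members back to back. The sketch below assumes each sample's reads are split across several FASTQ files (for example per lane or per run) that should be joined per mate. The function and file names are hypothetical. For paired-end data, list the R1 and R2 files in the same order so the mates stay in sync.

```python
import gzip
import shutil
import tempfile
from pathlib import Path
from typing import Sequence


def concatenate_fastq(inputs: Sequence[Path], output: Path) -> None:
    """Join FASTQ files byte-for-byte; inputs must be all gzipped or all plain."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as handle:
                shutil.copyfileobj(handle, out)


def concatenate_pairs(r1_files: Sequence[Path], r2_files: Sequence[Path],
                      out_r1: Path, out_r2: Path) -> None:
    """Concatenate mate-1 and mate-2 files in the same order to keep pairs in sync."""
    if len(r1_files) != len(r2_files):
        raise ValueError("R1 and R2 file lists differ in length")
    concatenate_fastq(r1_files, out_r1)
    concatenate_fastq(r2_files, out_r2)


def count_reads(path: Path) -> int:
    """Count FASTQ records (4 lines each) in a gzipped file."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for _ in handle) // 4


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        parts = []
        for i, seq in enumerate(["ACGT", "TTGA"]):
            part = Path(tmp) / f"lane{i}_R1.fastq.gz"
            with gzip.open(part, "wt") as handle:
                handle.write(f"@read{i}\n{seq}\n+\nIIII\n")
            parts.append(part)
        merged = Path(tmp) / "sample_R1.fastq.gz"
        concatenate_fastq(parts, merged)
        print(count_reads(merged), "reads")  # 2 reads
```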

Step 3: Ensure FASTA header names in the reference genome do not contain any of the following:

  • Trailing white space
  • White space between numbers or letters
  • A colon
  • An underscore
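
A minimal checker for the header rules above. It reports each rule separately so offending headers can be renamed before alignment. It only reports problems and does not rewrite the FASTA file. The function names are hypothetical, and the FASTA path is taken from the command line.

```python
import re
import sys
from typing import Dict, List


def header_problems(header: str) -> List[str]:
    """Return the Step 3 rules violated by one FASTA header (text after '>')."""
    problems = []
    if header != header.rstrip():
        problems.append("trailing white space")
    if re.search(r"\S\s+\S", header):
        problems.append("internal white space")
    if ":" in header:
        problems.append("colon")
    if "_" in header:
        problems.append("underscore")
    return problems


def check_fasta_headers(path: str) -> Dict[str, List[str]]:
    """Map every offending header in a FASTA file to the rules it breaks."""
    offending: Dict[str, List[str]] = {}
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                # strip only the newline; a leftover '\r' is reported as trailing white space
                header = line[1:].rstrip("\n")
                problems = header_problems(header)
                if problems:
                    offending[header] = problems
    return offending


if __name__ == "__main__" and len(sys.argv) > 1:
    for name, problems in check_fasta_headers(sys.argv[1]).items():
        print(f"{name!r}: {', '.join(problems)}")
```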

Step 4: Align reads to a reference genome & generate a BAM file

  • Examples of alignment programs
    • STAR
    • HISAT2
    • TopHat2
  • If using both paired-end (PE) and single-end (SE) reads: merge the PE and SE BAM files and sort the combined BAM file
  • If the combined BAM file is larger than 100 GB, it is recommended to split the BAM file by chromosome
    • A large BAM file could take weeks to process. See challenges.
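
The merge, sort, and split logic of this step might look like the sketch below. It builds `samtools` command lines rather than running them, and it rests on four assumptions:

  • The PE and SE BAM files are already coordinate-sorted (for example STAR's `--outSAMtype BAM SortedByCoordinate` output).
  • "100 GB" means decimal gigabytes.
  • The chromosome names are supplied by the caller.
  • The output naming scheme is hypothetical.

Region queries with `samtools view` need the index that `samtools index` creates.

```python
import os
import shlex
from typing import List, Sequence

# "larger than 100 GB" from Step 4, read as decimal gigabytes (an assumption)
SPLIT_THRESHOLD_BYTES = 100 * 10**9


def merge_sort_index_commands(pe_bam: str, se_bam: str, prefix: str,
                              threads: int = 4) -> List[List[str]]:
    """Merge PE and SE alignments, coordinate-sort the result and index it."""
    merged = f"{prefix}.merged.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    return [
        ["samtools", "merge", "-@", str(threads), merged, pe_bam, se_bam],
        ["samtools", "sort", "-@", str(threads), "-o", sorted_bam, merged],
        # chromosomes longer than 512 Mbp would need a CSI index (samtools index -c)
        ["samtools", "index", sorted_bam],
    ]


def split_by_chromosome_commands(sorted_bam: str, chromosomes: Sequence[str],
                                 size_bytes: int,
                                 threshold: int = SPLIT_THRESHOLD_BYTES) -> List[List[str]]:
    """Return one `samtools view` command per chromosome if the BAM exceeds the threshold."""
    if size_bytes <= threshold:
        return []
    stem = sorted_bam[:-4] if sorted_bam.endswith(".bam") else sorted_bam
    return [["samtools", "view", "-b", "-o", f"{stem}.{chrom}.bam", sorted_bam, chrom]
            for chrom in chromosomes]


if __name__ == "__main__":
    for cmd in merge_sort_index_commands("sample_PE.bam", "sample_SE.bam", "sample"):
        print(shlex.join(cmd))
    bam = "sample.sorted.bam"
    # once the commands above have run, os.path.getsize gives the real size
    size = os.path.getsize(bam) if os.path.exists(bam) else 0
    for cmd in split_by_chromosome_commands(bam, ["Chr01", "Chr02"], size):
        print(shlex.join(cmd))
```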

Step 5: Use the BAM file as input for multiple genome-guided transcriptome assemblers to obtain GFF3/GTF files as output

  • Examples of genome-guided transcriptome assembly programs
    • Cufflinks
    • CLASS2
    • Strawberry
    • StringTie
    • Note: Using Trinity is not recommended because it generates too many small, incomplete transcripts
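
One way to compare assembler outputs, and to spot the fragmentation problem noted for Trinity, is to count the transcripts in each GTF along with the share that are short or single-exon. This sketch assumes GTF input with `transcript_id` attributes, as StringTie and Cufflinks write. The 300 bp cut-off is chosen only for illustration.

```python
import re
import sys
from collections import defaultdict
from typing import Dict, Iterable

_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcript_stats(gtf_lines: Iterable[str], min_length: int = 300) -> Dict[str, float]:
    """Summarise transcript count, short-transcript and single-exon fractions."""
    length: Dict[str, int] = defaultdict(int)
    exon_count: Dict[str, int] = defaultdict(int)
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        # skip comments, malformed lines and non-exon features
        if line.startswith("#") or len(fields) < 9 or fields[2] != "exon":
            continue
        match = _TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        tid = match.group(1)
        length[tid] += int(fields[4]) - int(fields[3]) + 1  # GTF is 1-based, inclusive
        exon_count[tid] += 1
    total = len(length)
    if total == 0:
        return {"transcripts": 0, "short_fraction": 0.0, "single_exon_fraction": 0.0}
    short = sum(1 for n in length.values() if n < min_length)
    single = sum(1 for n in exon_count.values() if n == 1)
    return {"transcripts": total, "short_fraction": short / total,
            "single_exon_fraction": single / total}


if __name__ == "__main__":
    for path in sys.argv[1:]:  # e.g. one GTF per assembler
        with open(path) as handle:
            print(path, transcript_stats(handle))
```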

Step 6: Run Portcullis for splice junction analysis

Step 7: Mikado pipeline & Quality control of Mikado pipeline output

Step 8: BRAKER & Filter TE sequences
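
Once TE-derived models have been identified (for example by running TEsorter on the predicted proteins), removing them means dropping each flagged feature together with all of its children. The sketch below assumes the predictions are in GFF3 with `ID`/`Parent` attributes, and that parents are listed before their children, as is usual. How the set of IDs to drop is produced is left open, and the function names are hypothetical.

```python
from typing import Dict, List, Set


def parse_attributes(column9: str) -> Dict[str, str]:
    """Parse a GFF3 attribute column such as 'ID=g1.t1;Parent=g1' into a dict."""
    attributes = {}
    for part in column9.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attributes[key] = value
    return attributes


def drop_features(gff3_lines: List[str], drop_ids: Set[str]) -> List[str]:
    """Remove features whose ID is in drop_ids plus all of their descendants."""
    removed = set(drop_ids)
    kept = []
    for line in gff3_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9:
            kept.append(line)  # keep directives, comments and blank lines
            continue
        attributes = parse_attributes(fields[8])
        parents = attributes["Parent"].split(",") if "Parent" in attributes else []
        if attributes.get("ID") in removed or any(p in removed for p in parents):
            # record this feature's ID so that its own children are dropped too
            if "ID" in attributes:
                removed.add(attributes["ID"])
            continue
        kept.append(line)
    return kept
```

Passing gene IDs removes whole loci. Passing transcript IDs can leave a gene with no remaining transcripts, which would need a second clean-up pass.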

Step 9: Merge & Filter Mikado output & BRAKER output
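
The sketch below is a hypothetical illustration of the interval logic a merge involves. It keeps every Mikado gene and adds a BRAKER gene only if that gene does not overlap any Mikado gene on the same sequence and strand. This rule, the `Gene` type and the function names are assumptions, not the BIND workflow's documented merge method.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    seqid: str
    start: int  # 1-based, inclusive, as in GFF3
    end: int
    strand: str
    gene_id: str


def overlaps(a: Gene, b: Gene) -> bool:
    """True if two genes share at least one base on the same sequence and strand."""
    return (a.seqid == b.seqid and a.strand == b.strand
            and a.start <= b.end and b.start <= a.end)


def merge_models(mikado: List[Gene], braker: List[Gene]) -> List[Gene]:
    """Keep all Mikado genes; add BRAKER genes that overlap none of them."""
    merged = list(mikado)
    for gene in braker:
        # linear scan for clarity; an interval tree would scale better genome-wide
        if not any(overlaps(gene, m) for m in mikado):
            merged.append(gene)
    return sorted(merged, key=lambda g: (g.seqid, g.start, g.end))
```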

Tools needed:

| Tool | Version | Purpose |
|---|---|---|
| SRA Toolkit | | SRA access |
| FastQC | | Quality control |
| Trimmomatic | | Quality control (read trimming) |
| STAR | | Alignment |
| SAMtools | | BAM processing (merge, sort, index, split) |
| Strawberry | 1.1.1 | Transcript assembly |
| CLASS2 | | Transcript assembly |
| StringTie | | Transcript assembly |
| Cufflinks | | Transcript assembly |
| Portcullis | | Splice junctions |
| Mikado | | Direct inference prediction |
| TransDecoder | | CDS prediction |
| TEsorter | | Identify retrotransposons |
| Kallisto | | Quality control |
| RepeatMasker | | Hard-mask genome |
| BRAKER | | Ab initio prediction |
| GeneMark | | Ab initio prediction |

Flow Chart
