BIND annotation

BIND = Braker combined with gene predictions INferred Directly

Scripts, notes, and documents about the BIND annotation workflow (BIND: ab initio gene predictions by BRAKER combined with gene predictions INferred Directly from the alignment of RNA-Seq evidence to the genome). Published December 20, 2021.

GitHub repo: PeanutBase/BIND_annotation

Overview of Steps

Note: I recommend reviewing the challenges document and scripts before starting.

Step 1: Download data from SRA
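A minimal sketch of Step 1, assuming the SRA Toolkit's `prefetch` and `fasterq-dump` are on your PATH. It only builds the command lines, which you can run with `subprocess.run(cmd, check=True)` or paste into a job script. The accession and output directory are placeholders, not data used in this workflow.

```python
import shlex
from typing import List


def sra_download_commands(accession: str, outdir: str, threads: int = 4) -> List[List[str]]:
    """Build the SRA Toolkit commands that fetch one run and convert it to FASTQ.

    `prefetch` downloads the .sra file; `fasterq-dump --split-files` then writes
    one FASTQ file per mate for paired-end runs (a single file for single-end).
    """
    return [
        ["prefetch", accession],
        ["fasterq-dump", "--split-files", "--threads", str(threads),
         "--outdir", outdir, accession],
    ]


if __name__ == "__main__":
    # "SRR0000000" and "raw_reads" are placeholders, not data from this workflow.
    for cmd in sra_download_commands("SRR0000000", "raw_reads"):
        print(shlex.join(cmd))
```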

Step 2: Run FastQC, trim the reads, and concatenate the trimmed reads together
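One way to do the concatenation in Step 2, sketched in Python under the assumption that the reads have already been trimmed and are stored as plain or gzip-compressed FASTQ. Gzip files can be joined byte for byte because a series of gzip members is itself a valid gzip stream, so nothing has to be decompressed first.

```python
import shutil
from typing import Iterable


def concatenate_fastq(inputs: Iterable[str], output: str) -> int:
    """Concatenate FASTQ files byte for byte into `output`.

    Works for plain FASTQ and for .fastq.gz (do not mix the two in one call),
    because consecutive gzip members form a valid gzip stream. For paired-end
    data, call this once for R1 and once for R2 with the samples in the same
    order, so mates stay in matching positions. Returns the number of inputs.
    """
    count = 0
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as handle:
                shutil.copyfileobj(handle, out)
            count += 1
    return count
```

For example, with hypothetical files `sampleA_R1.fastq.gz` and `sampleB_R1.fastq.gz`, pass them in that order for R1 and pass `sampleA_R2.fastq.gz`, `sampleB_R2.fastq.gz` in the same order for R2.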

Step 3: Ensure that the FASTA header names in the reference genome do not contain any of the following:

- Trailing white space
- White space between numbers or letters
- A colon
- An underscore
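The header rules in Step 3 can be checked before alignment with a short script. This is a sketch, not a script from this repo: it only reports the offending headers and leaves the choice of new names to you.

```python
import re
from typing import Iterable, List, Tuple

# Header rules from Step 3: no trailing white space, no white space between
# characters, no colons, and no underscores.
CHECKS = [
    ("trailing white space", re.compile(r"\s$")),
    ("internal white space", re.compile(r"\S\s+\S")),
    ("colon", re.compile(r":")),
    ("underscore", re.compile(r"_")),
]


def find_bad_headers(lines: Iterable[str]) -> List[Tuple[int, str, List[str]]]:
    """Return (line number, header, problems) for each FASTA header that breaks
    a rule above. `lines` can be any iterable of text lines, e.g. an open file."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        if not line.startswith(">"):
            continue  # sequence line, not a header
        header = line.rstrip("\n")
        problems = [name for name, pattern in CHECKS if pattern.search(header[1:])]
        if problems:
            bad.append((lineno, header, problems))
    return bad
```

For example, `with open("genome.fa") as fh: print(find_bad_headers(fh))`, where `genome.fa` is a placeholder path. Because the file is read line by line, only the bad headers are kept in memory.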

Step 4: Align reads to a reference genome & generate a BAM file

Examples of alignment programs
- STAR
- HISAT2
- TopHat2
If using both paired-end (PE) and single-end (SE) reads, merge the PE and SE BAM files and sort the combined BAM file.
If the combined BAM file is larger than 100 GB, it is recommended to split it by chromosome.
- Downstream steps on a single very large BAM file can take weeks to finish. See challenges.
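A sketch of the SAMtools commands behind Step 4, assuming one coordinate-sorted PE BAM and one SE BAM; all file names are placeholders. Extracting a chromosome with `samtools view` and a region name requires the sorted BAM to be indexed, which is why `samtools index` runs before the per-chromosome commands. Chromosome names can be taken from the first column of the genome's `.fai` index or of `samtools idxstats` output.

```python
from typing import List

# Step 4 recommends splitting when the combined BAM is larger than 100 GB.
SPLIT_THRESHOLD_BYTES = 100 * 10**9


def should_split(bam_size_bytes: int) -> bool:
    """True when the combined BAM exceeds the 100 GB guideline."""
    return bam_size_bytes > SPLIT_THRESHOLD_BYTES


def merge_sort_split_commands(pe_bam: str, se_bam: str, chromosomes: List[str],
                              threads: int = 4) -> List[List[str]]:
    """Build samtools commands: merge PE + SE, sort, index, then split by chromosome.

    "merged.bam" and "merged.sorted.bam" are placeholder output names.
    """
    merged, sorted_bam = "merged.bam", "merged.sorted.bam"
    cmds = [
        ["samtools", "merge", "-@", str(threads), merged, pe_bam, se_bam],
        ["samtools", "sort", "-@", str(threads), "-o", sorted_bam, merged],
        # Region queries with `samtools view` need a .bai index on the sorted BAM.
        ["samtools", "index", sorted_bam],
    ]
    for chrom in chromosomes:
        # -b writes BAM; the trailing chromosome name restricts output to that region.
        cmds.append(["samtools", "view", "-b", "-o", f"{chrom}.bam", sorted_bam, chrom])
    return cmds
```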

Step 5: Use the BAM file as input for multiple genome-guided transcriptome assemblers, and collect their GFF3/GTF output

Examples of genome guided transcriptome assembly programs
- Cufflinks
- CLASS2
- Strawberry
- StringTie
- Note: Trinity is not recommended because it generates too many small, incomplete transcripts
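In Step 5 each assembler runs separately on the same BAM file. The sketch below builds StringTie and Cufflinks commands with their thread (`-p`) and output (`-o`) options; CLASS2 and Strawberry are left out because their command lines are not covered here. The output prefix is a placeholder.

```python
from typing import Dict, List


def assembly_commands(bam: str, prefix: str, threads: int = 4) -> Dict[str, List[str]]:
    """Build one command per assembler for the same sorted BAM file.

    `prefix` is a placeholder used to name outputs. StringTie writes the GTF
    named by -o; Cufflinks writes transcripts.gtf inside the -o directory.
    """
    return {
        "stringtie": ["stringtie", bam, "-p", str(threads), "-o", f"{prefix}.stringtie.gtf"],
        "cufflinks": ["cufflinks", "-p", str(threads), "-o", f"{prefix}_cufflinks", bam],
    }
```

With per-chromosome BAM files from Step 4, this can be called once per file so each assembly job stays small.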

Step 6: Run Portcullis for splice junction analysis

Step 7: Run the Mikado pipeline and quality-control the Mikado output

Step 8: Run BRAKER and filter TE sequences

NOTE: There are multiple ways to run BRAKER. I recommend reviewing the BRAKER2 documentation and the BRAKER2 tutorial.
- Manual about BRAKER with RNA-Seq data
  - Scripts and notes in this repo about running BRAKER with RNA-Seq data only
- Manual about BRAKER with RNA-Seq and Protein data
  - Scripts and notes in this repo about running BRAKER with RNA-Seq and protein data
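A sketch of the RNA-Seq-only BRAKER run in Step 8, assuming BRAKER2's `braker.pl` and GeneMark are installed and the genome has been hard-masked with RepeatMasker (add `--softmasking` only for a soft-masked genome). `--bam` accepts a comma-separated list, so per-chromosome BAM files from Step 4 can be passed together. The genome path and species name are placeholders; the RNA-Seq plus protein mode needs further options described in the BRAKER2 manual.

```python
from typing import List


def braker_rnaseq_command(genome: str, bams: List[str], species: str,
                          cores: int = 8) -> List[str]:
    """Build a braker.pl command for RNA-Seq evidence only.

    `genome` and `species` are placeholders; BAM files are joined with commas
    because braker.pl takes them as one comma-separated --bam value.
    """
    if not bams:
        raise ValueError("at least one BAM file is required")
    return [
        "braker.pl",
        f"--genome={genome}",
        f"--bam={','.join(bams)}",
        f"--species={species}",
        f"--cores={cores}",
    ]
```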

Step 9: Merge and filter the Mikado output and the BRAKER output

Tools needed:

Tool	Purpose
SRA Toolkit (v )	SRA access
FastQC (v )	Quality control
Trimmomatic (v )	Quality control (read trimming)
STAR (v )	Alignment
SAMtools (v )	BAM processing (merge, sort, split)
Strawberry (v1.1.1)	Transcript assembly
CLASS2 (v )	Transcript assembly
StringTie (v )	Transcript assembly
Cufflinks (v )	Transcript assembly
Portcullis (v )	Splice junction analysis
Mikado (v )	Direct inference prediction
TransDecoder (v )	CDS prediction
TEsorter (v )	Identify retrotransposons
Kallisto (v )	Quality control
RepeatMasker (v )	Hard-mask genome
BRAKER (v )	Ab initio prediction
GeneMark (v )	Ab initio prediction
