Installation of FunGAP v1.0.1

Last updated: Jan 7, 2019

FunGAP is freely available for academic use. For commercial use or licensing of FunGAP, please contact In-Geol Choi (email: igchoi (at) korea.ac.kr). Please cite the following reference:

Reference: Byoungnam Min, Igor V Grigoriev, In-Geol Choi, FunGAP: Fungal Genome Annotation Pipeline using evidence-based gene model evaluation (2017), Bioinformatics, Volume 33, Issue 18, Pages 2936–2937, https://doi.org/10.1093/bioinformatics/btx353


Because FunGAP implements many dependent programs, you may encounter issues during installation. Please don't hesitate to post on *Issues* or contact me (mbnmbn00@gmail.com) for help.

These steps were tested and confirmed on a freshly installed Ubuntu 18.04 LTS.

Install FunGAP using Docker

Using Docker is the most reliable and robust way to install FunGAP. https://gist.github.com/lmtani/d37343a40e143b59336e4606055d1723

  • This Docker container has been developed and maintained by Lucas Taniguti

Install FunGAP using conda

Although we recommend using Docker, it is not available in some environments (e.g., HPC clusters). In that case, please follow the instructions below for a conda-based FunGAP installation.

0. FunGAP requirements

0.1. Required software (and tested versions)

  1. Hisat2 v2.1.0
  2. Trinity v2.6.6
  3. RepeatModeler v1.0.11
  4. Maker v2.31.10
  5. GeneMark-ES/ET v4.46
  6. Augustus v3.3
  7. Braker v1.9
  8. BUSCO v3.0.2
  9. Pfam_scan v1.6
  10. BLAST v2.6.0+
  11. Samtools v1.9
  12. Bamtools v2.4.1

0.2. Required databases

  1. BUSCO odb9
  2. Pfam release 32.0

1. Setup Anaconda environment

For a robust installation, we recommend using an Anaconda environment and installing as many of the dependent programs and libraries as possible inside it.

1.1. Install Anaconda2 (v4.5.12 tested)

Download and install Anaconda2 (We assume that you install it in $HOME/anaconda2)

cd $HOME
wget https://repo.continuum.io/archive/Anaconda2-2018.12-Linux-x86_64.sh
bash Anaconda2-2018.12-Linux-x86_64.sh

1.2. Set conda environment

echo ". ~/anaconda2/etc/profile.d/conda.sh" >> ~/.bashrc
source ~/.bashrc

1.3. Create and activate an environment

conda update conda
conda create -n fungap
conda activate fungap

1.4. Add channels

This step is essential; otherwise, Maker will stop.

conda config --remove channels bioconda
conda config --remove channels conda-forge
conda config --add channels bioconda/label/cf201901
conda config --add channels conda-forge/label/cf201901

1.5. Install dependencies

conda install augustus rmblast maker hisat2 braker busco blast pfam_scan
pip install biopython bcbio-gff networkx markdown2 matplotlib
cpanm Hash::Merge Logger::Simple Parallel::ForkManager YAML
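
After the install commands finish, it can help to confirm that the main executables are actually visible in the activated environment. The following is a minimal sketch, not part of FunGAP: the helper name check_tools is hypothetical, and the executable names in the example call are assumptions that may differ between package versions.

```shell
# Report which of the given executables are missing from PATH.
# Returns 0 if all of them are found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "MISSING: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (executable names are assumptions; adjust to your install):
# check_tools augustus maker hisat2 blastn hmmpress
```

Run the example call with the fungap environment activated; any line starting with MISSING points at a package that needs to be reinstalled.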

2. Download and install FunGAP

2.1. Download FunGAP

Download FunGAP with git clone. We assume here that you install FunGAP in your $HOME directory, but you are free to choose another location. $FUNGAP_DIR will be your FunGAP installation directory.

cd $HOME
git clone https://github.com/CompSynBioLab-KoreaUniv/FunGAP.git
cd FunGAP/
export FUNGAP_DIR=$(pwd)
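
Note that export only sets $FUNGAP_DIR for the current shell session, while the later steps rely on it. One possible way to keep it across sessions is sketched below; the helper name persist_fungap_dir is hypothetical, and writing to ~/.bashrc is an assumption about your shell setup.

```shell
# Append an "export FUNGAP_DIR=..." line to an rc file,
# unless an identical line is already present.
# Usage: persist_fungap_dir <install_dir> [rc_file]  (rc_file defaults to ~/.bashrc)
persist_fungap_dir() {
    dir=$1
    rcfile=${2:-$HOME/.bashrc}
    line="export FUNGAP_DIR=\"$dir\""
    touch "$rcfile"
    if ! grep -qxF -- "$line" "$rcfile"; then
        echo "$line" >> "$rcfile"
    fi
}

# Example: persist_fungap_dir "$HOME/FunGAP"
```

Calling the function twice with the same directory leaves a single line in the rc file, so it is safe to re-run.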

3. Download databases

Download the Pfam and BUSCO databases into your $FUNGAP_DIR/db directory.

3.1. Pfam DB download

ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release

cd $FUNGAP_DIR  # Change directory to FunGAP installation directory
mkdir -p db/pfam
cd db/pfam
wget ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.gz
wget ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.dat.gz
gunzip Pfam-A.hmm.gz
gunzip Pfam-A.hmm.dat.gz
hmmpress Pfam-A.hmm  # HMMER package (would be automatically installed in the above Anaconda step)
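
hmmpress writes four binary index files (.h3f, .h3i, .h3m, .h3p) next to Pfam-A.hmm. A quick check that the download, decompression, and indexing all completed might look like the sketch below; the helper name check_pfam_db is hypothetical.

```shell
# Check that the Pfam files and the hmmpress index files exist and are non-empty.
# Usage: check_pfam_db <pfam_db_dir>
check_pfam_db() {
    db_dir=$1
    ret=0
    for f in Pfam-A.hmm Pfam-A.hmm.dat \
             Pfam-A.hmm.h3f Pfam-A.hmm.h3i Pfam-A.hmm.h3m Pfam-A.hmm.h3p; do
        if [ ! -s "$db_dir/$f" ]; then
            echo "missing or empty: $db_dir/$f"
            ret=1
        fi
    done
    return "$ret"
}

# Example: check_pfam_db "$FUNGAP_DIR/db/pfam"
```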

3.2. BUSCO DB download

BUSCO provides various lineage datasets, so download only the one that best fits your target genome. Here are example commands.

cd $FUNGAP_DIR
mkdir -p db/busco
cd db/busco
wget https://busco.ezlab.org/datasets/fungi_odb9.tar.gz
wget https://busco.ezlab.org/datasets/ascomycota_odb9.tar.gz
wget https://busco.ezlab.org/datasets/basidiomycota_odb9.tar.gz
tar -zxvf fungi_odb9.tar.gz
tar -zxvf ascomycota_odb9.tar.gz
tar -zxvf basidiomycota_odb9.tar.gz
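
Since only one lineage is needed, the three downloads above can be reduced to a single parameterized call. This is a sketch under the assumption that other odb9 datasets follow the same URL pattern as the three shown; the function names are hypothetical.

```shell
# Build the download URL for one BUSCO odb9 lineage dataset.
busco_odb9_url() {
    echo "https://busco.ezlab.org/datasets/${1}_odb9.tar.gz"
}

# Download and extract a single lineage into the current directory.
fetch_busco_lineage() {
    lineage=$1
    url=$(busco_odb9_url "$lineage")
    wget "$url" && tar -zxvf "${lineage}_odb9.tar.gz"
}

# Example, run inside $FUNGAP_DIR/db/busco:
# fetch_busco_lineage ascomycota
```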

4. Install GeneMark

Go to this site and download GeneMark-ES/ET. http://topaz.gatech.edu/GeneMark/license_download.cgi Don't forget to download the key, too.

4.1. Uncompress downloaded files

mkdir -p $FUNGAP_DIR/external/
mv gm_et_linux_64.tar.gz gm_key_64.gz $FUNGAP_DIR/external/  # Move your downloaded files to this directory
cd $FUNGAP_DIR/external/
tar -zxvf gm_et_linux_64.tar.gz
gunzip gm_key_64.gz
cp gm_key_64 ~/.gm_key

4.2. Change perl path

The GeneMark scripts are hard-coded to use /usr/bin/perl rather than the conda-installed perl. You can change this by running the change_path_in_perl_scripts.pl script.

cd $FUNGAP_DIR/external/gm_et_linux_64/
ln -s other/reformat_fasta.pl .  # It is a bug in v4.46 (checked on Sep 4, 2019)
perl change_path_in_perl_scripts.pl "/usr/bin/env perl"
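
To confirm that the path change took effect, you can inspect the first line of the GeneMark scripts. The sketch below assumes that change_path_in_perl_scripts.pl rewrites the first line to "#!/usr/bin/env perl"; the helper name check_shebang is hypothetical.

```shell
# Print every script whose first line differs from the expected shebang.
# Returns 0 when all scripts match.
# Usage: check_shebang <expected_first_line> <script>...
check_shebang() {
    expected=$1
    shift
    ret=0
    for script in "$@"; do
        first=$(head -n 1 "$script")
        if [ "$first" != "$expected" ]; then
            echo "$script: $first"
            ret=1
        fi
    done
    return "$ret"
}

# Example, run inside $FUNGAP_DIR/external/gm_et_linux_64/:
# check_shebang '#!/usr/bin/env perl' gmes_petap.pl
```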

4.3. Check that GeneMark and its dependencies are correctly installed

cd $FUNGAP_DIR/external/gm_et_linux_64/
./gmes_petap.pl

5. RepeatModeler installation

Note: RepeatModeler is available through Anaconda (https://anaconda.org/bioconda/repeatmodeler), but the conda-installed program does not work at the moment: the installation appeared to succeed, but running it produced no results. I will update this section when a working RepeatModeler package is available.

5.1. Check perl version.

perl -v

It should be > 5.8.8
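
If you prefer a scripted check over reading the perl -v output, the comparison can be sketched as follows. It relies on GNU sort -V for version ordering, and the helper name version_gt is hypothetical.

```shell
# Succeed if version $1 is strictly newer than version $2 (needs GNU sort -V).
version_gt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Compare the installed perl (if there is one) against 5.8.8.
if command -v perl >/dev/null 2>&1; then
    perl_version=$(perl -e 'printf "%vd", $^V')
    if version_gt "$perl_version" 5.8.8; then
        echo "perl $perl_version is new enough"
    else
        echo "perl $perl_version is too old"
    fi
fi
```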

5.2. Install RECON 1.08

cd $FUNGAP_DIR/external/
wget http://www.repeatmasker.org/RepeatModeler/RECON-1.08.tar.gz
tar -zxvf RECON-1.08.tar.gz
cd RECON-1.08/src/
make
make install

5.3. Install RepeatScout 1.0.5

cd $FUNGAP_DIR/external/
wget http://www.repeatmasker.org/RepeatScout-1.0.5.tar.gz
tar -zxvf RepeatScout-1.0.5.tar.gz 
cd RepeatScout-1
make

5.4. Install NSEG

cd $FUNGAP_DIR/external/
mkdir nseg
cd nseg
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/genwin.c
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/genwin.h
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/lnfac.h
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/makefile
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/nmerge.c
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/nseg.c
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/runnseg
make

5.5. Install RepeatMasker 4.0.8

The conda-installed RepeatMasker did not work for the RepeatModeler installation, so I installed it manually.

cd $FUNGAP_DIR/external/
wget http://www.repeatmasker.org/RepeatMasker-open-4-0-8.tar.gz
tar -zxvf RepeatMasker-open-4-0-8.tar.gz
cd RepeatMasker
perl ./configure
  • Note: trf and rmblastn are located at ~/anaconda2/envs/fungap/bin.

5.6. Install RepeatModeler 1.0.11

cd $FUNGAP_DIR/external/
wget http://www.repeatmasker.org/RepeatModeler/RepeatModeler-open-1.0.11.tar.gz
tar -zxvf RepeatModeler-open-1.0.11.tar.gz
cd RepeatModeler-open-1.0.11/
perl ./configure
  • Note: trf and rmblastn are located at ~/anaconda2/envs/fungap/bin.

5.7. Check the installation

cd $FUNGAP_DIR/external/RepeatModeler-open-1.0.11/
./BuildDatabase --help
./RepeatModeler --help

6. Trinity installation

Download and compile Trinity

cd $FUNGAP_DIR/external
wget https://github.com/trinityrnaseq/trinityrnaseq/archive/Trinity-v2.8.5.tar.gz
tar -zxvf Trinity-v2.8.5.tar.gz 
cd trinityrnaseq-Trinity-v2.8.5/
sudo apt-get install cmake  # cmake is required to compile Trinity
make
make plugins

Add to $PATH variable

echo "export PATH=\$PATH:$FUNGAP_DIR/external/trinityrnaseq-Trinity-v2.8.5/" >> ~/.bashrc  # \$PATH is kept literal so that it is expanded at login
source ~/.bashrc

7. Configure FunGAP

The set_dependencies.py script sets all the dependencies and tests each of them (by running it with --help). If the script runs without any issue, you are ready to run FunGAP!

cd $FUNGAP_DIR
python set_dependencies.py \
  --pfam_db_dir db/pfam \
  --busco_db_dir db/busco/basidiomycota_odb9/ \
  --genemark_dir external/gm_et_linux_64/gmes_petap/ \
  --repeat_modeler_dir external/RepeatModeler-open-1.0.11
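
If set_dependencies.py reports a problem, a common cause is a directory argument that does not exist. A small pre-flight check, sketched here with a hypothetical helper name check_dirs, lists which of the paths are missing.

```shell
# Verify that each given directory exists.
# Returns 0 if all exist, 1 otherwise.
check_dirs() {
    ret=0
    for d in "$@"; do
        if [ -d "$d" ]; then
            echo "ok:      $d"
        else
            echo "missing: $d"
            ret=1
        fi
    done
    return "$ret"
}

# Example, run from $FUNGAP_DIR with the same paths as in the command above:
# check_dirs db/pfam db/busco/basidiomycota_odb9/ \
#     external/gm_et_linux_64/gmes_petap/ external/RepeatModeler-open-1.0.11
```

A path reported as missing usually means that the archive was extracted under a different directory name than the one passed to the script.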