hdevillers/go-fannot

go-FAnnoT: Functional Annotation Transfer tool in Golang

About

go-FAnnoT is a functional annotation transfer tool based on protein homology. Our motivations for developing this tool were manifold:

  • Defining a precise strategy to build reference datasets. Most of the time, transfer tools use the annotation of a single closely related species as the reference, which propagates its possible errors. Reference proteins must be adapted to the organism being annotated, but a more robust strategy is required to ensure the quality of functional annotations.
  • Evaluating homology from a global alignment rather than a local one. Most existing tools identify matches on the basis of a BLAST search. Unfortunately, measuring homology from a BLAST (local) alignment is not sufficient, and sequences should be realigned with a global alignment tool.
  • Allowing flexible threshold settings. In addition to the choice of reference datasets, homology thresholds should depend on the organism being annotated. For example, it may be necessary to lower the thresholds for species that have no closely related species in the reference databases.
  • Standardizing functional annotations in sequence files. This latter aspect is critical to facilitate annotation comparisons.
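To illustrate why a global alignment can tell a different story than a local one, here is a minimal Go sketch of Needleman-Wunsch global alignment that reports identity over the whole alignment length, gaps included. It is not go-FAnnoT's code and does not reproduce NEEDLE: the scoring (match +1, mismatch -1, linear gap -2) is a simplifying assumption, whereas NEEDLE uses a substitution matrix and affine gap penalties. The sequences are toy examples.

```go
package main

import "fmt"

// Toy scoring scheme (an assumption for illustration). NEEDLE instead uses a
// substitution matrix (EBLOSUM62 by default for proteins) and affine gaps.
const (
	matchScore    = 1
	mismatchScore = -1
	gapScore      = -2
)

// sub returns the substitution score for residues x and y.
func sub(x, y byte) int {
	if x == y {
		return matchScore
	}
	return mismatchScore
}

func max3(a, b, c int) int {
	m := a
	if b > m {
		m = b
	}
	if c > m {
		m = c
	}
	return m
}

// globalIdentity aligns a and b end to end (Needleman-Wunsch, linear gap
// penalty) and returns identical positions / alignment length, gaps included.
func globalIdentity(a, b string) float64 {
	n, m := len(a), len(b)
	// score[i][j] is the best score for aligning a[:i] with b[:j].
	score := make([][]int, n+1)
	for i := range score {
		score[i] = make([]int, m+1)
		score[i][0] = i * gapScore
	}
	for j := 0; j <= m; j++ {
		score[0][j] = j * gapScore
	}
	for i := 1; i <= n; i++ {
		for j := 1; j <= m; j++ {
			score[i][j] = max3(
				score[i-1][j-1]+sub(a[i-1], b[j-1]), // align a[i-1] with b[j-1]
				score[i-1][j]+gapScore,              // gap in b
				score[i][j-1]+gapScore,              // gap in a
			)
		}
	}
	// Trace back from the bottom-right cell, counting columns and identities.
	identical, length := 0, 0
	for i, j := n, m; i > 0 || j > 0; length++ {
		switch {
		case i > 0 && j > 0 && score[i][j] == score[i-1][j-1]+sub(a[i-1], b[j-1]):
			if a[i-1] == b[j-1] {
				identical++
			}
			i, j = i-1, j-1
		case i > 0 && score[i][j] == score[i-1][j]+gapScore:
			i--
		default:
			j--
		}
	}
	if length == 0 {
		return 0
	}
	return float64(identical) / float64(length)
}

func main() {
	query := "MKTAYIAKQR"           // toy fragment
	ref := "MKTAYIAKQRQISFVKSHFSRQ" // toy full-length reference
	fmt.Printf("global identity: %.2f\n", globalIdentity(query, ref))
}
```

Here the fragment matches the first ten residues of the reference exactly, so a local alignment restricted to that region would show 100% identity, while the global identity is 10/22 (printed as 0.45) because the unmatched tail counts against it.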

In broad terms, go-FAnnoT proceeds through the following steps:

  1. Extracting reference datasets from rich, high-quality databases. We chose UniProt (Swiss-Prot and TrEMBL).
  2. Building a hierarchy between the different reference datasets.
  3. Defining rules (different levels of homology) to transfer annotations.
  4. Processing each input protein iteratively against each dataset until a suitable annotation is found.
  5. (optional) Completing the annotation with InterProScan functional domain predictions.
  6. Producing standardized functional annotations.
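Steps 2 to 4 can be pictured as a loop over a priority-ordered list of datasets and a list of homology rules. The Go sketch below is a hypothetical model, not go-FAnnoT's implementation: the types Rule, Hit, Dataset and Annotation, the function transfer, the identity/coverage criteria and all threshold values are assumptions chosen for illustration, and hits are assumed to be precomputed and sorted from best to worst.

```go
package main

import "fmt"

// Rule is a hypothetical homology level: the minimum global identity and
// minimum query coverage (both in [0, 1]) needed to transfer an annotation.
type Rule struct {
	Name        string
	MinIdentity float64
	MinCoverage float64
}

// Hit is a hypothetical alignment result between the query and one reference.
type Hit struct {
	RefID    string
	Product  string
	Identity float64
	Coverage float64
}

// Dataset is one reference dataset with the hits found for the current query,
// assumed sorted from best to worst.
type Dataset struct {
	Name string
	Hits []Hit
}

// Annotation records what was transferred and which dataset/rule allowed it.
type Annotation struct {
	Product, RefID, Dataset, Rule string
}

// transfer walks the datasets in priority order and, inside each dataset, the
// rules from strictest to most permissive. The first hit passing a rule wins.
func transfer(datasets []Dataset, rules []Rule) (Annotation, bool) {
	for _, ds := range datasets {
		for _, r := range rules {
			for _, h := range ds.Hits {
				if h.Identity >= r.MinIdentity && h.Coverage >= r.MinCoverage {
					return Annotation{h.Product, h.RefID, ds.Name, r.Name}, true
				}
			}
		}
	}
	return Annotation{}, false // no suitable annotation found
}

func main() {
	rules := []Rule{
		{Name: "high", MinIdentity: 0.80, MinCoverage: 0.80},
		{Name: "medium", MinIdentity: 0.50, MinCoverage: 0.70},
	}
	datasets := []Dataset{
		{Name: "curated", Hits: []Hit{{"REF1", "Putative kinase", 0.42, 0.90}}},
		{Name: "taxon-subset", Hits: []Hit{{"REF2", "Alcohol dehydrogenase", 0.62, 0.85}}},
	}
	if a, ok := transfer(datasets, rules); ok {
		fmt.Printf("%s (from %s, rule %s, ref %s)\n", a.Product, a.Dataset, a.Rule, a.RefID)
	} else {
		fmt.Println("no annotation transferred")
	}
}
```

Nesting the rule loop inside the dataset loop means that a permissive match in a higher-priority dataset beats a strict match in a lower-priority one; swapping the two loops would instead favor rule strictness over dataset rank. Which order go-FAnnoT actually uses is not stated here.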

Requirements

Download Uniprot and TrEMBL databases

Our tool has been designed to use UniProt databases (Swiss-Prot or TrEMBL). The complete Swiss-Prot database can be downloaded from the UniProt website (choose the file uniprot_sprot.dat.gz).

For TrEMBL, it is recommended to download only a subset of the database, as the complete one is too large to handle conveniently. Subsets by taxonomic division are available from the UniProt website.

External tools

To run go-FAnnoT, the NCBI BLAST+ tool suite and NEEDLE (from the EMBOSS tool suite) must be available in the system PATH. There are several ways to achieve this:

  • Use a conda environment that provides both tool suites.
  • (Or) Install these tools manually. Binaries or source distributions are available from the official NCBI BLAST+ and EMBOSS websites.
  • (Or, for Linux only) Most recent distributions provide these tools directly in their package repositories:
# Example with Ubuntu (requires administrator rights)
sudo apt-get install ncbi-blast+ emboss
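Whichever installation route is used, a quick check that the executables are actually reachable can save a failed run. The Go sketch below uses exec.LookPath from the standard library for this. The list of names (makeblastdb and blastp from BLAST+, needle from EMBOSS) is an assumption about which programs go-FAnnoT calls, and missingTools is a hypothetical helper; adjust the list as needed.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// requiredTools is an assumed list of executables: makeblastdb and blastp
// from BLAST+, needle from EMBOSS. The exact set go-FAnnoT calls may differ.
var requiredTools = []string{"makeblastdb", "blastp", "needle"}

// missingTools returns the names from tools that cannot be found in PATH.
func missingTools(tools []string) []string {
	var missing []string
	for _, t := range tools {
		if _, err := exec.LookPath(t); err != nil {
			missing = append(missing, t)
		}
	}
	return missing
}

func main() {
	if m := missingTools(requiredTools); len(m) > 0 {
		fmt.Fprintf(os.Stderr, "missing from PATH: %v\n", m)
		os.Exit(1)
	}
	fmt.Println("all external tools found")
}
```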

Install go-FAnnoT

Build the project from source (github)

To build the project, you need to install Go (see the official installation instructions on go.dev).

Then clone this repository:

git clone https://github.com/hdevillers/go-fannot.git

Enter the go-fannot directory, then build and test the project with make:

cd go-fannot
make
make test

On Linux and macOS, go-FAnnoT can be installed by running make install with administrator rights. The default installation path is /usr/local/bin/. A different installation path can be given as follows:

make install -prefix my/install/path

Download binaries

Precompiled binaries for all platforms will be available soon.

Licence

MIT