Standalone version of GP4, the prediction tool presented in the publication:
To run GP4 locally, first download and install the following programs:
- SignalP 4.1 (
- SignalP 5.0 (
- TatP (
- LipoP (
- Phobius (
- InterProScan 5.33 (InterPro 72.0):
You also need a working installation of Python (>= 3.5) and Biopython (>= 1.74), plus the packages pandas, numpy and decouple (published on PyPI as python-decouple).
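Before going further, it can help to confirm that the Python side of the requirements is in place. The following is a minimal sketch, not part of GP4: it only checks the Python version, whether the four packages can be imported and whether Biopython is at least 1.74. It does not check the external programs listed above.

```python
import importlib
import importlib.util
import sys

# Import names of the required packages: "Bio" is Biopython and "decouple"
# comes from the python-decouple distribution on PyPI.
REQUIRED_MODULES = ["Bio", "pandas", "numpy", "decouple"]


def version_tuple(text):
    """Turn a version string such as '1.79' or '1.81.dev0' into (1, 79) / (1, 81)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break  # stop at the first non-numeric component (e.g. 'dev0')
        parts.append(int(digits))
    return tuple(parts)


def check_prerequisites(min_python=(3, 5), min_biopython=(1, 74)):
    """Return a dict mapping each requirement to True (satisfied) or False."""
    report = {"python": sys.version_info[:2] >= min_python}
    for module in REQUIRED_MODULES:
        # find_spec returns None when a top-level module cannot be located
        report[module] = importlib.util.find_spec(module) is not None
    if report["Bio"]:
        bio = importlib.import_module("Bio")
        report["biopython_version"] = version_tuple(bio.__version__) >= min_biopython
    return report


if __name__ == "__main__":
    for requirement, ok in sorted(check_prerequisites().items()):
        print("{:18s} {}".format(requirement, "OK" if ok else "FAIL"))
```

The script avoids f-strings so that it still runs on Python 3.5, the oldest version GP4 supports.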
First, clone this repository into a folder of your choice:
git clone
Then edit the file conf.env, adding the absolute path to each of the required programs.
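A common mistake is a relative or mistyped path in conf.env. The sketch below assumes that conf.env uses simple `KEY=VALUE` lines (the format python-decouple reads) and that every value is meant to be a path; the key names in your file are whatever the repository defines, and none are assumed here. It reports values that are not absolute or that do not exist on disk.

```python
import os


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            # Drop optional single or double quotes around the value
            values[key.strip()] = value.strip().strip("'\"")
    return values


def find_path_problems(env):
    """Return one message per value that is not an existing absolute path."""
    problems = []
    for key, value in sorted(env.items()):
        if not os.path.isabs(value):
            problems.append("{}: '{}' is not an absolute path".format(key, value))
        elif not os.path.exists(value):
            problems.append("{}: '{}' does not exist".format(key, value))
    return problems
```

Usage: `for problem in find_path_problems(read_env_file("conf.env")): print(problem)` prints nothing when every entry points to an existing absolute path.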
To display the help message with all available options, run:
./ -h
We suggest first testing the installation on a FASTA file containing a single protein:
./ -i Fasta_file
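If you only have a multi-protein FASTA file at hand, a single-protein test file can be cut from it. This is a small standard-library sketch (the function name `write_first_record` is hypothetical, not part of GP4) that copies only the first record.

```python
def write_first_record(source_fasta, target_fasta):
    """Copy the first record of source_fasta into target_fasta; return its header."""
    record_lines = []
    with open(source_fasta) as handle:
        for line in handle:
            is_header = line.startswith(">")
            if is_header and record_lines:
                break  # a second header means the first record is complete
            if is_header or record_lines:
                record_lines.append(line)  # lines before the first '>' are skipped
    if not record_lines:
        raise ValueError("no FASTA record found in {}".format(source_fasta))
    if not record_lines[-1].endswith("\n"):
        record_lines[-1] += "\n"
    with open(target_fasta, "w") as handle:
        handle.writelines(record_lines)
    return record_lines[0][1:].strip()
```

For example, `write_first_record("proteome.fasta", "single_protein.fasta")`, then pass single_protein.fasta to GP4 with `-i`. Since Biopython is already a GP4 requirement, `next(SeqIO.parse(src, "fasta"))` followed by `SeqIO.write(record, dst, "fasta")` does the same job.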
usage: GP4 [-h] -i INPUT [-o OUTPUT] [-n NAME] [-t THREADS]
[-r {data,pred,all}] [-v]
Script to perform a proteomic based consensus prediction of protein
localization in Gram+. Written by Stefano Grasso (c).
optional arguments:
-h, --help show this help message and exit
-i INPUT, --input INPUT
Input file. If you chose -r "pred" then indicate the
folder with the generated data.
-o OUTPUT, --output OUTPUT
Output folder. Default: Results
-n NAME, --name NAME Output file name. Default: same as input name. If
already existing a short string will be attached to
-t THREADS, --threads THREADS
Number of threads to be used. Default: all.
-r {data,pred,all}, --run {data,pred,all}
'data' to generate data; 'pred' requires generated
data as input; 'all' (default).
-v, --version show program's version number and exit
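When GP4 is driven from another Python script (for example, to process many proteomes), the options in the usage text above map directly onto an argument list. The following sketch assembles that list and validates `-t` and `-r` against the documented choices. The `script` argument is an assumption: pass the path to the GP4 script in your clone, which must be executable as in the `./` examples.

```python
import subprocess

RUN_MODES = {"data", "pred", "all"}  # the choices accepted by -r/--run


def build_gp4_command(script, input_path, output=None, name=None,
                      threads=None, run=None):
    """Assemble a GP4 argument list from the options in the usage text."""
    command = [script, "-i", input_path]
    if output is not None:
        command += ["-o", output]
    if name is not None:
        command += ["-n", name]
    if threads is not None:
        if threads < 1:
            raise ValueError("threads must be a positive integer")
        command += ["-t", str(threads)]
    if run is not None:
        if run not in RUN_MODES:
            raise ValueError("run must be one of {}".format(sorted(RUN_MODES)))
        command += ["-r", run]
    return command


def run_gp4(*args, **kwargs):
    """Run GP4; raises CalledProcessError if it exits with a non-zero status."""
    return subprocess.run(build_gp4_command(*args, **kwargs), check=True)
```

Options left as `None` are omitted, so GP4 falls back to its own defaults (output folder 'Results', all threads, mode 'all').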
The most basic command is:
./ -i Fasta_file
By default, results are saved in a 'Results' folder inside the folder that contains 'Fasta_file'. If that output already exists, a string is appended to its name to avoid overwriting it. By default, GP4 both generates the data and performs the predictions.
To specify an output folder and a new name, run:
./ -i Fasta_file -o New-results-folder -n New-name
Results will be saved in New-results-folder/New-name.
You can also run only the data-generation module (for instance, if you have a large dataset):
./ -i Fasta_file -o New-results-folder -n New-name -r data
Then, when you are ready to generate the final predictions, run:
./ -i New-results-folder/New-name -r pred
Note: in this case the input is no longer Fasta_file but the folder containing the generated data. The default mode is 'all', in which data generation and prediction run in a single pass.
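The split workflow can be scripted as two commands, where the input of the 'pred' stage is the folder New-results-folder/New-name written by the 'data' stage. This is a sketch under two assumptions: `script` is the path to the GP4 script in your clone, and the target folder does not exist yet. GP4 appends a string to an existing output name, in which case the 'pred' input would no longer match, so the sketch refuses to start in that case.

```python
import os
import subprocess


def two_stage_commands(script, fasta, output_folder, name, threads=None):
    """Return the 'data' and 'pred' GP4 commands for a split run."""
    data_cmd = [script, "-i", fasta, "-o", output_folder, "-n", name, "-r", "data"]
    if threads is not None:
        data_cmd += ["-t", str(threads)]  # threads only affect the 'data' module
    # The 'pred' stage reads the folder written by the 'data' stage
    pred_cmd = [script, "-i", os.path.join(output_folder, name), "-r", "pred"]
    return data_cmd, pred_cmd


def run_two_stage(script, fasta, output_folder, name, threads=None):
    """Run both stages back to back; each could also be run in a separate session."""
    target = os.path.join(output_folder, name)
    if os.path.exists(target):
        # GP4 would write to a renamed folder, breaking the 'pred' input path
        raise FileExistsError("{} already exists".format(target))
    data_cmd, pred_cmd = two_stage_commands(script, fasta, output_folder, name, threads)
    subprocess.run(data_cmd, check=True)
    subprocess.run(pred_cmd, check=True)
```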
You can also set the number of threads used to generate the data in parallel (this affects only the 'data' module):
./ -i Fasta_file -o New-results-folder -n New-name -t 8
Note: by default, all available threads are used.
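On a shared machine you may prefer not to occupy every thread. A minimal sketch for picking a value for `-t` that keeps some cores free; the one-core reserve is an arbitrary illustration, not a GP4 recommendation.

```python
import os


def suggested_threads(reserve=1):
    """Number of threads to pass to -t, keeping `reserve` cores free."""
    total = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, total - reserve)
```

The result never drops below 1, so it is always a valid thread count even on a single-core machine.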
If you use GP4 in your work, please cite:
Stefano Grasso, Tjeerd van Rij, Jan Maarten van Dijl, "GP4: an integrated Gram-Positive Protein Prediction Pipeline for subcellular localization mimicking bacterial sorting", Briefings in Bioinformatics, Volume 22, Issue 4, July 2021, bbaa302,
If you run into any issue specific to GP4, please report it and we will try to help. In the meantime, you can run your predictions at: