A program to predict the Protein Stability Index (PSI) from a protein sequence. The ML models were trained with the CatBoost regressor on experimental stability datasets for 24/23-mers covering the N-/C-termini of the human proteome. The final models were evaluated on the test set using the R2 coefficient of determination, reaching 0.796/0.812 for the N-terminus with the initiator methionine cleaved/not cleaved, respectively, and 0.815 for the C-terminus (the highest possible R2 value is 1). See the paper for details and the DEGRONOPEDIA Tutorial for more information.
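For readers unfamiliar with the metric, the R2 coefficient compares the squared error of the predictions with the spread of the measured values around their mean. The following is a minimal pure-Python sketch of that definition. The function name `r2_score` and the sample numbers are made up for illustration and are not taken from the DEGRONOPEDIA datasets or code.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the measured values around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all measured values are identical")
    return 1.0 - ss_res / ss_tot


# Illustrative values only (not real PSI measurements).
measured = [2.0, 3.5, 5.0, 4.0]
predicted = [2.5, 3.0, 4.5, 4.0]
print(round(r2_score(measured, predicted), 3))
```

A perfect prediction gives exactly 1, and a model that always predicts the mean of the measured values gives 0.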
The web version of this tool (and much more!) is available at https://degronopedia.com/, a web server for screening for degron motifs and providing insights into the possible degradation of your favorite proteins by the ubiquitin-proteasome system.
Tested with Python versions 3.8, 3.9, 3.10, and 3.11.
# clone the repo
git clone --depth 1 https://github.com/filipsPL/degronopedia-ml-psi
# create a new conda environment
conda env create -f conda.yml
Input is a sequence in plain text format, e.g.:
MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRT
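Before running a prediction, it can help to check that the input file really contains a bare sequence. The sketch below reads such a file and rejects anything outside the 20 standard amino acids. The helper name `read_plain_sequence` and the choice of accepted characters are assumptions made for this example; how the program itself treats FASTA headers or non-standard residues is not documented here.

```python
from pathlib import Path

# The 20 standard amino acids in one-letter code (an assumption about
# which characters are acceptable; not taken from the program's source).
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def read_plain_sequence(path):
    """Read a plain-text sequence file, dropping whitespace and line breaks."""
    text = Path(path).read_text()
    sequence = "".join(text.split()).upper()
    if not sequence:
        raise ValueError("the sequence file is empty")
    unexpected = sorted(set(sequence) - STANDARD_AA)
    if unexpected:
        raise ValueError(f"unexpected characters in sequence: {unexpected}")
    return sequence
```

A FASTA header line (starting with `>`) would be rejected by this check, which matches the plain-text input shown above.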
Options:
- --sequence: the file with sequence
- --type: which terminus to predict PSI for. Choices:
  - C for C-terminus
  - NiMetNo for N-terminus with initiator Met cleaved
  - NiMetYes for N-terminus with initiator Met NOT cleaved
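As an illustration of how the two options and the three terminus choices fit together, here is a minimal `argparse` sketch. It is not the source of `calculate-desc.py`: the `TERMINUS_TYPES` dictionary, the `build_parser` function, and the decision to mark both options as required are assumptions made for this example.

```python
import argparse

# Choice names and descriptions as given in the option list above.
TERMINUS_TYPES = {
    "C": "C-terminus",
    "NiMetNo": "N-terminus with initiator Met cleaved",
    "NiMetYes": "N-terminus with initiator Met NOT cleaved",
}


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Predict PSI from a sequence (sketch)")
    parser.add_argument("--sequence", required=True, help="the file with sequence")
    parser.add_argument("--type", required=True, choices=sorted(TERMINUS_TYPES),
                        help="which terminus to predict PSI for")
    return parser


# Parse an explicit argument list so the example runs without a command line.
args = build_parser().parse_args(["--sequence", "sequence.txt", "--type", "NiMetYes"])
print(TERMINUS_TYPES[args.type])
```

With `choices` set, any value other than C, NiMetNo, or NiMetYes is rejected with a usage message before any prediction is attempted.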
Running the program:
# activate the environment
conda activate dp
# run the program
./calculate-desc.py --sequence sequence.txt --type NiMetYes
Output will be:
N-terminus with initiator Met NOT cleaved
Predicted PSI: 5.24
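When the program is called from another script, the numeric value can be pulled out of the text output. The sketch below assumes the output contains a line of the form `Predicted PSI: <number>`, as in the example above. The function name `parse_psi` is hypothetical, and the exact output format may differ between versions of the program.

```python
import re


def parse_psi(output_text):
    """Return the PSI value from text containing a 'Predicted PSI: <number>' line."""
    match = re.search(r"^Predicted PSI:\s*(-?\d+(?:\.\d+)?)\s*$", output_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Predicted PSI:' line found")
    return float(match.group(1))


# The example output shown above, as a single string.
example_output = "N-terminus with initiator Met NOT cleaved\nPredicted PSI: 5.24\n"
print(parse_psi(example_output))
```

The text could come, for instance, from the `stdout` attribute of `subprocess.run([...], capture_output=True, text=True)` after running the command shown earlier inside the activated environment.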
Predictions are based on datasets of experimental PSI values, which describe the stability of protein N-/C-termini in an artificial system: 23-mers covering the termini of nearly the entire human proteome were conjugated to GFP, and their stability was measured relative to that of DsRed translated from the same transcript, using the Global Protein Stability (GPS) high-throughput technique (Koren et al., 2018 and Timms et al., 2019). These values therefore provide only limited insight into the stability of the N-/C-terminus of the query. Several peptides with low PSI values were experimentally validated by the authors of the GPS studies to be degraded by cullin-RING E3 ligase complexes. However, medium or high PSI values do not rule out regulation of such termini by N-/C-degron pathways, as other factors may play a role, including tissue specificity, posttranslational modifications, and stress conditions.
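Because the underlying measurements concern short terminal peptides, only the ends of a query sequence are relevant to the prediction. The sketch below shows how a terminal window can be cut from a sequence. The function name `terminal_window` and the default length of 23 are assumptions for illustration; the exact window the program uses for each terminus type (including how the initiator Met is handled) is not specified here.

```python
def terminal_window(sequence, terminus, length=23):
    """Return the first (N) or last (C) `length` residues of a sequence."""
    if terminus not in ("N", "C"):
        raise ValueError("terminus must be 'N' or 'C'")
    if length <= 0:
        raise ValueError("length must be positive")
    if len(sequence) < length:
        raise ValueError("sequence is shorter than the requested window")
    return sequence[:length] if terminus == "N" else sequence[-length:]


# First 40 residues of the example input sequence, used here as a toy query.
seq = "MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAM"
n_term = terminal_window(seq, "N")
c_term = terminal_window(seq, "C")
print(n_term, len(n_term))
print(c_term, len(c_term))
```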
As the ML training set consists of human peptides, we recommend running the PSI prediction for sequences from higher mammals only.
To run tests on sample sequences, execute tests.sh.
Szulc, N. A., Stefaniak, F., Piechota, M., Cappannini, A., Bujnicki, J. M., & Pokrzywa, W. (2022). DEGRONOPEDIA - a web server for proteome-wide inspection of degrons. bioRxiv. https://doi.org/10.1101/2022.05.19.492622