Data, scripts and results for the EpitopeVec article by Bahai et al., Bioinformatics, 2021.
The epitope-prediction software is available at
Python 3
with the following packages:
- numpy 1.17.1
- scipy 1.4.1
- matplotlib 3.1.3
- sklearn 0.22.1
- pydpi 1.0
- biopython 1.71.0
- tqdm 4.15.0
- gensim 3.8.3
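To see which of the listed packages are already present, a short check like the one below may help. It is only a sketch and not part of the repository: the function name `check_packages` is hypothetical, `Bio` is the import name of biopython, and the comparison is an exact string match, so a package that reports `1.71` instead of `1.71.0` will show up as differing.

```python
import importlib

# Import names mapped to the versions listed above.
# "Bio" is the import name of the biopython package.
REQUIRED = {
    "numpy": "1.17.1",
    "scipy": "1.4.1",
    "matplotlib": "3.1.3",
    "sklearn": "0.22.1",
    "pydpi": "1.0",
    "Bio": "1.71.0",
    "tqdm": "4.15.0",
    "gensim": "3.8.3",
}


def check_packages(required=REQUIRED):
    """Return {name: (status, found_version_or_None)} for each package."""
    report = {}
    for name, wanted in required.items():
        try:
            module = importlib.import_module(name)
        except Exception:
            # A package written for Python 2 can fail with errors other
            # than ImportError, so any failure counts as unavailable.
            report[name] = ("unavailable", None)
            continue
        # Not every package exposes __version__; fall back to "unknown".
        found = getattr(module, "__version__", "unknown")
        status = "ok" if found == wanted else "differs"
        report[name] = (status, found)
    return report


if __name__ == "__main__":
    for name, (status, found) in check_packages().items():
        print(f"{name:12s} wanted {REQUIRED[name]:8s} found {found} [{status}]")
```

A status of `differs` does not necessarily mean the software will fail; it only flags a version other than the one listed here.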
If these are not installed, you can install them with:
pip3 install -r ./requirement/requirements.txt
Additionally, pydpi 1.0 from
might be incompatible with Python 3. Please install the pydpi package from the provided pydpi.tar.gz file:
pip3 install pydpi.tar.gz
The binary file for the ProtVec representation of proteins can be downloaded by running the following commands, which change into the protvec directory first:
cd protvec
wget -O sp_sequences_4mers_vec.bin
Clone this repository:
git clone
To train a new machine learning model, run the training file with the name of the dataset you want to train on. The datasets are inside the retraining folder (bcpreds, ibce-el, lbtope and viral). For example:
python3 bcpreds
for training on the BCPreds dataset. Use lbtope for training on the LBTope dataset and ibce-el for training on the iBCE-EL training dataset.
- For training a new model, two files are needed: one listing confirmed positive epitopes and one listing confirmed negative epitopes. These can be .txt files with one peptide per line. For example, in the ./retraining/bcpreds/ folder, pos.txt contains the peptides that are epitopes and neg.txt contains the non-epitopes.
- For training domain-specific models, all the epitope peptides should also come from that domain. For example, for a viral-specific model, only include epitopes derived from viral proteins.
- Training a new model creates a pickle file in the folder of the input dataset under ./retraining/ (for example ./retraining/bcpreds/). This modelname.pickle file is the newly trained model, which can be used with the EpitopeVec software for testing.
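Before training on your own data, it can be useful to check that pos.txt and neg.txt follow the one-peptide-per-line layout described above. The sketch below is not taken from the training script; the function names are hypothetical, and both the upper-casing and the restriction to the 20 standard amino acids are assumptions made for illustration.

```python
from pathlib import Path

# The 20 standard amino acids. Treating anything else as invalid is an
# assumption of this sketch, not a documented rule of the training script.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def read_peptides(path):
    """Read one peptide per line, skipping blank lines and upper-casing."""
    lines = Path(path).read_text().splitlines()
    return [line.strip().upper() for line in lines if line.strip()]


def load_training_set(dataset_dir):
    """Return (peptides, labels): 1 for epitopes, 0 for non-epitopes."""
    dataset_dir = Path(dataset_dir)
    positives = read_peptides(dataset_dir / "pos.txt")
    negatives = read_peptides(dataset_dir / "neg.txt")
    peptides = positives + negatives
    labels = [1] * len(positives) + [0] * len(negatives)
    return peptides, labels


def invalid_peptides(peptides):
    """Return peptides with characters outside the 20 standard residues."""
    return [p for p in peptides if not set(p) <= AMINO_ACIDS]
```

For instance, `invalid_peptides(load_training_set("./retraining/bcpreds")[0])` would list any entries in that folder worth inspecting by hand before training.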
To test the performance of the trained models, please use the file.
python3 model_pickle_file peptidefile
model_pickle_file is the .pickle file you trained previously. peptidefile is a tab-separated file with two columns: the first column holds the peptide sequences and the second holds the target (1 for epitope and 0 for non-epitope). See the testing directory for a list of example files. Please provide the full name and location of both the model_pickle_file and the peptidefile. For example, to test the model trained on the iBCE-EL dataset (svm-ibce-el.pickle) on the ABCPred16 dataset, run
python3 ./retraining/ibce-el/svm-ibce-el.pickle ./testing/abcpred16.txt
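The two-column peptide file described above can be checked, and predictions scored against it, with a few lines of standard-library Python. This is a sketch only: the function names are hypothetical, the file is assumed to have no header row, and the predictions are assumed to come from elsewhere (the sketch does not load the pickled model or compute features).

```python
import csv


def read_labelled_peptides(path):
    """Parse a two-column, tab-separated file into (peptides, targets)."""
    peptides, targets = [], []
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        for row_number, row in enumerate(reader, 1):
            if not any(cell.strip() for cell in row):
                continue  # skip blank lines
            if len(row) != 2:
                raise ValueError(
                    f"line {row_number}: expected 2 columns, got {len(row)}"
                )
            peptide, target = row[0].strip(), row[1].strip()
            if target not in ("0", "1"):
                raise ValueError(
                    f"line {row_number}: target must be 0 or 1, got {target!r}"
                )
            peptides.append(peptide)
            targets.append(int(target))
    return peptides, targets


def confusion_counts(targets, predictions):
    """Count true/false positives and negatives for binary labels."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for truth, guess in zip(targets, predictions):
        if truth == 1:
            counts["tp" if guess == 1 else "fn"] += 1
        else:
            counts["fp" if guess == 1 else "tn"] += 1
    return counts
```

Running `read_labelled_peptides("./testing/abcpred16.txt")` first is a quick way to catch lines separated by spaces instead of tabs before passing the file to the test script.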
The benchmarking results of the various methods are inside the results folder.