This replication kit shows how to fine-tune and evaluate the pre-trained seBERT model for the task of issue type classification. Be aware that the fine-tuning may not run on GPUs below an NVIDIA RTX 5000.
If you want to live-test the final model, you can do so here.
python3.8 -m venv .
source bin/activate
pip install -r requirements.txt
cd data
wget https://tickettagger.blob.core.windows.net/datasets/github-labels-top3-803k-test.tar.gz
wget https://tickettagger.blob.core.windows.net/datasets/github-labels-top3-803k-train.tar.gz
tar -xzf github-labels-top3-803k-test.tar.gz
tar -xzf github-labels-top3-803k-train.tar.gz
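The extracted archives contain the TicketTagger issue data. As a rough sketch of how such data can be read and the issue types mapped to label ids, assuming the FastText-style `__label__<type> <text>` line format and the bug/enhancement/question label set (verify both against the extracted files):

```python
# Sketch: parse FastText-style lines "__label__<type> <text>" into
# (label_id, text) pairs. The label set and line format are assumptions
# based on the dataset name; check the extracted files before relying on this.

LABELS = {"bug": 0, "enhancement": 1, "question": 2}  # assumed top-3 labels

def parse_line(line):
    """Split one dataset line into (label_id, issue_text)."""
    tag, _, text = line.strip().partition(" ")
    label = tag[len("__label__"):]
    return LABELS[label], text

sample = "__label__bug App crashes when opening the settings page"
print(parse_line(sample))  # (0, 'App crashes when opening the settings page')
```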
cd models
wget https://smartshark2.informatik.uni-goettingen.de/sebert/seBERT_pre_trained.tar.gz
tar -xzf seBERT_pre_trained.tar.gz
We provide the fine-tuned version of the model that we used here.
cd models
wget https://smartshark2.informatik.uni-goettingen.de/sebert/nlbse.tar.gz
tar -xzf nlbse.tar.gz
mv model nlbse
source bin/activate
cd notebooks
jupyter lab
The fine-tuning task uses the complete training data provided above.
We provide the Jupyter Notebook notebooks/FineTuneModel.ipynb to demonstrate this.
However, fine-tuning is a very resource intensive task, which we ran on the HPC system of the GWDG on RTX 5000 GPUs. It may not run on GPUs with less VRAM without modification.
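The authoritative training setup is in notebooks/FineTuneModel.ipynb. As a rough sketch of what fine-tuning a BERT checkpoint for three-way issue classification looks like with the Hugging Face transformers API (all directory names and hyperparameters below are illustrative placeholders, not the values used in the notebook):

```python
def fine_tune(model_dir="models/seBERT", out_dir="models/nlbse",
              train_texts=None, train_labels=None):
    """Sketch: fine-tune seBERT as a 3-class sequence classifier.

    Assumes the pre-trained checkpoint extracted to models/seBERT is loadable
    with the Hugging Face transformers API; paths, epochs, batch size, and
    max_length are placeholders, not the notebook's real settings.
    """
    import torch
    from transformers import (BertTokenizer, BertForSequenceClassification,
                              Trainer, TrainingArguments)

    tokenizer = BertTokenizer.from_pretrained(model_dir)
    model = BertForSequenceClassification.from_pretrained(model_dir, num_labels=3)

    enc = tokenizer(train_texts, truncation=True, padding=True,
                    max_length=128, return_tensors="pt")

    class IssueDataset(torch.utils.data.Dataset):
        def __init__(self, enc, labels):
            self.enc, self.labels = enc, labels
        def __len__(self):
            return len(self.labels)
        def __getitem__(self, i):
            item = {k: v[i] for k, v in self.enc.items()}
            item["labels"] = torch.tensor(self.labels[i])
            return item

    args = TrainingArguments(output_dir=out_dir, num_train_epochs=1,
                             per_device_train_batch_size=8)
    trainer = Trainer(model=model, args=args,
                      train_dataset=IssueDataset(enc, train_labels))
    trainer.train()
    trainer.save_model(out_dir)
```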
The evaluation task uses the fine-tuned model and simply classifies the test data that is provided.
We provide the Jupyter Notebook notebooks/EvaluateModel.ipynb to demonstrate this.
As above, classifying the complete test data may take a long time.
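Once every test issue has a predicted label, the evaluation reduces to comparing predictions against the gold labels. A minimal, self-contained sketch of macro-averaged F1 over the assumed bug/enhancement/question label set (the notebook may well use sklearn.metrics instead; this is only a stand-in to illustrate the metric):

```python
def macro_f1(gold, pred):
    """Macro-averaged F1 over the labels occurring in gold.

    Stand-in sketch for sklearn.metrics.f1_score(average="macro");
    the replication notebook may compute its metrics differently.
    """
    f1s = []
    for lab in sorted(set(gold)):
        tp = sum(1 for g, p in zip(gold, pred) if g == lab and p == lab)
        fp = sum(1 for g, p in zip(gold, pred) if g != lab and p == lab)
        fn = sum(1 for g, p in zip(gold, pred) if g == lab and p != lab)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

gold = ["bug", "bug", "enhancement", "question"]
pred = ["bug", "enhancement", "enhancement", "question"]
print(round(macro_f1(gold, pred), 3))  # 0.778
```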
The Jupyter Notebook notebooks/LiveTest.ipynb loads the fine-tuned model and can be used to experiment with different inputs.
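Under the hood, playing with inputs amounts to tokenizing a string and taking the argmax over the three class logits. A rough sketch with the Hugging Face transformers API (the model directory and the label order are assumptions; the notebook defines the authoritative mapping):

```python
def classify_issue(text, model_dir="models/nlbse"):
    """Sketch: predict an issue type for one input string.

    Assumes the fine-tuned checkpoint in models/nlbse loads with the Hugging
    Face transformers API; the bug/enhancement/question label order is a guess,
    see notebooks/LiveTest.ipynb for the real mapping.
    """
    import torch
    from transformers import BertTokenizer, BertForSequenceClassification

    tokenizer = BertTokenizer.from_pretrained(model_dir)
    model = BertForSequenceClassification.from_pretrained(model_dir)
    model.eval()
    enc = tokenizer(text, truncation=True, max_length=128, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    return ["bug", "enhancement", "question"][int(logits.argmax(dim=-1))]
```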