This replication kit shows how to fine-tune and evaluate the pre-trained seBERT model for the task of issue type classification. Be aware that the fine-tuning may not run on GPUs below an NVIDIA RTX 5000.
If you want to live-test the final model, you can do so here.
python3.8 -m venv .
source bin/activate
pip install -r requirements.txt
cd data
wget https://tickettagger.blob.core.windows.net/datasets/github-labels-top3-803k-test.tar.gz
wget https://tickettagger.blob.core.windows.net/datasets/github-labels-top3-803k-train.tar.gz
tar -xzf github-labels-top3-803k-test.tar.gz
tar -xzf github-labels-top3-803k-train.tar.gz
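The extracted archives contain the TicketTagger issue data. As a rough sketch of how such data can be read and the issue types mapped to label ids, assuming the FastText-style `__label__<type> <text>` line format and the bug/enhancement/question label set (verify both against the extracted files):

```python
# Sketch: parse FastText-style lines "__label__<type> <text>" into
# (label_id, text) pairs. The label set and line format are assumptions
# based on the dataset name; check the extracted files before relying on this.

LABELS = {"bug": 0, "enhancement": 1, "question": 2}  # assumed top-3 labels

def parse_line(line):
    """Split one dataset line into (label_id, issue_text)."""
    tag, _, text = line.strip().partition(" ")
    label = tag[len("__label__"):]
    return LABELS[label], text

sample = "__label__bug App crashes when opening the settings page"
print(parse_line(sample))  # (0, 'App crashes when opening the settings page')
```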
cd models
wget https://smartshark2.informatik.uni-goettingen.de/sebert/seBERT_pre_trained.tar.gz
tar -xzf seBERT_pre_trained.tar.gz
We provide the fine-tuned version of the model that we used here.
cd models
wget https://smartshark2.informatik.uni-goettingen.de/sebert/nlbse.tar.gz
tar -xzf nlbse.tar.gz
mv model nlbse
source bin/activate
cd notebooks
jupyter lab
The fine-tuning task uses the complete training data provided above.
We provide the Jupyter Notebook notebooks/FineTuneModel.ipynb to demonstrate this.
However, fine-tuning is a very resource intensive task, which we ran on the HPC system of the GWDG on RTX 5000 GPUs. It may not run on GPUs with less VRAM without modification.
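The authoritative training setup is in notebooks/FineTuneModel.ipynb. As a rough sketch of what fine-tuning a BERT checkpoint for three-way issue classification looks like with the Hugging Face transformers API (all directory names and hyperparameters below are illustrative placeholders, not the values used in the notebook):

```python
def fine_tune(model_dir="models/seBERT", out_dir="models/nlbse",
              train_texts=None, train_labels=None):
    """Sketch: fine-tune seBERT as a 3-class sequence classifier.

    Assumes the pre-trained checkpoint extracted to models/seBERT is loadable
    with the Hugging Face transformers API; paths, epochs, batch size, and
    max_length are placeholders, not the notebook's real settings.
    """
    import torch
    from transformers import (BertTokenizer, BertForSequenceClassification,
                              Trainer, TrainingArguments)

    tokenizer = BertTokenizer.from_pretrained(model_dir)
    model = BertForSequenceClassification.from_pretrained(model_dir, num_labels=3)

    enc = tokenizer(train_texts, truncation=True, padding=True,
                    max_length=128, return_tensors="pt")

    class IssueDataset(torch.utils.data.Dataset):
        def __init__(self, enc, labels):
            self.enc, self.labels = enc, labels
        def __len__(self):
            return len(self.labels)
        def __getitem__(self, i):
            item = {k: v[i] for k, v in self.enc.items()}
            item["labels"] = torch.tensor(self.labels[i])
            return item

    args = TrainingArguments(output_dir=out_dir, num_train_epochs=1,
                             per_device_train_batch_size=8)
    trainer = Trainer(model=model, args=args,
                      train_dataset=IssueDataset(enc, train_labels))
    trainer.train()
    trainer.save_model(out_dir)
```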
The evaluation task uses the fine-tuned model and simply classifies the test data that is provided.
We provide the Jupyter Notebook notebooks/EvaluateModel.ipynb to demonstrate this.
As above, classifying the complete test data may take a long time.
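Once every test issue has a predicted label, the evaluation reduces to comparing predictions against the gold labels. A minimal, self-contained sketch of macro-averaged F1 over the assumed bug/enhancement/question label set (the notebook may well use sklearn.metrics instead; this is only a stand-in to illustrate the metric):

```python
def macro_f1(gold, pred):
    """Macro-averaged F1 over the labels occurring in gold.

    Stand-in sketch for sklearn.metrics.f1_score(average="macro");
    the replication notebook may compute its metrics differently.
    """
    f1s = []
    for lab in sorted(set(gold)):
        tp = sum(1 for g, p in zip(gold, pred) if g == lab and p == lab)
        fp = sum(1 for g, p in zip(gold, pred) if g != lab and p == lab)
        fn = sum(1 for g, p in zip(gold, pred) if g == lab and p != lab)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

gold = ["bug", "bug", "enhancement", "question"]
pred = ["bug", "enhancement", "enhancement", "question"]
print(round(macro_f1(gold, pred), 3))  # 0.778
```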
The Jupyter Notebook notebooks/LiveTest.ipynb loads the fine-tuned model and can be used to experiment with different inputs.
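Under the hood, playing with inputs amounts to tokenizing a string and taking the argmax over the three class logits. A rough sketch with the Hugging Face transformers API (the model directory and the label order are assumptions; the notebook defines the authoritative mapping):

```python
def classify_issue(text, model_dir="models/nlbse"):
    """Sketch: predict an issue type for one input string.

    Assumes the fine-tuned checkpoint in models/nlbse loads with the Hugging
    Face transformers API; the bug/enhancement/question label order is a guess,
    see notebooks/LiveTest.ipynb for the real mapping.
    """
    import torch
    from transformers import BertTokenizer, BertForSequenceClassification

    tokenizer = BertTokenizer.from_pretrained(model_dir)
    model = BertForSequenceClassification.from_pretrained(model_dir)
    model.eval()
    enc = tokenizer(text, truncation=True, max_length=128, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    return ["bug", "enhancement", "question"][int(logits.argmax(dim=-1))]
```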