The team participated in the AuTexTification challenge. In subtask 1, participants preprocess the provided sentences and build a model that detects whether each sentence was generated by a text generation model or written by a human. In subtask 2, participants must build a model that identifies which of six text generation models produced a given sentence.
In this project, the team preprocessed the sentences by translating foreign languages, emoji, and emoticons into English, applying lemmatization, and tokenizing with the tokenizer of the corresponding pre-trained model. Two scenarios were tried: in the first, the dataset went through the full preprocessing (translation of foreign languages, emoji, and emoticons into English, plus lemmatization); in the second, those preprocessing stages were skipped and only the pre-trained model's tokenizer was applied.
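As a minimal sketch of the first scenario, the snippet below demojizes, lemmatizes, and tokenizes a sentence. It assumes the emoji and spaCy packages and a Hugging Face tokenizer; the model names are illustrative, and the translation of foreign-language text is only stubbed out, since the exact translation tool is not specified here.

# Sketch only: assumes emoji, spacy (with en_core_web_sm downloaded), and transformers are installed.
import emoji
import spacy
from transformers import AutoTokenizer

nlp = spacy.load("en_core_web_sm")                               # English pipeline for lemmatization
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")   # illustrative pre-trained tokenizer

def preprocess(text: str) -> str:
    # Turn emoji such as a smiley into words like "grinning_face".
    text = emoji.demojize(text, delimiters=(" ", " "))
    # (Translation of foreign-language spans into English would go here; omitted in this sketch.)
    # Lemmatize with spaCy.
    return " ".join(tok.lemma_ for tok in nlp(text))

def tokenize(text: str):
    # Scenario 2 applies only this step, directly on the raw sentence.
    return tokenizer(preprocess(text), truncation=True, max_length=128)

print(tokenize("I love this 😀"))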
Because the goal of the project is to detect whether a sentence was generated by a text generation model or written by a human, and because people tend to switch to their native language for words they do not know in English and to use emoji or emoticons to express themselves, the team decided to evaluate these two scenarios for both subtasks. Additionally, the team fine-tuned BERT and XLM-RoBERTa on this specific downstream classification task so that the models can perform it reliably.
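The sketch below illustrates one way such fine-tuning could be set up with the Hugging Face Trainer, using XLM-RoBERTa for sequence classification. The file name, column names, and hyperparameters are assumptions for illustration, not the project's exact configuration.

# Sketch only: assumes transformers, datasets, and pandas are installed,
# and a hypothetical train.tsv with "text" and integer "label" columns.
import pandas as pd
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "xlm-roberta-base"   # BERT was fine-tuned the same way in the project
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)  # 6 for subtask 2

train_df = pd.read_csv("train.tsv", sep="\t")
train_ds = Dataset.from_pandas(train_df).map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=128),
    batched=True,
)

args = TrainingArguments(output_dir="outputs", num_train_epochs=3,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=train_ds).train()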
This project was a collaboration with my teammate, whose GitHub account is https://github.com/poeysiec
The evaluation script provided by the organizers is used to evaluate the submissions.
Install the requirements
pip install -r requirements.txt
The task_submissions folder contains two subfolders: ground_truth and submissions.
The ground_truth folder contains one folder per subtask and, within each subtask, one folder per language. You should put the ground truth files (truth.tsv) in the inner folders.
The submissions folder contains as many folders as there are participants in the competition. Each participant folder has the same structure as the ground_truth folder (one folder per subtask and, within each subtask, one folder per language). You should put all your runs ({run_name}.tsv) in the inner folders.
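For example, with a single team called my_team and one run named truth (hypothetical names matching the sample output further below), the layout could look like:

task_submissions/
├── ground_truth/
│   └── subtask_2/
│       └── es/
│           └── truth.tsv
└── submissions/
    └── my_team/
        └── subtask_2/
            └── es/
                └── truth.tsv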
usage: evaluate_submissions.py [-h] [--submissions_path SUBMISSIONS_PATH] [--ground_truth_path GROUND_TRUTH_PATH] {subtask_1,subtask_2} {es,en}
positional arguments:
{subtask_1,subtask_2}
Subtask to evaluate
{es,en} Language to evaluate
optional arguments:
-h, --help show this help message and exit
--submissions_path SUBMISSIONS_PATH
Path to the submissions folder
--ground_truth_path GROUND_TRUTH_PATH
Path to the ground_truth folder
For instance, you can use the evaluation script to evaluate the submissions on the Spanish variant of subtask 2 as follows:
python evaluate_submissions.py \
--submissions_path task_submissions/submissions/ \
--ground_truth_path task_submissions/ground_truth/ \
subtask_2 \
es
which will return a dataframe with five columns, sorted by macro-f1 values: team (team name), run (run name), all_metrics (metrics from sklearn's classification_report), mf1 (macro-f1), and mf1_cinterval (confidence interval of the macro-f1). If you run the evaluation script with the ground truths and predictions in this repo, you will get something similar to:
team run all_metrics mf1 mf1_cinterval
0 my_team truth {'A': {'precision': 1.0, 'recall': 1.0, 'f1-sc... 1.0 (1.0, 1.0)