The team participated in the AuTexTification challenge. In subtask 1, participants preprocess the provided sentences and build a model that detects whether each sentence was generated by a text generation model or written by a human. In subtask 2, participants must build a model that identifies which of six text generation models produced a given sentence.
In this project, the team preprocessed the sentences by translating foreign languages, emoji, and emoticons into English, applying lemmatization, and tokenizing with the tokenizer of the corresponding pre-trained model. Two scenarios were tried: in the first, the dataset went through the full preprocessing (translation of foreign languages, emoji, and emoticons into English, plus lemmatization); in the second, those preprocessing stages were skipped and only the pre-trained model's tokenizer was applied.
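As a minimal sketch of the first scenario, the snippet below demojizes, lemmatizes, and tokenizes a sentence. It assumes the emoji and spaCy packages and a Hugging Face tokenizer; the model names are illustrative, and the translation of foreign-language text is only stubbed out, since the exact translation tool is not specified here.

# Sketch only: assumes emoji, spacy (with en_core_web_sm downloaded), and transformers are installed.
import emoji
import spacy
from transformers import AutoTokenizer

nlp = spacy.load("en_core_web_sm")                               # English pipeline for lemmatization
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")   # illustrative pre-trained tokenizer

def preprocess(text: str) -> str:
    # Turn emoji such as a smiley into words like "grinning_face".
    text = emoji.demojize(text, delimiters=(" ", " "))
    # (Translation of foreign-language spans into English would go here; omitted in this sketch.)
    # Lemmatize with spaCy.
    return " ".join(tok.lemma_ for tok in nlp(text))

def tokenize(text: str):
    # Scenario 2 applies only this step, directly on the raw sentence.
    return tokenizer(preprocess(text), truncation=True, max_length=128)

print(tokenize("I love this 😀"))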
Because the goal of the project is to detect whether a sentence was generated by a text generation model or written by a human, and because people tend to switch to their native language for words they do not know in English and to use emoji or emoticons to express themselves, the team decided to evaluate these two scenarios for both subtasks. Additionally, the team fine-tuned BERT and XLM-RoBERTa on this specific downstream classification task so that the models can perform it reliably.
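The sketch below illustrates one way such fine-tuning could be set up with the Hugging Face Trainer, using XLM-RoBERTa for sequence classification. The file name, column names, and hyperparameters are assumptions for illustration, not the project's exact configuration.

# Sketch only: assumes transformers, datasets, and pandas are installed,
# and a hypothetical train.tsv with "text" and integer "label" columns.
import pandas as pd
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "xlm-roberta-base"   # BERT was fine-tuned the same way in the project
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)  # 6 for subtask 2

train_df = pd.read_csv("train.tsv", sep="\t")
train_ds = Dataset.from_pandas(train_df).map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=128),
    batched=True,
)

args = TrainingArguments(output_dir="outputs", num_train_epochs=3,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=train_ds).train()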
This project was a collaboration with my teammate, whose GitHub account is https://github.com/poeysiec
The evaluation script provided by the organizers is used to evaluate the submissions.
Install the requirements
pip install -r requirements.txt
The task_submissions folder contains two subfolders: ground_truth and submissions.
The ground_truth folder contains one folder per subtask and, within each subtask, one folder per language. You should put the ground truth files (truth.tsv) in the inner folders.
The submissions folder contains as many folders as there are participants in the competition. Each participant folder has the same structure as the ground_truth folder (one folder per subtask and, within each subtask, one folder per language). You should put all your runs ({run_name}.tsv) in the inner folders.
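For example, with a single team called my_team and one run named truth (hypothetical names matching the sample output further below), the layout could look like:

task_submissions/
├── ground_truth/
│   └── subtask_2/
│       └── es/
│           └── truth.tsv
└── submissions/
    └── my_team/
        └── subtask_2/
            └── es/
                └── truth.tsv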
usage: evaluate_submissions.py [-h] [--submissions_path SUBMISSIONS_PATH] [--ground_truth_path GROUND_TRUTH_PATH] {subtask_1,subtask_2} {es,en}
positional arguments:
{subtask_1,subtask_2}
Subtask to evaluate
{es,en} Language to evaluate
optional arguments:
-h, --help show this help message and exit
--submissions_path SUBMISSIONS_PATH
Path to the submissions folder
--ground_truth_path GROUND_TRUTH_PATH
Path to the ground_truth folder
For instance, you can use the evaluation script to evaluate the submissions on the Spanish variant of subtask 2 as follows:
python evaluate_submissions.py \
--submissions_path task_submissions/submissions/ \
--ground_truth_path task_submissions/ground_truth/ \
subtask_2 \
es
which will return a dataframe with five columns, sorted by macro-f1 values: team (team name), run (run name), all_metrics (metrics from sklearn's classification_report), mf1 (macro-f1), and mf1_cinterval (confidence interval of the macro-f1). If you run the evaluation script with the ground truths and predictions in this repo, you will get something similar to:
team run all_metrics mf1 mf1_cinterval
0 my_team truth {'A': {'precision': 1.0, 'recall': 1.0, 'f1-sc... 1.0 (1.0, 1.0)