Link to Website: https://aclanthology.org/
-
code/webscrape.py : Contains a function that scrapes and downloads PDFs from the ACL Anthology website, given a conference name, a year, and the number of PDFs to download
- Example Usage:
- from webscrape import scrape_pdfs
- scrape_pdfs('acl','2022', 5)
-
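The implementation of scrape_pdfs is not shown here, so the following is only a minimal sketch of how PDF links could be collected from an Anthology event page. The helper names event_url and extract_pdf_links are hypothetical, and the events/<conference>-<year>/ URL pattern is an assumption about the site layout rather than something taken from code/webscrape.py.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://aclanthology.org/"


def event_url(conference, year):
    """Build the event page URL (assumed pattern: events/<conference>-<year>/)."""
    return urljoin(BASE_URL, f"events/{conference.lower()}-{year}/")


class _PdfLinkParser(HTMLParser):
    """Collect the href of every <a> tag that points at a .pdf file."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.endswith(".pdf"):
            self.links.append(href)


def extract_pdf_links(html, page_url, limit):
    """Return up to `limit` unique absolute PDF URLs found in `html`."""
    parser = _PdfLinkParser()
    parser.feed(html)
    unique = []
    for href in parser.links:
        absolute = urljoin(page_url, href)  # resolve relative links
        if absolute not in unique:
            unique.append(absolute)
    return unique[:limit]
```

In this sketch the page HTML would be fetched separately (for example with urllib.request), and each returned URL would then be downloaded into the pdfs folder.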
code/runner.py : Contains functions to train the model and evaluate it on a test set
- Example Usage:
- python code/runner.py --train --model_name 'roberta-large' --epochs 5 --batch_size 4 --lr 2e-5 --output_dir 'models/roberta-large' --train_data 'data/train'
- python code/runner.py --test --model_name 'models/roberta-large' --batch_size 4 --output_dir 'models/roberta-large' --test_data 'data/test'
-
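For reference, the flags used in the commands above could be declared with argparse roughly as follows. This is a sketch only: the flag names come from the example commands, but the types, the defaults, and the choice to make --train and --test mutually exclusive are assumptions, and build_parser is a hypothetical name that may not match code/runner.py.

```python
import argparse


def build_parser():
    """Declare the command-line flags seen in the example commands."""
    parser = argparse.ArgumentParser(description="Train or test a model (illustrative sketch).")

    # Assumption: exactly one of --train / --test is given per run.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--train", action="store_true", help="run training")
    mode.add_argument("--test", action="store_true", help="run evaluation")

    parser.add_argument("--model_name", required=True,
                        help="model name or path to a saved model")
    parser.add_argument("--epochs", type=int, default=5)       # assumed default
    parser.add_argument("--batch_size", type=int, default=4)   # assumed default
    parser.add_argument("--lr", type=float, default=2e-5)      # assumed default
    parser.add_argument("--output_dir", required=True)
    parser.add_argument("--train_data", help="path to the training data")
    parser.add_argument("--test_data", help="path to the test data")
    return parser
```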
code/prediction.py : Contains code to generate predictions on the Kaggle test sets
- Example Usage:
- python code/prediction.py --model_name 'models/roberta-large' --test_csv 'data/test.csv' --output_csv 'data/outputs.csv'
-
data folder : Contains the scientific paper data for all 3 conferences
- Each conference folder has a pdfs folder, a tokens folder, and an annotations folder.
-
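The layout described above can be checked with a short script. This is a sketch under stated assumptions: missing_subfolders is a hypothetical helper that is not part of the repository, and the conference folder names have to be passed in because they are not listed here.

```python
from pathlib import Path

# Subfolders each conference folder is expected to contain.
EXPECTED_SUBFOLDERS = ("pdfs", "tokens", "annotations")


def missing_subfolders(data_root, conference_folders):
    """Map each conference folder to the expected subfolders it lacks.

    Folders with the complete layout are left out of the result.
    """
    missing = {}
    for name in conference_folders:
        conf_dir = Path(data_root) / name
        absent = [sub for sub in EXPECTED_SUBFOLDERS
                  if not (conf_dir / sub).is_dir()]
        if absent:
            missing[name] = absent
    return missing
```

An empty result means every listed conference folder has all three subfolders.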
test_webscrape.ipynb : Examples of how to use the scrape_pdfs function from code/webscrape.py