Are dev/test sets used for training? #187

Open
yulonglin opened this issue Mar 30, 2023 · 0 comments

yulonglin commented Mar 30, 2023

A few datasets are used for training: NUCLE, Lang-8, FCE, WI and LOCNESS. Do you only use the training sets, or also the development and test sets?

Screenshot 2023-03-30 at 20 17 03

Notably, you evaluate on the BEA-2019 dev set, which includes WI and LOCNESS, so I assume you train only on the training sets of the datasets above?

My confusion comes from your dataset sizes and how they differ from those in the follow-up work: https://arxiv.org/pdf/2203.13064.pdf

It seems that you used the full FCE dataset for GECToR, but only the FCE training set for the ensembling paper.

Screenshot 2023-03-30 at 20 24 34
