Feature: training dataset maintenance #49
Comments
@josh-chamberlain To make sure I fully understand the workflow:
Additionally, when we are talking about
I updated the README for this repo and tweaked this issue slightly. I think using Hugging Face as a database for unlabeled URLs is not needed. We can track batches by ID in GitHub, but we don't need to put them in Hugging Face before they're labeled. Hopefully this is much simpler.
Context
Now that we've done it a few times, let's be systematic about how we update the base training dataset in Hugging Face.
Requirements
Use training-urls as the canonical dataset in Hugging Face (HF), and maintain it through Pull Requests. We will likely remove the other datasets.
Docs
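As a rough illustration of the Pull Request workflow described in the requirements, here is a minimal sketch of how a labeled batch might be proposed to the dataset. The column names (`url`, `label`, `batch_id`), the `batches/<batch_id>.csv` layout, and the helper names are all assumptions made for this example, not something this issue specifies. The `repo_id` is left as a parameter because the full dataset path is not stated here. The only library call used is `HfApi.upload_file` from `huggingface_hub`, with `create_pr=True` so the change arrives as a PR rather than a direct commit.

```python
import csv
import io


def build_batch_csv(rows, batch_id):
    """Serialize labeled (url, label) pairs to CSV bytes, tagging each row with its batch ID.

    The column layout is hypothetical; adjust it to match the real dataset schema.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["url", "label", "batch_id"])
    for url, label in rows:
        # Only labeled URLs go to Hugging Face; unlabeled batches are tracked elsewhere.
        if not label:
            raise ValueError(f"unlabeled URL cannot enter the dataset: {url}")
        writer.writerow([url, label, batch_id])
    return buffer.getvalue().encode("utf-8")


def propose_batch(rows, batch_id, repo_id, token=None):
    """Open a Pull Request on the HF dataset repo that adds one labeled batch file."""
    # Imported lazily so build_batch_csv can be used without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    return api.upload_file(
        path_or_fileobj=build_batch_csv(rows, batch_id),
        path_in_repo=f"batches/{batch_id}.csv",  # hypothetical file layout
        repo_id=repo_id,  # the dataset's full "<owner>/<name>" path on the Hub
        repo_type="dataset",
        commit_message=f"Add labeled batch {batch_id}",
        create_pr=True,  # propose the change as a PR instead of committing directly
    )
```

Keeping serialization separate from the upload means the labeled-only check can be run locally before anything reaches Hugging Face, and the batch ID used in GitHub carries through into both the file name and each row.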