Move datasets to MLDatasets.jl #22

Closed
4 tasks
darsnack opened this issue Apr 2, 2021 · 1 comment
Labels
good first issue (Good for newcomers), gsoc-proposal (Good issues to tackle for GSoC proposals), help wanted (Contributions welcome!)


darsnack commented Apr 2, 2021

In the long term, we'd like most of the src/datasets code to move to MLDatasets.jl. To make this happen, we need a refactor of MLDatasets.jl to be more extensible and build on top of LearnBase.jl. Below is the structure envisioned for MLDatasets.jl:

  1. Low-level API: structs for different types of I/O (e.g. FileDataset) that support reading from the underlying I/O via getobs and nobs from LearnBase.jl
  2. High-level API: specific datasets (e.g. CIFAR10) implemented using the low-level API
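
As a rough sketch of how the two layers could fit together: the field layout of `FileDataset`, the `ToyTextDataset` type, and the local `getobs`/`nobs` stand-ins below are assumptions for illustration only, not the actual MLDatasets.jl or LearnBase.jl code.

```julia
# Stand-ins for the LearnBase.jl interface so the sketch runs without packages.
# In a real package these would be `import LearnBase: getobs, nobs`.
function getobs end
function nobs end

# Low-level API (hypothetical layout): a container over files on disk.
struct FileDataset{F}
    loadfn::F              # turns a path into an observation
    paths::Vector{String}  # one path per observation
end

nobs(data::FileDataset) = length(data.paths)
getobs(data::FileDataset, i::Integer) = data.loadfn(data.paths[i])

# High-level API (hypothetical): a named dataset that wraps the container.
struct ToyTextDataset{D}
    files::D
end

# Build the dataset from every `.txt` file in `dir`, in sorted order.
function ToyTextDataset(dir::AbstractString)
    paths = sort(filter(p -> endswith(p, ".txt"), readdir(dir; join = true)))
    return ToyTextDataset(FileDataset(path -> read(path, String), paths))
end

# The high-level dataset only delegates to the low-level container.
nobs(data::ToyTextDataset) = nobs(data.files)
getobs(data::ToyTextDataset, i::Integer) = getobs(data.files, i)

# Usage: write three small files, then read them back through the interface.
dir = mktempdir()
for i in 1:3
    write(joinpath(dir, "obs$i.txt"), "observation $i")
end
data = ToyTextDataset(dir)
```

In this arrangement the high-level type holds no I/O logic of its own, so a new dataset would only need to say where its files live and how to load one of them.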

To achieve this goal, we need to complete the following stages:

  • Move data containers (e.g. FileDataset) to MLDatasets.jl
  • Move data container transformations (e.g. mapobs, groupobs, etc.) to MLDataPattern.jl (these transformations apply generically to any iterator of observations, not just data containers)
  • Refactor existing data sets in MLDatasets.jl to utilize the low-level APIs
  • Move FastAI.jl datasets to MLDatasets.jl
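
To illustrate why a transformation like `mapobs` can live apart from any particular container, here is a simplified sketch. The `MappedData` wrapper and this `mapobs` definition are hypothetical stand-ins written for this example, not the FastAI.jl or MLDataPattern.jl implementation, and `getobs`/`nobs` are again local stand-ins for the LearnBase.jl functions.

```julia
# Local stand-ins for the LearnBase.jl interface functions.
function getobs end
function nobs end

# Treat plain vectors as data containers.
nobs(data::AbstractVector) = length(data)
getobs(data::AbstractVector, i::Integer) = data[i]

# Hypothetical lazy wrapper: applies `f` to an observation when it is requested.
struct MappedData{F,D}
    f::F
    data::D
end

nobs(m::MappedData) = nobs(m.data)
getobs(m::MappedData, i::Integer) = m.f(getobs(m.data, i))

# Simplified stand-in for `mapobs`; it relies only on `getobs` and `nobs`.
mapobs(f, data) = MappedData(f, data)

# Usage: the wrapper works on a vector and on another wrapper alike.
doubled = mapobs(x -> 2x, [1, 2, 3])
shifted = mapobs(x -> x + 1, doubled)
```

Because the wrapper touches its input only through `getobs` and `nobs`, it needs no knowledge of whether the observations come from memory, files, or another transformation.
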
darsnack added the good first issue, gsoc-proposal, and help wanted labels on Apr 2, 2021
lorenzoh commented

Closed by #229
