Organizing and cataloging datasets and data access methods used in OHW projects.
Here are my thoughts on this project: it would be cool to have a list of all the datasets OHW participants have used over the years, along with code examples showing how to access them. Data access and wrangling are valuable skills to have (and to struggle with), but sometimes finding the right dataset or navigating server requests can be a barrier to success during OHW's condensed timeline. I figure that if we organize the projects from past OHW events and make them easy to navigate on the website, future participants will be able to start working with the data they're looking for more quickly.
The general plan is to start by completing and documenting the full workflow for one project. The workflow might look something like this:
- Pick a project repo.
- Identify what datasets they used (and potentially for what).
- Find their data access code.
- If found, copy it into a new file/notebook in this repository and test it out.
- If it's broken, try to fix it! Document all external resources used when fixing it.
- If/when it works, add it to the OHW website.
- Check with the larger OHW team about where these examples should go.
- Get input from others on the usefulness before doing any more.
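The "find their data access code" step could be partly automated. Below is a minimal sketch, assuming the project repo has already been cloned locally and that its code lives in `.py` files or Jupyter notebooks. It scans those files for lines containing URLs or common Python data access calls. The pattern list (`xr.open_dataset`, `ERDDAP(`, `intake.open_catalog`, and so on) and the function names `source_lines` and `find_data_access` are hypothetical choices for illustration, not an existing OHW tool.

```python
import json
import re
from pathlib import Path

# Hypothetical patterns that often signal data access in Python projects.
# This list is an assumption; extend it as new access methods turn up.
ACCESS_PATTERNS = [
    r"xr\.open_dataset", r"xr\.open_mfdataset", r"xr\.open_zarr",
    r"ERDDAP\(", r"intake\.open_catalog", r"fsspec\.open",
    r"requests\.get", r"pd\.read_csv",
]
ACCESS_RE = re.compile("|".join(ACCESS_PATTERNS))
URL_RE = re.compile(r"https?://[^\s'\"()<>]+")


def source_lines(path):
    """Yield source lines from a .py file, or from the code cells of a .ipynb."""
    if path.suffix == ".ipynb":
        notebook = json.loads(path.read_text(encoding="utf-8"))
        for cell in notebook.get("cells", []):
            if cell.get("cell_type") != "code":
                continue  # skip markdown and raw cells
            src = cell.get("source", "")
            # Notebook JSON stores source as a string or a list of strings.
            text = "".join(src) if isinstance(src, list) else src
            yield from text.splitlines()
    else:
        text = path.read_text(encoding="utf-8", errors="replace")
        yield from text.splitlines()


def find_data_access(repo_dir):
    """Map each matching file (relative path) to its candidate data access lines."""
    repo = Path(repo_dir)
    hits = {}
    for path in sorted(repo.rglob("*")):
        if not path.is_file() or path.suffix not in {".py", ".ipynb"}:
            continue
        if ".ipynb_checkpoints" in path.parts:
            continue  # autosaved notebook copies would duplicate results
        matches = [
            line.strip()
            for line in source_lines(path)
            if ACCESS_RE.search(line) or URL_RE.search(line)
        ]
        if matches:
            hits[str(path.relative_to(repo))] = matches
    return hits
```

Calling `find_data_access("path/to/cloned/repo")` returns a dictionary of candidate lines to review by hand. A regex scan like this will miss indirect access (for example, URLs assembled from variables), so it is only a starting point for the manual steps listed above.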
We already have organized lists of projects from each year of OHW on that year's website:
- OHW23 project list
- OHW22 project list
- OHW21 project list
- OHW20 project list
- OHW19 project list
- OHW18 project list
- (OHW24 to be added!)
Collaborators:
- Derya Gumustel
- Adam Kemberling
- Kasanda Lassagne
- Boris Shapkin
- Valentina Staneva
- (other collaborators, please feel free to add yourselves here!)
OHW_project_list.md is where I'm throwing every project from over the years. Each project's title links to the corresponding GitHub repository, and as many linked datasets as can be found will be listed with each project.
An AI chatbot to assist OceanHackWeek participants with questions about projects, datasets, and methods.