pre-submission enquiry - tidync #167

mdsumner · 2017-12-07T09:58:32Z

tidync provides a tidyverse-inspired approach to scientific data in NetCDF format. NetCDF is a very general array-based specification with explicit metadata on array dimensions (marginal coordinates) and variables (the data values), plus further metadata stored as attributes on those dimensions, variables and their containers.

https://github.com/hypertidy/tidync

I consider this to fall in the "data munging" and "data extraction" categories: it is designed to make exploring and understanding a NetCDF source very easy, lazy, and tidy. It's not overly designed to make life easy for new users, but that's because these sources cover a very general range of forms and simplification is not possible (IMO) except within specific sub-domains. There are formal sub-domains, such as the CF conventions, but there's a huge wealth of data out there in this format that doesn't adhere to any particular standard.

The target audience for me is users who learn so much about their particular sub-domain that they become programmers helping others in that arena. They either wrap around tidync to build an interface to a NetCDF source-family, or simply use it to learn to craft lower-level calls more directly to the API (with packages RNetCDF, ncdf4, rgdal, rhdf5, etc). tidync provides a "database view" of a NetCDF source, imagining the variables on a shared grid within a source as columns in a table, with all actual read action or expansion of dimension coordinates delayed as late as possible.
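To make the "database view" idea concrete, here is a minimal sketch in Python. It is not tidync's implementation and does not use any NetCDF library: the variable names, dimension names, coordinate values, and the `read_value` callback are all hypothetical stand-ins. It only illustrates two points from the paragraph above: variables that share the same dimensions form one virtual table, and neither values nor coordinates are expanded until rows are actually requested.

```python
from collections import defaultdict
from itertools import product

# Hypothetical source description: variable name -> ordered dimension names.
VARIABLES = {
    "sst": ("lon", "lat", "time"),
    "ice": ("lon", "lat", "time"),
    "depth": ("lon", "lat"),
}

# Hypothetical coordinate values for each dimension.
COORDS = {
    "lon": [140.0, 141.0],
    "lat": [-45.0, -44.0, -43.0],
    "time": [0, 1],
}


def group_by_grid(variables):
    """Group variable names by the tuple of dimensions they share (their grid)."""
    grids = defaultdict(list)
    for name, dims in variables.items():
        grids[dims].append(name)
    return dict(grids)


class LazyGridTable:
    """A virtual table for one grid; nothing is read until rows() is iterated."""

    def __init__(self, dims, var_names, coords):
        self.dims = dims
        self.var_names = var_names
        self.coords = coords

    def n_rows(self):
        # Row count is known from dimension lengths alone, without reading data.
        n = 1
        for dim in self.dims:
            n *= len(self.coords[dim])
        return n

    def rows(self, read_value):
        """Yield one dict per grid cell.

        read_value(var_name, index) is a hypothetical callback standing in
        for the real read from the source; it is only called on iteration.
        """
        ranges = [range(len(self.coords[dim])) for dim in self.dims]
        for index in product(*ranges):
            row = {var: read_value(var, index) for var in self.var_names}
            for dim, i in zip(self.dims, index):
                row[dim] = self.coords[dim][i]
            yield row


if __name__ == "__main__":
    for dims, names in group_by_grid(VARIABLES).items():
        table = LazyGridTable(dims, names, COORDS)
        print(dims, names, table.n_rows())
```

In this sketch, each grid becomes a table whose columns are its variables followed by its dimension coordinates, which is one way to picture the virtual tables described above.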

I don't believe there's any overlap currently in terms of this "database view", and in fact that's the biggest hole as far as I'm concerned for the tidyverse and ggplot2 in particular. The dplyr tbl_cube is the nearest, and the in-development stars does have some overlap, but I think the virtual table abstraction in tidync is novel, albeit very heavily inspired by the "laziness" of ggplot2 and the multiple-tables approach in tidygraph.

I consider it very close to CRAN-readiness, but I'm a bit stuck on the relationship between activate and hyper_tibble(..., select_var = ...): whether to home in on shared grids or to pick out specific variables. NetCDF is so general that a source might have many grids (virtual tables) or many variables on one grid, so it's not obvious which to make the default.

It might be too general in scope, so I appreciate consideration as to whether this would be an appropriate package for review with rOpenSci.

Thanks!

karthik · 2017-12-12T22:54:48Z

👋 @mdsumner!
After discussion with the other editors, I am happy to let you know that we think this is a good fit and invite a full submission.

mdsumner · 2017-12-13T06:57:34Z

Great, thanks, I appreciate the confirmation!

sckott added the 0/presubmission label Dec 7, 2017

karthik closed this as completed Dec 13, 2017

This was referenced Dec 19, 2017

rOpenSci submission notes ropensci/tidync#57

Closed

Submission: tidync #174

Closed
