`tidync` (https://github.com/hypertidy/tidync) provides a tidyverse-inspired approach to scientific data in NetCDF format. NetCDF is a very general, array-based specification with explicit metadata on array dimensions (the marginal coordinates) and on variables (the data values), plus further metadata stored as attributes on those dimensions, on the variables, and on their containers.
I consider this to fall into the "data munging" and "data extraction" categories. It is designed to make exploring and understanding a NetCDF source very easy, lazy, and tidy. It's not especially designed to make life easy for new users, but that's because these sources cover a very general range of forms, and simplification is not possible (IMO) except within specific sub-domains. There are formal sub-domains, such as the CF conventions, but there's a huge wealth of data out there in this format that doesn't adhere to any particular standard.
My target audience is users who learn so much about their particular sub-domain that they become programmers helping others in that arena. They either wrap `tidync` to build an interface to a family of NetCDF sources, or simply use it to learn how to craft lower-level calls directly to the API (with packages such as RNetCDF, ncdf4, rgdal, rhdf5, etc.). `tidync` provides a "database view" of a NetCDF source: the variables that share a grid within a source are treated as columns in a table, and all actual reading of data and expansion of dimension coordinates is delayed as late as possible.
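To make the "database view" idea concrete, here is a minimal sketch of the concept only, written in Python for brevity. It is not `tidync`'s implementation or API (`tidync` is an R package). Everything in it is a hypothetical, in-memory stand-in: the `dims` and `variables` dictionaries describe a toy source, `grids()` groups variables defined on exactly the same dimensions into one virtual table, and `expand_rows()` is a generator, so coordinate expansion and the stand-in "reads" happen only when rows are actually consumed.

```python
from itertools import product

# Hypothetical toy description of a NetCDF-like source (not tidync's API):
# each dimension has a coordinate vector, each variable lists its dimensions.
dims = {
    "lon": [140.0, 141.0, 142.0],
    "lat": [-40.0, -41.0],
    "time": [0],
}
variables = {
    "sst": ("lon", "lat"),
    "chl": ("lon", "lat"),
    "time_bnds": ("time",),
}


def grids(variables):
    """Group variables sharing exactly the same dimensions into one
    'virtual table' (grid), keyed by the tuple of dimension names."""
    out = {}
    for name, grid_dims in variables.items():
        out.setdefault(grid_dims, []).append(name)
    return out


def expand_rows(grid_dims, dims, readers):
    """Lazily yield one row per grid cell. Variables become columns and
    dimension coordinates are expanded only as rows are consumed.

    `readers[name]` is a callable taking an index tuple and returning a
    value; it stands in for a deferred read from the file. Here the last
    dimension varies fastest (NetCDF tools may order cells differently)."""
    index_ranges = [range(len(dims[d])) for d in grid_dims]
    for idx in product(*index_ranges):
        row = {d: dims[d][i] for d, i in zip(grid_dims, idx)}
        for name, read in readers.items():
            row[name] = read(idx)
        yield row


if __name__ == "__main__":
    # Two virtual tables: (lon, lat) with sst and chl; (time,) with time_bnds.
    print(grids(variables))
    # Nothing is "read" until the generator is iterated.
    rows = expand_rows(("lon", "lat"), dims,
                       {"sst": lambda idx: idx[0] * 10 + idx[1]})
    print(next(rows))
```

Keeping the grid grouping separate from row expansion mirrors the idea described above: a source can be inspected cheaply as a set of virtual tables before any array is read or any coordinates are expanded into long form.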
I don't believe any existing package currently overlaps with this "database view", and in fact that's the biggest hole, as far as I'm concerned, for the tidyverse and ggplot2 in particular. dplyr's `tbl_cube` is the nearest, and the in-development stars package does have some overlap, but I think the virtual table abstraction in `tidync` is novel, albeit very heavily inspired by the "laziness" of ggplot2 and the multiple-tables approach in tidygraph.
I consider it very close to CRAN-readiness, but I'm a bit stuck on the relationship between `activate` and `hyper_tibble(..., select_var = ...)`, i.e. whether to home in on shared grids or to pick out specific variables. NetCDF is so general that a source might have many grids (virtual tables) or many variables on one grid, so it's not obvious which behaviour to make the default.
It might be too general in scope, so I'd appreciate your consideration of whether this would be an appropriate package for review by rOpenSci.
Thanks!