Suggested addition: add list of suggested data type terms and their definitions #6

Open
regnans opened this issue Jan 22, 2021 · 9 comments
Labels
ESS-DIVE review: Issue needs review by ESS-DIVE team

Comments

@regnans

regnans commented Jan 22, 2021

Submitter: Kim Ely

I suggest the following changes: on the page https://github.com/ess-dive-community/essdive-file-level-metadata/blob/master/CSV_dd/CSV_dd_quick_guide.md#data-type

It would be really useful to have some defined terms here. Terms such as "text" and "string", or "numeric" and "integer" (and other number types), are often confused and incorrectly used interchangeably, so a defined list would help. Other types to consider include logical, percent, and fraction.

@regnans
Author

regnans commented Jan 25, 2021

Some specific questions/examples:
What is the unit for a Date field? Is it the "format" of the date, e.g. YYYY-MM-DD, or is it the smallest unit, e.g. day? Or something else?
Should a numeric ratio have a unit or N/A?

@wavingtowaves

@tvelliquette Wanted to bring your attention to this.
@regnans and Terri, do you think it would be helpful to have these definitions within the metadata "element" called data-type, or elsewhere?

@regnans
Author

regnans commented Jan 28, 2021

I think the simplest solution would be to have this information within the "Standard definition" row of the Data type table. Or it could be added as another table immediately below. (I realize that this request is not a quick fix, but as a data contributor I would find it really useful, as I struggle with using data type terms consistently.)

@kristinboye

Looking through the currently uploaded reporting formats (in various stages), there is no consistency in, e.g., the date and time reporting format (even though each reporting format requires a specific way of reporting date and time...) or in the ways of constructing "terminology files/data dictionary files". So, building on Kim's and others' previous comments, I think we need to harmonize the terminology, formatting (when appropriate), and requirements across the ESS-DIVE reporting formats to make things simpler for the average data producer/archiver.
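
To illustrate the kind of consistency check this implies, here is a minimal sketch in Python. It assumes a single required date format of YYYY-MM-DD purely for illustration; the thread does not settle on one, and the function name `nonconforming_dates` is hypothetical rather than part of any ESS-DIVE tool.

```python
from datetime import datetime


def nonconforming_dates(values, fmt="%Y-%m-%d"):
    """Return the values that do not match the expected date format.

    The default format (YYYY-MM-DD) is an assumption made for this example.
    """
    bad = []
    for value in values:
        text = value.strip()
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            bad.append(value)
            continue
        # strptime tolerates missing zero padding (e.g. "2021-1-5"),
        # so round-trip the value to enforce the exact layout.
        if parsed.strftime(fmt) != text:
            bad.append(value)
    return bad


sample = ["2021-01-22", "01/22/2021", "2021-1-5"]
flagged = nonconforming_dates(sample)
```

Running a check like this across files from different reporting formats would list the values that depart from whichever format is eventually agreed on.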

@regnans
Author

regnans commented Mar 8, 2021

An update on "data type" terms. At BNL we have settled (for now) on assigning the following data types in our data dictionaries: integer, floating point, string, date, time, date-time.
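
As a rough sketch of how these six terms could be assigned to values in a CSV field, the Python below classifies raw strings. Only the six term names come from the list above; the function names, the date and time layouts (YYYY-MM-DD, HH:MM:SS, and the two joined by "T"), and the rule for mixed columns are all assumptions made for illustration, not agreed definitions.

```python
from datetime import datetime

# Assumed layouts for the temporal terms; the thread does not define them.
TEMPORAL_FORMATS = (
    ("date", "%Y-%m-%d"),
    ("time", "%H:%M:%S"),
    ("date-time", "%Y-%m-%dT%H:%M:%S"),
)


def classify_value(value):
    """Map one raw CSV value to one of the six data type terms."""
    text = value.strip()
    try:
        int(text)
        return "integer"
    except ValueError:
        pass
    try:
        # Note: float() also accepts "nan" and "inf".
        float(text)
        return "floating point"
    except ValueError:
        pass
    for label, fmt in TEMPORAL_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return label
        except ValueError:
            pass
    return "string"


def classify_column(values):
    """Pick one term for a whole field (assumed rule, see lead-in)."""
    labels = {classify_value(v) for v in values}
    if len(labels) == 1:
        return labels.pop()
    # A mix of whole and decimal numbers is reported as floating point.
    if labels == {"integer", "floating point"}:
        return "floating point"
    # Any other mix, or an empty field, falls back to string.
    return "string"
```

For example, `classify_column(["1", "2.5"])` gives "floating point", while a field mixing numbers and free text falls back to "string".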

@wavingtowaves

wavingtowaves commented Mar 8, 2021 via email

@regnans
Author

regnans commented Mar 8, 2021

To clarify, my post was an FYI, not necessarily how it should be done. Very happy to take advice from data scientists here! I have a lot to learn about the meanings and implications of using different sorts of data types.

Also, with our current usage of these terms, we are not using "numeric" or "text". (Although perhaps "text" is appropriate for a field that includes multiple sentences of text.)

@wavingtowaves

@vchendrix As Terri and the ORNL team are finishing up the File-level metadata reporting format, Terri sent over a question about whether or not having a place for users of this format to report which data type they are providing in their data sheets would be useful.

Notes from Terri:

tvelliquette added a commit that referenced this issue Jul 13, 2021
Updated #1 with text about creating one dd or multiple dd.

@robcrystalornelas Given this new option, is there any need for the wildcard option in #6? 

I also changed the word "column" to "field" in #3 since we are calling the column/row headers "fields" back in the CSV format structure.
wavingtowaves added the ESS-DIVE review label Sep 10, 2021

@regnans
Author

regnans commented Jul 5, 2024

This issue is still relevant (see publication recommendations made for EDDOI-9773). Please include a list of standard Data_Type terms in the GitHub reporting format documentation.
