consistent event handling #75

Open
chrisdane opened this issue Oct 22, 2020 · 3 comments

@chrisdane

chrisdane commented Oct 22, 2020

Hi

I would like to compare different datasets from PANGAEA automatically, using an input list like this:

pangdois <- list()
if (TRUE) {
    pangdois <- c(pangdois,
                  list("lorius_etal_1985"=
                       list(pdoi="10.1594/PANGAEA.860950",
                            vars=list("d18op"=list(inputname="δ18O H2O [‰ SMOW]",
                                                   dims=list("kyr_before_1950"="Age [ka BP]"))))))
}

if (TRUE) {
    pangdois <- c(pangdois, 
                  list("masson-delmotte_etal_2011"=
                       list(pdoi="10.1594/PANGAEA.785228",
                            vars=list("d18op"=list(inputname="δ18O H2O [‰ SMOW]", 
                                                   dims=list("kyr_before_1950"="Age [ka BP]"))))))
}

if (length(pangdois) > 0) {
    for (pangi in seq_along(pangdois)) {
        if (pangi == 1) library(pangaear)
        message("run `pangaear::pg_data(", pangdois[[pangi]]$pdoi, ")` ...")
        tmp <- pangaear::pg_data(pangdois[[pangi]]$pdoi)
        for (eventi in seq_along(tmp)) { # search wanted variables in every event of current doi
            event <- NA # default
            # <non-consistent event-handling; see below>
            for (vi in seq_along(pangdois[[pangi]]$vars)) { # check if any wanted variable exists in current event of current doi
                if (any(names(tmp[[eventi]]$data) == pangdois[[pangi]]$vars[[vi]]$inputname)) {
                    # do further stuff
                } # if current variable exists in current event of current doi
            } # for vi in wanted vars
        } # for eventi in seq_along(tmp)
    } # for pangi in pangdois
} # if length(pangdois) > 0

However, I realized that events are not represented consistently in the objects returned by `pg_data()`. So far I have found three different cases:

# case 1:
$ parent_doi: chr "10.1594/PANGAEA.785228"
$ metadata  :List of 7
..$ events    :List of 7
 .. ..$ Dome_Fuji (DF): chr NA
# --> if `metadata$events` is a list, use first entry that is NA to identify the data?

# case 2:
$ parent_doi: chr "10.1594/PANGAEA.860950"
$ metadata  :List of 7
 ..$ events    : chr "Vostok * LATITUDE: -78.464420 * LONGITUDE: 106.837320 * DATE/TIME: 1980-01-01T00:00:00 * ELEVATION: 3488.0 m * Recovery: 2755 m * LOCATION: Antarctica * CAMPAIGN: Ice_core_diverse * BASIS: Sampling/drilling ice * METHOD/DEVICE: Drilling/drill rig (DRILL) * COMMENT: annual pressure 624 mbar; mean annual temperature -55.5°C; snow accumulation between 2.2 and; 22.5 g/cm**2/yr, about 250 ka"
# --> if `metadata$events` is not a list and `data$Event` is null, find a way to reduce the long event string to identify the data?

# case 3:
..$ parent_doi: chr "10.1594/PANGAEA.863978"
..$ metadata  :List of 9
 ..$ events    : chr "177-1089A * LATITUDE: -40.936400 * LONGITUDE: 9.894100 * DATE/TIME START: 1997-12-19T16:15:00 * DATE/TIME END: 1997-12-21T13:15:00 * ELEVATION: -4619.3 m * Penetration: 216.3 m * Recovery: 149.64 m * LOCATION: South Atlantic Ocean * CAMPAIGN: Leg177 (URI: https://doi.org/10.2973/odp.proc.ir.177.1999) * BASIS: Joides Resolution (URI: http://www-odp.tamu.edu/resolutn.html) * METHOD/DEVICE: Drilling/drill rig (DRILL) * COMMENT: 23 cores; 216.3 m cored; 0 m drilled; 69.2 % recovery; 177-1089B * LATITUDE: -40.936400 * LONGITUDE: 9.894100 * DATE/TIME START: 1997-12-22T13:16:00 * DATE/TIME END: 1997-12-22T22:45:00 * ELEVATION: -4623.8 m * Penetration: 264.9 m * Recovery: 246.62 m * LOCATION: South Atlantic Ocean * CAMPAIGN: Leg177 (URI: https://doi.org/10.2973/odp.proc.ir.177.1999) * BASIS: Joides Resolution (URI: http://www-odp.tamu.edu/resolutn.html) * METHOD/DEVICE: Drilling/drill rig (DRILL) * COMMENT: 29 cores; 264.9 m cored; 0 m drilled; 93.1 % recovery; 306-U1313B * LATITUDE: 41.000023 * LONGITUDE: -32.957300 * ELEVATION: -3413.5 m * Recovery: 306.54 m * CAMPAIGN: Exp306 (North Atlantic Climate 2) (URI: https://doi.org/10.2204/iodp.proc.303306.2006) * BASIS: Joides Resolution (URI: http://www-odp.tamu.edu/resolutn.html) * METHOD/DEVICE: Drilling/drill rig (DRILL) * COMMENT: 32 cores; 300.4 m cored; 102 % recovered; 2 m drilled; 302.4 m penetrated; GeoB1515-1 * LATITUDE: 4.238333 * LONGITUDE: -43.666667 * DATE/TIME: 1991-05-15T00:00:00 * ELEVATION: -3129.0 m * Recovery: 6.58 m * LOCATION: Amazon Fan * CAMPAIGN: M16/2 (URI: https://doi.org/10.2312/cr_m16) * BASIS: Meteor (1986) (URI: https://de.wikipedia.org/wiki/Meteor_(Schiff,_1986)) * METHOD/DEVICE: Gravity corer (Kiel type) (SL); GeoB1523-1 * LATITUDE: 3.831667 * LONGITUDE: -41.621667 * DATE/TIME: 1991-05-17T00:00:00 * ELEVATION: -3292.0 m * Recovery: 6.65 m * LOCATION: Amazon Fan * CAMPAIGN: M16/2 (URI: https://doi.org/10.2312/cr_m16) * BASIS: Meteor (1986) (URI: 
https://de.wikipedia.org/wiki/Meteor_(Schiff,_1986)) * METHOD/DEVICE: Gravity corer (Kiel type) (SL); KNR140-12JPC (KNR140-2-12JPC) * LATITUDE: 29.080000 * LONGITUDE: -72.900000 * ELEVATION: -4250.0 m * LOCATION: North Atlantic Ocean * CAMPAIGN: KNR140 * BASIS: Knorr * METHOD/DEVICE: Piston corer (PC); M35003-4 * LATITUDE: 12.090000 * LONGITUDE: -61.243333 * DATE/TIME: 1996-04-19T00:00:00 * ELEVATION: -1299.0 m * Recovery: 9.63 m * CAMPAIGN: M35/1 (URI: https://doi.org/10.2312/cr_m35) * BASIS: Meteor (1986) (URI: https://de.wikipedia.org/wiki/Meteor_(Schiff,_1986)) * METHOD/DEVICE: Gravity corer (Kiel type) (SL)"          
 $ data      : tibble [138 × 27] (S3: tbl_df/tbl/data.frame)
  ..$ Event                               : chr [1:138] "177-1089A" "177-1089A" "177-1089A" "177-1089A" ...
# --> if `metadata$events` is not a list and `data$Event` is not null, use maybe `unique(data$Event)` to identify the data?
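
The three cases above could perhaps be folded into one helper. This is only a sketch of the idea: `get_event_ids` is a hypothetical name and not part of pangaear, and the mock objects only mirror the `str()` output shown above rather than real downloads.

```r
# Hypothetical helper (not part of pangaear): return the event label(s) for
# one element of the list returned by pangaear::pg_data().
get_event_ids <- function(x) {
    ev <- x$metadata$events
    if (is.list(ev)) {
        # case 1: named list; use the name of the first entry that is NA
        na_entry <- vapply(ev, function(e) length(e) == 1 && is.na(e), logical(1))
        return(names(ev)[which(na_entry)[1]])
    }
    if ("Event" %in% names(x$data)) {
        # case 3: the data itself carries an Event column
        return(unique(x$data$Event))
    }
    if (is.character(ev) && length(ev) == 1 && !is.na(ev)) {
        # case 2: one long string; keep the text before the first " * "
        return(trimws(strsplit(ev, " * ", fixed = TRUE)[[1]][1]))
    }
    NA_character_ # nothing usable found
}

# mock objects mimicking the three structures (assumed, shortened)
case1 <- list(metadata = list(events = list("Dome_Fuji (DF)" = NA_character_,
                                            LATITUDE = "-77.3")),
              data = data.frame(x = 1))
case2 <- list(metadata = list(events = "Vostok * LATITUDE: -78.464420 * LONGITUDE: 106.837320"),
              data = data.frame(x = 1))
case3 <- list(metadata = list(events = "177-1089A * LATITUDE: -40.936400; 177-1089B * LATITUDE: -40.936400"),
              data = data.frame(Event = c("177-1089A", "177-1089A", "177-1089B"),
                                stringsAsFactors = FALSE))

print(get_event_ids(case1))
print(get_event_ids(case2))
print(get_event_ids(case3))
```

Inside the loop above this would be called as `get_event_ids(tmp[[eventi]])`. Whether the first NA entry is really the right identifier in case 1 is a guess on my side.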

I probably do not understand how the events are meant to be used. Is there a better way to automatically identify each individual dataset per DOI?

Thanks a lot for any help,
Chris

Session Info
devtools::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 4.0.3 (2020-10-10)
 os       Arch Linux                  
 system   x86_64, linux-gnu           
 ui       X11                         
 language en_US #de_DE                
 collate  C                           
 ctype    en_US.UTF-8                 
 tz       Europe/Berlin               
 date     2020-10-22                  

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version date       lib source                            
 assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.0)                    
 backports     1.1.10  2020-09-15 [1] CRAN (R 4.0.2)                    
 bookdown    * 0.21    2020-10-13 [1] CRAN (R 4.0.3)                    
 callr         3.5.1   2020-10-13 [1] CRAN (R 4.0.3)                    
 cli           2.1.0   2020-10-12 [1] CRAN (R 4.0.3)                    
 colorout    * 1.2-2   2020-04-27 [1] Github (jalvesaq/colorout@726d681)
 crayon        1.3.4   2017-09-16 [1] CRAN (R 4.0.0)                    
 crul          1.0.0   2020-07-30 [1] CRAN (R 4.0.2)                    
 curl          4.3     2019-12-02 [1] CRAN (R 4.0.0)                    
 desc          1.2.0   2018-05-01 [1] CRAN (R 4.0.0)                    
 devtools    * 2.3.2   2020-09-18 [1] CRAN (R 4.0.2)                    
 digest        0.6.26  2020-10-17 [1] CRAN (R 4.0.3)                    
 dotCall64   * 1.0-0   2018-07-30 [1] CRAN (R 4.0.0)                    
 dplyr         1.0.2   2020-08-18 [1] CRAN (R 4.0.2)                    
 dtupdate    * 1.5     2020-04-27 [1] Github (hrbrmstr/dtupdate@58056ea)
 ellipsis      0.3.1   2020-05-15 [1] CRAN (R 4.0.2)                    
 extrafont   * 0.17    2014-12-08 [1] CRAN (R 4.0.0)                    
 extrafontdb   1.0     2012-06-11 [1] CRAN (R 4.0.0)                    
 fansi         0.4.1   2020-01-08 [1] CRAN (R 4.0.0)                    
 fields      * 11.6    2020-10-09 [1] CRAN (R 4.0.3)                    
 fs            1.5.0   2020-07-31 [1] CRAN (R 4.0.2)                    
 generics      0.0.2   2018-11-29 [1] CRAN (R 4.0.0)                    
 glue          1.4.2   2020-08-27 [1] CRAN (R 4.0.2)                    
 gsw         * 1.0-5   2017-08-09 [1] CRAN (R 4.0.0)                    
 hoardr        0.5.2   2018-12-02 [1] CRAN (R 4.0.0)                    
 httpcode      0.3.0   2020-04-10 [1] CRAN (R 4.0.0)                    
 httr          1.4.2   2020-07-20 [1] CRAN (R 4.0.2)                    
 knitr         1.30    2020-09-22 [1] CRAN (R 4.0.2)                    
 lifecycle     0.2.0   2020-03-06 [1] CRAN (R 4.0.0)                    
 magrittr      1.5     2014-11-22 [1] CRAN (R 4.0.0)                    
 maps          3.3.0   2018-04-03 [1] CRAN (R 4.0.0)                    
 memoise       1.1.0   2017-04-21 [1] CRAN (R 4.0.0)                    
 ncdf4       * 1.17    2019-10-23 [1] CRAN (R 4.0.0)                    
 oai           0.3.0   2019-09-07 [1] CRAN (R 4.0.0)                    
 oce         * 1.2-0   2020-02-21 [1] CRAN (R 4.0.0)                    
 pangaear    * 1.0.0   2020-01-22 [1] CRAN (R 4.0.0)                    
 pbapply       1.4-3   2020-08-18 [1] CRAN (R 4.0.2)                    
 pillar        1.4.6   2020-07-10 [1] CRAN (R 4.0.2)                    
 pkgbuild      1.1.0   2020-07-13 [1] CRAN (R 4.0.2)                    
 pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.0.0)                    
 pkgload       1.1.0   2020-05-29 [1] CRAN (R 4.0.2)                    
 plyr          1.8.6   2020-03-03 [1] CRAN (R 4.0.0)                    
 prettyunits   1.1.1   2020-01-24 [1] CRAN (R 4.0.0)                    
 processx      3.4.4   2020-09-03 [1] CRAN (R 4.0.2)                    
 ps            1.4.0   2020-10-07 [1] CRAN (R 4.0.3)                    
 purrr         0.3.4   2020-04-17 [1] CRAN (R 4.0.0)                    
 R6            2.4.1   2019-11-12 [1] CRAN (R 4.0.0)                    
 rappdirs      0.3.1   2016-03-28 [1] CRAN (R 4.0.0)                    
 Rcpp          1.0.5   2020-07-06 [1] CRAN (R 4.0.2)                    
 remotes       2.2.0   2020-07-21 [1] CRAN (R 4.0.2)                    
 rlang         0.4.8   2020-10-08 [1] CRAN (R 4.0.3)                    
 rprojroot     1.3-2   2018-01-03 [1] CRAN (R 4.0.0)                    
 Rttf2pt1      1.3.8   2020-01-10 [1] CRAN (R 4.0.0)                    
 sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.0)                    
 spam        * 2.5-1   2019-12-12 [1] CRAN (R 4.0.0)                    
 stringi       1.5.3   2020-09-09 [1] CRAN (R 4.0.2)                    
 stringr       1.4.0   2019-02-10 [1] CRAN (R 4.0.0)                    
 testthat    * 2.3.2   2020-03-02 [1] CRAN (R 4.0.0)                    
 tibble        3.0.4   2020-10-12 [1] CRAN (R 4.0.3)                    
 tidyselect    1.1.0   2020-05-11 [1] CRAN (R 4.0.2)                    
 usethis     * 1.6.3   2020-09-17 [1] CRAN (R 4.0.2)                    
 vctrs         0.3.4   2020-08-29 [1] CRAN (R 4.0.2)                    
 withr         2.3.0   2020-09-22 [1] CRAN (R 4.0.2)                    
 xfun          0.18    2020-09-29 [1] CRAN (R 4.0.2)                    
 xml2          1.3.2   2020-04-23 [1] CRAN (R 4.0.0)  
@sckott
Contributor

sckott commented Oct 26, 2020

thanks for the issue @chrisdane ! having a look

@sckott
Contributor

sckott commented Oct 26, 2020

unfortunately, the files from Pangaea are semi-formatted text files that are quite hard to parse, and super variable. I can try to make the parsed events more consistent.

notes to self:

  • DOIs that have varied events text to parse:

10.1594/PANGAEA.785228
10.1594/PANGAEA.860950
10.1594/PANGAEA.863978
10.1594/PANGAEA.881731
10.1594/PANGAEA.896852
10.1594/PANGAEA.896852

  • working locally on parsing events data, come back to later.
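
A possible starting point for splitting the long events string into labels, as a rough sketch only. It assumes every event label is immediately followed by ` * LATITUDE:` and contains no `;` or `*`, which holds for the strings quoted in this issue but may not hold in general. `split_event_labels` is a made-up name, not an existing pangaear function.

```r
# Hypothetical helper: pull event labels out of a PANGAEA events string.
# Assumption: each label sits directly before " * LATITUDE:".
split_event_labels <- function(events) {
    # a label is the run of characters without ";" or "*" that is
    # directly followed by " * LATITUDE:" (perl lookahead)
    m <- gregexpr("[^;*]+(?= \\* LATITUDE:)", events, perl = TRUE)
    trimws(regmatches(events, m)[[1]])
}

# shortened example built from the strings quoted in this issue
events <- paste0(
    "177-1089A * LATITUDE: -40.936400 * COMMENT: 23 cores; 216.3 m cored; ",
    "69.2 % recovery; 177-1089B * LATITUDE: -40.936400 * ",
    "METHOD/DEVICE: Gravity corer (Kiel type) (SL); ",
    "KNR140-12JPC (KNR140-2-12JPC) * LATITUDE: 29.080000"
)
labels <- split_event_labels(events)
print(labels)
```

Splitting on `; ` alone would not work here, because the COMMENT fields contain `; ` themselves.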

@sckott
Contributor

sckott commented Jan 7, 2021

@chrisdane sorry for the delay on this - if you are willing to contribute, this will move along faster - i'll get to it at some point
