Sorting on dates with NULL values #227

Open
dhicks opened this issue Dec 30, 2022 · 0 comments

dhicks commented Dec 30, 2022

Brief description of the problem: When sorting a csl_dates column that contains missing (NULL) values, the ordering is wrong and an entry is dropped. Replacing the NULL entries with NA resolves the problem, but doing so is tricky.

Diagnosis: This came up when I was trying to use bibliography_entries() with a Zotero export of a group that included some submitted but unpublished papers (so, no publication date). bibliography_entries() calls jsonlite::fromJSON(), which has a longstanding issue with assuming JSON null should be translated to R NULL: jeroen/jsonlite#70.
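To illustrate the diagnosis, here is a minimal sketch of how a JSON null ends up as an R NULL. The JSON string is a made-up, CSL-like stand-in for a Zotero export, and passing simplifyVector = FALSE is an assumption for the illustration; I have not checked which arguments bibliography_entries() actually uses.

```r
library(jsonlite)

# Hypothetical input: two entries, the second with no publication date
json <- '[{"id": "a", "issued": {"date-parts": [[2020]]}},
          {"id": "b", "issued": null}]'

# simplifyVector = FALSE keeps the result as nested lists
entries <- fromJSON(json, simplifyVector = FALSE)

# The missing date comes through as NULL rather than NA
is.null(entries[[2]]$issued)
```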

Reprex:

library(vitae)
#> 
#> Attaching package: 'vitae'
#> The following object is masked from 'package:stats':
#> 
#>     filter

dates = structure(list(
    structure(list(`date-parts` = list(list(2020L))), class = "csl_date"),
    NULL, 
    structure(list(`date-parts` = list(list(2019L, 3L, 14L))), class = "csl_date"), 
    structure(list(`date-parts` = list(list(2016L, 12L, 22L))), class = "csl_date"), 
    structure(list(`date-parts` = list(list(2020L, 1L))), class = "csl_date")
    ), 
    class = c("csl_dates", "vctrs_vctr", "list"))

dates
#> <csl_dates[5]>
#> [1] 2020       NULL       2019-3-14  2016-12-22 2020-1
## The order is all wrong and the last entry has disappeared
dates[order(dates)]
#> <csl_dates[4]>
#> [1] 2019-3-14  NULL       2020       2016-12-22

## From <https://stackoverflow.com/questions/22870198/is-there-a-more-efficient-way-to-replace-null-with-na-in-a-list/49539022#49539022>
replace_x <- function(x, replacement = NA_character_) {
    if (length(x) == 0 || length(x[[1]]) == 0) {
        replacement
    } else {
        x
    }
}

## Presumably you could use an lapply here, but I can't be bothered to figure that out right now
fixed_dates = purrr::modify_depth(dates, 1, replace_x)
## Sorted correctly, with the missing value at the end
fixed_dates[order(fixed_dates)]
#> <csl_dates[5]>
#> [1] 2016-12-22 2019-3-14  2020       2020-1     NA

Created on 2022-12-30 with reprex v2.0.2
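For reference, a base R version of the workaround could look roughly like the sketch below. null_to_na() is a hypothetical helper, not part of vitae, and it only handles elements that are NULL themselves (replace_x above also catches elements whose first component is empty). It is shown on a plain list so that it runs without vitae.

```r
# Hypothetical helper: swap NULL elements for a replacement value
null_to_na <- function(x, replacement = NA_character_) {
  attrs <- attributes(x)  # remember class and names, if any
  out <- lapply(unclass(x), function(el) if (is.null(el)) replacement else el)
  attributes(out) <- attrs  # restore them on the result
  out
}

# Plain-list stand-in for the dates column
years <- list(2020L, NULL, 2019L)
fixed_years <- null_to_na(years)

length(fixed_years)                            # still 3 elements
any(vapply(fixed_years, is.null, logical(1)))  # FALSE
```

On the reprex object this would be called as null_to_na(dates). I have not checked that the result sorts the same way as the purrr version.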
