Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Evaluation of datetimes not as expected #536

Closed
2 tasks done
kmasiello opened this issue Jun 7, 2024 · 4 comments · Fixed by #539
Closed
2 tasks done

Evaluation of datetimes not as expected #536

kmasiello opened this issue Jun 7, 2024 · 4 comments · Fixed by #539
Assignees

Comments

@kmasiello
Copy link

Prework

Description

Running a validation that a datetime is less than / greater than lubridate::today() yields opposite results, i.e., a date that is prior to/less than today evaluates as a fail under col_vals_lt and vice versa.

I know it's comparing a datetime to a date, but I wouldn't expect to have to do a conversion to run a test such as this, and I wouldn't expect it to appear to work but provide incorrect results.

Reproducible example

library(pointblank)
library(dplyr)
library(lubridate)

df <- tibble(datetime = as.POSIXct(c("2021-01-01 00:15:00",
                                     "2023-09-29 03:35:00", 
                                     "2023-12-16 06:51:00", 
                                     "2024-02-28 00:15:00"))) |> 
      mutate(date=as_date(datetime))

df
#> # A tibble: 4 × 2
#>   datetime            date      
#>   <dttm>              <date>    
#> 1 2021-01-01 00:15:00 2021-01-01
#> 2 2023-09-29 03:35:00 2023-09-29
#> 3 2023-12-16 06:51:00 2023-12-16
#> 4 2024-02-28 00:15:00 2024-02-28
create_agent(df) |> 
  col_vals_lte(datetime, today()) |> 
  col_vals_lte(date, today()) |>
  interrogate()

The validation on datetime failed all but date passed all.
image

Expected result

Validation on the datetime should have passed all dates less than today.

Session info

sessionInfo()
#> R version 4.3.2 (2023-10-31)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.35     fastmap_1.2.0     xfun_0.44         glue_1.7.0       
#>  [5] knitr_1.47        htmltools_0.5.8.1 rmarkdown_2.27    lifecycle_1.0.4  
#>  [9] cli_3.6.2         reprex_2.1.0      withr_3.0.0       compiler_4.3.2   
#> [13] rstudioapi_0.16.0 tools_4.3.2       evaluate_0.23     yaml_2.3.8       
#> [17] rlang_1.1.3       fs_1.6.4

Created on 2024-06-07 with reprex v2.1.0

@yjunechoe
Copy link
Collaborator

yjunechoe commented Jun 7, 2024

This is an unintuitive and frustrating behavior, but unfortunately the source of it is not a bug in {pointblank}. Rather, this has to do with the side effect of {lubridate} sanitizing date/datetime comparison operations upon library(lubridate). {pointblank} is simply not aware of that business when it evaluates col_vals_lte().

Indeed, in base R, comparing an older datetime with newer date (almost always) returns FALSE due to the technicality that dates and datetimes are represented as integers - in seconds and in days, respectively. In turn, < is evaluated on their integer values. See below:

old_datetime <- as.POSIXct("2021-01-01 00:15:00")
current_date <- Sys.Date()

old_datetime < current_date
#> Warning: Incompatible methods ("Ops.POSIXt", "Ops.Date") for "<"
#> [1] FALSE

unclass(old_datetime)
#> [1] 1609478100
#> attr(,"tzone")
#> [1] ""
unclass(current_date)
#> [1] 19881

It's only when using {lubridate} where you get the more "sensible" behavior:

library(lubridate)
old_datetime < current_date
#> [1] TRUE

I'm not sure, on principle, whether {pointblank} should also do the {lubridate}-style auto-magic (@rich-iannone do you have any thoughts?), but in generally I would strongly advise against the expectation that you can safely compare date and date time objects in R, as they are values of disparate types.

With that in mind, and at least as an immediate fix, I would suggest the code in your reprex to use values that match the type of the column its being compared to (the datetime now() or the date today()):

create_agent(df) |> 
  col_vals_lte(datetime, lubridate::now()) |> 
  col_vals_lte(date, lubridate::today()) |>
  interrogate()

In practice, outside of this toy example, you may need to cast date values into datetime when comparing it to datetime columns and vice versa.

@kmasiello
Copy link
Author

Thank you @yjunechoe ! I had suspected the different object types were to blame, and your explanation is very valuable.

It is useful in the base R approach that a warning is surfaced. If pointblank maintains alignment with the base R approach for comparing disparate types, surfacing that warning would be very helpful indeed.

But for now, your workaround is exactly what I needed too. Thank you!!

@yjunechoe
Copy link
Collaborator

Good to hear that the workaround works for your usecase!

And to your point:

It is useful in the base R approach that a warning is surfaced. If pointblank maintains alignment with the base R approach for comparing disparate types, surfacing that warning would be very helpful indeed.

I 100% agree, and I think that that specific part may be a bug (thanks!!) - I've opened a new issue for this. And actually, I suspect that fixing #537 should just give us the lubridate behavior when that's loaded - will update back here after some experimenting!

@yjunechoe
Copy link
Collaborator

Thanks again for the report! Dev version now does the more sensible thing of inheriting the {lubridate} behavior when its's been loaded and, if not, actually showing the evaluation warning/error triggered by incompatible types.

# remotes::install_github("rstudio/pointblank")
library(pointblank)
library(dplyr)
library(lubridate)

df <- tibble(datetime = as.POSIXct(c("2021-01-01 00:15:00",
                                     "2023-09-29 03:35:00", 
                                     "2023-12-16 06:51:00", 
                                     "2024-02-28 00:15:00"))) |> 
  mutate(date=as_date(datetime))

create_agent(df) |> 
  col_vals_lte(datetime, today()) |> 
  col_vals_lte(date, today()) |>
  interrogate()

image

# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants