Skip to content

speed up get_model_spec() helper #901

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Merged
merged 3 commits into from
Mar 7, 2023
Merged

Conversation

simonpcouch
Copy link
Contributor

@simonpcouch simonpcouch commented Mar 6, 2023

This PR speeds up the get_model_spec() helper. I'm first just pushing some new tests so we can run checks and ensure that results don't change.

This helper is called once per model specification fit (or translation).

library(tidymodels)

bt <- bootstraps(mtcars, 100)

bm <- 
  bench::mark(
    total = 
      fit_resamples(linear_reg(), mpg ~ ., bootstraps(mtcars, 100)),
    get_model_spec = 
      replicate(100, parsnip:::get_model_spec("linear_reg", "regression", "lm")),
    check = FALSE
  )
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.

bm
#> # A tibble: 2 × 6
#>   expression          min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>     <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 total              5.2s     5.2s     0.192   73.57MB    10.6 
#> 2 get_model_spec  258.6ms  342.9ms     2.92     2.42MB     7.29

As a percentage of total time, the helper takes a bit more than 6.5% of the evaluation time:

100 * as.numeric(bm$median[2]) / as.numeric(bm$median[1])
#> [1] 6.588154

The rewritten version of the helper is below (just for purposes of self-contained reprex):

get_model_spec2 <- function(model, mode, engine) {
  m_env <- get_model_env()
  env_obj <- rlang::env_names(m_env)
  env_obj <- grep(model, env_obj, value = TRUE)
  
  res <- list()
  
  libs <- rlang::env_get(m_env, paste0(model, "_pkgs"))
  libs <- vctrs::vec_slice(libs$pkg, libs$engine == engine)
  res$libs <- if (length(libs) > 0) {libs[[1]]} else {NULL}
  
  fits <- rlang::env_get(m_env, paste0(model, "_fit"))
  fits <- vctrs::vec_slice(fits$value, fits$mode == mode & fits$engine == engine)
  res$fit <- if (length(fits) > 0) {fits[[1]]} else {NULL}
  
  preds <- rlang::env_get(m_env, paste0(model, "_predict"))
  where <- preds$mode == mode & preds$engine == engine
  types <- vctrs::vec_slice(preds$type, where)
  values <- vctrs::vec_slice(preds$value, where)
  names(values) <- types
  res$pred <- values
  
  res
}

With benchmarks:

bm2 <- 
  bench::mark(
    old = parsnip:::get_model_spec("linear_reg", "regression", "lm"),
    new = get_model_spec2("linear_reg", "regression", "lm"),
    check = TRUE
  )

Note with check = TRUE in the above, mark() checks equality of the outputs. :) The new check is as.numeric(bm2$median[1]) / as.numeric(bm2$median[2]) times faster than the old one:

as.numeric(bm2$median[1]) / as.numeric(bm2$median[2])
#> [1] 24.86973

Created on 2023-03-06 with reprex v2.0.2

cc @DavisVaughan--thanks for the rewrite.🏄 Note that this reprex was generated with dev dplyr--quite a big speedup already with their changes.

``` r
boop <- list(a = 1)

bench::mark(
    base = if (length(boop) > 0) {boop[[1]]} else {NULL},
    purrr = purrr::pluck(boop, 1),
    dplyr = dplyr::first(boop),
    iterations = 100
)
#> # A tibble: 3 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 base       122.94ns 163.91ns  3179808.        0B        0
#> 2 purrr        2.01µs   2.52µs   305755.   22.66KB        0
#> 3 dplyr        5.29µs   6.07µs   141894.    8.12MB        0
```

<sup>Created on 2023-03-06 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>
@simonpcouch simonpcouch requested a review from hfrick March 6, 2023 15:13
Copy link
Member

@hfrick hfrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! 🏄 do we need a specific version of vctrs for this?

@simonpcouch
Copy link
Contributor Author

That's a good thought! There are no NEWS updates re: vec_slice() after our minimum 0.4.1, and the helper runs fine with that version, so I believe we should be good to go.

@simonpcouch simonpcouch merged commit 74439c6 into main Mar 7, 2023
@simonpcouch simonpcouch deleted the get-model-spec-speedup branch March 7, 2023 12:40
@github-actions
Copy link

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 25, 2023
# for free to subscribe to this conversation on GitHub. Already have an account? #.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants