Skip to content

speed up deharmonize() helper #902

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Merged
merged 1 commit into from
Mar 7, 2023
Merged

speed up deharmonize() helper #902

merged 1 commit into from
Mar 7, 2023

Conversation

simonpcouch
Copy link
Contributor

@simonpcouch simonpcouch commented Mar 6, 2023

This PR adds an early return to the deharmonize() helper function and transitions tibble() to new_tibble(list()).

The modified version, for ease of reprex:

library(tidymodels)

args <- list(penalty = quo(NULL), mixture = quo(tune()))
arg_key <- 
  tibble::tibble(
    parsnip = c("penalty", "mixture"), 
    original = c("lambda", "alpha"), 
    func = list(list(pkg = "dials", fun = "penalty"), 
                list(pkg = "dials", fun = "mixture")), 
    has_submodel = c(TRUE, FALSE)
  )  
  
deharmonize2 <- function(args, key) {
  if (length(args) == 0) {
    return(args)
  }
  
  if (nrow(key) == 0) {
    return(args[integer(0)])
  }
  
  parsn <- list(parsnip = names(args), order = seq_along(args))
  parsn <- tibble::new_tibble(parsn, nrow = length(args))
  
  merged <-
    dplyr::left_join(parsn, key, by = "parsnip") %>%
    dplyr::arrange(order)
  
  merged <- merged[!duplicated(merged$order),]
  
  names(args) <- merged$original
  args[!is.na(merged$original)]
}

The function runs once per fit, and takes up about 3% of eval time in this example:

bm <- 
  bench::mark(
    total = 
      fit_resamples(linear_reg(), mpg ~ ., bootstraps(mtcars, 100)),
    deharmonize = 
      replicate(100, parsnip:::deharmonize(args, arg_key)),
    check = FALSE
  )
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.

100 * as.numeric(bm$median[2]) / as.numeric(bm$median[1])
#> [1] 3.051962

When we hit the new early return, the speedup is substantial--the old version takes >600x longer:

# ex 1: `linear_reg()`
args <- list(penalty = quo(NULL), mixture = quo(NULL))
arg_key <- 
  tibble(
    parsnip = character(0), 
    original = character(0), 
    func = list(), 
    has_submodel = logical(0)
  )

bm2 <- 
  bench::mark(
    old = parsnip:::deharmonize(args, arg_key),
    new = deharmonize2(args, arg_key)
  )

as.numeric(bm2$median[1]) / as.numeric(bm2$median[2])
#> [1] 676.4904

When we run through the whole helper, we get a ~15% speedup:

# ex 2: `linear_reg(engine = "glmnet", penalty = .3)`
args <- list(penalty = quo(.3), mixture = quo(NULL))
arg_key <- 
  tibble::tibble(
    parsnip = c("penalty", "mixture"), 
    original = c("lambda", "alpha"), 
    func = list(list(pkg = "dials", fun = "penalty"), 
                list(pkg = "dials", fun = "mixture")), 
    has_submodel = c(TRUE, FALSE)
  )

bm3 <-
  bench::mark(
    old = parsnip:::deharmonize(args, arg_key),
    new = deharmonize2(args, arg_key)
  )

1 - (as.numeric(bm3$median[2]) / as.numeric(bm3$median[1]))
#> [1] 0.1649681

Created on 2023-03-06 with reprex v2.0.2

This reprex uses dev dplyr. :)

@simonpcouch simonpcouch requested a review from hfrick March 6, 2023 17:06
Copy link
Member

@hfrick hfrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! 🚤

@simonpcouch simonpcouch merged commit 0127f2d into main Mar 7, 2023
@simonpcouch simonpcouch deleted the deharmonize-speedup branch March 7, 2023 12:40
@github-actions
Copy link

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 25, 2023
# for free to subscribe to this conversation on GitHub. Already have an account? #.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants