speed up `deharmonize()` helper #902

simonpcouch · 2023-03-06T16:47:51Z

This PR adds an early return to the deharmonize() helper function and transitions tibble() to new_tibble(list()).

The modified version, for ease of reprex:

library(tidymodels)

args <- list(penalty = quo(NULL), mixture = quo(tune()))
arg_key <- 
  tibble::tibble(
    parsnip = c("penalty", "mixture"), 
    original = c("lambda", "alpha"), 
    func = list(list(pkg = "dials", fun = "penalty"), 
                list(pkg = "dials", fun = "mixture")), 
    has_submodel = c(TRUE, FALSE)
  )  
  
deharmonize2 <- function(args, key) {
  if (length(args) == 0) {
    return(args)
  }
  
  if (nrow(key) == 0) {
    return(args[integer(0)])
  }
  
  parsn <- list(parsnip = names(args), order = seq_along(args))
  parsn <- tibble::new_tibble(parsn, nrow = length(args))
  
  merged <-
    dplyr::left_join(parsn, key, by = "parsnip") %>%
    dplyr::arrange(order)
  
  merged <- merged[!duplicated(merged$order),]
  
  names(args) <- merged$original
  args[!is.na(merged$original)]
}

The function runs once per fit, and takes up about 3% of eval time in this example:

bm <- 
  bench::mark(
    total = 
      fit_resamples(linear_reg(), mpg ~ ., bootstraps(mtcars, 100)),
    deharmonize = 
      replicate(100, parsnip:::deharmonize(args, arg_key)),
    check = FALSE
  )
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.

100 * as.numeric(bm$median[2]) / as.numeric(bm$median[1])
#> [1] 3.051962

When we hit the new early return, the speedup is substantial--the old version takes >600x longer:

# ex 1: `linear_reg()`
args <- list(penalty = quo(NULL), mixture = quo(NULL))
arg_key <- 
  tibble(
    parsnip = character(0), 
    original = character(0), 
    func = list(), 
    has_submodel = logical(0)
  )

bm2 <- 
  bench::mark(
    old = parsnip:::deharmonize(args, arg_key),
    new = deharmonize2(args, arg_key)
  )

as.numeric(bm2$median[1]) / as.numeric(bm2$median[2])
#> [1] 676.4904

When we run through the whole helper, we get a ~15% speedup:

# ex 2: `linear_reg(engine = "glmnet", penalty = .3)`
args <- list(penalty = quo(.3), mixture = quo(NULL))
arg_key <- 
  tibble::tibble(
    parsnip = c("penalty", "mixture"), 
    original = c("lambda", "alpha"), 
    func = list(list(pkg = "dials", fun = "penalty"), 
                list(pkg = "dials", fun = "mixture")), 
    has_submodel = c(TRUE, FALSE)
  )

bm3 <-
  bench::mark(
    old = parsnip:::deharmonize(args, arg_key),
    new = deharmonize2(args, arg_key)
  )

1 - (as.numeric(bm3$median[2]) / as.numeric(bm3$median[1]))
#> [1] 0.1649681

^{Created on 2023-03-06 with reprex v2.0.2}

This reprex uses dev dplyr. :)

hfrick

Nice! 🚤

github-actions · 2023-03-25T00:56:41Z

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

speed up deharmonize() helper

74b4996

simonpcouch requested a review from hfrick March 6, 2023 17:06

hfrick approved these changes Mar 7, 2023

View reviewed changes

simonpcouch merged commit 0127f2d into main Mar 7, 2023

simonpcouch deleted the deharmonize-speedup branch March 7, 2023 12:40

github-actions bot locked and limited conversation to collaborators Mar 25, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

speed up `deharmonize()` helper #902

speed up `deharmonize()` helper #902

simonpcouch commented Mar 6, 2023 •

edited

Loading

hfrick left a comment

github-actions bot commented Mar 25, 2023

speed up deharmonize() helper #902

speed up deharmonize() helper #902

Conversation

simonpcouch commented Mar 6, 2023 • edited Loading

hfrick left a comment

Choose a reason for hiding this comment

github-actions bot commented Mar 25, 2023

speed up `deharmonize()` helper #902

speed up `deharmonize()` helper #902

simonpcouch commented Mar 6, 2023 •

edited

Loading