Performance of [[ #1353

Open
mgirlich opened this issue Aug 25, 2022 · 2 comments

Comments

@mgirlich
Contributor

While working on a rectangling tool for recursive data frames (see tidyverse/tidyr#1386), I noticed that tibble::[[ has quite a performance impact, at least for assignment. Do you see a chance of improving its performance? Or maybe of adding a low-level version for assignment?

# Assign to the same column n times to measure the per-call cost of `[[<-`.
f <- function(x, n = 1e4) {
  for (i in seq_len(n)) {
    x[["x"]] <- 1L
  }
}

t <- tibble::tibble(x = 1L)
df <- data.frame(x = 1L)
l <- list(x = 1L)

bench::mark(
  tibble = f(t),
  dataframe = f(df),
  list = f(l)
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 3 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 tibble     523.22ms 523.22ms      1.91   211.5KB     15.3
#> 2 dataframe   78.31ms  80.04ms     12.0     80.9KB     18.0
#> 3 list         1.35ms   1.52ms    560.          0B     31.9

Created on 2022-08-25 with reprex v2.0.2

@krlmlr
Member

krlmlr commented Aug 28, 2022

Thanks, confirmed. On my system:

t <- tibble::tibble(x = 1L)
bench::mark(for (i in seq(1e4)) {
  t[["x"]] <- 1L
})
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 1 × 6
#>   expression                                  min median itr/s…¹ mem_a…² gc/se…³
#>   <bch:expr>                               <bch:> <bch:>   <dbl> <bch:b>   <dbl>
#> 1 for (i in seq(10000)) { t[["x"]] <- 1L }  214ms  214ms    4.66   100KB    46.6
#> # … with abbreviated variable names ¹​`itr/sec`, ²​mem_alloc, ³​`gc/sec`

Created on 2022-08-28 by the reprex package (v2.0.1)

@mgirlich
Contributor Author

Thanks for looking into this right away. What do you think about a low-level function tib_assign_col(df, j, value)? It should allow for some very good performance improvements. Though maybe it should live in vctrs?
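
To make the proposal concrete, here is a minimal sketch of what such a helper could look like in base R. The name tib_assign_col and its arguments come from the suggestion above; the body is only an assumption about one possible implementation, not existing tibble or vctrs code. It drops the class, assigns on the bare list, and restores the class, so that no S3 method is dispatched for the assignment itself. It also assumes that j is a single column name and that value already has the right size (no recycling, no name repair).

```r
# Hypothetical sketch of a low-level column setter; not tibble's or vctrs' API.
tib_assign_col <- function(df, j, value) {
  stopifnot(
    is.data.frame(df),
    is.character(j), length(j) == 1L, !is.na(j),
    !is.null(value)
  )

  # Number of rows, read from the row names attribute without expanding it.
  n <- .row_names_info(df, 2L)
  # NROW() covers plain vectors, lists, matrices and data frame columns.
  # Assumption: no recycling, the caller passes a value of the right size.
  if (NROW(value) != n) {
    stop("`value` must have ", n, " rows, not ", NROW(value), ".")
  }

  cls <- class(df)
  cols <- unclass(df)   # bare list; names and row.names attributes are kept
  cols[[j]] <- value    # plain list assignment, no S3 dispatch
  class(cols) <- cls    # restore the original class vector
  cols
}

df <- data.frame(x = 1:3)
out <- tib_assign_col(df, "x", c(10L, 20L, 30L))
out$x
```

Whether this is actually faster than the current method, and by how much, would need to be checked with the benchmark from the first comment. The sketch also skips everything the regular method does for safety, so it would only be suitable for internal callers that have already validated their inputs.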
