perf: Reduce overhead of single-column subset assignment #1363

Merged: 5 commits, Aug 28, 2022


@krlmlr krlmlr commented Aug 28, 2022

For #1353.

One area where extra work is done: distinction between new and existing columns. Other than that, further substantial improvements seem to require moving to C code.
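From the user's side, the two cases mentioned above (existing versus new column) look like the following minimal sketch. It only illustrates the two forms of single-column assignment; it does not show the internal code path touched by this PR, and the column name `y` is made up for the example.

```r
library(tibble)

t <- tibble(x = 1L)

# Existing column: the name "x" is already present, so the column is replaced.
t[["x"]] <- 2L

# New column: the name "y" (hypothetical) is not present yet, so it is appended.
t[["y"]] <- 3L

names(t)
```

Both assignments go through the same `[[<-` call, so telling the two cases apart has to happen on every assignment.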

t <- tibble::tibble(x = 1L)
bench::mark(for (i in seq(1e4)) {
  t[["x"]] <- 1L
})
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 1 × 6
#>   expression                                  min median itr/s…¹ mem_a…² gc/se…³
#>   <bch:expr>                               <bch:> <bch:>   <dbl> <bch:b>   <dbl>
#> 1 for (i in seq(10000)) { t[["x"]] <- 1L }  162ms  165ms    6.03   100KB    46.7
#> # … with abbreviated variable names ¹​`itr/sec`, ²​mem_alloc, ³​`gc/sec`

Created on 2022-08-28 by the reprex package (v2.0.1)

@krlmlr krlmlr changed the title Reduce overhead of single-column subset assignment perf: Reduce overhead of single-column subset assignment Aug 28, 2022
@krlmlr krlmlr merged commit a0ccec2 into main Aug 28, 2022
@sebastian-gerdes

Hello everyone,

I can confirm this issue as well (I was about to open a new issue and then found this existing one):

library('tibble')  # provides tibble()
library('tictoc')  # provides tic() / toc()
n <- 1e5
tic()
my_tib <- tibble(.rows = n, x = NA)

# slow version: first construct tibble, then assign within tibble
for (i in 1:n) {
  my_tib$x[i] <- runif(1)
}
toc() # approx 10 seconds on my machine

# fast version: assign inside 'plain' vector and construct tibble later
tic()
x <- rep(NA, n)
for (i in 1:n) {
  x[i] <- runif(1)
}
my_tib <- tibble(x = x)
toc() # approx 0.1 seconds on my machine

I would really like to keep working with the first version, since it makes the code for my simulations a lot easier to write. However, the performance might really be a deal-breaker for me...

So I would really appreciate any attempts to improve the performance of tibble in this aspect!

Thanks and best greetings,
Sebastian
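
A possible middle ground between the two versions in the comment above, sketched under the assumption that each iteration only needs to read and write the one column: pull the column out once, fill it as a plain vector, and write it back with a single assignment. The names `n`, `my_tib` and `x` follow the comment; `n` is reduced here to keep the example quick, and no timings are claimed.

```r
library(tibble)

n <- 1e4
# Pre-allocate with a double NA so the column type does not change mid-loop.
my_tib <- tibble(x = rep(NA_real_, n))

# Extract the column once. Element-wise assignment on a plain vector
# avoids calling tibble's subset-assignment method in every iteration,
# which is what `my_tib$x[i] <- value` does.
x <- my_tib[["x"]]
for (i in seq_len(n)) {
  x[i] <- runif(1)
}

# Write the finished vector back with a single column assignment.
my_tib[["x"]] <- x
```

This keeps the tibble as the data structure the rest of the simulation code works with, while the hot loop touches only an ordinary vector.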

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 2, 2024