[[ subsetting much slower than $ #780
Comments
The call to |
Out of interest, do you know if $ was significantly quicker prior to tibble 3 - or was performance more equal? |
From tibble 2.1.1 on a different machine. So it seems like $ was ~2x faster on 2.1.1 and is 25-30x faster on 3.0.1.
df <- tibble::tibble(x = 1)
bench::mark(
dollar = df$x,
bracket = df[["x"]],
iterations = 1000
)
#> # A tibble: 2 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 dollar 682.65ns 1.36us 514262. 6.28KB 0
#> 2 bracket 1.02us 1.36us 472464. 5.3KB 0
Created on 2020-06-10 by the reprex package (v0.2.1) |
Thanks @ebein! All my machines were on tibble 3, and the hassle of a full reinstall to check meant I was cheeky and just asked the question. Very interesting: from a practical perspective it may be better to train my muscle memory to use $ where possible (obviously [[ has benefits where the column name isn't a constant). |
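To illustrate the point about non-constant column names, here is a minimal sketch. The tibble `df` and the variable `col` are made up for illustration and do not come from the thread.

```r
# Hypothetical two-column tibble, for illustration only.
df <- tibble::tibble(x = 1, y = 2)

# With `[[`, the column name can be held in a variable ...
col <- "y"
value_from_variable <- df[[col]]

# ... whereas `$` needs the name written out literally in the code.
value_from_literal <- df$y

identical(value_from_variable, value_from_literal)
```

So `$` is only an option when the column name is known at the time the code is written.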
Once we remove the |
Pure S3 dispatch without doing actual work is already 1.3 µs. Oh well... |
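A rough way to see the dispatch overhead in isolation is to compare a primitive call with an S3 method that does almost nothing beyond forwarding to that primitive. This is only a sketch: the class name `demo_class` is hypothetical and the method below is not tibble's implementation. Absolute timings will differ between machines.

```r
# Hypothetical class whose `[[` method does nothing except forward
# to the internal, non-dispatching subsetting primitive.
`[[.demo_class` <- function(x, i, ...) .subset2(x, i)

obj <- structure(list(x = 1), class = "demo_class")

timings <- bench::mark(
  primitive = .subset2(obj, "x"),  # no S3 dispatch
  s3_method = obj[["x"]],          # dispatches to `[[.demo_class`
  iterations = 1000
)
timings
```

The gap between the two rows approximates the cost of dispatch plus one extra R function call, which is a floor for any `[[` method written in R.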
Now:
df <- tibble::tibble(x = 1)
bench::mark(
dollar = df$x,
bracket = df[["x"]],
iterations = 1000
)
#> # A tibble: 2 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 dollar 4.29µs 4.67µs 199184. 17.6KB 0
#> 2 bracket 3.91µs 4.25µs 219202. 90.6KB 0
Created on 2020-06-14 by the reprex package (v0.3.0) |
We could strive for even faster processing (closer to 2 µs), but I suspect that would need a full rewrite in C. The current speed should be fast enough for most use cases. |
tibble 3.0.2
- `[[` works with classed indexes again, e.g. created with `glue::glue()` (#778).
- `add_column()` works without warning for 0-column data frames (#786).
- `tribble()` now better handles named inputs (#775) and objects of non-vtrs classes like `lubridate::Period` (#784) and `formattable::formattable` (#785).
- Subsetting and subassignment are faster (#780, #790, #794).
- `is.null()` is preferred over `is_null()` for speed.
- Implement continuous benchmarking (#793).
- `is_vector_s3()` is no longer reexported from pillar (#789).
This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary. |
Starting with tibble 3.0.0, column subsetting using [[ is much slower than $. This causes slowdowns in functions that call [[ many times, for example data.matrix on a wide tibble.
Created on 2020-06-03 by the reprex package (v0.3.0)
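The effect on a wide tibble can be sketched roughly as follows. This is not the original reprex from the issue. The column count, the object names, and the comparison against a plain data frame are assumptions chosen for illustration.

```r
# Build a hypothetical wide tibble: 1 row, 1000 integer columns.
n_cols <- 1000
cols <- stats::setNames(
  as.list(seq_len(n_cols)),
  paste0("col", seq_len(n_cols))
)
wide_tbl <- tibble::as_tibble(cols)
wide_df <- as.data.frame(wide_tbl)

# data.matrix() extracts each column with `[[`, so per-call overhead
# in the tibble method is multiplied by the number of columns.
timings <- bench::mark(
  tibble = data.matrix(wide_tbl),
  data.frame = data.matrix(wide_df),
  iterations = 10,
  check = FALSE  # only the timings matter here, not result equality
)
timings
```

How large the gap is depends on the installed tibble version, as the benchmarks in the comments show.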