Speeding up visdat #59
Could possibly use `rle()`:
rle(airquality$Ozone)
#> Run Length Encoding
#> lengths: int [1:152] 1 1 1 1 1 1 1 1 1 1 ...
#> values : int [1:152] 41 36 12 18 NA 28 23 19 8 NA ...
Created on 2019-06-08 by the reprex package (v0.2.1)
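One thing to note about `rle()` on raw data: base R's `rle()` never merges neighbouring `NA`s, so every missing value becomes its own run. Below is a minimal sketch of how run-length encoding might still help, assuming the runs are computed on a per-cell class label with a string sentinel for missing values. `fingerprint_chr()` is a hypothetical helper for illustration only, not a visdat function:

```r
# Hypothetical helper (not part of visdat): label each cell with the class of
# the vector, using the string "NA" as a sentinel for missing cells so that
# consecutive missing values compare equal and collapse into a single run.
fingerprint_chr <- function(x) {
  out <- rep(paste(class(x), collapse = "\n"), length(x))
  out[is.na(x)] <- "NA"
  out
}

fp <- fingerprint_chr(airquality$Ozone)

# Far fewer runs than cells, because identical neighbours are merged.
runs <- rle(fp)
length(runs$lengths)

# The encoding is lossless: the full per-cell vector can be rebuilt.
identical(inverse.rle(runs), fp)
```

Whether drawing one rectangle per run rather than one per cell actually speeds up plotting would still need benchmarking.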
It looks like I might be able to use an alternative implementation of `fingerprint`:
fingerprint <- function(x){
  x_class <- class(x)
  # is the data missing?
  ifelse(is.na(x),
         # yes? Leave as is NA
         yes = NA,
         # no? Set that value to the class of this cell.
         no = glue::glue_collapse(x_class,
                                  sep = "\n")
  )
} # end function
fingerprint_2 <- function(x){
  # is the data missing?
  x_class <- class(x)
  dplyr::if_else(condition = is.na(x),
                 # yes? Leave as is NA
                 true = NA_character_,
                 # no? Set that value to the class of this cell.
                 false = as.character(glue::glue_collapse(x_class,
                                                          sep = "\n"))
  )
} # end function
create_vec <- function(size){
  vec <- runif(size)
  vec[sample(vctrs::vec_seq_along(vec), size = round(size/10))] <- NA
  vec
}
fingerprint(create_vec(100))
#> [1] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [8] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [15] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [22] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [29] NA "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [36] "numeric" "numeric" "numeric" NA NA "numeric" "numeric"
#> [43] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [50] "numeric" "numeric" NA "numeric" "numeric" "numeric" "numeric"
#> [57] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [64] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [71] NA NA "numeric" "numeric" NA "numeric" "numeric"
#> [78] "numeric" "numeric" "numeric" "numeric" NA "numeric" NA
#> [85] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [92] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [99] "numeric" NA
fingerprint_2(create_vec(100))
#> [1] NA "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [8] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [15] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [22] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" NA
#> [29] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [36] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" NA
#> [43] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [50] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [57] "numeric" NA "numeric" "numeric" "numeric" "numeric" "numeric"
#> [64] "numeric" "numeric" "numeric" "numeric" NA "numeric" "numeric"
#> [71] NA NA NA "numeric" "numeric" "numeric" NA
#> [78] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [85] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" NA
#> [92] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [99] "numeric" "numeric"
bm1 <- bench::press(
  size = c(1e2, 1e3, 1e4, 1e5, 1e6),
  {
    vec <- create_vec(size)
    bench::mark(
      new = fingerprint_2(vec),
      old = fingerprint(vec)
    )
  }
)
#> Running with:
#> size
#> 1 100
#> 2 1000
#> 3 10000
#> 4 100000
#> 5 1000000
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
plot(bm1)
#> Loading required namespace: tidyr
summary(bm1)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 10 x 7
#>    expression    size      min   median `itr/sec` mem_alloc `gc/sec`
#>    <bch:expr>   <dbl> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#>  1 new            100  53.27µs   62.7µs  13557.     56.03KB    12.0
#>  2 old            100  45.88µs  50.67µs  17296.      18.5KB     7.90
#>  3 new           1000  99.45µs 133.53µs   6725.     63.19KB     7.97
#>  4 old           1000 157.55µs 186.07µs   5136.     50.97KB     4.00
#>  5 new          10000 769.07µs 917.66µs    899.    625.69KB     9.99
#>  6 old          10000   1.68ms   1.97ms    462.    504.48KB     3.98
#>  7 new         100000   5.49ms   6.57ms    136.       6.1MB    16.0
#>  8 old         100000  15.56ms  18.01ms     51.2     4.92MB     5.91
#>  9 new        1000000  61.29ms  71.12ms     11.3    61.04MB    28.3
#> 10 old        1000000 151.73ms 155.05ms      6.44   49.21MB     4.83
Created on 2021-05-28 by the reprex package (v2.0.0)
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.0.5 (2021-03-31)
#> os macOS Big Sur 10.16
#> system x86_64, darwin17.0
#> ui X11
#> language (EN)
#> collate en_AU.UTF-8
#> ctype en_AU.UTF-8
#> tz Australia/Brisbane
#> date 2021-05-28
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] standard (@0.2.1)
#> backports 1.2.1 2020-12-09 [1] standard (@1.2.1)
#> beeswarm 0.3.1 2021-03-07 [1] CRAN (R 4.0.2)
#> bench 1.1.1 2020-01-13 [1] CRAN (R 4.0.2)
#> cli 2.5.0 2021-04-26 [1] CRAN (R 4.0.2)
#> colorspace 2.0-0 2020-11-11 [1] standard (@2.0-0)
#> crayon 1.4.1 2021-02-08 [1] CRAN (R 4.0.2)
#> curl 4.3 2019-12-02 [1] standard (@4.3)
#> DBI 1.1.1 2021-01-15 [1] CRAN (R 4.0.2)
#> digest 0.6.27 2020-10-24 [1] standard (@0.6.27)
#> dplyr 1.0.6 2021-05-05 [1] CRAN (R 4.0.2)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.0.2)
#> evaluate 0.14 2019-05-28 [1] standard (@0.14)
#> fansi 0.4.2 2021-01-15 [1] CRAN (R 4.0.2)
#> farver 2.1.0 2021-02-28 [1] CRAN (R 4.0.2)
#> fs 1.5.0 2020-07-31 [1] standard (@1.5.0)
#> generics 0.1.0 2020-10-31 [1] standard (@0.1.0)
#> ggbeeswarm 0.6.0 2017-08-07 [1] CRAN (R 4.0.2)
#> ggplot2 3.3.3 2020-12-30 [1] CRAN (R 4.0.2)
#> glue 1.4.2 2020-08-27 [1] standard (@1.4.2)
#> gtable 0.3.0 2019-03-25 [1] standard (@0.3.0)
#> highr 0.8 2019-03-20 [1] standard (@0.8)
#> htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.0.2)
#> httr 1.4.2 2020-07-20 [1] standard (@1.4.2)
#> knitr 1.33 2021-04-24 [1] CRAN (R 4.0.2)
#> lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.0.2)
#> magrittr 2.0.1 2020-11-17 [1] standard (@2.0.1)
#> mime 0.10 2021-02-13 [1] CRAN (R 4.0.2)
#> munsell 0.5.0 2018-06-12 [1] standard (@0.5.0)
#> pillar 1.6.1 2021-05-16 [1] CRAN (R 4.0.2)
#> pkgconfig 2.0.3 2019-09-22 [1] standard (@2.0.3)
#> profmem 0.6.0 2020-12-13 [1] CRAN (R 4.0.2)
#> purrr 0.3.4 2020-04-17 [1] standard (@0.3.4)
#> R6 2.5.0 2020-10-28 [1] standard (@2.5.0)
#> reprex 2.0.0 2021-04-02 [1] CRAN (R 4.0.2)
#> rlang 0.4.11 2021-04-30 [1] CRAN (R 4.0.2)
#> rmarkdown 2.8 2021-05-07 [1] CRAN (R 4.0.2)
#> rstudioapi 0.13 2020-11-12 [1] standard (@0.13)
#> scales 1.1.1 2020-05-11 [1] standard (@1.1.1)
#> sessioninfo 1.1.1 2018-11-05 [1] standard (@1.1.1)
#> stringi 1.5.3 2020-09-09 [1] standard (@1.5.3)
#> stringr 1.4.0 2019-02-10 [1] standard (@1.4.0)
#> styler 1.4.1 2021-03-30 [1] CRAN (R 4.0.2)
#> tibble 3.1.2 2021-05-16 [1] CRAN (R 4.0.2)
#> tidyr 1.1.3 2021-03-03 [1] CRAN (R 4.0.2)
#> tidyselect 1.1.0 2020-05-11 [1] standard (@1.1.0)
#> utf8 1.2.1 2021-03-12 [1] CRAN (R 4.0.2)
#> vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.0.2)
#> vipor 0.4.5 2017-03-22 [1] CRAN (R 4.0.2)
#> withr 2.4.2 2021-04-18 [1] CRAN (R 4.0.3)
#> xfun 0.23 2021-05-15 [1] CRAN (R 4.0.2)
#> xml2 1.3.2 2020-04-23 [1] standard (@1.3.2)
#> yaml 2.2.1 2020-02-01 [1] standard (@2.2.1)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
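Reading the medians off the `summary(bm1)` table, the relative speedup of `fingerprint_2` over `fingerprint` can be worked out with a few lines of base R. The numbers are copied by hand from that single run and converted to microseconds, so treat this as a rough sketch rather than a fresh measurement:

```r
# Median timings copied from the bench::press() output, in microseconds.
medians <- data.frame(
  size = c(1e2, 1e3, 1e4, 1e5, 1e6),
  new  = c(62.7, 133.53, 917.66, 6570, 71120),   # fingerprint_2 (dplyr::if_else)
  old  = c(50.67, 186.07, 1970, 18010, 155050)   # fingerprint (ifelse)
)

# A ratio above 1 means the new implementation is faster at that size.
medians$speedup <- medians$old / medians$new
medians
```

On these numbers the new version is slightly slower at 100 elements and roughly 1.4 to 2.7 times faster from 1,000 elements upward, while allocating somewhat more memory at every size.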
After some discussion with Mike, here are some ways to speed up visdat:
`fingerprint`
- change so that I don't `paste` in every element (minor speedup)
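A minimal sketch of what "not pasting in every element" could look like, assuming the class label only needs to be built once per vector and can then be repeated. `fingerprint_once()` is a hypothetical name used here for illustration, not the function that ended up in visdat:

```r
# Hypothetical base-R variant: build the class label a single time,
# repeat it for every cell, then blank out the missing cells.
fingerprint_once <- function(x) {
  # Multi-class objects (e.g. c("POSIXct", "POSIXt")) are joined with newlines.
  class_label <- paste(class(x), collapse = "\n")
  out <- rep(class_label, length(x))
  out[is.na(x)] <- NA_character_
  out
}

fingerprint_once(c(1.5, NA, 3))
```

This avoids both the vectorised `ifelse()` call and any per-element string work. How much it helps in practice would need to be measured.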