Onewayfast #328

Merged 3 commits on Jul 29, 2019

Robinlovelace
Member

No description provided.

@Robinlovelace
Member Author

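The benchmark below shows substantial performance improvements: by median time, od_id_max_min() (118 ms) is roughly 4 times faster than od_id_order() (505 ms), and od_id_szudzik() (164 ms) is roughly 3 times faster, although both allocate more memory.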

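library(stplanr)  # assumed: package providing the od_id_*() functions from this PR
n <- 1000         # assumption: the value of n behind the timings below is not shown
ids <- as.character(runif(n, 1e4, 1e7 - 1))
# benchmark of methods: all n^2 ordered pairs of the random IDs
x <- data.frame(id1 = rep(ids, times = n),
                id2 = rep(ids, each = n),
                val = 1,
                stringsAsFactors = FALSE)
bench::mark(
  check = FALSE,  # results differ in type (data frame vs double), so skip equality checks
  od_id_order(x),
  od_id_szudzik(x$id1, x$id2),
  od_id_max_min(x$id1, x$id2)
)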
# A tibble: 3 x 13
  expression                      min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result memory time 
  <bch:expr>                  <bch:t> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list> <list> <lis>
1 od_id_order(x)              504.8ms  505ms      1.98    30.9MB     1.98     1     1      505ms <df[,… <df[,… <bch…
2 od_id_szudzik(x$id1, x$id2) 158.6ms  164ms      6.11   145.8MB    12.2      4     8      655ms <dbl … <df[,… <bch…
3 od_id_max_min(x$id1, x$id2)  84.7ms  118ms      7.43    84.8MB    11.9      5     8      673ms <dbl … <df[,… <bch…
# … with 1 more variable: gc <list>
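For readers comparing the methods above, the two faster ones both reduce an unordered pair of IDs to a single number. The sketch below shows two such keys, assuming the character IDs are first mapped to shared integer codes: Szudzik's pairing function and a triangular-number index, each applied to the larger and smaller code. `id_codes()`, `szudzik_unordered()` and `max_min_unordered()` are hypothetical helpers for illustration, not the package's `od_id_szudzik()` or `od_id_max_min()` code.

```r
# Hypothetical helpers (not the package's od_id_szudzik()/od_id_max_min()):
# two ways to turn an unordered pair of IDs into one number.

# Map IDs from both columns to shared integer codes starting at 0
id_codes <- function(id1, id2) {
  stopifnot(length(id1) == length(id2))
  lvls <- unique(c(id1, id2))
  list(a = match(id1, lvls) - 1, b = match(id2, lvls) - 1)
}

# Szudzik's pairing applied to (max, min): hi^2 + hi + lo
szudzik_unordered <- function(id1, id2) {
  d <- id_codes(id1, id2)
  hi <- pmax(d$a, d$b)
  lo <- pmin(d$a, d$b)
  hi * hi + hi + lo
}

# Triangular-number index of (max, min): hi * (hi + 1) / 2 + lo
max_min_unordered <- function(id1, id2) {
  d <- id_codes(id1, id2)
  hi <- pmax(d$a, d$b)
  lo <- pmin(d$a, d$b)
  hi * (hi + 1) / 2 + lo
}

id1 <- c("A", "B", "A", "C")
id2 <- c("B", "A", "C", "A")
szudzik_unordered(id1, id2)  # 2 2 6 6
max_min_unordered(id1, id2)  # 1 1 3 3
```

Taking `pmax()`/`pmin()` first is what makes A->B and B->A share a key. Both keys are computed as doubles, so they stay exact only while `hi^2` is below 2^53, i.e. up to roughly 9.4e7 distinct IDs.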

Robinlovelace requested a review from mem48 on July 29, 2019 at 11:46
@Robinlovelace
Member Author

Heads-up @mem48: this PR contains a general refactoring of the od_id functions and an implementation of the fastest solution from this Stack Overflow answer: https://stackoverflow.com/a/57236658/1694378

Does this look good to you? I've also created a new function, od_oneway(), that uses this approach; it seems to be around 2 times faster than onewayid() on small datasets.
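The PR does not show the body of od_oneway(), so the following is only a rough sketch of what a one-way aggregation involves, under stated assumptions: a plain data frame whose `id1`/`id2` columns hold origin and destination IDs, numeric columns holding flows that are summed across both directions, and no geometry column. `oneway_sketch()` is a hypothetical name, not the package's function.

```r
# Hypothetical sketch of one-way aggregation (not the package's od_oneway()).
# Assumes numeric columns are flows to be summed over A->B and B->A.
oneway_sketch <- function(x, id1 = "id1", id2 = "id2") {
  # Integer-code every ID that appears in either column
  lvls <- unique(c(x[[id1]], x[[id2]]))
  a <- match(x[[id1]], lvls)
  b <- match(x[[id2]], lvls)
  # Order-independent grouping key: smaller code first
  key <- paste(pmin(a, b), pmax(a, b), sep = "-")
  # Sum every numeric column within each key, in order of first appearance
  num_cols <- names(x)[vapply(x, is.numeric, logical(1))]
  sums <- rowsum(as.matrix(x[num_cols]), key, reorder = FALSE)
  # Keep the first-seen origin and destination as the label for each pair
  first <- !duplicated(key)
  data.frame(x[first, c(id1, id2)], sums,
             row.names = NULL, stringsAsFactors = FALSE)
}

flows <- data.frame(id1 = c("A", "B", "A", "C"),
                    id2 = c("B", "A", "C", "C"),
                    flow = c(1, 2, 5, 3),
                    stringsAsFactors = FALSE)
oneway_sketch(flows)
```

Here the A->B and B->A rows collapse into one A-B row with a flow of 3, while the A-C and C-C rows pass through unchanged.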

Robinlovelace merged commit c68e809 into master on Jul 29, 2019
Robinlovelace deleted the onewayfast branch on July 29, 2019 at 15:11