Summary of functions
genmeblog edited this page Apr 29, 2020 · 16 revisions
Based on article: https://atrebas.github.io/post/2019-03-03-datatable-dplyr/#addupdatedelete-columns
Full source code with results and R (through clojisr): https://github.com/genmeblog/techtest/blob/master/src/techtest/datatable_dplyr.clj
Some helper functions were created to perform certain operations; they are placed at the beginning of the code:
fn name | description |
---|---|
aggregate | aggregate dataset and add result to the given (or empty) map |
aggregate->dataset | convert result of aggregate to a dataset |
group-by-columns-or-fn-and-aggregate | group dataset by column(s) or fn and aggregate, returns dataset |
sort-by-columns-with-orders | sort-by columns with given order (:asc or :desc) |
map-v | apply fn to values of map, returns map |
None of these functions is optimized; they should be rewritten to use tech.ml.dataset internal functions. Issues have already been filed.
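To make the helper descriptions above more concrete, here is a minimal sketch of two of them in plain Clojure. The names `map-v-sketch` and `sort-rows-by`, their argument order, and the use of a sequence of row maps instead of a dataset are all assumptions made for illustration; the actual definitions are in the linked source file and may differ.

```clojure
;; Hypothetical sketches in plain Clojure. Names, argument order and the
;; row-map representation are assumptions, not the helpers from the source.

(defn map-v-sketch
  "Apply f to every value of map m, keeping the keys; returns a map."
  [f m]
  (reduce-kv (fn [acc k v] (assoc acc k (f v))) {} m))

(defn sort-rows-by
  "Sort a sequence of row maps by the given columns.
  orders holds :asc or :desc, one entry per column."
  [rows columns orders]
  (let [compare-rows
        (fn [a b]
          ;; The first column whose values differ decides the ordering;
          ;; rows equal on all columns compare as 0.
          (or (some (fn [[col order]]
                      (let [c (compare (get a col) (get b col))]
                        (when-not (zero? c)
                          (if (= order :desc) (- c) c))))
                    (map vector columns orders))
              0))]
    (sort compare-rows rows)))

(map-v-sketch inc {:a 1 :b 2})
;; => {:a 2, :b 3}

(sort-rows-by [{:V1 1 :V2 3} {:V1 2 :V2 4} {:V1 1 :V2 5}]
              [:V1 :V2]
              [:asc :desc])
;; => ({:V1 1, :V2 5} {:V1 1, :V2 3} {:V1 2, :V2 4})
```

Working on row maps keeps the sketch free of library dependencies; a version operating on a dataset would presumably go through tech.ml.dataset's own sorting and column functions instead.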
Dataset used in all snippets.
(def DS (ds/name-values-seq->dataset {:V1 (take 9 (cycle [1 2]))
                                      :V2 (range 1 10)
                                      :V3 (take 9 (cycle [0.5 1.0 1.5]))
                                      :V4 (take 9 (cycle [\A \B \C]))}))
(class DS)
;; => tech.ml.dataset.impl.dataset.Dataset
DS
;; => _unnamed [9 4]:
;; | :V1 | :V2 |    :V3 | :V4 |
;; |-----+-----+--------+-----|
;; |   1 |   1 | 0.5000 |   A |
;; |   2 |   2 |  1.000 |   B |
;; |   1 |   3 |  1.500 |   C |
;; |   2 |   4 | 0.5000 |   A |
;; |   1 |   5 |  1.000 |   B |
;; |   2 |   6 |  1.500 |   C |
;; |   1 |   7 | 0.5000 |   A |
;; |   2 |   8 |  1.000 |   B |
;; |   1 |   9 |  1.500 |   C |
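The column values in the printout above follow directly from the `cycle`, `take` and `range` expressions. The sketch below evaluates them in plain Clojure, without the dataset library, and zips them into row maps; the names `v1` to `v4` and `rows` are introduced here for illustration only.

```clojure
;; Plain Clojure, no dataset library needed. v1..v4 and rows are
;; illustrative names, not part of the original code.
(def v1 (take 9 (cycle [1 2])))
(def v2 (range 1 10))
(def v3 (take 9 (cycle [0.5 1.0 1.5])))
(def v4 (take 9 (cycle [\A \B \C])))

v1 ;; => (1 2 1 2 1 2 1 2 1)
v4 ;; => (\A \B \C \A \B \C \A \B \C)

;; Zip the four columns into one map per row.
(def rows
  (map (fn [a b c d] {:V1 a :V2 b :V3 c :V4 d}) v1 v2 v3 v4))

(count rows) ;; => 9
(first rows) ;; => {:V1 1, :V2 1, :V3 0.5, :V4 \A}
(last rows)  ;; => {:V1 1, :V2 9, :V3 1.5, :V4 \C}
```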
Operation | Code | Comments |
---|---|---|
Filter rows using indices | (ds/select-rows DS [2 3]) | |
Discard rows using indices | (ds/drop-rows DS (range 2 7)) | also remove-rows |
Filter rows using a logical expression | (ds/filter-column #(> ^long % 5) :V2 DS) <br> (ds/filter-column #{\A \C} :V4 DS) | |