
Summary of functions

genmeblog edited this page Apr 29, 2020 · 16 revisions

Functionality of tech.ml.dataset version 2.0-beta30

Based on the article: https://atrebas.github.io/post/2019-03-03-datatable-dplyr/#addupdatedelete-columns

Full source code, with results and the R equivalents (run through clojisr): https://github.com/genmeblog/techtest/blob/master/src/techtest/datatable_dplyr.clj

Some helper functions were created to perform certain operations. They are defined at the beginning of the source code:

| Function | Description |
|---|---|
| `aggregate` | aggregate a dataset and add the results to the given (or an empty) map |
| `aggregate->dataset` | convert the result of `aggregate` to a dataset |
| `group-by-columns-or-fn-and-aggregate` | group a dataset by column(s) or a function and aggregate; returns a dataset |
| `sort-by-columns-with-orders` | sort by columns with the given order (`:asc` or `:desc`) |
| `map-v` | apply a function to the values of a map; returns a map |

None of these helper functions are optimized; they should be rewritten to use tech.ml.dataset's internal functions. Issues have already been filed.
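For illustration, here is a minimal sketch of what two of these helpers might look like if written in plain Clojure over an ordinary map and a sequence of row maps instead of a `Dataset`. This is not the code from the linked source file. The argument order and the way orders are passed are assumptions.

```clojure
;; Hypothetical sketches of two helpers, written over plain Clojure data
;; (a map, and a sequence of row maps) rather than a tech.ml.dataset Dataset.
;; Names follow the table above; signatures are assumptions.

(defn map-v
  "Apply f to every value of map m; keys are unchanged."
  [f m]
  (reduce-kv (fn [acc k v] (assoc acc k (f v))) {} m))

(defn sort-by-columns-with-orders
  "Sort row maps by `cols`. `orders` holds one :asc or :desc per column;
  when omitted, every column is sorted ascending."
  ([rows cols] (sort-by-columns-with-orders rows cols (repeat :asc)))
  ([rows cols orders]
   (let [comparators (mapv (fn [col order]
                             (fn [a b]
                               (let [c (compare (get a col) (get b col))]
                                 ;; flip the sign for descending order
                                 (if (= order :desc) (- c) c))))
                           cols orders)]
     ;; the first column that distinguishes two rows decides their order;
     ;; 0 means the rows tie on every column
     (sort (fn [a b]
             (or (some (fn [cmp] (let [c (cmp a b)] (when-not (zero? c) c)))
                       comparators)
                 0))
           rows))))

(map-v inc {:a 1 :b 2})
;; => {:a 2, :b 3}

(sort-by-columns-with-orders [{:x 2 :y 0} {:x 1 :y 1} {:x 1 :y 2}]
                             [:x :y] [:asc :desc])
;; => ({:x 1, :y 2} {:x 1, :y 1} {:x 2, :y 0})
```

Building one comparator per column and taking the first non-zero result keeps multi-column sorting in a single `sort` call, rather than chaining several stable sorts.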

Dataset creation

The dataset used in all snippets:

```clojure
(require '[tech.ml.dataset :as ds])

(def DS (ds/name-values-seq->dataset {:V1 (take 9 (cycle [1 2]))
                                      :V2 (range 1 10)
                                      :V3 (take 9 (cycle [0.5 1.0 1.5]))
                                      :V4 (take 9 (cycle [\A \B \C]))}))

(class DS)
;; => tech.ml.dataset.impl.dataset.Dataset
DS
;; => _unnamed [9 4]:
;;    | :V1 | :V2 |    :V3 | :V4 |
;;    |-----+-----+--------+-----|
;;    |   1 |   1 | 0.5000 |   A |
;;    |   2 |   2 |  1.000 |   B |
;;    |   1 |   3 |  1.500 |   C |
;;    |   2 |   4 | 0.5000 |   A |
;;    |   1 |   5 |  1.000 |   B |
;;    |   2 |   6 |  1.500 |   C |
;;    |   1 |   7 | 0.5000 |   A |
;;    |   2 |   8 |  1.000 |   B |
;;    |   1 |   9 |  1.500 |   C |
```

Basic operations

Filter rows

| Operation | Code | Comments |
|---|---|---|
| Filter rows using indices | `(ds/select-rows DS [2 3])` | |
| Discard rows using indices | `(ds/drop-rows DS (range 2 7))` | also `remove-rows` |
| Filter rows using a logical expression | `(ds/filter-column #(> ^long % 5) :V2 DS)` | |
| | `(ds/filter-column #{\A \C} :V4 DS)` | |
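To make the semantics of these four calls concrete, here is a plain-Clojure sketch that applies the same selections to a vector of row maps holding the same data as `DS`. The starred functions (`select-rows*`, `drop-rows*`, `filter-column*`) are hypothetical stand-ins, not part of tech.ml.dataset. Assuming 0-based row indices, the `:V2` values shown should match the rows the dataset calls keep.

```clojure
;; Plain-Clojure sketch of the row filters above. `rows` is a vector of row
;; maps holding the same data as DS; the starred functions are hypothetical
;; stand-ins, not tech.ml.dataset functions. Row indices are 0-based.

(def rows
  ;; mapv stops at the shortest collection, so the infinite cycles are cut to 9
  (mapv (fn [v1 v2 v3 v4] {:V1 v1 :V2 v2 :V3 v3 :V4 v4})
        (cycle [1 2]) (range 1 10) (cycle [0.5 1.0 1.5]) (cycle [\A \B \C])))

(defn select-rows* [rows idxs]
  ;; keep only the rows at the given indices, in the given order
  (mapv #(nth rows %) idxs))

(defn drop-rows* [rows idxs]
  ;; keep every row whose index is NOT in idxs
  (let [drop? (set idxs)]
    (into [] (keep-indexed (fn [i row] (when-not (drop? i) row))) rows)))

(defn filter-column* [pred col rows]
  ;; keep rows whose value in column `col` satisfies pred
  (filterv #(pred (get % col)) rows))

(map :V2 (select-rows* rows [2 3]))           ;; => (3 4)
(map :V2 (drop-rows* rows (range 2 7)))       ;; => (1 2 8 9)
(map :V2 (filter-column* #(> % 5) :V2 rows))  ;; => (6 7 8 9)
(map :V2 (filter-column* #{\A \C} :V4 rows))  ;; => (1 3 4 6 7 9)
```

The set `#{\A \C}` works as a predicate because a Clojure set, called as a function, returns the element when it is a member and `nil` otherwise.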