Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

BUG insert dataset depends on INTERNAL ds order. #18

Open
awb99 opened this issue Feb 17, 2024 · 4 comments
Open

BUG insert dataset depends on INTERNAL ds order. #18

awb99 opened this issue Feb 17, 2024 · 4 comments

Comments

@awb99
Copy link

awb99 commented Feb 17, 2024

I found a weird bug when inserting a dataset via (duckdb/insert-dataset! (:conn session) ds)
Sometimes the insert fails, even though all columns are the same and all data-types are the same
as defined via (duckdb/create-table! (:conn session) ds)

From printing the initial dataset that was used to create the table, I have deducted the order in
which tml-ds has arranged the columns internally. I replicate this via (order-columns-strange ds).
When I want call insert-dataset! via a differently arranged dataset, for example: (order-columns ds),
then the insert throws an exception.

This is a weird behavior which I have not have expected.

This is the code I used to get the inserts to work / not-work:

(defn order-columns [ds]
  (tc/dataset [(:date ds)
               (:asset ds)
               (:open ds)
               (:high ds)
               (:low ds)
               (:close ds)
               (:volume ds)
               (:epoch ds)
               (:ticks ds)]))

(defn order-columns-strange [ds]
  (tc/dataset [(:open ds)
               (:epoch ds)
               (:date ds)
               (:close ds)               
               (:volume ds)
               (:high ds)
               (:low ds)
               (:ticks ds)
               (:asset ds)
               ]))
@cnuernber
Copy link
Collaborator

cnuernber commented Feb 17, 2024

Yes I haven't verified this but the duckdb insert operation should be independent of the dataset column order. Seems odd.

What exactly is the error?

@harold
Copy link
Contributor

harold commented Feb 17, 2024

Confirmed, maybe this can lead to a test case:

user> (require '[tech.v3.dataset :as ds])
nil
user> (require '[tmducken.duckdb :as duckdb])
nil
user> (duckdb/initialize! {:duckdb-home "binaries"})
Feb 17, 2024 12:01:37 PM clojure.tools.logging$eval13663$fn__13666 invoke
INFO: Attempting to load duckdb from "binaries/libduckdb.so"
true
user> (def db (duckdb/open-db))
#'user/db
user> (def conn (duckdb/connect db))
#'user/conn
user> (def ds (ds/select-columns (ds/->dataset {:a [1 2 3] :b ["x" "y" "z"]}) [:a :b]))
#'user/ds
user> ds
_unnamed [3 2]:

| :a | :b |
|---:|----|
|  1 |  x |
|  2 |  y |
|  3 |  z |
user> (duckdb/create-table! conn ds)
"_unnamed"
user> (duckdb/insert-dataset! conn ds)
3
user> (duckdb/sql->dataset conn "select * from _unnamed")
Feb 17, 2024 12:03:50 PM clojure.tools.logging$eval13663$fn__13666 invoke
INFO: Reference thread starting
:_unnamed [3 2]:

| a | b |
|--:|---|
| 1 | x |
| 2 | y |
| 3 | z |
user> (def db (duckdb/open-db)) ;; FRESH NEW DB
#'user/db
user> (def conn (duckdb/connect db))
#'user/conn
user> (duckdb/create-table! conn ds)
"_unnamed"
user> (def swizzled (ds/select-columns ds [:b :a]))
#'user/swizzled
user> swizzled
_unnamed [3 2]:

| :b | :a |
|----|---:|
|  x |  1 |
|  y |  2 |
|  z |  3 |
user> (duckdb/insert-dataset! conn swizzled)
Execution error at tmducken.duckdb/insert-dataset!$fn$check-error (duckdb.clj:321).
{:address 0x00007FF29101D730 }

@cnuernber
Copy link
Collaborator

cnuernber commented Feb 18, 2024

The work around for this is to use a prepared statement - this may take a moment to fix as it doesn't appear that duckdb exposes tableinfo from the connection to the C api at this time.

@awb99
Copy link
Author

awb99 commented Feb 22, 2024

This is a namespace that can be used to reproduce the bug I posted:

(ns notebook.playground.bardb.bug-duckdb
   (:require
   [tick.core :as t]
   [tablecloth.api :as tc]
   [tmducken.duckdb :as duckdb]))
 
 (def db (duckdb/open-db "/tmp/duckdb-test"))

 (def conn (duckdb/connect db))


(defn make-ds [v]
  (let [table-name "test"]
    (-> (tc/dataset v)
        (tc/set-dataset-name table-name))))
 
(def empty-ds (make-ds [{:date (t/now)
                         :open 0.0 :high 0.0 :low 0.0 :close 0.0
                         :volume 0.0 ; crypto volume is double.
                         :ticks 0
                         :asset "000"}]))
 
 empty-ds


(duckdb/create-table! conn empty-ds)
 
(def data-ds (make-ds [{:date (t/now)
                        :open 1.0 :high 2.0 :low 3.0 :close 4.0
                        :volume 0.0 ; crypto volume is double.
                        :ticks 0
                        :asset "1"}]))

(duckdb/insert-dataset! conn data-ds)
;; => 1

(def data2-ds (make-ds [{:asset "2"; NOTE asset is the first key
                        :date (t/now)
                        :open 1.0 :high 2.0 :low 3.0 :close 4.0
                        :volume 0.0 ; crypto volume is double.
                        :ticks 0
                        }]))
 
 (duckdb/insert-dataset! conn data2-ds)
 ;; => Execution error at tmducken.duckdb/insert-dataset!$fn$check-error (duckdb.clj:320).
 ;;    {:address 0x00007F53A4467DB0 }

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants