Poor man’s datomic. Target use case: smallish business apps.
* focus is on fact storage and retrieval. Summarization is outside of scope.
Should assertion of an already-existing fact be ignored?
There should be a way to re-assert a fact, so that the database records that it is true once again at that point in time.
http://aosabook.org/en/500L/an-archaeology-inspired-database.html
This is a good video about a poor experience with datomic, and interesting points on how datomic didn't work for them. https://www.youtube.com/watch?v=0y6QK813new
Most interesting to me is how Datomic's fully immutable history caused real problems with application development when the schema evolved: the application needed to support both the old schema and the new one. This brings up the thought that perhaps there should be a split between the "system" history and the "domain" history. For instance, if the schema changes, we need the ability to modify past facts to match the new schema without changing the "domain" fact itself. In other words, we need the ability to update the representation of a fact in the database to match application/system-level changes.
Have a look at this
https://hashrocket.com/blog/posts/modeling-polymorphic-associations-in-a-relational-database?utm_source=postgresweekly&utm_medium=email
So Datomic is just so very heavy weight, it's total overkill for small apps,
and especially for microservices. It's almost the antithesis of microservices.
But the logical model is just so much more correct than anything else. How to resolve this?
What if we had "micro-peers", where each micro-peer stores only the data related
to a single unit of code? For instance, take a list of products: could there not be
a micro-peer for just the product list, sort of similar to the full Datomic
peer, but using a snapshot of a point in time plus the changes afterwards? Could a product detail
page then read from a micro-peer that cached all product details?
Or perhaps call it a connected micro collection... basically many small lists in an application that are hooked up to changes pushed from the database. Detail pages always pull latest from server to avoid caching too much locally.
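A minimal sketch of this "connected micro collection" idea: a client-side list seeded from a point-in-time snapshot, then kept current by applying change events pushed from the database (e.g. via pg_notify). The class name, the change-tuple shape, and the use of a fact id as the snapshot watermark are all my assumptions here, not anything decided above.

```python
class MicroCollection:
    """Hypothetical micro-peer cache: snapshot + pushed changes."""

    def __init__(self, snapshot, as_of):
        # snapshot: {entity_id: {attr: value}}; as_of: last fact id in snapshot
        self.items = dict(snapshot)
        self.as_of = as_of

    def apply(self, change):
        # change: (fact_id, entity_id, attr, value); value=None means retraction
        fact_id, eid, attr, value = change
        if fact_id <= self.as_of:
            return  # already reflected in the snapshot
        if value is None:
            self.items.get(eid, {}).pop(attr, None)
        else:
            self.items.setdefault(eid, {})[attr] = value
        self.as_of = fact_id


coll = MicroCollection({1: {"product/name": "harness"}}, as_of=10)
coll.apply((11, 1, "product/price", 25))
coll.apply((12, 2, "product/name", "saddle"))
print(coll.items[1]["product/price"])  # -> 25
```

Detail pages would bypass this cache and pull latest from the server, as noted above.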
In general how to handle schema migrations?
Especially how to handle schema data type change?
From string to int?
What should happen in query that returns a property that has facts both before and after a type change?
I'm now leaning towards allowing past "facts" to be changed to a different representation. It doesn't change the "fact" as it is known in the system, just how it is represented. I suppose there could be loss of precision from the past, but for the overall good of the data model's evolution this is probably fine, as long as it's understood.
What about fact re-names?
what would happen if a browser had some facts in memory while a deploy occurred?
Could/should we push out a change notification via pg-notify?
https://news.ycombinator.com/item?id=10316872&utm_source=postgresweekly&utm_medium=email
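A sketch of the "change the representation, not the fact" idea for a string-to-int type change. The helper name and input shape are invented for illustration; the point is that the (entity, attribute, time) identity of each fact is untouched, only its stored value is rewritten, and values that can't be converted are flagged rather than silently dropped.

```python
def migrate_rate_repr(stored_values):
    """Rewrite stored string representations as ints.

    The domain fact itself (entity, attribute, point in time) is untouched --
    only its database representation changes. Unconvertible values are
    collected for manual review instead of being discarded.
    """
    migrated, failures = {}, []
    for fact_id, raw in stored_values.items():
        try:
            migrated[fact_id] = int(float(raw))  # possible loss of precision
        except ValueError:
            failures.append(fact_id)
    return migrated, failures


migrated, failures = migrate_rate_repr({1: "3", 2: "4.7", 3: "n/a"})
```

Here fact 2 loses precision (4.7 becomes 4) and fact 3 is flagged, which is exactly the "fine as long as it's understood" trade-off described above.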
What if a column schema allowed a list of allowed types for a given property?
For instance, a property storing a rate could be a positive-rate or negative-rate type.
It might also allow us to say we know a person has a middle name but it's currently unknown,
or to say a middle name could be name(text), has-no-name, or unknown.
For unknown, you might ask the user to complete the middle name data.
Or we might be able to say it's a US name, or a complex name type from another country.
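A toy sketch of that allowed-types list, with sentinel types for "known to be absent" and "unknown". The property names and sentinel spellings are placeholders; the idea is just that an assert would be validated against the property's declared type set.

```python
# Hypothetical schema: each property carries a set of allowed value types,
# including sentinels for known-absent and unknown values.
SCHEMA = {
    "person/middle-name": {"text", "has-no-name", "unknown"},
    "loan/rate": {"positive-rate", "negative-rate"},
}


def check(prop, value_type):
    """Validate a value's type against the property's allowed-types list."""
    allowed = SCHEMA.get(prop, set())
    if value_type not in allowed:
        raise ValueError(f"{value_type} not allowed for {prop}")
    return True


check("person/middle-name", "has-no-name")  # asserting a known absence is valid
```

This makes "we asked, and there is no middle name" a first-class fact, distinct from "we never asked".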
### Name ###
postfact
mea - Hawaiian for fact
### numbering/identity ###
Using a single sequence, so that every table PK is
time-sequenced, gives us the ability to query the state of something,
then also get all facts/retractions that have occurred
after that initial point-in-time state.
done.
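An in-memory sketch of what the single sequence buys us: because every fact id comes from one counter, a single id doubles as a point in time, so we can reconstruct state as of any id and then fetch everything after it. The function names and fact-tuple shape are illustrative only.

```python
from itertools import count

seq = count(1)   # stands in for the single database sequence
facts = []       # (fact_id, entity_id, attr, value)


def assert_fact(eid, attr, value):
    facts.append((next(seq), eid, attr, value))


def state_as_of(fact_id):
    """Latest value per (entity, attr) considering only facts up to fact_id."""
    state = {}
    for fid, eid, attr, value in facts:
        if fid <= fact_id:
            state[(eid, attr)] = value
    return state


def changes_after(fact_id):
    """All facts recorded after the given point in time."""
    return [f for f in facts if f[0] > fact_id]


assert_fact(1, "product/name", "harness")
assert_fact(1, "product/price", 20)
assert_fact(1, "product/price", 25)
```

`state_as_of(2)` sees the price as 20, while `changes_after(2)` returns only the later price fact: exactly the snapshot-plus-deltas read pattern described above.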
### datatypes ###
So, if the query engine knows the type of an attribute, it can query the correct column or table.
#1 Sparse columns might not have issues with concurrency that come up with table inheritance?
#2 suppose we could also union the different tables into a sparse column result set...
#3 Table Inheritance with different queries.. Seems most attractive.
facts (fid,tid,eid,pid)
facts_int (value)
facts_text (value)
facts_datetime (value)
-- Pretty sure we would need to send multiple queries for each prop type to db,
-- then combine the results in the driver. Also don't forget result reading
-- is dependent on ordering of items.
-- also don't forget that we need to view the different queries as-of a single point in time?
-- http://stackoverflow.com/questions/8909069/storing-varying-data-types-in-a-postgresql-database
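A sketch of the driver-side merge mentioned above: one query per value-type table, each filtered to the same as-of fact id, with the results combined in the driver. Because every table draws ids from the single sequence, sorting by fact id gives a globally consistent order across the per-type result sets. The row shape here is an assumption matching the facts(fid,tid,eid,pid) layout.

```python
def merge_typed_results(result_sets, as_of):
    """Combine per-type query results into one globally ordered set.

    result_sets: one list per value-type table (facts_int, facts_text, ...),
    rows shaped (fact_id, eid, pid, value). All queries must be filtered to
    the same as-of fact id so they see a single point in time.
    """
    rows = [r for rs in result_sets for r in rs if r[0] <= as_of]
    rows.sort(key=lambda r: r[0])  # single sequence => total order across tables
    return rows


ints = [(3, 1, 10, 25)]
texts = [(1, 1, 11, "harness"), (5, 2, 11, "saddle")]
merged = merge_typed_results([ints, texts], as_of=4)
```

The `as_of` filter is what addresses the "view the different queries as of a single point in time" concern: the fact at id 5 arrived after the chosen point and is excluded.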
### Relationships ###
# belongs to... for example a product belongs to a category
so category has entity id 11, attribute category/name which is a string
product/category is a property
that's an integer?
that's an entity/id (this is datomic's choice)
that's an entity/id where it's a also a category?
let's create a new product in that category
on new assert
product/category = 11
product/name = `harness`
# let's select a list of products in that category
where product/category = 11
select entity/id,
product/name
# what about a list containing both category and product information?
# should this even be allowed?
where product/category
select entity/id, (* products entity/id)
product/name
category/name
# could this be accomplished at app level...
# - pull & cache categories facts desired
# - pull products list
# name of the category is pulled from cache.
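A sketch of that app-level approach: the category facts are pulled once into a small cache, the product list is queried on its own, and the category name is filled in client-side while mapping. The data shapes are invented for illustration.

```python
# Category facts pulled once and cached (eid -> category/name).
category_cache = {11: "tack", 12: "feed"}

# Product list query result: (eid, product/name, product/category eid).
products = [
    (101, "harness", 11),
    (102, "saddle", 11),
]

# The "join" happens at the app level, against the cache.
listing = [
    {"product/name": name, "category/name": category_cache[cat]}
    for eid, name, cat in products
]
```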
********************************
# so from this query
where product/category = 3
select product/name category/name
# so we need this
[{product/id: 1 product/name:"product a" category/id: 3 category/name: "category a"}
{product/id: 2 product/name:"product b" category/id: 3 category/name: "category a"}]
# how to do this given product/category = 3
- join back onto entities->properties?
create table properties ( -- should this be called attributes?
pid bigint DEFAULT nextval('serial') primary key not null,
type text not null,
name text not null );
create table entities (
eid bigint DEFAULT nextval('serial') primary key not null,
meta text not null );
create table current_facts (
cfid bigint DEFAULT nextval('serial') primary key not null,
eid bigint not null references entities,
pid bigint not null references properties );
create table current_facts_text ( value text not null ) inherits (current_facts);
- get the property id and datatype for the product/category property
- get list of entity id's that have a property with the product/category value
- get all properties for the list of entities
$pid = 44 -- property id of product/category
select *
from current_facts_int cfi
where cfi.pid = $pid
and cfi.value = 3
to be continued...
solution a: so let's say we take a map/reduce angle on the joining..
get a list of the matching entities
where there is a joining attribute...
get a list of all unique id's from the primary result set
with only necessary attributes
then query the database for this list of joined entities + attributes
then map/reduce the appropriate attributes into the primary entities
* so this is a bit chatty with the database, but if the database
is all in memory and database sizes aren't too large it might work.
* this eventually would want to be a pg extension/module I think.
* perhaps applications could be designed to not require as much redundant
reselection of data as well... for instance when viewing a list rather than
a full requery, it keeps the local results cached.
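The solution-a steps above can be sketched as a small driver-side join: run the primary query, collect the unique joined entity ids, fetch those entities in one batched follow-up query, and merge their attributes into the primary rows. `fetch_entities` stands in for that second round-trip to the database; everything else is the map/reduce step.

```python
def mapreduce_join(primary, join_attr, fetch_entities):
    """Driver-side join: primary rows + one batched lookup of joined entities.

    primary: list of entity dicts from the first query.
    fetch_entities: callable taking a set of entity ids and returning
    {id: attrs} -- the stand-in for the second database query.
    """
    joined_ids = {row[join_attr] for row in primary if join_attr in row}
    joined = fetch_entities(joined_ids)  # one follow-up query, not N
    return [{**row, **joined.get(row.get(join_attr), {})} for row in primary]


categories = {3: {"category/name": "category a"}}
products = [
    {"product/id": 1, "product/name": "product a", "category/id": 3},
    {"product/id": 2, "product/name": "product b", "category/id": 3},
]
result = mapreduce_join(
    products, "category/id", lambda ids: {i: categories[i] for i in ids}
)
```

This produces exactly the merged row shape shown earlier (product fields plus category/name), at the cost of one extra round-trip per joined attribute.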
***************************************
# for many to many, create a relationship entity
on new assert
product-category/category = 11
product-category/product = 22
# to query the list of products in a category then? oy vey.
# how about some sort of join? implicitly on entity id ?
where product-category/category = 11
join product-category/product
select product-category/product
entity/id (* the products entity id!)
product/name
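The many-to-many query above is a two-hop walk, which can be sketched in memory: first find the relationship entities whose product-category/category matches, then follow their product-category/product values to the product entities. The flat fact-tuple representation is illustrative only.

```python
# Facts about relationship entities: (relationship eid, attr, value).
rel_facts = [
    (201, "product-category/category", 11),
    (201, "product-category/product", 22),
    (202, "product-category/category", 12),
    (202, "product-category/product", 22),
]
product_names = {22: "harness"}  # stand-in for the product entities

# Hop 1: relationship entities pointing at category 11.
rel_eids = {e for e, a, v in rel_facts
            if a == "product-category/category" and v == 11}

# Hop 2: follow product-category/product to the products themselves.
product_ids = [v for e, a, v in rel_facts
               if e in rel_eids and a == "product-category/product"]
names = [product_names[p] for p in product_ids]
```

Entity 202 links the same product to a different category and is correctly excluded by the first hop.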
One to One? (eg biological mother)
### Schema Migrations ###
How the heck to handle schema rule changes?
simple addition of row to the properties table?
What about when an attribute goes from optional to required?
does required have any meaning if we don't have "entity" type rules?
How to remove a property from database?
mark property as removed?
I guess assert needs to look up property on assert?
can you retract a fact on a removed property?
what if you re-add the same property name at later point in time?
https://news.ycombinator.com/item?id=10145933
# Should entities have type?
Tables serve this function in SQL. Should there be rules around what comprises an entity?
Datomic has no concept of entity type rules, just collection of attributes with rules about each attribute.
related
http://kevinmahoney.co.uk/articles/log-orientated-data/
http://docs.datomic.com/best-practices.html
# How to assert negatives / unknowns?
# Implementation Ideas
* Given the need for the client to see almost all data grouped by entity, might it make sense to use a sparse fill factor and cluster on entity id? This would be rather than clustering on a sequence id as primary key.
* Look into use of BRIN indexes for keys,
not as fast as BTREE but dramatically less space usage for large amounts of data
http://dba.stackexchange.com/questions/20759/is-there-a-name-for-this-database-structure/20763