MapDB4 notes
Quick and dirty notes and an intro to MapDB 4 (aka V4). They will be rewritten into documentation later. This is a brain dump; feel free to send a pull request to fix typos or clarify anything.
Production code is written in Java only. Test code and part of the build system are written in Kotlin.
There is a code generator for some source files, located in the srcGen folder.
It is run before compilation to generate .java files.
It is just Kotlin code that manipulates strings and files.
Preprocessor marks are in the source code and start with a //- comment (//-WLOCK, //-newRWLOCK etc.).
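As a rough illustration of what a string-based preprocessor of this kind can look like, here is a minimal sketch. It is written in Java for consistency with the production code (the real generator is Kotlin), and everything in it is hypothetical: the class name, the expand method, and above all the text that a mark expands to. These notes do not describe what //-WLOCK or //-newRWLOCK actually generate.

```java
import java.util.Map;
import java.util.stream.Collectors;

final class MarkExpanderSketch {

    /**
     * Replaces every line of the form "//-NAME" (ignoring indentation) with
     * expansions.get(NAME), keeping the original indentation.
     * Lines without a known mark are copied unchanged.
     */
    static String expand(String template, Map<String, String> expansions) {
        return template.lines()
                .map(line -> {
                    String trimmed = line.trim();
                    if (trimmed.startsWith("//-")) {
                        String body = expansions.get(trimmed.substring(3));
                        if (body != null) {
                            String indent = line.substring(0, line.indexOf("//-"));
                            return indent + body;
                        }
                    }
                    return line;
                })
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        String template = "void put() {\n    //-WLOCK\n    doPut();\n}";
        // The expansion text is invented for illustration only.
        System.out.println(expand(template, Map.of("WLOCK", "lock.writeLock().lock();")));
    }
}
```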
There are two Gradle test tasks:
- gradle test is a quick test run during normal development. It should finish in under 10 minutes and use less than 5 GB RAM.
- gradle testlong is a long acceptance test run before each release. It should finish in under one week, and requires 64 GB RAM and 500 GB of disk space. Use the java.io.tmpdir property to change the disk location.
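For reference, this is how Java code typically picks up java.io.tmpdir when it needs scratch space. It is a minimal sketch, not the project's test code: the class name and the "mapdb-test" file prefix are made up, and how the property is passed through Gradle to the test JVM is not covered in these notes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class TmpDirExample {

    /** Directory named by the java.io.tmpdir system property. */
    static Path tmpDir() {
        return Paths.get(System.getProperty("java.io.tmpdir"));
    }

    /** Creates a scratch file inside tmpDir(); prefix and suffix are illustrative. */
    static Path newScratchFile() throws IOException {
        Path file = Files.createTempFile(tmpDir(), "mapdb-test", ".db");
        file.toFile().deleteOnExit();
        return file;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("scratch file: " + newScratchFile());
    }
}
```

Pointing the property at a larger volume moves every file created this way, without touching the test code.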
MapDB has some dependencies (Guava, Eclipse Collections), but those will be removed before release.
MapDB is a hybrid between a database and Java on-heap collections. This section describes the internal representation of data.
Older versions used serializers. Over time their role was extended to cover hashing, comparators, array inserts etc.
The constructor BTreeMap(Serializer.LONG, Serializer.STRING) is a simple way to infer the data types.
This also allowed a great memory optimization (Long[] vs long[] for btree nodes).
However, if the Comparator is fused into the Serializer, it cements ascending order.
Over time this turned into a complicated design.
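The ordering problem can be sketched as follows. FusedSerializer is a hypothetical interface written for this note, not the real v3 Serializer. The point is only that once compare() lives inside the serializer, a map built from that serializer alone has one fixed order, and a different order has to come from a separate comparator.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Comparator;

final class FusedSerializerSketch {

    /** Hypothetical serializer that also defines the key order. */
    interface FusedSerializer<A> extends Comparator<A> {
        byte[] serialize(A value);
    }

    static final FusedSerializer<Long> LONG = new FusedSerializer<Long>() {
        @Override
        public byte[] serialize(Long value) {
            return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
        }

        @Override
        public int compare(Long a, Long b) {
            return Long.compare(a, b); // ascending order is baked in here
        }
    };

    /** Sorts a copy of the keys with the given order. */
    static Long[] sortedKeys(Comparator<Long> order, Long... keys) {
        Long[] copy = keys.clone();
        Arrays.sort(copy, order);
        return copy;
    }

    public static void main(String[] args) {
        // Order taken from the serializer itself.
        System.out.println(Arrays.toString(sortedKeys(LONG, 3L, 1L, 2L)));
        // A different order needs a comparator that is independent of the serializer.
        System.out.println(Arrays.toString(sortedKeys(Comparator.reverseOrder(), 3L, 1L, 2L)));
    }
}
```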
V4 replaces Serializer with Shaper (as in Data Shape). A single-object Serializer is now a special subcase of Shaper.
Data form was introduced to reduce the memory overhead of the heap cache (see BTreeKeySerializer in v3).
For example, a sorted array of Long numbers in a btree node is only used internally; the user does not see its content. So on heap it can be stored in many forms, as long as binary search capability is preserved:
- Object[] with a generic comparator
- long[] with some pluggable comparators
- int[] if all
- long start (first value) and byte[] deltas to save memory.
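The idea that several heap forms can sit behind the same binary search can be sketched like this. SortedKeys and its three factory methods are hypothetical names, not V4 API. The packed form assumes that each byte in deltas is an unsigned offset from start, which allows random access; the notes do not say how the deltas are actually encoded.

```java
final class KeyFormsSketch {

    /** Hypothetical view over sorted keys; any form that can implement it stays searchable. */
    interface SortedKeys {
        int size();

        long get(int index);

        /** Returns the index of key, or -(insertionPoint) - 1 if absent (same contract as Arrays.binarySearch). */
        default int search(long key) {
            int lo = 0;
            int hi = size() - 1;
            while (lo <= hi) {
                int mid = (lo + hi) >>> 1;
                long value = get(mid);
                if (value < key) lo = mid + 1;
                else if (value > key) hi = mid - 1;
                else return mid;
            }
            return -(lo + 1);
        }
    }

    /** Boxed form: one object per key. */
    static SortedKeys boxed(Long[] keys) {
        return new SortedKeys() {
            public int size() { return keys.length; }
            public long get(int i) { return keys[i]; }
        };
    }

    /** Primitive form: 8 bytes per key, no per-key objects. */
    static SortedKeys primitive(long[] keys) {
        return new SortedKeys() {
            public int size() { return keys.length; }
            public long get(int i) { return keys[i]; }
        };
    }

    /** Packed form: 1 byte per key, assuming every key is within 255 of start. */
    static SortedKeys packed(long start, byte[] deltas) {
        return new SortedKeys() {
            public int size() { return deltas.length; }
            public long get(int i) { return start + (deltas[i] & 0xFF); }
        };
    }

    public static void main(String[] args) {
        SortedKeys a = boxed(new Long[]{100L, 103L, 110L, 250L});
        SortedKeys b = primitive(new long[]{100L, 103L, 110L, 250L});
        SortedKeys c = packed(100L, new byte[]{0, 3, 10, (byte) 150});
        // All three forms hold the same keys and answer the same search.
        System.out.println(a.search(110L) + " " + b.search(110L) + " " + c.search(110L));
    }
}
```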
MapDB can also operate directly over a binary ByteBuffer. It is possible to compare keys without deserializing them.
In this case the btree node only needs a ByteBuffer offset.
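A minimal sketch of searching keys in place, assuming fixed-width 8-byte big-endian keys stored back to back. This layout and the method name are assumptions for illustration; the actual binary node format is not described in these notes. The search reads each probe with an absolute getLong, so no key objects are created and the node is fully described by an offset and a count.

```java
import java.nio.ByteBuffer;

final class ByteBufferSearchSketch {

    /**
     * Binary search over count sorted longs that start at byte position offset in buf.
     * Returns the key index, or -(insertionPoint) - 1 if the key is absent.
     */
    static int search(ByteBuffer buf, int offset, int count, long key) {
        int lo = 0;
        int hi = count - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            // Absolute read: does not move the buffer position and allocates nothing.
            long value = buf.getLong(offset + mid * Long.BYTES);
            if (value < key) lo = mid + 1;
            else if (value > key) hi = mid - 1;
            else return mid;
        }
        return -(lo + 1);
    }

    public static void main(String[] args) {
        // Hypothetical node: a 4-byte header followed by four sorted keys.
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 * Long.BYTES);
        buf.putInt(4).putLong(100L).putLong(103L).putLong(110L).putLong(250L);
        System.out.println(search(buf, 4, 4, 110L));
    }
}
```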
Older versions had this concept (here called data form) bolted onto existing code, which was a bit dirty. V4 includes it in the design from the start.
Internally MapDB can represent data in many forms, depending on performance, caching etc. In a BTree, high dir nodes could use a fast long[] held in cache, while lower dir nodes could use a binary ByteBuffer.
The user can usually access data only in heap form. For Map<Long,Address> only the Long form of the key is accessible.
However, in the future other forms should be accessible. For example, an Address value could be transferred from an mmap file directly to a Netty buffer. This would skip serialization and data copying, and bypass the CPU cache.