Skip to content

matheusfrancisco/bdb

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Implementation of Berkley DB

Run the following command:

./gendata.py | db_load -T -t btree testdata.bdb

run the following command:

cd bdb-rs && cargo run -- ../testdata.bdb

reference to the following link for more information: https://transactional.blog/building-berkeleydb/

Overview

Why BerkeleyDB?

BerkeleyDB is ubiquitous. It’s installed on every platform, or easily installable, and there’s bindings available to it in nearly every language.

BerkeleyDB is simple. It’s not a highly optimized B-Tree implementation. It isn’t tuned or written with any one data model in mind. The B-Tree is a plain key-value store.

BerkeleyDB is realistic. It’s been used in real-world applications. Features were added to it according to the needs of real software. It influenced the design and of other embedded databases. What one learns from BerkeleyDB will be applicable to other, more complicated B-Trees.

Metapage

This is the metapage

metapage

The gap start on byte 460. The entire structure is 512 bytes in size. The rest of the 4096 byte page is unspecified and unused.

12-15 byte range: Magic Number It should always be 0x53162. This magic number is also unique to the Btree data type, so requiring this magic number makes sure that the file is not one of the alternative access methods supported by BerkeleyDB (hash, record, or queue).

28-31 byte range: Free list page number This will be used once mutations are implemented on the btree.

32-35 byte range: Last page number The page number of the last page in the database. This can be used instead of relying on size_of_database_file / 4KB - 1.

88-91: Root The page number of the root of the btree. Reads of the btree should start from this page.

BTree page

btree

The rest of the page is used to hold the page entries, which is the subject of our next step. For now, we only focus on the page headers. No bolding of fields this time, as all of them will be used.

00-07: Log sequence number The sequence number of this page, which will become important once updates and WAL support is implemented.

08-11: Current page number The page number of this page.

12-15: Previous page number The leaf page containing lexicographically lower data. 0 if none, or if internal node.

16-19: Next page number The leaf page containing lexicographically higher data. 0 if none, or if internal node.

20-21: Number of items on the page The number of entries contained on this page.

22-23: High free byte page offset The space between this byte offset and the header is empty.

24: Btree tree level A leaf node is 1, and it counts up, so any value greater than 1 also means an internal page.

25: Page type The DB meta page is type 9. An internal node is type 3. A leaf node is type 5. All other values are invalid.

Page Entries

In our page format, the header is followed by an array of offsets. The array is of length entries. Each entry in the array is two bytes, and gives an offset at which the entry can be found. Each offset is specified using a base of hf_offset. Given a pointer to a page, the first entry would be at hf_offset + offsets[0] (iff entries > 0). Note that this means that offsets[0] will be greater than offsets[1], as the first key written goes to the very end of the page.

page

And i in the above after the gap is hf_offset. It can be quite helpful to extend your code which prints the header format to include the offsets array, and a hexdump from hf_offset until the end of the page.

Entry Formats

There’s three formats for entries: KeyDataEntry, InternalEntry and OverflowEntry. A KeyDataEntry contains a key or value bytestring in a leaf page. An InternalEntry contains the pointer to a page and the boundary key for an internal node in the BTree. OverflowEntry is a pointer to another page which holds a value which overflowed the current leaf page in size.

The type field for all three is in the same position for all entries. KeyDataEntry has a type code of 1. OverflowEntry has a type code of 3. Yet InternalEntry also has a type code of 1. This means that one cannot dispatch purely on the type code to determine what sort of entry it is. One also needs to maintain the knowledge of if the current page is an internal page in the BTree (then type==1 means InternalEntry), or if the current page is a leaf page (then type==1 means KeyDataEntry).

KeyDataEntry

key

00-01 The length of the variable sized data field.

02 The type code for this entry. Is 1 for KeyDataEntry.

03-…​ The variably sized data for this entry.

InternalEntry

internal 00-01 Key/data item length.

02 Page type and delete flag.

03 Padding, unused.

04-07 Page number of referenced page.

08-11 Subtree record count.

12-…​ Variable length key/data item

OverflowEntry

not used in this implementation

Read data

Algorithm

Examining the entries in an internal B-Tree page is going to yield InternalEntrys that look like:

algo

And the goal is to find the correct next page to read by comparing the target key against the data in the InternalEntry. Your goal is to find the maximum key that’s less than or equal to your target key. If get() was provided a target key of NNNN, then the goal is to identify offset 60 of PgNo: 4, Data: MMMM as the proper entry, and repeat the search on page 4 next. One could binary search this, but the offsets are maintained in order, and so I found a linear scan the easiest.

Specifically, for internal pages, iterate over the entries with a sliding window of size 2, such that one visits the offset pairs of:

(NULL, 100) (100, 80) (80, 60) (60, 40) (40, 20) (20, NULL)

in specifically that order. You’re searching to find a (left, right), where left.data <= target_key && target_key < right.data, and skip the comparison on NULL. If your target key is less than the first InternalEntry's data value, then your target key is not found in the B-Tree. Start with the root page. Repeat on each page indicated by the identified InternalEntry's PgNo until you reach a page where page.level == 1.

Once you’re on a leaf page, there will be KeyDataEntrys instead

this was ported to rust

https://transactional.blog/building-berkeleydb/point-reads

leaf

Recall that there will always be an even number of offsets and entries on a leaf page, with the first entry being a key, and the second being the data. Iterate over all the entries in pairs (a tumbling window of size 2), and if pair[0].data == target_key, then return pair[1].data as the found key. If was no matching key, then the key doesn’t exist in the B-Tree.

key: bbbbbbbbbbbbbbbbbbbb, data: bbbbbbbbbbbbbbbbbbbb
key not found
key: ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss, data: ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss

C api implemetation

Run output


Metadata {
    lsn: 4294967296,
    pgno: 0,
    magic: 340322,
    version: 9,
    pagesize: 4096,
    ec: 0,
    ty: 9,
    mf: 0,
    free: 0,
    last_pgno: 4,
    nparts: 0,
    key_count: 0,
    record_count: 0,
    flags: 0,
    uid: [
        108,
        41,
        19,
        1,
        13,
        0,
        0,
        1,
        126,
        207,
        0,
        68,
        252,
        170,
        0,
        0,
        0,
        0,
        0,
        0,
    ],
    minkey: 2,
    re_len: 0,
    re_pad: 32,
    root: 1,
    crypto_magic: 0,
    iv: 0,
    chksum: [
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
    ],
}
BTree {
    lsn: 4294967296,
    pgno: 1,
    prev_pgno: 0,
    next_pgno: 0,
    entries: 3,
    hf_offset: 3796,
    level: 2,
    ty: 3,
}
BTree {
    lsn: 4294967296,
    pgno: 2,
    prev_pgno: 0,
    next_pgno: 3,
    entries: 24,
    hf_offset: 880,
    level: 1,
    ty: 5,
}
BTree {
    lsn: 4294967296,
    pgno: 3,
    prev_pgno: 2,
    next_pgno: 4,
    entries: 10,
    hf_offset: 1056,
    level: 1,
    ty: 5,
}
BTree {
    lsn: 4294967296,
    pgno: 4,
    prev_pgno: 3,
    next_pgno: 0,
    entries: 4,
    hf_offset: 2600,
    level: 1,
    ty: 5,
}
Key-Value Data:
   Key: bbbbbbbbbbbbbbbbb
   Value: bbbbbbbbbbbbbbbbb
   Key: ccccccccccccccccccccccccccccccccccccc
   Value: ccccccccccccccccccccccccccccccccccccc
   Key: ddddddddddddddddddddddddddddddddddddddddddddddddddddddddd
   Value: ddddddddddddddddddddddddddddddddddddddddddddddddddddddddd
   Key: eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
   Value: eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
   Key: fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
   Value: fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
   Key: ggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggg
   Value: ggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggg
   Key: hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
   Value: hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
   Key: iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
   Value: iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
   Key: jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj
   Value: jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj
   Key: kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
   Value: kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
   Key: lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll
   Value: lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll
   Key: mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
   Value: mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
   Key: nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
   Value: nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
   Key: ooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo
   Value: ooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo
   Key: ppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppp
   Value: ppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppp
   Key: qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
   Value: qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
   Key: rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
   Value: rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
   Key: sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
   Value: sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
   Key: ttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt
   Value: ttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt
==================================
key: bbbbbbbbbbbbbbbbb, data: bbbbbbbbbbbbbbbbb
key not found
key: sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss, data: sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss

https://transactional.blog/building-berkeleydb/api-basics

References

https://transactional.blog/building-berkeleydb/

About

Implementation of Berkley DB in rust https://transactional.blog/building-berkeleydb/

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published