NAME LevelDB FUSE filesystem for Linux SYNOPSIS leveldbd [-v] _keystore_dir_ _tablestore_dir_ DESCRIPTION Uses FUSE to attach a key-value filesystem implemented using the ideas of the LevelDB datastore that runs Google's BigTable. OPTIONS -v verbose output HISTORY New for csc421-201, november 2019 BUGS
The traditional file system is a data store that sometimes does not scale well. It provides the service of data persistence. However, scalable datastores need to be distributed so that many processes can access the data at higher access rates; they need to be tolerant to failure, and to this end, the data is replicated. Distributed access, replicated data, and fault tolerence are easier accommodated with key-value stores.
A key-value store is a collection pairs, a key with a value. A key-value pair can be inserted into the store, over-writing if there is an existing pair with a matching key; and the value associated to a key can be retrieved.
Databases that change in time are difficult to make consistent. A database that never changes is of limited application. A compromise is a database that many ways is immutable, and otherwise grows by appending new facts. No fact is over wrong, but it could be out-dated. The collection of all facts at any moment is a subset of all facts at any future time.
This is not exactly true, as outdated facts are effectively removed as they are no long reported. However, it is really a pragmatic matter, that a "gleaning" is made to recover space used by outdated facts, once it is well-known that the fact is outdated in any current version of the data store.
LevelDB is a family of files (in the well directory) that store the key-value pairs. The files are mostly immutable (once created they never change), except some are append-only (which shares with immutable a robustness to concurrent access). The files all share a common format. They are a sequence of fixed size (64 bytes) records, with a key (first 16 bytes) then a value (48 bytes).
The key and value are placed into their respective fields, with null ('\0') padding on the right. The null byte is not allowed in the key or value itself. It is like a C string, however, the key or value can fill their entire field, and then will not have a null termination byte.
There are three levels of files,
The tables are arranged on a search path that assure that the most recent version of a key-value pair is encountered first on the path. Once a key match is found, the search terminates. The older key-value pairs of the same key are in effect discarded, without having to modify any files.
Level 0 files are search from most recently added key-value pair backwards, stopping at the first match of the key. In this way updated values hide older values. There is an open level 0 file which accepts appends. When the current file is large enough, it is closed and a new open level 0 file is created. Then the two level 0 files are searched newest to oldest, open file first, then closed file, for the first match.
Level 1 and 2 files are searched by binary search, aided by the fixed record length. The are searched from the most recently created level 1 file back towards the oldest level 1, then searching the level 2 for the first match.
Tombstone
To avoid modifications of settled data, deletes are by hiding the deleted key-value pair by a meta key-value pair that indicates the key is not present. Like the student that says "I am not here" during attendance. In this year's version, we will mark a deleted key by a key with the empty string as a value.
Key-value pairs in level 2 files will never be deleted from the file, as all files are immutable. A new level 2 file will be created without the deleted pairs, and replaces all-or-nothing the existing level 2 file. There are never any Tombstones in a level 2 table, as they are not needed. Tombstones in level 0 or level 1 files are needed to halt the search for a deleted key.
Gleaning
Updates work by placing the new value ahead of the old value in the search of pairs. Because level 0 are append-only, they might contain multiple pairs of the same key. However level 1 and level 2 files must not. The process of discarding hiden key-value pairs is gleaning.
Glean a level 0 when forming a level 1 by sweeping through discarding all but the most recent key-value pair of a key. Treat the tombstones no differently then an other pair.
Level 1's and combined with each other, or with level 2's, by gleaning all but the most recent key-value pair for a key. When merging to create a new level 2, the tombstones are dropped.
The key–value store is given the patina of a file system by running your FUSE enabled program, leveldbd, on the mount point store.
The kv-table directory can contain a number of level 0 and level 1 tables, and a single level 2 table, each as separate files. The level 0 tables are append only until a certain size, when they become immutable. The other tables are all immutable.
However, tables are deleted when it is certain they will never be accessed again.
It also memory-maps it, so that you will not have to use file operations on the table. It can be programmed as if it were an in-memory array.
author: burton rosenberg
created: 10 nov 2019
update: 21 oct 2020