LevelDB Project

by: burt rosenberg
at: university of miami
date: oct 2024

US Patent US4905110 A

Goals

Immutable Key-Value Stores

The traditional file system provides the service of data persistence. However, scalable datastores need to be distributed so that many processes can access the data at higher access rates. They need to be tolerant to failure, and to this end, the data is replicated.

Distributed access, replicated data, and fault tolerence make consistency of data a challenge. The LevelDB is a key–value store that addresses this challenge.

A key–value store is a collection pairs, a key with a value. The functions on the store are,

The LevelDB store achieves consistency and resilience by storing data in either immutable or append only files. An immutable file never changes, so one either has a copy of the data, or one does not. And having a copy supersedes not having a copy. An append only file is slightly more complicated in that there could be two copies of the file, but one is always a prefix of the other, if not the same.

Combining immutable and append-only stores for key-value stores, no fact is ever wrong. The collection of all facts grows, and is at any moment is a subset of all that facts at any future time.

LevelDB is a very durable method for storing data. It is resistant to corruption caused by concurrency issues. The LevelDB system can be used in a distributed architecture, with multiple threads, because the LevelDB files are immutable. Immutable files cannot give conflicting data, because the data never changes.

Updates replace old immutable file with new ones in a manner that is atomic, consistent, efficient, and concurrent. The datastore will never be unavailable, and its files can be copied or backed up at any point, as they never change and are never wrong.

LevelDB

This is a variant of LevelDB, and not exactly the official datastore by that name. Our project will have four levels, Key-value pairs are appended to the end of the level 0 table, with newer values superseding older values. At some point the level 0 closed against more appends and called the level 1 table and a new level 0 table is allocated.

The level 1 log structured table (by time) is eventually gleaned of repeated keys and sorted (by key) to make a level 2 table that replaces the level 1 table. Eventually the level 2 table is merged into the level 3 table, which is the final location for the data. Tombstone marks, which indicated deleted keys, are not needed in the level 3 table.

LevelDB tables
         +-------------------------+   +-------------------------+
level 3  | k,v | k,v |  ...  | k,v |   | k,v | k,v |  ...  | k,v |   . . . 
         +-------------------------+   +-------------------------+
         
                      ⇑ merge and remove tombstones
           
                +----------------------------+    
level 2         | k,v | k,v |   ...    | k,v |
                +----------------------------+
                    
                      ⇑ glean and sort
           
         +------------------------+    
level 1  | k,v | k,v | .... | k,v |  
         +------------------------+
   
                                  +---------------------
level 0                           | k,v | k,v | k,v | ....    ⇐   | k,v |
                                  +---------------------    new pairs append to
                                                            open table

        +------------------------+--------------+----------------------------------------+
        |     KEY (16 bytes)     | TS (8 bytes) |           VALUE (48 bytes)             |
        +------------------------+--------------+----------------------------------------+

The Level 0 table

The level 0 table is a sequence of fixed sized records for a total length of 64 bytes. The keys and values are not always null-terminated strings. The key and value are placed into their respective fields, with null padding on the right when the data does not fill the field. The null byte is not allowed in the key or value itself.

Note 1: Use strncpy to deal with the key and the value. Just be prepared with buffers that are 17 and 41 bytes long for the key and the value, respectively, and to put a null in the last byte (conditionally or otherwise).

Note 2: To time stamp, kv_pair->timestamp = time(NULL) ; is a hint.

#define KEY_SIZE 16
#define TIMESTAMP_SIZE 8
#define VALUE_SIZE 40
#define LEVEL0_N_PAIRS 256

struct KV_PAIR {
    char key[KEY_SIZE] ;
    time_t timestamp ; // char timestamp[TIMESTAMP_SIZE]
    char value[VALUE_SIZE] ;
} ;

struct KV_PAIR * kv_pair = (struct KV_PAIR *) leveldb_g.l0_mm ;
The level 0 table contians LEVEL0_N_PAIRS number of these structs, forming a fixed record file, that is also an in-memory array, using memory-mapped io. The function mmap_level0_file in levedldb.c sets up the mapping using the mmap system call. The result is that the virtual memory system is recruited for file I/O, by assigning the files as the backing store for a region of virtual memory.

The file has the fixed record structure of an array of structs, and can be used as such. Here is an example of finding the end of the level 0 file, that is, the first empty key.

int find_next_index() {
    int i ;
    struct KV_PAIR * kv_pair = (struct KV_PAIR *) leveldb_g.l0_mm ;
    int found = -1 ;
    for (i=0;i<LEVEL0_N_PAIRS;i++) {
        if (kv_pair[i].key[0]=='\0') {
            found = i ;
            break ;
        }
    }
    return found ;
}

Work to Accomplish

NAME
    LevelDB filesystem using FUSE
    
SYNOPSIS	
    leveldbd [-v] _keystore_dir_ _tablestore_dir_
    
DESCRIPTION
    Uses FUSE to attach a key-value filesystem implemented using the 
    ideas of the LevelDB datastore that runs Google's BigTable. 
    
OPTIONS
    -v verbose output
        
HISTORY
    New for csc421-201, november 2019

BUGS
The template should run without doing any of the key-value stuff. You need to add reading, writing and deleting of key-values.

Please run at least three windows.

  1. a window to run the daemon leveldbd;
  2. a window to the test key-value pairs in the subdirectory keystore;
  3. and a window to run make level-dump that hexdumps the file kv-store/l0.dat.
Implement the behavior,

To make things simpler, truncate silently user input when necessary. Truncate keys longer than 16 characters to 16 characters, and values longer than 40 characters to 40 characters. It might be easier to use strncpy and strnlen, and represent strings as a fixed length, not necessary null terminated, but if so, null padded as well. See strncpy.

Note that this implementation does not distinguish between a deleted key and key with an empty value. It even does not distinguish between a key with an empty value and a key that never existed, except that a key that was deleted has a deletion time stamp, and a key that never was has time stamp the current time, when listed.

Also note, the listing of the directory never shows keys (this is hard to implement).

The implementation must be by appends to the key-value store, as this pages has described. The behavior is important but correctness will also be checked by comparing your level 0 file against a reference file.

$ ls -l keystore/
total 0
-rwxrwxrwx 1 root root 0 Oct 28 02:33 leveldb
$ echo apple > keystore/a
$ cat keystore/a
apple
$ ls -l keystore/a
-rwxrwxrwx 1 root root 6 Oct 28 02:33 keystore/a
$ echo banana > keystore/b
$ cat keystore/b
banana
$ ls -l keystore/b
-rwxrwxrwx 1 root root 7 Oct 28 02:36 keystore/b
$ echo artist > keystore/a
$ cat keystore/a
artist
$ ls -l keystore/a
-rwxrwxrwx 1 root root 7 Oct 28 02:38 keystore/a
$ rm keystore/b
$ cat keystore/b
$ ls -l keystore/b
-rwxrwxrwx 1 root root 0 Oct 28 02:39 keystore/b
$ ls -l keystore/
total 0
-rwxrwxrwx 1 root root 0 Oct 28 02:45 leveldb
$ make level-dump 
hexdump -C kv-tables/l0.dat
00000000  61 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |a...............|
00000010  2c 30 1f 67 00 00 00 00  00 00 00 00 00 00 00 00  |,0.g............|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000040  61 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |a...............|
00000050  2c 30 1f 67 00 00 00 00  61 70 70 6c 65 0a 00 00  |,0.g....apple...|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000080  62 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |b...............|
00000090  e5 30 1f 67 00 00 00 00  00 00 00 00 00 00 00 00  |.0.g............|
000000a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000000c0  62 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |b...............|
000000d0  e5 30 1f 67 00 00 00 00  62 61 6e 61 6e 61 0a 00  |.0.g....banana..|
000000e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000100  61 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |a...............|
00000110  6a 31 1f 67 00 00 00 00  00 00 00 00 00 00 00 00  |j1.g............|
00000120  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000140  61 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |a...............|
00000150  6a 31 1f 67 00 00 00 00  61 72 74 69 73 74 0a 00  |j1.g....artist..|
00000160  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000180  62 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |b...............|
00000190  8b 31 1f 67 00 00 00 00  00 00 00 00 00 00 00 00  |.1.g............|
000001a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00004000

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

author: burton rosenberg
created: 10 nov 2019
update: 27 oct 2024