LevelDB Project: Level 1

by: burt rosenberg
at: university of miami
date: nov 2020
NAME
    LevelDB FUSE filesystem for Linux
    
SYNOPSIS	
    leveldbd [-v -s _l0_size_] _keystore_dir_ _tablestore_dir_
    
DESCRIPTION
    Uses FUSE to attach a key-value filesystem implemented using the 
    ideas of the LevelDB datastore that runs Google's BigTable. 
    
OPTIONS
    -s the size of the l0 store, in number of key-value pairs
    -v verbose output
        
HISTORY
    New for csc421-201, november 2019

BUGS

US Patent US4905110 A

LevelDB

LevelDB is a family of files (in the well directory) that store the key-value pairs. The files are mostly immutable (once created they never change), except some are append-only (which shares with immutable a robustness to concurrent access). The files all share a common format. They are a sequence of fixed size (64 bytes) records, with a key (first 16 bytes) then a value (48 bytes).

The key and value are placed into their respective fields, with null ('\0') padding on the right. The null byte is not allowed in the key or value itself. It is like a C string, however, the key or value can fill their entire field, and then will not have a null termination byte.

There are three levels of files,

LevelDB tables
         +-----------------------------------------------------------+ 
level 2  | k,v | k,v | k,v |           ...                     | k,v |
         +-----------------------------------------------------------+
         
           ⇑ search level 2 last
           
                +----------------------------+    
level 1         | k,v | k,v |   ...    | k,v |
                +----------------------------+
                    
           ⇑ search level 1 newest to oldest
           
                           +----------------------------+    
level 1                    | k,v | k,v |   ...    | k,v |
                           +----------------------------+ 

            ⇐ search level 0 newest first, open table first ⇐   

                  (closed)                     (open) 
         +------------------------+     +------------------
level 0  | k,v | k,v | .... | k,v |     | k,v | k,v | ....    ⇐   | k,v |
         +------------------------+     +------------------`   new pairs append to
                                                               open table



        +----------------+------------------------------------------------+
        | KEY (16 bytes) |           VALUE (48 bytes)                     |
        +----------------+------------------------------------------------+

Project: Level 1

In the previous project, the Level 0 table was implemented, and once full, the behavior was left unspecified. The leveldbd.c and leveldb.h files are changed from the previous project. Copy from class to your folder the proj7 directory, then from your proj6 file copy over your leveldb-sub.c.

Important note: For testing, the size of the level0 can be set by a run-time option.

While the true leveldb would have multiple level 0 and level 1 tables, we will only have a single table of each level. As described previously, new key-value pairs are appended onto the end of the level0 tables. Searching the level0 from newest to oldest, reads on the leveldb store report the newest binding of a key to a value.

This is durable, but slow, as an O(n) linear search is used. So at a certain size the level0 table is closed, gleaned, and sorted by key, to make a level1 table that can be binary searched.

In a full implementation, the datastore would be non-stop. A new level0 table would be opened, and the old level0 still searched, but after the new level0 table. In the meanwhile, the closed level0 will be turned into a level1, and when done, a switch-over will occur that replaces the closed level0 with its level1 version. We however will make sure our filesystem is single-threaded, and that thread will accomplish the level1 creation, then the zeroing of the level0, while making new key-value pairs wait.

L0 to L1 migrations.

Just as the level0 table is a file, named l0.dat; the level1 table is a file, named l1.dat. The rules are,
  1. The current level0 is migrated to a level1.
  2. If there was an existing level1, it is overwritten.
  3. Once the level0 is migrated, it is zeroed.

To make the level1, you must create an algorithm that does the following:

L0 to L1 Migration Algorithm

Input: An L0 table: an ordered collection of (key,value).
Output: An L1 table: a collection of (key,value) sorted on key, such that each key appears only once.

Correctness:
If a key appears in the input, it appears once in the output; its associated value
is the value from the most recent (key,value) in the input.

No key appears in the output unless it appeared in the input.

Searching in the L0/L1 tables

Search first in the level0, from newest to oldest. If nothing is found, search in the level1 table.

You must use binary search in the level1 table (for full credit). If it is not found in the level1 table as well, then the key is not in the datastore.

Hint: While it was fun and easy to use memory maps for the level0, this does not seem to be as important for level1. The records are fixed sized for a reason — that you can use pread or lseek with an easily computed offset.

Extra credit: Level 2 tables

The last level is level2. It is searched when a key is not found in either the level0 or level1. Like the level1, it is sorted by key, for binary search, and each key appears at most once. Unlike the level2, there are no tombstone entries in the table. (Note: this means that every value found in the level2 is non-empty.)

Since we only have one table of each level, and level1 to level2 migration is called when a level0 to level1 migration is needed. First the level1 is merged with an existing level2 (if any) and made the new level2. Then the level1 is zero'ed and a level0 to level1 migration follows.

L1 to L2 Migration Algorithm

Input: An L1 table: a collection of (key,value) sorted on key, such that each key appears only once.
       and perhaps an existing L2 table.
Output: An L2 table: a collection of (key,value) sorted on key, such that each key appears only once and
       no value is empty (length 0).

Correctness:

A key appears in the L2 if and only if-
   1) it appears only in the level2 table; in this case the value is the level2 value.
   2) or it appears only in the level1 table and it is not a tombstone; in this case the
      value is the level1 value.
   3) or it appears both in the level2 and level1 and it is not a tombstone; in this case
      the value is the level1 value.
Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

author: burton rosenberg
created: 5 nov 2020
update: 5 nov 2020