Mini-FAT Project

by: burt rosenberg
at: university of miami
date: nov 2017
NAME
    minifat
    
SYNOPSIS	
    minifat -s -d _directory_ [-cv] _virtual-disk_
    
DESCRIPTION
    Uses FUSE to attach the file system implemented in the minifat code to the
    named directory, with the virtual data file _virtual-disk_.

OPTIONS
   The options -s and -d are passed to FUSE, and are manditory.

    -d The directory to attach.
    -s Single threaded
    -c create the file _virtual-disk_, or zero if already exists.
    -v verbose output
          
    When zeroing the virtual disk, or running without the -c option,
    verify that the _virtual-disk_ is 1,048,576 bytes exactly, and begins
    with the character sequence "MINIFAT 1.0\0".    
        
HISTORY
    FAT/FUSE was introduced in CSC521 2006-2007; MiniFat was introduced 2012-2014; 
    further simplified in this project.
    
BUGS


US Patent US4905110 A

Goals

You are to write a simple file system which will use a regular file as the "virtual disk". The structure of the file system is based on FAT, simplified, and called mini-FAT. In order that you can work as a file system driver but not in the kernel, Linux has a technology called FUSE — Filesystem in User SpacE. Built into the kernel, FUSE will redirect file system calls to a program you write and run as a background job as a regular user program. Your program will take those redirected calls and act upon the data in the "virtual disk" file to carry out the request.

Mini-FAT description

The Mini-FAT filesystem is a simplification of the FAT file system. The FAT filesystem, although simple and robust, has a lot of complications, which don't add to the conceptual understanding of filesystems, although are very instructive to the true nature of software technology in the context of the real world. That is to say, it is surprisingly messy, and the half of it is untold in the official documentation.

The mini-FAT file system makes the following simplifications to FAT (see the FAT documentation for concepts):

✶ There are 1024 bytes in a cluster, and 1024 clusters in the file system.

✶ The super-block occupies cluster 0 and has no purpose. It can contain all zeros.

✶ The meta-data region, which consists only of the FAT table. The FAT table is an array of 32 bit integers, indexed by cluster numbers, 0 through 1023. The table occupies clusters 1 through 4.

✶ The content region, which contains files and directories; which occupies clusters 5 through 1023.

✶ Cluster 5 always contains the first cluster of the root directory.

✶ The FAT table creates a linked list of clusters in a file or directory; except that the value 0 means the cluster is free, and the value of -1 marks the end of a linked list of clusters.

✶ Zero sized files and directories have dir_firstcluster equal to zero. This is actually per official documentation.

Notes on directory entries:

The format of the directory entry is given in the following table.

MiniFat Directory Entry
name offset  size description
dir_name0118+3 filename. Empty entry if first byte is 0x0. (see notes)
dir_attr1110x0 for a regular file, 0x10 for a directory file.
dir_empty1212unused
dir_firstcluster244cluster number of first cluster in file
dir_filesize284size of file in bytes, if a regular file.

✶ The minifat directory format is a simplication of the FAT directory format. Each directory is a sequence of directory entires. Each directory entry is either in use or empty. The directory can be a chain of clusters, and any cluster in the chain can contain only empty directories, for instance, after a sequence of deletions.

✶ Filesize is ignored for directories. The length of a directory is the total length of the clusters in the cluster chain.

✶ Like the true FAT file system, filenames are 8+3 style and blank padded on the right. The file system is case-insensitive, but filenames are stored all upper case.

✶ An empty directory entry is indicated by a 0 (0x0) as the first character of the file name. (The FAT option of using a different mark to indicate an empty directory entry but other non-empty directory entries follow, is not supported.)

✶ A filename can only contain the characters "A" through "Z", "0" through "9", and "~". The "~" character is reserved to the file system and may substitute for characters other than in the permitted set.

✶ Files have no owner, permissions, or dates. Minifat may report the files are owned by root, have all permissions, and dates of unix time zero.

✶ Fields dir_firstcluster and dir_sizesize are little endian.

✶ Empty directories have dir_firstcluster equal to 0.

✶ Directories should simulate . and .. entries, but do not actually have such entiries in the directory.

Project Approach

This is a larger programming assignment. Properly breaking the code construction is essential. The project has been divided at a a stage one and strage two to guide the construction.

First you need to understand FUSE and memory mapped files. Examples programs are in the class/examples directory, for your experimentation.

In stage one development of the file system, focus on the FAT table and the root directory. Implement only the creating, deleting, and listing of files in the root directory. Ignore reading and writing to any file. Claim any file to be size 0, and discard any write requests without complaint.

The root directory is a file of sorts, and working with the root directory is a good warm-up for files. Divide further this stage to a adding and removing entries in a fixed size root directory. The implement how to expand and reduce the cluster chain of the root directory. This includes reclaiming clusters when the root directory size is reduced.

In stage two, implement reading and writing of files. This too has sub-tasks. First implement appending to a file, as well as the truncate operatin. There is already several interesting coding problems in these operations. Next, implement reading and writing at various locations in the file. Zero fill the file if the write skips over bytes.

The truncate fuseop is a good starting point for stage 2. It is a subtask to writing, it can be written and tested in isolation of any other operation including write, and it exercises much of the algorithmic complications of stage 2.

Truncate can either reduce the size of a file, or increase it. To reduce it, you should release and free unneeded clusters. To increase, you must claim and link new clusters, zero filling both the new clusters, and the slack space that might exist in the cluster at the old end of the cluster chain.

The operations read and write are probably best co-developed, although you could develop a simplified write first using hexdump to watch the effects directly in the virtual disk. Simplified versions of read and write to consider: at first ignore offset and always read and write starting at offset 0; only consider files of length 1024 or less. Once these are running you can start to implement further behavior.

Getting started with FUSE

FUSE is a kernel technology that passes file requests to a listener program. The program attaches to a directory, and file requests that descend into that directory at passed on the the program, for the program to respond. It allows us to hook our minifat file system into the operating system.

To get started with FUSE, explore class/examples/fuse-hw. The Makefile has an install target that you will have to run to install the FUSE development package. Boot on a generic kernel, as your modified kernels might have failed to load the FUSE kernel extension. The test target will compile and run the hello-world file system. This is a phoney file system where baked in there is a single file in the root directory, with a fixed content.

User programs that are requesting files in your file system will be redirected in the kernel to FUSE, and FUSE will send those requests to your program, minifat, which will be running continuously. The requests are made through callbacks to functions loaded into FUSE through the presentation of a function vector — an array of functions.

How this will work in your program is that minifat.c will implement the read and write calls, changing the contents of the fatdisk.dmg file, such that it looks like a FAT file system is mounted on the mount point.

FUSE and Files

User-program <-----> Kernel <-----> FUSE <-----> minifat.c <-----> fatdisk.dmg

            filesystem                 FUSE callbacks     filesystem

The documentation on FUSE is scarce. Here are some resources:

Getting started with Memory Mapped files

While it would be more realistic if the project required that your code handle the reading and writing to our "disk" block-by-block, that is tedious. You could use lseek to move around the file as if it were a the head of a block device and then read entire clusters. But you can also use memory-mapped IO. Since the "disk" is small (1024 clusters of 1024 bytes for a 1M disk) you can just map the file into virtual memory and read and write the file as if it were memory.

This will give you experience with memory-mapped IO, which might prove useful.

Look at class/examples/mmap-mf for an example program that shows how to set up memory-mapped IO, how to coerce the FAT table structure and Dirent structures onto the memory, at the correct (or possibly correct) locations.

Screen

The program screen allows you to run multiple windows. This allows minifat to run in the foreground so you can see diagnostic output while in another window you test the fuse mounted directory or hexdump the virtual disk, to see what is going on. You can also log in multiple times using ssh, but this means setting up an ssh demon on your host and possibly modifying your networking options.

To use screen type "screen" and accept with the space bar. All screen commands start with control-A.

When all created screens are killed, screen exits. Screen is a bit strange in that after running screen there isn't a real change that you can see. Type "screen" then ^A c to create a new screen. Then use ^A ^A to flip back and forth between the screens. This will give one screen to run minifat in the foreground and observe it diagnostics message, and another screen to exercise the file system, running ls and so forth.

See: www.gnu.org, screen.

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

author: burton rosenberg
created: 8 nov 2015
update: 23 nov 2017