Mini-FAT Project

NAME
    minifat
    
SYNOPSIS	
    minifat -s -d _directory_ [-cv] _virtual-disk_
    
DESCRIPTION
    Uses FUSE to attach the file system implemented in the minifat code to the
    named directory, with the virtual data file _virtual-disk_.

OPTIONS
   The options -s and -d are passed to FUSE, and are manditory.

    -d The directory to attach.
    -s Single threaded
    -c create the file _virtual-disk_, or zero if already exists.
    -v verbose output
          
    When zeroing the virtual disk, or running without the -c option,
    verify that the _virtual-disk_ is 1,048,576 bytes exactly, and begins
    with the character sequence "MINIFAT 1.0\0".    
        
HISTORY
    FAT/FUSE was introduced in CSC521 2006-2007; MiniFat was introduced 2012-2014; 
    further simplified in this project.
    
BUGS

Goals

You are to write a simple file system which will use a regular file as the "virtual disk". The structure of the file system is based on FAT, simplified, and called mini-FAT. In order that you can work as a file system driver but not in the kernel, Linux has a technology called FUSE — Filesystem in User SpacE. Built into the kernel, FUSE will redirect file system calls to a program you write and run as a background job as a regular user program. Your program will take those redirected calls and act upon the data in the "virtual disk" file to carry out the request.

Mini-FAT description

The Mini-FAT filesystem is a simplification of the FAT file system. The FAT filesystem, although simple and robust, has a lot of complications, which don't add to the conceptual understanding of filesystems, although are very instructive to the true nature of software technology in the context of the real world. That is to say, it is surprisingly messy, and the half of it is untold in the official documentation.

The mini-FAT file system makes the following simplifications to FAT (see the FAT documentation for concepts):

✶ The super-block occupies cluster 0 and has no purpose. It can contain all zeros.

✶ The meta-data region, which consists only of the FAT table. The FAT table is an array of 32 bit integers, indexed by cluster numbers, 0 through 1023. The table occupies clusters 1 through 4.

✶ The content region, which contains files and directories; which occupies clusters 5 through 1023.

✶ The FAT table creates a linked list of clusters in a file or directory; except that the value 0 means the cluster is free, and the value of -1 marks the end of a linked list of clusters.

✶ Zero sized files and directories have dir_firstcluster equal to zero. This is actually per official documentation.

Notes on directory entries:

✶ The minifat directory format is a simplication of the FAT directory format. Each directory is a sequence of directory entires. Each directory entry is either in use or empty. The directory can be a chain of clusters, and any cluster in the chain can contain only empty directories, for instance, after a sequence of deletions.

✶ Filesize is ignored for directories. The length of a directory is the total length of the clusters in the cluster chain.

✶ Like the true FAT file system, filenames are 8+3 style and blank padded on the right. The file system is case-insensitive, but filenames are stored all upper case.

✶ An empty directory entry is indicated by a 0 (0x0) as the first character of the file name. (The FAT option of using a different mark to indicate an empty directory entry but other non-empty directory entries follow, is not supported.)

✶ A filename can only contain the characters "A" through "Z", "0" through "9", and "~". The "~" character is reserved to the file system and may substitute for characters other than in the permitted set.

✶ Files have no owner, permissions, or dates. Minifat may report the files are owned by root, have all permissions, and dates of unix time zero.

✶ Directories should simulate . and .. entries, but do not actually have such entiries in the directory.

Project Approach

This is a larger programming assignment. Properly breaking the code construction is essential. The project has been divided at a a stage one and strage two to guide the construction.

First you need to understand FUSE and memory mapped files. Examples programs are in the class/examples directory, for your experimentation.

In stage one development of the file system, focus on the FAT table and the root directory. Implement only the creating, deleting, and listing of files in the root directory. Ignore reading and writing to any file. Claim any file to be size 0, and discard any write requests without complaint.

The root directory is a file of sorts, and working with the root directory is a good warm-up for files. Divide further this stage to a adding and removing entries in a fixed size root directory. The implement how to expand and reduce the cluster chain of the root directory. This includes reclaiming clusters when the root directory size is reduced.

In stage two, implement reading and writing of files. This too has sub-tasks. First implement appending to a file, as well as the truncate operatin. There is already several interesting coding problems in these operations. Next, implement reading and writing at various locations in the file. Zero fill the file if the write skips over bytes.

name	offset	size	description
dir_name	0	11	8+3 filename. Empty entry if first byte is 0x0. (see notes)
dir_attr	11	1	0x0 for a regular file, 0x10 for a directory file.
dir_empty	12	12	unused
dir_firstcluster	24	4	cluster number of first cluster in file
dir_filesize	28	4	size of file in bytes, if a regular file.

The truncate fuseop is a good starting point for stage 2. It is a subtask to writing, it can be written and tested in isolation of any other operation including write, and it exercises much of the algorithmic complications of stage 2.

Truncate can either reduce the size of a file, or increase it. To reduce it, you should release and free unneeded clusters. To increase, you must claim and link new clusters, zero filling both the new clusters, and the slack space that might exist in the cluster at the old end of the cluster chain.

The operations read and write are probably best co-developed, although you could develop a simplified write first using hexdump to watch the effects directly in the virtual disk. Simplified versions of read and write to consider: at first ignore offset and always read and write starting at offset 0; only consider files of length 1024 or less. Once these are running you can start to implement further behavior.

Getting started with FUSE

FUSE is a kernel technology that passes file requests to a listener program. The program attaches to a directory, and file requests that descend into that directory at passed on the the program, for the program to respond. It allows us to hook our minifat file system into the operating system.

To get started with FUSE, explore class/examples/fuse-hw. The Makefile has an install target that you will have to run to install the FUSE development package. Boot on a generic kernel, as your modified kernels might have failed to load the FUSE kernel extension. The test target will compile and run the hello-world file system. This is a phoney file system where baked in there is a single file in the root directory, with a fixed content.

User programs that are requesting files in your file system will be redirected in the kernel to FUSE, and FUSE will send those requests to your program, minifat, which will be running continuously. The requests are made through callbacks to functions loaded into FUSE through the presentation of a function vector — an array of functions.

How this will work in your program is that minifat.c will implement the read and write calls, changing the contents of the fatdisk.dmg file, such that it looks like a FAT file system is mounted on the mount point.

Getting started with Memory Mapped files

While it would be more realistic if the project required that your code handle the reading and writing to our "disk" block-by-block, that is tedious. You could use lseek to move around the file as if it were a the head of a block device and then read entire clusters. But you can also use memory-mapped IO. Since the "disk" is small (1024 clusters of 1024 bytes for a 1M disk) you can just map the file into virtual memory and read and write the file as if it were memory.

Look at class/examples/mmap-mf for an example program that shows how to set up memory-mapped IO, how to coerce the FAT table structure and Dirent structures onto the memory, at the correct (or possibly correct) locations.

Screen

To use screen type "screen" and accept with the space bar. All screen commands start with control-A.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

author: burton rosenberg
created: 8 nov 2015
update: 23 nov 2017