Mini-FAT Project

by: burt rosenberg
at: university of miami


US Patent US4905110 A

This is in-progress. Please check back as the documentation is updated

You are to write a simple file system which will use a regular file as the "virtual disk". The structure of the file system is based on FAT, simplified, and called mini-FAT. In order that you can work as a file system driver but not in the kernel, Linux has a technology called FUSE — File system in User space. Built into the kernel, FUSE will redirect file system calls to a program you write and run as a background job as a regular user program. Your program will take those redirected calls and act upon the data in the "virtual disk" file to carry out the request.

The project therefore requires that you also learn about FUSE. There is some documentation, and a sample "Hello World" filesystem, that does nothing at all except pretend to have a single file whose contents is the string "Hello World". In fact, there is no file system at all, just a bunch of functions that are connected to file system calls by the FUSE mechanism, and these functions return the string "Hello World" when the file helloworld.txt is read, and pretty much ignore any other file system operation.

Mini-FAT Description

The Mini-FAT filesystem is a simplification of the FAT file system. The FAT filesystem, although simple and robust, has a lot of complications, which don't add to the conceptual understanding of filesystems, although are very instructive to the true nature of software technology in the context of the real world. That is to say, it is surprisingly messy, and the half of it is untold in the official documentation.

The Mini-FAT filesystem is composed of a sequence of 1024 byte clusters. We fix the size of the file system as 1024 clusters. The filesystem has three sections:

  1. The super-block, which occupies cluster 0;
  2. The meta-data region, which consists only of the FAT table; which occupies clusters 1 through 4;
  3. The content/data region, which contains files and directories; which occupies clusters 5 through 1023.

The superblock is empty and can contain all zeros.

The FAT table is an array of 32 bit integers. The following values are reserved and have special meaning in the FAT table:

Files and directories are contained in the data section in a chain of clusters represented by FAT[x]=y when cluster y follows cluster x in the chain of clusters. The starting cluster for a file or directory is given in the directory entry for the file or directory, with the exception of the root directory, which is contained in no directory.

Files have lengths, given in the file length field of the directory entry for the file. Directory lengths are always a multiple of cluster size, and the length is inferred by the length of the chain of clusters, and the file length field in the directory entry is ignored.

Files are either regular files or directories. Directories are a sequence of 32-byte directory structures as described in the FAT documentation, with the differences:

In addition, the first directory entry of a directory is mandatorily always named ".." (0x2e twice) and is a pointer to the parent directory. The only field of significance of this entry is the cluster field, and all other fields are ignored. For the root directory, this entry points back to the root directory (the value 5).

Like the true FAT file system, filenames are 8+3 style and blank padded on the right. The file system is case-insensitive, but filenames are stored all upper case. Other than for the first directory entry, which contains ".." followed by 9 blanks (0x20), the only characters allowed in a filename are "A" through "Z" and "0" through "9" and "~". The "~" character is reserved to the file system and may substitute for characters other than in the permitted set. (Note that the FAT documentation talks only of what can be stored in the directory structure and is silent on what characters can be in a filename.)

The FAT file system is "little endian", but we are fortunately writing for a little endian architecture and do not need to reverse the byte order going to and from the disk image. The program class/endian.c tests for endian-ness and also the length of various data types. It is convenient to work with 32 bit integers, however, the C language specification does not guarantee any particular bit lengths except that byte is 8 bits, and each "larger" type has at least as many bits as does a "smaller" type.

The greatest of care assumes nothing, and types such as u_int32 are introduced for working programmers that need to know how many bits are in their integer types. Aside from that, types long and longlong have been introduced, and there are several models as to which of these maps to 64 bits and which to 32 bits.

Creating an empty filesystem

Your filesystem will reside inside a regular file in the normal file system. An initialized file system can consist of a sequence of 1024 1024-byte blocks, all zero-filled. You can create such a file using:
    dd if=/dev/zero of=mydisk.dmg bs=1024 count=1024

To view the contents of your "virtual disk", use the command:

    hexdump -C mydisk.dmg
See man hexdump.

To read and write to the virtual disk, you can use fopen to open the file, and lseek to move to the cluster (or byte) you wish to read or write. As an option, you can try "memory-mapped I/O". See man mmap, using a zero value for both the desired mapping address and the MAP_FIXED flag. This will, for instance, allow you to connect an array of type uint_32 fat[1024] to the 1024-th byte of the mapping, and thereby allow access to the FAT table using simple array syntax. (uint_32 is whatever typedef you need to force the use a 32-bit integer — C language does not assure any specific bit-size for short, int, or double!)

FUSE and the Hello World filesystem

Read the documentation on FUSE

and get the hello-world file system working on FUSE.

The class/proj4 directory has a helloworld_filesystem.c and various make targets that should implement the hello-world file system on your Ubuntu installation. You will need the FUSE developer tools installed, and there is a make target to help you with the install. Compile and run. Then ls the fusemount directory and cat the fusemount/hello file. If this all works, then you have accomplished this step.

Test Driven Development

I will try to make TDD files available, and a test driver, to help you target what a basic implementation should accomplish. To be done ...

Hints

The longest journey begins with the first step. It is often hard to figure out how to begin an implementation. The idea is to begin by implementing as little as possible, compiling it, and testing it. It is often necessary to implement "provisional code" that you know will be thrown away later in the development process.

As a possible plan for implementation:

You should know by now that the successful developer:

Update history: