Mini-FAT Project

by: burt rosenberg
at: university of miami
date: nov 2018
NAME
    minifat
    
SYNOPSIS	
    minifat -s -d _directory_ [-cv] _virtual-disk_
    
DESCRIPTION
    Uses FUSE to attach the file system implemented in the minifat code to the
    named directory, with the virtual data file _virtual-disk_.

OPTIONS
   The options -s and -d are passed to FUSE, and are manditory.

    -d The directory to attach.
    -s Single threaded
    -c create the file _virtual-disk_, or zero if already exists.
    -v verbose output
          
    When zeroing the virtual disk, or running without the -c option,
    verify that the _virtual-disk_ is 1,048,576 bytes exactly, and begins
    with the character sequence "MINIFAT 1.0\0".    
        
HISTORY
    FAT/FUSE was introduced in CSC521 2006-2007; MiniFat was introduced 2012-2014; 
    and further simplified in 2017.
    
BUGS


US Patent US4905110 A

Goals

You are to write a simple file system which will use a regular file as the "virtual disk". The structure of the file system is based on FAT, simplified, and called mini-FAT. In order that you can work as a file system driver but not in the kernel, Linux has a technology called FUSE — Filesystem in User SpacE. Built into the kernel, FUSE will redirect file system calls to a program you write and run as a background job as a regular user program. Your program will take those redirected calls and act upon the data in the "virtual disk" file to carry out the request.

Mini-FAT description

The Mini-FAT filesystem is a simplification of the FAT file system. The FAT filesystem, although simple and robust, has a lot of complications, which don't add to the conceptual understanding of filesystems, although are very instructive to the true nature of software technology in the context of the real world. That is to say, it is surprisingly messy, and the half of it is untold in the official documentation.

The mini-FAT file system makes the following simplifications to FAT (see the FAT documentation for concepts):

✶ There are 1024 bytes in a cluster, and 1024 clusters in the file system.

✶ The super-block occupies cluster 0 and contains the string "MINIFAT 1.0" followed by all nulls.

✶ The meta-data region, consists of only the FAT table, and occupies clusters 1 through 4.

✶ The table is an array or 1024 32-bit integers, indexed by cluster number.

✶ The content region, which contains files and directories, which occupies clusters 5 through 1023.

✶ Cluster 5 always contains the first cluster of the root directory.

✶ The special values in the Fat table are: 0, the cluster is free; and -1, this is the last cluster in the chain.

✶ The FAT table bad cluster mark is not supported.

✶ Directory filename byte-zero values of 0xE5 and 0x05 are not supported.

✶ Directory filename byte-zero value 0x00 is used to mark an empty but possibly non-last directory entry.

✶ The directory entry is modified, dropping some fields and simplifying others.

✶ We do not support Microsoft's patented solution for long file names, clever though it is.

✶ Zero sized files and directories have dir_firstcluster equal to zero. This is actually per official documentation. -->

Directory Entry Format

The format of the directory entry is given in the following table.

MiniFat Directory Entry
name offset  size description
dir_name0118+3 filename. Empty entry if first byte is 0x0. (see notes)
dir_attr1110x0 for a regular file, 0x10 for a directory file.
dir_empty1212unused
dir_firstcluster244cluster number of first cluster in file
dir_filesize284size of file in bytes, if a regular file.

✶ The minifat directory format is a simplication of the FAT directory format. Each directory is a sequence of directory entires. Each directory entry is either in use or empty. The directory can be a chain of clusters, and any cluster in the chain can contain only empty directories, for instance, after a sequence of deletions.

✶ Filesize is ignored for directories. The length of a directory is the total length of the clusters in the cluster chain.

✶ Like the true FAT file system, filenames are 8+3 style and blank padded on the right. The file system is case-insensitive, but filenames are stored all upper case.

✶ An empty directory entry is indicated by a 0 (0x0) as the first character of the file name. (The FAT option of using a different mark to indicate an empty directory entry but other non-empty directory entries follow, is not supported.)

✶ A filename can only contain the characters "A" through "Z", "0" through "9", and "~". The "~" character is reserved to the file system and may substitute for characters other than in the permitted set.

✶ Files have no owner, permissions, or dates. Minifat may report the files are owned by root, have all permissions, and dates of unix time zero.

✶ Fields dir_firstcluster and dir_sizesize are little endian.

✶ Empty directories have dir_firstcluster equal to 0.

✶ The . and .. entries are not supported, but their behavior must be simulated. This is made easier in that FUSE always gives full directory paths.

Getting started with FUSE

FUSE is a kernel technology that passes file requests to a listener program. The program attaches to a directory, and file requests that descend into that directory are passed on to that program for the program to process and respond. It allows us to hook our minifat file system into the operating system.

User programs that are requesting files in your file system will be redirected in the kernel to FUSE, and FUSE will send those requests to your program, minifat, which will be running continuously. The requests are made through callbacks to functions loaded into FUSE through the presentation of a function vector — an array of functions.

How this will work in your program is that minifat.c will implement the read and write calls, changing the contents of the fatdisk.dmg file, such that it looks like a FAT file system is mounted on the mount point.

FUSE and Files

User-program <-----> Kernel <-----> FUSE <-----> minifat.c <-----> fatdisk.dmg

            filesystem                 FUSE callbacks     filesystem

To get started with FUSE, explore class/examples/fuse-hw. The Makefile has an install target that you run to install the FUSE development package. The test target will compile and run the hello-world file system. This is a phoney file system containing a single file in the root directory with a fixed content.

The documentation on FUSE is scarce. Here are some resources:

Getting started with Memory Mapped files

While it would be more realistic if the project required that your code handle the reading and writing to our "disk" block-by-block, that is tedious. You could use lseek to move around the file as if it were a the head of a block device and then read entire clusters. But you can also use memory-mapped IO. Since the "disk" is small (1024 clusters of 1024 bytes for a 1M disk) you can just map the file into virtual memory and read and write the file as if it were memory.

This will give you experience with memory-mapped IO, which might prove useful.

Look at class/examples/mmap-mf for an example program that shows how to set up memory-mapped IO, how to coerce the FAT table structure and Dirent structures onto the memory, at the (possibly) correct locations.

SSH

In virtual box, change the network from NAT to Bridged Adapter and reboot. Run ifconfig once you are again logged into the virtual image, and note the IP addresss. For instance:
ojo@csc421:~$ ifconfig
enp0s3    Link encap:Ethernet  HWaddr 08:00:27:cb:8c:17  
          inet addr:10.0.1.14  Bcast:10.0.1.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fecb:8c17/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:732 errors:0 dropped:0 overruns:0 frame:0
          TX packets:537 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:74149 (74.1 KB)  TX bytes:148893 (148.8 KB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:176 errors:0 dropped:0 overruns:0 frame:0
          TX packets:176 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1 
          RX bytes:13296 (13.2 KB)  TX bytes:13296 (13.2 KB)
The thing you care about is the 10.0.1.14. Mileage may vary. Get the number that your run of ipconfig provides. On your host run:
 ssh _username_@10.0.1.14
This should log you into your virtual machine as if it were a remote host.

Screen

The program screen allows you to run multiple windows. This allows minifat to run in the foreground so you can see diagnostic output while in another window you test the fuse mounted directory or hexdump the virtual disk, to see what is going on. You can also log in multiple times using ssh, but this means setting up an ssh demon on your host and possibly modifying your networking options.

To use screen type "screen" and accept with the space bar. All screen commands start with control-A.

When all created screens are killed, screen exits. Screen is a bit strange in that after running screen there isn't a real change that you can see. Type "screen" then ^A c to create a new screen. Then use ^A ^A to flip back and forth between the screens. This will give one screen to run minifat in the foreground and observe it diagnostics message, and another screen to exercise the file system, running ls and so forth.

See: www.gnu.org, screen.

Project Approach

This is a larger programming assignment. Properly breaking the code construction is essential.

  1. First you need to understand FUSE and memory mapped files. Examples programs are in the class/examples directory, for your experimentation.
  2. In stage one development of the file system, focus on the FAT table and the root directory. Implement only the creating, deleting, and listing of files in the root directory. Ignore reading and writing to any file. Claim any file to be size 0, and discard any write requests without complaint.
  3. The root directory is a file of sorts, and working with the root directory is a good warm-up for files. Divide further this stage to a adding and removing entries in a fixed size root directory. Then implement how to expand and reduce the cluster chain of the root directory. This includes reclaiming clusters when the root directory size is reduced.
  4. In stage two, implement reading and writing of files. Begin with the truncate fuseop. It is a subtask to writing, it can be written and tested in isolation of any other operation and it exercises much of the algorithmic complications of file reading and writing.
  5. Truncate can either reduce the size of a file, or increase it. To reduce it, you should release and free unneeded clusters. To increase, you must claim and link new clusters, zero filling both the new clusters, and the slack space that might exist in the cluster at the old end of the cluster chain.
  6. The operations read and write are probably best co-developed, although you could develop a simplified write first using hexdump to watch the effects directly in the virtual disk.
  7. Simplified versions of read and write to consider: at first ignore offset and always read and write starting at offset 0; only consider files of length 1024 or less. Once these are running you can start to implement further behavior.
Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

author: burton rosenberg
created: 8 nov 2015
update: 4 nov 2018