

The FAT is a database that associates clusters of disk space with files. It has one entry for each cluster. The first two entries are reserved for information about the disk and the FAT itself. The third entry (numbered 2) and those following are assigned to clusters of disk space, starting with the first cluster available for use by files.


FAT entries may contain a few special values. Under FAT16, 0000H means the cluster is free, that is, not in use by any file; FFF7H means the cluster contains one or more physically damaged sectors and should not be used; and any value from FFF8H through FFFFH marks the cluster as the last one in a file. For any cluster that is used by a file but is not the file's last cluster, the FAT entry contains the number of the next cluster used by the file.
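A minimal Python sketch of how a program might interpret a FAT16 entry using the special values listed above. The function name `describe_fat16_entry` is hypothetical, and for simplicity the sketch treats every other value as a next-cluster number, ignoring the few other reserved values a real implementation would have to reject.

```python
# Special FAT16 entry values, as listed in the text.
FREE = 0x0000
BAD = 0xFFF7
END_MIN, END_MAX = 0xFFF8, 0xFFFF


def describe_fat16_entry(value: int) -> str:
    """Return a short description of what a FAT16 entry value means.

    Simplification (assumption): any value that is not free, bad, or
    end-of-file is treated as the number of the file's next cluster.
    """
    if value == FREE:
        return "free"
    if value == BAD:
        return "bad cluster"
    if END_MIN <= value <= END_MAX:
        return "last cluster of file"
    return f"next cluster is {value}"
```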


Each directory, whether the root directory or a subdirectory, is also a database. A DOS directory has one main entry for each file (Windows 95 adds extra entries for long filenames). Unlike a FAT entry, which has a single field, each directory entry has multiple fields (see Figure 1). Some of them (name, extension, size, date, and time) can be displayed with the DIR command. But the FAT system relies on a field that DIR does not display: the number of the first cluster assigned to the file.
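The field layout in Figure 1 can be sketched with Python's `struct` module. The sketch assumes the classic 32-byte DOS entry (8-byte name, 3-byte extension, attribute byte, 10 reserved bytes, then time, date, first cluster, and size, all little-endian). `DIR_ENTRY` and `parse_dir_entry` are hypothetical names, and Windows 95 long-filename entries are not handled.

```python
import struct

# Assumed layout of a classic 32-byte DOS directory entry (little-endian):
# name(8) ext(3) attr(1) reserved(10) time(2) date(2) first_cluster(2) size(4)
DIR_ENTRY = struct.Struct("<8s3sB10sHHHI")


def parse_dir_entry(raw: bytes) -> dict:
    """Split one 32-byte directory entry into its fields."""
    name, ext, attr, _reserved, time, date, first_cluster, size = DIR_ENTRY.unpack(raw)
    return {
        "name": name.decode("ascii").rstrip(),  # names are space-padded
        "ext": ext.decode("ascii").rstrip(),
        "attr": attr,
        "time": time,
        "date": date,
        "first_cluster": first_cluster,  # the field DIR does not show
        "size": size,
    }
```

For example, an entry built with `DIR_ENTRY.pack(b"HELLO   ", b"TXT", 0x20, bytes(10), 0, 0, 2, 12)` is 32 bytes long and parses back to a first cluster of 2 and a size of 12.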


When a program asks the operating system for the contents of a file, the OS looks at the file's directory entry to find its first cluster. It then looks at the FAT entry for that cluster to find the next cluster, and repeats the process until it reaches the file's last cluster. In this way the OS can determine exactly which clusters belong to the file and in what order, so it can give a program any part of the file the program asks for. This linked sequence of clusters is called the file's FAT chain.
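Following the chain can be sketched in a few lines of Python, assuming the FAT has already been read into a list of FAT16 entry values indexed by cluster number. `cluster_chain` is a hypothetical helper, and its checks for free, bad, out-of-range, and looping entries are a defensive assumption rather than something the text describes.

```python
FREE, BAD, END_MIN = 0x0000, 0xFFF7, 0xFFF8


def cluster_chain(fat: list[int], first_cluster: int) -> list[int]:
    """Follow a file's FAT chain from its first cluster to its last.

    `fat[n]` is assumed to hold the FAT16 entry for cluster n.
    """
    chain = [first_cluster]
    while True:
        entry = fat[chain[-1]]
        if entry >= END_MIN:  # FFF8H-FFFFH: this was the last cluster
            return chain
        # A valid link must name a real data cluster (2 or higher), must not
        # be marked bad, and the chain cannot be longer than the FAT itself.
        if entry < 2 or entry == BAD or entry >= len(fat) or len(chain) >= len(fat):
            raise ValueError(f"broken FAT chain after cluster {chain[-1]}")
        chain.append(entry)
```

With `fat = [0xFFF8, 0xFFFF, 5, 0, 0xFFFF, 4, 0]`, a file whose directory entry names cluster 2 as its first cluster occupies clusters 2, 5, and 4, in that order.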


Under the FAT system, every file is allocated a whole number of clusters. On a 1.2GB hard disk with 32K clusters, a text file containing only the words "hello, world" may show a size of just 12 bytes in a directory listing, yet it actually occupies 32K of disk space. The unused part of a file's last cluster is called slack. A small file may have slack equal to almost an entire cluster; on average, a file has about half a cluster of slack.
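The rounding up to whole clusters can be shown with a small Python sketch; the helper names are hypothetical. Under this model a zero-byte file is allocated no clusters at all.

```python
def allocated_size(file_size: int, cluster_size: int) -> int:
    """Disk space a file occupies: its size rounded up to whole clusters."""
    clusters = -(-file_size // cluster_size)  # ceiling division
    return clusters * cluster_size


def slack(file_size: int, cluster_size: int) -> int:
    """Unused bytes in the file's last cluster."""
    return allocated_size(file_size, cluster_size) - file_size
```

For the "hello, world" file on a disk with 32K clusters, `allocated_size(12, 32 * 1024)` returns 32768 and `slack(12, 32 * 1024)` returns 32756.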


On an 850MB hard disk with 16K clusters and a typical file size of about 50K, the average slack of half a cluster (8K) amounts to about 16 percent of the data stored in files (8K out of 50K), or roughly 14 percent of the space allocated to them. One of the ways a disk compression scheme such as DriveSpace creates space is by recycling the disk's slack space for use by other files.
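The waste percentage depends on what it is measured against. This Python sketch assumes the average slack is exactly half a cluster and computes both ratios; the function name is hypothetical.

```python
def average_slack_percent(typical_file_size: float, cluster_size: float) -> dict:
    """Average slack as a percentage of file data and of allocated space.

    Assumes each file wastes, on average, half a cluster.
    """
    avg_slack = cluster_size / 2
    return {
        "of_data": 100 * avg_slack / typical_file_size,
        "of_allocated": 100 * avg_slack / (typical_file_size + avg_slack),
    }
```

For 50K files and 16K clusters, `average_slack_percent(50 * 1024, 16 * 1024)` gives 16 percent of the data and about 13.8 percent of the allocated space.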


The original FAT file system in DOS 1 used 12-bit FAT entries. (The 12-bit FAT system is still in use today for floppy disks.) DOS 2 added support for hard disks, and DOS 3 shifted to 16-bit FAT entries. The 32MB limit that became a problem in 1987 had its root in the low-level operating system calls to retrieve a disk sector, which passed the sector number as a 16-bit parameter. There are 65,536 distinct 16-bit values; multiplying 65,536 by the 512 bytes in a sector yields 32MB, so that was as far as the operating system could count sectors on a disk. DOS 4.0 removed the limit by changing the low-level sector handling to use 32-bit parameters.
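The 32MB figure is simply the number of addressable sectors times the sector size. A short Python sketch, assuming 512-byte sectors as in the text (the function name is hypothetical):

```python
SECTOR_SIZE = 512  # bytes per sector, as assumed in the text


def max_addressable_bytes(sector_number_bits: int, sector_size: int = SECTOR_SIZE) -> int:
    """Largest disk size reachable when sector numbers have the given width."""
    return (2 ** sector_number_bits) * sector_size
```

`max_addressable_bytes(16)` returns 33,554,432 bytes, which is exactly 32MB.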


The current 2GB limit is not actually an operating system problem; it is a problem with existing programs. Because 16-bit FAT entries can number only about 65,536 clusters, a disk of 2GB or larger would need clusters of 64K, or 65,536 bytes (see Figure 2). But the largest value that fits into 16 bits is 65,535, so 64K is one too large. Microsoft discovered that many programs simply assumed the number of bytes per cluster would fit into 16 bits.
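This Python sketch approximates how cluster size grows with disk size and shows why a 64K cluster breaks a 16-bit field. It assumes a FAT16 volume must have fewer than 65,536 clusters; the real ceiling is slightly lower because some entry values are reserved, so treat the results as an approximation of Figure 2 rather than a copy of it. `fat16_cluster_size` and `as_uint16` are hypothetical names.

```python
MAX_CLUSTERS = 65_536  # assumption: a FAT16 volume has fewer than 2**16 clusters


def fat16_cluster_size(volume_bytes: int) -> int:
    """Smallest power-of-two cluster size keeping the cluster count under MAX_CLUSTERS."""
    cluster = 512  # start at one sector
    while volume_bytes / cluster >= MAX_CLUSTERS:
        cluster *= 2
    return cluster


def as_uint16(value: int) -> int:
    """What a program storing `value` in an unsigned 16-bit field would keep."""
    return value & 0xFFFF
```

Under these assumptions it picks 16K clusters for an 850MB disk, 32K for a 1.2GB disk, and 64K for a 2GB disk, and `as_uint16(65536)` returns 0: a 64K cluster size squeezed into 16 bits wraps around to zero.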


Even if the problem didn't exist at 2GB, the operating system would still balk at disks of 4GB or larger. Under FAT16, the OS stores the number of sectors per cluster in a single byte of the disk parameter block that it sets up. This number must be a power of 2, and it must be less than 256, so the largest possible cluster is 128 sectors, or 64K. Since 16-bit FAT entries limit a disk to fewer than 65,536 clusters, FAT16 could accommodate only disks under 4GB (65,536 clusters of 64K each), even without the 2GB limit.
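The arithmetic behind the 4GB ceiling, as a short Python sketch assuming 512-byte sectors and the fewer-than-65,536-clusters limit of 16-bit FAT entries (the variable names are illustrative):

```python
SECTOR_SIZE = 512  # bytes per sector

# Sectors per cluster: a power of 2 that fits in one byte and is below 256.
max_sectors_per_cluster = max(2**k for k in range(9) if 2**k < 256)  # 128

# Largest cluster: 128 sectors of 512 bytes = 65,536 bytes (64K).
max_cluster_bytes = max_sectors_per_cluster * SECTOR_SIZE

# Fewer than 65,536 clusters of 64K each: the ceiling is 2**32 bytes (4GB).
fat16_ceiling = 65_536 * max_cluster_bytes
```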


With the introduction of FAT32, both the FAT entries and the sector numbering are now 32-bit. Here's what that means for you: multiplying the 4,294,967,296 distinct 32-bit values by 512 bytes per sector yields a whopping 2 terabytes (2,199,023,255,552 bytes) as the maximum possible disk size under FAT32.
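The same calculation written out as a worked equation, taking 1TB as 2^40 bytes:

```latex
2^{32}\ \text{sectors} \times 2^{9}\ \frac{\text{bytes}}{\text{sector}}
  = 2^{41}\ \text{bytes}
  = 2{,}199{,}023{,}255{,}552\ \text{bytes}
  = 2\ \text{TB}
```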