General Programming Concepts: Writing and Debugging Programs

JFS2 File Space Allocation

File space allocation is the method by which data is apportioned physical storage space in the operating system. The kernel allocates disk space to a file or directory in the form of logical blocks. A Logical block refers to the division of a file or directory contents into 512, 1024, 2048, or 4096 byte units. When a JFS2 file system is created the logical block size is specified to be one of 512, 1024, 2048, or 4096 bytes. Logical blocks are not tangible entities; however, the data in a logical block consumes physical storage space on the disk. Each file or directory consists of 0 or more logical blocks.

Full and Partial Logical Blocks

A file or directory may contain full or partial logical blocks. A full logical block contains 512, 1024, 2048, or 4096 bytes of data, depending on the file system block size specified when the JFS2 file system was created. Partial logical blocks occur when the last logical block of a file or directory contains less than file system block size of data.

For example, a JFS2 file system with a logical block size of 4096 with a file of 8192 bytes is two logical blocks. The first 4096 bytes reside in the first logical block and the following 4096 bytes reside in the second logical block. Likewise, a file of 4608 bytes consists of two logical blocks. However, the last logical block is a partial logical block containing the last 512 bytes of the file's data. Only the last logical block of a file can be a partial logical block.

JFS2 File Space Allocation

The default block size is 4096 bytes. You can specify smaller block sizes with the mkfs command during a file system's creation. Allowable fragment sizes are: 512, 1024, 2048, and 4096 bytes. You can use only one blocks size in a file system. See JFS File System Layout for more information on the file system structure.

The kernel allocates disk space so that only the last file system block of data receive a partial block allocation. As the partial block grows beyond the limits of its current allocation, additional blocks are allocated.

Block reallocation also occurs if data is added to logical blocks that represent file holes. A file hole is an "empty" logical block located prior to the last logical block that stores data. (File holes do not occur within directories.) These empty logical blocks are not allocated blocks. However, as data is added to file holes, allocation occurs. Each logical block that was not previously allocated disk space is allocated a file system block of space.

Additional block allocation is not required if existing data in the middle of a file or directory is overwritten. The logical block containing the existing data has already been allocated file system blocks.

JFS tries to maintain contiguous allocation of a file or directory's logical blocks on the disk. Maintaining contiguous allocation lessens seek time because the data for a file or directory can be accessed sequentially and found on the same area of the disk. The disk space required for contiguous allocation may not be available if it has already been written to by another file or directory.

The file system uses a bitmap called the block allocation map to record the status of every block in the file system. When the file system needs to allocate a new block, it refers to the block allocation map to identify which blocks are available. A block can only be allocated to a single file or directory at a time.

Extents

An extent is a sequence of contiguous file system blocks allocated to a JFS2 object as a unit. Large extents may span multiple allocation groups.

Every JFS2 object is represented by an i-node. I-nodes contain the expected object-specific information such as time stamps, file type (regular verses directory etcetera.) They also contain a B+ tree to record the allocation of extents.

A file is allocated in sequences of extents. An extent is a contiguous variable-length sequence of file system blocks allocated as a unit. An extent may span multiple allocation groups. These extents are indexed in a B+ tree.

There are two values needed to define an extent, the length and the address. The length is measured in units of the file system block size. 24-bit value represents the length of an extent, so an extent can range in size from 1 to 2²⁴ -1 file system blocks. Therefore the size of the maximum extent depends on the file system block size. The address is the address of the first block of the extent. The address is also in units of file system blocks, it is the block offset from the beginning of the file system.

An extent based file system combined with user-specified file system block size allows JFS2 to not have seperate support for internal fragmentation. The user can configure the file system with a small file system block size (such as 512 bytes) to minimize internal fragmentation for file systems with large numbers of small size files.

In general, the allocation policy for JFS2 tries to maximize contiguous allocation by allowing a minimum number of extents, with each extent as large and contiguous as possible. This allows for larger I/O transfer resulting in improved performance. However in some cases this is not always possible.

B+ Trees

This section describes the B+ tree data structure used for file layout. The discussion shows how generic B+ tree concepts have been adapted specifically for JFS2; it is not a tutorial on B+ tree data structure.

B+ trees were selected to help with performance of reading and writing extents, the most common operations JFS2 will have to do.

An extent allocation descriptor (xad_t structure) describes the extent and adds two more fields that are needed for representing files: an offset, describing the logical the logical byte address the extent represents, and a flags field. The xad_t structure is defined in /usr/include/j2/j2_xtree.h.

An xad structure describes two abstract ranges:

The physical range of disk blocks. This starts at file system block number addressXAD(xadp) address and extends for lengthXAD(xadp) file system blocks.
The logical range of bytes within a file. This starts at byte number offsetXAD(xadp)*(file system block size) and extends for lengthXAD(xadp)*(file system block size.)

The physical range and the logical range are both the same number of bytes long. Note that offset is stored in units of file system block size (example, a value of 3) in offset means 3 file system blocks, not three bytes. Extents within a file are always aligned on file system block size boundaries.

There will be one generic B+ tree index structure for all index objects in JFS2 except for directories. The data being indexed will depend on the object. The B+ tree is keyed by the offset of the xad of data being described by the tree. The entries are sorted by the offsets of the xad structures. An xad structure is an entry in a node of a B+ tree.

The bottom of the second section of a disk inode contains a data dscriptor which tells what is stored in the second half of the inode. The second half of the inode could contain in-line data for the file if it is small enough. If the file data won't fit in the in-line data space for the inode it will be contained in extents and the inode will contain the root node of the B+ tree. The header will indicate how many xad are in use and how many are available. Generally, the inode will contain 8 xad structures for the root of the B+ tree. If there are 8 or fewer extents for the file, then these 8 xad structures are also a leaf node of the B+ tree. They will describe the extents. Otherwise the 8 xad structures in the inode will point to either the leaves or internal nodes of the B+ tree.

Once all of the available xad structures in the inode are used, the B+ tree must be split. We will allocate 4K of disk space for a leaf node of the B+ tree. A leaf node is logically an array of xad entries with a header. The header points to the first free xad entry in the node, all xad entries following that one are also not allocated. The 8 xad entries are copied from the inode to the leaf node, the header is initialized to point to the 9th entry as the first free entry. Then we will update the root of the B+ tree into the first xad structure of the inode; this xad structure will point to the newly allocated leaf node. The offset for this new xad structure will be the offset of the first entry in the leaf node. The header in the inode will be updated to indicate that now only 1 xad is being used for the B+ tree. The header in the inode also needs to be updated to indicate the inode now contains the pure root of a B+ tree.

As new extents are added to the file, they will continue to be added to the same leaf node in the necessary order. This will continue until the leaf node fills. Once the node fills a new 4K of disk space will be allocated for another leaf node of the B+ tree. The second xad structure from the inode will be set to point to this newly allocated node.

This will continue until all 8 xad structures in the inode are filled, at which time another split of the B+ tree will occur. This split will create internal nodes of the B+ tree which are used purely to route the searches of the tree. This will allocate 4K of disk space for an internal node of the B+ tree. An internal node looks the same as a leaf node. The 8 xad entries are copied from the inode to the internal node, the header is initialized to point to the 9th entry as the first free entry. Then it will update the root of the B+ tree by making the first xad structure of the inode point to the newly allocated internal node. The header in the inode will be updated to indicate that now only 1 xad is being used for the B+ tree.

The file /usr/include/j2/j2_xtree.h describes the header for the root of the B+ tree in struct xtpage_t. The file /usr/include/j2/j2_btree.h describes the header for an internal node or a leaf node in struct btpage_t.

Related Information

Working with JFS i-nodes

Working with JFS2 i-nodes

File Creation and Removal

Using File Descriptors

JFS File System Layout