Return to lecture notes index
Septembr 19, 2013 (Lecture 8)

File System in Linux

Today we discussed two common Linux file systems, ext2 and ext3. Ext2 is, in many ways, a very traditional UNIX file system. Ext3 is essentially an extension that uses a journal to allow for faster consistency checking at boot. The journal does, of course, have significant forensics value.

To review what we discussed, please check out these materials, please first check out the "generic" introduction to traditional UNIX file systems from my 15-123 class. Then, check out the linked resources for specifics of ext2 and ext3.

A Quick, Generic View of Traditional UNIX File Systems

The operating system maintains two data structures representing the state of open files: the per-process file descriptor table and the system-wide open file table.

When a process calls open(), a new entry is created in the open file table. A pointer to this entry is stored in the process's file descriptor table. The file descriptor table is a simple array of pointers into the open file table. We call the index into the file descriptor table a file descriptor. It is this file descriptor that is returned by open(). When a process accesses a file, it uses the file descriptor to index into the file descriptor table and locate the corresponding entry in the open file table.

The open file table contains several pieces of information about each file:

Each entry in the open file table maintains its own read/write pointer for three important reasons:

One important note: In modern operating systems, the "open file table" is usually a doubly linked list, not a static table. This ensures that it is typically a reasonable size while capable of accomodating workloads that use massive numbers of files.

Session Semantics

Consider the cost of many reads or writes may to one file.

The solution is to amortize the cost of this overhead over many operations by viewing operations on a file as within a session. open() creates a session and returns a handle and close() ends the session and destroys the state. The overhead can be paid once and shared by all operations.

Consequences of Fork()ing

In the absence of fork(), there is a one-to-one mapping from the file descriptor table to the open file table. But fork introduces several complications, since the parent task's file descriptor table is cloned. In other words, the child process inherits all of the parent's file descriptors -- but new entries are not created in the system-wide open file table.

One interesting consequence of this is that reads and writes in one process can affect another process. If the parent reads or writes, it will move the offset pointer in the open file table entry -- this will affect the parent and all children. The same is of course true of operations performed by the children.

What happens when the parent or child closes a shared file descriptor?

Why clone the file descriptors on fork()?

The Ext2 file System: In practice, The Basic Linux File System

Please check out the following links for the details specific to the ext2 file system, which has been the workhorse of Linux systems for many years, and with whcih the ext3 file system is backward compatible.