File System in Linux
Today we discussed two common Linux file systems, ext2 and ext3. Ext2 is, in many ways, a very traditional UNIX file system. Ext3 is essentially an extension that uses a journal to allow for faster consistency checking at boot. The journal does, of course, have significant forensics value.To review what we discussed, please check out these materials, please first check out the "generic" introduction to traditional UNIX file systems from my 15-123 class. Then, check out the linked resources for specifics of ext2 and ext3.
A Quick, Generic View of Traditional UNIX File Systems
The operating system maintains two data structures representing the state of open files: the per-process file descriptor table and the system-wide open file table.
When a process calls open(), a new entry is created in the open file table. A pointer to this entry is stored in the process's file descriptor table. The file descriptor table is a simple array of pointers into the open file table. We call the index into the file descriptor table a file descriptor. It is this file descriptor that is returned by open(). When a process accesses a file, it uses the file descriptor to index into the file descriptor table and locate the corresponding entry in the open file table.
The open file table contains several pieces of information about each file:
- the current offset (the next position to be accessed in the file)
- a reference count (we'll explain below in the section about fork())
- the file mode (permissions),
- the flags passed into the open() (read-only, write-only, create, &c),
- a pointer to an in-RAM version of the inode (a slightly light-weight version of the inode for each open file is kept in RAM -- others are on disk), and a structure that contains pointers to all of the .
- A pointer to the structure containing pointers to the functions that implement the behaviors like read(), write(), close(), lseek(), &c on the file system that contains this file. This is the same structure we looked at last week when we discussed the file system interface to I/O devices.
Each entry in the open file table maintains its own read/write pointer for three important reasons:
- Reads by one process don't affect the file position in another process
- Write are visible to all processes, if the file pointer subsequently reaches the location of the write
- The program doesn't have to supply this information each call.
One important note: In modern operating systems, the "open file table" is usually a doubly linked list, not a static table. This ensures that it is typically a reasonable size while capable of accomodating workloads that use massive numbers of files.
Session Semantics
Consider the cost of many reads or writes may to one file.
- Each operation could require pathname resolution, protection checking, &c.
- Implicit information, such as the current location (offset) into the file must be maintained,
- Long term state must also be maintained, especially in light of the fact that several processes using the file might require different view.
Caches or buffers may need to be initialized The solution is to amortize the cost of this overhead over many operations by viewing operations on a file as within a session. open() creates a session and returns a handle and close() ends the session and destroys the state. The overhead can be paid once and shared by all operations.
Consequences of Fork()ing
In the absence of fork(), there is a one-to-one mapping from the file descriptor table to the open file table. But fork introduces several complications, since the parent task's file descriptor table is cloned. In other words, the child process inherits all of the parent's file descriptors -- but new entries are not created in the system-wide open file table.
One interesting consequence of this is that reads and writes in one process can affect another process. If the parent reads or writes, it will move the offset pointer in the open file table entry -- this will affect the parent and all children. The same is of course true of operations performed by the children.
What happens when the parent or child closes a shared file descriptor?
- remember that open file table entries contain a reference count.
- this reference count is decremented by a close
- the file's storage is not reclaimed as long as the reference count is non-zero indicating that an open file entry to it exists
- once the reference count reaches zero, the storage can be reclaimed
- i.e., "rm" may reduce the link count to 0, but the file hangs around until all "opens" are matched by "closes" on that file.
Why clone the file descriptors on fork()?
- it is consistent with the notion of fork creating an exact copy of the parent
- it allows the use of anonymous files by children. The never need to know the names of the files they are using -- in fact, the files may no longer have names.
- The most common use of this involves the shell's implementation of I/O redirection (< and >). Remember doing this?
The Ext2 file System: In practice, The Basic Linux File System
Please check out the following links for the details specific to the ext2 file system, which has been the workhorse of Linux systems for many years, and with whcih the ext3 file system is backward compatible.