fork()
In order to create a process, we make use of the fork() call. Traditionally, fork is itself a system call. These days, the actual system call may or may not be fork(), but this isn't important. What is important is that when a process calls fork(), it creates a nearly exact clone of itself. One process goes into the phone booth -- and two step out.
The original process is known as the parent process. The new one is known as the child. The parent and the child are virtually alike. For our purposes, there are only two differences. The first difference is that they have different process ids (pids). A pid is just a number that the OS uses to identify a process. The second difference is that fork() returns 0 to the child, but returns the pid of the child to the parent. This is convenient, because it allows the program, now running in two different processes, to do different things in each of the two processes.
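To make the two return values concrete, here is a minimal sketch. The helper name fork_and_report() is made up for illustration; it forks once, lets the child exit with a chosen code, and has the parent collect that code with waitpid() (covered later in these notes).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork once, let the child exit with child_code,
 * and return the exit code the parent observes (or -1 on error).
 */
int fork_and_report(int child_code)
{
    fflush(stdout);              /* don't let the child inherit unflushed output */
    pid_t pid = fork();

    if (pid < 0) {               /* fork failed: no child was created */
        perror("fork");
        return -1;
    }

    if (pid == 0) {              /* fork() returned 0: we are the child */
        printf("child:  my pid is %d\n", (int)getpid());
        fflush(stdout);
        _exit(child_code);
    }

    /* fork() returned the child's pid: we are the parent */
    printf("parent: my pid is %d, child's pid is %d\n",
           (int)getpid(), (int)pid);

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling fork_and_report(7) from main() should print one line from each process and return 7 in the parent. The order of the two lines is not guaranteed.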
The Exec Family
For example, it is not uncommon to want the child process to assume some other identity -- to, in effect, become a different program. This is done with one of the exec calls (man execve, execl, execvp, &c). What an exec call does is load, from disk, a new process image into the current process. So, the current process's memory is dumped and the new program's image is loaded into memory from its executable file.
Take careful note: when an exec function is called, in the normal case, it doesn't return. This is because the process has assumed a new identity -- the code that called exec is gone. An exec returns only if it fails, in which case it returns -1. Most often, an exec fails because the path to the executable is wrong and the executable can't be found.
The "l" versions of the functions, execl() and execlp(), take the arguments in a long list: each of the new program's arguments is passed separately as an argument to exec. It is very important to note that the last argument to execl() and execlp() must be a NULL pointer -- otherwise, since exec doesn't know how many arguments there are, it doesn't know when to stop walking down the argument list.
The "v" versions of the functions, execv() and execvp(), take the arguments in an array, much like main gets its arguments via the argv[] array. Again, since the array carries no length, it is important that the last entry be a NULL.
The "p" versions of the functions, execvp() and execlp(), will search the path for a matching executable, as compared to execv() and execl(), where the full path, e.g., /bin/ls, must be specified.
Lastly, the 0th argument should be the name of the program, not a "real" command-line argument. It doesn't have to match the executable name -- but normally does. It is passed in as argv[0] to the new program's main().
So, basically, the argv[] array that is passed into execv(), execvp(), &c, is the same as the argv[] array you are accustomed to receiving within main(). execl() and execlp() take the same list of arguments, including the 0th and the terminating NULL -- but take them flattened out, with each being passed as a separate argument to exec.
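As a rough sketch of the difference between the "l" and "v" forms, the two helpers below run the same command, once with the arguments flattened out and once with them in an array. The helper names are made up for illustration, and the sketch assumes a program named sh can be found on the path.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: wait for pid and return its exit code, or -1. */
static int wait_for_exit_code(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* "l" + "p" form: arguments are listed one by one, ended by a NULL pointer. */
int run_with_execlp(const char *cmd)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* The first "sh" is what to search for; the second becomes argv[0]. */
        execlp("sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);              /* only reached if the exec failed */
    }
    return wait_for_exit_code(pid);
}

/* "v" + "p" form: the same arguments, packed into a NULL-terminated array. */
int run_with_execvp(const char *cmd)
{
    char *args[] = { "sh", "-c", (char *)cmd, NULL };

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(args[0], args);
        _exit(127);              /* only reached if the exec failed */
    }
    return wait_for_exit_code(pid);
}
```

Both run_with_execlp("exit 3") and run_with_execvp("exit 3") should return 3, since the shell exits with the code it was asked to use.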
To really understand this, you'll need to read the man page, look at our example, and search the Web for another example or two.
wait() and waitpid()
The two function calls wait() and waitpid() are normally used to wait for a process to end. So, consider a UNIX shell. When you start up "vi" or "emacs", your shell waits until the editor ends before producing a new prompt. The shell waits by calling wait() or waitpid().
In either case, the wait call will block until the child process is done. Once the child is done, the wait call will return. By waiting for a child in this way, the parent also reaps the child. As we discussed earlier, the child will remain a zombie until reaped by a wait.
The wait calls give the caller back an integer that contains some information about the child's state. We're not going to worry about it here. Either pass in a pointer to a real integer, or a NULL. But, if you are curious, do a "man 2 wait". Notice the macros, such as WIFEXITED(), that can be used to decode this status.
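For the curious, here is a small sketch of decoding that integer. WIFEXITED() is the macro mentioned above; WEXITSTATUS(), WIFSIGNALED() and WTERMSIG() are its standard companions. The helper names and the return convention (negative numbers for signals) are made up for illustration.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: decode a status filled in by wait()/waitpid().
 * Returns the exit code (0..255) for a normal exit, the negated signal
 * number if the child was killed by a signal, and -1000 otherwise.
 */
int decode_status(int status)
{
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return -WTERMSIG(status);
    return -1000;
}

/* Fork a child that exits with `code`, or raises `sig` first if sig != 0. */
int child_outcome(int code, int sig)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1000;
    if (pid == 0) {
        if (sig != 0)
            raise(sig);          /* a fatal signal ends the child here */
        _exit(code);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1000;
    return decode_status(status);
}
```

For example, child_outcome(42, 0) should return 42, while child_outcome(0, SIGKILL) should return -SIGKILL because the child never reaches its _exit().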
The wait() call will wait for any child. If it is desirable to wait for a particular child, the waitpid() call can be used. It will only wait for the child whose pid is specified. waitpid() can be made to wait for any child by passing in a pid of -1 (a pid of 0 waits for any child in the caller's process group). Upon success, both forms of wait return the pid of the child they reaped.
waitpid() also has another argument that will become important for this lab -- the flags. If a flag of WNOHANG is specified, waitpid() will not actually block. If there is an available zombie, it will collect its information and return its pid. If no zombie is currently available, the WNOHANG flag prevents waitpid() from blocking. Instead, it will return -1 if there are no more children, or 0 if there are children -- but none are zombies.
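The three possible results of a WNOHANG call can be sketched as a small reaping loop. The helper name and its out-parameter are made up for illustration; the waitpid() behavior is the one just described.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: reap every child that has already finished,
 * without ever blocking. Returns the number of children reaped.
 * *children_left is set to 1 if children that have not finished yet
 * remain, or 0 if this process has no children left at all.
 */
int reap_finished_children(int *children_left)
{
    int reaped = 0;

    for (;;) {
        pid_t pid = waitpid(-1, NULL, WNOHANG);

        if (pid > 0) {           /* collected one zombie; look for more */
            reaped++;
            continue;
        }
        if (pid == 0) {          /* children exist, but none are zombies */
            *children_left = 1;
            return reaped;
        }
        if (errno == EINTR)      /* interrupted by a signal; just retry */
            continue;
        *children_left = 0;      /* -1: no children remain */
        return reaped;
    }
}
```

A shell-like program might call something like this each time around its prompt loop, so finished background jobs are reaped without delaying the next prompt.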
Fork w/copy-on-write
Copying all of the pages of memory associated with a process is a very expensive thing to do. It is even more expensive considering that very often the first act of the child is to deallocate this recently created space.
One alternative to a traditional fork implementation is called copy-on-write. The details of this mechanism won't be completely clear until we study memory management, but we can get the flavor now.
The basic idea is that we mark all of the parent's memory pages as read-only and share them with the child, instead of duplicating them. If either the parent or any child tries to write to one of these read-only pages, a page fault occurs. At this point, a new copy of the page is created for the writing process. This adds some overhead to page accesses, but saves us the cost of unnecessarily copying pages.
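A program can't directly observe whether the kernel copied a page eagerly or on the first write, but it can observe the guarantee that copy-on-write has to preserve: a write in the child never shows up in the parent. The sketch below checks exactly that; the variable and helper names are made up for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Lives in a data page that parent and child both start out with. */
static int counter = 1;

/*
 * Hypothetical helper: the child overwrites its view of `counter`;
 * the function returns the value the parent still sees afterwards,
 * or -1 on error.
 */
int parent_value_after_child_write(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Under copy-on-write, this store is what triggers the page copy. */
        counter = 99;
        _exit(counter == 99 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    return counter;              /* the parent's page was never modified */
}
```

The function should return 1: the child saw 99 in its private copy, and the parent's copy is untouched.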
vfork()
Another alternative is also available -- vfork(). vfork() is even faster, but can also be dangerous in the wrong hands. With vfork(), we do not duplicate or mark the parent's pages; we simply loan them, and the stack frame, to the child process. During this time, the parent remains blocked (it can't use the pages). The dangerous part is this: any changes the child makes will be seen by the parent process.
vfork() is most useful when it is immediately followed by an exec(). This is because an exec() will create a completely new process space, anyway. There is no reason to create a new task space for the child, just to have it throw it away as part of an exec(). Instead, we can loan it the parent's space long enough for it to get started (exec'd).
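Here is a minimal sketch of the "vfork(), then exec immediately" pattern. The helper name is made up, the sketch assumes a program named sh is on the path, and the _DEFAULT_SOURCE line is an assumption about glibc, where it may be needed to get the vfork() declaration under strict compiler modes. Notice that the child does nothing except exec or _exit(): it is running on borrowed memory.

```c
#define _DEFAULT_SOURCE          /* assumption: exposes vfork() on glibc */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run `sh -c cmd` in a child created with vfork()
 * and return its exit code, or -1 on error.
 */
int vfork_and_exec(const char *cmd)
{
    pid_t pid = vfork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Borrowed address space: touch nothing, just exec or _exit. */
        execlp("sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);              /* only reached if the exec failed */
    }

    /* The parent resumes only after the child has exec'd or exited. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child uses _exit() rather than exit() on failure, so that it does not flush or tear down stdio state that really belongs to the parent.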
Although there are several different functions in the exec family (we cover four of them), the only difference is the way they are parameterized; under the hood, they all work identically (and are often one).
After a new task is created, the parent will often want to wait for it (and any siblings) to finish. We discussed the defunct and zombie states last class. The wait-family of calls is used for this purpose.
A Quick Overview of the File System from the OS Point of View
The operating system maintains two data structures representing the state of open files: the per-process file descriptor table and the system-wide open file table.
When a process calls open(), a new entry is created in the open file table. A pointer to this entry is stored in the process's file descriptor table. The file descriptor table is a simple array of pointers into the open file table. We call the index into the file descriptor table a file descriptor. It is this file descriptor that is returned by open(). When a process accesses a file, it uses the file descriptor to index into the file descriptor table and locate the corresponding entry in the open file table.
The open file table contains several pieces of information about each file:
- the current offset (the next position to be accessed in the file)
- a reference count (we'll explain below in the section about fork())
- the file mode (permissions),
- the flags passed into the open() (read-only, write-only, create, &c),
- a pointer to an in-RAM version of the inode (a slightly light-weight version of the inode for each open file is kept in RAM -- others are on disk), and
- a pointer to the structure containing pointers to the functions that implement behaviors like read(), write(), close(), lseek(), &c on the file system that contains this file. This is the same structure we looked at last week when we discussed the file system interface to I/O devices.
Each entry in the open file table maintains its own read/write pointer for three important reasons:
- Reads by one process don't affect the file position in another process
- Writes are visible to all processes, if the file pointer subsequently reaches the location of the write
- The program doesn't have to supply this information each call.
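The first reason is easy to see from user space. In the sketch below (the helper name is made up for illustration), the same file is opened twice, which creates two separate open file table entries. Reading through one descriptor moves only that entry's offset.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `path` twice, read n bytes through the first
 * descriptor, and report the current offset of each descriptor.
 * Returns 0 on success, -1 on error.
 */
int offsets_after_read(const char *path, size_t n, off_t *first, off_t *second)
{
    char buf[64];
    int rc = -1;

    if (n > sizeof buf)
        return -1;

    int fd1 = open(path, O_RDONLY);   /* one open file table entry...  */
    int fd2 = open(path, O_RDONLY);   /* ...and a second, separate one */

    if (fd1 >= 0 && fd2 >= 0 && read(fd1, buf, n) == (ssize_t)n) {
        *first  = lseek(fd1, 0, SEEK_CUR);   /* moved by the read */
        *second = lseek(fd2, 0, SEEK_CUR);   /* untouched         */
        rc = 0;
    }

    if (fd1 >= 0) close(fd1);
    if (fd2 >= 0) close(fd2);
    return rc;
}
```

For a file of at least 3 bytes, offsets_after_read(path, 3, &a, &b) should leave a at 3 and b at 0. Notice, too, that read() takes no position argument: the offset comes from the table entry.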
One important note: In modern operating systems, the "open file table" is usually a doubly linked list, not a static table. This ensures that it is typically a reasonable size while remaining capable of accommodating workloads that use massive numbers of files.
Session Semantics
Consider the cost of many reads or writes made to one file.
- Each operation could require pathname resolution, protection checking, &c.
- Implicit information, such as the current location (offset) into the file must be maintained,
- Long term state must also be maintained, especially in light of the fact that several processes using the file might require different views.
- Caches or buffers may need to be initialized.
The solution is to amortize the cost of this overhead over many operations by viewing operations on a file as occurring within a session. open() creates a session and returns a handle, and close() ends the session and destroys the state. The overhead can be paid once and shared by all operations.
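In code, a session looks like the sketch below (count_bytes() is a made-up helper for illustration). The pathname is resolved and permissions are checked once, in open(). Every read() after that names the file only by its handle.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: count the bytes in a file using one session.
 * Returns the byte count, or -1 on error.
 */
long count_bytes(const char *path)
{
    char buf[4096];
    long total = 0;
    ssize_t n;

    int fd = open(path, O_RDONLY);   /* session begins: lookup + checks */
    if (fd < 0)
        return -1;

    /* Many operations, each using only the handle and the saved offset. */
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;

    close(fd);                       /* session ends: state is released */
    return n < 0 ? -1 : total;
}
```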
Consequences of Fork()ing
In the absence of fork(), there is a one-to-one mapping from the file descriptor table to the open file table. But fork introduces several complications, since the parent task's file descriptor table is cloned. In other words, the child process inherits all of the parent's file descriptors -- but new entries are not created in the system-wide open file table.
One interesting consequence of this is that reads and writes in one process can affect another process. If the parent reads or writes, it will move the offset pointer in the open file table entry -- this will affect the parent and all children. The same is of course true of operations performed by the children.
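This sharing can be sketched directly. In the made-up helper below, the parent opens a file and forks. The child reads a few bytes and exits, and the parent then asks where its own offset is. Because both descriptors point at the same open file table entry, the parent's offset has moved.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `path`, fork, let the child read n bytes,
 * then return the offset the PARENT sees (or -1 on error).
 */
off_t parent_offset_after_child_read(const char *path, size_t n)
{
    char buf[64];
    if (n > sizeof buf)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    pid_t pid = fork();          /* child inherits fd: same table entry */
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        ssize_t got = read(fd, buf, n);
        _exit(got == (ssize_t)n ? 0 : 1);
    }

    int status;
    off_t off = -1;
    if (waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0)
        off = lseek(fd, 0, SEEK_CUR);   /* parent never read, yet it moved */

    close(fd);
    return off;
}
```

For a file of at least 4 bytes, parent_offset_after_child_read(path, 4) should return 4 even though the parent itself never called read().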
What happens when the parent or child closes a shared file descriptor?
- remember that open file table entries contain a reference count.
- this reference count is decremented by a close
- the file's storage is not reclaimed as long as the reference count is non-zero indicating that an open file entry to it exists
- once the reference count reaches zero, the storage can be reclaimed
- i.e., "rm" may reduce the link count to 0, but the file hangs around until all "opens" are matched by "closes" on that file.
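The last point can be tried out with a short sketch (the helper name is made up for illustration). The file is created, its name is removed with unlink() (the call underneath "rm"), and the still-open descriptor keeps working until it is closed.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `path`, unlink it while it is still open,
 * then write and read back through the descriptor.
 * Returns 0 if the data round-trips and the name is gone, -1 otherwise.
 */
int use_file_after_unlink(const char *path)
{
    const char msg[] = "still here";
    char buf[sizeof msg];
    int result = -1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    if (unlink(path) == 0 &&                      /* the name is removed   */
        access(path, F_OK) != 0 &&                /* ...and really is gone */
        write(fd, msg, sizeof msg) == (ssize_t)sizeof msg &&
        lseek(fd, 0, SEEK_SET) == 0 &&
        read(fd, buf, sizeof buf) == (ssize_t)sizeof buf &&
        memcmp(buf, msg, sizeof msg) == 0)        /* storage still usable  */
        result = 0;

    close(fd);               /* last close: now the storage can be reclaimed */
    return result;
}
```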
Why clone the file descriptors on fork()?
- it is consistent with the notion of fork creating an exact copy of the parent
- it allows the use of anonymous files by children. They never need to know the names of the files they are using -- in fact, the files may no longer have names.
- The most common use of this involves the shell's implementation of I/O redirection (< and >). Remember doing this?
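Here is a rough sketch of how a shell might implement "cmd > outfile" using inherited descriptors. The helper name is made up, dup2() is a standard call not covered in these notes, and the sketch assumes a program named sh is on the path. The child rearranges its own file descriptor table after the fork and before the exec, and the new program simply writes to descriptor 1 without ever learning the file's name.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run `sh -c cmd` with stdout sent to `outfile`.
 * Returns the command's exit code, or -1 on error.
 */
int run_with_stdout_to(const char *cmd, const char *outfile)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        int fd = open(outfile, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            _exit(126);
        if (fd != STDOUT_FILENO) {
            /* Make descriptor 1 refer to the same open file entry as fd. */
            if (dup2(fd, STDOUT_FILENO) < 0)
                _exit(126);
            close(fd);           /* descriptor 1 keeps the entry alive */
        }
        execlp("sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);              /* only reached if the exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

After run_with_stdout_to("echo hello", "out.txt"), the file out.txt should contain the line "hello", and the parent's own stdout is unaffected because only the child's table was changed.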
Simple Fork()/execvp()/waitpid() Example
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <errno.h>
#include <unistd.h>

#define EXEC_FAILED 1

int main(int argc, char *argv[]) {
  pid_t pid;
  char *prog_argv[4];

  /*
   * Build argument list
   */
  /* Remove the path or make the path correct for your system, as needed */
  prog_argv[0] = "/usr/local/bin/ls";
  prog_argv[1] = "-l";
  prog_argv[2] = "/";
  prog_argv[3] = NULL;   /* the array must end with a NULL */

  /*
   * Create a process space for the ls
   */
  if ((pid = fork()) < 0) {
    perror("Fork failed");
    exit(errno);
  }

  if (!pid) {
    /* This is the child, so execute the ls */
    execvp(prog_argv[0], prog_argv);
    /* execvp() only returns if it failed */
    perror("execvp failed");
    exit(EXEC_FAILED);
  }

  if (pid) {
    /*
     * We're in the parent; let's wait for the child to finish
     */
    waitpid(pid, NULL, 0); /* Could also be wait(NULL); */
  }

  return 0;
}
Simple Fork()/execlp()/wait() Example
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <errno.h>
#include <unistd.h>

#define EXEC_FAILED 1

int main(int argc, char *argv[]) {
  pid_t pid;
  char *prog_argv[4];

  /*
   * Build argument list
   */
  /* Remove the path or make the path correct for your system, as needed */
  prog_argv[0] = "/usr/local/bin/ls";
  prog_argv[1] = "-l";
  prog_argv[2] = "/";
  prog_argv[3] = NULL;

  /*
   * Create a process space for the ls
   */
  if ((pid = fork()) < 0) {
    perror("Fork failed");
    exit(errno);
  }

  if (!pid) {
    /* This is the child, so execute the ls */
    /* Same arguments as the array, flattened out and ended by a NULL */
    execlp(prog_argv[0], prog_argv[0], prog_argv[1], prog_argv[2], (char *)NULL);
    /* execlp() only returns if it failed */
    perror("execlp failed");
    exit(EXEC_FAILED);
  }

  if (pid) {
    /*
     * We're in the parent; let's wait for the child to finish
     */
    wait(NULL); /* Could also be waitpid(pid, NULL, 0); */
  }

  return 0;
}
Important Things To Remember