Lecture 10 (Monday, February 7, 2000)

Return to the Lecture notes Index

Lecture 10 (Monday, February 7, 2000)

Interprocess Communication (IPC)

Motivation:

Data transfer
Sharing data
Event notification
Resource sharing
Process control

Why not threads?

Tasks may be on different machines
Robustness/availability may require different address spaces
Watchdog jobs must be independent from watched processes
Source code may be unavailable, so tasks can’t be converted to threads
Constrained growth of stack space

Why not just shared memory?

Very little protection among threads implies vulnerability
Source code might be required to convert tasks to threads within a task
Generally unavailable, except to processes running on the same host

Universal IPC Facilities

(Review: Signals and pipes were required for Project #1)

Signals – a.k.a software interrupts

No data – just occurance of signal, which can represent an event
Limited breadth in describing events – typically only 31 signals (4 byte mask)
Asynchronous
Handler operates similarly to the unexpected invocation of a function.
Signals only received on return from system call (or context-switch) – fortunately, there are plenty
Originally designed for exceptions
Early UNIXs used signals (SIGPAUSE, SIGCONT) for process synchronization

Pipes

Unstructured messages (concatenates writes) – hard to separate messages
Traditional/BSD: Unidirectional FIFO based on filesystem I-node and circular buffer (typically 4K)
SYSVR4 pipes: Bidirectional – 2 FIFOS based on Streams (layered device driver) interface
Reader typically blocks on empty
Writer typically blocks on full
Can’t broadcast to multiple receivers (read always removes)
If reading from multiple writes, no way of knowing sender
Processes must have pipe’s entry in system open file table (anonymous pipe), or use named pipe (actualt file system directory entry used for naming)

Ptrace

Basic level of support for debuggers to trace children
Ptrace (cmd, pid, addr, data)
Cmd examples: read/write addr space or registers, intercept signals, set watchpoints, terminate, pause, &c
Set-uid disabled or don’t survive exec to prevent evil things (consider tracing a program and replacing parms to an exec with tcsh – root shell)
Exec_()’s generate a SIGTRAP so parent can regain control
Ptrace() can’t trace grandchildren, just children
Massive context-switch overhead – movement of data from child to parent is via kernel space

Sockets

One machine or two machines or many machine (broadcast)
SOCK_STREAM: Unformatted, reliable (connection-oriented, typically TCP)
SOCK_DGRAM: Formatted, unreliable (connectionless, typically UDP)
More during networking

System V IPC

Semaphores, message queues, shared memory,

Common elements

Key – user supplied ID for instance of resource (eg which semaphore or which queue)
Creator – who created the resource
Owner – current owner of resource (initially creator, but may be changed by creator, owner, or super-user)
Permissions – file system-like permissions r/w/e user/group, &c

Implementation

Fixed-sized resource table for each resource (Danger – can run out)
Each entry contains ipc_perm (key, creator, owner, perms) & resource specific & sequence number
Sequence number is like a generation number – inc’d with reuse
Id returned on create = seq * table_size + index
Kernel discovers index = id % table_size
Created with semget(), shmget(), or msgget()
Flags on get IPC_CREAT (create), IPC_EXCL (exclusive), IPC_RMID (deallocate), IPC_STAT (get status information), IPC_SET (set status information)
Danger – unless IPC_RMID is stays allocated – even if all users gone

Mechanisms

Semaphores – usual operations
Message queues
Shared memory

Message Queues

Like PIPE, but more flexible – discrete messages/boundaries preserved (like diff between TCP and UDP)
FIFO
Big messages can be expensive – 2 copies – into and out of kernel
No broadcast mechanism
Other than perms on queue, no way to limit recipient of particular message – any legal reader

Shared Memory

Maps same storage into two different processes’ address spaces
More about implementation during discussion of VM later in semester
Fastest – no copy and no context switch (after init’d)
No provided synchronization
No provided protocol for use
Most UNIX variants (included SYSVR4) provide mmap() – similar, but maps file through VMM

SYVR4 Streams

More flexible than IPC, but can function as IPC and is used to implement some IPC facilties

This is a very brief overview – streams are reasonably intricate
Originally developed by Ritchie to provide structured way to implement device drivers in layers and allow for reuse
Now used to implement device drivers, terminal drivers, and IPC constructs like pipes
(SYSVR4 is based on streams)
Also used for TCP/IP and other networking stacks (very natural -- we’ll see why).
Each layer contains read and write queues for messages – can be prioritized
Layers are stacked
Head/top is usually user end
Bottom is usually device driver, but can be another stream
Upstream is flow toward head
Downstream is flow toward user.
Each module in-between can be viewed as a smart filter
Modules can be mixed and matched and reused
Can be multiplexed (consider use for broadcast/multiple receivers/multiple senders)
Supports “virtual copying” (shared data) among modules

/proc File System

Originally intended to replace ptrace() and support debugging
Now in most implementations, ptrace() is implemented via /proc
One directory under /proc for each process, name is PID
Not real file system – just interface
Each PID directory contains subdirectories for a representative LWP

status – r/o status info PID, PGID, SID, size and location of stack, heap, &c (struct pstatus)
psinfo – r/o anything viewable by the ps command, duplicates some info in status (struct psinfo)
ctl – w/o perform control operations (wait, run, kill, wait until stopped, stop on exit to syscall)
map – r/o description of virtual address space (where on core or backing store)
as – r/w map of virtual address space – change by lseek and write
sigact – r/o information about signals: mask, handlers, &c (struct sigaction)
pcred – effective, real, saved UIDs and GIDs (struct pcred)
object – directory, one entry for each mapped object (ex memory mapped files).
lwp – subdirectory info about each lwp in process. Each subdir contains lwpstatus, lwpsinfo, lwpctl (same as above, but for individual LWPs)

Mach IPC

Mach is an operating system based on a microkernel architecture. This means that many of the josb that are typically part of the operating system's kernel are actually user processes. This makes an interesting application for IPC. It is in fact the case that IPC is necessary for operating system components implemented as user processes to interact with each other. In many ways, IPC is part of the foundation for a microkernel based operating system -- not the other way around. The file system, pager, memory management, &c are all implemented as user-level tasks outside of the kernel -- they interact with the kernel via the Mach IPC

Some key goals of the designers of Mach IPC included:

Efficient support for messages varying in size from a few bytes to many gigabytes
Protection should be fine-grained and strongly enforced
Support for user-kernel and user-user communication
Communication among processes on different hosts should function as does communications among processes on a single host.

In the Mach model, data is formed into messages. These messages are then passed among processes. A process receives a message at a port. A port is a queue of messages.

Ports

Each port's queue has a finite capacity. When the queue is full the senders block; when it is empty, the receivers block. Senders must hold capabilities to access a port. These capabilities come in two flavors: read and write. Many processes may have write capabilities to a port, but only one may have read capability.

In the context of Mach, a capability is a name for a port that is unique within a process's space. Two different processes may have two different capabilities representing the same port. Capabilities are reference counted.

There are some special ports:

task_self: a port that is used to send messages to the kernel on behalf of the task
thread_self: similar to task_self, but for individual threads
task_notify: a port that is used to receive messages from the kernel
reply: receives results form system calls and RPC calls.
exception: receives notification of exceptions

Backup Ports

Ports can have backup ports. If a port is deallocated (freed) and messages are sent to this deallocated port, the backup port will recieve them.

Port Sets

Port sets implement similar functionality to the UNIX select() function. One important difference is that port sets do not suffer a performance degredation if there are many ports in the set -- access time is constant. One can view a port set as a common queue for several ports. Since the message itself contains the original destination, the intended recipient can be discovered.

Messages

Messages contain the data that is being sent from process to process and the metadata needed to transport and interpret it. The actually user data may be contained within the message, or it may be referenced in shared memory. Small amounts of data contained within the message itself are known as in-line memory. Larger amounts of data that are only referenced within the message are known as out-of-line memory. Out-of-line memory is shared using a copy-on-write approach. The memory is shard by both tasks, until either tasks write to it, at which time a private copy of the page is made.

Messages may be sent (msg_send), received (msg_recv), or sent, when a reply is expected (msg_rpc). msg_rpc() is typically used to implement remote procedure calls.

type - ordinary or complex. Ordinary is simple data. complex may require some type of translation or other special treatment like out-of-line memory.
size - size of entire message, including header
destination port - name of port that will receive message
reply port - if there are result, send them here
message id - not necessary, name assigned by user program

Type descriptor

name - more properly a type. Ex: internal memory, rights, byte, 16-bit integer, string, real, &c
size - size of data item
number - how many data items (of type size)
flags - in-line, out-of-line, &c