Lecture 10 (Monday, February 7, 2000)
Interprocess Communication (IPC)
Motivation:
- Data transfer
- Sharing data
- Event notification
- Resource sharing
- Process control
Why not threads?
- Tasks may be on different machines
- Robustness/availability may require different address spaces
- Watchdog jobs must be independent from watched processes
- Source code may be unavailable, so tasks can’t be converted to threads
- Constrained growth of stack space
Why not just shared memory?
- Very little protection among threads implies vulnerability
- Source code might be required to convert tasks to threads within a task
- Generally unavailable, except to processes running on the same host
Universal IPC Facilities
(Review: Signals and pipes were required for Project #1)
Signals – a.k.a. software interrupts
- No data – just the occurrence of a signal, which can represent an event
- Limited breadth in describing events – typically only 31 signals (4 byte mask)
- Asynchronous
- Handler operates similarly to the unexpected invocation of a function
- Signals are only received on return from a system call (or context switch) – fortunately, there are plenty of these
- Originally designed for exceptions
- Early UNIX systems used signals (SIGPAUSE, SIGCONT) for process synchronization
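The points above (a signal carries no data, only the fact that it occurred, and the handler runs like an unexpected function call) can be illustrated with a minimal sketch. It assumes a POSIX system and Python's standard `signal` and `os` modules; the choice of SIGUSR1 and the names `events`, `on_signal`, and `send_and_wait` are illustrative and not part of the lecture.

```python
import os
import signal
import time

events = []  # filled in by the handler; the signal itself carries no payload


def on_signal(signum, frame):
    # Invoked asynchronously, much like an unexpected function call.
    events.append(signum)


def send_and_wait(signum=signal.SIGUSR1, timeout=2.0):
    """Send `signum` to this process and wait until the handler has run."""
    previous = signal.signal(signum, on_signal)
    try:
        before = len(events)
        os.kill(os.getpid(), signum)
        deadline = time.monotonic() + timeout
        while len(events) == before and time.monotonic() < deadline:
            time.sleep(0.01)
        return len(events) > before
    finally:
        signal.signal(signum, previous)  # restore the old disposition


if __name__ == "__main__":
    print("handler ran:", send_and_wait())
```

The only information the handler receives is the signal number, so anything richer than "event N happened" has to travel through some other IPC mechanism.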
Pipes
- Unstructured messages (concatenates writes) – hard to separate messages
- Traditional/BSD: Unidirectional FIFO based on filesystem I-node and circular buffer (typically 4K)
- SYSVR4 pipes: Bidirectional – 2 FIFOs based on Streams (layered device driver) interface
- Reader typically blocks on empty
- Writer typically blocks on full
- Can’t broadcast to multiple receivers (read always removes)
- If reading from multiple writers, no way of knowing sender
- Processes must have pipe’s entry in system open file table (anonymous pipe), or use named pipe (actual file system directory entry used for naming)
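A small sketch of the first point, that a pipe concatenates writes and loses message boundaries. It assumes a POSIX system and uses Python's `os.pipe`; the function name `pipe_concatenates` is made up for this example.

```python
import os


def pipe_concatenates(messages):
    """Write each message separately, then read once from the pipe."""
    read_fd, write_fd = os.pipe()
    try:
        for msg in messages:
            os.write(write_fd, msg)  # each message is a separate write
        os.close(write_fd)
        write_fd = -1
        return os.read(read_fd, 4096)  # the reader sees one byte stream
    finally:
        os.close(read_fd)
        if write_fd != -1:
            os.close(write_fd)


if __name__ == "__main__":
    print(pipe_concatenates([b"abc", b"def"]))  # prints b'abcdef'
```

Because the reader cannot tell where one write ended and the next began, programs that need discrete messages over a pipe usually add their own framing, such as a length prefix or a delimiter.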
Ptrace
- Basic level of support for debuggers to trace children
- ptrace(cmd, pid, addr, data)
- cmd examples: read/write addr space or registers, intercept signals, set watchpoints, terminate, pause, &c
- Set-uid is disabled or doesn’t survive exec, to prevent evil things (consider tracing a program and replacing parms to an exec with tcsh – root shell)
- exec*() calls generate a SIGTRAP so parent can regain control
- ptrace() can’t trace grandchildren, just children
- Massive context-switch overhead – movement of data from child to parent is via kernel space
Sockets
- One machine or two machines or many machines (broadcast)
- SOCK_STREAM: Unformatted, reliable (connection-oriented, typically TCP)
- SOCK_DGRAM: Formatted, unreliable (connectionless, typically UDP)
- More during networking
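The "unformatted" versus "formatted" distinction can be seen on a single machine. The sketch below is an illustration under assumptions: it uses a local `AF_UNIX` socket pair rather than TCP or UDP so that it needs no network, and `send_two` is a hypothetical helper name.

```python
import socket


def send_two(sock_type, first=b"abc", second=b"def"):
    """Send two messages over a local socket pair; return what one recv sees."""
    a, b = socket.socketpair(socket.AF_UNIX, sock_type)
    try:
        a.send(first)
        a.send(second)
        return b.recv(4096)
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    # Datagram: one recv returns exactly one message.
    print(send_two(socket.SOCK_DGRAM))
    # Stream: one recv returns whatever bytes are available, with no boundaries.
    print(send_two(socket.SOCK_STREAM))
```

With `SOCK_DGRAM` the first `recv` returns only the first message. With `SOCK_STREAM` the bytes of both sends may arrive together, so a stream reader has to impose its own message structure.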
System V IPC
Semaphores, message queues, and shared memory
Common elements
- Key – user supplied ID for instance of resource (e.g., which semaphore or which queue)
- Creator – who created the resource
- Owner – current owner of resource (initially creator, but may be changed by creator, owner, or super-user)
- Permissions – file system-like permissions r/w/e user/group, &c
Implementation
- Fixed-sized resource table for each resource (Danger – can run out)
- Each entry contains ipc_perm (key, creator, owner, perms), resource-specific data, and a sequence number
- Sequence number is like a generation number – inc’d with reuse
- Id returned on create = seq * table_size + index
- Kernel discovers index = id % table_size
- Created with semget(), shmget(), or msgget()
- Flags on get: IPC_CREAT (create), IPC_EXCL (exclusive)
- Commands on the matching semctl(), shmctl(), or msgctl() call: IPC_RMID (deallocate), IPC_STAT (get status information), IPC_SET (set status information)
- Danger – unless removed with IPC_RMID, a resource stays allocated, even if all users are gone
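The ID arithmetic in the list above can be worked through in a few lines. This is a sketch of the formulas only, not kernel code; `TABLE_SIZE` is a made-up value, and `make_id`, `split_id`, and `is_stale` are hypothetical names.

```python
TABLE_SIZE = 128  # hypothetical table size; real kernels choose their own


def make_id(seq, index, table_size=TABLE_SIZE):
    """Id returned on create = seq * table_size + index."""
    return seq * table_size + index


def split_id(ipc_id, table_size=TABLE_SIZE):
    """Recover (seq, index); the kernel finds index = id % table_size."""
    return ipc_id // table_size, ipc_id % table_size


def is_stale(ipc_id, current_seqs):
    """True if the slot has been reused since this id was handed out."""
    seq, index = split_id(ipc_id, len(current_seqs))
    return current_seqs[index] != seq


if __name__ == "__main__":
    old_id = make_id(seq=3, index=5)
    print(old_id, split_id(old_id))  # prints 389 (3, 5)
    seqs = [0] * TABLE_SIZE
    seqs[5] = 4                      # slot 5 was freed and reused once more
    print(is_stale(old_id, seqs))    # prints True
```

The sequence number is what lets a stale ID be rejected: an old ID still maps to the same table index, but its generation no longer matches the entry.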
Mechanisms
- Semaphores – usual operations
- Message queues
- Shared memory
Message Queues
- Like a pipe, but more flexible – discrete messages/boundaries preserved (like the difference between TCP and UDP)
- FIFO
- Big messages can be expensive – 2 copies – into and out of kernel
- No broadcast mechanism
- Other than perms on queue, no way to limit recipient of particular message – any legal reader
Shared Memory
- Maps same storage into two different processes’ address spaces
- More about implementation during discussion of VM later in semester
- Fastest – no copy and no context switch (after init’d)
- No provided synchronization
- No provided protocol for use
- Most UNIX variants (including SYSVR4) provide mmap() – similar, but maps file through VMM
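A minimal sketch of two processes sharing storage. It uses an anonymous shared `mmap` region inherited across `fork()` rather than the System V `shmget()` interface, since Python's standard library does not wrap the latter; that substitution, and the name `share_value`, are assumptions of this example. A POSIX system is assumed.

```python
import mmap
import os
import struct


def share_value(value=42):
    """Child writes an integer into shared memory; parent reads it back."""
    shared = mmap.mmap(-1, mmap.PAGESIZE)  # anonymous, shared mapping on Unix
    pid = os.fork()
    if pid == 0:
        try:
            # Child: store directly into the shared page.
            shared[:4] = struct.pack("i", value)
        finally:
            os._exit(0)
    # Parent: waiting for the child is the only synchronization used here.
    os.waitpid(pid, 0)
    (result,) = struct.unpack("i", shared[:4])
    shared.close()
    return result


if __name__ == "__main__":
    print(share_value(42))  # prints 42
```

Nothing in the mapping itself orders the child's write before the parent's read; the `waitpid()` call supplies that ordering. This is the "no provided synchronization" point in practice: real users of shared memory pair it with semaphores or a similar mechanism.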
SYSVR4 Streams
More flexible than IPC, but can function as IPC and is used to implement some IPC facilities
- This is a very brief overview – streams are reasonably intricate
- Originally developed by Ritchie to provide structured way to implement device drivers in layers and allow for reuse
- Now used to implement device drivers, terminal drivers, and IPC constructs like pipes
- (SYSVR4 is based on streams)
- Also used for TCP/IP and other networking stacks (very natural -- we’ll see why).
- Each layer contains read and write queues for messages – can be prioritized
- Layers are stacked
- Head/top is usually user end
- Bottom is usually device driver, but can be another stream
- Upstream is flow toward head
- Downstream is flow toward the bottom (driver), away from the user
- Each module in-between can be viewed as a smart filter
- Modules can be mixed and matched and reused
- Can be multiplexed (consider use for broadcast/multiple receivers/multiple senders)
- Supports “virtual copying” (shared data) among modules
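The stacking described above can be pictured with a toy model. This is only a sketch of the idea of modules as filters between head and driver: it is not the STREAMS API, it omits the per-layer queues and priorities, and the names `Module`, `Stream`, `push`, and `demo` are invented for illustration. It assumes a pushed module is inserted directly below the head.

```python
class Module:
    """One layer: a filter applied to messages in each direction."""

    def __init__(self, name, down=lambda m: m, up=lambda m: m):
        self.name = name
        self.down = down  # applied to messages moving toward the driver
        self.up = up      # applied to messages moving toward the head


class Stream:
    def __init__(self):
        self.modules = []  # index 0 is closest to the head

    def push(self, module):
        """Insert a module directly below the head."""
        self.modules.insert(0, module)

    def write(self, msg):
        """Downstream: head -> modules -> driver."""
        for module in self.modules:
            msg = module.down(msg)
        return msg

    def read(self, msg):
        """Upstream: driver -> modules -> head."""
        for module in reversed(self.modules):
            msg = module.up(msg)
        return msg


def demo():
    s = Stream()
    s.push(Module("frame", down=lambda m: b"[" + m + b"]", up=lambda m: m[1:-1]))
    s.push(Module("upper", down=bytes.upper))
    on_wire = s.write(b"hi")         # upper first, then frame
    return on_wire, s.read(on_wire)  # frame strips brackets on the way up


if __name__ == "__main__":
    print(demo())  # prints (b'[HI]', b'HI')
```

Because each module only sees messages from its neighbors, the same "frame" module could be reused under a different stack, which is the mix-and-match property listed above.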
/proc File System
- Originally intended to replace ptrace() and support debugging
- Now in most implementations, ptrace() is implemented via /proc
- One directory under /proc for each process, name is PID
- Not real file system – just interface
- Each PID directory contains the following entries (per-LWP details come from a representative LWP)
- status – r/o status info: PID, PGID, SID, size and location of stack, heap, &c (struct pstatus)
- psinfo – r/o anything viewable by the ps command, duplicates some info in status (struct psinfo)
- ctl – w/o perform control operations (wait, run, kill, wait until stopped, stop on exit from syscall)
- map – r/o description of virtual address space (where in core or on backing store)
- as – r/w map of virtual address space – change by lseek and write
- sigact – r/o information about signals: mask, handlers, &c (struct sigaction)
- pcred – effective, real, saved UIDs and GIDs (struct pcred)
- object – directory, one entry for each mapped object (e.g., memory mapped files)
- lwp – subdirectory with info about each LWP in the process. Each subdir contains lwpstatus, lwpsinfo, lwpctl (same as above, but for individual LWPs)
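Since each process appears as a directory named by its PID, enumerating processes is just a directory listing. The sketch below relies only on that naming convention; the files inside each directory (status, psinfo, and so on) differ between systems and are not read here. `list_pids` and its `proc_root` parameter are hypothetical names.

```python
import os


def list_pids(proc_root="/proc"):
    """Return the PIDs that have an entry under proc_root."""
    return sorted(
        int(name)
        for name in os.listdir(proc_root)
        if name.isascii() and name.isdigit()
    )


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        pids = list_pids()
        print(len(pids), "processes; own PID listed:", os.getpid() in pids)
```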
Mach IPC
Mach is an operating system based on a microkernel architecture. This means
that many of the jobs that are typically part of the operating system's kernel
are actually user processes. This makes an interesting application for IPC:
IPC is necessary for operating system components implemented as user processes
to interact with each other. In many ways, IPC is part of the foundation for a
microkernel-based operating system, not the other way around. The file system,
pager, memory management, &c are all implemented as user-level tasks outside of
the kernel; they interact with the kernel via Mach IPC.
Some key goals of the designers of Mach IPC included:
- Efficient support for messages varying in size from a few bytes to many gigabytes
- Protection should be fine-grained and strongly enforced
- Support for user-kernel and user-user communication
- Communication among processes on different hosts should function as does communication among processes on a single host.
In the Mach model, data is formed into messages. These messages are then
passed among processes. A process receives a message at a port. A port is a
queue of messages.
Ports
Each port's queue has a finite capacity. When the queue is full, the senders
block; when it is empty, the receiver blocks. Processes must hold capabilities
to access a port. These capabilities come in two flavors: read and write. Many
processes may have write capabilities to a port, but only one may have read
capability.
In the context of Mach, a capability is a name for a port that is unique within
a process's space. Two different processes may have two different capabilities
representing the same port. Capabilities are reference counted.
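The port rules above (bounded queue, many writers, one reader, task-local capability names) can be sketched as a toy model. This is not the Mach API: `Port`, `Task`, `grant`, and `demo` are invented names, reference counting is left out, and threads of one Python process would stand in for separate tasks.

```python
import queue


class Port:
    """A bounded message queue with at most one holder of the read right."""

    def __init__(self, capacity=4):
        self.messages = queue.Queue(maxsize=capacity)
        self.reader = None  # the single task allowed to receive


class Task:
    """Holds capabilities: task-local names that refer to ports."""

    def __init__(self, name):
        self.name = name
        self.caps = {}  # local name -> (port, right)
        self.next_name = 1

    def grant(self, port, right):
        if right == "read":
            if port.reader is not None and port.reader is not self:
                raise PermissionError("port already has a reader")
            port.reader = self
        local = self.next_name
        self.next_name += 1
        self.caps[local] = (port, right)
        return local

    def send(self, cap, msg, timeout=None):
        port, right = self.caps[cap]
        if right != "write":
            raise PermissionError("capability does not allow sending")
        port.messages.put(msg, timeout=timeout)  # blocks while the queue is full

    def receive(self, cap, timeout=None):
        port, right = self.caps[cap]
        if right != "read":
            raise PermissionError("capability does not allow receiving")
        return port.messages.get(timeout=timeout)  # blocks while it is empty


def demo():
    server, client = Task("server"), Task("client")
    port = Port(capacity=2)
    r = server.grant(port, "read")
    client.grant(Port(), "write")  # unrelated port, so local names diverge
    w = client.grant(port, "write")
    client.send(w, "hello", timeout=1)
    return r, w, server.receive(r, timeout=1)


if __name__ == "__main__":
    print(demo())  # prints (1, 2, 'hello')
```

In the demo the server knows the port as name 1 and the client knows it as name 2, which mirrors the statement that two processes may hold different capabilities for the same port.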
There are some special ports:
- task_self: a port that is used to send messages to the kernel on behalf of the task
- thread_self: similar to task_self, but for individual threads
- task_notify: a port that is used to receive messages from the kernel
- reply: receives results from system calls and RPC calls.
- exception: receives notification of exceptions
Backup Ports
Ports can have backup ports. If a port is deallocated (freed) and messages are
sent to this deallocated port, the backup port will receive them.
Port Sets
Port sets implement similar functionality to the UNIX select() function. One
important difference is that port sets do not suffer a performance degradation
if there are many ports in the set -- access time is constant. One can view a
port set as a common queue for several ports. Since the message itself contains
the original destination, the intended recipient can be discovered.
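The "common queue" view of a port set can be sketched as follows. It is a toy model under assumptions, not Mach code: `PortSet` and its methods are hypothetical names, and ports are identified by plain strings.

```python
from collections import deque


class PortSet:
    """One shared queue serving several ports."""

    def __init__(self):
        self.members = set()
        self.queue = deque()

    def add(self, port_name):
        self.members.add(port_name)

    def send(self, port_name, data):
        if port_name not in self.members:
            raise KeyError(port_name)
        # The message records its original destination port.
        self.queue.append({"destination": port_name, "data": data})

    def receive(self):
        # A single dequeue, regardless of how many ports are in the set.
        return self.queue.popleft()


if __name__ == "__main__":
    ps = PortSet()
    ps.add("printer")
    ps.add("pager")
    ps.send("pager", "fault at page 12")
    print(ps.receive())
```

Receiving never scans the member ports, which is why the cost does not grow with the size of the set, in contrast to a select()-style check of every descriptor.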
Messages
Messages contain the data that is being sent from process to process and the
metadata needed to transport and interpret it. The actual user data may be
contained within the message, or it may be referenced in shared memory. Small
amounts of data contained within the message itself are known as in-line
memory. Larger amounts of data that are only referenced within the message are
known as out-of-line memory. Out-of-line memory is shared using a copy-on-write
approach. The memory is shared by both tasks until either task writes to it, at
which time a private copy of the page is made.
Messages may be sent (msg_send), received (msg_receive), or sent when a reply
is expected (msg_rpc). msg_rpc() is typically used to implement remote
procedure calls.
Each message begins with a header containing the following fields:
- type - ordinary or complex. Ordinary is simple data. Complex may require some type of translation or other special treatment, like out-of-line memory.
- size - size of entire message, including header
- destination port - name of port that will receive message
- reply port - if there are results, send them here
- message id - optional; a name assigned by the user program
Type descriptor
- name - more properly a type. Ex: internal memory, rights, byte, 16-bit integer, string, real, &c
- size - size of data item
- number - how many data items (each of the given size)
- flags - in-line, out-of-line, &c
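The header and type descriptor fields listed above can be modeled with two small record types. This is a sketch for illustration, not Mach's actual structure layout: the field names follow the lists in these notes, `HEADER_BYTES` is an invented constant, and the space taken by the descriptors themselves is ignored.

```python
from dataclasses import dataclass, field
from typing import List, Optional

HEADER_BYTES = 24  # hypothetical fixed header size, for illustration only


@dataclass
class TypeDescriptor:
    name: str            # the type of the items, e.g. "byte" or "int16"
    size: int            # size of one data item, in bytes
    number: int          # how many data items
    inline: bool = True  # flag: data is in the message, not out-of-line


@dataclass
class Message:
    destination_port: int
    reply_port: Optional[int] = None
    message_id: int = 0
    descriptors: List[TypeDescriptor] = field(default_factory=list)

    @property
    def is_complex(self):
        # Treated as "complex" here if any data is out-of-line.
        return any(not d.inline for d in self.descriptors)

    @property
    def size(self):
        # Entire message including the header; out-of-line data is only
        # referenced, so it does not add to the message size in this model.
        inline_bytes = sum(d.size * d.number for d in self.descriptors if d.inline)
        return HEADER_BYTES + inline_bytes


if __name__ == "__main__":
    msg = Message(destination_port=7, reply_port=9,
                  descriptors=[TypeDescriptor("int16", 2, 3)])
    print(msg.size, msg.is_complex)  # prints 30 False
```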