Interprocess Communication

We want high-level support for client-server and group communication between processes, and help with naming and locating processes.

Network services offer stream or datagram delivery. Streams are handy for producer-consumer relationships, but the more common message passing communications typical in distributed systems are more datagram-like.

Building Blocks

Mapping data structures to messages: flattening (serializing). To support different architectures, data values are converted to an agreed upon external form (such as XDR) for transmission.

Marshalling is flattening plus conversion to the external data form for transmission; unmarshalling reverses both steps at receipt. Marshalling operations can be generated automatically from a specification of the data structure (e.g. RPCL).
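
As a rough illustration of marshalling, the sketch below flattens a small record into an XDR-style external form: a 32-bit big-endian integer, then a string sent as a 4-byte length followed by its bytes, padded to a multiple of four. The record layout (account_id, name), the UTF-8 encoding, and the function names are assumptions made for the example; a real system would generate such routines from the interface specification.

```python
import struct

def marshal(account_id: int, name: str) -> bytes:
    """Flatten (account_id, name) into an XDR-style byte string."""
    data = name.encode("utf-8")
    padding = (4 - len(data) % 4) % 4      # pad the string to a 4-byte boundary
    # ">iI" = big-endian signed 32-bit int, then unsigned 32-bit length
    return struct.pack(">iI", account_id, len(data)) + data + b"\x00" * padding

def unmarshal(buf: bytes) -> tuple[int, str]:
    """Rebuild the record from its external form."""
    account_id, length = struct.unpack_from(">iI", buf, 0)
    name = buf[8:8 + length].decode("utf-8")
    return account_id, name

wire = marshal(42, "Smith")
print(wire.hex())
print(unmarshal(wire))
```

Because the byte order is fixed by the external form, sender and receiver can run on architectures with different native representations.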

Send and Receive

Each operation is given a message destination (communication channel); send is given a message (containing the flattened parameters), receive returns a message.

Synchronous and asynchronous communication

Synchronous: sender and receiver synchronize on every message. Send and receive are blocking.

Asynchronous: send is non-blocking. Receive may or may not block. Blocking receive works nicely with threads: one thread blocks while others continue. Non-blocking receive is more complex to program.

Blocking receive usually offers a timeout option for cases where indefinite waiting is not desired, for example to cope with lost messages or crashed clients. (Indefinite waiting is appropriate in some demand-driven servers.) Choosing a suitable timeout period is difficult.
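
A minimal sketch of asynchronous send combined with a blocking receive that takes a timeout. The in-process queue named channel stands in for a real message destination, and the names send and receive are chosen for the example only.

```python
import queue
import threading
import time

channel = queue.Queue()   # hypothetical stand-in for a message destination

def send(message):
    """Asynchronous send: enqueue the message and return immediately."""
    channel.put(message)

def receive(timeout=None):
    """Blocking receive; returns None if nothing arrives within `timeout` seconds."""
    try:
        return channel.get(timeout=timeout)
    except queue.Empty:
        return None

# Nothing has been sent yet, so this receive times out.
first = receive(timeout=0.05)

# Another thread sends after a short delay while this thread blocks in receive.
threading.Thread(target=lambda: (time.sleep(0.05), send("hello"))).start()
second = receive(timeout=2.0)
print(first, second)
```

Passing timeout=None gives the indefinite wait that suits a demand-driven server.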

Message Destinations

IP addresses are not location-independent. Mach, Chorus, and Amoeba use location-independent "ports" for message destinations. Routing software converts them to lower level addresses: servers can change location without telling clients.
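
The idea of location-independent destinations can be sketched with a lookup table kept by the routing layer. This is only an illustration, not how Mach, Chorus, or Amoeba implement ports; the port name and addresses are invented.

```python
# Routing layer state: port identifier -> current low-level address (hypothetical values).
routing_table = {"file-service": ("10.0.0.5", 4000)}

def resolve(port_id):
    """Translate a location-independent port identifier into a low-level address."""
    return routing_table[port_id]

def migrate(port_id, new_address):
    """The server moves: only the routing layer learns the new address."""
    routing_table[port_id] = new_address

before = resolve("file-service")
migrate("file-service", ("10.0.0.9", 4000))
after = resolve("file-service")    # clients still use the same port identifier
print(before, after)
```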

Ports may or may not have queues. A port accepts messages from many senders but has only one receiving process. In Mach and Chorus a port can be moved from one process to another.

Reliability

An unreliable message is not acknowledged or retried (e.g. UDP), and on an internetwork may be lost, duplicated, delivered out of order or delayed. Processes handle errors themselves. On a LAN, unreliable messages may be dropped but not reordered or duplicated.

Reliability can be layered on unreliable delivery by using acknowledgements: positive for client-server, negative for multicasting. Overhead: state needed at client and server, extra messages sent, latency introduced.

Message identifiers are needed to provide reliability. Two parts: a requestId or sequence number, and sender's address (for replies).
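
A small sketch of such two-part identifiers. The class and field names are assumptions for the example; the point is that a sequence number alone is unique only per sender, so the sender's address is needed as well.

```python
from dataclasses import dataclass
from itertools import count

@dataclass(frozen=True)
class MessageId:
    request_id: int      # sequence number chosen by the sender
    sender: tuple        # (host, port) to which the reply should be sent

class Sender:
    def __init__(self, address):
        self.address = address
        self._next = count(1)          # per-sender sequence numbers

    def new_id(self):
        return MessageId(next(self._next), self.address)

a = Sender(("client-a", 5000))
b = Sender(("client-b", 5000))
ids = [a.new_id(), a.new_id(), b.new_id()]
print(ids)
```

Here the first identifier from each sender has request_id 1, yet the two identifiers differ because the sender addresses differ.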

Client-Server Communication

Request-reply protocols are usually synchronous. Using plain send and receive, 4 system calls are needed per exchange. A request-reply protocol can reduce the overhead to 3 system calls and possibly provide delivery guarantees (Fig 4.5):
  • DoOperation ( port, message, reply )
  • GetRequest ( port, message )
  • SendReply ( port, message )

Delivery failure assumptions:

  • messages can be dropped
  • networks can become partitioned
  • processes can fail
  • messages are NOT corrupted (checksum)

DoOperation uses a timeout while waiting for the reply.

    RPC exchange protocols have differing failure semantics:

    R (request): no return values, no ack, no waiting
    RR (request-reply): the server's reply acknowledges the request; the client's next request acknowledges the reply
    RRA (request-reply-acknowledge): the ack carries a requestId, which also acknowledges all lower-numbered replies

    Timeouts could just abort DoOperation, but the operation may have succeeded, and the reply lost. Usually, after timeout the request is retried several times before giving up.
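
A sketch of the retry behaviour, under the assumption of a toy transport. LossyChannel is hypothetical and simply drops the first few requests; do_operation here takes the channel directly rather than the (port, message, reply) arguments listed earlier.

```python
import queue

class LossyChannel:
    """Hypothetical transport that silently drops the first `drops` requests."""
    def __init__(self, drops):
        self.drops = drops
        self.sent = 0
        self.replies = queue.Queue()

    def send(self, request):
        self.sent += 1
        if self.sent > self.drops:                  # this request got through
            self.replies.put(("reply-to", request))

    def receive(self, timeout):
        return self.replies.get(timeout=timeout)    # raises queue.Empty on timeout

def do_operation(channel, request, timeout=0.01, max_attempts=4):
    """Send the request and wait for the reply, retransmitting after each timeout."""
    for attempt in range(1, max_attempts + 1):
        channel.send(request)
        try:
            return channel.receive(timeout), attempt
        except queue.Empty:
            continue                                # timed out: retry
    raise TimeoutError(f"no reply after {max_attempts} attempts")

reply, attempts = do_operation(LossyChannel(drops=2), "read block 7")
print(reply, attempts)
```

When every attempt times out the caller gets an exception, but it still cannot tell whether the operation was performed at the server.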

    Discarding duplicate requests at the server prevents the operation from being performed multiple times (e.g., when the timeout is less than the round-trip time).

    Lost reply messages would cause another request to be sent, and server may have to re-execute the request to generate a reply. If server operations are idempotent they can be applied one or more times with the same result (e.g., add-to-set is, append isn't).
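
The difference shows up as soon as a request is executed twice. A toy illustration, with function names chosen for the example:

```python
def add_to_set(state, item):
    state.add(item)        # idempotent: repeating it changes nothing

def append(state, item):
    state.append(item)     # not idempotent: each execution adds another copy

members, log = set(), []
for _ in range(2):         # the same request re-executed after a lost reply
    add_to_set(members, "alice")
    append(log, "alice")

print(members)   # {'alice'}
print(log)       # ['alice', 'alice']
```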

    A history of replies can be kept at the server to avoid re-execution after lost replies. Can keep just last message from each client, use next request as ack to previous reply, or use RRA.
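
A sketch of the "keep just the last reply per client" variant. The Server class and its operations are invented for the example, and the operation is deliberately non-idempotent so that re-execution would be visible.

```python
class Server:
    def __init__(self):
        self.log = []
        self.last = {}     # client -> (request_id, reply): one-entry history per client

    def execute(self, operation):
        self.log.append(operation)      # non-idempotent operation
        return len(self.log)

    def handle(self, client, request_id, operation):
        seen = self.last.get(client)
        if seen is not None and seen[0] == request_id:
            return seen[1]              # duplicate: resend saved reply, do not re-execute
        reply = self.execute(operation)
        # Overwriting the entry treats the new request as an ack of the previous reply.
        self.last[client] = (request_id, reply)
        return reply

server = Server()
r1 = server.handle("c1", 1, "append x")
r1_again = server.handle("c1", 1, "append x")   # retransmission after a lost reply
r2 = server.handle("c1", 2, "append y")
print(r1, r1_again, r2, server.log)
```

This works only if each client has at most one outstanding request; otherwise a longer history is needed.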

    Multipacket messages occur when a message is larger than the network's MTU. Can ack the entire message or each packet (the latter gives flow control).
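
A sketch of splitting a message into packets and reassembling it. For simplicity the mtu parameter is taken to be the payload size per packet (headers are ignored), and the (index, total, chunk) packet layout is an assumption of the example.

```python
def fragment(message: bytes, mtu: int):
    """Split a message into (index, total, chunk) packets of at most `mtu` payload bytes."""
    chunks = [message[i:i + mtu] for i in range(0, len(message), mtu)] or [b""]
    return [(i, len(chunks), chunk) for i, chunk in enumerate(chunks)]

def reassemble(packets):
    """Return (message, []) when complete, else (None, list of missing packet indexes)."""
    total = packets[0][1]
    by_index = {index: chunk for index, _, chunk in packets}
    missing = [i for i in range(total) if i not in by_index]
    if missing:
        return None, missing
    return b"".join(by_index[i] for i in range(total)), []

packets = fragment(b"0123456789", mtu=4)
complete = reassemble(list(reversed(packets)))      # arrival order does not matter
partial = reassemble([packets[0], packets[2]])      # packet 1 was lost
print(len(packets), complete, partial)
```

The list of missing indexes is what a receiver would use to ask for individual packets again when acks are per packet.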

    Group Communication

    Multicast messages are handy for:
  • fault-tolerance based on replicated servers
  • locating objects in distributed services
  • better performance through replication (updates are multicast)
  • multiple update

    Atomicity

    To ensure all replicated servers get all requests, need atomic multicast: a message is either received by all members or none. (Failed processes lose their membership.)

    Reliable multicast is fine if we don't need to talk to all servers, for example, when we only need reply from one.

    Ordering

    Atomic and reliable multicasts preserve message order between any process pair. Replicated servers may all need to handle requests in the same order, even with multiple clients (Fig 4.8). Totally ordered multicast: all messages (from all senders) are received in the same order at all members of the group. This can be expensive. Less strict orderings can be cheaper (e.g., causal ordering).

    Implementation

    Unreliable multicast (using unreliable send):
    for each member of the group:
        send ( member, message )

    Efficiency can be improved using LAN features.

    Reliability suffers from lost messages and sender failures.

    Monitoring of group members can detect failed processes, and remove them from groups. Needed for atomic multicast.

    For a reliable multicast, sender waits for ack from each member before returning to caller. Can retransmit, detect and remove failed processes.
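
A simplified sketch of this loop. The send argument is a hypothetical primitive that transmits to one member and returns True if an ack came back before its timeout; members that never ack are reported as failed so that they can be removed from the group.

```python
def reliable_multicast(group, message, send, max_attempts=3):
    """Send to every member, retransmitting until acked; return (acked, failed)."""
    pending = list(group)
    for _ in range(max_attempts):
        # Keep only the members that still have not acknowledged.
        pending = [member for member in pending if not send(member, message)]
        if not pending:
            break
    failed = pending
    acked = [member for member in group if member not in failed]
    return acked, failed

# Toy transport: "flaky" acks on its second attempt, "dead" never acks.
attempts = {}
def send(member, message):
    attempts[member] = attempts.get(member, 0) + 1
    if member == "dead":
        return False
    if member == "flaky":
        return attempts[member] >= 2
    return True

acked, failed = reliable_multicast(["a", "flaky", "dead"], "update", send)
print(acked, failed, attempts)
```

The call returns to its caller only after every member has either acked or been given up on, which is where the latency cost comes from.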

    To handle sender failures partway through the multicast, members monitor the sender to see if multicast completes. If sender failure detected, another member takes over to complete the multicast. Receivers can use the sender's next message to indicate completion of previous multicast (implies no concurrent multicasting). This is inefficient.

    The communications handler can hold-back messages, not delivering them to processes until ordering and atomicity requirements are met.

    Negative acknowledgement can reduce the number of messages needed. If each multicast message has a sequence number, recipients can detect lost messages by noticing a gap in the sequence numbers, and request retransmission: senders therefore need to keep a history. To allow members to take over for failed senders (to ensure atomicity), receivers also need a history. Occasional positive acks can keep history size down.
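
A sketch of gap detection with a sender-side history, for a single sender and a single receiver. Class and method names are assumptions of the example, and sequence numbers are taken to start at 1.

```python
class Sender:
    def __init__(self):
        self.history = {}      # seq -> message, kept for retransmission
        self.next_seq = 1

    def multicast(self, message):
        seq = self.next_seq
        self.next_seq += 1
        self.history[seq] = message
        return seq, message

    def retransmit(self, seqs):
        return [(seq, self.history[seq]) for seq in seqs]

    def trim(self, acked_up_to):
        """An occasional positive ack lets the sender discard old history."""
        for seq in [s for s in self.history if s <= acked_up_to]:
            del self.history[seq]

class Receiver:
    def __init__(self):
        self.received = {}

    def on_message(self, seq, message):
        """Store the message; return lower sequence numbers still missing (the NACK)."""
        self.received[seq] = message
        return [s for s in range(1, seq) if s not in self.received]

sender, receiver = Sender(), Receiver()
m1, m2, m3 = (sender.multicast(text) for text in ("a", "b", "c"))
nack_after_1 = receiver.on_message(*m1)
nack_after_3 = receiver.on_message(*m3)          # m2 was lost: gap detected
for packet in sender.retransmit(nack_after_3):
    receiver.on_message(*packet)
sender.trim(acked_up_to=3)
print(nack_after_1, nack_after_3, receiver.received, sender.history)
```

No message is sent by the receiver while everything arrives in sequence, which is the saving over positive acks.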

    Totally ordered atomic multicast assigns a unique, totally ordered id to each message. A message is stable at a member if no message with a lower id is expected to arrive. Processes receive only stable messages, in the order indicated by the id.

    Assume we can generate identifiers that are globally (and totally) ordered. Members hold back messages until they are stable: if no gap in ids, a message can be delivered immediately, else must be held back until the missing message(s) arrive.
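
A sketch of hold-back delivery, assuming identifiers are consecutive integers handed out by a single sequencer (one of several possible ways to obtain them). The Sequencer and Member classes are invented for the example.

```python
from itertools import count

class Sequencer:
    """Hands out consecutive, globally ordered identifiers."""
    def __init__(self):
        self._ids = count(1)

    def next_id(self):
        return next(self._ids)

class Member:
    def __init__(self):
        self.next_expected = 1
        self.held_back = {}    # id -> message, not yet stable
        self.delivered = []    # messages handed to the process, in id order

    def on_arrival(self, msg_id, message):
        self.held_back[msg_id] = message
        # A message is stable once every lower id has been delivered.
        while self.next_expected in self.held_back:
            self.delivered.append(self.held_back.pop(self.next_expected))
            self.next_expected += 1

sequencer = Sequencer()
messages = [(sequencer.next_id(), text) for text in ("debit", "credit", "close")]

m1, m2 = Member(), Member()
for msg in messages:               # arrives in order at m1
    m1.on_arrival(*msg)
for msg in reversed(messages):     # arrives in reverse order at m2
    m2.on_arrival(*msg)
print(m1.delivered, m2.delivered)
```

Both members deliver the same sequence even though the arrival orders differ; m2 simply holds the later messages back until the gap is filled.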

    Generating global, totally ordered identifiers:

  • timestamps from logical or physical clocks
  • a sequencer process contacted before each multicast
  • some protocol among members (Isis ABCAST)

    Case study: IPC in UNIX