One of our current application case studies involves the
identification of useful "building blocks" for efficient,
fault-tolerant distributed systems. These building blocks will
include a wide range of communication and memory abstractions, and
also a variety of special-purpose building blocks such as
transaction-processing modules, failure detectors, and resource
allocators. The specification of the building blocks will include
performance and fault-tolerance information as well as ordinary
correctness information.
Such building blocks should be useful to system designers, because
they will permit unambiguous specification of required system and
component behavior (including performance and fault-tolerance
properties), will support system modification, and will speed up the
design process by permitting reuse of software and preventing
expensive design errors.
Some of our specific projects are:
- Formal modelling, verification and performance analysis of
Lamport's Paxos
algorithm for distributed consensus in partially synchronous
systems.
- K-consensus problems in
asynchronous systems.
- A framework for understanding precedence-based
memory models, which generalize standard multiple processor memory
models.
- The definition of a notion of highly available
shared memory , plus methods for its efficient implementation in a
fault-prone distributed system.
- An exploratory distributed
implementation and evaluation of an eventually-serializable
data service.
- The specification of the
robust atomic read/write memory service using replicated
registers and dynamic quorum-acknowledged broadcasts.
- The definition of a new virtually synchronous
group communication service and the use of the service in
implementing globally totally ordered message service, and a load
balancing algorithm.
- The specification of a practical group
communication service providing a dynamic notion of a primary view
and the demonstration of how it can be used in an application.
- The specification of a group communication service
that integrates the notions of
dynamic primary view and configurations,
together with an implementation and an application.
- The group communication service of the Ensemble system.
- The identification of the main memory and communication
abstractions used in the implementation of the Orca system of
Kaashoek and
Tanenbaum. This work includes a verification of the
correctness of the Orca implementation .
- The development of data structures such as
counting networks and diffracting trees, to support efficient,
highly concurrent operations in a multiprocessor system. This project
includes [foundational work] on defining the type of coherence
guaranteed by such structures, and on defining suitable complexity
measures.
- The development of a theory for fault-tolerant atomic
objects, including formal definitions, various operations and
transformations that can be applied to them, and various rules for
proving correctness of implementations.
- The development of a comprehensive theory for
concurrency control and abort recovery algorithms for databases.
- Development of a flexible notion of
object for use in an object-oriented distributed system.
|