The Topological Approach to Asynchronous Computability

Nir Shavit


Today, the computer industry is very good at making computers run faster: Speeds double roughly every two years. Eventually, however (and perhaps as early as the turn of the century), fundamental limitations, such as the speed of light or heat dissipation, will make further speed improvements increasingly difficult. Beyond that point, the most promising way to make computers more effective is to have many processors working in parallel, the approach known as multiprocessing.

The hard part of multiprocessing is getting the individual computers to coordinate effectively with one another. Consider a typical coordination problem: if two computers, possibly far apart, both try to reserve the same airline seat, care must be taken that exactly one of them succeeds. Coordination problems arise at all scales in multiprocessor systems---at a very small scale, processors within a single supercomputer might need to allocate resources, and at a very large scale, a nation-wide distributed system, such as an "information highway," might need to allocate communication paths over which large quantities of data will be transmitted.
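The airline-seat problem can be sketched in a few lines. The sketch below is purely illustrative and hypothetical -- real systems would use a hardware primitive such as compare-and-swap, which is simulated here with a lock:

```python
import threading

class Seat:
    """A shared seat record. The compare-and-swap primitive is simulated
    with a lock; on real hardware it would be a single atomic instruction."""
    def __init__(self):
        self._holder = None
        self._lock = threading.Lock()

    def compare_and_swap(self, expected, new):
        # Atomically: if the holder equals `expected`, install `new`
        # and report success; otherwise leave the record unchanged.
        with self._lock:
            if self._holder == expected:
                self._holder = new
                return True
            return False

seat = Seat()
results = {}

def reserve(agent):
    # Each agent tries to claim the seat only if it is still free.
    results[agent] = seat.compare_and_swap(None, agent)

t1 = threading.Thread(target=reserve, args=("alice",))
t2 = threading.Thread(target=reserve, args=("bob",))
t1.start(); t2.start(); t1.join(); t2.join()

# Exactly one reservation succeeds, no matter how the threads interleave.
assert sum(results.values()) == 1
```

Because the test ("is the seat free?") and the update ("claim it") happen as one indivisible step, no interleaving of the two agents can let both succeed.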

Coordination is difficult because multiprocessor systems are inherently asynchronous: Processors can be delayed without warning for a variety of reasons, including interrupts, pre-emption, cache misses, and communication delays. These delays can vary enormously in scale: A cache miss might delay a processor for fewer than ten instructions, a page fault for a few million instructions, and operating-system pre-emption for hundreds of millions of instructions. Any coordination protocol that does not take such delays into account runs the risk that the sudden delay of a single process will leave the others in a state from which they cannot make progress.

The need for effective coordination has long been recognized as a fundamental aspect of multiprocessor architectures. As a result, modern processors typically provide hardware mechanisms that facilitate coordination. Until recently, these mechanisms were chosen in an ad hoc fashion, but it is becoming increasingly clear that some kind of mathematical theory is needed if the implications of such fundamental design choices are to be understood.

Our research focuses on new mathematical techniques for analyzing and evaluating common hardware synchronization primitives. Aside from its inherent interest to the computer science community, we believe this work may be of interest to the mathematical research community because it establishes a (perhaps unexpected) connection between asynchronous computability and a number of well-known results in combinatorial topology.

In many multiprocessor systems, processors communicate by applying certain operations, called synchronization primitives, to variables in a shared memory. These primitives may simply be reads and writes, or they may include more complex constructs, such as test-and-set, fetch-and-add, or compare-and-swap.
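The semantics of these primitives can be made precise with a small sketch. The `Register` class below is a hypothetical model, not any particular machine's instruction set, and atomicity is simulated with a lock:

```python
import threading

class Register:
    """A shared memory word supporting the primitives named above.
    Atomicity is simulated with a lock (a model, not a hardware spec)."""
    def __init__(self, value=0):
        self.value = value
        self._lock = threading.Lock()

    def test_and_set(self):
        # Atomically set the word to 1 and return its old value.
        with self._lock:
            old, self.value = self.value, 1
            return old

    def fetch_and_add(self, delta):
        # Atomically add `delta` and return the old value.
        with self._lock:
            old = self.value
            self.value += delta
            return old

    def compare_and_swap(self, expected, new):
        # Atomically replace the value with `new` only if it
        # currently equals `expected`; report whether it did.
        with self._lock:
            if self.value == expected:
                self.value = new
                return True
            return False
```

For example, the first call to `test_and_set()` on a fresh register returns 0 and every later call returns 1, which is exactly the "first one through wins" behavior that makes it useful for locking.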

Over the years, computer scientists have proposed and implemented a variety of different synchronization primitives, and their relative merits have been the subject of a lively debate. Most of this debate has focused on the ease of implementation and ease of use of the primitives. More recently, however, it has emerged that some synchronization primitives are inherently more powerful than others, in the sense that every synchronization problem that can be solved by primitive A can also be solved by primitive B, but not vice-versa. Our research is directed at developing the new conceptual tools that will make it possible to provide a rigorous analysis of the relative computational power of different synchronization primitives. This emerging theory could provide the designers of computer networks and multiprocessor architectures with mathematical tools for recognizing when problems are unsolvable, for evaluating alternative synchronization primitives, and for making explicit the assumptions needed to make a problem solvable.
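A concrete instance of this power gap: test-and-set lets two processes reach wait-free consensus (both decide the same one of their inputs), a task provably unsolvable with reads and writes alone. The sketch below is illustrative, with atomicity again simulated by a lock:

```python
import threading

class TestAndSetConsensus:
    """Wait-free consensus for two processes (ids 0 and 1) from a single
    test-and-set bit plus read/write registers -- a sketch of why
    test-and-set is strictly stronger than reads and writes alone."""
    def __init__(self):
        self._bit = 0
        self._lock = threading.Lock()
        self.proposals = [None, None]  # announcement registers

    def _test_and_set(self):
        # Atomically set the bit to 1 and return its old value.
        with self._lock:
            old, self._bit = self._bit, 1
            return old

    def decide(self, pid, value):
        self.proposals[pid] = value        # announce my input first
        if self._test_and_set() == 0:
            return value                   # I won the race: decide my input
        return self.proposals[1 - pid]     # I lost: the winner announced
                                           # its input before winning, so
                                           # this read is guaranteed valid
```

Each process finishes in a bounded number of its own steps regardless of how the other is delayed, which is precisely the wait-free guarantee that read/write registers cannot provide for this task.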

Selected papers:

  • A Simple Constructive Computability Theorem ... (STOC 94): paper or slides.
  • The Asynchronous Computability Theorem ... (STOC 93): paper.

    Last modified: Fri Nov 7 16:07:21 1997