TDS - Borowsky-Gafni Simulation Algorithm

A New Look at the Borowsky-Gafni Simulation Algorithm

Consider a read/write asynchronous shared memory system. In [1], Borowsky and Gafni describe an algorithm that allows a set of f+1 processes, any f of which may exhibit stopping failures, to ``simulate'' a larger number n of processes, also with at most f failures. This simulation algorithm is used in [1] to convert an arbitrary k-fault-tolerant n-process solution for Chaudhuri's k-set-agreement problem [2] into a wait-free k+1-process solution for the same problem. Since the k+1-process k-set-agreement problem has been shown to have no wait-free solution, this transformation implies that there is no k-fault-tolerant solution to the n-process k-set-agreement problem, for any n. Other applications of the simulation algorithm appear in other papers by Borowsky and Gafni.
These initial examples suggest that the Borowsky-Gafni simulation can become a powerful tool for proving solvability and unsolvability results for fault-prone asynchronous systems. However, in order for this to happen, it must be clear exactly what the simulation guarantees. Borowsky and Gafni's presentation is brief and informal, and does not include a careful specification of what the algorithm provides to its users. In fact, the emphasis in [1] is mainly on the application to the k-set-agreement problem rather than on the simulation itself.
We began this research with the modest aim of stating and proving precise correctness guarantees for the Borowsky-Gafni simulation algorithm, using the I/O automaton model and standard proof techniques (invariants, simulation relations, composition, etc.). However, the job turned out to be more difficult than we expected, because the description in [1] left some ambiguities that we needed to resolve. The final product of our work is a complete and careful description of a version of the Borowsky-Gafni simulation algorithm, plus a careful description of what it accomplishes, plus a proof of correctness.
In order to specify what the simulation accomplishes, we define a notion of "fault-tolerant reducibility" between decision problems, and show that the algorithm implements this reducibility, in a precise sense. Although this notion of reducibility is quite natural, it is specially tailored to the Borowsky-Gafni simulation algorithm; it does not seem suitable as a general notion of reducibility between decision problems. We give some examples of pairs of decision problems that do and do not satisfy this reducibility. For example, the n-process k-set-agreement problem is f-reducible to the n'-process k'-set-agreement problem if k >= k' and f <= min(n, n'). (The particular case where n = k + 1 and f = k = k' was used in [1].) On the other hand, these problems are not reducible if k <= f < k'. Moreover, we only know trivial instances of the renaming problem that satisfy the reducibility. The moral is that one must be careful in applying the simulation -- it does not work for all pairs of problems, but only those that satisfy the reducibility.
We present and verify the algorithm in terms of I/O automata . The presentation has a great deal of interesting modularity, expressed by I/O automaton composition and both forward and backward simulation relations . Composition is used to include a "safe agreement" module, a simplification of one in [1], as a subroutine. Forward and backward simulation relations are used to view the algorithm as implementing a "multi-try snapshot" strategy. The most interesting part of the proof is the safety argument, which is handled by the forward and backward simulation relations; once that is done, the liveness argument is straightforward.
Submitted for journal publication.
ISTCS96 Conference version
MIT/LCS/TM-573.
[1] E. Borowsky and E. Gafni. Generalized FLP impossibility result for t-resilient asynchronous computations,'' in Proceedings of the 1993 ACM Symposium on Theory of Computing, May 1993.
[2] S. Chaudhuri. Agreement is harder than consensus: set consensus problems in totally asynchronous systems, in Proceedings of the Ninth Annual ACM Symposium on Principles of Distributed Computing}, pages 311--234, August 1990.

TOC / LCS / MIT
Last modified: Thu Jan 15 16:01:53 1998 Comments?


TOC / LCS / MIT Last modified: Thu Jan 15 16:01:53 1998	Comments?