This work presents new algorithms for the Do-All problem that consists of performing $t$ tasks reliably in a message-passing synchronous system of $p$ fault-prone processors. The algorithms are based on an aggressive coordination paradigm in which multiple coordinators may be active as the result of failures.
The Do-All problem was introduced by Dwork, Halpern and Waarts~\cite{DHW}. They developed several algorithms and analyzed them in terms of the number of times each task is performed (including multiplicities) and the number of messages. De~Prisco, Mayer and Yung~\cite{DMY} developed an algorithm for the Do-All problem and analyzed it in terms of the available processor steps $S$ (this measure charges processors for each step they perform, not only for those steps in which tasks are executed), and in terms of the number of messages. The models of failures considered by prior solutions allow for up to $f < p$ stop-failures and do not accommodate processor restarts.
We present two new algorithms for the Do-All problem.
The first algorithm is tolerant of $f p$ stop-failures and it does not allow restarts. It has the available processor steps complexity $S = O((t + p\log p/\log\log p)\cdot \log f)$ and the message complexity $M = O(t + p\log p/\log\log p + f\cdot p)$. Unlike prior solutions, our algorithm uses redundant broadcasts when encountering failures and, for large $f$, it has better $S$ complexity. This algorithm is used as the basis for another algorithm which tolerates any pattern of stop-failures {\em and restarts\/}. This new algorithm is the first solution for the Do-All problem that efficiently deals with processor restarts. Its available processor steps complexity is $S = O((t + p\log p + f)\cdot \min\{\log p,\log f\})$, and its message complexity is $M=O(t+p\cdot\log p+ f\cdot p)$, where $f$ is the number of failures.
![]() | |
|
TOC /
LCS /
MIT Last modified: Fri Nov 7 16:25:45 1997 |
Comments? |