Adaptive Parallelism.

Next: Fault-Tolerance. Up: Research Issues: Problems and Previous: Research Issues: Problems and

Adaptive Parallelism.

By their nature, volunteer networks are heterogenous and dynamic. Volunteer nodes can have different kinds of CPUs, and can join and leave a computation at any time. Even nodes with the same type of CPU cannot be assumed to have equal or constant computing capacities, since each can be loaded differently by external tasks (especially in systems which try to exploit users' idle times). For these reasons, models for volunteer computing systems must be adaptively parallel [23]. That is, unlike many traditional parallel programming models, they must not assume the existence of a fixed number of nodes, or depend on any static timing information about the system.

Various strategies for implementing adaptive parallelism have already been proposed and studied. In eager scheduling [24], packets of work to be done are kept in a pool from which worker nodes get any undone work whenever they run out of work to do. In this way, faster workers get more work according to their capability. And, if any work is left undone by a slow node, or a node that ``dies'', it eventually gets reassigned to another worker. Systems that implement eager schedule include Charlotte [24] and the factoring demo used in Sect.4.3.

The Linda model [25] provides an associative tuple-space that can be used to store both data and tasks to be done. Since this tuple-space is global and optionally blocking, it can be used both for communication and synchronization between parallel tasks. It can also serve as a work pool, which like in eager scheduling, can allow undone tasks to be redone. Linda was originally used in the Piranha [23] system, and more recently implemented in Java by WWWinda [26], Jada [27], and Javelin [28].

In Cilk [29], a task running on a node, A, can spawn a child task, which the node then executes. If another node, B, runs out of work while NodeA is still running the child task, it can steal the parent task from NodeA and continue to execute it. This work-stealing algorithm has been shown to be provably efficient and fault-tolerant. Cilk has been implemented for NOWs [30], and has been implemented using Java applications (but not applets) in ATLAS [31].

Project Bayanihan aims to develop a framework that does not limit the programmer to using only one form of adaptive parallelism, but instead makes it easy to implement any of these forms and even develop new ones.

Next: Fault-Tolerance. Up: Research Issues: Problems and Previous: Research Issues: Problems and

Luis Sarmenta
1/2/1998