Eager Scheduling.



The underlying Bayanihan master-worker framework provides a simple form of adaptive parallelism called eager scheduling [3,19]. As shown in Fig. 3 (Sect. 3.1), work objects in the work pool are stored in a circular list, with a pointer that tracks the next available uncompleted work object. As workers call getWork(), the pointer moves forward, assigning different work objects to different workers. Faster workers tend to call getWork() more often and so naturally receive a larger share of the total work, which gives us a simple but effective form of dynamic load balancing. Moreover, because the list is circular, work that has been assigned but not yet completed can be reassigned to other workers. This "eager" behavior ensures that slow workers do not become bottlenecks: fast workers with nothing left to do simply bypass slow ones, redoing their work if necessary. (Redundant results are simply ignored.) It also provides a basic form of crash tolerance. If a worker crashes or quits and leaves its work undone, that work is eventually reassigned to another worker. The computation can therefore proceed as long as at least one worker is still alive. In fact, even if all the workers crash or quit, the computation can continue as soon as a new worker becomes available.
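The circular-list behavior described above can be illustrated with a minimal Java sketch. This is not Bayanihan's actual code: the class name EagerWorkPool and the methods submitResult() and isComplete() are hypothetical, integer ids stand in for work objects, and only getWork() takes its name from the text.

```java
/**
 * Hypothetical sketch of an eagerly scheduled work pool.
 * Not Bayanihan's actual classes; integer ids stand in for work objects.
 */
class EagerWorkPool {
    private final boolean[] done; // done[i] is true once a result for work i is recorded
    private int next = 0;         // circular pointer to the next candidate work id
    private int remaining;        // number of work ids still without a result

    EagerWorkPool(int size) {
        done = new boolean[size];
        remaining = size;
    }

    /**
     * Returns the id of the next uncompleted work, or -1 if everything is done.
     * Because the pointer wraps around, work that was assigned but never
     * completed is eventually handed out again to whoever asks.
     */
    synchronized int getWork() {
        if (remaining == 0) {
            return -1;
        }
        while (done[next]) {              // skip work that already has a result
            next = (next + 1) % done.length;
        }
        int id = next;
        next = (next + 1) % done.length;  // advance so the next caller gets different work
        return id;
    }

    /** Records a result for the given id; returns false for a redundant result, which is ignored. */
    synchronized boolean submitResult(int id) {
        if (done[id]) {
            return false;
        }
        done[id] = true;
        remaining--;
        return true;
    }

    synchronized boolean isComplete() {
        return remaining == 0;
    }

    public static void main(String[] args) {
        EagerWorkPool pool = new EagerWorkPool(3);
        pool.getWork();                        // worker A takes work 0, then crashes
        int id;
        while ((id = pool.getWork()) != -1) {  // worker B keeps asking for work
            pool.submitResult(id);             // B finishes 1 and 2, then redoes 0
        }
        System.out.println(pool.isComplete()); // prints "true"
    }
}
```

In this sketch the pool never tracks which worker holds which work object. Reassignment falls out of the wrap-around pointer alone, which is why a crashed worker needs no explicit detection: its unfinished work is simply offered again once the pointer comes back around.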


Luis Sarmenta
1/19/1999