The Bayanihan Master-Worker Based Implementation

Next: Adaptive Parallelism and Fault-Tolerance Up: Implementing the BSP Model Previous: Implementing the BSP Model

The Bayanihan Master-Worker Based Implementation

BSP's structure makes BSP relatively easy to implement in Java-based volunteer computing systems. Since processes are independent within supersteps, we can package the work to be done in a superstep as separate work packets and farm them out to volunteer worker nodes using a master-worker system. Figure 3 shows how we have implemented BSP using the existing master-worker model of the Bayanihan framework [19]. As shown, BSP processes are implemented as BSPWork objects and placed in a work pool on the server. Volunteer worker clients each have a work engine which runs in a loop, repeatedly making remote method calls via HORB [14] (a distributed object library similar to Sun's RMI [22]) to its corresponding advocate on the server side to request more work. Each advocate forwards these calls to the work manager, which looks into the work pool, and returns the next available uncompleted work object.

**Figure 3:** Implementing BSP on top of the Bayanihan master-worker framework.
$\begin{figure} \centerline{\PSbox{BSParch.eps hoffset=-18 voffset=-564}{4.25in}{3.15in}} \end{figure}$

When the work engine receives the new work object, it calls the work object's bsp_run() method, which starts the local computation phase of the superstep indicated by the restorestep field. As bsp_run() executes, communication requests are logged in MsgQueue fields, and the values of shared variables are saved in the bspVars field, a hash table that stores objects under String-type names. Execution continues until bsp_sync() is called, at which point restorestep is incremented and control returns to the work engine. The work engine then sends the whole work object - including bspVars, the MsgQueue's, and restorestep - back to the server, which in turn marks the original work object ``done'', and stores the returned work object as its result.

When all work objects are done, the work manager notifies the problem object, and the global communication and barrier synchronization phases begin. The BSPProblem object goes through the result work objects, and performs all the global communication operations by moving data from one work object to another. The remote memory access operations (bsp_get() and bsp_put()) are done by reading and writing to the work objects' bspVars fields. Similarly, the message-passing-style bsp_send() operation is performed by enqueuing the message data into the destination node's inQ field, so that in the local computation phase of the next superstep, they can be retrieved by calling bsp_recv(). After all the communication operations have been performed, each work object in the work pool is replaced by its updated version, and control returns to the work manager. The next superstep begins as the work manager starts giving workers new work objects wherein restorestep indicates the new superstep, bspVars contains updated variable values, and inQ contains new incoming messages.

In addition to the work objects in the work pool, the server also has a special main work object, which is like other work objects, except that it is not farmed out to workers, but is run on the server itself. In place of bsp_run(), the server calls the bsp_main() method. As described in Sect. 4.1, the main work object can be used for such tasks as spawning new work objects, coordinating other work objects, collecting and displaying results, and interacting with users.

Next: Adaptive Parallelism and Fault-Tolerance Up: Implementing the BSP Model Previous: Implementing the BSP Model

Luis Sarmenta
1/19/1999