Next: The BSP Programming Model Up: An Adaptive, Fault-tolerant Implementation Previous: An Adaptive, Fault-tolerant Implementation

Introduction

Volunteer computing is a form of distributed computing that seeks to make it as easy as possible for as many people as possible to donate computer time towards solving a parallel problem. By maximizing potential worker pool size and minimizing setup time, volunteer computing makes it possible to build very large networks of workstations very quickly, and creates many exciting new possibilities, including not only global true volunteer systems, such as distributed.net [4] (which was used to crack the RC5-56 challenge), but also local forced volunteer systems for use within institutions that have so far lacked the necessary expertise and time to pool their existing workstations to provide inexpensive supercomputing facilities for their computational needs [17].

In the past two years, there has been a particular surge of interest in Java-based volunteer computing systems, which would allow Internet users to join a parallel computation by simply visiting a web page and letting their standard web browser run a Java applet. Compared to existing network-of-workstations (NOW) software such as PVM [10] or MPI [12], such volunteer systems would be orders-of-magnitude easier to setup. Instead of having to spend weeks setting up accounts and installing software on potentially thousands of machines, one can setup an institution-wide NOW literally overnight by simply asking employees to point their browsers to a particular intranet site and to leave their browsers running before they go home or while they work. Moreover, Java's platform-independence allows programmers to worry less about generating code for many different architectures, and concentrate more on the applications themselves.

While using such Java-based systems should be very easy, however, developing the infrastructure for them is not as straightforward. A major research problem today is that of choosing an appropriate programming model. By enabling anyone to volunteer any kind of computer at any time, Java-based - and particularly browser-based - volunteer systems create a heterogeneous and dynamic environment for which currently popular parallel programming models such as shared memory and MPI-style message-passing are not well-suited. In basic MPI, for example, programs are written assuming that the number of physical processors is known and fixed; there is little or no support for situations where nodes leave or join the system at unexpected times. In general, a good programming model for volunteer computing systems has to be adaptively parallel and fault-tolerant. It cannot depend on static information on the number of nodes in the system and their individual processing speeds. At the same time, it must be able to tolerate data loss or corruption due not only to random hardware and software faults, but also to malicious saboteurs pretending to be volunteers but submitting erroneous results.

In this paper, we present a programming model based on the Bulk Synchronous Parallel (BSP) model [23,20], and show how it can be used in Java-based volunteer computing systems. We begin by presenting the BSP programming model, and then show how we have implemented it using the Bayanihan volunteer computing framework [17,18,19], taking advantage of Bayanihan's existing adaptive parallelism and fault-tolerance mechanisms. We also describe the new programming interface and show how it can be used to write realistic parallel applications. We conclude by presenting some current results, and discussing related, ongoing, and future work.

Next: The BSP Programming Model Up: An Adaptive, Fault-tolerant Implementation Previous: An Adaptive, Fault-tolerant Implementation

Luis Sarmenta
1/19/1999