Daniel S. Nussbaum. Run-Time Thread Management for Large-scale Multiprocessors. Ph.D. Thesis, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, September 1993. Also available as MIT/LCS Technical Report 596.
(pdf, compressed postscript)


Effective thread management is crucial to achieving good performance on large-scale distributed-memory multiprocessors that support dynamic threads. For a given parallel computation with some associated task graph, a thread-management algorithm produces a running schedule as output, subject to the precedence constraints imposed by the task graph and the constraints imposed by the interprocessor communications network. Optimal thread management is an NP-hard problem, even given full a priori knowledge of the entire task graph and assuming a highly simplified architecture abstraction. Thread management is even more difficult for dynamic data-dependent computations which must use online algorithms because their task graphs are not known a priori. This thesis investigates online thread-management algorithms and presents XTM, an online thread-management system for large-scale distributed-memory multiprocessors. XTM has been implemented for the MIT Alewife Multiprocessor. Simulation results indicate that XTM's behavior is robust, even when run on very large machines. XTM makes the thread-management problem more tractable by splitting it into three sub-problems:

  1. determining what information is needed for good thread management, and how to efficiently collect and disseminate that information in a distributed environment,
  2. determining how to use that information to match runnable threads with idle processors, and
  3. determining what interprocessor communication style XTM should use.
XTM solves these sub-problems as follows:
  1. Global information is collected and disseminated using an X-Tree data structure embedded in the communications network. Each node in the X-Tree contains a "presence bit," the value of which indicates whether there are any runnable threads in the sub-tree headed by that node. On a machine with a sufficiently high, balanced workload, the expected cost of maintaining these presence bits is proved to be asymptotically constant, regardless of machine size.
  2. The presence bit information, along with a combining process aided by the X-Tree, is used to match threads to processors. This matching process is shown to be eight-competitive with an idealized adversary, for a two-dimensional mesh network.
  3. A message-passing communication style yields fundamental improvements in efficiency over a shared-memory style. For the matching process, the advantage is shown to be a factor of logl, where l is the distance between an idle processor and the nearest runnable thread.
Asymptotic analyses of XTM's information distribution and thread distribution algorithms are given, showning XTM to be competitive with idealized adversaries. While the solutions to the sub-problems given above have provably good characteristics, it is difficult to say anything concrete about their behavior when combined into one coherent system. Simulation results are therefore used to confirm the validity of the analyses, with the Alewife Multiprocessor as the target machine.

Back to Alewife Publications.
Back to Alewife. $Date: 1998/01/06 16:49:48 $