Other Projects

From SuperTech Wiki


Revision as of 17:45, 24 June 2011



Location-Based Memory Fences

Traditional memory fences are program-counter (PC) based. That is, a memory fence enforces a serialization point in the program instruction stream --- it ensures that all memory references before the fence in the program order have taken effect before the execution continues onto instructions after the fence. Such PC-based memory fences always cause the processor to stall, even when the synchronization is unnecessary during a particular execution. We propose the concept of location-based memory fences, which aim to reduce the cost of synchronization due to the latency of memory fence execution in parallel algorithms.

Unlike a PC-based memory fence, a location-based memory fence serializes the instruction stream of the executing thread only when a different thread attempts to read the memory location guarded by the fence. In this work, we describe a hardware mechanism for location-based memory fences, prove its correctness, and evaluate its potential performance benefit. Our experimental results are based on a software simulation of the proposed location-based memory fence, which incurs higher overhead than the proposed hardware mechanism would. Even though applications using the software prototype implementation do not scale as well as with traditional memory fences, due to the software overhead, our experiments show that applications can benefit from using location-based memory fences. These results suggest that hardware support for location-based memory fences is worth considering.


Deterministic Parallel Random-Number Generation

Existing concurrency platforms for dynamic multithreading, such as Cilk and TBB, do not provide repeatable parallel random-number generators. We propose that a mechanism called pedigrees be built into the runtime system to enable efficient deterministic parallel random-number generation, and in this work we design an efficient variant of the pedigree mechanism. Experiments with the MIT Cilk runtime system show that the overhead for this mechanism is minimal. On a suite of 10 benchmarks, the relative overhead of Cilk with pedigrees to the original Cilk has a geometric mean of 2%. We also explore library implementations of several deterministic parallel random-number generators that use these runtime mechanisms, based on a generalization of linear congruential generators, XOR'ing entries in a random table, SHA1, and Feistel networks. Although these deterministic parallel random-number generators are 3 to 18 times slower per function call than a nondeterministic parallel version of the popular Mersenne twister, in practical applications that use random numbers the additional overhead from using an efficient, high-quality DPRNG is relatively small.


Deterministic Nonassociative Reducers

Cilk++ and Intel Cilk Plus support a type of parallel data structure, called a reducer (hyperobject), which allows workers to perform update operations to the data structure in parallel. To accomplish this task, each worker maintains its own "view" of a reducer on which it performs its own update operations, and after these workers complete their parallel tasks, these views are combined using a "reduce" operation such that, if this reduce operation is associative, then the data structure that results from combining these views has serial semantics --- the reducer resulting from any parallel execution is equivalent to the reducer resulting from any serial execution of the Cilk program. While reducers are very useful for parallelizing many programs, for some data types, such as floating-point numbers on which additive updates are performed, no associative reduce operation is apparent. We propose that a runtime mechanism called pedigrees be built into the runtime system in order to support deterministic (nonassociative) reducers, a variant of reducers that guarantees that all executions of the program will yield a reducer data structure with serial semantics. In particular, in this work we design deterministic reducers using a particular pedigree mechanism, with the goal of developing the most efficient design possible. Such deterministic reducers allow programmers to write parallel programs that use data types such as floating-point numbers and still behave deterministically, regardless of the underlying nondeterminism of the Cilk scheduler.


Efficient SAT Solver

SAT solving and its variants (such as MaxSAT and SMT) have applications in many industrial fields. However, state-of-the-art SAT solving has seen no breakthrough in a decade, and there is currently no principled way to parallelize a SAT solver. In this project we experiment with new ways to use conflict-driven clause learning (CDCL) and try to find provably efficient ways to parallelize a CDCL solver, for example using a splitter hyperobject. Splitters are hyperobjects introduced in Frigo et al.'s 2009 paper "Reducers and Other Cilk++ Hyperobjects"; they allow workers to modify a global data structure in parallel, as long as the modifications do not need to be merged. Beyond these topics, it is also necessary to investigate bounds on the speedup obtainable by parallelizing a solver, sound ways to evaluate a solver, and pragmatic issues such as how to adapt the solver to various industrial formulas.
