The PARSEC Project

From SuperTech Wiki


We are interested in creating serial versions and Cilk parallelizations of the benchmarks in PARSEC, to understand the limitations and challenges that modern applications pose for parallel computing. The PARSEC benchmark suite comes with 13 benchmarks, most parallelized with Pthreads and a few parallelized with OpenMP and TBB. The program patterns range from simple "embarrassingly parallel" computations to complicated pipeline parallelism and stencil-like graph computation. The goal of this project is to understand and isolate interesting problems in parallel computing, and so far we have parallelized 12 of the benchmarks with Cilk, to varying degrees. From these 12 benchmarks we have isolated a few interesting problems: pipeline parallelism and "chromatic scheduling" (an approach to solving graph problems with local updates in parallel), each of which has grown into a research problem of its own. We believe that the community will benefit from a complete treatment of PARSEC using the Cilk language, and we may even expand the suite with more benchmarks. This project is close to finished, and here is a preview of the results. Please stay tuned for the final results.
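The idea behind chromatic scheduling can be sketched as follows. This is a minimal illustration in Python, not the project's Cilk code: it assumes a graph stored as an adjacency dictionary and a local update that reads a vertex's neighbors and writes only the vertex itself. The names `greedy_coloring`, `chromatic_step`, and `update` are hypothetical, and the thread pool merely stands in for a parallel loop such as `cilk_for`.

```python
from concurrent.futures import ThreadPoolExecutor


def greedy_coloring(adj):
    """Give each vertex the smallest color unused by its already-colored neighbors."""
    color = {}
    for v in sorted(adj):
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


def chromatic_step(adj, values, update, workers=4):
    """Run one round of local updates, one color class at a time."""
    # Group vertices by color.
    classes = {}
    for v, c in greedy_coloring(adj).items():
        classes.setdefault(c, []).append(v)

    def apply(v):
        # Reads neighbor values, writes only this vertex's value.
        values[v] = update(values[v], [values[u] for u in adj[v]])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for c in sorted(classes):
            # Same-colored vertices are never adjacent, so no update in this
            # batch reads a value that another update in the batch writes.
            # list() waits for the whole batch before the next color starts.
            list(pool.map(apply, classes[c]))
    return values
```

Color classes are processed one after another, so the only synchronization is the implicit barrier between colors; within a class no locks are needed. Fewer colors means fewer barriers and larger parallel batches.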


x264 facesim raytrace vips freqmine fluidanimate streamcluster bodytrack canneal swaption blackscholes dedup ferret
Ready to write pseudo code + speed up pseudo code + speed up pseudo code tune like fluidanimate pseudo code √ speed up √ speed up
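Data size: x264: 43M video; facesim: 370k vertices, 850k tetras; raytrace: 1Mx1M image, 3 frames; vips: 56M image; freqmine: 31M input; fluidanimate: 300K particles, 5 frames; streamcluster: 16K 128-dim points; bodytrack: 4 frames, 4k particles; canneal: 400k nodes, 128 steps; swaption: 64, 20k; blackscholes: 64K stocks; dedup: 185M data stream; ferret: 256 queries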
A (any) Cilkification solution ready
Parallelism weird, it outputs 0 error "Not a valid ELF binary" same bug 8.93 (cilk region 3792.82, burdened 730.48) weird, 0.07 64 3.25 56 40
Their speedup at 8 cores 6.95 7.44 (icc) 6.58 5.51 7.21 4.06 8.24 6.97 7.89 1.92 7.35
Speedup at 8 cores GCC 3.43 will not work with gcc 3.60
Speedup at 8 cores ICC 5.74 7.38 7.28 3 7.27 6.42 2.16 2.01 4.44
Notes Our parallel version's cache misses are 1.2X theirs. coloring, two scheme swaps. how few colors can you get in 3D. (2D 1/4 to 1/5 for tie breaking. instead of being 1/9. (use 3D package 1/14th.) same as fluidanimate Routing cost: 6.9e+8. Cilkview 2.0 outputs <1 parallelism for the whole program (work: 942e+6 instructions, span: 14e+9 instructions), and outputs an extremely large parallelism for the Cilk parallel region(s): 76952173022003.08 (work: 18e+18 instructions, span: 239e+3 instructions). hard to understand cache misses bottleneck at output, remove output much faster version without circular buffer speedup the same
INS - might need to ask other people to parallelize it find out bottleneck strip out wraps and gens, plug in cilk_for, reducers and such wait get back to a serial version (for all programs), correctness checking. floating point substitute (undergrad blurb). figure out how few colors do you need remove barrier report cilkview bug, submit intel forum, email Barry do the tests Jim suggested find the bottleneck. add rows icc, gcc, gcc with Angelina's runtime. undergrad blurb
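The Cilkview figures quoted in the Notes row can be sanity-checked with a few lines of arithmetic. The sketch below assumes that the reported parallelism is simply work divided by span, and that both are counted in instructions; the function names are hypothetical.

```python
def parallelism(work, span):
    """Parallelism is work divided by span (both in instructions)."""
    return work / span


def is_consistent(work, span):
    """The span is one path through the computation, so it cannot exceed the work."""
    return span <= work


# Whole-program figures from the Notes row.
whole_program = parallelism(942e6, 14e9)

# Parallel-region figures from the Notes row.
parallel_region = parallelism(18e18, 239e3)
```

The whole-program ratio comes out at about 0.07, which matches the value logged in the Parallelism row, but a span larger than the work is impossible for a correct measurement, which supports treating it as a tool bug. The parallel-region ratio is on the order of 7.5e13, close to the reported 76952173022003.08 once the rounding in the printed work and span is taken into account.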

Other charts:

1. Graphics (charts): work out what the graphics are. Candidates: a bar graph of speedup; a graph comparing the effort put in; a graph of the techniques used to parallelize the benchmarks.
2. Think about how this paper is going to work ("baker's dozen"). Think about how someone is going to read it: many people will not read it from beginning to end; they will dive in and read the benchmark they are interested in. So: an index, with each section readable on its own; a total summarization; a chart showing data synchronization (none, locks, software pipelining, histogramming).
3. Start to make up some of these charts. We want charts that illustrate the story, including charts just for the pipeline. Pseudocode for the benchmarks, and a vignette of each story: the methodology of using tools like Cilkprof, Cilkview, perf, and Cilkscreen (need to run Cilkscreen), with maybe a little more than a couple of paragraphs in the introduction ("quickly realize it is this routine", then ...). Definitely do the Cilkprof runs. Bradley: automate with make, so that typing make goes in, runs all the programs, and writes a line of LaTeX into a file automatically instead of transcribing results by hand. Reason for including the pseudocode: to illustrate the algorithm.
4. Do some metaprogramming; it might help (part of the question of being organized).
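The automation idea in these notes (run every benchmark from make and emit one line of LaTeX per result) could look roughly like the sketch below. It only covers the formatting step and assumes the timings have already been collected; the function names, the column layout, and the default of 8 cores are illustrative assumptions, not the project's actual scripts.

```python
def speedup(serial_seconds, parallel_seconds):
    """Speedup of the parallel run relative to the serial run."""
    return serial_seconds / parallel_seconds


def latex_row(benchmark, serial_seconds, parallel_seconds, cores=8):
    """Format one table row: name, serial time, parallel time, speedup, efficiency."""
    s = speedup(serial_seconds, parallel_seconds)
    efficiency = s / cores  # fraction of ideal linear speedup achieved
    return (
        f"{benchmark} & {serial_seconds:.2f} & {parallel_seconds:.2f} "
        f"& {s:.2f} & {efficiency:.2f} \\\\"
    )
```

A make target could call `latex_row` once per benchmark and append the result to a file that the paper includes, so that no number is ever copied by hand.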
