Benchmark | Description | Lines of code | Filters in the program | Pipelines in the program | Splitjoins in the program | Feedbackloops in the program | Filters in the expanded graph |
---|---|---|---|---|---|---|---|
Sort | 32 element Bitonic Sort | 419 | 4 | 5 | 6 | 0 | 242 |

Benchmark | 250 MHz RAW, StreamIt on 16 tiles: utilization | 250 MHz RAW, StreamIt on 16 tiles: # of tiles used | 250 MHz RAW, StreamIt on 16 tiles: MFLOPS | 250 MHz RAW, StreamIt on 16 tiles: throughput (per 10^5 cycles) | 250 MHz RAW, C on a single tile: throughput (per 10^5 cycles) | C on a 2.2 GHz Intel Pentium IV: throughput (per 10^5 cycles) |
---|---|---|---|---|---|---|
Sort | 64% | 16 | N/A | 2,664.4 | 225.6 | 239.4 |

Utilization numbers reported were 42336 useful cycles/ 72992 total cycles = 0.5800.
The Bitonic Sort was performed on integer numbers, so there were no floating point operations involved.
The first iteration completed at time step 0x5e08 (24072 cycles).
The second iteration completed at 0x6944 (26948 cycles), a delta of 2876 cycles.
The third iteration completed at 0x74FA (29946 cycles), a delta of 2998 cycles.
Based on these cycle counts (using the most recent delta), each iteration takes 2998 cycles.
64 outputs every 2998 cycles, normalized to 10^5 cycles, gives a throughput of 64*(100000/2998) = 2134.757 outputs every 10^5 cycles.
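
The arithmetic in the notes above can be checked with a short Python sketch. It converts the three hex time stamps to cycle counts, takes the differences between consecutive iterations, and normalizes the last delta to a 10^5-cycle window. The names `throughput`, `OUTPUTS_PER_ITERATION`, and `WINDOW` are illustrative only, and the factor of 64 outputs per iteration is taken from the formula as written rather than confirmed independently.

```python
# Completion times of the first three iterations (hex cycle stamps from the notes).
timestamps_hex = ["0x5e08", "0x6944", "0x74FA"]

# int(s, 16) accepts an optional "0x" prefix when the base is 16.
cycles = [int(t, 16) for t in timestamps_hex]

# Cycles elapsed between consecutive iterations.
deltas = [later - earlier for earlier, later in zip(cycles, cycles[1:])]

OUTPUTS_PER_ITERATION = 64  # assumption: the factor used in the formula above
WINDOW = 10**5              # normalization window, in cycles


def throughput(outputs, cycles_per_iteration, window=WINDOW):
    """Outputs produced per `window` cycles."""
    return outputs * window / cycles_per_iteration


steady_state = deltas[-1]  # the notes use the most recent delta
print(cycles)        # [24072, 26948, 29946]
print(deltas)        # [2876, 2998]
print(round(throughput(OUTPUTS_PER_ITERATION, steady_state), 3))  # 2134.757
```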
64 outputs every 28371 cycles, normalized to 10^5 cycles, gives 64*(100000/28371) = 225.582 outputs / 10^5 cycles.
This is an integer application, so there are no FLOPS counts to report.
Utilization numbers reported were 459030 useful cycles / 460464 total cycles = 0.99688575.
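
Both utilization figures in these notes are plain ratios of useful cycles to total cycles. A minimal sketch of that calculation follows; the function name `utilization` is illustrative and not taken from the original measurement scripts.

```python
def utilization(useful_cycles, total_cycles):
    """Fraction of all cycles that did useful work."""
    return useful_cycles / total_cycles


# Cycle counts quoted in the utilization notes.
first_ratio = utilization(42336, 72992)
second_ratio = utilization(459030, 460464)

print(f"{first_ratio:.4f}")   # 0.5800
print(f"{second_ratio:.8f}")  # 0.99688575
```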
Throughput: 10^7 iterations / 121.61 seconds * 64 outputs / 1 iteration * 1 second / (2.2*10^9 cycles) * 10^5 cycles = 239.215 outputs / 10^5 cycles.
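
The Pentium IV figure is derived from wall-clock time rather than from a cycle counter, so the conversion goes through the clock rate. The sketch below spells out that unit conversion step by step; the function name `outputs_per_window` and its parameter names are hypothetical, and the inputs are the values quoted in the line above.

```python
def outputs_per_window(iterations, seconds, outputs_per_iteration,
                       clock_hz, window=10**5):
    """Convert a timed run into outputs per `window` clock cycles."""
    outputs_per_second = iterations * outputs_per_iteration / seconds
    outputs_per_cycle = outputs_per_second / clock_hz
    return outputs_per_cycle * window


# 10^7 iterations in 121.61 s, 64 outputs per iteration, 2.2 GHz clock.
pentium = outputs_per_window(10**7, 121.61, 64, 2.2e9)
print(f"{pentium:.3f}")  # 239.215
```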