Benchmark | Description | Lines of code | Program constructs: filters | Program constructs: pipelines | Program constructs: splitjoins | Program constructs: feedbackloops | Filters in the expanded graph |
---|---|---|---|---|---|---|---|
FFT | 64-element FFT | 200 | 3 | 3 | 2 | 0 | 24 |
Benchmark | 250 MHz RAW, StreamIt on 16 tiles: Utilization | 250 MHz RAW, StreamIt on 16 tiles: # of tiles used | 250 MHz RAW, StreamIt on 16 tiles: MFLOPS | 250 MHz RAW, StreamIt on 16 tiles: Throughput (outputs per 10^5 cycles) | 250 MHz RAW, C on a single tile: Throughput (outputs per 10^5 cycles) | C on a 2.2 GHz Intel Pentium IV: Throughput (outputs per 10^5 cycles) |
---|---|---|---|---|---|---|
FFT | 42% | 16 | 182 | 2,141.9 | 468.9 | 448.5 |
Utilization for StreamIt on 16 tiles: 79886 useful cycles / 191232 total cycles = 0.41774389 (reported as 42%).
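Flops reported by RAW's cycle-accurate simulator are 8558 min / 8806 max per iteration (avg = 8682), which is (8682 flops / 11952 cycles) * 250 million cycles/second = 181.60141 MFLOPS (reported as 182), where 11952 is the shortest per-iteration cycle count measured below.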
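A minimal Python sketch of the two formulas used in these notes, assuming utilization is useful cycles divided by total cycles and MFLOPS is the flops completed in a measured cycle window, scaled by the 250 MHz RAW clock. The helper names `utilization` and `mflops` are illustrative only, not part of any StreamIt or RAW tool. The same helpers reproduce the single-tile C figures reported further down (7006 flops over 27297 cycles; 432983 of 436752 useful cycles).

```python
RAW_CLOCK_HZ = 250e6  # 250 MHz RAW clock, from the table header


def utilization(useful_cycles: int, total_cycles: int) -> float:
    """Fraction of the measured cycles that did useful work (illustrative helper)."""
    return useful_cycles / total_cycles


def mflops(flops: float, cycles: float, clock_hz: float = RAW_CLOCK_HZ) -> float:
    """Flops completed in `cycles` cycles, converted to millions of flops per second."""
    seconds = cycles / clock_hz  # length of the measured window in seconds
    return flops / seconds / 1e6


# StreamIt on 16 tiles: average of the min/max flop counts per iteration.
flops_16 = (8558 + 8806) / 2               # 8682.0
util_16 = utilization(79886, 191232)
mflops_16 = mflops(flops_16, 11952)
print(f"{util_16:.0%}  {mflops_16:.2f}")   # 42%  181.60

# C on a single tile: the same two formulas.
util_1 = utilization(432983, 436752)
mflops_1 = mflops(7006, 27297)
print(f"{util_1:.4f}  {mflops_1:.2f}")     # 0.9914  64.16
```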
The first iteration completed at time step 0x8d8f (cycle 36239).
The second iteration completed at 0xbc3f (cycle 48191; delta = 11952).
The third iteration completed at 0xeb9d (cycle 60317; delta = 12126).
Based on these cycle counts, each iteration takes between 11952 (min) and 12126 (max) cycles, an average of 12039.
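The iteration lengths can be recovered from the hexadecimal time stamps with a few lines of Python. This sketch only re-derives the arithmetic above and assumes each time stamp marks the completion of one iteration.

```python
# Completion time stamps (hex cycle counts) reported by the simulator.
stamps = ["0x8d8f", "0xbc3f", "0xeb9d"]

# int(..., 16) accepts the "0x" prefix when the base is 16.
cycles = [int(s, 16) for s in stamps]
# Difference between consecutive completion times = cycles per iteration.
deltas = [later - earlier for earlier, later in zip(cycles, cycles[1:])]

print(cycles)                                                # [36239, 48191, 60317]
print(deltas)                                                # [11952, 12126]
print(min(deltas), max(deltas), sum(deltas) / len(deltas))   # 11952 12126 12039.0
```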
With 256 outputs every 11952 cycles, normalizing to 10^5 cycles gives a throughput of 256*(100000/11952) = 2141.9009 outputs per 10^5 cycles.
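A small sketch of this normalization, assuming throughput is simply the output count scaled linearly to a 10^5-cycle window; `outputs_per_1e5_cycles` is a hypothetical helper name. The same function also reproduces the single-tile C figure (128 outputs every 27297 cycles).

```python
def outputs_per_1e5_cycles(outputs: int, cycles: int) -> float:
    """Scale `outputs` produced in `cycles` cycles to a 10^5-cycle window."""
    return outputs * 100_000 / cycles


streamit_16 = outputs_per_1e5_cycles(256, 11952)  # StreamIt, 16 RAW tiles
c_single = outputs_per_1e5_cycles(128, 27297)     # C, one RAW tile
print(f"{streamit_16:.1f} {c_single:.1f}")        # 2141.9 468.9
```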
C on a single tile: 128 outputs every 27297 cycles; normalized to 10^5 cycles, this is 128*(100000/27297) = 468.916 outputs per 10^5 cycles.
Flops reported are 7006, which is (7006 flops / 27297 cycles) * 250 million cycles/second = 64.16456 MFLOPS.
Utilization numbers reported were 432983 useful cycles / 436752 total cycles = 0.99137039.
Throughput of C on the 2.2 GHz Pentium IV: 10^7 iterations took 129.72 seconds, so 10^7 iterations / 129.72 s * 128 outputs / 1 iteration * 1 s / (2.2*10^9 cycles) * 10^5 cycles = 448.5 outputs per 10^5 cycles.
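The Pentium IV figure converts wall-clock time into cycles before normalizing. A sketch of that conversion, assuming the processor ran at its nominal 2.2 GHz for the whole 129.72 s run so that cycles = seconds × clock rate; the constant and function names are illustrative.

```python
ITERATIONS = 10**7           # iterations timed on the Pentium IV
ELAPSED_S = 129.72           # wall-clock seconds for those iterations
OUTPUTS_PER_ITERATION = 128  # outputs produced by one iteration of the C version
CLOCK_HZ = 2.2e9             # assumed constant 2.2 GHz clock


def wallclock_outputs_per_1e5_cycles(iterations, elapsed_s, outputs_per_iter, clock_hz):
    """Convert a wall-clock measurement into outputs per 10^5 clock cycles."""
    total_cycles = elapsed_s * clock_hz  # seconds -> cycles (nominal clock assumed)
    total_outputs = iterations * outputs_per_iter
    return total_outputs / total_cycles * 1e5


pentium = wallclock_outputs_per_1e5_cycles(
    ITERATIONS, ELAPSED_S, OUTPUTS_PER_ITERATION, CLOCK_HZ
)
print(f"{pentium:.1f}")  # 448.5
```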