FFT Benchmark Summary

This file summarizes how the numbers for this benchmark were obtained.

Benchmark  Description     Lines of code  Constructs in the program                      Filters in the expanded graph
                                          filters  pipelines  splitjoins  feedbackloops
FFT        64-element FFT  200            3        3          2           0              24

Benchmark  250 MHz RAW processor                                                  C on a 2.2 GHz Intel Pentium IV
           StreamIt on 16 tiles                              C on a single tile
           Utilization  Tiles used  MFLOPS  Throughput*      Throughput*          Throughput*
FFT        42%          16          182     2,141.9          468.9                448.5

* Throughput is given in outputs per 10^5 cycles.

StreamIt Number Calculations

Each iteration of the steady-state schedule for the StreamIt code produces 256 outputs. The FFT is performed on a set of 64 points, producing 64 complex numbers, or 128 floating-point numbers. Each steady-state schedule performs two 64-point FFTs, thus producing 256 floating-point numbers in total.

The reported utilization was 79886 useful cycles / 191232 total cycles = 0.41774389.

The flops reported by RAW's cycle-accurate simulator are 8558 min / 8806 max (avg = 8682), which is (8682 flops / 11952 cycles) * 250 million cycles/second = 181.60141 MFLOPS.
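The MFLOPS conversion above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name `mflops` is hypothetical, and the inputs are the figures quoted in the text.

```python
CLOCK_HZ = 250_000_000  # 250 MHz RAW clock, as stated in the text


def mflops(flops, cycles, clock_hz=CLOCK_HZ):
    """Convert a flop count over a cycle count into MFLOPS at the given clock."""
    return flops / cycles * clock_hz / 1e6


flops_min, flops_max = 8558, 8806          # simulator-reported min/max
flops_avg = (flops_min + flops_max) / 2    # 8682.0
streamit_mflops = mflops(flops_avg, 11952) # 11952 cycles per iteration

print(f"{streamit_mflops:.5f} MFLOPS")
```

The result rounds to the 182 MFLOPS shown in the summary table.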

The first iteration was done at time step 0x8d8f (36239 cycles).
The second iteration was done at 0xbc3f (48191 cycles) (delta = 11952).
The third iteration was done at 0xeb9d (60317 cycles) (delta = 12126).
Based on these cycle counts, each iteration takes 11952 min / 12126 max cycles (average of 12039).
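The hexadecimal timestamps above can be converted and differenced mechanically. A minimal sketch in Python follows; the helper name `iteration_deltas` is hypothetical and not part of the original benchmark tooling.

```python
def iteration_deltas(timestamps_hex):
    """Convert hex cycle timestamps to integers and return (cycles, deltas)."""
    cycles = [int(t, 16) for t in timestamps_hex]
    deltas = [later - earlier for earlier, later in zip(cycles, cycles[1:])]
    return cycles, deltas


# Completion times of the first three StreamIt iterations, copied from the text.
cycles, deltas = iteration_deltas(["0x8d8f", "0xbc3f", "0xeb9d"])
average = sum(deltas) / len(deltas)

print(cycles)                             # [36239, 48191, 60317]
print(min(deltas), max(deltas), average)  # 11952 12126 12039.0
```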

256 outputs every 11952 cycles (the minimum iteration time), normalized to 10^5 cycles, gives a throughput of 256*(100000/11952) = 2141.9009 outputs per 10^5 cycles.
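The normalization to a 10^5-cycle window is a simple rescaling. The sketch below reproduces it, assuming (as the calculation above does) that the minimum iteration time is used; `throughput_per_window` is a hypothetical helper name.

```python
def throughput_per_window(outputs_per_iter, cycles_per_iter, window=100_000):
    """Scale outputs per iteration to outputs per `window` cycles."""
    return outputs_per_iter * window / cycles_per_iter


# Two 64-point FFTs per steady state, each yielding 64 complex values
# (2 floats each), gives 256 outputs per iteration.
outputs_per_iter = 2 * 64 * 2
streamit_throughput = throughput_per_window(outputs_per_iter, 11952)

print(f"{streamit_throughput:.4f} outputs per 10^5 cycles")
```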

C Code Single Tile RAW Number Calculations

The first iteration was done at time 0x796d (31085 cycles).
The second iteration was done at time 0xe40e (58382 cycles) (delta = 27297).
The third iteration was done at time 0x14eaf (85679 cycles) (delta = 27297).
Based on these cycle counts, each iteration takes 27297 min / 27297 max cycles (average = 27297).

128 outputs every 27297 cycles, normalized to 10^5 cycles, gives 128*(100000/27297) = 468.916 outputs per 10^5 cycles.

The flops reported are 7006, which is (7006 flops / 27297 cycles) * 250 million cycles/second = 64.16456 MFLOPS.

The reported utilization was 432983 useful cycles / 436752 total cycles = 0.99137039.
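The single-tile figures in this section can be re-derived end to end from the quoted timestamps and counts. The following Python sketch does so; the variable names are illustrative only, and all inputs are copied from the text.

```python
CLOCK_HZ = 250_000_000  # 250 MHz RAW clock
WINDOW = 100_000        # normalization window of 10^5 cycles

# Completion times of the first three single-tile iterations.
stamps = [int(t, 16) for t in ("0x796d", "0xe40e", "0x14eaf")]
deltas = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
cycles_per_iter = min(deltas)  # both deltas are identical here

throughput = 128 * WINDOW / cycles_per_iter             # outputs per 10^5 cycles
single_tile_mflops = 7006 / cycles_per_iter * CLOCK_HZ / 1e6
utilization = 432983 / 436752                           # useful / total cycles

print(deltas)
print(f"{throughput:.3f} outputs per 10^5 cycles")
print(f"{single_tile_mflops:.5f} MFLOPS")
print(f"{utilization:.8f} utilization")
```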

C Code Number Calculations

Performing 10 million iterations at 128 outputs/iteration.
Runtime for 10^7 iterations is 129.72 seconds.

Throughput: (10^7 iterations / 129.72 seconds) * (128 outputs / 1 iteration) * (1 second / 2.2*10^9 cycles) * 10^5 cycles = 448.5 outputs per 10^5 cycles.
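Because the Pentium IV measurement is wall-clock time rather than a cycle count, the conversion goes through the clock rate. A short Python sketch of that unit conversion follows, using only the figures quoted above; the variable names are illustrative.

```python
iterations = 10_000_000   # 10^7 iterations
outputs_per_iter = 128    # one 64-point FFT: 64 complex values, 128 floats
runtime_s = 129.72        # measured wall-clock runtime in seconds
clock_hz = 2.2e9          # 2.2 GHz Pentium IV

outputs_per_second = iterations * outputs_per_iter / runtime_s
outputs_per_cycle = outputs_per_second / clock_hz
pentium_throughput = outputs_per_cycle * 1e5  # outputs per 10^5 cycles

print(f"{pentium_throughput:.1f} outputs per 10^5 cycles")  # 448.5
```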