FIR Benchmark Summary

This file summarizes how the numbers for this benchmark were obtained.

Filters, Pipelines, Splitjoins, and Feedbackloops count the constructs in the program.

Benchmark  Description  Lines of code  Filters  Pipelines  Splitjoins  Feedbackloops  Filters in expanded graph
FIR        64-tap FIR   125            5        1          0           0              132

Benchmark  StreamIt on 16 Raw tiles                      C on 1 Raw tile  C on Pentium IV
           Utilization  Tiles used  MFLOPS  Throughput   Throughput       Throughput
FIR        86%          14          815     1188.1       293.5            445.6

The Raw processor is clocked at 250 MHz and the Intel Pentium IV at 2.2 GHz. Throughput is given in outputs per 10^5 cycles of the respective processor.

StreamIt Number Calculations

Each iteration of the StreamIt steady-state schedule produces a 6-element output array, i.e., 6 outputs per iteration.

The reported utilization is 6606 useful cycles / 7712 total cycles = 0.85658714 (about 86%).

The flop counts reported by Raw's cycle-accurate simulator are 1728 and 1565 for the two measured iterations (average = 1646.5), which gives (1646.5 flops / 505 cycles) * 250 million cycles/second = 815.0990 MFLOPS.
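
As a cross-check, the MFLOPS arithmetic can be reproduced in a few lines of Python. This is a minimal sketch: the `mflops` helper is hypothetical, and it takes the 505-cycle average iteration length derived from the timestamps below as given.

```python
# Sketch: reproduce the StreamIt MFLOPS figure from the simulator's flop counts.
# The helper name is hypothetical; only the numbers come from this file.

RAW_CLOCK_HZ = 250e6  # 250 MHz Raw processor


def mflops(flops_per_iteration: float, cycles_per_iteration: float,
           clock_hz: float = RAW_CLOCK_HZ) -> float:
    """Convert flops per iteration and cycles per iteration into MFLOPS."""
    flops_per_cycle = flops_per_iteration / cycles_per_iteration
    return flops_per_cycle * clock_hz / 1e6


# Flop counts reported for the two measured steady-state iterations.
flop_counts = [1728, 1565]
avg_flops = sum(flop_counts) / len(flop_counts)  # 1646.5

# Assumption: 505 cycles is the average iteration length (482 and 528 cycles).
result = mflops(avg_flops, 505)
print(f"{avg_flops} flops/iteration -> {result:.4f} MFLOPS")
# prints "1646.5 flops/iteration -> 815.0990 MFLOPS"
```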

The first iteration done at 0x18f2b (102187 cycles)
The second iteration done at 0x1910d (102669 cycles) (delta=482)
The third iteration done at 0x1931d (103197 cycles) (delta=528)
Based on these cycle counts, the two measured iterations take 482 and 528 cycles (average = 505).
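
A minimal Python sketch of how the hex timestamps above map to per-iteration cycle counts. The `iteration_lengths` helper is hypothetical, and it assumes each timestamp is the cycle at which an iteration completed, so consecutive differences give iteration lengths.

```python
# Sketch: decode hex completion timestamps and take consecutive differences.
# Assumption: each value is the cycle count at which one iteration finished.


def iteration_lengths(hex_timestamps: list[str]) -> list[int]:
    """Return the cycle delta between each pair of consecutive timestamps."""
    cycles = [int(ts, 16) for ts in hex_timestamps]  # int() accepts the 0x prefix with base 16
    return [later - earlier for earlier, later in zip(cycles, cycles[1:])]


streamit_done = ["0x18f2b", "0x1910d", "0x1931d"]
deltas = iteration_lengths(streamit_done)
average = sum(deltas) / len(deltas)

print([int(ts, 16) for ts in streamit_done])  # [102187, 102669, 103197]
print(deltas, average)                        # [482, 528] 505.0
```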

At 6 outputs every 505 cycles, normalizing to 10^5 cycles gives a throughput of 6*(100000/505) = 1188.1188 outputs every 10^5 cycles.
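
The normalization scales outputs per iteration by 10^5 / (cycles per iteration). A short sketch, using a hypothetical helper name:

```python
# Sketch: normalize "N outputs per C cycles" to outputs per 10^5 cycles.
# The helper name is illustrative, not from any StreamIt or Raw tool.


def outputs_per_1e5_cycles(outputs_per_iteration: int,
                           cycles_per_iteration: float) -> float:
    """Scale per-iteration output to a 10^5-cycle window."""
    return outputs_per_iteration * (1e5 / cycles_per_iteration)


throughput = outputs_per_1e5_cycles(6, 505)
print(f"{throughput:.4f}")  # 1188.1188
```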

C Code Single Tile Raw Number Calculations

first iteration done at 0x1e70 (7792 cycles)
second iteration done at 0x2916 (10518 cycles) (delta=2726)
third iteration done at 0x33bc (13244 cycles) (delta=2726)
Based on these cycle counts, both measured iterations take 2726 cycles (average = 2726).

At 6 outputs every 2726 cycles, normalizing to 10^5 cycles gives 6*(100000/2726) = 220.103 outputs / 10^5 cycles.

The simulator reports 516 flops per iteration, which gives (516 flops / 2726 cycles) * 250 million cycles/second = 47.322 MFLOPS.

The reported utilization is 43359 useful cycles / 43616 total cycles = 0.99410767.
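
The same arithmetic, applied to the single-tile numbers reported in this section, can be sketched as follows. The variable names are illustrative, and the sketch assumes the 516 flops are counted per 2726-cycle iteration, as the MFLOPS calculation above implies.

```python
# Sketch: recompute the single-tile Raw figures from the values in this section.
# Assumption: 516 flops are executed per iteration; names are illustrative.

RAW_CLOCK_HZ = 250e6  # 250 MHz Raw processor

timestamps = ["0x1e70", "0x2916", "0x33bc"]
cycles = [int(ts, 16) for ts in timestamps]           # [7792, 10518, 13244]
deltas = [b - a for a, b in zip(cycles, cycles[1:])]  # [2726, 2726]
cycles_per_iteration = sum(deltas) / len(deltas)      # 2726.0

outputs_per_iteration = 6
throughput = outputs_per_iteration * 1e5 / cycles_per_iteration
mflops = 516 / cycles_per_iteration * RAW_CLOCK_HZ / 1e6
utilization = 43359 / 43616  # useful cycles / total cycles

print(f"throughput  = {throughput:.3f} outputs / 10^5 cycles")  # 220.103
print(f"MFLOPS      = {mflops:.3f}")                            # 47.322
print(f"utilization = {utilization:.8f}")                       # 0.99410767
```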

C Code Number Calculations

Performing 100 million iterations at 1 output per iteration.
Runtime for 10^8 iterations is 10.20 seconds.

Throughput per 10^5 cycles: 10^8 iterations / 10.20 seconds * 1 output / 1 iteration * 1 second / (2.2*10^9 cycles) * 10^5 cycles = 445.6328 outputs / 10^5 cycles.
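
Converting the wall-clock runtime into outputs per 10^5 Pentium IV cycles can be sketched as below. Like the calculation above, it assumes the processor runs at its nominal 2.2 GHz for the entire run; `throughput_from_runtime` is a hypothetical helper.

```python
# Sketch: convert a wall-clock runtime into outputs per 10^5 clock cycles.
# Assumption: the CPU runs at its nominal clock rate for the whole run.

PENTIUM_CLOCK_HZ = 2.2e9  # 2.2 GHz Intel Pentium IV


def throughput_from_runtime(iterations: float, outputs_per_iteration: float,
                            runtime_s: float, clock_hz: float) -> float:
    """Outputs per 10^5 clock cycles, given a measured runtime in seconds."""
    outputs_per_second = iterations * outputs_per_iteration / runtime_s
    outputs_per_cycle = outputs_per_second / clock_hz
    return outputs_per_cycle * 1e5


tp = throughput_from_runtime(1e8, 1, 10.20, PENTIUM_CLOCK_HZ)
print(f"{tp:.4f} outputs / 10^5 cycles")  # 445.6328 outputs / 10^5 cycles
```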