BENCHMARK Benchmark Summary

This file summarizes the performance numbers for this benchmark and then describes how each number was obtained.

StreamIt Number Calculations

Each iteration of the steady state schedule for the StreamIt code produces 256 outputs. The BENCHMARK is performed on a set of 64 points producing 64 complex numbers, or 128 floating point numbers. Each steady state schedule performs 2 64-point BENCHMARKs, thus producing 256 total floating point numbers.

The first iteration was done at time step 0x8d8f (36239 cycles).
The second iteration was done at 0xbc3f (48191 cycles) (delta=11952).
The third iteration was done at 0xeb9d (60317 cycles) (delta=12126).
Based on these cycle counts, each iteration takes 11952 min/12126 max cycles (average of 12039).
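The deltas above can be re-derived from the hexadecimal timestamps. The following is a minimal sketch in Python; the variable names are illustrative and are not part of the original measurement scripts.

```python
# Completion times of the first three steady-state iterations,
# copied from the hexadecimal cycle counts listed above.
timestamps_hex = ["0x8d8f", "0xbc3f", "0xeb9d"]

# Convert to decimal cycle counts.
cycles = [int(t, 16) for t in timestamps_hex]

# Cycles per iteration = difference between consecutive completion times.
deltas = [later - earlier for earlier, later in zip(cycles, cycles[1:])]

average = sum(deltas) / len(deltas)

print(cycles)                              # [36239, 48191, 60317]
print(deltas)                              # [11952, 12126]
print(min(deltas), max(deltas), average)   # 11952 12126 12039.0
```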

256 outputs every 11952 cycles (the minimum iteration time), normalized to 10^5 cycles, gives a throughput of 256*(100000/11952) = 2141.9009 outputs every 10^5 cycles.
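As a quick check of the normalization, here is a small Python sketch. The helper name outputs_per_window is hypothetical and introduced only for illustration.

```python
def outputs_per_window(outputs, cycles_per_iteration, window=100_000):
    """Scale 'outputs per iteration' to 'outputs per `window` cycles'."""
    return outputs * window / cycles_per_iteration

# 256 outputs per steady-state iteration, 11952 cycles per iteration.
throughput = outputs_per_window(256, 11952)

print(round(throughput, 4))  # 2141.9009
```

Passing the average (12039) or maximum (12126) iteration time instead would give a slightly lower figure; the number reported here uses the minimum.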

Flops reported by RAW's cycle-accurate simulator are 8558 min/8806 max flops (avg=8682), which is (8682 flops/11952 cycles) * 250 million cycles/second = 181.60141 MFLOPS. Note that this figure pairs the average flop count with the minimum cycle count.
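The MFLOPS conversion can be reproduced with the short Python sketch below. The 250 MHz clock rate comes from the calculation above; the function name mflops is an illustrative assumption.

```python
CLOCK_HZ = 250_000_000  # 250 million cycles/second, as used above

def mflops(flops, cycles, clock_hz=CLOCK_HZ):
    """Flops per cycle, scaled by the clock rate, in millions of flops/second."""
    return flops / cycles * clock_hz / 1e6

flops_min, flops_max = 8558, 8806
flops_avg = (flops_min + flops_max) / 2   # 8682.0

value = mflops(flops_avg, 11952)

print(flops_avg)        # 8682.0
print(round(value, 3))  # 181.601
```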

Utilization numbers reported were 79886 useful cycles/ 191232 total cycles = 0.41774389.
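The utilization ratio is a plain division, sketched below in Python. The last check is only an observation about the arithmetic: the total happens to be 16 times the minimum iteration time, and this file does not state where that factor comes from.

```python
useful_cycles = 79886
total_cycles = 191232

utilization = useful_cycles / total_cycles
print(round(utilization, 4))  # 0.4177

# Observation only (not stated in the source): the total is an exact
# multiple of the minimum iteration time of 11952 cycles.
factor, remainder = divmod(total_cycles, 11952)
print(factor, remainder)      # 16 0
```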

C Code Number Calculations

(Intel® Xeon™ 2.20 GHz, 512KB L2 cache, 2 GB RAM; Performance Guide pdf)

Performing 10 million iterations at 128 outputs/iteration.
Runtime for 10^7 iterations is 129.72 seconds.

Throughput: 10^7 iterations / 129.72 seconds * 128 outputs / 1 iteration * 1 second / (2.2*10^9 cycles) * 10^5 cycles = 448.5 outputs / 10^5 cycles.
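The unit conversion above is easier to follow step by step. This Python sketch assumes the nominal 2.2 GHz clock rate quoted for the Xeon; the variable names are illustrative.

```python
iterations = 10**7          # 10 million iterations
runtime_s = 129.72          # measured wall-clock runtime in seconds
outputs_per_iter = 128
clock_hz = 2.2e9            # nominal 2.20 GHz clock (assumed constant)

total_cycles = runtime_s * clock_hz
total_outputs = iterations * outputs_per_iter

cycles_per_iter = total_cycles / iterations
throughput = total_outputs / total_cycles * 1e5   # outputs per 10^5 cycles

print(round(cycles_per_iter, 1))  # 28538.4
print(round(throughput, 1))       # 448.5
```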

C Code Single Tile Raw Number Calculations

The first iteration was done at time 0x796d (31085 cycles).
The second iteration was done at time 0xe40e (58382 cycles) (delta=27297).
The third iteration was done at time 0x14eaf (85679 cycles) (delta=27297).
Based on these cycle counts, each iteration takes 27297 min/27297 max cycles (average = 27297).

128 outputs every 27297 cycles, normalized to 10^5 cycles, gives 128*(100000/27297) = 468.916 outputs / 10^5 cycles.

flops reported are 7006 flops, which is (7006 flops/27297 cycles) * 250 million cycles/second = 64.16456 MFLOPS.

Utilization numbers reported were 432983 useful cycles/ 436752 total cycles = 0.99137039.
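The single-tile figures in this section follow the same steps as the StreamIt figures, so they can be re-derived together. This is a minimal Python sketch with illustrative variable names, using only the numbers reported in this section.

```python
# Completion times of the first three iterations (hex cycle counts).
cycles = [int(t, 16) for t in ("0x796d", "0xe40e", "0x14eaf")]
deltas = [later - earlier for earlier, later in zip(cycles, cycles[1:])]
cycles_per_iter = min(deltas)   # min, max and average coincide here

throughput = 128 * 100_000 / cycles_per_iter        # outputs per 10^5 cycles
mflops = 7006 / cycles_per_iter * 250_000_000 / 1e6 # at 250 million cycles/s
utilization = 432983 / 436752

print(cycles)                  # [31085, 58382, 85679]
print(deltas)                  # [27297, 27297]
print(round(throughput, 3))    # 468.916
print(round(mflops, 2))        # 64.16
print(round(utilization, 4))   # 0.9914
```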