The first iteration completed at time step 0x8d8f (36239 cycles).
The second iteration completed at 0xbc3f (48191 cycles) (delta=11952).
The third iteration completed at 0xeb9d (60317 cycles) (delta=12126).
Based on these cycle counts, each iteration takes 11952 min/12126 max
cycles (average of 12039).
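The conversions above can be double-checked with a short script. This is only a sketch: the variable names are illustrative and are not taken from the simulator output.

```python
# Time steps at which each iteration finished, as printed in hex.
timestamps_hex = ["0x8d8f", "0xbc3f", "0xeb9d"]

# Convert to decimal cycle counts (int() accepts the 0x prefix with base 16).
cycles = [int(t, 16) for t in timestamps_hex]

# Cycles elapsed between consecutive iteration completions.
deltas = [later - earlier for earlier, later in zip(cycles, cycles[1:])]

min_delta = min(deltas)
max_delta = max(deltas)
avg_delta = sum(deltas) / len(deltas)

print(cycles)                           # [36239, 48191, 60317]
print(deltas)                           # [11952, 12126]
print(min_delta, max_delta, avg_delta)  # 11952 12126 12039.0
```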
At 256 outputs every 11952 cycles (the minimum iteration time), normalizing to 10^5 cycles gives a throughput of 256*(100000/11952) = 2141.9009 outputs every 10^5 cycles.
Flops reported by RAW's cycle-accurate simulator are 8558 min/8806 max (avg=8682), which is (8682 flops/11952 cycles) * 250 million cycles/second = 181.60141 MFLOPS.
Utilization numbers reported were 79886 useful cycles / 191232 total cycles = 0.41774389.
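The throughput and MFLOPS figures for this run follow from two small formulas, sketched below. The function names are hypothetical. Note that the MFLOPS figure divides the average flop count by the minimum cycle count, exactly as the calculation above does.

```python
CLOCK_HZ = 250_000_000   # 250 million cycles/second, as used above
WINDOW = 100_000         # normalization window of 10^5 cycles

def outputs_per_window(outputs, cycles, window=WINDOW):
    """Outputs produced per `window` cycles."""
    return outputs * window / cycles

def mflops(flops, cycles, clock_hz=CLOCK_HZ):
    """Millions of floating-point operations per second."""
    return flops / cycles * clock_hz / 1e6

min_cycles = 11952
avg_flops = (8558 + 8806) / 2   # 8682.0

throughput = outputs_per_window(256, min_cycles)
rate = mflops(avg_flops, min_cycles)

print(round(throughput, 4))  # 2141.9009
print(round(rate, 3))        # 181.601
```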
This measurement performs 10 million (10^7) iterations at 128 outputs/iteration.
Runtime for 10^7 iterations is 129.72 seconds.
Throughput normalized to 10^5 cycles: (10^7 iterations / 129.72 seconds) * (128 outputs / 1 iteration) * (1 second / 2.2*10^9 cycles) * 10^5 cycles = 448.5 outputs / 10^5 cycles.
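The unit conversion from wall-clock time to outputs per 10^5 cycles can be written out step by step. This is a sketch with illustrative variable names; the 2.2*10^9 cycles/second figure is taken from the calculation above. It also derives the implied cycles per iteration, which is not quoted in the notes.

```python
iterations = 10**7          # 10 million iterations
outputs_per_iter = 128
runtime_s = 129.72          # measured runtime in seconds
clock_hz = 2.2e9            # cycles/second used in the conversion above
window = 10**5              # normalization window in cycles

total_outputs = iterations * outputs_per_iter
total_cycles = runtime_s * clock_hz

# Outputs produced per 10^5 cycles.
throughput = total_outputs / total_cycles * window

# Derived quantity: average cycles spent on one iteration.
cycles_per_iter = total_cycles / iterations

print(round(throughput, 1))    # 448.5
print(round(cycles_per_iter))  # 28538
```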
128 outputs every 27297 cycles, normalized to 10^5 cycles, gives 128*(100000/27297) = 468.916 outputs / 10^5 cycles.
Flops reported are 7006, which is (7006 flops/27297 cycles) * 250 million cycles/second = 64.16456 MFLOPS.
Utilization numbers reported were 432983 useful cycles / 436752 total cycles = 0.99137039.
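As a consistency check on the two utilization figures, the sketch below recomputes both ratios and compares each total cycle count with the per-iteration cycle count. In both runs the total is exactly 16 times the per-iteration count, which suggests the totals are accumulated over 16 units; that reading is an inference and is not stated in these notes. The run labels are illustrative.

```python
runs = {
    # label: (useful cycles, total cycles, cycles per iteration)
    "11952-cycle run": (79886, 191232, 11952),
    "27297-cycle run": (432983, 436752, 27297),
}

results = {}
for label, (useful, total, per_iter) in runs.items():
    utilization = useful / total   # fraction of cycles doing useful work
    factor = total / per_iter      # total cycles relative to one iteration
    results[label] = (utilization, factor)
    print(f"{label}: utilization={utilization:.4f}, total/per-iteration={factor:g}")
```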