Bitonic Sort Benchmark Summary

This file summarizes how the numbers for this benchmark were obtained.

Benchmark  Description              Lines of code  # of constructs in the program                   Filters in the expanded graph
                                                   filters  pipelines  splitjoins  feedbackloops
Sort       32 element Bitonic Sort  419            4        5          6           0              242

Throughput is reported in outputs per 10⁵ cycles.

Benchmark  250 MHz RAW processor                                                 C on a 2.2 GHz Intel Pentium IV
           StreamIt on 16 tiles                              C on a single tile
           Utilization  # of tiles used  MFLOPS  Throughput  Throughput          Throughput
Sort       64%          16               N/A     2,664.4     225.6               239.4

StreamIt Number Calculations

Each iteration of the steady state schedule for the StreamIt code produces 64 outputs.

The reported utilization was 42336 useful cycles / 72992 total cycles = 0.5800.

The Bitonic Sort was performed on integer numbers, so there were no floating point operations involved.

The first iteration was done at time step 0x5e08 (24072 cycles).
The second iteration was done at 0x6944 (26948 cycles) (delta=2876)
The third iteration was done at 0x74FA (29946 cycles) (delta=2998)
Based on these cycle counts, each iteration takes 2998 cycles (the delta between the second and third iterations).

64 outputs every 2998 cycles, normalized to 10^5 cycles, gives a throughput of 64*(100000/2998) = 2134.757 outputs per 10^5 cycles.
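
The StreamIt arithmetic above can be re-checked with a short script. This is only a sketch in Python, not part of the original benchmark tooling: the function names `iteration_deltas` and `throughput_per_1e5` are hypothetical, and it assumes 64 outputs per iteration, as used in the formula.

```python
# Hypothetical helper names; the timestamps are the hex values quoted in this file.

def iteration_deltas(hex_timestamps):
    """Convert hex cycle counters to integers and return successive differences."""
    cycles = [int(t, 16) for t in hex_timestamps]
    return [later - earlier for earlier, later in zip(cycles, cycles[1:])]


def throughput_per_1e5(outputs_per_iteration, cycles_per_iteration):
    """Outputs produced per 10^5 cycles."""
    return outputs_per_iteration * 100_000 / cycles_per_iteration


streamit_deltas = iteration_deltas(["0x5e08", "0x6944", "0x74FA"])
print(streamit_deltas)  # [2876, 2998]

# Assumption: 64 outputs per iteration, and the last delta as the iteration time.
streamit_throughput = throughput_per_1e5(64, streamit_deltas[-1])
print(round(streamit_throughput, 3))  # 2134.757
```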

C Code Single Tile Raw Number Calculations

first iteration done at 0x706b (28779 cycles)
second iteration done at 0xdf3e (57150 cycles) (delta=28371)
third iteration done at 0x14e11 (85521 cycles) (delta=28371)
Based on these cycle counts, each iteration takes 28371 min/28371 max cycles (average = 28371).

64 outputs every 28371 cycles, normalized to 10^5 cycles, gives a throughput of 64*(100000/28371) = 225.582 outputs per 10^5 cycles.
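
The min/max/average figures for the single-tile run can be reproduced the same way. The sketch below is illustrative only; `delta_stats` is a hypothetical name, and the 64 outputs per iteration is taken from the formula above.

```python
# Hypothetical helper; the timestamps are the single-tile hex values quoted in this file.

def delta_stats(hex_timestamps):
    """Return (min, max, average) of the cycle deltas between successive iterations."""
    cycles = [int(t, 16) for t in hex_timestamps]
    deltas = [later - earlier for earlier, later in zip(cycles, cycles[1:])]
    return min(deltas), max(deltas), sum(deltas) / len(deltas)


tile_min, tile_max, tile_avg = delta_stats(["0x706b", "0xdf3e", "0x14e11"])
print(tile_min, tile_max, tile_avg)  # 28371 28371 28371.0

# Assumption: 64 outputs per iteration, normalized to 10^5 cycles.
tile_throughput = 64 * 100_000 / tile_avg
print(round(tile_throughput, 3))  # 225.582
```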

This is an integer application, so there are no FLOPS counts to report.

The reported utilization was 459030 useful cycles / 460464 total cycles = 0.99688575.
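
Utilization here is simply useful cycles divided by total cycles. A minimal Python check of the two utilization ratios quoted in this file follows; the function name `utilization` is a hypothetical choice for illustration.

```python
# Hypothetical helper; the cycle counts are the ones quoted in this file.

def utilization(useful_cycles, total_cycles):
    """Fraction of total cycles that did useful work."""
    return useful_cycles / total_cycles


streamit_util = utilization(42336, 72992)       # StreamIt on 16 tiles
single_tile_util = utilization(459030, 460464)  # C on a single tile

print(f"{streamit_util:.4f}")     # 0.5800
print(f"{single_tile_util:.8f}")  # 0.99688575
```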

C Code Number Calculations

Performing 10 million iterations at 64 outputs/iteration.
Runtime for 10^7 iterations is 121.61 seconds.

Throughput: 10^7 iterations / 121.61 seconds * 64 outputs / 1 iteration * 1 second / (2.2*10^9 cycles) * 10^5 cycles = 239.215 outputs / 10^5 cycles.
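
The unit conversion above can be written out step by step. This is a sketch under the stated figures (10^7 iterations, 121.61 seconds, 64 outputs per iteration, a 2.2 GHz clock); the function name `wallclock_throughput_per_1e5` is hypothetical.

```python
# Hypothetical helper that converts a wall-clock measurement into outputs per 10^5 cycles.

def wallclock_throughput_per_1e5(iterations, runtime_s, outputs_per_iteration, clock_hz):
    """Outputs per 10^5 processor cycles, derived from a timed run."""
    outputs_per_second = iterations * outputs_per_iteration / runtime_s
    outputs_per_cycle = outputs_per_second / clock_hz
    return outputs_per_cycle * 100_000


pentium_throughput = wallclock_throughput_per_1e5(
    iterations=10**7,
    runtime_s=121.61,
    outputs_per_iteration=64,
    clock_hz=2.2e9,
)
print(round(pentium_throughput, 3))  # 239.215
```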