VersaBench – A Benchmark Suite for Versatile Architectures


Please follow the guidelines below when reporting results.

  • Benchmarks may be compiled with the best available compiler.

  • Benchmarks may be rewritten in any language (e.g., C, Java, StreamIt, Brook, Verilog) provided the new code adheres to the original algorithms. For example, a recoding of bmm must use the blocked matrix multiply algorithm. The benchmarks may even be hand-coded in assembly to suit a particular architecture.

  • The Versatility may be computed using wall clock times (preferred) or cycle counts, as long as the method used is clearly specified. If you use a cycle counter, two timing markers in the benchmark code indicate where cycle counting should begin and end, respectively:

    • /*** VERSABENCH START ***/ and
    • /*** VERSABENCH END ***/

    In the case of the SERVER benchmarks, report the total time to run twenty-four (24) instances of the benchmark.

  • Real architectures are preferred, but simulators may be used.

  • Although the modeling of real I/O is encouraged, we recognize the difficulty of doing so in a prototype environment. We suggest initializing a region of external DRAM with I/O data and flushing the caches so that they are not primed before measurement begins. Simulation environments often ignore system calls, treating them as magical instructions that atomically update memory without polluting the caches. Alternatively, a deionizer may be used to idealize I/O. We went to great lengths to minimize the effects of I/O in the VersaBench suite.
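
As a rough illustration of the wall clock guideline for the SERVER benchmarks, the sketch below times twenty-four runs of a benchmark and reports the total. It is only a sketch under stated assumptions: the helper name total_wall_time and the callable run_once are hypothetical, and running the instances one after another is a choice made here. The guidelines above do not say whether the instances run sequentially or concurrently, so report whichever you use.

```python
import time

SERVER_INSTANCES = 24  # instance count given in the guidelines for SERVER benchmarks


def total_wall_time(run_once, instances=SERVER_INSTANCES):
    """Return the total wall clock seconds taken by `instances` calls of run_once.

    `run_once` is a hypothetical zero-argument callable that executes one
    instance of the benchmark (for example, by launching its binary).
    Instances are run back to back; this ordering is an assumption.
    """
    start = time.perf_counter()  # monotonic, high-resolution clock
    for _ in range(instances):
        run_once()
    return time.perf_counter() - start
```

For example, passing a callable that invokes subprocess.run on a compiled benchmark would time that binary; the path to the binary is whatever your build produces.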

For some benchmarks, we provide multiple inputs. Please use the one designated as the reference input (ref) for timing measurements.

The evaluation process affords considerable flexibility in how the benchmarks may be coded and executed. However, when reporting results, the details of the adopted methodology must be clearly described. Some common parameters (following the guidelines above) include:

  • whether a simulator is used,
  • the language and compiler used in the implementation (and whether any hand-coding is done),
  • whether wall clock times are used, or whether cycles are being measured,
  • the clock speeds that are assumed for the architecture,
  • whether I/O is accurately simulated, or whether the I/O costs are ignored,
  • and the speeds assumed for caches and external memory, and whether the external memory is faithfully modeled.
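
One way to keep these parameters together with a reported result is a small record type, sketched below. The class name Methodology, its field names, and the helper cycles_to_seconds are all hypothetical and are not part of VersaBench; the only arithmetic involved is the standard conversion of a cycle count to time at the assumed clock speed.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Methodology:
    """Hypothetical record of the reporting parameters listed above."""
    simulator_used: bool
    language: str
    compiler: str
    hand_coded: bool
    timing_method: str            # "wall_clock" or "cycles"
    clock_mhz: Optional[float]    # clock speed assumed for the architecture
    io_modeled: bool              # False means I/O costs are ignored
    memory_speeds: str            # free-form note on cache and external memory speeds
    external_memory_modeled: bool

    def __post_init__(self):
        # Reject anything other than the two timing methods the guidelines name.
        if self.timing_method not in ("wall_clock", "cycles"):
            raise ValueError("timing_method must be 'wall_clock' or 'cycles'")


def cycles_to_seconds(cycles, clock_mhz):
    """Convert a cycle count to seconds at the assumed clock speed in MHz."""
    return cycles / (clock_mhz * 1e6)
```

Calling asdict on a filled-in record yields a plain dictionary that can be printed alongside the timing results.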


  • Last updated: Saturday June 04 2004