StreamIt - Quick Start Guide to the StreamIt Compiler

QUICK START GUIDE

The StreamIt compiler is invoked via the strc script:

       strc foo.str

This reads foo.str, produces foo.java as an intermediate file, compiles it down to a number of C++ files, and then compiles and links these to produce a binary, a.out.
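The invocation described above can be wrapped in a small build-script helper. This is a minimal sketch that assumes a POSIX shell with strc on the PATH. The function name compile_str is hypothetical and is not part of StreamIt. The helper uses only the default invocation and the documented --output/-o option.

```shell
# Hypothetical helper (not part of StreamIt): compile a StreamIt source file.
# Usage: compile_str <source.str> [output-name]
compile_str() {
  src=$1
  out=${2:-a.out}              # strc names the binary a.out by default
  if [ "$out" = a.out ]; then
    strc "$src"                # foo.str -> foo.java -> C++ files -> a.out
  else
    strc -o "$out" "$src"      # same pipeline, binary placed in $out
  fi
}
```

For example, compile_str foo.str foo runs the same pipeline but leaves the binary in ./foo instead of ./a.out.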

The StreamIt Cookbook provides a step-by-step tutorial for getting started with the language and compiler. For reference, the command-line options to the compiler are also described below.

strc Command Line Options

--help: Displays a summary of common options.
--more-help: Displays a summary of advanced options (which are not described below).
--cluster <n>: Compile for a cluster or multicore with <n> nodes.
--library: Produce a Java file compatible with the StreamIt Java library, and compile and run it.
--simpleC: Generate a simple C file that inlines the entire application into a single function. This is sometimes more readable than the default uniprocessor output, but the backend is not fully featured.
--raw <n>, -r <n>: Compile for an <n>-by-<n> Raw processor.
--rstream, -R: Generate a C-like file to be compiled by the RStream compiler from Reservoir Labs.
--output <filename>, -o <filename>: Places the resulting binary in <filename>.
--verbose: Show intermediate commands as they are executed.
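The backend-selection options above can be collected in one place so that a build script chooses a target by name. This is a sketch under stated assumptions. The helper compile_for and its backend names (uni, cluster, and so on) are hypothetical. The node count and grid size of 4 are arbitrary illustrative values. Every strc flag used here comes from the list above.

```shell
# Hypothetical helper: compile a StreamIt program for one of the backends above.
# Usage: compile_for <backend> <source.str>
compile_for() {
  case $1 in
    uni)     strc "$2" ;;               # default uniprocessor backend
    cluster) strc --cluster 4 "$2" ;;   # cluster or multicore with 4 nodes
    library) strc --library "$2" ;;     # Java library output; strc compiles and runs it
    simplec) strc --simpleC "$2" ;;     # whole application inlined into one C function
    raw)     strc --raw 4 "$2" ;;       # 4-by-4 Raw processor
    rstream) strc --rstream "$2" ;;     # C-like file for the RStream compiler
    *)       echo "unknown backend: $1" >&2; return 1 ;;
  esac
}
```

Passing --verbose in any of these branches shows the intermediate commands strc executes.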

Options available for all backends

-O0: Do not optimize (default).
-O1: Perform basic optimizations that should improve performance in most cases. Adds --unroll 16 --destroyfieldarray --partition --wbs.
-O2: Perform extended optimizations that should improve performance in most cases, but may also cause the compiler to become unstable. Adds --unroll 256 --destroyfieldarray --partition --wbs --macros.
--iterations <n>, -i<n>: Run the program for <n> steady-state iterations. By default, the program runs indefinitely. For the uniprocessor, cluster, and simpleC backends, the number of iterations can also be passed on the command line of the final executable (a.out -i 100).
--linearreplacement: Domain-specific optimization: combine adjacent "linear" filters in the program into a single matrix multiplication operation wherever possible. Corresponds to the "linear" option in the PLDI'03 paper.
--statespace: In combination with --linearreplacement, performs combination and optimization of linear statespace filters as described in the CASES'05 paper.
--unroll <n>, -u<n>: Specify loop unrolling limit. The default value is 0.
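The following sketch combines several of the options above. It applies basic optimizations, bounds the run to a fixed number of steady-state iterations, and enables linear and statespace combination, since --statespace only takes effect together with --linearreplacement. The helper name build_linear is hypothetical.

```shell
# Hypothetical helper: optimized build that runs for a fixed number of iterations.
# Usage: build_linear <source.str> <iterations>
build_linear() {
  # -O1: basic optimizations (adds --unroll 16 --destroyfieldarray --partition --wbs)
  # --iterations: stop after N steady-state iterations instead of running forever
  # --linearreplacement + --statespace: combine linear and statespace filters
  strc -O1 --iterations "$2" --linearreplacement --statespace "$1"
}
```

With the uniprocessor, cluster, and simpleC backends, the compiled iteration count can still be overridden at run time, for example with a.out -i 100.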

Options specific to Uniprocessor and Cluster backends

--cacheopt: Performs cache optimizations as described in the LCTES'05 paper.
--l1d <n>: Sets the L1 data cache size (in KB) for cache optimizations. The default is 8 KB.
--l1i <n>: Sets the L1 instruction cache size (in KB) for cache optimizations. The default is 8 KB.
--l2 <n>: Sets the L2 cache size (in KB) for cache optimizations (we assume a unified L2 cache). The default is 256 KB.
--linearpartition, -L: Domain-specific optimization: perform linear replacement and frequency replacement selectively, based on an estimate of where it is most beneficial. Corresponds to the "autosel" option in the PLDI'03 paper. (Relies on FFTW installation.)
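The cache options take sizes in KB, not bytes. This minimal sketch passes a target machine's cache sizes to --cacheopt and falls back to the documented defaults when a size is omitted. It also includes a byte-to-KB conversion for sizes reported in bytes. Both helper names, build_cacheopt and kb, are hypothetical.

```shell
# Hypothetical helper: convert a size in bytes to KB (integer division).
kb() {
  echo $(( $1 / 1024 ))
}

# Hypothetical helper: compile with cache optimizations.
# Usage: build_cacheopt <source.str> [l1d-KB] [l1i-KB] [l2-KB]
# Omitted sizes fall back to strc's documented defaults (8, 8, 256 KB).
build_cacheopt() {
  strc --cacheopt --l1d "${2:-8}" --l1i "${3:-8}" --l2 "${4:-256}" "$1"
}
```

For example, build_cacheopt foo.str "$(kb 32768)" "$(kb 32768)" 1024 targets a machine with 32 KB L1 data and instruction caches and a 1024 KB unified L2.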

Options specific to Raw backend

--asciifileio: Specifies that FileReaders and FileWriters should use ASCII format rather than binary. Also works under the --simpleC backend.
--numbers <n>, -N<n>: Instrument code to gather performance statistics on simulated code over <n> steady-state cycles. The results are placed in results.out in the current directory.
--ssoutputs <n>: For applications containing a dynamic I/O rate, this option indicates how many outputs should count as one steady state when gathering numbers (with --numbers).
--rawcol <m>, -c<m>: Specify number of columns in Raw processor; --raw specifies number of rows.
--wbs: When laying out communication instructions, use the work-based simulator to estimate exactly when items will be produced and consumed. This improves the scheduling of routing instructions.
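On Raw, --raw sets the number of rows and --rawcol sets the number of columns, so the grid contains rows × columns tiles. This sketch compiles for a rectangular Raw configuration and instruments it with --numbers. Statistics are written to results.out in the current directory. The helper name build_raw_stats is hypothetical, and printing the tile count to stderr is only an illustration.

```shell
# Hypothetical helper: compile for a <rows>-by-<cols> Raw grid and gather
# performance statistics over <cycles> steady-state cycles (into results.out).
# Usage: build_raw_stats <source.str> <rows> <cols> <cycles>
build_raw_stats() {
  src=$1 rows=$2 cols=$3 cycles=$4
  echo "Raw tiles: $(( rows * cols ))" >&2   # e.g. 4 rows x 8 columns = 32 tiles
  # --wbs: use the work-based simulator when scheduling routing instructions
  strc --raw "$rows" --rawcol "$cols" --numbers "$cycles" --wbs "$src"
}
```

Without --rawcol, the --raw <n> option alone produces a square <n>-by-<n> grid.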