Prototyping the TRIPS Scalable Distributed Processing System, WARP2007

Stephen W. Keckler and Doug Burger
University of Texas at Austin

The TRIPS system is a prototype multiprocessor computer designed to demonstrate a dataflow-oriented instruction set architecture and a distributed, scalable, and networked microarchitecture. At the heart of the system is the TRIPS chip, a 170-million-transistor custom ASIC implemented in a 130 nm process technology. The TRIPS chip is composed of 2 processors, 1 MB of on-chip memory, 2 DRAM controllers, 2 DMA controllers, and a custom 4-port inter-chip router. Each processor can run up to 4 threads simultaneously, execute 16 instructions per cycle, and perform 4 memory accesses per cycle, producing a peak theoretical performance of 5.9 GFlops at 366 MHz. Each processor is composed of 30 individual tiles that interact only through distributed microarchitecture protocols across processor-level data and control networks. Likewise, the on-chip memory system is composed of 16 tiles and can be configured as a static non-uniform cache architecture (NUCA) of different sizes and interleavings. A chip-level interconnection network connects the processors, memory tiles, and controllers, and extends gluelessly to neighboring chips using the inter-chip router.
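
The per-processor peak quoted above can be reproduced with simple arithmetic. The sketch below assumes that each of the 16 instructions issued per cycle counts as one floating-point operation; that assumption and the function name `peak_gflops` are illustrative and are not taken from the TRIPS documentation.

```python
def peak_gflops(instructions_per_cycle: int, clock_mhz: float) -> float:
    """Peak rate in GFlops, assuming every issued instruction is one floating-point op."""
    return instructions_per_cycle * clock_mhz * 1e6 / 1e9

# Figures from the abstract: 16 instructions per cycle at 366 MHz, 2 processors per chip.
per_processor = peak_gflops(16, 366)
per_chip = 2 * per_processor

print(f"{per_processor:.2f} GFlops per processor, {per_chip:.2f} GFlops per chip")
```

Under that assumption the per-processor value rounds to the 5.9 GFlops stated in the abstract.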

The TRIPS system design places four TRIPS chips, each with 2 GB of DRAM, on a motherboard and enables up to 8 motherboards to be connected into a single distributed multicore multiprocessor with a peak performance of 375 GFlops. TRIPS hardware has been up and running in the lab since January 2007, and to date we have discovered no hardware bugs. In this talk, we will briefly describe the TRIPS architecture and detail the design of the TRIPS hardware and low-level software. We will also provide insight into the design and verification complexities, as well as some of our more interesting experiences in constructing the prototype.
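
As a rough check on the system-level figures, the totals for a maximally configured system can be tallied from the per-chip numbers given in this abstract. This is a sketch only: the constant names are hypothetical, and the per-processor rate assumes one floating-point operation per issued instruction (16 per cycle at 366 MHz).

```python
# Configuration figures taken from the abstract.
CHIPS_PER_BOARD = 4
MAX_BOARDS = 8
PROCESSORS_PER_CHIP = 2
DRAM_GB_PER_CHIP = 2

# Assumed per-processor peak: 16 operations per cycle at 366 MHz.
GFLOPS_PER_PROCESSOR = 16 * 366e6 / 1e9

chips = CHIPS_PER_BOARD * MAX_BOARDS
processors = chips * PROCESSORS_PER_CHIP
dram_gb = chips * DRAM_GB_PER_CHIP
system_gflops = processors * GFLOPS_PER_PROCESSOR

print(f"{chips} chips, {processors} processors, {dram_gb} GB DRAM, "
      f"{system_gflops:.0f} GFlops peak")
```

With these inputs the aggregate comes to roughly 375 GFlops across 64 processors, which matches the peak stated above.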