
Summary Session
Moderator: Mark Horowitz
Scribe: Steve Keckler
[Commentary]
[Research Questions]
[Wishlist]
[Items Needing Consensus]
[Willing to Contribute]
[Goals]
[Promoting Streaming]
Commentary
A. Jack Dennis
Jack made the case that the functional programming model augmented
with stream data types is the right approach for stream programming.
First, it provides a formal model of composition which can be used to
compose stream kernels into larger programs. Second, functional
programming's strong links to mathematics and mathematical structure
will enable more sophisticated top-down program analyses, especially
those that are more difficult with imperative programming models. Jack
sees a great need for analysis of signal processing programs and
promise in applying high-level compiler transformations on this
application domain. Finally, functional programming provides the
benefits of implicit parallelism, which he sees as a good match for
the streaming application domain.
His second point focused on determinacy vs. non-determinacy in
streaming systems. His definition of determinacy aims at program
results - the same output is produced given the same input. While he
allows for the possibility of non-deterministic behavior encapsulated
within module boundaries, it is critical to choose a language and
programming model that enables determinism for the sake of debugging.
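Jack's two points above (composition of kernels and determinacy of results) can be illustrated with a minimal Python sketch. It is not taken from the workshop; the names `scale`, `moving_sum`, and `compose` are hypothetical, and Python generators stand in for the stream data types he described. Each kernel is a function from an input stream to an output stream, any state is kept inside the kernel, and a pipeline is built purely by function composition.

```python
from functools import reduce
from typing import Callable, Iterable, Iterator

# Assumption for this sketch: a "kernel" is a function from one stream to another.
Kernel = Callable[[Iterable[float]], Iterator[float]]

def scale(gain: float) -> Kernel:
    """Kernel that multiplies every sample by a constant gain."""
    def kernel(xs: Iterable[float]) -> Iterator[float]:
        return (gain * x for x in xs)
    return kernel

def moving_sum(width: int) -> Kernel:
    """Kernel that emits the sum of each window of `width` consecutive samples."""
    def kernel(xs: Iterable[float]) -> Iterator[float]:
        window = []  # state is local to one run of the kernel
        for x in xs:
            window.append(x)
            if len(window) > width:
                window.pop(0)
            if len(window) == width:
                yield sum(window)
    return kernel

def compose(*kernels: Kernel) -> Kernel:
    """Chain kernels left to right into one larger kernel."""
    def pipeline(xs: Iterable[float]) -> Iterator[float]:
        return reduce(lambda stream, k: k(stream), kernels, xs)
    return pipeline

pipeline = compose(scale(2.0), moving_sum(3))
samples = [1.0, 2.0, 3.0, 4.0, 5.0]
first = list(pipeline(samples))   # [12.0, 18.0, 24.0]
second = list(pipeline(samples))  # same input, same output
```

Because no kernel touches shared state, running the pipeline twice on the same input gives the same output, which is the sense of determinacy described above.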
Commenting on architecture, Jack suggested that the key objective of
streaming hardware was to take advantage of the flow of successive
data values, and in particular the organized flow of information from
storage into the computational elements of the machine. He observed
that conventional processors typically lack mechanisms for loading or
storing more than a single "word" to memory and suggested that
conventional processors could be enhanced with multi-word load/store
instructions. Even with conventional cached memory hierarchies, these
augmentations could help provide stream-like behavior to conventional
processors.
B. Burton Smith
Burton presented his observations on a series of slides, which are
available here. Burton observed a collection of recurring themes
throughout the workshop, either as points of agreement or debate:
- On-line vs. off-line streaming: how important is real-time
behavior in streaming, and what are the implications for streaming
systems?
- Functional programming models or not: more unusual hardware (i.e.,
hardware that restricts global side effects) is helped by functional
programming models, but are such programming models necessary or
too restrictive?
- Combinators or not: what kinds of aggregation and composition of
stream primitives are necessary, and what mechanisms are required
to enable combination (e.g., allowing memory reads)?
- What degree of generality should be allowed in memory references:
bandwidth is really the first-order effect here, but latency may
make a difference in cases where there is limited parallelism or
on-line requirements.
- Motivations toward streaming across the communities differ: for
some, the hardware constraints and trends motivate limiting the
"general capabilities" of the architecture; for others, streaming
is a natural fit to their application domain and a convenient
programming model.
Burton then posited that the question of what *isn't* streamable is
intriguing. Examples included:
- computations with little or no parallelism
- computations with lots of control flow, although MIMD streaming
is probably reasonable.
- little leverage can be gained by streaming on operations such as
transposition (i.e., corner turn), sparse matrix multiplication, etc.
- computations that do fine-grain memory updates.
He used the frame buffer as an example of a fine-grained memory update
problem. More detail can be found on his slides.
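One possible reading of the transposition example is sketched below in Python. It is an illustration only, not something shown on Burton's slides, and the function names are hypothetical. A row-wise reduction can emit each result as soon as a row arrives, while a corner turn cannot emit its first output row until it has seen every input row.

```python
from typing import Iterable, Iterator, List

def streaming_row_sums(rows: Iterable[List[int]]) -> Iterator[int]:
    """Streams well: each output needs only the row currently in hand."""
    for row in rows:
        yield sum(row)

def corner_turn(rows: Iterable[List[int]]) -> Iterator[List[int]]:
    """Streams poorly: the first output row needs one element from every input row."""
    buffered = list(rows)  # the whole matrix is held before anything is emitted
    for column in zip(*buffered):
        yield list(column)

matrix = [[1, 2, 3], [4, 5, 6]]
sums = list(streaming_row_sums(iter(matrix)))  # [6, 15]
turned = list(corner_turn(iter(matrix)))       # [[1, 4], [2, 5], [3, 6]]
```

In this sketch the corner turn needs storage proportional to the whole matrix, so little of the locality that streaming is meant to exploit is available.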
Finally, Burton suggested several areas of investigation that deserve
further attention:
- The approach of an ordinary language with an extraordinary
compiler?
- The approach of a broad spectrum language with an extraordinary
compiler?
- What features could be used to augment a conventional architecture
without requiring a fundamentally new architecture?
- Can we raise the abstraction levels for on-line programming?
- What are the trade-offs between latency and bandwidth
optimizations for this application domain?
- What execution model should hardware use to exploit stream
concurrency?
- VLIW, MIMD, etc.
- How should compiler writers and programmers split the work?
C. Arvind
Arvind suggested that general purpose languages are too broad and that
streaming must define a restricted domain. In addition, the actual
target architecture will likely matter a lot (streaming, general
purpose, VLIW, FPGA) in how streaming applications are optimized and
mapped. As corollaries, he made the following observations:
- Too general a tool (i.e., programming model) will not allow us to
get to the finish line.
- Too narrow a tool may have insufficient power (he drew on Ed Lee's
observation that the synchronous dataflow model was not enough to
express an entire application).
Consequently, the programming model interface needs serious
examination and may require both heterogeneous systems and
architectures.
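For readers unfamiliar with the synchronous dataflow model mentioned above, the following Python sketch shows its central idea under stated assumptions: every actor produces and consumes a fixed number of tokens per firing, so the number of firings per schedule period can be computed ahead of time from the balance equations. The function name, the edge encoding, and the example graph are hypothetical and were not presented at the workshop. The sketch requires Python 3.9 or later for `math.lcm`.

```python
from fractions import Fraction
from math import gcd, lcm
from typing import Dict, List, Tuple

# Edge = (producer, consumer, tokens produced per firing, tokens consumed per firing)
Edge = Tuple[str, str, int, int]

def repetition_vector(edges: List[Edge]) -> Dict[str, int]:
    """Smallest positive firing counts r with r[src] * produced == r[dst] * consumed."""
    rates: Dict[str, Fraction] = {edges[0][0]: Fraction(1)}
    changed = True
    while changed:  # propagate relative rates across the graph
        changed = False
        for src, dst, produced, consumed in edges:
            if src in rates and dst not in rates:
                rates[dst] = rates[src] * produced / consumed
                changed = True
            elif dst in rates and src not in rates:
                rates[src] = rates[dst] * consumed / produced
                changed = True
    for src, dst, produced, consumed in edges:
        if src not in rates or dst not in rates:
            raise ValueError("graph is not connected")
        if rates[src] * produced != rates[dst] * consumed:
            raise ValueError("inconsistent rates: no bounded-memory schedule exists")
    scale = lcm(*(r.denominator for r in rates.values()))
    counts = {actor: (r * scale).numerator for actor, r in rates.items()}
    common = gcd(*counts.values())
    return {actor: n // common for actor, n in counts.items()}

# Hypothetical three-actor pipeline with mismatched rates.
graph = [("source", "filter", 2, 3), ("filter", "sink", 1, 2)]
firings = repetition_vector(graph)  # {'source': 3, 'filter': 2, 'sink': 1}
```

The fixed rates are what make this analysis possible, and they are also the restriction behind the observation that the model alone may not express an entire application.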
D. Al Oppenheim
As a self-proclaimed "outsider" to the so-called streaming community,
Al suggested that a major challenge for the group is to describe to
people outside the field what the field actually is. He gave the
example that he could define "signal processing" in a few sentences,
but does not think that the group has yet been able to articulate the
definition of "stream processing". He also thought that "stream
processing" may require some different terminology as people have
different pre-conceived notions about the term "stream" outside of
this group. He further suggested that when the definition was ready,
an invited lecture at ICASSP would be appropriate (Dally and
Horowitz agreed that either of them could do it).
Al also thought that there is a difference between "can it be
streamed" and "should it be streamed," and that this area deserved
further discussion. Finally, his biggest takeaway from the workshop
was his increased excitement about signal processing compilers and
opportunities for interesting work in this area, but he wasn't sure if
this pertained to streams.
In the ensuing discussion, Saman asked the panel whether they thought
that we were about to repeat past mistakes. Burton responded with the
suggestion that the community be open minded about how to exploit
stream-oriented concurrency and locality, as there are probably many
ways to solve the problem.
Research Questions
- What is stream processing? After a bit of debate, the following
definition was still written on the board.
"A model that uses sequences of data and computation kernels to
expose and exploit concurrency and locality for efficiency."
Note that this includes efficiency of execution and programmability.
- What is the right model of control for streaming execution
(thread-level, instruction-level, data-level parallelism)?
- What is the right model of computation?
- Synchronous data flow (SDF) by itself is not enough, but is it
right for the data plane?
- What else is necessary to create the entire application, such as
glue code or some form of general purpose framework?
- How do we specify operational constraints on streaming applications?
- For example: latency, bandwidth, resources
- The related question is: what computations are performed by the
application and what are the requirements/constraints of the
application?
- Composition - how do we build a structure that can be composed?
- Is it a problem to compose kernels in the presence of
conditionals?
- How general should the expectations and applications of streams be?
- What should be made into streams?
- What should the programmer do and what should the compiler do?
- How does the programmer decompose the program?
- What are the compiler transformations aimed at this class of
applications?
- How do we deal with legacy codes and migration?
- Arvind: lots of transformations are known, but it is a large
engineering task to pull the whole system together.
Wishlist
- Dally: more applications
- Chapin/Bond: a common programming model, more than just stream kernel processing
- Lethin: an off-the-shelf stream processor
- W. Mark: a more general parallel processor (late bind the restrictions)
- Amarasinghe: benchmarks, metrics
- Kozyrakis: parameterizable compiler (not just assuming Imagine)
- Maze: source to MATLAB libraries
- Chapin: a software development environment
- Amarasinghe: a killer application
- Kozyrakis: programming language people involved in this discussion
- Mattson: patient application developers to provide insightful feedback
Items Needing Consensus
- Dally/Horowitz: programming models
- The language doesn't matter, but the model does
- The model can evolve over time if necessary
- It's ok to have multiple models... but the models should be
significantly different from one another. There is no point in
having multiple models that are essentially the same.
- Amarasinghe: application sets and metrics
- collection of common applications
- defined metrics such as power, latency, etc. so that different
streaming approaches can be compared
- Chapin: a good definition of what streaming is.
Willing to Contribute
Horowitz left this as a hanging question for the group to ponder, but
two items were volunteered:
- Amarasinghe offered to help serve as a clearinghouse for topics on
streaming, so that researchers with similar or complementary
interests can find each other.
- Bond volunteered to make applications available.
Goals
- Dally: commercial deployments of real stream systems
- Bond: solid application data to convince sponsors to adopt a
streaming approach
- Oppenheim: convincing TI and ATI to pay attention to the
technology, and eventually to incorporate it into the
development of future products.
- Saman/others: convince real programmers to use stream
programming tools (even on conventional hardware) for real full
applications. Of course, this approach will need to provide them
with better performance or programmability.
Promoting Streaming
- Lethin: special issue of CACM/Computer or similar "popular" technical
magazine.
- Amarasinghe: outreach to other communities, such as keynotes and
invited talks
- likely ready in signal processing domain (ICASSP)
- language community probably not the right place at this time
- Dehon: multi-institution courses
- summer school course for researchers (including graduate students)
- video conference courses with cross-institution instructors and students
- Amarasinghe: web page on streaming
- background information
- common bibliography
- Amarasinghe: another workshop
- Horowitz volunteered Dally to host, in about a year.