Our perception of "video data" has been significantly influenced by its long history. Video was designed primarily as a broadcast medium, with the major processing costs centralized at the point of production in order to minimize the requirements placed on the consumer. However, in today's world of low-cost computational resources, it seems worthwhile to revisit many of the underlying assumptions that have governed the evolution of video. This is of particular interest when we consider the future development of digital-video standards. Modern digital video requires that a certain amount of processing power be resident at the receiver. We expect that these processing capabilities will migrate toward general-purpose computing models in order to allow for future growth and enhancements. The computational video project summarized here explores how to make the best use of this client-side computing resource and the implications of such a capability for the video medium as a whole.
The fundamental principle of computational video is the encapsulation of video as objects that contain both data, representing one or more video streams, and methods for operating on that data. This formalism allows common processes associated with video, such as decompression, video editing, and image effects, to be handled in a uniform and maintainable fashion. Using this model we can also readily extend the capabilities of video to include queries, segment switching, information mining, and content playback or rendering. This new form of multimedia will also enable new collaborative applications among multiple viewers.
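As a rough illustration of this encapsulation, the sketch below bundles stream data and the methods that operate on it into a single object. It is not the project's actual object format: the `VideoObject` class, its `invoke` method, and the `switch_segment` and `query` helpers are hypothetical names chosen for this example, and frames are stood in for by plain text labels.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class VideoObject:
    """Hypothetical video object: stream data plus the methods shipped with it."""
    # Stream name -> list of frames (text labels stand in for real frame data).
    streams: Dict[str, List[str]]
    # Method name -> callable that receives the object as its first argument.
    methods: Dict[str, Callable[..., object]] = field(default_factory=dict)

    def invoke(self, name: str, *args):
        # Look up a method carried by the object and apply it to the object itself.
        return self.methods[name](self, *args)


def switch_segment(obj: VideoObject, stream: str, start: int, end: int) -> List[str]:
    """Segment switching: return frames start..end-1 of the named stream."""
    return obj.streams[stream][start:end]


def query(obj: VideoObject, keyword: str) -> List[Tuple[str, int]]:
    """Query: list (stream, frame index) pairs whose label contains the keyword."""
    return [
        (name, index)
        for name, frames in obj.streams.items()
        for index, label in enumerate(frames)
        if keyword in label
    ]


video = VideoObject(
    streams={"main": ["intro", "goal", "replay"], "alt": ["crowd", "goal"]},
    methods={"segment": switch_segment, "query": query},
)
```

Because the methods are stored alongside the streams, a receiver only needs to know how to call `invoke`; what "segment" or "query" means is decided by whoever authored the object.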
Unlike conventional digital-video streams, computational video includes computational elements embedded within its data stream. These computational elements, or methods, support dynamic and content-specific user interactions, such as menus, dialogs, conferencing facilities, transaction capabilities, graphic overlays, and indexing frameworks. These methods can be transported to specific clients or broadcast to larger audiences as needed. Multiple video streams can be associated, configured, and controlled using methods attached to one or more video streams. Thus, computational video becomes a natural framework for shared multimedia conferencing applications. In a computational video system, multiple video streams from remote sources can be combined and presented to a number of viewers. The same system can further provide viewers with their own customized view of each session.
Our design partitions computational-video objects into two separate layers. The first, the video component layer, provides simple and uniform interfaces to common operations on video-data streams. Video components play the same role in computational video that a standard library plays in a programming language. For example, video components provide for operations such as decompression, spatial filtering, and image resizing. Since all components adhere to a common interface, their functionality can be overridden either by substituting one component for another or by cascading multiple components to achieve their aggregate effect. The second layer of a computational-video object is its script layer. The script layer is a simple, yet complete, programming language that allows for the declaration of video data and the expression of operations upon that data. Each properly formed video script is itself a video component that can be included in other video scripts or used to override other video components. Our goal is to make this video-scripting language as simple and easy to use as HTML (hypertext markup language). This will allow a novice computational-video user to rapidly construct simple multimedia objects, while also allowing sophisticated users to construct complex applications and rapidly prototype new ideas.
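The substitution and cascading behavior of the component layer can be sketched as follows. This is only an illustration under stated assumptions, not the project's interface: the `VideoComponent` base class, its `process` method, and the `Downsample`, `Invert`, and `Cascade` components are hypothetical, and a frame is modeled as a small 2-D grid of grayscale values.

```python
from typing import List

# Assumption for this sketch: a frame is a 2-D grid of grayscale pixel values.
Frame = List[List[int]]


class VideoComponent:
    """Hypothetical common interface: every component maps a frame to a frame."""

    def process(self, frame: Frame) -> Frame:
        raise NotImplementedError


class Downsample(VideoComponent):
    """Image resizing: keep every `step`-th pixel in each dimension."""

    def __init__(self, step: int = 2):
        self.step = step

    def process(self, frame: Frame) -> Frame:
        return [row[::self.step] for row in frame[::self.step]]


class Invert(VideoComponent):
    """Simple image effect: invert each pixel against `max_value`."""

    def __init__(self, max_value: int = 255):
        self.max_value = max_value

    def process(self, frame: Frame) -> Frame:
        return [[self.max_value - pixel for pixel in row] for row in frame]


class Cascade(VideoComponent):
    """Chain components; the cascade is itself a component, so it can be nested."""

    def __init__(self, *stages: VideoComponent):
        self.stages = stages

    def process(self, frame: Frame) -> Frame:
        for stage in self.stages:
            frame = stage.process(frame)
        return frame


def render(frames: List[Frame], component: VideoComponent) -> List[Frame]:
    # Any component can be substituted here because all share one interface.
    return [component.process(frame) for frame in frames]


frame = [
    [0, 10, 20, 30],
    [40, 50, 60, 70],
    [80, 90, 100, 110],
    [120, 130, 140, 150],
]
pipeline = Cascade(Downsample(2), Invert())
result = pipeline.process(frame)
```

Since `Cascade` satisfies the same interface as the components it wraps, it can be passed to `render` in place of a single component, which loosely mirrors the way a video script is described as being usable wherever a component is.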
We have some demos of our work: