Abstract.The work described here explores the design of Curl, a single, coherent linguistic basis for expression of Web content at levels ranging from simple formatted text to contemporary object-oriented programming. Curl is part of a research effort aimed at eliminating discontinuities from the function/sophistication curve. This yields an authoring environment in which (1) incremental functionality requires incremental skill acquisition, and (2) a consistent semantics avoids communication obstacles between separately encapsulated fragments of content. We characterize Curl as a "gentle slope system" because it makes it easy to transition from one point to another in the function/sophistication spectrum. |
This complex of independently-conceived content production tools confronts potential Web authors with annoying -- and potentially crippling -- semantic and performance discontinuities. The enhancement of a Web page to include, say, an animation or interactive simulation requires escape from the simple markup language to an unrelated programming language within which the new functionality may be encapsulated. Such discontinuities are undesirable for at least two reasons. First, they represent technological barriers which require substantial new skills to overcome: they tend to partition the content creators into nearly disjoint sets of authors versus programmers. Second, they present serious interface constraints among separately encapsulated functions. It is difficult, for example, to reference some result of an encapsulated computation from external hypertext. It is also difficult to share structured data between different language levels because they usually use different representations.
Curl avoids these problems by using a consistent syntax and semantics for the expression of Web content at levels ranging from simple formatted text to contemporary object-oriented programming. Thus authors with simple tasks can write straightforward code and add more details when more complicated functionality or performance is required. We characterize Curl as a "gentle slope system" because it makes it easy to transition from one point to another in the function/sophistication spectrum.
Technical challenges confronted in the design of such a language stem largely from tensions between ease-of-use characteristics we demand of formatting and scripting languages, and the performance and expressiveness goals we expect of contemporary system programming languages. Strong compile-time typing, for example, is inhumane in the former but essential in the latter. Storage allocation, object orientation, and the handling of English textual content are further examples where the tension between ease-of-use and performance/expressiveness present interesting conundrums for the language designer. In most cases where compromise was necessary, the scripting language aspects were treated as most important.
More generally, we are anxious to capture in Curl the easy approachability to which the Web owes much of its popularity. To a great extent this approachability derives from careful design decisions which allow (or even require) under-specification of function (e.g. output format), in contrast to conventional programming language semantics geared to the complete specification of behavior. Curl's design is explicitly aimed at reconciling the spirit of under-specification with the requirement that it scale to the sophisticated end of the programming spectrum.
Curl source | What user sees | |
The following characters are displayed using a {bold bold-faced
font}. There are {+ 10 4} bold characters altogether.
|
The following characters are displayed using a bold-faced
font. There are 14 bold characters altogether.
|
Interpreting the Curl source above requires definitions for the
bold
and +
operators. In Curl,
the meaning of {operator ...}
is determined by
first locating a definition for operator
in the current
environment and then calling upon that function to parse and interpret
the remaining text "..." of the expression. Note that this mechanism
allows the "..." text to have any desired structure since the parsing
rules are dynamically defined. In this example, the bold
operator treats its arguments as a sequence of Graphic objects (in this
case simple characters) to be displayed with the 'bold
property set to true, while the +
operator parses the
remaining text as a sequence of subexpressions to be evaluated and then
summed. Every Curl object can be converted to a Graphic object (the
conversion process is controlled by a method defined by the object's
type) which in turn can be assembled into a formatted display by the
Curl browser. An interesting side-effect of being able to view
all Curl objects is that we can use the browser metaphor to good
effect when building debuggers, inspectors, etc.
Curl comes with a large (and growing) repertoire of predefined operators which provide formatting operations and a TK-like toolkit of user-interface components. The widespread adoption of TCL/TK and, to a lesser extent, JavaScript/HTML, demonstrates that simple applications of these operators are often sufficient to fulfill many, if not most, of the needs of interactive documents. The following example assembles a few of these components into a simple order form:
Curl source | What user sees | |
{let color:Dynamic='red count=0 | "quantity" can't depend on itself quantity:Dynamic=count {vbox {title Beach Balls} {hbox "Color:" {radiobuttons foo 'red 'white 'blue action={color.set-value foo}}} {hbox "Quantity:" {button "Take another" action={quantity.set-value {set count {+ count 1}}}} {button "Give one back" action={when {> count 0} {quantity.set-value {set count {- count 1}}}}} {paragraph You've ordered {value quantity} {value color} beach balls!}}} |
Since the values for color
and quantity
are
Dynamic
objects, the last line of the display changes
automatically as the user manipulates the color and quantity controls.
A Dynamic object incorporates a simple mechanism for propagating
changes in its value to other dynamic objects that depend on first
object's value. More sophisticated propagation rules could be
supplied by the user by creating a new class of objects derived from
Dynamic objects that have a different "propagate" method.
The screen shot above reflects the fact the user has selected
something besides the default color (red) and quantity (0). Note that
the "text-as-program paradigm" means that the value
operator must be wrapped around variable names inside of paragraphs.
Otherwise, the name of the variable as a text string will be displayed
instead of its value. This example illustrates several Curl features:
Curl source | |
| grab Ballco's pricing server interface {require ballco "http://www.ballco.com/prices.curl"} | our current price for a beach ball {define-variable ball-price:Dynamic={new Dynamic 10}} {define {compute-overhead price} {return {* 1.6 price}}} | update our beach ball price every 10 seconds using | a simple overhead calculation {create-thread {loop {ball-price.set-value {compute-overhead {ballco.get-price 'beach-ball}}} {sleep 10}}} | update interval = 10 sec {let ... {vbox ... {paragraph You've ordered {value quantity} {value color} beach balls which comes to a total of ${* ball-price quantity}}}} |
|
What user sees | |
This example starts by reading in another curl document into an
environment named "ballco". Environments completely encapsulate a
document, making it easy to implement various security, storage
management and clean-up policies. Environments are themselves objects
with instance variables and methods derived from the top-level
definitions in the source documents. Thus ballco.get-price
is a reference to the get-price
procedure defined in
BallCo's "prices.curl" file.
This example also illustrates the use of additional threads of
control to create independent tasks that run in parallel with the
user's main computation. In this case a separate thread is given the
task of periodically querying the BallCo database for the latest beach
ball price which is then used (with a suitable markup!) to update the
price in the example document. Curl threads are preemptive and not
specific to any particular platform; a simple lock
object
with acquire
and release
methods is used as
the lowest-level synchronization primitive. The combination of
threads and dynamic values makes it simple to incorporate information
gleaned from the web into a document. It also shows how procedure and
variable definitions are simply part of the document, though not
displayed.
It is easy to see how a document can grow in incremental steps from a simple text file to include ever more sophisticated interface and information-gathering components. To make this transition a natural one, Curl provides a whole spectrum of operators. Of course users are likely to want operations we haven't thought of (or had time to implement) so it makes sense to provide a way for new operators to be added to the Curl environment. The Curl graphical interface supports the usual things that Web users expect such as tables, hyperlinks and images. This example also shows a simple drag-and-drop interface and how a more sophisticated programmer can extend the functions available to technically naive authors. This example Web page, designed by a Curl neophyte, uses a portfolio object to track investments:
{require portfolio "http://www.portfolios-galore.com/portfolio.curl"} This is a simple demo of {bold Curl}, developed by the {anchor url="http://csail.mit.edu" MIT Laboratory for Computer Science}. It shows the graceful integration of programming with hypertext Web content. {anchor url="Documentation/overview.curl" Click here} for documentation. Our example concerns stocks, for example: {portfolio.stock "NWIR" shares=500 cost=16.125} {portfolio.stock "AAPL" shares=500 cost=23.5} {portfolio.stock "HWP" shares=500 cost=48.25} A small listing of popular stocks can be found {anchor url="topstocks.curl" here}. Securities can be maintained in a {bold portfolio} using a drag-and-drop user interface. {let p={new portfolio.PortfolioBox {portfolio.stock "NSCP" units=100 cost=60.5} {portfolio.stock "MSFT" units=100 cost=101} } {vbox p.graphic {paragraph The total value of my portfolio is ${value p.value}. As of this moment, I've made a net profit of ${- p.value p.cost}.}}} |
PortfolioBox
and stock
have been imported
from the portfolio url and offer some sophisticated features. The
PortfolioBox
displays as a table and accesses a quote
server to maintain up-to-date prices. It provides direct-access
editing features for portfolio manipulation, including a simple drop
method that allows the shown stock objects to be dragged onto the
portfolio window to automatically extend the set of securities
tracked. The stock
object has a simple drag method. As
in the previous example, the value
and cost
fields of the portfolio are Dynamic. and the last paragraph will be
automatically updated as the quantities of stock or their values
change.
Obvious influences on Curl's design include LISP[4,13,14,15], C++, Tcl/Tk[5], TeX[9] and HTML. Language design decisions directly reflect the goal of reconciling its simple accessibility to naive users with the power and efficiency demanded by sophisticated programming tasks. Salient features of Curl include
{circuit-simulation-netlist M1 out in gnd gnd nmos w=4u l=1u M2 out in vdd vdd pmos w=8u l=1u VDD vdd gnd 5v VIN in gnd pwl(0 0 0.1 5) .tran .1n 10n .plot v(in) v(out)}might be found in a document for an engineering course where the instructor wished to insert an interactive simulation for the student to explore.
any
. A datum of type
any
is represented as a pair of words containing the
run-time type of the datum and its value. The semantics of a Curl
program are unchanged if all type declarations are eliminated,
although its performance will suffer. This is the flip side of strong
typing: providing types for each argument or variable can be a time
consuming process which may not be worthwhile while prototyping new
ideas or when efficiency is not of concern to the author.
{let a:any=5 {define {f x:int}:int {return {+ x 3}}} {f a}}will be compiled so that the ambiguously declared variable
a
carries a run-time type, while the formal parameter
x
to the integer procedure f
does not.
Passing a
to f
requires a cast, which
involves a run-time type check after which the value portion of
a
is passed as an integer. To perform such casts
efficiently, type information (when present) is separated from the
representation of values. Because the storage for a value of type
any
is allocated by the compiler, casting a value to the
ambiguous type never causes dynamic storage allocation.
Note that the omission of a type declaration for a variable means something different than in, for example, ML[12]. In ML the variable has a static type that the compiler must determine. In Curl, an omitted type is an under-specification and the actual type may change during program execution.
Since run-time representations are required for our extensible system of types, it is convenient to represent them as Curl objects and to expose them in the programming model. Thus Curl types are simply values: they are named, passed, stored, and accessed identically to other Curl data. This treatment of types provides a natural and consistent semantics for type extension, for type portability (which reduces to the problem of object portability), and for extended type semantics. For example, in
{let t1:type={array int} t2:type={list-of t1} x:t2 ...}
array
and list-of
are procedures which
generate types. This emancipation of types, while semantically
appealing, presents certain challenges. It requires that compilation
order be somewhat lazy, to accommodate circularities. Moreover, the
expressibility of mutable type variables admits constructs whose
efficient translation, or even whose meaning, is questionable. We
currently forbid such constructs, requiring that types used in
declarations be evaluable to compile-time constants.
Ambiguous types, as well as the object class hierarchy, yield a
type lattice in which a value may have arbitrarily many types, only
some of which are compile-time constants. The run-time type of a
value v may be explored using {typeof
v}
, which returns the most specific type object
appropriate to v, using run-time tags if necessary. Similarly,
{isa
t v}
tests the membership of
v in type t, again using the fully-disambiguated
run-time type of v.
Most Curl types are classes. In the abstract, a Curl class provides a namespace whose scope includes objects of that class. Within a class, names can be bound to values, both variable and constant; the latter includes methods specific to the class. Instances of a class have a run-time representation containing slots for the values of variable data, as well as dispatch tables for efficiently invoking methods of the class. Our current implementation uses representations similar to those of C++.
In addition to providing the abstraction mechanism for building complex images, various types of Boxes also control the positioning of their children using dimension information the children supply. Current formatting templates include:
Much of the flexibility of boxes comes from the use of properties to control the rendering of primitive objects. A property is a (name,value) binding and each Graphic object has an associated list of properties. When the value of a property is required, an upward search of the template/instance tree is performed, starting with the current object. Thus properties can be viewed as a dynamically bound environment implemented using a deep binding mechanism. So, as in HTML, many of the rendering choices -- color, font, etc. -- are not made by individual templates but are instead inherited from their parents through their property bindings.
Notification of a change in a property's value is propagated through the tree which may, in turn, cause various reformatting operations to take place. If some Graphic object determines that its image has changed, it can request that it be re-rendered. Since there is sharing of objects within the tree, a re-render request may in fact cause more than one location on the output device to be redrawn. Curl uses the invalidate/repaint protocol used by many GUI frameworks, but users need not concern themselves with this machinery unless they want to.
As in Java, the dispatching of events (mouse motion, keystrokes, etc.) is controlled by the tree where the event handler for the visually topmost Graphic object that encloses the event location is invoked to deal with the event. Handlers can pass the event up the tree if they don't wish to handle the event themselves.
Classes are bound within the environment, neatly embedding the
hierarchy of class names as subtrees of the namespace. Name
resolution is kept simple and unambiguous by requiring the syntax
obj.x
to access x
from an object's class;
the unqualified name x
always references the innermost
lexical binding. Thus for example in
{let x:int=3 | Initialize local variable c:type={class {} | Define local class c x:int {define {m}:int {return {+ x self.x}}}} y:c={new c} | Allocate an instance of c {set y.x 4} | Assign to instance variable {y.m}} | invoke method & return valuethe reference
x
within the method m
gets 3,
while self.x
gets 4.
Note that types may be generated anonymously, e.g. via
class
. Names of types derive solely from bindings in the
environment, whence c
is the name of a type in the above
example. The define-class
form builds a class and enters
it into the current (lexically enclosing) environment.
Curl, as a language, has a type system sufficiently high-level that Curl semantic guarantees can be maintained despite buggy or malignant code via a combination of compile-time and run-time type checks. This is exactly how Java provides this guarantee. These guarantees are the default for normal execution of untrusted code, but entail performance costs and functional restrictions which may be undesirable in high-performance code. For example, stack allocation or allocation in areas that are then reused may provide significant performance advantages. These unsafe mechanisms are also part of the Curl language but are not available in the environment used by untrusted code.
Since the compilation of Curl forms involves invoking compilation
machinery attached to those forms in the compilation environment, the
semantics of a Curl program are entirely derived from the environment
in which it is compiled. A program compiled in an environment with no
bindings for set
or open-file
, for example,
has no access to these primitives. As all imported code is locally
compiled, the tailoring of compilation environments provides an
airtight mechanism for enforcement of an arbitrary variety of safety
policies.
An untrusted applet from an interesting Web page might, for example, offer the user a desktop playpen in which a local simulation environment can be configured. In order to provide for the storage of the applet's state between visits to that page, local file system access is necessary; however, unconstrained access to the local file system presents a security flaw. Such needs can be safely accommodated, however, by providing a limited-access primitive which allows an applet to save its state in a particular file from a user-mediated path, and to subsequently restore it transparently. An untrusted program might request a savelet object which supports such limited access to a single applet-specific file, and be prevented (by the restricted compilation environment) from arbitrary file accesses.
The low-level requirement for any security policy is freedom from stray memory accesses (minimally, from stray writes) and stray system calls. For a safe subset of Curl, one that does not allow stack allocation and other unsafe operations, this guarantee is provided by the above mentioned compilation machinery. For arbitrary Curl programs, an alternative to simply trusting that some code is correct is to use sandboxing[17]. In sandboxing, the compiler inserts a few extra instructions around writes and procedure calls to make sure that writes only occur to areas in memory that the compiler knows are OK, and that calls are only made to procedures under the compiler's control. A system environment that provided sandboxing would be an attractive target for Curl.
Through the object tag, HTML allows an arbitrary external computation to be invoked from an HTML document and its result to be embedded in the document. Of course, the rest of the document cannot interact with the computation; in Curl, by contrast, the markups are expressions in the underlying language and can have arbitrary interactions with the document.
Tcl/Tk provides a scripting language and widget set that allows
user interfaces to be constructed easily and integrated with a
program. Curl procedures with types defaulting to any
also provide this functionality but offer several advantages. First,
Tcl is very inefficient because it represents everything as a string
at the semantic level, while Curl's representations are based on
objects and can be compiled to efficient code. Second, in order to
provide new widgets in Tcl/Tk one has to write code in C. Curl
provides a uniform interface between code at all levels.
Syntactically, Curl resembles LISP and its dialects. We rely on the syntactic extensibility of prefix notation provided by LISP and extend it to include extensible text formatting as part of the language. Curl also differs from LISP in that it is, at its foundation, a statically-typed object-oriented language with non-tagged basic data representations. The part of Curl that is used like a conventional programming language most resembles Dylan[10], before it was changed to use C-like syntax.
The object semantics of Curl are similar to those of Java and C++. Unlike Java, Curl embeds objects within a lexically-scoped procedural programming model, it supports multiple inheritance, and it allows trusted code to exploit the semantic and efficiency advantages of unsafe operations. Unlike C++, Curl supports type safety.
The goal of integrating an authoring/programming language and environment with a gentle slope is a challenging one. There are many places where the most natural or useful syntax and semantics are in conflict. We have tried to resolve these conflicts in a tasteful way. We also had to choose a set of object semantics, a subject about which there is great controversy. The object semantics we have implemented have little impact on the overall system except that we insisted on efficient representations and low dispatch overhead.
Producing an interactive program/document is much easier when the appropriate level of abstraction can be used for all components without having to cross language and representation barriers. It is a challenge to provide high-performance and flexibility but our initial results are encouraging.
[1] See http://home.netscape.com/comprod/products/navigator/version_3.0/building_blocks/jscript/index.html.
[2] Gosling, Joy, Steele, The Java Language Specification, Addison-Wesley. August 1996.
[3] Java white papers at http://java.sun.com/nav/read/whitepapers.html.
[4] William Clinger and Jonathan Rees (editors), The Revised Report on the Algorithmic Language Scheme, Lisp Pointer IV(3): 1-55, July-September 1991.
[5] John K. Ousterhout, Tcl and the Tk Toolkit, Addison-Wesley, 1993.
[6] World Wide Web Consortium. See http://w3.org/.
[7] T. Berners-Lee and D. Connolly, Hypertext Markup Language: A Representation of Textual Information and Metainformation for Retrieval and Interchange, CERN and Atrium Technology Inc. July 1993.
[8] T. Berners-Lee and D. Connolly, RFC 1866: Hypertext Markup Language --- 2.0, Nov 3, 1995.
[9] Donald E. Knuth, The Texbook: A Complete Guide to Computer Typesetting With TeX, Addison-Wesley. April 1, 1988.
[10] Apple Computer, Dylan Reference Manual, Apple Computer Inc., Cupertino, California. September 29, 1995. See also http://www.cambridge.apple.com.
[11] L.Cardelli, A Polymorphic Lambda Calculus with Type:Type, Digital Equipment Corp., Systems Research Center. 1986.
[12] Robin Milner, Mads Tofte, and Robert Harper, The Definition of Standard ML, MIT Press, Cambridge, 1990.
[13] Guy Steele, Jr., Common Lisp, Second Edition, Digital Press 1990.
[14] David Moon et al, LISP Machine Manual, Fifth Edition, MIT Artificial Intelligense Laboratory, January 1983.
[15] Daniel Bobrow et al, A Common Lisp Object System Specification, Lisp and Symbolic Computation I, January 1989.
[16] Bjarne Stroustrup, The C++ Programming Language, Addison Wesley, 1991.
[17] Robert Wahbe, Steven Lucco, Thomas E. Anderson, Susan L. Graham. Efficient Software-Based Fault Isolation. Proceedings of the Fourteenth ACM Symposium on Operating Systems Principles. December, 1993.
Mat Hostetter (mat@lcs.mit.edu) is an M.Eng candidate in the Department of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology. His current research interests include runtime code generation, CPU emulation, and high-performance compilers.
David Kranz (kranz@lcs.mit.edu) is a Principal Research Scientist at the MIT Laboratory for Computer Science. His research interests are in programming language design and implementation for parallel computing. Kranz received a BA from Swarthmore. While earning a PhD at Yale, he worked on high-performance compilers for Scheme and applicative languages.
Cotton Seed (cottons@lcs.mit.edu) is a member of the research staff at the MIT Laboratory for Computer Science.
Chris Terman (cjt@mit.edu) is a Research Scientist at the MIT Laboratory for Computer Science. Chris has worked in both academic and industrial settings on a variety of projects including VLSI architectures and design methodologies, computer-aided design tools, language and compiler technology, and web-based educational tools. He received a BA in Physics from Wesleyan University and SM, EE and PhD degrees in Computer Science from MIT.
Steve Ward (ward@mit.edu) is a Professor in the Electrical Engineering and Computer Science Department at the Massachusetts Institute of Technology.