6.894 Lightweight Formal Methods Lecture 18: Dependences May 2, 2005 Topics for today: Introduction Dependences are different from all the formal models we've seen so far. Don't describe _behaviour_, even partially; instead, dependences describe relationships between behavioural entities. In a dependency relation, the entities can be at different levels of granularity (from lines of code to entire subsystems), and can be different kinds of things (code, requirements, runtime objects, development tasks). Dependences have been extensively studied in computer science and software engineering and have lots of applications. The most exciting applications -- to software design and analysis -- are only just being worked out. Most of the rigorous work in CS to date has been on dependences for compilation, not for software engineering -- internal stuff that makes programs run faster but is otherwise invisible to the developer. Dependences versus dependencies? The terms seem to be used interchangeable. But some terminological distinctions are useful: a _dependent_ is something that _depends_ on a _dependee_. Properties of dependence relations Dependences are usually represented as binary relations. Don't have to be homogeneous (eg, module on module); can be heterogeneous (eg, module on test case) too. Not transitive, but often useful to think about the transitive closure. To see why, just think about standard case of module dependences: introduce a facade F between A and B to break the dependence, so that A depends on F and F depends on B. If A as a result depended on B, there would be no point introducing F! Usually not symmetric: if A depends on B, it's rare for B to depend on A. Directionality of dependences tells you what direction changes propagate in. Whether relation is reflexive doesn't really matter -- just a technical issue, since we're usually not interested in the relationship between something and itself. Three Perspectives: Separating Goals, Meaning and Mechanism Any dependence scheme has a -- Purpose: what the whole scheme is for; -- Form: what the dependents and dependees are; and can be viewed in terms of three different perspectives: -- Operational: what you can do when you know the dependence relation; -- Semantic: what a dependence means. -- Analytic: how the dependence relation is obtained, usually by some automatic analysis. The semantic perspective is often very subtle, and because it's hard to articulate, people often don't bother. But it's very important, because it bridges between the other two perspectives. Often a single semantic perspective can serve multiple operational perspectives, and can be implemented by multiple analysis methods. It acts as a 'specification' for the analysis 'implementations'. If you don't articulate it, it's very hard to argue that an analysis is correct. Approximation Analysis usually can't obtain a dependence relation exactly, so it's usually approximated. In almost all cases, it's _over-approximated_, so the analysis might include dependences that aren't really there. For that reason, the operational perspective can only rely on _absence_ of dependences for critical tasks. Three Examples Example: Program Slicing A very influential idea. Mark Weiser; Program slicing; Proceedings of the 5th International Conference on Software Engineering; March 1981. Weiser's paper is very widely cited, and led to a whole subfield of research. He was also very influential in his ideas on 'ubiquitous computing'. Sadly, he died very young. -- Purpose: Help a programming debugging a program. -- Form: (variable, line) pair depends on line. The skeleton program obtained from program P by eliminating all the lines that (x,l) does not depend on is called 'a slice of P with respect to (x,l)'. -- Operational: If the value of x at line l is bad, you can figure out why by looking only at the slice with respect to (x,l). -- Semantic: The slice of P with respect to (x,l) is a program that gives the same behaviour as the original P if you only make observations of x at line l. There are various refinements and subtleties (such as what happens if there's a non-terminating loop before l). The pair (x,l) depends on a line t if the presence of line t can affect the value of x at l. -- Analytic: The analysis method is to translate the source code into a program dependence graph, with an edge from line p to line q when line p writes a variable that q reads in the next step; these are called 'data-flow dependences'. Conditional expressions are also represented as nodes, connected to the statements whose executions they affect; these are called 'control-flow dependences'. To find a slice, you start at the relevant statement and trace backwards, marking every statement of conditional expression that it depends directly or indirectly. The slice is then obtained by deleting from the program all unmarked statements. (You may wonder why the variable itself doesn't get taken into account. See: Daniel Jackson and Eugene J. Rollins; A New Model of Program Dependences for Reverse Engineering; Proc. SIGSOFT Conf. on Foundations of Software Engineering, New Orleans, December 1994; http://sdg.csail.mit.edu/~dnj/publications/fse94.pdf). For a nice summary of PDGs, see: The use of program dependence graphs in software engineering Susan Horwitz, Thomas Reps June 1992 Proceedings of the 14th international conference on Software engineering Example: Aspect My PhD thesis. See: Daniel Jackson; Aspect: an economical bug-detector; Proceedings of the 13th International Conference on Software Engineering; May 1991. -- Purpose: Find bugs in code by comparing the code of a method to a dependency specification. -- Form: Specified dependences take the form: post-state variable or data structure component depends on pre-state variable or data structure component. -- Operational: If a specified dependence is missing from the code, then either there is a bug in the code or the spec is wrong. -- Semantic: A post-state component C' depends on a pre-state component C if there are two executions of the procedure whose pre-states differ only in C and whose post-states differ in at least C'. -- Analytic: The code dependences are obtained by a method close to the standard method for compilation dependences and slicing, but the dependence is on data structure components and not on statements. A tool subtracts the code dependences from the specification dependences; any left over are reported as bugs. Example: Alloy unsat core -- Purpose: Find constraints responsible for an inconsistency when a predicate that is expected to have instances has none; -- Form: Predicate depends on subconstraints; the set of constraints that a predicate depends on is called its 'core'; -- Operational: You can conduct your search for a contradiction entirely within the core, and it's guaranteed that the core will contain a set of one or more constraints that are mutually inconsistent (but it's not guaranteed that you can resolve the contradiction entirely within the core: there may be another contradiction outside it); -- Semantic: Given a particular core, making any syntactically acceptable change to a constraint outside the core will not make the predicate consistent. Like many semantic definitions, this is more subtle than it appears to be. Not the same as saying that 'a constraint is outside the core if modifying it won't make the predicate consistent', because if there are two self-inconsistent constraints, this would result in an empty core! -- Analytic: The analysis method is to translate the constraints to a set of SAT clauses. A SAT solver then returns a subset of this set that is sufficient to establish inconsistency. These are mapped back to Alloy constraints, which are then marked as being in the core. Other uses of code-level dependences Program differencing -- Find fine-grained differences between programs using dependences. Applications include, eg, doing better integration than CVS. Compiler optimization -- Dependence structure can be used to determine when code can be lifted out of loops, what statements can be executed in parallel, etc. Regression test minimization -- Eliminate tests whose executions don't intersect with the forward slices of modified statements Design-level Dependences: Reasoning and Isolation Sample Program: A browser Parnas's Uses Graph David L. Parnas. Designing software for ease of extension and contraction. IEEE Trans. Software Eng. SE-5, 2 (1979). "I have identified some simple concepts that can help programmers design software so that subsets and extensions are more easily obtained. These concepts are simple if you think about software in the way suggested by this paper. Programmers do not commonly do so." --From Parnas's paper on the uses-relation Semantic perspective: A uses B = correct execution of B may be necessary for A to complete the task described in its specification Not the same as invokes: A may invoke B but expect no response; B may be an interrupt handler required to preserve invariants of A. When may A use B according to Parnas? Ý A is made simpler by using B Ý B is not made substantially more complex Ý some subset contains B and not A Ý no subset contains A and not B Operational perspectives: With uses, can organize reasoning, testing, reuse, etc. -- A reusable subset is closed under uses: each module in the subset uses only other modules in the subset. -- To check the correctness of a module, get a spec for each used module; check this module with that spec as an assumed behaviour for the used module, then check the used module against the spec. -- Bottom-up testing: can avoid the need for testing stubs by testing a module before the modules that use it. If uses is acyclic, can define levels Ý level 0: components use no others Ý level K: use at least one component from level K-1 and none from a level higher than K-1 Analytic perspective -- ignoring interrupts and concurrency -- A uses B when a procedure in A calls a procedure in B A New Dependence Model for Reasoning and Isolation This is a model I developed for 6.170 to overcome some of the problems of Parnas's model. A short and easy to read paper describing it: Daniel Jackson. Module Dependencies in Software Design. Post-workshop Proceedings of the 2002 Monterey Workshop: Radical Innovations of Software and Systems Engineering in the Future (Venice, Italy October 7-11, 2002). Springer Verlag, 2003. http://sdg.lcs.mit.edu/pubs/2002/monterey.pdf Why ÔusesÕ is not good enough Ý not adequate to describe modern software no notion of replaceability, for example Ý by definition, uses is transitive! Ý uses is binary; no measure of extent of coupling Ý certain kinds of coupling not captured (eg, sharing) Ý doesn't account for polymorphism Example: observer pattern -- look at uses before and after: how does it help? The new model -- each dependence is mediated by a specification -- the edge A --S--> B says A depends on B to satisfy S Look at observer now: [show diagram] Requires and provides relations -- can split this model into two kinds of binary dependences -- requires: module requires specs -- provides: module provides services that discharge specs For more precision, can make requires a relation for each module from specs to specs. Now we can be more precise: Semantic perspective: -- module M requires T for S means that: in order to provide the service of specification S, module M assumes that a module is present that provides it the service T; -- module M provides S means that it can discharge the obligations of specification S Analytic perspective: -- analysis is now more tricky, because we need to resolve which module provides a given service at runtime. For an application that shows how this more elaborate model was used to find problems in a radiotherapy machine, see: Andrew Rae, Daniel Jackson, Prasad Ramanan, Jay Flanz, and Didier Leyman. Critical feature analysis of a radiotherapy machine. Reliability Engineering & System Safety Volume 89, Industrial Computer System Elsevier B.V. http://authors.elsevier.com/sd/article/S0951832004001759 Other interesting questions -- Name dependence: OO programs typically have name dependences, in which a class names another class, and that class can't be replaced by a class with the same behaviour but a different name. Sometimes, the name dependence isn't accompanied by any actual use -- the module just passes objects through. How should this be handled? Or is it a language defect? See: Felleisen et al's "units". -- Assumptions that can't be captured in a classic spec. What about a process that depends on its scheduler to schedule it often enough? A module that depends on another module not to trash its memory? There's no reason in principle that dependences should always follow procedure calls, with the called procedure providing the service. Approaches from outside CS Design Structure Matrices Invented in the 1960's; lots of recent work by Steve Eppinger (Sloan School). Applications mostly in mechanical engineering. Dependences on design and production tasks, due to information and material flow; Eppinger also considers dependences within design of object itself, and argues that it mirrors dependences of development process. Key idea: reorder and cluster tasks to minimize length of cycles and thus make process more efficient. Several heuristic algorithms invented to do this. Two ideas from DSMs are finding application in software engineering: representing dependences as a matrix, and applying heuristic algorithms to reorganize and identify dependences that cause cycles. [Demonstration of Lattix tool]. Axiomatic Design Theory of complexity and design developed by Nam Suh, MIT Mechanical Engineering. Dependences are tracked between functional variables (FVs) which represent the specification and design variables (DVs) which represent the implementation. Decoupling idea: each FV should depend on at most one DV. Example: faucet design. FVs are: flow and temperature. Design #1: two taps, one for hot and one for cold, with mixer. Design #2: one tap with rotation for temperature and up/down for flow. With Design #1, have to apply series of adjustments to change temperature. This idea seems to be related to Parnas's information hiding: each design 'secret' should be in one module so it can be easily changed.