Program Analysis Reading Group

[ Next meetings | About the Group | Meeting format | Previous meetings | Future topics ]

Mailing List: send mail | manage your subscription (requires MIT certificate)

Recent and upcoming conferences:
[ ISSTA'08 | ECOOP'08 | USENIX Security'08 | Oakland'08 | PLDI'08 | NDSS'08 | ICSE'08 | POPL'08 ]
[ ICFP'07 | ASE'07 | OOPSLA'07 | ESEC/FSE'07 ]

Next Scheduled Meeting(s):

  • Automatically Checking Type System Soundness: Wednesday, August 6, 2:00pm, 32-G725

  • Efficient Software Model Checking of Soundness of Type Systems
    Michael Roberson, Melanie Agnew, Paul T. Darga, and Chandrasekhar Boyapati
    (Suggested by Adam Kiezun)

    This paper presents novel techniques for checking the soundness of a type system automatically using a software model checker. Our idea is to systematically generate every type correct intermediate program state (within some finite bounds), execute the program one step forward if possible using its small step operational semantics, and then check that the resulting intermediate program state is also correct type--but do so efficiently by detecting similarities in this search space and pruning away large portions of the search space. Thus, given only a specification of type correctness and the small step operational semantics for a language, our system automatically checks type soundness by checking that the progress and preservation theorems hold for the language (albeit for program states of at most some finite size). Our preliminary experimental results on several languages--including a language of integer and boolean expressions, a simple imperative programming language, an object-oriented language which is a subset of Java, and a language with types ownership--indicate that our approach is feasible and that our search space pruning techniques do indeed significantly reduce what is otherwise an extremely large search space. Our paper thus makes contributions both in the area of checking soundness of type systems, and in the area of reducing the state space of a software model checker.

    [ PDF | PS ]

    Future Topics (suggestions welcome):

  • Brute-force Optimization: Thursday, TBD, 3:00pm, 32-G825

  • Automatic Generation of Peephole Superoptimizers
    Sorav Bansal and Alex Aiken
    (Suggested by Adam Kiezun)

    Peephole optimizers are typically constructed using human-written pattern matching rules, an approach that requires expertise and time, as well as being less than systematic at exploiting all opportunities for optimization. We explore fully automatic construction of peephole optimizers using brute force superoptimization. While the optimizations discovered by our automatic system may be less general than human-written counterparts, our approach has the potential to automatically learn a database of thousands to millions of optimizations, in contrast to the hundreds found in current peephole optimizers. We show experimentally that our optimizer is able to exploit performance opportunities not found by existing compilers; in particular, we show speedups from 1.7 to a factor of 10 on some compute intensive kernels over a conventional optimizing compiler.

    [ PDF | PS ]

    See the README file in this directory for instructions about updating this webpage. If you can't perform the steps in that file, then please send the URL of an up-to-date PostScript and/or PDF camera-ready copy of any papers you submit. This saves me a lot of time. ACM Portal pages are preferred; CiteSeer often caches old or preliminary versions of papers. When a paper will appear (or has appeared) in a conference, ensure that we read the final version of the paper rather than a submission or technical report. Authors are usually happy to send a preprint or a camera-ready version.

    About the Group

    The Program Analysis Reading Group is a group of students and faculty that meets once a week to discuss papers drawn from a broad spectrum of research into program analysis, design, implementation, software engineering, and theory. Papers are mostly recent, but occasionally we include an older "classic". Suggestions are welcome for papers to read. Feel free to add yourself to the mailing list (see links at the top of this page); you can remove yourself at any time.

    Meeting format

    The group is informal: we come together to further our understanding of the field. Each group member is expected to read the paper, understand it to the best of your ability, and come to the meeting with questions and topics for discussion. If you haven't read the paper beforehand, you are likely to get much less from the meeting, and also to contribute less to it. However, you don't have to fully understand the paper, or to be an expert in the topic! We encourage researchers of all abilities (from undergrads to faculty) to attend. We have found that what some feared might be "stupid" questions sparked very interesting and informative discussions that left everyone (including both the more experienced and the less experienced members) with a better understanding of the paper.

    Each meeting starts with one group member (the "presenter") giving a 5-10 minute overview of the paper. (In some sense, the presenter just shares with the group the notes that you made when you carefully read the paper.) This sets the background for the rest of the discussion, which consists of people raising the points they noticed when they read the paper, and the group discussing those points.

    Here are examples of topics that the presenter (or others) might raise.

    Previous Meetings:

  • Creating Exploits from Published Patches: Wednesday, July 30, 2:00pm, 32-G825

  • Automatic Patch-Based Exploit Generation is Possible:
    David Brumley, Pongsin Poosankam, Dawn Song, and Jiang Zheng
    (Suggested by Adam Kiezun)

    The automatic patch-based exploit generation problem is: given a program P and a patched version of the program P', automatically generate an exploit for the potentially unknown vulnerability present in P but fixed in P'. In this paper, we propose techniques for automatic patch-based exploit generation, and show that our techniques can automatically generate exploits for 5 Microsoft programs based upon patches provided via Windows Update. Although our techniques may not work in all cases, a fundamental tenet of security is to conservatively estimate the capabilities of attackers. Thus, our results indicate that automatic patch-based exploit generation should be considered practical. One important security implication of our results is that current patch distribution schemes which stagger patch distribution over long time periods, such as Windows Update, may allow attackers who receive the patch first to compromise the significant fraction of vulnerable hosts who have not yet received the patch.

    [ PDF | PS ]

  • Experience with Parallelizing for Multicores: Wednesday, July 16, 2:00pm, 32-G725

  • Software Engineering for Multicore Systems - An Experience Report
    Victor Pankratius and Christoph Schaefer and Ali Jannesari and Walter F. Tichy
    (Suggested by Danny Dig)

    The emergence of inexpensive parallel computers powered by multicore chips combined with stagnating clock rates raises new challenges for software engineering. As future performance improvements will not come "for free" from increased clock rates, performance critical applications will need to be parallelized. However, little is known about the engineering principles for parallel general-purpose applications. This paper presents an experience report with four diverse case studies on multicore software development for general- purpose applications. They were programmed in different languages and benchmarked on several multicore computers. Empirical findings include: 1) Multicore computers deliver: Real speedups are achievable, albeit with significant programming effort and speedups that are typically lower than the number of cores employed; 2) Massive refactoring of sequential programs is required, sometimes at several levels. Special tools for parallelization refactorings appear to be an important area of research; 3) Autotuning is indispensable, as manually tuning thread assignment, number of pipeline stages, size of data partitions and other parameters is difficult and error prone; 4) Architectures that encompass several parallel components are poorly understood. Tuneable architectural patterns with parallelism at several levels need to be discovered.

    Downloads below limited to MIT, paper is also available in ACM DL.

    [ PDF | PS ]

  • Externalizing Concurrency: Wednesday, July 9, 2:00pm, 32-G725

  • Externalizing Java Server Concurrency with CAL
    Charles Zhang and Hans-Arno Jacobsen
    (Suggested by Danny Dig)

    One of most important architectural decisions for a server program is with respect to its concurrency mechanisms. However, the increasingly pervasive applications of networked services makes it difficult, if not impossible, to conceive an optimal concurrency model for a server program at the design time. In this work, we advocate that the concurrency architecture is to be externalized from the server code. To enable such separation, we propose and evaluate CAL, the concurrency aspect library, to raise the concurrency programming abstractions and to mediate the intrinsic differences among concurrency models. Through CAL, a server program can be customized to use concurrency frameworks of fundamentally different natures with no modi?cations. Concurrency programming in CAL is simpli?ed and looks closer to design. Leveraging the customizability of CAL, we show that a commercial complex middle- ware server can outperform its original version by as much as 10 times.

    Downloads below limited to MIT, version copied from the first author's web site.

    NOTICE room and time change. During the summer months we will meet in G725 at 2PM on Wednesdays (so that Mike could attend too).

    [ PDF | PS ]

  • Better ESC: Wednesday, June 18th, 3:00pm, 32-G825

  • Calysto: Scalable and Precise Extended Static Checking
    Domagoj Babic and Alan J. Hu
    (Suggested by Sung Kim and Stephen McCamant)

    Automatically detecting bugs in programs has been a long-held goal in software engineering. Many techniques exist, trading-off varying levels of automation, thoroughness of coverage of program behavior, precision of analysis, and scalability to large code bases. This paper presents the CALYSTO static checker, which achieves an unprecedented combination of precision and scalability in a completely automatic extended static checker. CALYSTO is interprocedurally path-sensitive, fully context-sensitive, and bit-accurate in modeling data operations--comparable coverage and precision to very expensive formal analyses--yet scales comparably to the leading, less precise, static-analysis-based tool for similar properties. Using CALYSTO, we have discovered dozens of bugs, completely automatically, in hundreds of thousands of lines of production, open-source applications, with a very low rate of false error reports. This paper presents the design decisions, algorithms, and optimizations behind CALYSTO's performance.

    Downloads below limited to MIT, but a version with slightly different pagination is available from the first author's web site.

    Correct (non-Latin1) spelling of first author's name: Babić

    [ PDF | PS ]

  • Locks from "atomic": Thursday, May 29th, 3:00pm, 32-G825

  • Inferring Locks for Atomic Sections
    Sigmund Cherem, Trishul Chilimbi, and Sumit Gulwani
    (Suggested by Danny Dig)

    Atomic sections are a recent and popular idiom to support the development of concurrent programs. Updates performed within an atomic section should not be visible to other threads until the atomic section has been executed entirely. Traditionally, atomic sections are supported through the use of optimistic concurrency, either using a transactional memory hardware, or an equivalent software emulation (STM).

    This paper explores automatically supporting atomic sections using pessimistic concurrency. We present a system that combines compiler and runtime techniques to automatically transform programs written with atomic sections into programs that only use locking primitives. To minimize contention in the transformed programs, our compiler chooses from several lock granularities, using fine-grain locks whenever it is possible.

    This paper formally presents our framework, shows that our compiler is sound (i.e., it protects all shared locations accessed within atomic sections), and reports experimental results.

    [ PDF | PS ]

  • Sketching for concurrency: Thursday, May 22nd, 3:00pm, 32-G825

  • Sketching Concurrent Data Structures
    Armando Solar-Lezama, Christopher Grant Jones, and Rastislav Bodík
    (Suggested by Stephen McCamant)

    We describe PSKETCH, a program synthesizer that helps programmers implement concurrent data structures. The system is based on the concept of sketching, a form of synthesis that allows programmers to express their insight about an implementation as a partial program: a sketch. The synthesizer automatically completes the sketch to produce an implementation that matches a given correctness criterion.

    PSKETCH is based on a new counterexample-guided inductive synthesis algorithm (CEGIS) that generalizes the original sketch synthesis algorithm from [ASPLOS'06] to cope efficiently with concurrent programs. The new algorithm produces a correct implementation by iteratively generating candidate implementations, running them through a verifier, and if they fail, learning from the counterexample traces to produce a better candidate; converging to a solution in a handful of iterations.

    PSKETCH also extends SKETCH with higher-level sketching constructs that allow the programmer to express her insight as a "soup" of ingredients from which complicated code fragments must be assembled. Such sketches can be viewed as syntactic descriptions of huge spaces of candidate programs (over 108 candidates for some sketches we resolved).

    We have used the PSKETCH system to implement several classes of concurrent data structures, including lock-free queues and concurrent sets with fine-grained locking. We have also sketched some other concurrent objects including a sense-reversing barrier and a protocol for the dining philosophers problem; all these sketches resolved in under an hour.

    [ PDF | PS ]

  • Validating Sanitation Routines: Thursday, May 15th, 3:00pm, 32-G825

  • Saner: Composing Static and Dynamic Analysis to Validate Sanitization in Web Applications
    Davide Balzarotti, Marco Cova, Vika Felmetsger, Nenad Jovanovic, Engin Kirda, Christorpher Kruegel, and Giovanni Vigna
    (Suggested by Adam Kiezun)

    Web applications are ubiquitous, perform mission-critical tasks, and handle sensitive user data. Unfortunately, web applications are often implemented by developers with limited security skills, and, as a result, they contain vulnerabilities. Most of these vulnerabilities stem from the lack of input validation. That is, web applications use malicious input as part of a sensitive operation, without having properly checked or sanitized the input values prior to their use.

    Past research on vulnerability analysis has mostly focused on identifying cases in which a web application directly uses external input in critical operations. However, little research has been performed to analyze the correctness of the sanitization process. Thus, whenever a web application applies some sanitization routine to potentially malicious input, the vulnerability analysis assumes that the result is innocuous. Unfortunately, this might not be the case, as the sanitization process itself could be incorrect or incomplete.

    In this paper, we present a novel approach to the analysis of the sanitization process. More precisely, we combine static and dynamic analysis techniques to identify faulty sanitization procedures that can be bypassed by an attacker. We implemented our approach in a tool, called Saner, and we applied it to a number of real-world applications. Our results demonstrate that we were able to identify several novel vulnerabilities that stem from erroneous sanitization procedures.

    [ PDF | PS ]

  • Conceptual queries: Thursday, May 1st, 3:00pm, 32-G825

  • Answering Conceptual Queries with Ferret
    Brian de Alwis and Gail C. Murphy
    (Suggested by Jeff Perkins)

    Programmers seek to answer questions as they investigate the functioning of a software system, such as "which execution path is being taken in this case?" Programmers attempt to answer these questions, which we call conceptual queries, using a variety of tools. Each type of tool typically highlights one kind of information about the system, such as static structural information or control-flow information. Unfortunately for the programmer, the tools seldom directly answer the programmer's conceptual queries. Instead, the programmer must piece together results from different tools to determine an answer to the initial query. At best, this process is time consuming and at worst, this process can lead to data overload and disorientation.

    In this paper, we present a model that supports the integration of different sources of information about a program. This model enables the results of concrete queries in separate tools to be brought together to directly answer many of a programmer's conceptual queries. In addition to presenting this model, we present a tool that implements the model, demonstrate the range of conceptual queries supported by this tool, and present the results of use of the conceptual queries in a small field study.

    [ PDF | PS ]

  • Bugs developers will fix: Thursday, April 24th, 3:00pm, 32-G825

  • Predicting Accurate and Actionable Static Analysis Warnings: An Experimental Approach
    Joseph R. Ruthruff, John Penix, J. David Morgenthaler, Sebastian Elbaum, and Gregg Rothermel
    (Suggested by Shay Artzi)

    Static analysis tools report software defects that may or may not be detected by other verification methods. Two challenges complicating the adoption of these tools are spurious false positive warnings and legitimate warnings that are not acted on. This paper reports automated support to help address these challenges using logistic regression models that predict the foregoing types of warnings from signals in the warnings and implicated code. Because examining many potential signaling factors in large software development settings can be expensive, we use a screening methodology to quickly discard factors with low predictive power and cost-effectively build predictive models. Our empirical evaluation indicates that these models can achieve high accuracy in predicting accurate and actionable static analysis warnings, and suggests that the models are competitive with alternative models built without screening.

    [ PDF | PS ]

  • Code reviews in Apache: Thursday, April 10th, 3:00pm, 32-G825

  • Open Source Software Peer Review Practices: A Case Study of the Apache Server
    Peter C. Rigby, Daniel M. German, and Margaret-Anne Storey
    (Suggested by Jaime Quinonez)

    Peer review is seen as an important quality assurance mechanism in both industrial development and the open source software (OSS) community. The techniques for performing inspections have been well studied in industry; in OSS development, peer reviews are less well understood. We examine the two peer review techniques used by the successful, mature Apache server project: review-then-commit and commit-then-review. Using archival records of email discussion and version control repositories, we construct a series of metrics that produces measures similar to those used in traditional inspection experiments. Specically, we measure the frequency of review, the level of participation in reviews, the size of the artifact under review, the calendar time to perform a review, and the number of reviews that find defects. We provide a comparison of the two Apache review techniques as well as a comparison of Apache review to inspection in an industrial project. We conclude that Apache reviews can be described as (1) early, frequent reviews (2) of small, independent, complete contributions (3) conducted asynchronously by a potentially large, but actually small, group of self-selected experts (4) leading to an efficient and effective peer review technique.

    [ PDF | PS ]

  • Atomic sets: Thursday, March 27th, 3:00pm, 32-G825

  • Associating Synchronization Constraints with Data in an Object-Oriented Language
    Mandana Vaziri, Frank Tip, and Julian Dolby
    (Suggested by Danny Dig)

    Concurrency-related bugs may happen when multiple threads access shared data and interleave in ways that do not correspond to any sequential execution. Their absence is not guaranteed by the traditional notion of "data race" freedom. We present a new definition of data races in terms of 11 problematic interleaving scenarios, and prove that it is complete by showing that any execution not exhibiting these scenarios is serializable for a chosen set of locations. Our definition subsumes the traditional definition of a data race as well as high-level data races such as stale-value errors and inconsistent views. We also propose a language feature called atomic sets of locations, which lets programmers specify the existence of consistency properties between fields in objects, without specifying the properties themselves. We use static analysis to automatically infer those points in the code where synchronization is needed to avoid data races under our new definition. An important benefit of this approach is that, in general, far fewer annotations are required than is the case with existing approaches such as synchronized blocks or atomic sections. Our implementation successfully inferred the appropriate synchronization for a signicant subset of Java's Standard Collections framework.

    [ PDF | PS ]

  • Why didn't my program do that?: Thursday, March 20th, 3:00pm, 32-G825

  • Debugging Reinvented: Asking and Answering Why and Why Not Questions about Program Behavior
    Andrew J. Ko and Brad A. Myers
    (Suggested by Sung Kim)

    When software developers want to understand the reason for a program's behavior, they must translate their questions about the behavior into a series of questions about code, speculating about the causes in the process. The Whyline is a new kind of debugging tool that avoids such speculation by instead enabling developers to select a question about program output from a set of why did and why didn't questions derived from the program's code and execution. The tool then finds one or more possible explanations for the output in question, using a combination of static and dynamic slicing, precise call graphs, and new algorithms for determining potential sources of values and explanations for why a line of code was not reached. Evaluations of the tool on one task showed that novice programmers with the Whyline were twice as fast as expert programmers without it. The tool has the potential to simplify debugging in many software development contexts.

    [ PDF | PS ]

  • Management structure matters: Thursday, March 13th, 3:00pm, 32-G825

  • The Influence of Organizational Structure on Software Quality: An Empirical Case Study
    Nachiappan Nagappan, Brendan Murphy, and Victor R. Basili
    (Suggested by Stephen McCamant)

    Often software systems are developed by organizations consisting of many teams of individuals working together. Brooks states in the Mythical Man Month book that product quality is strongly affected by organization structure. Unfortunately there has been little empirical evidence to date to substantiate this assertion. In this paper we present a metric scheme to quantify organizational complexity, in relation to the product development process to identify if the metrics impact failure-proneness. In our case study, the organizational metrics when applied to data from Windows Vista were statistically significant predictors of failure-proneness. The precision and recall measures for identifying failure-prone binaries, using the organizational metrics, was significantly higher than using traditional metrics like churn, complexity, coverage, dependencies, and pre-release bug measures that have been used to date to predict failure-proneness. Our results provide empirical evidence that the organizational metrics are related to, and are effective predictors of failure-proneness.

    [ PDF | PS ]

  • Cross-testing using concrete and symbolic execution: Thursday, March 6th, 3:00pm, 32-G725

  • Towards Automatic Discovery of Deviations in Binary Implementations with Applications to Error Detection and Fingerprint Generation
    David Brumley, Juan Caballero, Zhenkai Liang, James Newsome and Dawn Song
    (Suggested by Adam Kiezun)

    Different implementations of the same protocol specification usually contain deviations, i.e., differences in how they check and process some of their inputs. Deviations are commonly introduced as implementation errors or as different interpretations of the same specification. Automatic discovery of these deviations is important for several applications. In this paper, we focus on automatic discovery of deviations for two particular applications: error detection and fingerprint generation.

    We propose a novel approach for automatically detecting deviations in the way different implementations of the same specification check and process their input. Our approach has several advantages: (1) by automatically building symbolic formulas from the implementation, our approach is precisely faithful to the implementation; (2) by solving formulas created from two different implementations of the same specification, our approach significantly reduces the number of inputs needed to find deviations; (3) our approach works on binaries directly, without access to the source code.

    We have built a prototype implementation of our approach and have evaluated it using multiple implementations of two different protocols: HTTP and NTP. Our results show that our approach successfully finds deviations between different implementations, including errors in input checking, and differences in the interpretation of the specification, which can be used as fingerprints.

    [ PDF | PS ]

  • Binary protocol reverse-engineering: Wednesday, February 20th, 3:00pm, 32-G725

  • Automatic Network Protocol Analysis
    Gilbert Wondracek, Paolo Milani Comparetti, Christopher Kruegel, and Engin Kirda
    (Suggested by Vijay Ganesh)

    Protocol reverse engineering is the process of extracting application-level specifications for network protocols. Such specifications are very helpful in a number of security-related contexts. For example, they are needed by intrusion detection systems to perform deep packet inspection, and they allow the implementation of black-box fuzzing tools. Unfortunately, manual reverse engineering is a time-consuming and tedious task. To address this problem, researchers have recently proposed systems that help to automate the process. These systems operate by analyzing traces of network traffic. However, there is limited information available at the network-level, and thus, the accuracy of the results is limited.

    In this paper, we present a novel approach to automatic protocol reverse engineering. Our approach works by dynamically monitoring the execution of the application, analyzing how the program is processing the protocol messages that it receives. This is motivated by the insight that an application encodes the complete protocol and represents the authoritative specication of the inputs that it can accept. In a first step, we extract information about the fields of individual messages. Then, we aggregate this information to determine a more general specification of the message format, which can include optional or alternative fields, and repetitions. We have applied our techniques to a number of real-world protocols and server applications. Our results demonstrate that we are able to extract the format specication for different types of messages. Using these specications, we then automatically generate appropriate parser code.

    [ PDF | PS ]

  • Occurrence typing for Scheme: Wednesday, February 13th, 3:00pm, 32-G725

  • The Design and Implementation of Typed Scheme
    Sam Tobin-Hochstadt and Matthias Felleisen
    (Suggested by Jonathan Edwards and Stephen McCamant)

    When scripts in untyped languages grow into large programs, maintaining them becomes difficult. A lack of types in typical scripting languages means that programmers must (re)discover critical pieces of design information every time they wish to change a program. This analysis step both slows down the maintenance process and may even introduce mistakes due to the violation of undiscovered invariants.

    This paper presents Typed Scheme, an explicitly typed extension of an untyped scripting language. Its type system is based on the novel notion of occurrence typing, which we formalize and mechanically prove sound. The implementation of Typed Scheme additionally borrows elements from a range of approaches, including recursive types, true unions and subtyping, plus polymorphism combined with a modicum of local inference. Initial experiments with the implementation suggest that Typed Scheme naturally accommodates the programming style of the underlying scripting language, at least for the first few thousand lines of ported code.

    [ PDF | PS ]

  • Automatic generation of text tools: Wednesday, February 6th, 3:00pm, 32-G825

  • From Dirt to Shovels: Fully Automatic Tool Generation from Ad Hoc Data
    Kathleen Fisher, David Walker, Kenny Q. Zhu, and Peter White
    (Suggested by Jeff Perkins)

    An ad hoc data source is any semistructured data source for which useful data analysis and transformation tools are not readily available. Such data must be queried, transformed and displayed by systems administrators, computational biologists, financial analysts and hosts of others on a regular basis. In this paper, we demonstrate that it is possible to generate a suite of useful data processing tools, including a semi-structured query engine, several format converters, a statistical analyzer and data visualization routines directly from the ad hoc data itself, without any human intervention. The key technical contribution of the work is a multi-phase algorithm that automatically infers the structure of an ad hoc data source and produces a format specication in the PADS data description language. Programmers wishing to implement custom data analysis tools can use such descriptions to generate printing and parsing libraries for the data. Alternatively, our software infrastructure will push these descriptions through the PADS compiler, creating format-dependent modules that, when linked with format-independent algorithms for analysis and transformation, result in fully functional tools. We evaluate the performance of our inference algorithm, showing it scales linearly in the size of the training data -- completing in seconds, as opposed to the hours or days it takes to write a description by hand. We also evaluate the correctness of the algorithm, demonstrating that generating accurate descriptions often requires less than 5% of the available data.

    [ PDF | PS ]

  • Pruing paths based on side effects: Wednesday, January 16th, 3:00pm, 32-G725

  • RWset: Attacking Path Explosion in Constraint-Based Test Generation
    Peter Boonstoppel, Cristian Cadar, Dawson Engler
    (Suggested by Adam Kiezun)

    Recent work has used variations of symbolic execution to automatically generate high-coverage test inputs. Such tools have demonstrated their ability to find very subtle errors. However, one challenge they all face is how to effectively handle the exponential number of paths in the code. This paper presents a new technique for reducing the number of traversed code paths by discarding those that must have side-effects identical to some previously explored path. Our results on a mix of open source applications and device drivers show that this (sound) optimization reduces the numbers of paths traversed by several orders of magnitude, often getting coverage out of reach for a standard constraint-based execution system.

    [ PDF | PS ]

  • Symbolic Daikon: Wednesday, December 12th, 3:00pm, 32-G825

  • DySy: Dynamic Symbolic Execution for Invariant Inference
    Christoph Csallner, Nikolai Tillmann and Yannis Smaragdakis
    (Suggested by Shay Artzi)

    Dynamically discovering likely program invariants from concrete test executions has emerged as a highly promising software engineering technique. Dynamic invariant inference has the advantage of succinctly summarizing both "expected" program inputs and the subset of program behaviors that is normal under those inputs. In this paper, we introduce a technique that can drastically increase the relevance of inferred invariants, or reduce the size of the test suite required to obtain good invariants. Instead of falsifying invariants produced by pre-set patterns, we determine likely program invariants by combining the concrete execution of actual test cases with a simultaneous symbolic execution of the same tests. The symbolic execution produces abstract conditions over program variables that the concrete tests satisfy during their execution. In this way, we obtain the benefits of dynamic inference tools like Daikon: the inferred invariants correspond to the observed program behaviors. At the same time, however, our inferred invariants are much more suited to the program at hand than Daikon's hard-coded invariant patterns. The symbolic invariants are literally derived from the program text itself, with appropriate value substitutions as dictated by symbolic execution.

    We implemented our technique in the DySy tool, which utilizes a powerful symbolic execution and simplication engine. The results confirm the benets of our approach. In Daikon's prime example benchmark, we infer the majority of the interesting Daikon invariants, while eliminating invariants that a human user is likely to consider irrelevant.

    [ PDF | PS ]

  • A good UI for code transformation: Wednesday, December 5th, 3:00pm, 32-G825

  • Aligning Development Tools with the Way Programmers Think About Code Changes
    Marat Boshernitsan, Susan L. Graham, and Marti A. Hearst
    (Suggested by Danny Dig)

    Software developers must modify their programs to keep up with changing requirements and designs. Often, a conceptually simple change can require numerous edits that are similar but not identical, leading to errors and omissions. Researchers have designed programming environments to address this problem, but most of these systems are counter-intuitive and dificult to use.

    By applying a task-centered design process, we developed a visual tool that allows programmers to make complex code transformations in an intuitive manner. This approach uses a representation that aligns well with programmers' mental models of programming structures. The visual language combines textual and graphical elements and is expressive enough to support a broad range of code-changing tasks. To simplify learning the system, its user interface scaffolds construction and execution of transformations. An evaluation with Java programmers suggests that the interface is intuitive, easy to learn, and effective on a representative editing task.

    [ PDF | PS ]

  • Delayed types for initialization: Wednesday, November 28th, 3:00pm, 32-G825

  • Establishing Object Invariants with Delayed Types
    Manuel Fähndrich and Songtao Xia
    (Suggested by Matt Papi)

    Mainstream object-oriented languages such as C# and Java provide an initialization model for objects that does not guarantee programmer controlled initialization of fields. Instead, all fields are initialized to default values (0 for scalars and null for non-scalars) on allocation. This is in stark contrast to functional languages, where all parts of an allocation are initialized to programmer-provided values. These choices have a direct impact on two main issues: 1) the prevalence of null in object oriented languages (and its general absence in functional languages), and 2) the ability to initialize circular data structures. This paper explores connections between these differing approaches and proposes a fresh look at initialization. Delayed types are introduced to express and formalize prevalent initialization patterns in object-oriented languages.

    [ PDF | PS ]

  • Indirect conflict awareness: Wednesday, November 14th, 3:00pm, 32-G825

  • Towards Supporting Awareness of Indirect Conflicts across Software Configuration Management Workspaces
    Anita Sarma, Gerald Bortis, and André van der Hoek
    (Suggested by Sung Kim)

    Workspace awareness techniques have been proposed to enhance the effectiveness of software configuration management systems in coordinating parallel work. These techniques share information regarding ongoing changes, so potential conflicts can be detected during development, instead of when changes are completed and committed to a repository. To date, however, workspace awareness techniques only address direct conflicts, which arise due to concurrent changes to the same artifact, but are unable to support indirect conflicts, which arise due to ongoing changes in one artifact affecting concurrent changes in another artifact. In this paper, we present a new, cross-workspace awareness technique that supports one particular kind of indirect conflict, namely those indirect conflicts caused by changes to class signatures. We introduce our approach, discuss its implementation in our workspace awareness tool Palantír, illustrate its potential through two pilot studies, and lay out how to generalize the technique to a broader set of indirect conflicts.

    [ PDF | PS ]

  • Testing using symbolic execution and grammar-based specifications: Wednesday, October 31st, 3:00pm, 32-G825

  • Directed Test Generation using Symbolic Grammars
    Rupak Majumdar and Ru-Gang Xu
    (Suggested by Adam Kiezun)

    We present CESE, a tool that combines exhaustive enumeration of test inputs from a structured domain with symbolic execution driven test generation. We target programs whose valid inputs are determined by some context free grammar. We abstract the concrete input syntax with symbolic grammars, where some original tokens are replaced with symbolic constants. This reduces the set of input strings that must be enumerated exhaustively. For each enumerated input string, which may contain symbolic constants, symbolic execution based test generation instantiates the constants based on program execution paths. The "template" generated by enumerating valid strings reduces the burden on the symbolic execution to generate syntactically valid inputs and helps exercise interesting code paths. Together, symbolic grammars provide a link between exhaustive enumeration of valid inputs and execution-directed symbolic test generation.
    Preliminary experiments with CESE show that the combination achieves better coverage than both pure enumerative test generation and pure directed symbolic test generation, in orders of magnitude less time and number of generated inputs. In addition, CESE is able to automatically generate inputs that achieve coverage within 10% of manually constructed tests.

    [ PDF | PS ]

  • Inferring uniqueness and ownership: Wednesday, October 17th, 3:00pm, 32-G825

  • Inferring Aliasing and Encapsulation Properties for Java
    Kin-Keung Ma and Jeffrey S. Foster
    (Suggested by Michael Ernst)

    There are many proposals for language techniques to control aliasing and encapsulation in object oriented programs, typically based on notions of object ownership and pointer uniqueness. Most of these systems require extensive manual annotations, and thus there is little experience with these properties in large, existing Java code bases. To remedy this situation, we present Uno, a novel static analysis for automatically inferring ownership, uniqueness, and other aliasing and encapsulation properties in Java. Our analysis requires no annotations, and combines an intraprocedural points-to analysis with an interprocedural, demand-driven predicate resolution algorithm. We have applied Uno to a variety of Java applications and found that some aliasing properties, such as temporarily lending a reference to a method, are common, while others, in particular field and argument ownership, are relatively uncommon. As a result, we believe that Uno can be a valuable tool for discovering and understanding aliasing and encapsulation in Java programs.

    [ PDF | PS ]

  • Empirical time complexity: Wednesday, October 10th, 3:00pm, 32-G825

  • Measuring Empirical Computational Complexity
    Simon F. Goldsmith, Alex S. Aiken, and Daniel S. Wilkerson
    (Suggested by Stephen McCamant)

    The standard language for describing the asymptotic behavior of algorithms is theoretical computational complexity. We propose a method for describing the asymptotic behavior of programs in practice by measuring their empirical computational complexity. Our method involves running a program on workloads spanning several orders of magnitude in size, measuring their performance, and fitting these observations to a model that predicts performance as a function of workload size. Comparing these models to the programmer's expectations or to theoretical asymptotic bounds can reveal performance bugs or confirm that a program's performance scales as expected. Grouping and ranking program locations based on these models focuses attention on scalability-critical code. We describe our tool, the Trend Profiler (trend-prof), for constructing models of empirical computational complexity that predict how many times each basic block in a program runs as a linear (y = a + bx) or a powerlaw (y = axb) function of user-specified features of the program's workloads. We ran trend-prof on several large programs and report cases where a program scaled as expected, beat its worst-case theoretical complexity bound, or had a performance bug.

    [ PDF | PS ]

  • Object usage anomalies: Wednesday, October 3rd, 3:00pm, 32-G825

  • Detecting Object Usage Anomalies
    Andrzej Wasylkowski, Andreas Zeller, and Christian Lindig
    (Suggested by Jeff Perkins)

    Interacting with objects often requires following a protocol--for instance, a specic sequence of method calls. These protocols are not always documented, and violations can lead to subtle problems. Our approach takes code examples to automatically infer legal sequences of method calls. The resulting patterns can then be used to detect anomalies such as "Before calling next(), one normally calls hasNext()". To our knowledge, this is the first fully automatic defect detection approach that learns and checks method call sequences. Our Jadet prototype has detected yet undiscovered defects and code smells in ve popular open-source programs, including two new defects in AspectJ.

    [ PDF | PS ]

  • Contract-Driven Development: Wednesday, September 26th, 3:00pm, 32-G825

  • Contract Driven Development = Test Driven Development - Writing Test-Cases
    Andreas Leitner, Ilinca Ciupa, Manuel Oriol, Bertrand Meyer, and Arno Fiva
    (Suggested by Shay Artzi)

    Although unit tests are recognized as an important tool in software development, programmers prefer to write code, rather than unit tests. Despite the emergence of tools like JUnit which automate part of the process, unit testing remains a time-consuming, resource-intensive, and not particularly appealing activity.

    This paper introduces a new method, called Contract-Driven Development, that takes the task of writing unit tests off the developers' shoulders, while still taking advantage of the developers' knowledge of the intended semantics and structure of the code. This methodology exploits actions that programmers perform anyway as part of the normal process of writing code, by extracting test cases from failure-producing runs of the system that the programmers trigger. The approach is based on the presence of contracts in code, which act as the oracle of the test cases. The test cases are extracted completely automatically, run in the background when the code evolves, and can easily be maintained over versions. The tool implementing this methodology is called Cdd and is available both in binary and in source form.

    [ PDF | PS ]

  • Where bad values come from: Wednesday, September 19th, 3:00pm, 32-G825

  • Tracking Bad Apples: Reporting the Origin of Null and Undefined Value Errors
    Michael D. Bond, Nicholas Nethercote, Stephen W. Kent, Samuel Z. Guyer, and Kathryn S. McKinley
    (Suggested by Matt Papi)

    Programs sometimes crash due to unusable values, for example, when Java and C# programs dereference null pointers and when C and C++ programs use undefined values to affect program behavior. A stack trace produced on such a crash identifies the effect of the unusable value, not its cause, and is often not much help to the programmer.

    This paper presents efficient origin tracking of unusable values; it shows how to record where these values come into existence, correctly propagate them, and report them if they cause an error. The key idea is value piggybacking: when the original program stores an unusable value, value piggybacking instead stores origin information in the spare bits of the unusable value. Modest compiler support alters the program to propagate these modified values through operations such as assignments and comparisons. We evaluate two implementations: the first tracks null pointer origins in a JVM, and the second tracks undefined value origins in a memory-checking tool built with Valgrind. These implementations show that origin tracking via value piggybacking is fast and often useful, and in the Java case, has low enough overhead for use in a production environment.

    [ PDF | PS ]

  • JQual: Wednesday, September 5th, 2:00pm, 32-G825

  • Type Qualier Inference for Java
    David Greenfieldboyce and Jeffrey S. Foster
    (Suggested by Telmo Correa)

    *** Note: one hour earlier than the summer time

    Java's type system provides programmers with strong guarantees of type and memory safety, but there are many important properties not captured by standard Java types. We describe JQual, a tool that adds user-defined type qualifiers to Java, allowing programmers to quickly and easily incorporate extra lightweight, application-specic type checking into their programs. JQual provides type qualifier inference, so that programmers need only add a few key qualifier annotations to their program, and then JQual infers any remaining qualifiers and checks their consistency. We explore two applications of JQual. First, we introduce opaque and enum qualifiers to track C pointers and enumerations that flow through Java code via the JNI. In our benchmarks we found that these C values are treated correctly, but there are some places where a client could potentially violate safety. Second, we introduce a readonly qualifier for annotating references that cannot be used to modify the objects they refer to. We found that JQual is able to automatically infer readonly in many places on method signatures. These results suggest that type qualifiers and type qualifier inference are a useful addition to Java.

    [ PDF | PS ]

  • Checkpointing Java: Wednesday, August 29th, 3:00pm, 32-G825

  • Efficient Checkpointing of Java Software Using Context-Sensitive Capture and Replay
    Guoqing Xu, Atanas Rountev, Yan Tang, and Feng Qin
    (Suggested by Sung Kim)

    Checkpointing and replaying is an attractive technique that has been used widely at the operating/runtime system level to provide fault tolerance. Applying such a technique at the application level can benefit a range of software engineering tasks such as testing of long-running programs, automated debugging, and dynamic slicing. We propose a checkpointing/replaying technique for Java that operates purely at the language level, without the need for JVM-level or OS-level support. At the core of our approach are static analyses that select, at certain program points, a safe subset of the program state to capture and replay. Irrelevant statements before the checkpoint are eliminated using control-dependence-based slicing; the remaining statements together with the captured run-time values are used to indirectly recreate the call stack of the original program at the checkpoint. At the checkpoint itself and at certain subsequent program points, the replaying version restores parts of the program state that are necessary for execution of the surrounding method. Our experimental studies indicate that the proposed static and dynamic analyses have the potential to reduce signicantly the execution time for replaying, with low run-time overhead for checkpointing.

    [ PDF | PS ]

  • Mining bad behavior: Wednesday, August 22nd, 3:00pm, 32-G825

  • Mining Specications of Malicious Behavior
    Mihai Christodorescu, Somesh Jha, and Christopher Kruegel
    (Suggested by Michael Ernst)

    Malware detectors require a specication of malicious behavior. Typically, these specications are manually constructed by investigating known malware. We present an automatic technique to overcome this laborious manual process. Our technique derives such a specication by comparing the execution behavior of a known malware against the execution behaviors of a set of benign programs. In other words, we mine the malicious behavior present in a known malware that is not present in a set of benign programs. The output of our algorithm can be used by malware detectors to detect malware variants. Since our algorithm provides a succinct description of malicious behavior present in a malware, it can also be used by security analysts for understanding the malware. We have implemented a prototype based on our algorithm and tested it on several malware programs. Experimental results obtained from our prototype indicate that our algorithm is effective in extracting malicious behaviors that can be used to detect malware variants.

    [ PDF | PS ]

  • Extending Java with attribute grammars: Wednesday, August 15th, 3:00pm, 32-G825

  • Attribute Grammar-based Language Extensions for Java
    Eric Van Wyk, Lijesh Krishnan, August Schwerdfeger, and Derek Bodin
    (Suggested by Stephen McCamant)

    Abstract. This paper describes the Java Language Extender framework, a tool that allows one to create new domain-adapted languages by importing domain-specic language extensions into an extensible implementation of Java 1.4. Language extensions may define the syntax, semantic analysis, and optimizations of new language constructs. Java and the language extensions are specied as higher-order attribute grammars. We describe several language extensions and their implementation in the framework. For example, one embeds the SQL database query language into Java and statically checks for syntax and type errors in SQL queries. The tool supports the modular specication of composable language extensions so that programmers can import into Java the unique set of extensions that they desire. When extensions follow certain restrictions, they can be composed without requiring any implementation-level knowledge of the language extensions. The tools automatically compose the selected extensions and the Java host language specication.

    [ PDF | PS ]

  • Scala: Wednesday, August 1st, 3:00pm, 32-G825

  • An Overview of the Scala Programming Language
    Martin Odersky, Philippe Altherr, Vincent Cremet, Iulian Dragos, Gilles Dubochet, Burak Emir, Sean McDirmid, Stéphane Micheloud, Nikolay Mihaylov, Michel Schinz, Erik Stenman, Lex Spoon, and Matthias Zenger
    (Suggested by Jeff Perkins)

    Scala fuses object-oriented and functional programming in a statically typed programming language. It is aimed at the construction of components and component systems. This paper gives an overview of the Scala language for readers who are familar with programming methods and programming language design.

    [ PDF | PS ]

  • Effects in object invariants: Wednesday, July 25th, 3:00pm, 32-G825

  • Validity Invariants and Effects
    Yi Lu, John Potter, and Jingling Xue
    (Suggested by Takeo IMAI)

    Object invariants describe the consistency of object states, and are crucial for reasoning about the correctness of object-oriented programs. However, reasoning about object invariants in the presence of object abstraction and encapsulation, arbitrary object aliasing and re-entrant method calls, is difficult.

    We present a framework for reasoning about object invariants based on a behavioural contract that species two sets: the validity invariant -- objects that must be valid before and after the behaviour; and the validity effect -- objects that may be invalidated during the behaviour. The overlap of these two sets is critical because it captures precisely those objects that need to be re-validated at the end of the behaviour. When there is no overlap, no further validity checking is required.

    We also present a type system based on this framework using ownership types to confine dependencies for object invariants. In order to track the validity invariant, the type system restricts updates to permissible contexts, even in the presence of re-entrant calls. Object referencing and read access are unrestricted, unlike earlier ownership type systems.

    [ PDF | PS ]

  • Better error messages for type inference: Wednesday, July 18th, 3:00pm, 32-G825

  • Searching for Type-Error Messages
    Benjamin S. Lerner, Matthew Flower, Dan Grossman, and Craig Chambers
    (Suggested by Matt Papi)

    Advanced type systems often need some form of type inference to reduce the burden of explicit typing, but type inference often leads to poor error messages for ill-typed programs. This work pursues a new approach to constructing compilers and presenting type-error messages in which the type-checker itself does not produce the messages. Instead, it is an oracle for a search procedure that finds similar programs that do type-check. Our two-fold goal is to improve error messages while simplifying compiler construction.

    Our primary implementation and evaluation is for Caml, a language with full type inference. We also present a prototype for C++ template functions, where type instantiation is implicit. A key extension is making our approach robust even when the program has multiple independent type errors.

    [ PDF | PS | ACM Portal ]

  • Generalizing interfaces with type classes: Wednesday, July 11th, 3:00pm, 32-G825

  • JavaGI: Generalized Interfaces for Java
    Stefan Wehr, Ralf Lämmel, and Peter Thiemann
    (Suggested by David Glasser)

    JavaGI is an experimental language that extends Java 1.5 by generalizing the interface concept to incorporate the essential features of Haskell's type classes. In particular, generalized interfaces cater for retroactive and constrained interface implementations, binary methods, static methods in interfaces, default implementations for interface methods, interfaces over families of types, and existential quantication for interface-bounded types. As a result, many anticipatory uses of design patterns such as Adapter, Factory, and Visitor become obsolete; several extension and integration problems can be solved more easily. JavaGI's interface capabilities interact with subtyping (and subclassing) in interesting ways that go beyond type classes. JavaGI can be translated to Java 1.5. Its formal type system is derived from Featherweight GJ.

    [ PDF | PS ]

  • Compound bug predicates: Wednesday, June 27th, 3:00pm, 32-G825

  • Statistical Debugging Using Compound Boolean Predicates
    Piramanayagam Arumuga Nainar, Ting Chen, Jake Rosin, and Ben Liblit
    (Suggested by Michael Ernst)

    Statistical debugging uses dynamic instrumentation and machine learning to identify predicates on program state that are strongly predictive of program failure. Prior approaches have only considered simple, atomic predicates such as the directions of branches or the return values of function calls. We enrich the predicate vocabulary by adding complex Boolean formulae derived from these simple predicates. We draw upon three-valued logic, static program structure, and statistical estimation techniques to efficiently sift through large numbers of candidate Boolean predicate formulae. We present qualitative and quantitative evidence that complex predicates are practical, precise, and informative. Furthermore, we demonstrate that our approach is robust in the face of incomplete data provided by the sparse random sampling that typies post-deployment statistical debugging.

    [ PDF | PS ]

  • Comparing analysis error: Wednesday, June 20th, 3:00pm, 32-G825

  • Comparing Call Graphs
    Ondrej Lhotak
    (Suggested by Stephen McCamant)

    Comparing program analysis results from different static and dynamic analysis tools is difcult and therefore too rare, especially when it comes to qualitative comparison. Analysis results can be strongly affected by specic details of programs being analyzed, so quantitative evaluation should be supplemented by qualitative identication of those details. Our general aim is to develop tools to reduce the difculty of qualitative comparison. In this paper, we focus on comparison of call graphs in particular.

    We present two complementary tools for comparing call graphs. Our main contribution is a call graph difference search tool that ranks call graph edges by their likelihood of causing large differences in the call graphs. This is complemented by a simple interactive call graph viewer that highlights specic differences between call graphs, and allows a user to browse through them. In a search for the causes of call graph differences, a user rst uses the search tool to identify which of the thousands of spurious edges to look at more closely, and then uses the interactive viewer to determine in detail the root cause of a difference.

    We present the ranking algorithm used in the difference search tool. We also report on a case study using the comparison tools to determine the most important sources of imprecision in a typical static call graph by comparing it to a dynamic call graph of the same benchmark.

    (Direct download links restricted to MIT.EDU; use the ACM digital library or email the suggester if you have trouble getting a copy)

    Correct spelling of author's name: Ondřej Lhoták

    [ PDF | PS | ACM Portal ]

  • Call depth in bug finding: Wednesday, June 13th, 3:00pm, 32-G825

  • Variably Interprocedural Program Analysis for Runtime Error Detection
    Aaron Tomb, Guillaume Brat, and Willem Visser
    (Suggested by Jeff Perkins)

    This paper describes an analysis approach based on a combination of static and dynamic techniques to find run-time errors in Java code. It uses symbolic execution to find constraints under which an error (e.g., a null pointer dereference, array out of bounds access, or assertion violation) may occur and then solves these constraints to find test inputs that may expose the error. It only alerts the user to the possibility of a real error when it detects the expected exception during a program run.

    The analysis is customizable in two important ways. First, we can adjust how deeply to follow calls from each top-level method. Second, we can adjust the path termination condition for the symbolic execution engine to be either a bound on the path condition length or a bound on the number of times each instruction can be revisited.

    We evaluated the tool on a set of benchmarks from the literature as well as a number of real-world systems that range in size from a few thousand to 50,000 lines of code. The tool discovered all known errors in the benchmarks (as well as some not previously known) and reported on average 8 errors per 1000 lines of code for the industrial examples. In both cases the interprocedural call depth played little role in the error detection. That is, an intraprocedural analysis seems adequate for the class of errors we detect.

    [ PDF | PS ]

  • Debugging field failures: Wednesday, June 6th, 3:00pm, 32-G825

  • A Technique for Enabling and Supporting Debugging of Field Failures
    James Clause and Alessandro Orso
    (Suggested by Sung Kim)

    It is difficult to fully assess the quality of software in-house, outside the actual time and context in which it will execute after deployment. As a result, it is common for software to manifest field failures, failures that occur on user machines due to untested behavior. Field failures are typically difficult to recreate and investigate on developer platforms, and existing techniques based on crash reporting provide only limited support for this task. In this paper, we present a technique for recording, reproducing, and minimizing failing executions that enables and supports in-house debugging of field failures. We also present a tool that implements our technique and an empirical study that evaluates the technique on a widely used e-mail client.

    [ PDF | PS ]

  • Thin Slicing: Wednesday, May 16th, 3:00pm, 32-G825

  • Thin Slicing
    Manu Sridharan, Stephen J. Fink and Rastislav Bodik
    (Suggested by Adam Kiezun)

    Program slicing systematically identifies parts of a program relevant to a seed statement. Unfortunately, slices of modern programs often grow too large for human consumption. We argue that unwieldy slices arise primarily from an overly broad definition of relevance, rather than from analysis imprecision. While a traditional slice includes all statements that may affect a point of interest, not all such statements appear equally relevant to a human.

    As an improved method of finding relevant statements, we propose thin slicing. A thin slice consists only of producer statements for the seed, i.e., those statements that help compute and copy a value to the seed. Statements that explain why producers affect the seed are excluded. For example, for a seed that reads a value from a container object, a thin slice includes statements that store the value into the container, but excludes statements that manipulate pointers to the container itself. Thin slices can also be hierarchically expanded to include statements explaining how producers affect the seed, yielding a traditional slice in the limit.

    We evaluated thin slicing for a set of debugging and program understanding tasks. The evaluation showed that thin slices usually included the desired statements for the tasks (e.g., the buggy statement for a debugging task). Furthermore, in simulated use of a slicing tool, thin slices revealed desired statements after inspecting 3.3 times fewer statements than traditional slicing for our debugging tasks and 9.4 times fewer statements for our program understanding tasks. Finally, our thin slicing algorithm scales well to relatively large Java benchmarks, suggesting that thin slicing represents an attractive option for practical tools.

    [ PDF | PS ]

  • Clarifying error messages: Wednesday, May 9th, 3:00pm, 32-G825

  • Improved Error Reporting for Software that Uses Black-Box Components
    Jungwoo Ha, Christopher J. Rossbach, Jason V. Davis, Indrajit Roy, Hany E. Ramadan, Donald E. Porter, David L. Chen, and Emmett Witchel
    (Suggested by Michael Ernst)

    An error occurs when software cannot complete a requested action as a result of some problem with its input, configuration, or environment. A high-quality error report allows a user to understand and correct the problem. Unfortunately, the quality of error reports has been decreasing as software becomes more complex and layered. End-users take the cryptic error messages given to them by programs and struggle to fix their problems using search engines and support websites. Developers cannot improve their error messages when they receive an ambiguous or otherwise insufcient error indicator from a black-box software component.

    We introduce Clarify, a system that improves error reporting by classifying application behavior. Clarify uses minimally invasive monitoring to generate a behavior profile, which is a summary of the program's execution history. A machine learning classifier uses the behavior profile to classify the application's behavior, thereby enabling a more precise error report than the output of the application itself.

    We evaluate a prototype Clarify system on ambiguous error messages generated by large, modern applications like gcc, LaTeX, and the Linux kernel. For a performance cost of less than 1% on user applications and 4.7% on the Linux kernel, the prototype correctly disambiguates at least 85% of application behaviors that result in ambiguous error reports. This accuracy does not degrade insufficient with more behaviors: a Clarify classier for 81 LaTeX error messages is at most 2.5% less accurate than a classier for 27 LaTeX error messages. Finally, we show that without any human effort to build a classier, Clarify can provide nearest-neighbor software support, where users who experience a problem are told about 5 other users who might have had the same problem. On average 2.3 of the 5 users that Clarify identifies have experienced the same problem.

    [ PDF | PS ]

  • Incrementalizing invariants: Wednesday, May 2nd, 3:00pm, 32-G825

  • DITTO: Automatic Incrementalization of Data Structure Invariant Checks (in Java)
    Ajeet Shankar and Rastislav Bodík
    (Suggested by Jeff Perkins)

    We present DITTO, an automatic incrementalizer for dynamic, side-effect-free data structure invariant checks. Incrementalization speeds up the execution of a check by reusing its previous executions, checking the invariant anew only on the changed parts of the data structure. DITTO exploits properties specific to the domain of invariant checks to automate and simplify the process without restricting what mutations the program can perform. Our incrementalizer works for modern imperative languages such as Java and C#. It can incrementalize, for example, verification of red-black tree properties and the consistency of the hash code in a hash table bucket. Our source-to-source implementation for Java is automatic, portable, and efficient. DITTO provides speedups on data structures with as few as 100 elements; on larger data structures, its speedups are characteristic of non-automatic incrementalizers: roughly 5-fold at 5,000 elements, and growing linearly with data structure size.

    [ PDF | PS ]

  • Automatic fixes for memory errors: Wednesday, April 25th, 3:00pm, 32-G825

  • Exterminator: Automatically Correcting Memory Errors with High Probability
    Gene Novark, Emery D. Berger, and Benjamin G. Zorn
    (Suggested by Yoav Zibin)

    Programs written in C and C++ are susceptible to memory errors, including buffer overflows and dangling pointers. These errors, which can lead to crashes, erroneous execution, and security vulnerabilities, are notoriously costly to repair. Tracking down their location in the source code is difficult, even when the full memory state of the program is available. Once the errors are finally found, fixing them remains challenging: even for critical security-sensitive bugs, the average time between initial reports and the issuance of a patch is nearly one month.

    We present Exterminator, a system that automatically corrects heap-based memory errors without programmer intervention. Exterminator exploits randomization to pinpoint errors with high precision. From this information, Exterminator derives runtime patches that fix these errors both in current and subsequent executions. In addition, Exterminator enables collaborative bug correction by merging patches generated by multiple users. We present analytical and empirical results that demonstrate Exterminator's effectiveness at detecting and correcting both injected and real faults.

    [ PDF | PS ]

  • Static SQL injection checking: Wednesday, April 18th, 3:00pm, 32-G825

  • Sound and Precise Analysis of Web Applications for Injection Vulnerabilities
    Gary Wassermann and Zhendong Su
    (Suggested by Stephen McCamant)

    Web applications are popular targets of security attacks. One common type of such attacks is SQL injection, where an attacker exploits faulty application code to execute maliciously crafted database queries. Both static and dynamic approaches have been proposed to detect or prevent SQL injections; while dynamic approaches provide protection for deployed software, static approaches can detect potential vulnerabilities before software deployment. Previous static approaches are mostly based on tainted information ow tracking and have at least some of the following limitations: (1) they do not model the precise semantics of input sanitization routines; (2) they require manually written specifications, either for each query or for bug patterns; or (3) they are not fully automated and may require user intervention at various points in the analysis. In this paper, we address these limitations by proposing a precise, sound, and fully automated analysis technique for SQL injection. Our technique avoids the need for specifications by considering as attacks those queries for which user input changes the intended syntactic structure of the generated query. It checks conformance to this policy by conservatively characterizing the values a string variable may assume with a context free grammar, tracking the nonterminals that represent user-modifiable data, and modeling string operations precisely as language transducers. We have implemented the proposed technique for PHP, the most widely-used web scripting language. Our tool successfully discovered previously unknown and sometimes subtle vulnerabilities in real-world programs, has a low false positive rate, and scales to large programs (with approx. 100K loc).

    [ PDF | PS ]

  • Adaptive dynamic analysis: Wednesday, April 11th, 3:00pm, 32-G825

  • Adaptive Online Program Analysis: Concepts, Infrastructure, and Applications
    Matthew B. Dwyer, Alex Kinneer, and Sebastian Elbaum
    (Suggested by Shay Artzi)

    Dynamic analysis of state-based properties is being applied to problems such as validation, intrusion detection, and program steering and reconfiguration. Dynamic analysis of such properties, however, is used rarely in practice due to its associated run-time overhead that causes multiple orders of magnitude slowdown of program execution. In this paper, we present an approach for exploiting the state-fullness of specifications to reduce the cost of dynamic program analysis. With our approach, the results of the analysis are guaranteed to be identical to those of the traditional, expensive dynamic analyses, yet with overheads between 23% and 33% relative to the un-instrumented application, for a range of non-trivial analyses. We describe the principles behind our adaptive online program analysis technique, extensions to our Java run-time analysis framework that support such analyses, and report on the performance and capabilities of two different families of adaptive online program analyses.

    [ PDF | PS ]

  • Inputs of death: Wednesday, April 4th, 3:00pm, 32-G825

  • EXE: Automatically Generating Inputs of Death
    Cristian Cadar, Vijay Ganesh, Peter M. Pawlowski, David L. Dill, and Dawson R. Engler
    (Suggested by Adam Kiezun)

    This paper presents EXE, an effective bug-finding tool that automatically generates inputs that crash real code. Instead of running code on manually or randomly constructed input, EXE runs it on symbolic input initially allowed to be "anything." As checked code runs, EXE tracks the constraints on each symbolic (i.e., input-derived) memory location. If a statement uses a symbolic value, EXE does not run it, but instead adds it as an input-constraint; all other statements run as usual. If code conditionally checks a symbolic expression, EXE forks execution, constraining the expression to be true on the true branch and false on the other. Because EXE reasons about all possible values on a path, it has much more power than a traditional runtime tool: (1) it can force execution down any feasible program path and (2) at dangerous operations (e.g., a pointer dereference), it detects if the current path constraints allow any value that causes a bug. When a path terminates or hits a bug, EXE automatically generates a test case by solving the current path constraints to find concrete values using its own co-designed constraint solver, STP. Because EXE's constraints have no approximations, feeding this concrete input to an uninstrumented version of the checked code will cause it to follow the same path and hit the same bug (assuming deterministic code).

    EXE works well on real code, finding bugs along with inputs that trigger them in: the BSD and Linux packet filter implementations, the udhcpd DHCP server, the pcre regular expression library, and three Linux file systems.

    [ PDF | PS ]

  • Tree-based clone detection: Thursday, March 22nd, 3:00pm, 32-G631

  • DECKARD: Scalable and Accurate Tree-based Detection of Code Clones
    Lingxiao Jiang, Ghassan Misherghi, Zhendong Su, and Stéphane Glondu
    (Suggested by Yoav Zibin)

    Note: postponed from 3/14 to 3/21 3/22, new location

    Detecting code clones has many software engineering applications. Existing approaches either do not scale to large code bases or are not robust against minor code modifications. In this paper, we present an efficient algorithm for identifying similar subtrees and apply it to tree representations of source code. Our algorithm is based on a novel characterization of subtrees with numerical vectors in the Euclidean space Rn and an efficient algorithm to cluster these vectors w.r.t. the Euclidean distance metric. Subtrees with vectors in one cluster are considered similar. We have implemented our tree similarity algorithm as a clone detection tool called DECKARD and evaluated it on large code bases written in C and Java including the Linux kernel and JDK. Our experiments show that DECKARD is both scalable and accurate. It is also language independent, applicable to any language with a formally specified grammar.

    [ PDF | PS ]

  • Developer information needs: Wednesday, March 7th, 3:00pm, 32-G825

  • Information Needs in Collocated Software Development Teams
    Andrew J. Ko, Robert DeLine, and Gina Venolia
    (Suggested by Sung Kim)

    Previous research has documented the fragmented nature of software development work, with frequent interruptions and coordination. To explain this in more detail, we analyzed software developers' day-to-day information needs. We observed seventeen developers at a large software development company and transcribed their activities minute by minute in 90 minute sessions. We analyzed these logs for the information that developers sought, the sources that they used, and the situations that prevented information from being acquired. We identify twenty-one information types and catalog the outcome and source when each type of information was sought. The most frequently sought information included awareness about artifacts and coworkers. The most often deferred searches included knowledge about design and program behavior, such as why code was written a particular way, what a program was supposed to do, and the cause of a program state. Developers often had to defer tasks because the only sources of knowledge were unavailable coworkers.

    [ PDF | PS ]

  • Random + symbolic testing: Wednesday, February 21st, 3:00pm, 32-G825

  • Hybrid Concolic Testing
    Rupak Majumdar and Koushik Sen
    (Suggested by Carlos Pacheco)

    We present hybrid concolic testing, an algorithm that interleaves random testing with concolic execution to obtain both a deep and a wide exploration of program state space. Our algorithm generates test inputs automatically by interleaving random testing until saturation with bounded exhaustive symbolic exploration of program points. It thus combines the ability of random search to reach deep program states quickly together with the ability of concolic testing to explore states in a neighborhood exhaustively. We have implemented our algorithm on top of CUTE and applied it to obtain better branch coverage for an editor implementation (VIM 5.7, 150K lines of code) as well as a data structure implementation in C. Our experiments suggest that hybrid concolic testing can handle large programs and provide, for the same testing budget, almost 4× the branch coverage than random testing and almost 2× that of concolic testing.

    [ PDF | PS ]

  • Inferring heap types: Wednesday, February 7th, 3:00pm, 32-G825

  • Dynamic Heap Type Inference for Program Understanding and Debugging
    Marina Polishchuk, Ben Liblit and Chloë W. Schulze
    (Suggested by Michael Ernst)

    C programs can be difficult to debug due to lax type enforcement and low-level access to memory. We present a dynamic analysis for C that checks heap snapshots for consistency with program types. Our approach builds on ideas from physical subtyping and conservative garbage collection. We infer a program-defined type for each allocated storage location or identify "untypable" blocks that reveal heap corruption or type safety violations. The analysis exploits symbolic debug information if present, but requires no annotation or recompilation beyond a list of defined program types and allocated heap blocks. We have integrated our analysis into the GNU Debugger (gdb), and describe our initial experience using this tool with several small to medium-sized programs.

    [ PDF | PS ]

  • Detecting Java leaks: Wednesday, January 31st, 3:00pm, 32-G825

  • Cork: Dynamic Memory Leak Detection for Garbage-Collected Languages
    Maria Jump and Kathryn S. McKinley
    (Suggested by Jeff Perkins)

    A memory leak in a garbage-collected program occurs when the program inadvertently maintains references to objects that it no longer needs. Memory leaks cause systematic heap growth, degrading performance and resulting in program crashes after perhaps days or weeks of execution. Prior approaches for detecting memory leaks rely on heap differencing or detailed object statistics which store state proportional to the number of objects in the heap. These overheads preclude their use on the same processor for deployed long-running applications.

    This paper introduces a dynamic heap-summarization technique based on type that accurately identifies leaks, is space efficient (adding less than 1% to the heap), and is time efficient (adding 2.3% on average to total execution time). We implement this approach in Cork which utilizes dynamic type information and garbage collection to summarize the live objects in a type points-from graph (TPFG) whose nodes (types) and edges (references between types) are annotated with volume. Cork compares TPFGs across multiple collections, identifies growing data structures, and computes a type slice for the user. Cork is accurate: it identifies systematic heap growth with no false positives in 4 of 15 benchmarks we tested. Cork's slice report enabled us (non-experts) to quickly eliminate growing data structures in SPECjbb2000 and Eclipse, something their developers had not previously done. Cork is accurate, scalable, and efficient enough to consider using online.

    [ PDF | PS ]

  • A modular certified verifier in Coq: Wednesday, January 24th, 3:00pm, 32-G725

  • Modular Development of Certified Program Verifiers with a Proof Assistant
    Adam Chlipala
    (Suggested by Stephen McCamant)

    I report on an experience using the Coq proof assistant to develop a program verification tool with a machine-checkable proof of full correctness. The verifier is able to prove memory safety of x86 machine code programs compiled from code that uses algebraic datatypes. The tool's soundness theorem is expressed in terms of the bit-level semantics of x86 programs, so its correctness depends on very few assumptions. I take advantage of Coq's support for programming with dependent types and modules in the structure of my development. The approach is based on developing a library of reusable functors for transforming a verifier at one level of abstraction into a verifier at a lower level. Using this library, it's possible to prototype a verifier based on a new type system with a minimal amount of work, while obtaining a very strong soundness theorem about the final product.

    [ PDF | PS ]

  • Markov anomaly detection: Wednesday, January 17th, 3:00pm, 32-G631

  • On-line Anomaly Detection of Deployed Software: A Statistical Machine Learning Approach
    George K. Baah, Alexander Gray, and Mary Jean Harrold
    (Suggested by Shay Artzi)

    This paper presents a new machine-learning technique that performs anomaly detection as software is executing in the field. The technique uses a fully observable Markov model where each state in the model emits a number of distinct observations according to a probability distribution, and estimates the model parameters using the Baum-Welch algorithm. The trained model is then deployed with the software to perform anomaly detection. By performing the anomaly detection as the software is executing, faults associated with anomalies can be located and fixed before they cause critical failures in the system, and developers time to debug deployed software can be reduced. This paper also presents a prototype implementation of our technique, along with a case study that shows, for the subjects we studied, the effectiveness of the technique.

    (Local copy below limited to MIT.EDU)

    [ PDF | PS | ACM Portal ]

    Links


    adonovan@csail.mit.edu
    Last updated: 04 August 2008