Why Automatic Patch Generation
Software systems has entered every corner of our life. We use software in our desktop PC to work, store our photos to cloud system, and plan our dinner with applications in our phone. Software defects are pervasive in software systems and can cause undesirable user experience, denial of service, or even security exploitation. Generating a patch for a defect is a tedious, time-consuming, and often repetitive process. Human developers have to diagnose the defect from the collected bug reports, understand its root cause, craft the patch to correct the defect, and validate the patch with regression tests and code reviews.
Automatic patch generation holds out the promise of automatically correcting software defects without the need for human developers to diagnose, understand, and correct these defects. Given a set of test cases, at least one of which exposes a defect in the software, the goal of automatic patch generation system is to produce correct patches to eliminate the defect.
Why Learning from Human Patches
As human beings, we wrote a lot of code. The number of open source projects in GitHub just reached 12 millions at 2015 and it is still counting. I believe the large amount of code provides not just challenges but opportunities. It enables many potential data-driven techniques that was not possible before.
Our observation is that when fixing software errors human developers often unconsciously consider only those patches that follow certain software engineering code patterns and that exhibit certain characteristics of successful patches. The goal of our projects is to use a combination of cutting-edge machine learning and program analysis techniques to learn those human knowledges from successful human patches widely available over the web.
Building an automatic patch generation system is very challenging. The given test cases are always incomplete and there could be many plausible patches that produce correct output for all given test cases but produce incorrect output for some other cases. A successful automatic patch generation system has to rely on additional information other than given test cases to rank correct patches ahead of such plausible but incorrect patches. Our ISSTA'15 paper provides a detailed analysis to show why this issue is critical and why many previous patch generation systems produce unsatisfactory results because of ignoring this issue (and making other mistakes).
The standard generate-and-validate approach for building patch generation system operates with a search space derived from a set of transformations to generate a set of candidate patches. It then validates each candidate patch with the given test cases. Like all search-based techniques, it faces an inherent trade-off between the coverage and the tractability. Our ICSE'16 paper discusses and quantitatively analyzes this important search space design trade-off.
Genesis is the state-of-art automatic patch generation system for Java programs. The key difference between Genesis and the rest of patch generation system is that it does not rely on manually defined transforms to generate candidate patches. Instead, Genesis takes a set of past successful human patches and learns what kinds of useful patterns human use to fix errors. It automatically infers a set of code transforms that forms a productive search space respecting the trade-off between the coverage and the tractability. Genesis outperforms the previous patch generation system PAR on a systematically benchmark set of 20 NPE errors, 13 OOB errors, and 16 CCE errors.
Prophet is the state-of-art automatic patch generation system for C programs. It uses a combination of sophisticated machine learning and program analysis techniques to learn a probabilistic model of patch correctness from past successful human patches. It then uses the learned model to guide the automatic patch generation process to prioritize potentially correct patches inside a search space. We applied Prophet to 69 real world defects. Our results show that Prophet outperforms all previous patch generation systems (e.g., SPR, GenProg, and AE) evaluated on this benchmark set.
SPR is the baseline system on which Prophet is built. It uses the condition synthesis technique to explore its search space up to two magnitude faster.
Download: replication package.
Kali is a naive patch generation system that only deletes functionalities. We used Kali in our ISSTA'15 paper to analyze three previous patch generation systems. Although Kali is not designed as a serious patch generation system, Kali generates at least as many correct patches as prior GenProg, RSRepair, and AE systems; Kali also generates at least as many patches that produce correct outputs for the inputs in the validation test suite as the three prior systems. It helped us to identify several important issues in the past patch generation systems.