Fault Invariant Classifier

Software engineers devote the majority of their projects' timelines to debugging. Even after extensive testing, there is no guarantee that a program actually follows its specification. The Fault Invariant Classifier is a technique that helps programmers locate faults in code. It uses models of known faults to discover similar faults in other code. The novelty and power of the technique lie in its ability to base models on examples of code errors rather than on formal definitions, and to extend those models to different errors.

The Fault Invariant Classifier extracts properties of code with known errors and classifies those properties as fault-revealing or non-fault-revealing. Fault-revealing properties appear in faulty code but not in correct code, whereas non-fault-revealing properties appear in all types of code. The Fault Invariant Classifier extracts features from the properties and represents each property as a multidimensional vector. Those vectors are used to train a machine learning model, which later classifies properties of user code.
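
The pipeline described above can be illustrated with a minimal sketch. It is not the actual implementation: the property strings, the five features in featurize, and all function names are hypothetical, and a simple nearest-centroid classifier stands in for the real learners to keep the example self-contained. It also assumes that a property counts as fault-revealing when it holds for the faulty version of a program but not for a corrected version.

```python
from collections import Counter


def label_properties(faulty_props, fixed_props):
    """Label each property of the faulty version (assumed labeling rule):
    True if it is absent from the fixed version, i.e. fault-revealing."""
    fixed = set(fixed_props)
    return {p: (p not in fixed) for p in faulty_props}


def featurize(prop):
    """Turn a property string into a numeric vector.
    These five features are hypothetical, chosen only for illustration."""
    tokens = prop.split()
    return [
        float(len(tokens)),                                        # property size
        float(sum(t in ("<", "<=", ">", ">=") for t in tokens)),   # inequalities
        float(sum(t in ("==", "!=") for t in tokens)),             # (in)equalities
        float("null" in tokens),                                   # mentions null
        float(sum(t.lstrip("-").isdigit() for t in tokens)),       # int literals
    ]


def train_centroids(vectors, labels):
    """Stand-in learner: compute the mean vector of each class."""
    sums, counts = {}, Counter()
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] += 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(model, vector):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(vector, centroid))
    return min(model, key=lambda label: dist2(model[label]))


# Training: properties of a program with a known fault, and of its fixed version.
faulty_props = ["i <= n", "p != null", "x == 0", "len >= 0"]
fixed_props = ["i < n", "p != null", "len >= 0"]

labels = label_properties(faulty_props, fixed_props)
model = train_centroids(
    [featurize(p) for p in faulty_props],
    [labels[p] for p in faulty_props],
)

# Classification: rank properties of user code by the trained model.
user_props = ["j <= m", "q != null"]
suspicious = [p for p in user_props if classify(model, featurize(p))]
```

With so little toy data the resulting model is not meaningful; the point is only the shape of the workflow: label properties from known faults, map each property to a vector, train, and then apply the model to properties of other code.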

We have built an implementation of the Fault Invariant Classifier that uses Daikon to dynamically extract properties, or invariants, and two machine learning tools, C5 and SVMfu, to classify them. C5 is an implementation of the decision tree algorithm and SVMfu is an implementation of the support vector machine algorithm. In preliminary results, the implementation reduces the number of non-fault-revealing properties by 60%.

These results are presented in the ICSE 2004 paper "Finding latent code errors via machine learning over program executions", by Yuriy Brun and Michael D. Ernst.

Program Analysis Group