Fjalar: A Dynamic Analysis Framework for C and C++ Programs
Fjalar is a framework that facilitates the construction of dynamic
analysis tools for programs written in C and C++. It is often
difficult to build robust and scalable dynamic analyses for C and C++
programs due to the lack of memory and type safety in these languages.
For instance, the run time system does not keep track of array sizes
or whether values have been initialized. Existing frameworks based on
source-to-source transformation often suffer from scalability and
robustness problems due to the difficulty of adding instrumentation
source code to track memory usage and initialization. Frameworks that
operate purely at the binary level cannot provide rich language-level
information about data structures that are useful for many kinds of
analyses. Fjalar combines aspects of both source- and binary-based
approaches and allows tools built upon it to safely access rich
information at both the language and machine levels during run
time.
Fjalar can be used to build tools that dynamically instrument
un-modified x86/Linux executables compiled with DWARF2 debugging
information. The ability to operate on executables rather than source
code places less burden on the tool's users, because there is no need
to deal with complex configuration and Makefile options to determine
which source files to instrument. It also achieves greater
scalability by not having to deal with the difficulties of parsing and
instrumenting complex C and C++ source code constructs. Fjalar has
been tested to work on executables of programs on the order of 1
million lines of code, including gcc, xemacs,
Apache, and CTAS. However, because Fjalar is built
upon the Valgrind binary
instrumentation framework, it shares many of Valgrind's limitations.
In particular, Fjalar can only work on x86 and x86-64 executables on
the Linux platform, and imposes around a 100X slowdown while executing
the instrumented program.
Features
Here are some services that Fjalar provides to tools built upon it:
- Rich compile-time information about data structures,
variables, and functions
Fjalar parses the debugging symbols,
filters out uninteresting and redundant data, and makes information
available to tools in a convenient format consisting of entries for
types, variables, and functions.
- Provides access to variable names, types, addresses, sizes for
static arrays, and visibility (e.g., file static, private member
variable)
- Associates struct/class types with their member variables to
enable easy traversal within nested and recursively-defined data
structures
- Simplifies typedefs by finding their referred-to types,
creates names for unnamed structs/classes, and performs other
misc. tasks to disambiguate obscure usage of C and C++
syntax
- Provides access to function names (including mangled names for
C++), addresses, prototypes, and visibility, and associates
parameters and return values with their types
- Creates unique names for global variables and functions to
eliminate ambiguity for tools that perform whole-program
analysis
- Supports C++ features such as overloaded functions, reference
parameters, multiple class inheritance, and member functions
- Outputs compile-time program structure in XML format for
debugging, program understanding, and further processing. Here
is a sample XML file
showing a C++ stack class and its test program.
- Enables analysis code to run at arbitrary points during
program execution
- The default functionality allows tools to run analysis code
at function entrances and exits. If necessary, though, a tool
could use Valgrind's API to specify other times when its code
should be run.
- Allows tools or users to specify the functions at whose
entrance and exit points the analysis code should run. This
allows analyses to focus on selected functions within a large
program.
- Maintains function execution states at run time, including
register values and a copy of the function's stack
frame.
- Run-time memory initialization and array size
information
- Given an address at runtime, determines whether that address
refers to an initialized value. This can help tools to ignore
garbage values, which never provide useful information and
often hinder the precision of analyses.
- Automatically determines array sizes at run time. This
feature allows tools to safely traverse inside of arrays of all
types without risk of crashing the target program. Given an
address at run time, Fjalar can determine how many elements of a
given type this address refers to.
- Safe recursive traversals within data structures and
arrays
Fjalar allows analyses to traverse through data structures at
run time and perform operations based on observed
values.
- Provides support for traversing inside of structs and
arrays, including performing recursive traversals of structs
within structs. This service provides tools with pointers to
every field of a struct and every array element along with the
corresponding type information. These pointers are guaranteed
to point to allocated and initialized memory locations.
Callback functions passed into the Fjalar traversal routines
allow tools to record, analyze, or modify values at run
time.
- Generates unique names for variables derived from traversing
inside of structures and arrays. These names can be parsed more
or less as
valid C expressions (e.g., foo->bar.baz[1]).
- Allows tools to control how deep to traverse inside of
structs and whether to treat specified pointer variables as
pointers to a single element or to an array of elements. Also
allows tools to coerce pointers to a specified type.
- Allows tools or users to specify a list of variables to
trace, which is useful for focusing analyses on certain data
structures within the target program.
- All instrumentation functionality provided by Valgrind
Valgrind provides an API and plug-in architecture for creating
a variety of dynamic analysis tools. Fjalar allows tool builders to
use all of Valgrind's features in addition to its own
services.
Implementation
Fjalar is implemented in C as a plug-in tool for the Valgrind dynamic binary
instrumentation framework. It uses Valgrind to instrument the target
program's executable with statements to pause execution at certain
points. It uses the Valgrind Memcheck tool to ensure memory safety
during the analysis by providing an indication of which memory
locations hold allocated and/or initialized values. It obtains
language-level information by parsing debugging information in the
DWARF2 format using the Readelf program from the GNU Binutils
collection.
Download
Download Fjalar 1.4 (4.8 MB) -
Newest version includes improvements for handling C++ and multi-threaded
programs. AMD64 support has also been improved. Archive contains
source code for Valgrind, Fjalar, and a basic tool built upon
Fjalar (Released Oct 05, 2009)
Older versions:
View the Fjalar Programmer's Manual
for documentation related to implementing Fjalar tools.
These two header files comprise the interface between Fjalar and
its tools and also serve as documentation: fjalar_include.h and fjalar_tool.h
Fjalar is also distributed as part of the Kvasir and DynComp
tools within the Daikon
distribution. This is an example of using Fjalar to build two
moderately-sized tools.
Papers
-
Philip J. Guo. A Scalable Mixed-Level Approach
to Dynamic Analysis of C and C++
Programs. Master of Engineering thesis,
Department of Electrical Engineering and Computer
Science, Massachusetts Institute of Technology,
May 2006.
(View)
-
Philip Guo and Stephen McCamant. A Scalable Mixed-Level
Framework for Dynamic Analysis of C/C++ Programs. MIT CSAIL Student
Workshop, September 2005.
(View PDF)
Applications of Fjalar
We have used Fjalar to build two dynamic analysis tools.
Kvasir Value Profiling Tool
Kvasir traverses through data structures at run time and outputs
their values to a trace file. The main role of Kvasir is to serve as
a C/C++ front end for the Daikon dynamic invariant detector. Kvasir
provides Daikon with value traces, and Daikon infers likely invariants
over the data structures observed in those traces.
DynComp Dynamic Abstract Type Inference Tool
DynComp computes which variables are likely to be used in a related
manner by the programmer. It infers a finer set of types for
variables called abstract types such that all variables that
belong to the same abstract type have held values that interacted at
some point during execution (i.e., via arithmetic or comparisons).
This finer type information is useful for program understanding, bug
finding, debugging, abstraction violation detection, and aiding other
program analysis tools.
-
Philip Guo and Stephen McCamant. Dynamic Variable Comparability
Analysis for C and C++ Programs. MIT CSAIL Student Workshop,
September 2005.
(View PDF)
- Philip J. Guo, Jeff H. Perkins, Stephen McCamant, and Michael
D. Ernst. Dynamic Inference of Abstract Types. In
Proceedings of the 2006 International Symposium on Software Testing
and Analysis, July 2006.
(View)
-
Download DynComp
with the Daikon distribution
Ongoing and Future Work
- Fjalar currently only supports traversals to variables that
are reachable from globals and function formal parameters and
return values. It would not be too difficult to extend support
to local variables if needed.
- We have done extensive testing on C programs but have not
tested many C++ programs. We support all of the important core
C++ features, but bug fixes and support for additional features
are desirable.
- We would like to design and implement a more general method
for precisely controlling how to traverse inside of data
structures at run time, perhaps implemented in the form of a
query language
- We would like to extend Fjalar to work on additional
platforms that Valgrind now supports, such as the PowerPC.
Related projects
- Valgrind - The dynamic binary
instrumentation framework that Fjalar is built upon
- PIN -
Another dynamic binary instrumentation framework
- libdwarf -
A library for parsing DWARF debugging information
- readelf - program to display information about ELF format object files;
part of GNU binutils
- CIL -
C front-end and simplifier that many source analysis and transformation tools build
upon
People
Please send bug reports and feature requests to
daikon-developers@lists.csail.mit.edu.
About the name
Fjalar is the name of a dwarf in Norse mythology. This name
was inspired both by the DWARF debugging information format and by the
name of the Valgrind framework. Valgrind is the name of a
legendary gate from Norse mythology.
Philip Guo <pgbovine (@) alum (.) mit (.) edu>
Last modified on October 6th, 2009