Final Report:
Adaptive Knowledge-Based Monitoring
for Information Assurance
Sponsor: DARPA; Air Force Research Laboratory, Contract
Number F30602-99-1-0509
Dates: 5/11/1999-7/10/2003 (including a no-cost
extension)
Report submitted: 5/17/2004
Introduction
Monitoring tasks play a central role in information
assurance, because it is difficult to respond to attacks one cannot detect.
Ensuring effective monitoring is difficult, however, because the threats to
which IA monitoring systems must respond evolve continually in the never-ending
competition between adversaries and defenders. These changing threats pose
important problems for the cost and technological requirements for responding.
Current techniques for building and deploying monitoring
systems rely mostly on human analysis of the structures, tasks and
vulnerabilities of each enclave and on the customized construction and
configuration of monitoring tools by local security personnel. These are
difficult and time-consuming tasks, and in the absence of formalization of their
results, the lessons learned in each enclave are not easily transferred to other
enclaves or situations. In addition, competition with adversaries continues to
ratchet up the sophistication of attacks and therefore the need to deploy more
wide-scale, quickly responding semi-automated defenses.
Our proposal, work on which commenced in May 1999, planned
to address these challenges at several scales, ranging from the monitoring
tasks supervised by an individual security officer or analyst to larger scales
eventually encompassing the NII community. We planned to focus on the following
technical problems:
- Creating monitors that include situational
awareness to give a context for interpreting the significance of what they
are monitoring
- Building formally defined models of systems,
enclaves and domains being monitored, threats posed by potential attackers,
and the operations and communications that may be the subject of attacks
- Organizing these plans and processes into sharable
libraries that support efficient adaptation to new, similar tasks and
circumstances
- Developing explicitly encoded knowledge to allow our
system to reason about and partly automate the task of constructing and
updating monitors as new knowledge is acquired and as circumstances change
- Applying a uniform architecture of monitoring to
tasks at different scales of operation and shared across organizations.
Achievements
The purpose of this project was to develop enhanced
technical approaches to providing improved Information Assurance and to
contribute to the DARPA Cyber Command and Control (CC2) effort. Our research
group had begun, under the earlier DARPA High Performance Knowledge Bases (HPKB)
program, to define and build a general knowledge-based monitoring architecture
called MAITA (Monitoring, Analysis and Interpretation Tool Arsenal). With
support from the current project we pursued the objectives listed below. Rather
than trying to summarize all of the research results achieved for these
objectives in this report, we cite the publications that report them and refer
the interested reader there. They are also included as appendices to this
report.
- We further developed the design and implementation of
our monitoring architecture, recognized that it was developing into a
sophisticated high-level distributed operating system, and enhanced its
self-monitoring, checkpoint and automatic restart capabilities. The final
version of the design document is:
Jon Doyle, Isaac Kohane, William Long, and Peter Szolovits, "The
Architecture of MAITA: A Tool For Monitoring, Analysis, and Interpretation",
MIT CSAIL Technical Report, Cambridge, MA 02139, March 2004. The report is
also available on the Web at
http://medg.csail.mit.edu/projects/maita/documents/architecture/architecture-final.pdf.
- We applied the architecture to problems of monitoring
data relevant to detecting potential intrusions into our own computer systems
and began to explore its application to realistic simulation data produced
near the end of the CC2 project by MIT Lincoln Laboratory. The final version of
the implementation is available at
http://medg.csail.mit.edu/projects/maita/maita-system.tar.
Please refer to
http://medg.csail.mit.edu/projects/maita/maita-system-README.html
for pointers on how to install and use the programs.
We also described the application of our methods to the CC2 problem in the
following publications:
- Jon Doyle, Isaac Kohane, William Long, Howard Shrobe,
Peter Szolovits. (2001). Event Recognition Beyond Signature and Anomaly.
IEEE-SMC Workshop on Information Assurance and Security, West Point, NY,
June 5-6, 2001.
http://medg.csail.mit.edu/projects/maita/documents/events/events01.pdf.
- Jon Doyle, Isaac Kohane, William Long, Howard Shrobe,
and Peter Szolovits, "Agile Monitoring for Cyber Defense", Second
DARPA Information Survivability Conference and Exposition (DISCEX-II),
Anaheim, California, June 12-14, 2001.
http://medg.csail.mit.edu/projects/maita/documents/agile/agile01.pdf.
- William J. Long, Jon Doyle, Glenn Burke, Peter
Szolovits. (2003). Detection of intrusion across multiple sensors. SPIE
Signals and Image Processing Conference.
http://medg.csail.mit.edu/projects/maita/documents/events/detection-gg.pdf.
We conducted research on a number of fundamental problems
that we had identified as critical to making further progress in this domain.
These included:
- Michael McGeachie, under the supervision of Dr. Jon
Doyle, developed as part of his Master's thesis a method of allowing users to
specify their preferences, all other things being equal, and automatically
turning these into a classical utility function that is consistent with the
user's preferences. The thesis is:
Michael McGeachie. "Utility Functions for Ceteris Paribus Preferences",
Masters Thesis, Massachusetts Institute of Technology. 2002. It is available
at
http://www.mit.edu/~mmcgeach/docs/mmcgeach-masters.PDF.
Other publications based on this work include:
McGeachie, M. (2001) "Utility Function for Autonomous Agent Control," MIT
Student Oxygen Workshop. (http://www.mit.edu/~mmcgeach/docs/mike_oxygen_submission.pdf)
Michael McGeachie and Jon Doyle "Utility Functions for Ceteris Paribus
Preferences" AAAI First Workshop on Preferences in AI, Edmonton, Alberta
(2002). (http://www.csc.ncsu.edu/faculty/doyle/publications/uc02.pdf).
Michael McGeachie and Jon Doyle. "Efficient Utility Functions for Ceteris
Paribus Preferences." AAAI Eighteenth National Conference on Artificial
Intelligence, Edmonton, Alberta (2002). (http://www.mit.edu/~mmcgeach/docs/cuc02.pdf)
Michael McGeachie and Jon Doyle. Utility Functions for Ceteris Paribus
Preferences. Computational Intelligence, 20:2:158-217, May
2004.
(http://www.mit.edu/~mmcgeach/docs/ci03-utilityfunctions-mcgeachie-doyle.pdf)
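The idea of turning "all other things being equal" statements into a numeric
utility can be illustrated with a brute-force sketch in Python. This is not the
construction developed in the thesis. It simply enumerates every assignment of
a few Boolean features, draws an edge from each preferred assignment to the one
it beats, and takes the length of the longest downward path as the utility. The
feature names, the rule format and the function names are all assumptions made
for illustration.

```python
from itertools import product


def preference_edges(features, rules):
    """Expand 'all else equal' rules into edges between complete assignments.

    Each rule is a (better, worse) pair of dicts over the same features
    (a hypothetical format). For every setting of the remaining features, the
    assignment containing `better` is preferred to the one containing `worse`.
    """
    edges = {m: set() for m in product([False, True], repeat=len(features))}
    for better, worse in rules:
        if set(better) != set(worse):
            raise ValueError("a rule must constrain the same features on both sides")
        free = [f for f in features if f not in better]
        for values in product([False, True], repeat=len(free)):
            rest = dict(zip(free, values))
            preferred = tuple({**rest, **better}[f] for f in features)
            dispreferred = tuple({**rest, **worse}[f] for f in features)
            edges[preferred].add(dispreferred)
    return edges


def utilities(edges):
    """Utility of an assignment = length of the longest chain of worse assignments."""
    depth = {}
    in_progress = set()

    def visit(model):
        if model in depth:
            return depth[model]
        if model in in_progress:
            raise ValueError("the stated preferences are cyclic")
        in_progress.add(model)
        depth[model] = 1 + max((visit(worse) for worse in edges[model]), default=-1)
        in_progress.discard(model)
        return depth[model]

    for model in edges:
        visit(model)
    return depth


# Hypothetical example: two independent preferences over two features.
features = ("encrypted", "logged")
rules = [
    ({"encrypted": True}, {"encrypted": False}),
    ({"logged": True}, {"logged": False}),
]
utility = utilities(preference_edges(features, rules))
```

Because every edge runs from a preferred assignment to a less preferred one,
the preferred assignment always receives a strictly larger number, so the
resulting function agrees with each stated preference. Enumerating all
assignments is exponential in the number of features and is only practical for
toy examples like this one.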
- Mary DeSouza, supervised by Prof. Szolovits, developed
an optimized method of matching trend templates (which describe temporal and
inter-signal patterns of interest) against data arriving in real time. This
improved both the speed and accuracy of the trend template matcher earlier
developed by Dr. Ira Haimowitz. The thesis is available as:
Mary T. DeSouza, "Automated Medical Trend Detection", MIT M.Eng. thesis, May
2000. (http://medg.csail.mit.edu/projects/maita/documents/desouza/thesis-reprint.pdf).
- Stephen Bull, as part of his Master's thesis, devised a
modified language for expressing trend templates that permits more efficient
matching without significantly reducing the expressiveness of the templates
that can be defined. Its principal contribution was to reduce the amount of
search needed to find the best time point at which to end one temporal segment
of a template and begin the next. The thesis is:
Bull, Stephen M. "Diagnostic Process Monitoring with Temporally Uncertain
Models." MIT EECS Master of Engineering Thesis, May 2002.
(http://medg.csail.mit.edu/projects/maita/documents/bull/Bull-thesis.pdf)
- Dr. William Long created a new pre-processing method
that segments time into successive periods during which all changing
time-oriented signals can be reasonably approximated by a linear
relationship. This algorithm processes a continuous stream of data by fitting
a regression line to the data and comparing that to the fit of two lines
connected at the optimum point. If the two-segment fit is better, the later
segment is used for making decisions about the data that follows. The
criteria for deciding which is the better fit can be arbitrary, including
factors such as the characteristics of other data streams and the probability
of a change in slope. The technique is generalizable to other alternate
hypotheses such as single point outliers and constant shifts of the regression
line. The current version of the paper is available as:
William Long, "Real-Time Trend Detection
Using Segmental Linear Regression".
(http://medg.csail.mit.edu/projects/maita/documents/trends/segmental-trends.pdf)
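The comparison between a one-line fit and a two-line fit described above can be
sketched as follows. This is a simplified illustration and not Dr. Long's
implementation: the two lines are fitted independently rather than constrained
to meet at the breakpoint, the decision criterion is a plain sum of squared
errors plus a fixed penalty, and all function and parameter names are
assumptions.

```python
def fit_line(xs, ys):
    """Least-squares line through the points; returns (slope, intercept, sse)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def best_split(xs, ys, min_points=3):
    """Try every breakpoint; return (total_sse, index, left_fit, right_fit) or None."""
    best = None
    for k in range(min_points - 1, len(xs) - min_points + 1):
        left = fit_line(xs[:k + 1], ys[:k + 1])   # both segments share sample k
        right = fit_line(xs[k:], ys[k:])
        total = left[2] + right[2]
        if best is None or total < best[0]:
            best = (total, k, left, right)
    return best


def current_segment(xs, ys, penalty=1.0, min_points=3):
    """Return (start_index, slope, intercept) of the segment to use going forward.

    `penalty` is an assumed stand-in for the arbitrary decision criteria
    mentioned in the text: two segments must beat one line by at least this
    much squared error before the split is accepted.
    """
    one = fit_line(xs, ys)
    split = best_split(xs, ys, min_points)
    if split is not None and split[0] + penalty < one[2]:
        _, k, _, right = split
        return k, right[0], right[1]
    return 0, one[0], one[1]


# Hypothetical signal: flat for ten samples, then rising by 2 per sample.
xs = list(range(20))
ys = [1.0] * 10 + [1.0 + 2.0 * (x - 9) for x in range(10, 20)]
start, slope, intercept = current_segment(xs, ys)
```

On this example the split is accepted at sample 9 and the later segment has a
slope of 2, so subsequent data would be judged against the rising line rather
than the flat one. A streaming version would rerun the comparison as each new
sample arrives and discard data before the accepted breakpoint.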
- Dr. Christine Tsien developed, as part of her PhD
thesis, a method of deriving "interesting" temporal patterns using more
conventional machine learning techniques. In contrast with more
knowledge-based methods, she encoded a large variety of signals derived from
raw data by smoothing, trending and temporal analysis methods, and allowed the
machine learning methods to choose which of these derived signals most
accurately predicted the desired interpretations of the signals. Her doctoral
thesis is:
Christine L. Tsien, "TrendFinder: Automated Detection of Alarmable Trends", MIT
Ph.D. dissertation, April 2000.
(http://medg.csail.mit.edu/projects/maita/documents/tsien/)
Other papers reporting aspects of this work are:
Tsien CL, Kohane IS, McIntosh N. Building ICU artifact detection models with
more data in less time. Proc AMIA Symp 2001:706-10.
(http://medg.csail.mit.edu/projects/maita/documents/tsien/D010001045.pdf)
Zhang Y, Tsien CL. Prospective Trials of Intelligent Alarm Algorithms
for Patient Monitoring. Proc AMIA Symp 2001:1068.
(http://medg.csail.mit.edu/projects/maita/documents/tsien/D010001581.pdf)
Tsien CL, Kohane IS, McIntosh N. Multiple signal integration by decision tree
induction to detect artifacts in the neonatal intensive care unit. Artif
Intell Med 2000;19(3):189-202.
(on-line version not currently available)
Tsien CL. Event discovery in medical time-series data. Proc AMIA Symp
2000:858-62.
(http://medg.csail.mit.edu/projects/maita/documents/tsien/D200403.pdf)
- Ying Zhang, as part of her Master's thesis, adapted Dr.
Tsien's techniques to learning in real time. One major advantage of this
enhancement is that a monitor can issue alarms even as it is being trained.
Therefore, it becomes much easier to diagnose false alarms, because they are
announced in real time as data are being collected and human observers are able
to interpret the context in which the algorithm may have been misled. The
thesis is at:
Zhang Y.  "Real-Time Analysis of Physiological Data and Development of Alarm
Algorithms for Patient Monitoring in the Intensive Care Unit." MIT EECS
Master of Engineering Thesis, Aug 2003.
(http://medg.csail.mit.edu/ftp/yzhang/mastersthesis_ying_zhang.pdf)
- Joseph Hastings developed a new model of computer system
behavior that assumes that system calls are generated by a number of Markov
processes that alternately control the system. Building on earlier research
by Dr. Marco Ramoni, he implemented a system that, in real time, learns the
relevant Markov processes and can recognize aberrant behavior that is
inconsistent with the processes so far learned. The thesis is:
Hastings JR. "Incremental Bayesian Segmentation for Intrusion Detection."
[M.Eng.]. Cambridge, MA: MIT; 2003.
(http://medg.csail.mit.edu/ftp/hastings/IBS.pdf)
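The general idea of learning several Markov processes over system calls and
flagging behavior that none of them explains can be sketched as below. This is
a deliberately simplified stand-in and not the Incremental Bayesian
Segmentation algorithm of the thesis: it scores fixed windows of calls by
average log-probability against a fixed threshold, and the class names, the
threshold and the smoothing constant are all assumptions.

```python
import math


class MarkovChain:
    """First-order Markov chain over system call names, learned from counts."""

    def __init__(self, alphabet_size, alpha=1.0):
        self.alphabet_size = alphabet_size  # number of distinct call names (assumed known)
        self.alpha = alpha                  # additive smoothing for unseen transitions
        self.counts = {}                    # (previous, current) -> occurrences
        self.totals = {}                    # previous -> transitions seen leaving it

    def probability(self, prev, cur):
        seen = self.counts.get((prev, cur), 0)
        total = self.totals.get(prev, 0)
        return (seen + self.alpha) / (total + self.alpha * self.alphabet_size)

    def score(self, window):
        """Average log-probability per transition in the window."""
        pairs = list(zip(window, window[1:]))
        if not pairs:
            return 0.0
        return sum(math.log(self.probability(a, b)) for a, b in pairs) / len(pairs)

    def update(self, window):
        for a, b in zip(window, window[1:]):
            self.counts[(a, b)] = self.counts.get((a, b), 0) + 1
            self.totals[a] = self.totals.get(a, 0) + 1


class SyscallMonitor:
    """Keeps one chain per learned process; windows no chain explains are aberrant."""

    def __init__(self, alphabet_size, threshold=-1.0):
        self.alphabet_size = alphabet_size
        self.threshold = threshold  # assumed cut-off on the average log-probability
        self.chains = []

    def best_match(self, window):
        scored = [(chain.score(window), chain) for chain in self.chains]
        return max(scored, key=lambda pair: pair[0], default=(float("-inf"), None))

    def learn(self, window):
        """Update the chain that explains the window, or start a new one."""
        score, chain = self.best_match(window)
        if chain is None or score < self.threshold:
            chain = MarkovChain(self.alphabet_size)
            self.chains.append(chain)
        chain.update(window)

    def is_aberrant(self, window):
        score, _ = self.best_match(window)
        return score < self.threshold


# Hypothetical traces: two normal behaviors and one suspicious window.
monitor = SyscallMonitor(alphabet_size=5)
editing = ["open", "read", "write", "close"] * 5
scanning = ["read", "read", "read", "write"] * 5
for _ in range(3):
    monitor.learn(editing)
    monitor.learn(scanning)
attack = ["open", "exec", "exec", "write", "exec", "close"]
flagged = monitor.is_aberrant(attack)
```

In this toy run the monitor ends up with two chains, one per normal behavior,
and the window containing unfamiliar exec transitions falls below the threshold
for both. A real detector would also need to decide where one process hands
control to another, which is the segmentation problem the thesis addresses.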
- Dr. Howard Shrobe developed a technique called
Computational Vulnerability Analysis that is useful in deducing the plans that
a potential attacker might use. The attack plans developed by Computational
Vulnerability Analysis can be directly converted into Trend Templates for use
in attack plan recognition. In contrast to most other approaches to
vulnerability analysis, this system works from first principles: Given a model
of the computational and networking environment, it generates multi-step
complex plans for compromising specific (or generic) resources in the
environment. It does this by a careful analysis of the "control", "input" and
"output" relationships that exist within individual computer systems and
between systems in a larger networked environment. For example, it reasons
that since the scheduler of a computer system controls the performance of an
application running on that system, it then follows that an attacker
interested in adversely affecting performance might try to gain control of the
scheduler. It similarly reasons that since one way to control any component
is to modify its inputs, the attacker might try to compromise the
scheduler by compromising its parameter file. Ultimately, any of these attacks
must exploit some vulnerability of the system using a generic attack form
(e.g. exploiting a "bounds check" vulnerability by launching a buffer overflow
attack); thus the technique does not rely on detailed knowledge of any
particular virus or worm and is much more general than would be a catalog of
known exploits.
(http://medg.csail.mit.edu/projects/maita/documents/vulnerability/comp-vulnerability3.pdf)
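The style of reasoning in the scheduler example can be sketched as a small
backward-chaining search in Python. This is an illustration only and not Dr.
Shrobe's system: the model format, the component names, the "weak-permissions"
vulnerability and the function name are all assumptions, and the conversion of
the resulting plans into Trend Templates is not shown.

```python
def plans_to_control(component, model, seen=frozenset()):
    """Enumerate multi-step plans (lists of steps) for gaining control of a component.

    Three assumed rules, mirroring the text:
      1. exploit a vulnerability of the component with a generic attack form;
      2. control one of its inputs, then modify that input;
      3. control something that already controls it.
    """
    if component in seen:          # avoid looping on circular dependencies
        return []
    seen = seen | {component}
    plans = []
    for vulnerability in model["vulnerabilities"].get(component, []):
        attack = model["generic_attacks"][vulnerability]
        plans.append([f"exploit {vulnerability} vulnerability of {component} via {attack}"])
    for source in model["inputs"].get(component, []):
        for plan in plans_to_control(source, model, seen):
            plans.append(plan + [f"modify {source} to control {component}"])
    for controller in model["controls"].get(component, []):
        for plan in plans_to_control(controller, model, seen):
            plans.append(plan + [f"use {controller} to control {component}"])
    return plans


# Hypothetical environment model based on the scheduler example.
model = {
    "controls": {"application": ["scheduler"]},        # scheduler controls application performance
    "inputs": {"scheduler": ["parameter-file"]},       # parameter file is an input to the scheduler
    "vulnerabilities": {
        "scheduler": ["bounds-check"],
        "parameter-file": ["weak-permissions"],        # assumed for illustration
    },
    "generic_attacks": {
        "bounds-check": "buffer overflow",
        "weak-permissions": "unauthorized write",      # assumed for illustration
    },
}

plans = plans_to_control("application", model)
for plan in plans:
    print(" -> ".join(plan))
```

For this model the search yields two plans against the application: one that
attacks the scheduler directly through its bounds-check vulnerability, and one
that first compromises the parameter file and uses it to take over the
scheduler. Neither plan names a specific virus or worm, only a generic attack
form, which is the point made in the text.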