Consultation, Knowledge Acquisition, and Instruction: A Case Study(1)

Randall Davis

Davis, R.  "Consultation, Knowledge Acquisition, and Instruction: A Case Study."  Chapter 3 in Szolovits, P. (Ed.) Artificial Intelligence in Medicine. Westview Press, Boulder, Colorado.  1982.

Abstract

This paper reviews an approach to the design and construction of a group of programs intended to function as a consultative system on the question of medical diagnosis and therapy selection. It describes the system in terms of the nature of the problems involved in: (a) making decisions, (b) adding new knowledge to the system, and (c) teaching knowledge in the program to students seeking instruction. We describe the factors which make these problems difficult, and consider the design goals that have led to the construction of a system with several novel capabilities. Many of those capabilities result from representing domain specific knowledge in the system in terms of numerous judgmental decision rules. Examples of the system in operation are given to illustrate many of these issues, and performance is compared with previous approaches to automated medical decision making. Finally, we consider the domain independence and generality of the methodology, and consider the potential impact the system may have as a tool for medical decision making.

Introduction

Over the past five years, a group of computer scientists and clinicians at Stanford University has developed a collection of computer programs designed to function as a consultation system for problems of diagnosis and therapy selection.* The system's current domain of application is infectious disease, and it displays an encouraging level of performance in dealing with cases of bacteremia (bacterial infections of the blood) and meningitis. In this paper we examine three of the programs developed by the group (Figure 1). Mycin [19] is the "performance program"; its task is to supply consultative advice to a physician seeking assistance on a difficult case. Teiresias [3] explored the problem of knowledge acquisition and functions as the link between an expert and the performance program, to allow the expert to educate the program. Guidon [2] explored issues of computer-aided instruction and functions as a tutor to a student who wants to learn about the task.

Fig. 1.  Overview of the System

One interesting point in the architecture is that Teiresias helps to build a base of knowledge which Mycin then uses to solve problems, and which Guidon can teach to the student. The same knowledge base is thus used in three different ways by the three programs. Each program has also been designed to deal with a knowledge base of a certain structure (described below), but none of the programs is designed around knowledge of a particular field. Thus the system above can build, use and teach a base of knowledge about medicine, about auto repair, etc. We discuss below the range and limitation of this domain independence and explore the constraints which arise due to the need to structure the knowledge in a particular form. Because medicine has been the system's original and to date most extensive application, this paper describes it from that perspective.

Mycin

Mycin's fundamental task is to act as a consultant, aiding in determining the identity and significance of organisms causing an infection, and selecting the appropriate drug(s) (if any) for treatment. A typical clinical situation begins with a patient showing signs of infection, and a specimen (of blood, urine, etc.) is obtained and cultured to check for the presence of disease-causing bacteria. While cultures may show some evidence of bacterial growth within twelve hours, typically 24 to 48 hours are required for positive identification of the organisms. Treatment often cannot be delayed that long, so the physician must base his decision on whatever information is available. This typically includes several easily observable characteristics of the bacteria in the culture (e.g., overall shape, response to oxygen, etc.), as well as the past history of the patient (e.g., previous infections, other clinical evidence of infection, or events that may make the patient particularly susceptible to a particular type of bacterium).

Two fundamental characteristics of this information are central to the view of Mycin as a system for medical decision making: the information is both incomplete and inexact. Incompleteness may arise from time constraints (as is the case with organism identity), gaps in the patient's medical records, or simply because it is impractical to administer an exhaustive series of tests before initiating treatment. Inexactness is inherent in the domain because many test results are qualitative (rather than quantitative), and because important factors are often based on subjective impressions (e.g., "has the patient responded to previous therapy?").

A further complication arises from the fact that this inexactness affects not only the data on which decisions are based, but the decision criteria themselves. In common with other domains, there are relatively few statements that can be made with absolute certainty. Instead, the system must have some means of representing the fact that "X and Y seem to suggest Z," or "A and B tend to rule out C." Further sections below will make clear how the system deals with both the incompleteness of information and the two forms of inexactness.

Suitability of the Domain

There are a number of reasons why infectious disease diagnosis and therapy was chosen as an appropriate domain in which to develop the system. First, there is a constant need in the hospital for consultative advice of this sort. Infections often develop as secondary effects of other events (e.g., surgery, burns, wounds), so the physician caring for the patient may not be an expert in infectious disease.

Second, as numerous recent studies have shown, there is a significant problem of antibiotic misuse. In a recent year, almost one out of every four people in the country was given penicillin, yet almost 90% of those prescriptions were unnecessary [11]. One study [16] examined the use of a common antibiotic (chloramphenicol) in 992 cases, and concluded that virtually all of the drug was prescribed inappropriately. The problem arises from a range of factors, including patient pressure for therapy when none may be necessary, the temptation on the part of the doctor to use a broad spectrum drug in lieu of a more precise diagnosis [14], and the "antibiotic revolution" that started with the discovery of the first natural antibiotics and has led to a bewildering array of ever more new drugs.

Finally, antimicrobial therapy appears to be an especially suitable domain because the components of the decision making process are more readily definable than in many other areas of medicine, and the consequences of the physician's decision can usually be assessed in terms of the direct therapeutic action. One of the most difficult tasks in developing the Mycin program has been the elucidation, testing, and validation of the previously informal decision criteria used by physicians. This task would have been far more difficult in a domain where the results of therapeutic actions were less easily determined.

Previous Approaches to Medical Decision Making

It will be useful to take a brief look at some of the other approaches that have been taken to the problem of medical decision making, to motivate some of the features and capabilities of the Mycin program. Three other approaches have received extensive attention in the literature:

  1. Decision trees--as in [13], in which a sequence of decisions is structured in the form of a tree. Each node represents a particular question, and the answer determines which branch of the tree to follow to get to the next question. Final results are obtained by descending all the way to a leaf of the tree.
  2. Bayesian techniques--as in [23], in which extensive frequency data make it possible to use Bayes' theorem as a basis for diagnosis.
  3. Decision analysis and utility theory--as in [8], in which there is associated with each piece of information a likely cost of obtaining it, and a measure of the benefit to be derived from having it. Information is requested until the projected cost of asking another question (perhaps requiring another lab test or operative procedure) outweighs the benefit to be obtained (presumably a more precise diagnosis).

Each of these has a number of attractive aspects, but also encounters some limitations which provided the motivation for our work on a rule-based system. Decision trees, for example, offer simple, readily understandable procedures for diagnosing specific ailments. Problems occur, however, if they encounter unexpected data or if test results are unavailable. The representation of knowledge they offer can be somewhat inflexible as well, since the attempt to make changes deep down in the tree often requires consideration of all previous decisions made further up the tree.

The Bayesian technique offers an appealing generality and precision, since it is a domain independent technique based on exact principles. Limitations here arise from the need for extensive amounts of frequency data concerning a priori and conditional probabilities. Where these data exist, the technique can be used quite effectively, but such figures are often unavailable [7].
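For reference, the theorem in question can be written as follows for a set of mutually exclusive and exhaustive diseases D_1, ..., D_n and an observed set of symptoms S. The symbols are ours, chosen for illustration, and are not taken from [23].

```latex
P(D_i \mid S) \;=\; \frac{P(S \mid D_i)\, P(D_i)}{\sum_{j=1}^{n} P(S \mid D_j)\, P(D_j)}
```

The a priori probabilities P(D_j) and the conditional probabilities P(S | D_j) on the right-hand side are the frequency data whose scarcity limits the technique.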

Techniques based on utility theory can present a well-motivated sequence of questions that appears to "zero in" on the underlying ailment. Like the Bayesian approach, however, they require extensive data on conditional probabilities of symptoms and disease.

Since none of these is intended to be a model of the reasoning process typically employed by clinicians, it can at times prove difficult for a clinician to discover the basis for the conclusions drawn by any of them. While they each present a compact encoding of knowledge that can provide an appealing efficiency to programs based on them, there is an unavoidable loss of comprehensibility to the physician using them. Reasoning which requires several distinct inferential steps by a clinician, for instance, might be expressed in a single value of a conditional probability in the Bayesian method.

Two additional techniques have received some attention lately. One effort [15] is aimed at creating a system with a broad range of diagnostic power, similar to that required of a primary care physician. The approach is based on assembling a large and carefully constructed hierarchy of diseases with their associated symptoms, and uses a technique similar in some respects to the Bayesian method, with physicians' estimates substituted for formal probabilities. This approach is well suited to problems that require dealing with a wide range of disease categories with multiple levels of hierarchy, but is less useful for "specialist" programs like ours, which attempt to deal with a single disease category in extensive detail.

A second technique centers around the use of sophisticated models of physiological processes, as in e.g., [20] and [12]. Where the system involved is sufficiently well-understood and isolatable (e.g., the glaucoma model in [12]), this can be a powerful approach. But both of these requirements are lacking in our domain, since infectious disease diagnosis and therapy selection involves a number of different phenomena, many of which are only very imperfectly understood.

Design Goals

The limitations encountered in other approaches, along with our own estimation of the capabilities required for a useful medical decision making system, provided a number of design goals for Mycin.

The most fundamental of these, of course, is consistently high performance, and this in turn has a number of implications. It means, for instance, the ability to deal with a large and constantly changing body of technical knowledge. Large amounts of task specific knowledge seem to be required for high performance, and it is not often possible to specify this knowledge in one, or even a small number of attempts. We rely instead on what may be called an incremental approach to competence. This in turn means that the system's collection of medical knowledge (its "knowledge base") is subject to significant changes over time. Each modification must therefore be a reasonable task, or the program will soon begin to stagnate. A flexible knowledge base also means that the system is inherently dynamic in character. It will be easy to modify it to take into account regional variations in practice, new results which arise from progress in medical research, or changes in drug resistance patterns.

The second goal is the ability to handle an interactive dialog. The system should not be a "black box," printing a collection of orders for the user to follow, but should instead be capable of supplying coherent explanations of its results. (This was perhaps the major motivation for the selection of a symbolic reasoning paradigm, rather than one which, for example, relies on statistics.) Giving the program the ability to explain its results offers a number of useful features. It can lead to greater acceptance by the user population, since the system's conclusions need not be accepted blindly. Instead the user can examine and determine the basis for each of them. An explanation facility also makes it possible for an expert in the field to check the validity of the system's reasoning process, and (as we will see), this can be a significant aid in improving the system's performance. Finally, such explanations can also have an educational influence for the user who does not have extensive experience in the domain.

The desire to provide interactive dialogs means that the system will require extensive human engineering features designed to make interaction simple for someone unaccustomed to computers. Examples below will demonstrate that this has motivated a number of features in the system.

Finally, the system has to be able to handle the problem noted earlier, of dealing with both incomplete and inexact information. It must be at least flexible enough to continue to function despite the lack of any particular piece (or pieces) of information, and at best, it might (like people) be able to use the very lack of information as a diagnostic clue.

System Design and Operation

Figures 2 and 3 show the initial and final parts of a sample dialog with Mycin. User type-in is in capitals and follows the double asterisk (**); comments in braces are for explanatory purposes and are not part of the actual dialog. A great deal of attention has been paid to making the interaction easy and natural. Questions have been carefully worded, for example, so as to set up expectations of responses without making them as uninteresting as multiple choice questions. Currently the dialog is directed primarily by the system: patient information can be entered only by answering each question in turn.

--------PATIENT-1--------
1)  Patient's name: (first-last)
**  FRED SMITH
2)  Sex:
**  MALE
3)  Age:
** 55
4)  Have you been able to obtain positive cultures from a site 
    at which Fred Smith has an infection?
**  YES
  --------INFECTION-1--------
  5)  What is the infection?
  **  PRIMARY-BACTEREMIA
  6)  Please give the date and approximate time when signs or symptoms
      of the primary-bacteremia (INFECTION-1) first appeared. (mo/da/yr)
  **  MAY 5, 1975
  The most recent positive culture associated with the
  primary-bacteremia (INFECTION-1) will be referred to as:
    --------CULTURE-1--------
    7)  From what site was the specimen for CULTURE-1 taken?
    ** BLODD
    = BLOOD
    8)  Please give the date and time when this blood culture
        (CULTURE-1) was obtained. (mo/da/yr)
    ** MAY 9, 1975
    The first significant organism from this blood culture
    (CULTURE-1) will be referred to as:
      --------ORGANISM-1--------
      9)  Enter the identity of ORGANISM-1:
      **  UNKNOWN
             {typically the identity is not known and must be inferred}
      10)  Is ORGANISM-1 a rod or coccus (etc.):
      **  ROD
      11)  The gram stain of ORGANISM-1:
      **  GRAMNEG
      12)  Did ORGANISM-1 grow aerobically?
      **  YES (.8)
      13)  Is the patient's illness with ORGANISM-1 a hospital-acquired infection?
      **  ?
      Was the infection with ORGANISM-1 acquired while the patient was hospitalized?
      Expected responses are: YES NO
      Enter HELP for list of user options.
      **  Y
                           {more questions follow, to gather sufficient information
                            to infer possible identities of the infecting organisms}

Fig. 2.  Initial part of a consultation.

My therapy recommendation will be based on the following probable 
  infection(s) and potential causative organism(s):
INFECTION-1 is PRIMARY-BACTEREMIA                    {possible identities}
<Item 1> The identity of ORGANISM-1 may be PSEUDOMONAS-AERUGINOSA
<Item 2> The identity of ORGANISM-1 may be KLEBSIELLA-PNEUMONIAE
<Item 3> The identity of ORGANISM-1 may be E.COLI
<Item 4> The identity of ORGANISM-1 may be BACTEROIDES-FRAGILIS
<Item 5> The identity of ORGANISM-1 may be ENTEROBACTER
<Item 6> The identity of ORGANISM-1 may be PROTEUS-NON-MIRABILIS
  46)  On a scale of 0 to 4, where higher numbers indicate increasing
       severity how would you rate Fred Smith's degree of sickness?
  **  3
  47)  Does Fred Smith have a clinically significant allergic
       reaction to any antimicrobial agent?
  **  NO
  48)  Fred Smith's weight in kilograms (or <number> POUNDS):
  **  70
                       {note that 2 drugs cover for 6 potential organisms}
[Rec 1] My preferred therapy recommendation is as follows:
  In order to cover for Items <1 2 3 5 6>:
    Give:  GENTAMICIN
    Dose:  119 mg (1.7 mg/kg) q8h IV [or IM] for 10 days
    Comments:  Modify dose in renal failure
  In order to cover for Item <4>:
    Give:  CLINDAMYCIN
    Dose:  595 mg (8.5 mg/kg) q6h IV [or IM] for 14 days
    Comments:  If diarrhea or other GI symptoms develop, patient 
      should be evaluated for possible pseudomembranous colitis.

Do you wish to see the next choice therapy?
**  Y                                  {alternative choices are available}
Which, if any, of the above drugs do you wish eliminated from 
  consideration?
**  CLINDAMYCIN
[Rec 2] Next best therapy recommendation:
  In order to cover for Items <2 3 4 5 6>:
    Give:  CHLORAMPHENICOL
    Dose:  1 gm (15 mg/kg) q6h IV for 14 days
    Comments:  Monitor patient's white count 
  In order to cover for Item <1>:
    Give:  GENTAMICIN
    Dose:  119 mg (1.7 mg/kg) q8h IV [or IM] for 10 days
    Comments:  Modify dose in renal failure

Fig. 3.  Final segment of a consultation.

There are several options available to make the interaction more informative. The user can, for instance, ask for a rephrasing of the question or a display of some (or all) of the acceptable responses (as in question 13). If a requested item of information is unavailable, he can respond with UNKNOWN (U, or UNK for short). The user can also modify any answer of which he is unsure by attaching a number between 0 and 1, indicating a degree of certainty (as, for example, in question 12). The system also has the ability to correct minor typing mistakes (as in question 7), to ensure that unfamiliarity with computer terminals or lack of typing ability does not present a problem.
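The paper does not say how these response conventions are implemented. The following is a minimal sketch, assuming a closest-match lookup (Python's difflib) for spelling correction and a unique-prefix rule for abbreviations such as Y; the function name parse_answer and the similarity cutoff are hypothetical.

```python
import difflib
import re

def parse_answer(raw, expected):
    """Return (value, certainty) for one typed response.

    expected is the list of acceptable responses in upper case.
    A value of None means the response was not understood, so the
    caller should rephrase the question or list the expected responses.
    """
    text = raw.strip().upper()
    certainty = 1.0
    # A trailing parenthesized number, as in "YES (.8)", qualifies the answer.
    match = re.match(r"^(.*?)\s*\(\s*(\d*\.?\d+)\s*\)$", text)
    if match:
        text, certainty = match.group(1), float(match.group(2))
    if text in ("U", "UNK", "UNKNOWN"):
        return "UNKNOWN", certainty
    if text in expected:
        return text, certainty
    # Assumption: an abbreviation is accepted when it is a unique prefix.
    prefixed = [e for e in expected if e.startswith(text)]
    if text and len(prefixed) == 1:
        return prefixed[0], certainty
    # Assumption: minor typing mistakes are repaired by closest match.
    close = difflib.get_close_matches(text, expected, n=1, cutoff=0.6)
    if close:
        return close[0], certainty
    return None, certainty

print(parse_answer("BLODD", ["BLOOD", "URINE", "SPUTUM"]))  # ('BLOOD', 1.0)
print(parse_answer("YES (.8)", ["YES", "NO"]))              # ('YES', 0.8)
```

A response such as "?" falls through to None, which is the point at which a system like this would print the rephrased question and the expected responses.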

The Rules and Judgmental Knowledge

The primary source of medical knowledge in the system is a set of some 400 decision rules like the one shown in Figure 4, each with a premise and an action.

If 1) the gram stain of the organism is gram negative, and 
   2) the morphology of the organism is rod, and 
   3) the aerobicity of the organism is anaerobic, 
Then there is suggestive evidence (.7) that 
   the identity of the organism is Bacteroides.

Fig. 4. Typical medical decision rule.

Many of the system's unique and important capabilities are made possible by encoding knowledge in rules like the one above. Such rules form modular "chunks" of knowledge about the domain, represented in a form that is comprehensible to a clinician.

The consultation system uses the collection of rules to make conclusions about the patient. If, for instance, it is attempting to determine the identity of an organism responsible for a particular infection, it retrieves the entire list of rules which, like the one in Fig. 4, conclude about identity. It then attempts to ascertain whether the conclusion of the first rule is valid, by evaluating in turn each of the clauses of the premise. Thus, for the example rule shown, the first thing to find out is the organism's gram stain. If this information is already available in the data base, the program retrieves it. If not, determination of gram stain becomes the objective of a new rule. The program retrieves all rules which conclude about it and tries to use each of them to obtain the value of gram stain. If, after trying all the relevant rules, the answer still has not been discovered, the program asks the user for the relevant information which will permit it to establish the validity of the premise clause. Thus, the rules "unwind" to produce a succession of goals, and it is the attempt to achieve each goal that drives the consultation.
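The goal-directed control structure just described can be sketched in a few lines. This is an illustration under simplifying assumptions, not Mycin's actual interpreter: the names Rule and find_out are hypothetical, a premise clause is taken to hold whenever the wanted value has been concluded at all, several rules concluding the same value are merged by taking the larger certainty, and circular rule dependencies are not guarded against.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    premise: list     # clauses, each an (attribute, wanted_value) pair
    attribute: str    # attribute the action concludes about
    value: str        # value concluded for that attribute
    cf: float         # certainty attached to the conclusion

def find_out(attribute, rules, facts, ask):
    """Return {value: certainty} for attribute, deriving it if necessary."""
    if attribute in facts:                 # already in the data base
        return facts[attribute]
    found = {}
    for rule in rules:                     # every rule that concludes about it
        if rule.attribute != attribute:
            continue
        # Evaluating a clause may set up a new goal (the recursive call).
        # all() stops at the first failing clause, so later clauses of a
        # failed rule never trigger questions.
        if all(wanted in find_out(attr, rules, facts, ask)
               for attr, wanted in rule.premise):
            found[rule.value] = max(found.get(rule.value, 0.0), rule.cf)
    if not found:                          # rules did not help: ask the user
        found = {ask(attribute): 1.0}
    facts[attribute] = found
    return found

# The rule of Fig. 4, with answers supplied as if typed by the user.
rules = [Rule([("gram stain", "gramneg"), ("morphology", "rod"),
               ("aerobicity", "anaerobic")], "identity", "bacteroides", 0.7)]
answers = {"gram stain": "gramneg", "morphology": "rod",
           "aerobicity": "anaerobic"}
print(find_out("identity", rules, {}, answers.__getitem__))
# {'bacteroides': 0.7}
```

Because no rule in this tiny knowledge base concludes about gram stain, morphology, or aerobicity, each of those goals ends in a question to the user, which is how the unwinding of goals comes to drive the dialog.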

We noted earlier the necessity for dealing with inexact decision criteria, and the number in the conclusion of each rule (called the "certainty factor," or CF) is the mechanism used to capture this inexactness. For the rule in Figure 4, the evidence cited in the premise is strongly indicative (.7 out of 1) of the conclusion, but cannot justify it with absolute certainty.

The methods for combining CFs are embodied in a model of what we might call "approximate implication." While CFs are derived from and are related to probabilities, they are distinctly different (for a detailed review of the concept, see [18]). Evidence confirming an hypothesis is collected separately from that which disconfirms it, and the truth of the hypothesis at any time is the algebraic sum of the current evidence for and against it. This is an important aspect of the truth model, since it makes plausible the simultaneous existence of evidence in favor of and against the same hypothesis. We believe this is an important characteristic of any model of inexact reasoning.
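A minimal sketch of this bookkeeping follows. The separate tallies and the algebraic sum come from the description above; the incremental update (each new piece of evidence closes a fraction of the remaining gap to 1) is the combining function of the certainty factor model as it is usually presented, is not spelled out in this paper, and should be read as an assumption. The class name and the three evidence strengths are hypothetical.

```python
class Hypothesis:
    """Keeps confirming and disconfirming evidence in separate tallies."""

    def __init__(self):
        self.belief = 0.0      # accumulated confirming evidence, 0..1
        self.disbelief = 0.0   # accumulated disconfirming evidence, 0..1

    def add_evidence(self, cf):
        """cf > 0 confirms, cf < 0 disconfirms; its magnitude is at most 1."""
        if cf >= 0:
            # Assumed update: move part of the way from the tally toward 1.
            self.belief += cf * (1.0 - self.belief)
        else:
            self.disbelief += -cf * (1.0 - self.disbelief)

    @property
    def cf(self):
        # Algebraic sum of the evidence for and against the hypothesis.
        return self.belief - self.disbelief

h = Hypothesis()
for strength in (0.4, 0.5, -0.2):   # hypothetical rule conclusions
    h.add_evidence(strength)
print(round(h.belief, 2), round(h.disbelief, 2), round(h.cf, 2))
# 0.7 0.2 0.5
```

Under this update no amount of partial confirming evidence pushes the belief tally past 1, and contrary evidence is recorded rather than overwriting what has already been collected.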

Question Answering

The use of a rule-based representation of knowledge makes it possible for Mycin to explain the basis for its clinical recommendations. For example, if the clinician asks "How did you decide that the organism might be an e.coli?" the program answers by displaying the rules which were actually used, and explaining, if requested, how each of the premises of the rules was established. (Several examples are shown below; additional examples can be found in [19] and [17].) This is something which the clinician can readily understand, and it provides a far more comprehensible and acceptable explanation than would be possible if the program were to use a simple statistical approach to diagnosis.

Note that Mycin is capable of answering questions concerning actions the program failed to take (as in question [d] below). For questions like [e] below, the search through the knowledge base is directed by a simple parsing of the question into a request for a set of rules, with constraints on premise and/or action contents. The retrieval of relevant rules is guided primarily by pre-established (but automatically generated) lists which indicate premise and action contents.

(a) ** HOW DID YOU DECIDE THAT ORGANISM-1 MIGHT BE AN E.COLI?
    I used RULE021 to conclude that the identity of ORGANISM-1 is
      e.coli. This gave a cumulative CF of (.47).
    Then I used RULE084 to conclude that the identity of ORGANISM-1
      is e.coli. This gave a cumulative CF of (.55).
    Finally, I used RULE003 to conclude that the identity of
      ORGANISM-1 is e.coli. This gave a cumulative CF of (.74).
(b) ** DID YOU USE RULE 163 TO FIND OUT ANYTHING ABOUT ORGANISM-1?
    RULE163 was tried in the context of ORGANISM-1, but it failed
      because it is not true that the patient has had a
      genito-urinary tract manipulative procedure (clause 3).
(c) ** DID RULE 116 TELL YOU ANYTHING ABOUT INFECTION-1?
    No. RULE116 would be relevant to the primary-bacteremia when
      applied to ORGANISM-1.
    But clause 2 ("the gram stain of the organism is not known")
      was already known to be false for ORGANISM-1, so the rule
      was never tried.
(d) ** WHY DIDN'T YOU CONSIDER STREPTOCOCCUS AS A POSSIBILITY FOR ORGANISM-1?
    The following rule could have been used to determine that the
      identity of ORGANISM-1 was streptococcus: RULE033
    But clause 2 ("the morphology of the organism is coccus")
      was already known to be false for ORGANISM-1, so the rule
      was never tried.
(e) ** DO YOU RECOMMEND CLINDAMYCIN FOR BACTEROIDES?
    Yes.
    RULE060
    -------
    If the identity of the organism is bacteroides
    Then I recommend therapy be chosen from among the following:
      CHLORAMPHENICOL, CLINDAMYCIN, RIFAMPIN, ERYTHROMYCIN,
      CARBENICILLIN, TETRACYCLINE, LINCOMYCIN
(f) ** IS BLOOD A STERILE SITE?
    Yes, blood is one of those sites that are normally sterile.
(g) ** WHAT ARE THE ORGANISMS ASSOCIATED WITH THE GASTRO-INTESTINAL TRACT?
    The likely pathogens associated with the gastro-intestinal
      tract are  ENTEROCOCCUS, CLOSTRIDIUM-GANGRENE, BACTEROIDES,
      KLEBSIELLA, PSEUDOMONAS, E.COLI, ENTEROBACTER, PROTEUS.

Fig. 5.  Mycin's question-answering capability.
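Retrieval of rules for a question like [e] can be sketched as follows. This is an illustration only: the dictionary layout and the function names are hypothetical, the parsing of the English question into constraints is omitted, RULE060 is abbreviated from Fig. 5, and RULE_X is an invented second rule included so that the constraints have something to exclude.

```python
from collections import defaultdict

def build_indexes(rules):
    """Generate the premise and action lists automatically from the rules."""
    by_premise = defaultdict(set)
    by_action = defaultdict(set)
    for name, rule in rules.items():
        for clause in rule["premise"]:
            by_premise[clause].add(name)
        for clause in rule["action"]:
            by_action[clause].add(name)
    return by_premise, by_action

def retrieve(by_premise, by_action, premise=(), action=()):
    """Names of rules whose premise and action mention every given clause."""
    candidates = [by_premise.get(c, set()) for c in premise]
    candidates += [by_action.get(c, set()) for c in action]
    return sorted(set.intersection(*candidates)) if candidates else []

rules = {
    "RULE060": {"premise": [("identity", "bacteroides")],
                "action": [("therapy", "chloramphenicol"),
                           ("therapy", "clindamycin")]},
    "RULE_X":  {"premise": [("identity", "e.coli")],      # hypothetical rule
                "action": [("therapy", "gentamicin")]},
}
by_premise, by_action = build_indexes(rules)

# "Do you recommend clindamycin for bacteroides?" reduces to two constraints.
print(retrieve(by_premise, by_action,
               premise=[("identity", "bacteroides")],
               action=[("therapy", "clindamycin")]))
# ['RULE060']
```

Because the lists are derived from the rules themselves, they can be regenerated whenever the knowledge base changes, which is one plausible reading of "pre-established (but automatically generated)."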

Teiresias

Teiresias is designed to function as a bridge between the expert and Mycin (Figure 6), or more generally, between the expert and the program he is trying to "educate."

Fig. 6.  TEIRESIAS as a link.

This gives the physician (who presumably knows little or no programming) two important capabilities. He can (i) interact with Mycin to find out why it is performing as it is (information flow from right to left, explanation), and (ii) "educate" Mycin by editing or adding to its knowledge base (information flow from left to right, knowledge acquisition).

Explanations

Normally during a consultation the system takes the initiative, asking questions of the user. If one of those questions seems inappropriate, however, the user can interrupt and use the explanation facilities provided by Teiresias to find out what's going on. By examining the chain of reasoning that prompted the question, he can find out if the reasoning was motivated by plausible considerations. Several examples are shown below, in a segment extracted from a consultation (additional examples can be found in [4]).

To determine the motivation for any question, the user can ask why it was asked, and can follow the reasoning chain as far as he likes (Figure 7).

18) In this series, how many of the cultures were 
     positive for the e.coli (ORGANISM-1)?          {typical Mycin prompt}
** WHY { request from user to determine motivation for question} 
[i.e. WHY is it important to determine the number of cultures 
   which were positive for the organism?] 
                  {Teiresias's rephrasing of the WHY question, and answer}
[1.0] This will help to determine whether there is significant 
      disease associated with this occurrence of the organism. 
   It has already been established that 
      [1.1] the culture was taken from a sterile source 
      [1.2] the number of cultures in the series is greater than 2 
   Therefore, if 
      [1.3] the number of cultures which were positive for 
            ORGANISM-1 is greater than 2 
   Then there is suggestive evidence (.5) that there is significant 
      disease associated with this occurrence of the e.coli (ORGANISM-1). 
   [RULE049] 
   
** WHY       {repeated WHY to determine relevance of previous explanation}
[i.e. WHY is it important to determine if there is significant
   disease associated with this occurrence of the organism?] 
[2.0] This will help to determine a diagnosis for which the 
      infection should be treated. 
   It has already been established that 
      [2.1] the infection is not one of: primary-bacteremia, meningitis 
      [2.2] for some current culture of the patient it is true 
            that the site of the culture is blood 
   Therefore, if 
      [2.3] there is significant disease associated with this 
            occurrence of the organism 
   Then it is definite (1.0) that the diagnosis for which the 
     infection should be treated is secondary-bacteremia 
   [RULE103]

Fig. 7. Determining the motivation for a question.

He can also determine the basis for any of the conclusions the system has drawn by inquiring how the conclusion was made (Figure 8).

** HOW 1.1
   { the 1.1 refers to the information in the line labeled [1.1] in Figure 7 }
[i.e. HOW was it determined that the culture was taken from a sterile source?]
[3.1] RULE001 was used to conclude that there is suggestive evidence (.7) 
      that the culture was taken from a sterile source.
[3.2] RULE022 was used to conclude that there is strongly
      suggestive evidence (.7) that the culture was taken 
      from a sterile source.
** HOW 3.1    {another request from the user}
[i.e. HOW was RULE001 used?]
It has already been established that
   [4.1] the site of the culture is one of: those sites from
         which the sterility of cultures depends upon the method of collection
   [4.2] the method of collection of the culture is one of: the recognized 
         methods of collection associated with the site of the culture, and
   [4.3] it is not known whether care was taken in collecting the culture
Therefore
   there is strongly suggestive evidence (.8) that the culture was taken 
   from a sterile source
   [RULE022]

Fig. 8.  Determining the basis for a conclusion.
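The WHY and HOW commands of Figures 7 and 8 can be pictured as two lookups on a record kept while rules are being tried. The sketch below is an assumption about one way to keep such a record, not a description of Teiresias's internals; the class and method names are hypothetical, while the rule names and certainties are taken from the figures.

```python
class Trace:
    """Record of the goals being pursued and the conclusions reached so far."""

    def __init__(self):
        self.goal_stack = []    # (goal, rule being tried), innermost last
        self.conclusions = {}   # fact -> list of (rule name, certainty)

    def push(self, goal, rule):
        self.goal_stack.append((goal, rule))

    def record(self, fact, rule, cf):
        self.conclusions.setdefault(fact, []).append((rule, cf))

    def why(self, depth=1):
        """Each repeated WHY moves one step further up the chain of goals."""
        if depth > len(self.goal_stack):
            return None             # already at the top-level goal
        return self.goal_stack[-depth]

    def how(self, fact):
        """Rules that were used to conclude the given fact."""
        return self.conclusions.get(fact, [])

trace = Trace()
trace.push("diagnosis for which the infection should be treated", "RULE103")
trace.push("significant disease associated with the organism", "RULE049")
trace.record("culture was taken from a sterile source", "RULE001", 0.7)
trace.record("culture was taken from a sterile source", "RULE022", 0.7)

print(trace.why(1)[1])   # RULE049
print(trace.why(2)[1])   # RULE103
print(trace.how("culture was taken from a sterile source"))
# [('RULE001', 0.7), ('RULE022', 0.7)]
```

Rendering the stored goal and rule back into English, as in the bracketed rephrasings of the figures, is a separate step that this sketch leaves out.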

Teiresias's fundamental approach to explanation is thus to display some recap of Mycin's internal actions, a trace of its reasoning. The success of this technique is predicated on the claim that Mycin's basic approach to the problem is sufficiently intuitive that a summary of those actions is at least a reasonable basis from which to start. While it would be difficult to prove the claim in any formal sense, there are several factors which suggest its plausibility.

First, we are dealing with a domain in which deduction, and deduction in the face of uncertainty, is a primary task. The use of production rules in an if/then format seems therefore to be a natural way of expressing things about the domain, and the display of such rules should be comprehensible. Second, the use of such rules in a backward chaining mode is, we claim, a reasonably intuitive scheme. Modus ponens is a well-understood and widely (if not explicitly) used mode of inference. Thus, the general form of the representation and the way it is employed should not be unfamiliar to the average user. More specifically, however, consider the source of the rules. They have been given to us by human experts who were attempting to formalize their own knowledge of the domain. As such, they embody accepted patterns of human reasoning, implying that they should be relatively easy to understand, especially for those familiar with the domain. As such, they will also attack the problem at what has been judged an appropriate level of detail. That is, they will embody the right size of "chunks" of the problem to be comprehensible.

We are not, therefore, recapping the binary bit level operations of the machine instructions for an obscure piece of code. We claim instead to be working with primitives and a methodology whose (a) substance, (b) level of detail, and (c) mechanism are all well suited to the domain, and to human comprehension, precisely because they were provided by human experts. This approach seems to provide what may plausibly be an understandable explanation of system behavior.

By way of contrast, we might try to imagine how a program based on a statistical approach might attempt to explain itself. Such systems can, for instance, display a disease which has been deduced and a list of relevant symptoms, with prior and posterior probabilities. No more informative detail is available, however. When the list of symptoms is long, it may not be clear how each of them (or some combination of them) contributed to the conclusion. It is more difficult to imagine what sort of explanation could be provided if the program were interrupted with interim queries while in the process of computing probabilities. The problem, of course, is that statistical methods are not good models of the actual reasoning process (as shown in the psychological experiments of [7] and [21]), nor were they designed to be. While they are operationally effective when extensive data concerning disease incidence are available, they are also, for the most part, "shallow," one-step techniques which capture little of the ongoing process actually used by expert problem solvers in the domain.*

The presence of even the current basic explanation capabilities is extremely useful, as they have begun to pass the most fundamental test: it has become easier to ask Teiresias what Mycin did than to trace through the code by hand. The continued development and generalization of these capabilities is one focus of present research.

Knowledge Acquisition

Since the field of infectious disease therapy is both large and constantly changing, it was apparent from the outset that we would have to deal with an evolving knowledge base. The size of the domain makes writing a complete set of rules in a single try an impossible task, so the system was designed to facilitate an incremental approach to competence. In addition, new research in the domain produces new results and modifications of old principles, so that a broad range of knowledge-base management capabilities is clearly necessary. The approach to knowledge acquisition used in Teiresias is modeled after a standard tutorial in which a student is given a difficult problem to solve, while the teacher observes and occasionally corrects the student's performance. An underlying principle is that the physician's task of formulating rules will be easier if set in the context of a specific shortcoming in the program's knowledge base. That is, some error on the system's part will become apparent during the consultation, perhaps through an incorrect organism identification or therapy selection. Tracking down this error by tracing back through the program's actions is a reasonably straightforward process which presents the expert with a methodical and complete review of the system's reasoning. He is obligated to either approve of each step or to correct it. This means that the expert is faced with a sharply focused task of adding a chunk of knowledge to remedy a specific bug. This makes it far easier for him to formalize his knowledge than would be the case if he were asked, for example, "tell me about bacteremia."

Since knowledge acquisition dialogs are somewhat extended, they are not included here for the sake of brevity. (The interested reader may consult [5] for a detailed review and discussion.) In addition to the basic idea of knowledge acquisition in context, Teiresias relies on the notion of model-based understanding in interpreting the physician's new rule. This is implemented using a mechanism called "rule models," which gives Teiresias a prediction of the content of the information it can expect to receive. It uses this both to help interpret the new rule and as a basis for suggesting possible revisions to the rule.
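One way to picture how a rule model could supply such a prediction is sketched below. This is only an illustration under stated assumptions, not the mechanism described in [5]: here a "model" is simply the set of attributes that appear in the premises of most existing rules reaching the same kind of conclusion, and a new rule is compared against it. The dictionary layout, the function names, the 0.75 threshold, and the toy rules are all hypothetical.

```python
from collections import Counter


def build_rule_model(rules, concluded_attr, threshold=0.75):
    """Attributes found in the premises of at least `threshold` of the
    existing rules that conclude about `concluded_attr`."""
    group = [r for r in rules if r["concludes"] == concluded_attr]
    if not group:
        return set()
    counts = Counter(a for r in group for a in set(r["premise_attrs"]))
    return {a for a, n in counts.items() if n / len(group) >= threshold}


def suggest_missing(new_rule, rules, threshold=0.75):
    """Attributes the model expects but the new rule does not mention."""
    model = build_rule_model(rules, new_rule["concludes"], threshold)
    return sorted(model - set(new_rule["premise_attrs"]))


# Hypothetical knowledge base: four rules concluding about "category".
EXISTING = [
    {"concludes": "category", "premise_attrs": ["site", "infection"]},
    {"concludes": "category", "premise_attrs": ["site", "infection", "stain"]},
    {"concludes": "category", "premise_attrs": ["site", "infection"]},
    {"concludes": "category", "premise_attrs": ["site", "morphology"]},
]

# A new rule from the expert that mentions only the culture site.
NEW_RULE = {"concludes": "category", "premise_attrs": ["site"]}
```

With these toy data, `suggest_missing(NEW_RULE, EXISTING)` flags `infection` as something rules of this kind usually mention, which could be offered to the expert as a possible revision rather than imposed.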

Guidon

Guidon is designed to function as a bridge between the performance program and a student seeking instruction (Figure 9).

Fig. 9. Guidon as a link between performance program and subject.

Guidon's basic approach is to present the student with a case, supply additional information about the case when requested, and, by keeping track of the information requested, infer the problem solving approach being used by the student. If this approach appears inappropriate, Guidon may interrupt to redirect the student's attention or perhaps teach him an appropriate rule from the knowledge base.
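As a rough illustration of inferring an approach from the information requested, the sketch below scores each rule by how much of its premise the student has already asked about. It is an assumption-laden simplification, not Guidon's actual student model (see [2] for that): the rule names, the attribute names, and the `rank_candidate_rules` function are invented for this example.

```python
def rank_candidate_rules(rules, requested):
    """Rank rules by the fraction of their premise attributes that the
    student has already requested; ties are broken by rule name."""
    scored = []
    for name, premise_attrs in rules.items():
        covered = len(set(premise_attrs) & requested)
        scored.append((covered / len(premise_attrs), name))
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical rules, each listed with the attributes its premise needs.
TUTOR_RULES = {
    "RULE_A": ["csf-glucose", "csf-protein"],
    "RULE_B": ["head-injury"],
    "RULE_C": ["csf-wbc", "csf-pmns", "csf-protein"],
}

# Data the student has asked for so far in the session.
REQUESTED = {"csf-glucose", "csf-protein"}

ranking = rank_candidate_rules(TUTOR_RULES, REQUESTED)
```

Here `RULE_A` ranks first because everything it needs has been requested, while `RULE_B` ranks last. A tutor built along these lines might treat a top-ranked rule as the one the student is probably pursuing and a consistently ignored rule as a candidate for a hint.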

To give some idea about the nature of the tutoring problem and what is difficult about it, we consider several brief dialog segments. Each is intended to illustrate one particular issue which was perceived as central to the problem. (Each is also extracted from a longer dialog; the complete dialog is somewhat more immediately comprehensible than the segments shown here.)

One important issue is that the program should provide assistance in context. For example, the tutor's guidance should be based upon the student's partial solution so far. In general, this is a difficult problem because it requires that the tutor be sensitive to the student's current problem-solving strategy, the kind of advice he prefers (a hint? full details?), and be able to specify clearly a problem-solving method that might be applied.

In the example of Figure 10, Guidon provides assistance by bringing up for discussion a rule (rule 507) which is relevant to the evidence discussed so far and which was alluded to in an earlier hint. Since the rule has only a single precondition, and one which can be inferred directly from known evidence (using rule 509), the rule is given directly rather than opened up for discussion using some of the more elaborate techniques Guidon has available (see [2]).

** HINT
Hint: Look for strong, positive evidence that Diplococcus-pneumoniae
   is one of the organisms that might be causing the infection.

...
** HELP
Let's consider whether J. Smith has a head injury.

We already know that the patient has not had an injury to the
  central nervous system; this is evidence that he does not have
  a head injury [RULE509].

We now have strong evidence that Diplococcus-pneumoniae and
  Neisseria-meningitidis are organisms that might be causing the
  infection (considering the age of J. Smith and the fact that
  he does not have a head injury defect [RULE507]).

It remains for us to consider other factors for determining the
  organisms that might be causing the infection.

Fig. 10. Providing assistance in context.

A second principle suggests that the program should provide some feedback that enables the student himself to evaluate his competence. Here the tutor must be able to account for the student's behavior in terms of problem-solving methods and strategies. This may involve interpreting wrong but consistent behavior in terms of misconceptions. In correcting the student, the tutor may need to refer back to past student actions and compare alternative approaches. Providing assistance in a mixed-initiative exchange poses technical problems in dialog management, such as maintaining context while sharing initiative.

In the excerpt in Figure 11, the student's hypothesis for a subproblem is correct, but after analyzing the previous history of interaction with the student, Guidon is not sure that the student really knows why this hypothesis is correct. Therefore, instead of accepting this answer, the program requests relevant factors to support the hypothesis, and probes further to see if the student knows why an alternative hypothesis is discredited.

What is the type of the infection?
** BACTERIAL
What facts about this case tell you that the type of the infection
  is bacterial?
** CEREBRAL-FLUID GLUCOSE VALUE
** PMNS
The percent of PMN's in the CBC is not relevant for determining the
  type of the infection.
** PMNS IN THE CSF
** PROTEIN OF THE CSF
**
You're right about the CSF protein, the percent of PMNs in the CSF
  and the CSF glucose value, however you should have mentioned the
  fact that the meningitis is partially treated, the WBC from the
  CSF and the fact that the simultaneous blood glucose value is not
  known.

Fig. 11. Providing feedback.

A third principle indicates that the program should make tutorial presentations for the purpose of introducing new material to the student. In a knowledge-based tutor, presentations are generated solely from the knowledge base of rules and facts. This requires that the tutor have presentation methods that adapt material to the needs of the dialog. In particular, the tutor has to be sensitive to how a tutorial dialog fits together, what kinds of interruptions and probing are reasonable and expected in this kind of discourse.

The following example (Figure 12) is taken from a tutorial that uses a knowledge base dealing with the problem of pulmonary function analysis. Here Guidon first interrupts the student's data collection to suggest that he make an hypothesis; but the student refuses to do so. The program then observes (a) there is a particular rule (rule 20) that can be applied that is probably known to the student, and (b) the student probably can't apply the rule to this case because he doesn't know how to verify a necessary precondition. Guidon presents the inference that it believes is unknown to the student (rule 28), and then asks him to take this evidence forward.

** TOPIC
We are discussing the degree of reversibility of airway
  obstruction of M. Smith.  Hypotheses you should be considering
  are: none, insignificant, slight, good and significant.
** WHAT IS THE CHANGE IN FORCED EXPIRATORY VOLUME?
The change in forced expiratory volume is 31.
** WHAT IS THE CHANGE IN RESISTANCE?
The change in Resistance is 20.  We have substantially completed
  our discussion of the degree of reversibility of airway
  obstruction in this patient.
Would you like to make a hypothesis now (as opposed to gathering
  more information)?
** NO
We will now consider evidence that leads to a conclusion about the
  degree of reversibility of airway obstruction in this patient.
Here is a conclusion that you may not have considered....
The fact that the change in forced expiratory volume and the change
  in resistance are known is evidence that there are post
  bronchodilation test results [RULE028].
Given the post bronchodilation test results, does this tell you
  that the degree of reversibility of airway obstruction is good?
** YES
No, given this fact, we can conclude that the degree of
  reversibility of airway obstruction is significant [RULE020].
The conclusion in the question holds when the change in forced
  expiratory volume is between 5 and 10.

Fig. 12. Making tutorial presentations.

Finally, the program should give the student freedom to explore Mycin's reasoning. In laying out that reasoning, the program should not act like a passive information retrieval system. In addition to clearly laying out data and inferences, the tutor has to reason about what constitutes reasonable, expected elaboration on the basis of what has been previously discussed.

In the excerpt in Figure 13, Guidon provides details for an inference (rule 148) by offering to support necessary preconditions that were not considered in the dialog up to this point, though they can be inferred from known data.

Summary of evidence that the type of the infection is
  bacterial (.98) viral (-.75) fungal (-.83) tb (-.83):
29a.  The fact that the meningitis is partially treated (.95) is
  evidence that the type of the infection is bacterial (.19)
  [RULE148]
29b.  The WBC from the CSF and the percent of PMNs in the CSF is
  evidence that the type of the infection is bacterial (.9)
  viral (-.6) fungal ...
** DETAILS 29A
For which of the following factors would you like to see a review
  of the evidence? (select from the following):
    1)  the infection which requires therapy
    2)  the fact that the meningitis is partially treated (.95)
** 2
Summary of evidence that the meningitis is partially treated (.95):
32a.  The nature of the infection, the fact that organisms were not
  seen on the stain of the pending csf culture, and the time since
  therapy with the cephalothin was started, are all evidence that
  the meningitis is partially treated (.95) [RULE145]
Do you want to see RULE148?
** NO

Fig. 13. Allowing the student to explore.
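The numbers in parentheses in Figure 13 are certainty factors. The sketch below shows one way such figures could arise; it is a hedged illustration, not a statement of how this system computes them. It assumes the combining functions usually given in the Mycin literature (see [18]; the mixed-sign form follows later descriptions of the method), and it assumes a strength of 0.2 for rule 148, chosen only because 0.2 x 0.95 reproduces the .19 shown in item 29a. The function names are hypothetical.

```python
def conclusion_cf(rule_strength, premise_cf):
    """Certainty contributed by one rule: the rule's own strength
    scaled by how certain its premise is."""
    return rule_strength * premise_cf


def combine(a, b):
    """Merge two certainty factors bearing on the same hypothesis.

    Assumed combining function; undefined when one value is +1 and
    the other is -1 (division by zero in the mixed-sign case).
    """
    if a >= 0 and b >= 0:
        return a + b * (1 - a)       # two confirming pieces of evidence
    if a <= 0 and b <= 0:
        return a + b * (1 + a)       # two disconfirming pieces of evidence
    return (a + b) / (1 - min(abs(a), abs(b)))  # conflicting evidence


# Item 29a: premise "partially treated" held with certainty .95,
# applied through a rule of assumed strength 0.2.
cf_29a = conclusion_cf(0.2, 0.95)

# Items 29a and 29b taken together as evidence for "bacterial".
cf_29a_29b = combine(cf_29a, 0.9)
```

Under these assumptions, items 29a and 29b alone combine to about .919. The summary line in Figure 13 reports .98, which is consistent with further items in the elided part of the listing also contributing evidence.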

Domain Independence

The fundamental design and implementation of these programs does not restrict their use to medical domains. This is due in part to the widespread applicability of the concept of diagnosis and therapy as a problem solving method. Many problems that can be viewed as the discovery and correction of "errors" can be viewed as diagnosis and therapy, whether the domain is medicine, repair of machines, or program debugging. This means that the fundamental approach to problem solving is potentially widely applicable.

In addition, the implementation of all of the programs was kept strongly domain independent from the outset, primarily to permit extension to other areas of infectious disease. The current knowledge base can easily be augmented or removed entirely and replaced with another.

It has proven possible, for instance, to build additional knowledge bases for such disparate fields as chemotherapy for psychiatry and auto mechanics. In one of the first such efforts [22], a small part of an auto repair manual was rewritten as production rules, and inserted in place of the bacteremia knowledge base. What resulted was a very simple but fully functional consultant capable of diagnosing and curing problems in a part of an auto electrical system. More recently, a pilot system for psychiatric diagnosis is being assembled. While it has currently only some 50 rules, it is fully functional, and displays primitive but encouraging performance. In both systems, much of the established explanation facilities work as designed, without modification.

Finally, the basic methodology developed in designing and implementing the programs has provided a basis for a number of other systems. The work described in [10], for instance, deals with a system oriented toward repair of electro-mechanical devices, while [1] describes a system for the creation of intelligent terminals. Both of these share significant points of methodology with those described here.

There are, naturally, some domains that might be less profitable to explore. One of the interesting lessons of the auto repair system was that domains with little inexactness in the reasoning process--those for which algorithmic diagnostic routines can be written--are not particularly appropriate for this methodology. The precision in these domains means that little use is made of the certainty factor mechanism, and many of Mycin's more complicated (and computationally expensive) features go unused.

Nor is it reasonable to expect to be able to write rules for an arbitrary domain. As knowledge in an area accumulates, it becomes progressively more formalized. There is a certain stage of this formalization process when it is appropriate to use rules of the sort shown above. Earlier than this the knowledge is too unstructured, later on it may (like the auto repair system) be more effective to write straightforward algorithms.

It is also possible that the knowledge in some domains is inherently unsuited to a rule-like representation, since rules become increasingly awkward as the number of premise clauses increases. Dealing with a number of interacting factors may be difficult for any representation, but given the reliance here on rules as a medium of communication of knowledge, the problem becomes especially significant.

Impact As a Medical Decision Making System

We expect that the primary long term impact of this system will be in providing consultative support in primary health care centers where such expertise is currently in short supply. While such a step is still a long way off, with a sufficiently large knowledge base it may prove possible to provide near-human level performance. With the advent of phone line networks for communication to computers, this service can be made available to hospitals and clinics in rural areas that are typically undersupplied with expert medical care. Given the flexibility of the system, modifications might be introduced to deal with regional variations in accepted medical practice, or seasonal variations in infection susceptibility.

More generally, we believe that our basic methodology for handling decision making is capable of offering assistance in a wide range of domains. Its fundamental power lies in the ability to deal with a certain sort of complexity. To see this, consider two of the factors that may make decision making difficult: interconnectedness and size. Problems whose sub-parts are tightly interconnected are difficult because they are not decomposable: connections between sub-parts are sufficiently rich that no small number of parts can be considered independently. The difficulty here lies in trying to keep track of too many interrelated items at once.

Even where problems can be decomposed and sub-problems solved separately, difficulty can arise because there is simply too much to do. The decision making capabilities discussed are currently well suited to problems of this character. Given a sufficiently large and diverse collection of rules, the system can deal with all necessary factors. Its primary virtue then would lie in its exhaustive consideration of all factors, and an encyclopedic approach to the task. The presence of the explanation and question answering facilities means that the system has also the ability to explain its conclusions to many levels of detail, and may thus become a useful tool in the attempt to understand the problem. Finally, the attempt to write a computer program to perform a complex task often highlights (in intense detail) exactly how imprecise is our understanding of the processes by which people perform the task. More positively stated, the computer provides an excellent environment in which to elucidate and verify the sources of knowledge required. This has been our experience with these three programs; an interesting side effect of the efforts to create the system has been the progressive formalization of knowledge about infectious disease diagnosis and therapy. We speculate that this may be one of the most enduring effects of the system when it is applied to other domains, providing experts in other fields with a medium in which to formalize and test the principles on which their decisions are made.

References

1. Anderson, R. H. and Gillogly, J. J., RAND intelligent terminals agent: design philosophy, RAND R-1809-ARPA, The RAND Corporation, Santa Monica, CA., (1975). 

2. Clancey, W., Transfer of rule-based expertise through a tutorial dialogue, Ph.D. thesis, Dept of Computer Science, Stanford University, (1979).

3. Davis, R., Applications of meta-level knowledge to the construction, maintenance and use of large knowledge bases, HPP Memo 76-7, Stanford University, (1976). 

4. Davis, R., Buchanan, B. G. and Shortliffe, E. H., "Production rules as representation for a knowledge-based consultation program," Artificial Intelligence (February, 1977), 15-45.

5. Davis, R., "Interactive transfer of expertise: acquisition of new inference rules," Artificial Intelligence 12, (1979), 121-157.

6. Edwards, R., "Conservatism in human information processing," in Kleinmuntz (Ed.), Formal Representation of Human Judgment, Wiley, (1968), 17-52.

7. Edwards, W., "N = 1, diagnosis in unique cases," in Jacquez, (Ed.), Computer Diagnosis and Diagnostic Methods. C. C. Thomas, Springfield, Illinois, (1972), 139-151.

8. Gorry, G. A. and Barnett, G. O., "Experience with a model of sequential diagnosis," Computers and Biomedical Research 1, (1968), 490-507.

9. Gorry, G. A., Kassirer, J. P., Essig, A. and Schwartz, W. B., "Decision analysis as the basis for computer-aided management of acute renal failure," American Journal of Medicine 55, (1973), 473-484. 

10. Hart, P. E., "Progress on a computer-based consultant," Proc. 4th IJCAI, available from MIT AI Lab, Cambridge, MA, (August 1975), 831-841.

11. Kagan, B. M., Fannin, S. L. and Bardie, F., "Spotlight on antimicrobial agents--1973," JAMA 226, (3) (October 1973), 306-310. 

12. Weiss, S., Kulikowski, C. A., Amarel, S. and Safir, A., "A Model-based method for computer-aided medical decision-making," Artificial Intelligence (August 1978), 145-172. 

13. Meyer, A. V. and Weissman, W. K., "Computer analysis of the clinical neurological exam," Computers and Biomedical Research 3, (1973), 111-117.

14. Neu, H. C. and Howry, S. P., "Testing the physician's knowledge of antibiotic use," NEJM 293, (18 December 1975), 1291-5. 

15. Pople, H., "The formation of composite hypotheses," Proc. 5th IJCAI, (August 1977), 1030-1037. 

16. Ray, W. A., Federspiel, C. F. and Schaffner, W., "Prescribing of chloramphenicol in ambulatory practice," Ann Internal Med 84, (March 1976), 866-870. 

17. Scott, A. C., Clancey, W., Davis, R. and Shortliffe, E. H., "Explanation capabilities of knowledge based production systems," American Journal of Computational Linguistics microfiche 42, (1976).

18. Shortliffe, E. H., Buchanan, B. G., "A model of inexact reasoning in medicine," Mathematical Biosciences 23, (1975), 351-379.

19. Shortliffe, E. H., Computer-Based Medical Consultations: MYCIN, American Elsevier, (1976).

20. Silverman, H., A digitalis therapy advisor, MAC TR-143, Project MAC, Mass. Inst. Tech., (January 1975). 

21. Tversky, A. and Kahneman, D., "Judgment under uncertainty: heuristics and biases," Science 185, (18 September 1974), 1129-1131. 

22. van Melle, W., Would you like advice on another horn, MYCIN project internal working paper, Stanford University, (1974).

23. Warner, H. R., Toronto, A. F. and Veasy, L. G., "Experience with Bayes' theorem for computer diagnosis of congenital heart disease," Annals New York Academy of Science 115, (1964), 558-567.

Notes

(1) This paper describes work done while the author was in the Computer Science Department, Stanford University, Stanford, CA. The work was supported in part by the Bureau of Health Sciences Research and Evaluation, under HEW Grant HS-01544, and by the National Science Foundation under contract MCS 77-02712. Support for some of the basic research underlying this work was provided by ARPA, under ARPA Order 2393. The work was done on the SUMEX-AIM Computer System at Stanford; the system is supported by the NIH under grant RR-00785.

(2) The system was developed by a group that has included: Stanley Cohen, Stanton Axline, Jan Aikins, Bob Blum, Bruce Buchanan, Bill Clancey, Randall Davis, Larry Fagan, Frank Rhame, Carlisle Scott, Ted Shortliffe, Bill van Melle, Sharon Wraith, and Victor Yu. The name Mycin is derived from the suffix common to many antibiotic drug names; Teiresias comes from the name of the prophet in Oedipus the King; Guidon is a French word meaning "handlebars."

(3) However, the reasoning process of human experts may not be the ideal model for all knowledge-based problem solving systems. In the presence of reliable statistical data, programs using a decision theoretic approach are capable of performance surpassing that of their human counterparts. In domains like infectious disease therapy selection, however, which are characterized by "judgmental knowledge," statistical approaches may not be viable. This appears to be the case for many medical decision making areas. See [9] for further discussion of this point.


This is part of a Web-based reconstruction of the book originally published as
   Szolovits, P. (Ed.).  Artificial Intelligence in Medicine. Westview Press, Boulder, Colorado. 1982.
The text was scanned, OCR'd, and re-set in HTML by Peter Szolovits in 2000.