Heuristic Methods for Imposing Structure on Ill-Structured Problems: The Structuring of Medical Diagnostics(1)

Harry E. Pople, Jr.

Pople, H. E., Jr. "Heuristic Methods for Imposing Structure on Ill-Structured Problems: The Structuring of Medical Diagnostics."  Chapter 5 in Szolovits, P. (Ed.) Artificial Intelligence in Medicine. Westview Press, Boulder, Colorado.  1982.

Introduction

This chapter presents a conceptual framework for characterizing the essential nature of the diagnostic reasoning process. It was motivated in part by a criticism voiced recently by Allen Newell (in a personal communication) concerning the failure of those of us engaged in AIM (Artificial Intelligence in Medicine) research to articulate clearly exactly what it is that constitutes the task domain of medical diagnosis. This is a question that many of us have puzzled over for some time, and I have come to believe that part of the difficulty in trying to define this task domain in terms that would be generally accepted is that the problem of medical diagnosis is fundamentally ill structured.

One of my objectives in this chapter will be to make the case for considering this an ill structured problem domain, and to assess the consequences of this characterization. In short, I believe that this observation goes a long way towards explaining why the medical profession has been slow to adopt the offerings of diagnostic modelers that are based on assumptions of well structuredness of the task environment. Also, I believe that this provides a useful perspective from which to examine what is truly problematic about what the physician does in diagnosis, and consequently where computer aided tools might prove to be most helpful.

This description draws mainly from the investigation carried out at the University of Pittsburgh, which has led to development of the INTERNIST diagnostic program for internal medicine, though other recent studies of the medical reasoning process lend additional support to this descriptive framework.

This chapter first addresses the nature of the clinical reasoning process and then offers a description, critique, and a status report of the INTERNIST(2) development.

The Task Domain of Medical Diagnosis

The Role of "Differential Diagnoses" in Structuring Clinical Decisions

Designers of computer-based diagnosis programs often view the physician's primary decision-making task as one of "differential diagnosis." This term refers to a type of analytical task wherein the decision maker is confronted with a fixed set of diagnostic alternatives. His job is to determine whether sufficient data are available to make a decision among elements of this set and if not, to obtain whatever additional data may be required to make a decision.

Over the past two decades, a large number of specialized procedures have been developed to assist the physician in the differential diagnosis of a variety of well defined clinical problems. These have been extensively reported in the medical and computing literature.(3) In addition, algorithms to deal with a host of common medical problems, expressed by means of detailed flowcharts, have increasingly found their way into the leading textbooks of medicine.

Many different techniques have been used in structuring these clinical algorithms. In some cases, special programs have been formulated to capture the logic involved in the workup of particular classes of clinical problems. In other cases, generalized procedures have been adopted that are tailored to a particular application by specification of certain parameters; for example, many diagnostic programs have been developed to use the normative models of statistical decision theory. 

Evaluative studies frequently show that these programs, whatever their basis, generally perform as well as experienced clinicians in their respective domains, and somewhat better than the non-specialist.(4) It is interesting, therefore, to speculate on the reason that such programs have not had greater impact on the practice of medicine.

Resistance in the medical community is sometimes attributed to the natural conservatism of physicians, or to their sense of being threatened by the prospect of replacement by machine. Some have argued that this can be resolved only on the basis of education and training, and that the next generation will be more comfortable with computer-based decision aids as these become routinely introduced into the medical school curriculum.

A very different explanation of the failure of acceptance of these systems has been voiced by Alvin Feinstein, who argues rather forcefully that the real reason that physicians have not adopted computer-based decision aids is that these have often been based on unrealistic models, which fail to deal with the physicians' real problems. In [7, p.495], Feinstein delivers a harsh indictment, which I think strikes uncomfortably close to the mark:

Iatromathematical enthusiasts could make substantial contributions to clinical medicine if the efforts now being expended on Bayesian and decision-analytic fantasies were directed to the major challenges of algorithmically dissecting clinical judgment, based on the way the judgments are actually performed. Instead, however, the enthusiasts usually become infatuated with the mathematical processes and with the associated potential for computer manipulations, so that the basic clinical challenges become neglected or evaded....

Perhaps the most shocking aspect of all this ethereal model-building is that the model-builders (and the editors of influential medical journals) have begun to confuse the difference between clinical reality and abstract imagery. In the customary standards of science, a model that failed to fit reality was rejected and the model-builder was sent back to the drawing board. Today, however, in the Procrustean era made possible by confusion about modern technology, a model-builder can avoid the constraints of reality and demand, instead, that reality adapt itself to fit the model.

I believe that the real issue is that the models underlying use of most diagnostic programs require that the physician know--a priori--which is the correct differential diagnosis. While this is often the case, such well structured tasks generally do not constitute situations in which the physician requires diagnostic consultative assistance. The cases where expert assistance is really needed are those that entail diagnostic quandaries, where the physician is unsure as to the structure of his diagnostic problem.

There are many reasons why the physician might experience difficulty in formulating an appropriate differential diagnosis. It may be that the case involves a rare disease or unusual presentation. Often, such difficulties arise in clinical problems where two or more disease processes are at work, generating a plethora of abnormal findings that can be interpreted in a great variety of ways.

It might be argued that patients rarely have two or more significant diseases to be diagnosed. However, even in the case of an uncomplicated instance of disease, there are invariably some discrepant findings that arise due to misrepresentation by the patient or his family, the residual effects of a prior disease episode, or the occurrence of false positive laboratory results. Not uncommonly, apparent discrepancies can be attributed to an incomplete knowledge of the disease on the part of the physician and/or the biomedical community. Thus the process of sorting patient data into even two partitions (the relevant and the "red herrings") is often difficult for the physician. Where there is the potential for multiple concurrent diseases, this partitioning becomes ever more difficult.

Although these ubiquitous sources of error often make it problematic to determine whether or not one is actually working on the right problem, studies of human cognition have often revealed a strong bias towards the maintenance of initial problem formulations, even in some cases in the face of overwhelming evidence to the contrary. Thus it is not surprising that even skilled physicians sometimes need assistance in drawing out the full range of alternative formulations of a diagnostic problem.

What renders current computer-based consultation programs less than useful to the medical community is their failure to deal with this creative aspect of the "art" of medical reasoning.

On Modeling the "Art" of Diagnostic Reasoning

Physicians often refer to their clinical decision making process as more art than science, and suggest that while computers might be programmed to deal with the scientific, analytical aspect of their work, they will never be able to capture the "art" of a skilled clinician.

The question of what it is that distinguishes the expertise of an accomplished clinician has been investigated by a variety of techniques both in our laboratory and elsewhere. One nearly universal finding is that the physician responds to cues in the clinical data by conceptualizing one or more diagnostic tasks which then play an important role in the subsequent decision making process. This conceptualization then governs to some extent the acquisition of additional data and the range of alternatives considered in the eventual diagnostic decision making process. One mark of an expert is his ability to formulate particularly appropriate differential diagnostic tasks on the basis of sometimes subtle hints in the patient record.

Recent studies of clinical cognition shed some light on the role of problem formulation in medical reasoning. While these studies typically show that physicians employ strategies of information gathering that are organized in response to a set of well structured differential diagnostic tasks, these analyses also show that physicians generate tasks early in the patient encounter, long before definitive data are available to ensure that the hypothesized task is indeed an appropriate one to pursue in the case at hand. It appears that the cognitive benefits of having a concrete task as a focus for the patient workup significantly outweigh the occasional need to reconceptualize, discarding one line of attack and invoking another.

In their analysis of the tape recorded behavior of experienced physicians engaged in "taking the present illness" from a simulated patient, Kassirer and Gorry [8, pp. 247-248] have observed:

The most striking aspect of the history-taking process revealed by the protocols is the sharp focus of the clinician's problem-solving behavior. The subjects generated one or more working hypotheses early in the history-taking process when relatively few facts were known about the patient. ... The process of hypothesis activation dominated the early part of the diagnostic session as the physicians searched for a context in which to proceed. ... Identifying a context through the process of hypothesis activation appears to be one of the critical features of the diagnostic process. When a hypothesis was activated, the physician appeared to use this context--that is, his concept of a disease, a state, or a complication--as a model with which to evaluate new data from the patient. Such a model provides a basis for expectation; it identifies the relevant clinical features that should prove fruitful for further investigation.

Similar patterns of behavior have also been discovered by Elstein et al. [4, pp. 113-114] in their study of medical problem solving. Quoting from the summary of their observations of physicians engaged in the analysis of clinical problems:

Diagnostic problems are solved through a process of hypothesis generation and verification. Hypotheses are consistently generated early in a workup when only a very limited data base has been obtained. While any early formulation may be revised or discarded if subsequent data fail to confirm it, there is a high probability that at least some of the formulations of experienced physicians will be correct. Hypotheses serve as organizing rubrics in working memory. They help to overcome limitations of memory capacity and serve to narrow the size of the problem space that must be searched for solution. Since it would be impossible to conduct an efficient inquiry without some hypothetical goal that would tell the inquirer when to stop, hypotheses serve to transform an open medical problem (What is the patient's illness?) into a set of closed problems that are much easier to solve (Is the illness X? or Y? or Z?)

In explaining the observed phenomena, the authors theorize: (p.176)

The early generation of diagnostic problem formulations would appear to be a major strategy that is used by physicians to determine the regions of the potential problem space that are most likely to yield a solution. In sum, we may propose that a set of problem formulations defines the dimensions of the functional problem space in which a physician's search for a diagnosis is conducted.

Context-Dependent Strategies for Clinical Decision Making

The physician's concept of a diagnostic task establishes a context for a number of clinical decisions. For example, if he believes that the diagnostic task consists of selecting one of (X? or Y? or Z?) and he is able to rule out X and Y as possibilities, then (if his premises are correct) it follows that Z must be the case. This form of reasoning--by a process of elimination--is a powerful decision making strategy. By bounding the set of alternatives, then invoking procedures to gather additional information in order to eliminate candidates from the postulated decision set, it is often possible to reach a positive conclusion concerning the remaining alternative without the need for definitive confirmatory evidence. This can be important to the patient, particularly when the clinching data would be costly to obtain or entail unnecessary risk of complications.
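The process of elimination described above can be sketched in a few lines. This is only an illustration, not a procedure taken from any of the programs discussed in this chapter; the function name `eliminate` and its arguments are hypothetical, and the sketch assumes that the postulated differential really is exhaustive and mutually exclusive.

```python
def eliminate(differential, ruled_out):
    """Return the diagnosis reached by elimination, or None if still undecided.

    differential: candidate diagnoses, ASSUMED exhaustive and mutually exclusive.
    ruled_out: candidates excluded by the evidence gathered so far.
    """
    remaining = [d for d in differential if d not in ruled_out]
    if not remaining:
        # Every candidate was excluded, so the premise (the differential) was wrong.
        raise ValueError("all candidates ruled out; the differential is suspect")
    if len(remaining) == 1:
        # A positive conclusion without any direct confirmatory evidence.
        return remaining[0]
    return None  # more discriminating data are needed


print(eliminate(["X", "Y", "Z"], {"X", "Y"}))  # prints Z
print(eliminate(["X", "Y", "Z"], {"X"}))       # prints None
```

The conclusion is only as sound as the premise: if the true diagnosis lies outside the postulated set, the function still returns an answer, and the answer is wrong.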

In some cases, the context of questioning can even turn otherwise nonspecific findings into information items of diagnostic importance. For example, a finding of hepatitis B surface antigen in the blood does not ordinarily carry with it an implication of disease, because a fair portion of the population are unaffected carriers of the virus. In a setting of significant liver disease, however, where the physician's differential diagnosis encompasses various forms of inflammatory liver involvement, the finding of hepatitis B antigen might very well be considered definitive evidence as to the etiology of the illness. Similarly, in any task setting where there are but two alternatives (X? or Y?) if there is some abnormal finding that is a sine qua non for X which is found to be normal in the patient, this normal finding becomes diagnostic for concluding Y. Such reasoning is often at work in deciding between organic and functional (or psychosomatic) etiologies of an illness.

Perhaps the most important consequence of the physician's conceptualization of a diagnostic problem is the role that such a problem solving context plays in determining the strategy to be used for information acquisition: What questions should be asked of the patient and his family? What tests should be run? What therapeutic trials might provide useful information? What invasive procedures are absolutely necessary because of the importance of the data that only they can reveal? What data can safely be ignored?

Often, once the right questions have been asked and the right data obtained, the diagnostic decision becomes obvious to even the least experienced medical trainee. What distinguishes an expert is his ability to sense important omissions in the data that can often be filled in simply by asking the right questions. (The value of an item of information is not always well correlated with the cost of obtaining it.) When he does seek information that entails significant cost or danger to the patient, the expert clinician often has a good idea what the outcome will be, and is mainly seeking confirmation of his diagnosis. The real challenge in dealing with difficult medical problems is to identify and elicit those few critical findings that will clinch the case.

In considering the value of information, the physician often is concerned with an additional dimension, namely the reliability of the source. To what extent, for example, should a physician accept a patient's recollection of hospitalization twenty years previously, as opposed to a careful review of the day-to-day patient record that now resides in some musty archive. Clearly, the patient's recollection can be faulty and based on misconceptions: yet, unless there is reason to question the validity of such "hearsay" testimony, the physician dealing with the present illness rarely finds it necessary to go back to the source.

What determines a physician's value judgment concerning an item of information is its potential to influence the outcome of some decision problem that the physician has determined to be appropriate in the context of the case. If newly acquired data "fit" with the physician's expectations, they tend to be accepted without much scrutiny. If they fail to fit, however, they become subjected to a variety of validation and rationalization procedures. Confirmation will be sought via redundant procedures or a re-examination of original sources. If validated, discrepant findings may be attributed to yet unformulated additional problems not relevant to the diagnostic task at hand, or summarily dismissed as "red herrings." In many cases, underlying physiological and anatomical models will be consulted in an attempt to extend the scope of acceptable expectations. When confronted by a particularly baffling array of what appears to be contradictory data, the physician will sometimes undergo what might be characterized as a "paradigm shift" [9], which results in a reformulation of the diagnostic problem in what might be radically different terms.(5)

If, as most studies of clinical cognition seem to suggest, physicians make decisions concerning the acquisition of data and thereby determine diagnostic judgments on the basis of hypothesized problem contexts, it is clear that a potential exists for serious error due to the influence of the rationalization processes described above. For such context dependent decision strategies as the "process of elimination" to be effective, the physician's concept of the diagnostic task must either be correct at the outset or capable of transformation into the right task definition as additional evidence is developed and the case unfolds. Otherwise, physicians misled by inappropriate problem formulations may screen out and dismiss what might have been seen to be extremely important findings if viewed from another perspective; more likely, they are apt to neglect to inquire about those items of history and physical exam that might have corrected the context, if they were known.

Consultation with other specialists when evaluating difficult clinical problems helps to restore a breadth of perspective to ensure that reasonable alternatives to the primary physician's conceptualization of the problem will not be overlooked. Such formative support is not provided by most existing diagnostic consultation programs, which--as noted earlier--deal primarily with the process of decision making in well structured situations where the differential diagnosis is given a priori. These programs provide little if any assistance with respect to the more challenging business of formulating the differential diagnostic tasks that constitute the clinician's decision making context.

On the Structure of Ill Structured Problems

In discussing "The Structure of Ill Structured Problems" [20], Herbert Simon considers a similar claim often made by critics of artificial intelligence research, namely that the designer of a program that performs some task in a seemingly intelligent manner actually carries out the more difficult part of problem solving while wresting the problem from the realm of ISP's (ill structured problems) to that of WSP's (well structured problems). Simon writes: (op. cit., p.186)

A standard posture in artificial intelligence work, and in theorizing in this field, has been to consider only the idealized problems, and to leave the quality of the approximation and the processes for formulating that approximation to informal discussion outside the scopes both of the theory and of the problem solving programs.

He defends this strategy as being "common to many fields of intellectual endeavor," but acknowledges:

It encourages allegations that the "real" problem solving activity occurs while providing a problem with structure, and not after the problem has been formulated as a WSP. 

After pointing out that even within a well structured task environment, problem solvers can be repeatedly challenged by a sequence of tasks that do not require any reformulation of the problem space, Simon concludes:

Nevertheless, there is merit to the claim that much problem solving effort is directed at structuring problems, and only a fraction of it at solving problems once they are structured.

To show that the process of ill structured problem solving is not totally beyond the methods of artificial intelligence, Simon then proceeds to consider a number of problems that would generally be regarded as ill structured (e.g., composing a fugue and designing a house) and suggests a model that might be used to describe the reasoning process involved.

I should like to quote at some length from what Simon has to say concerning this class of ill structured problems in order to highlight what I find to be a compelling analogy between his characterization of the creative reasoning process of the artist, and what I understand of the reasoning process of the diagnostician. In both cases, the decision maker is engaged in an underconstrained endeavor that often requires the imposition of plausible (though not necessarily essential) constraints so that well understood decision strategies might be applied. The difficulty of starting off with a blank sheet of paper is well appreciated in all such endeavors.

Concerning the reasoning process of the architect in designing a house, Simon writes: (op. cit., pp. 187-188)

It will generally be agreed that the work of an architect--in designing a house, say--presents tasks that lie well toward the ill structured end of the problem continuum. Of course this is only true if the architect is trying to be "creative"--if he does not begin the task by taking off his shelf one of a set of standard house designs that he keeps there.

The design task (with this proviso) is ill structured in a number of respects. There is initially no definite criterion to test a proposed solution, much less a mechanizable process to apply the criterion. The problem space is not defined in any meaningful way, for a definition would have to encompass all kinds of structures the architect might at some point consider. ... even if we were to argue that the problem space can really be defined--since anything the architect thinks of must somehow be generated from or dredged from, his resources of memory or his reference library--some of this information only shows up in late stages of the design process after large amounts of search; and some of it shows up, when it does, almost accidentally. Hence, the problem is even less well defined when considered from the standpoint of what is actually known at any point in time than when considered from the standpoint of what is knowable, eventually and in principle.

Simon then characterizes the progressive development of an architectural design, as additional data are developed and additional constraints imposed: (op. cit., pp. 189-190)

We can imagine a design process that proceeds according to the following general scheme. Taking the initial goals and constraints, the architect begins to derive some global specifications from them--perhaps the square footage or cubic footage of the house among them. But the task itself, "designing a house," evokes from his long-term memory a list of other attributes that will have to be specified at an early stage of the design: characteristics of the lot on which the house is to be built, its general style, whether it is to be on a single level or multi-storied, type of frame, types of structural materials and of sheathing materials, and so on. ... Design alternatives can also be evoked in component-by-component fashion. The subgoal of designing the heating system may lead the architect to consider various fuels and various distribution systems. Again, the source of these generators of alternatives is to be found in his long-term memory and reference facilities (including his access to specialists for helping design some of the component systems).

The whole design, then, begins to acquire structure by being decomposed into various problems of component design, and by evoking, as the design progresses, all kinds of requirements to be applied in testing the design of its components. During any given short period of time, the architect will find himself working on a problem which, perhaps beginning in an ill structured state, soon converts itself through evocation from memory into a well structured problem.

My observation of the diagnostic reasoning process suggests that the physician, like the architect, may at any one time be observed to be working on a problem which, though initially ill structured, has been converted into a well structured task through the evocation and imposition of constraints that progressively bound the set of possible decisions. There is, however, a significant difference between these task domains in that for the architect, there is no one correct solution to the design problem. When the architect imposes an arbitrary constraint--though this reduces the degrees of freedom concerning the range of possible designs--one can generally be assured that the outcome will be a structure that meets at least some minimum set of requirements. If the physician invokes arbitrary constraints, however--so as to make his solution space more manageable--the outcome can be quite seriously compromised.

This is what makes it especially important that the medical decision maker be provided some means for assessing the consequences of alternative problem structuring constraints, a consideration that may or may not be of concern to the architect.

The General Model of Ill Structured Problem Solving

In order to account for the creative, problem finding aspect of the ill structured problem solving process, Simon envisions a conceptual framework within which methods for the evocation and synthesis of problem structure would alternate with problem solvers of familiar kinds in the progressive definition, refinement, and eventual solution of such problems. Figure 1 illustrates the interplay between the synthetic and analytic components of the process.

Fig. 1. Schematic diagram of a system for ill structured problems. It shows the alternation between a problem solver working on a well structured problem, and a recognition system continually modifying the problem space. (From: Simon, H. A., The Structure of Ill Structured Problems, Artificial Intelligence, 4 (1973), 181-201.)

This model distinguishes two major facets of the reasoning process: a problem solver, which in a typical AI application might consist of a theorem prover or heuristic problem solving program such as GPS [6], and a noticing and evoking mechanism, which monitors what is going on in working memory and modifies the problem space by fetching from long term memory (and/or the external environment) new constraints, new subgoals, new specifications, new generators of solution alternatives.

In order to characterize a particular ill structured reasoning system in terms of this general model, there are a number of features that must be specified more fully. One of the main things that needs to be defined is the mechanism by which elements of task structure are to be evoked from the system's long term memory. Depending upon the strategy chosen for this purpose, there must then be a determination of the goals and constraints that govern the selection and assembly of these structural fragments into complete (perhaps partial or multiple) task definitions. Attention must also be given to the design of a problem solver that can be applied to the various complexes of tasks that might emerge.

A particularly important aspect of the overall design is what in the terminology of artificial intelligence programming is referred to as the "control regime," or strategy by which the focus of attention of the problem solver will be directed first at one alternative and then another, perhaps with provision for resumption of consideration of those options initially rejected. In situations that admit alternative conceptualizations of a task complex, there must be some machinery to generate the space of feasible task formulations, and to manage the "shift of paradigm" that takes place when moving from one such conceptualization to another. Moreover, criteria must be developed by which to judge the various formulations of the task so as to guide the search process towards that task characterization which is actually correct. In addition, there must be some means of determining when this meta goal of the problem formulation process has been achieved.
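The alternation between the two components of this model can be illustrated with a toy loop. This is a minimal sketch under strong simplifying assumptions, not Simon's formulation and not the INTERNIST control regime: long term memory is reduced to a hypothetical table mapping each cue to the alternatives it evokes, and the "problem solver" merely counts how many observed cues each alternative accounts for. All names (`run_cycles`, `long_term_memory`, the cues `c1` and `c2`) are invented for illustration.

```python
def run_cycles(cues, long_term_memory):
    """Alternate noticing/evoking with problem solving as cues arrive.

    cues: observations in the order they become available.
    long_term_memory: hypothetical dict mapping a cue to the set of
        alternatives it evokes.
    Returns one (problem_space, current_best) pair per cycle.
    """
    working_memory = []
    problem_space = set()
    trace = []
    for cue in cues:
        working_memory.append(cue)
        # Noticing and evoking: the new cue fetches alternatives from long
        # term memory and thereby modifies the problem space.
        problem_space |= long_term_memory.get(cue, set())
        # Problem solver: works on the task as currently structured, here by
        # counting the observed cues that each alternative accounts for.
        scores = {
            alt: sum(alt in long_term_memory.get(c, set()) for c in working_memory)
            for alt in problem_space
        }
        best = max(sorted(scores), key=scores.get) if scores else None
        trace.append((sorted(problem_space), best))
    return trace


memory = {"c1": {"X", "Y"}, "c2": {"Y", "Z"}}
for space, best in run_cycles(["c1", "c2"], memory):
    print(space, best)
```

In this run the second cue both enlarges the problem space (Z is evoked) and changes the leading alternative, which is the kind of shift in focus that the control regime has to manage.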

The Structuring of Medical Diagnostics

The model presented in the preceding section can be used as a basis for characterizing and contrasting a number of approaches to computer aided diagnosis that have been developed over the past two decades. The primary features distinguishing these diverse approaches include: 

The Degenerate Cases

The great majority of diagnostic programs fall into a degenerate category, in that the entire computer-based procedure consists of the single module labeled "problem solver." The premise underlying use of such procedures is that the clinical problem can be encompassed by a fixed differential diagnosis; i.e., there is a given set of diagnostic possibilities, and it is a presupposition in using the program that the patient has one and only one of the listed disorders. Another way of saying this is that the list of diagnostic possibilities is exhaustive and mutually exclusive, a necessary condition for the application of many statistical decision making strategies, notably Bayes' rule.(6)
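To make the role of the exhaustive and mutually exclusive assumption concrete, the following sketch applies Bayes' rule to a fixed differential. It is an illustration only: the function name `posterior` is hypothetical, and the probabilities in the example are invented numbers, not clinical data.

```python
def posterior(priors, likelihoods):
    """Bayes' rule over a fixed differential diagnosis.

    priors: P(disease) for each candidate. The candidates are ASSUMED
        exhaustive and mutually exclusive, so the priors must sum to 1.
    likelihoods: P(observed findings | disease) for each candidate.
    """
    if abs(sum(priors.values()) - 1.0) > 1e-9:
        raise ValueError("priors must sum to 1 for an exhaustive, exclusive set")
    joint = {d: priors[d] * likelihoods[d] for d in priors}
    total = sum(joint.values())  # P(findings), valid only under the assumption
    if total == 0:
        raise ValueError("findings are impossible under every listed diagnosis")
    return {d: p / total for d, p in joint.items()}


# Invented numbers, for illustration only.
post = posterior({"X": 0.5, "Y": 0.3, "Z": 0.2}, {"X": 0.1, "Y": 0.6, "Z": 0.3})
print(max(post, key=post.get))  # prints Y
```

The normalizing step is where the assumption bites: dividing by the sum over the listed diseases is legitimate only if the patient has exactly one of them. A disease missing from the list, or two diseases present at once, silently invalidates the resulting probabilities.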

That part of the overall process that Simon refers to as "noticing and evoking," which deals with the evocation or formulation of diagnostic tasks in response to cues in the data, must be performed by the physician user of such a system without benefit of computer assistance. There are alternative approaches to the design of diagnostic programs in which the computer plays a more active role in the generation of diagnostic tasks. The two principal methods that have been employed for computer-assisted evocation of diagnostic tasks are discussed in the following sections.

The Evocation of Binary Choice (True/False) Tasks

One way to arrange for the evocation of diagnostic tasks would be to use what in the terminology of artificial intelligence is referred to as the method of "hypothesize and test." Using this approach, the decision maker would hypothesize in some systematic fashion all possible diagnoses, and for each, create the task of determining whether this hypothesized diagnosis can be verified as contributing to the patient's illness. Such tasks would not be in the form of differential diagnoses (is the illness X? or Y? or Z?). Instead, each such task would entail a choice between the presence or absence of a particular disease (is the case X? or not X?).

This approach to task structuring has the virtue of ensuring that each task considered consists of an exhaustive and mutually exclusive set of alternatives (the disease either is or is not present, and it cannot be both). For this reason, it can be used in the design of Bayesian diagnostic procedures that are suitable for use in clinical problems having the potential for multiple concurrent diagnoses.(7)

Ordinary Bayesian task formulations, structured as differential diagnoses, require the assumption that one and only one of the diseases in the differential list is present in the case. This conventional problem structure entails one task and one decision, whereas a formulation based on a binary choice (between X and not X) entails n tasks (where n is the total number of diseases known to the system) and permits as many as n positive conclusions. 
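The contrast between the two formulations can be made concrete with a small sketch. It is illustrative only: the function names, priors, and likelihoods below are assumptions invented for the example and are not drawn from any of the programs discussed. The first function applies Bayes' rule across a single exhaustive, mutually exclusive differential; the second applies it to one "X or not X" task at a time.

```python
def differential_posterior(priors, likelihoods):
    """Bayes' rule over one exhaustive, mutually exclusive list of diseases."""
    joint = {d: priors[d] * likelihoods[d] for d in priors}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("finding is impossible under every listed disease")
    return {d: p / total for d, p in joint.items()}


def binary_posterior(prior, p_if_present, p_if_absent):
    """Bayes' rule for a single task: disease present versus disease absent."""
    present = prior * p_if_present
    absent = (1.0 - prior) * p_if_absent
    if present + absent == 0:
        raise ValueError("finding is impossible whether or not disease is present")
    return present / (present + absent)


# Hypothetical numbers, chosen only to exercise the functions.
priors = {"X": 0.5, "Y": 0.3, "Z": 0.2}          # must sum to 1
likelihoods = {"X": 0.2, "Y": 0.6, "Z": 0.1}     # P(finding | disease)
one_task = differential_posterior(priors, likelihoods)

# n independent tasks: (prior, P(finding | present), P(finding | absent)).
binary_inputs = {"X": (0.10, 0.8, 0.2), "Y": (0.05, 0.9, 0.3)}
n_tasks = {d: binary_posterior(*args) for d, args in binary_inputs.items()}
```

In the differential case the posteriors are forced to sum to one, so exactly one disease is implicitly assumed present. In the binary case each posterior stands alone, so any number of the n hypotheses may be accepted or rejected.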

Problem solving procedures for making true/false judgments concerning evoked hypotheses may be of many forms. As already noted, a method commonly employed is that of statistical decision theory. An interesting variant is that of MYCIN, which employs an adaptation of the theorem proving methods of AI to "prove" (up to a specified certainty level) the truth or falsity of each considered hypothesis.

There is nothing particularly demanding about the control strategy required for managing the consideration of a succession of binary choice tasks. The program can be set up in a straightforward fashion to proceed sequentially through the list of possibilities, considering each in turn, accumulating evidence for and against the hypothesis in accordance with the specified problem solving procedure.

The main difficulty with the binary choice approach is that it fails to aggregate diagnostic possibilities into decision sets; instead, each considered diagnosis is evaluated as though independent of all other alternatives. This requires absolute criteria for decision making, as the problem solver is denied access to the powerful heuristics discussed previously that enable decisions to be rendered relative to a postulated decision set.

These heuristics, which provide guidance to the physician concerning the need for additional discriminating information and permit use of efficient decision strategies such as "the process of elimination," can be used only in the context of tasks structured as differential diagnoses.

An interesting study was completed recently by Sherman [18], in which he compared the performance of two algorithms, one patterned after the MIT/Tufts Present Illness Program (PIP) [15], the other using the problem structuring and decision making strategies of INTERNIST-I. The former uses a decision strategy consistent with a binary choice task structure, while the latter employs a differential diagnostic strategy. Both algorithms were provided access to the same knowledge base of birth defects,(8) and were run on the same set of diagnostic decision problems. Sherman's findings tend to corroborate the thesis advanced in the preceding discussion concerning the relative power of these two models of diagnostic decision making: (op. cit., pp. 63-64)

PIP and INTERNIST take a simple solution to the problem of when to conclude that a syndrome is present. They use the scores of the hypotheses and a predetermined threshold value. As stated before, PIP requires that a hypothesis' score exceeds the threshold while INTERNIST requires that the difference between the scores of the top two hypotheses exceeds the threshold. To evaluate these two methods, the hypotheses concluded by each system were examined.

The results showed that INTERNIST concluded the correct syndrome in 34 of the 35 cases and in one case no syndrome was concluded. PIP, on the other hand, concluded the correct syndrome in thirty cases, failed to conclude any syndrome in four cases, and concluded an incorrect syndrome in two cases. (In one case PIP concluded both a correct and an incorrect syndrome.)

The differences in the number of incorrect syndromes concluded and the number of cases where no syndromes were concluded, although not statistically significant, can be explained by the differences in the algorithms.

In both of the cases where PIP concluded an incorrect syndrome there was another hypothesis which had a score that was very close. INTERNIST would not have concluded the syndrome at that point since it uses the difference between the scores of the leading hypotheses and not the actual magnitude of the leading hypothesis score.

In the three cases where PIP failed to conclude any syndrome but INTERNIST did conclude the correct syndrome, PIP was pursuing the correct hypothesis but due to stray and/or absent findings (quite common in birth defects) the correct hypothesis' score was not greater than the threshold for confirmation. In INTERNIST, although the magnitude of the correct hypothesis was not very large, the difference between its score and those of the other hypotheses was great enough to conclude the correct hypotheses.
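The two concluding rules contrasted in the passage quoted above can be sketched in a few lines. This is a minimal illustration and not the code of either program: the function names, scores, and thresholds are hypothetical, and treating a missing runner-up as a score of zero is an assumption made only so that the example is complete.

```python
def conclude_absolute(scores, threshold):
    """Binary-choice style: conclude every hypothesis scoring above threshold."""
    return [h for h, s in scores.items() if s > threshold]


def conclude_relative(scores, threshold):
    """Differential style: conclude the leader only if it leads by > threshold."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return []
    leader, lead_score = ranked[0]
    # Assumption: with no competitor, the runner-up score is taken as zero.
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    return [leader] if lead_score - runner_up > threshold else []


# Hypothetical case 1: two hypotheses with high but nearly equal scores.
close_race = {"syndrome_a": 90, "syndrome_b": 88}
# Hypothetical case 2: weak evidence overall, but one clear leader.
weak_leader = {"syndrome_a": 50, "syndrome_b": 10}

absolute_1 = conclude_absolute(close_race, threshold=80)
relative_1 = conclude_relative(close_race, threshold=20)
absolute_2 = conclude_absolute(weak_leader, threshold=80)
relative_2 = conclude_relative(weak_leader, threshold=20)
```

In the first case the absolute rule concludes both hypotheses, while the relative rule withholds judgment. In the second the absolute rule concludes nothing, while the relative rule accepts the leader. These correspond to the two kinds of discrepancy Sherman reports.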

Perhaps the most troublesome aspect of employing a binary choice task structure for clinical decision making is the difficulty of interpreting the output of such a diagnostic procedure in cases where multiple diagnostic decisions are possible. Because disease hypotheses are not organized into decision sets by such a procedure, there is no guidance provided to the user concerning the assembly of a diagnostic complex; i.e., when the program prints out a list of possibilities with associated scores, there is no indication whether they are to be considered conjunctive or disjunctive sets, or some combination thereof.  Thus in complex cases--where decision support is presumably most needed--diagnostic procedures based on binary choice decision strategies offer little assistance in shaping the physician's decision making context.

The Direct Evocation of Differential Diagnostic Tasks

There is an alternative approach to the structuring of medical decisions that directly yields differential diagnostic task structures. In its most elementary form, this method would have the "noticing and evoking" mechanism of the decision making program consider every abnormal finding encountered during the patient workup, and for each, create the task of determining the disease or pathological condition that is responsible for causing this abnormality.

For example, an observation of "fever" would evoke a differential diagnostic task encompassing a large number of diagnostic possibilities, including infections of all types, many forms of cancer, blood abnormalities, cardiovascular disorders, and a variety of diseases associated with abnormal immunity.

As this example suggests, the set of disease entities comprising a differential diagnosis tends to be aggregated--often on the basis of some common characteristic or similarity of presentation--into general diagnostic categories. One reason for such grouping is so that entire subcategories may be ruled in or ruled out on the basis of particularly discriminating questions. In the example of fever given above, one might narrow the range of diagnostic possibilities considerably by inquiring whether the patient had experienced shaking chills. An affirmative answer would usually serve initially to constrain the physician's differential diagnosis to some form of infectious process.

Through a combination of training and experience, the clinician acquires a rich network of associations that serve to partition the space of possible diagnoses. These knowledge structures enable the clinician to develop his conceptualization of a clinical problem by working from what are initially broad descriptive characterizations through a succession of refinements until eventually specific disease entities are considered and assessed. This serves to limit the number of decision alternatives that need to be dealt with in the context of any particular evoked task.

Another factor serving to limit the apparent magnitude of the cognitive task associated with a differential diagnosis is that physicians invariably exploit the truism that "common things are common," which argues for consideration of diseases in the differential diagnosis on the basis of their prevalence, or frequency of occurrence in the population. Thus, while the complete differential diagnosis associated with a particular finding may encompass hundreds of diseases, only a few of the more common possibilities on this list are likely to occur to the physician when first encountering this complaint. The remaining alternatives hover in the background, to be brought into view only when the major contenders fall from favor for one reason or another.

The main problem with the direct evocation of differential diagnostic tasks, as outlined above, is that there would be as many diagnostic tasks to deal with as there are abnormal findings in the data. Although each such task has prima facie validity since every finding has to be explained in one way or another, no physician would seriously consider himself to be confronted with twenty diagnostic tasks in a patient with as many findings. Instead, he would most likely consider most of these data to be manifestations of the same underlying cause, and initially at least, proceed as though faced with but one diagnostic task.

This suggests that it might be useful to structure differential diagnostic procedures around certain combinations of findings, which, when they occur together in a patient, may be presumed to be part of the same underlying process. This approach has been investigated by Patrick et al. [14], who suggested use of "activation rules" that would be employed to evoke selected differential diagnostic tasks on the basis of patterns of findings.

There are difficulties with this approach to the structuring of diagnostic algorithms, which in certain respects are more serious than those discussed previously. In the case where a differential diagnostic procedure is designed to deal with a single abnormal finding, one can at least be assured that all possible explanations of that abnormality are potentially accessible to the algorithm: i.e., the differential diagnosis constitutes an exhaustive enumeration of all known causes of the given finding.

When algorithms are designed to be invoked on the basis of patterns of clinical or laboratory findings, however, many potential diagnoses become inaccessible: namely, those that fail to account for all of the data in the evoking pattern. Thus there is the danger that plausible explanations of a clinical problem that might involve alternative partitionings of the input data, with corresponding differences in the patterns of task evocation, will not be brought into "conscious" consideration.

Another way to attack this problem would be to maintain the discrete task definitions associated with each individual finding, but provide for the tentative merging or "synthesis" of multiple elementary task definitions into a single unified task, for purposes of guiding the workup and focusing the acquisition of additional information. The essential mechanism involved in the construction of such a "synthesized" differential diagnosis would be that of set intersection. If some basis can be found for focusing on a particular subset of the data, then the differential diagnosis lists associated with each of the selected findings can be intersected to derive a much narrower list of prospective diagnoses, each capable of explaining that entire subset of the data.
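The intersection mechanism just described might be sketched as follows. The finding and disease names are abstract placeholders rather than medical content, and the function name `synthesize` is an assumption introduced for the example.

```python
def synthesize(differentials, focus_findings):
    """Intersect the differential lists of the findings currently in focus.

    differentials maps each finding to the set of diseases that can cause it;
    the result holds only diseases able to explain every finding in focus.
    """
    lists = [differentials[f] for f in focus_findings]
    return set.intersection(*lists) if lists else set()


# Hypothetical elementary task definitions, one per abnormal finding.
DIFFERENTIALS = {
    "finding_1": {"d1", "d2", "d3", "d4"},
    "finding_2": {"d2", "d3", "d5"},
    "finding_3": {"d3", "d6"},
}

narrow = synthesize(DIFFERENTIALS, ["finding_1", "finding_2"])
narrower = synthesize(DIFFERENTIALS, ["finding_1", "finding_2", "finding_3"])
```

Because the elementary lists are retained unchanged, a different choice of focus findings can always be tried later. This is what distinguishes the approach from fixed activation patterns, under which diagnoses outside the pattern become inaccessible.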

Whenever the problem solver invokes an attention focusing maneuver aimed at narrowing in on the most probable region of the problem space, it must then set about to acquire additional evidence that will clinch the case. In this setting, the acquisition of additional information can be undertaken to serve two purposes. As in any case of a decision among elements in a decision set, information may be sought to help discriminate among these alternatives and to narrow the focus even further by eliminating elements from the set. In addition, the premises underlying the choice of the selected decision problem may be put to the test by eliciting additional information that might help in either confirming or rejecting the problem focus as a whole.

It is important to keep in mind that the principle of parsimony, while often a useful heuristic guide to pursue in narrowing the focus of attention, is not in itself a valid basis for making final clinical judgments. Therefore, any diagnostic program that permits narrowing of the problem solver's focus on the basis of purely heuristic maneuvers must make provision for those maneuvers to be undone. This means that a diagnostic problem structuring and solving procedure, which incorporates heuristic task combination methods, must also adopt the type of sophisticated control strategy often found necessary in other artificial intelligence programs that rely on heuristic methods.(9)

The INTERNIST/CADUCEUS Perspective

The previous section has outlined a number of approaches to designing a system for assisting the physician in structuring and solving problems of medical diagnosis.

We first considered the degenerate case, instances of which abound in the literature, for which the diagnostic task is taken to be given, and the process of recognition and task evocation is assumed to be the responsibility of the user. The difficulties encountered in using such limited models, also amply documented in the literature, were briefly outlined.

We then discussed the use of a simplified task structure in which the decision to be made is between the presence or absence of a particular disease. While permitting machine aided task evocation, this approach was seen to be deficient in that the binary-choice task structure does not admit the range of powerful decision making strategies that a differential diagnostic task structure encourages.

Finally, we considered the advantages and problems associated with the direct evocation of differential diagnostic task structure, and outlined methods for managing the size and complexity of the space of task definitions that might result in the course of analyzing a complex clinical problem.

The work of the INTERNIST/CADUCEUS project has concentrated on the latter approach, with investigation of heuristic methods for imposing differential diagnostic task structure on the clinical decision making problem. The thrust of this work has been motivated by our conviction that this is consistent with the way in which expert clinicians approach their task. While modeling the expert decision maker is not necessarily the only way to approach the design of an "expert system," in the medical decision making domain there has been general disappointment with the so-called normative decision models, whereas observation of human expert behavior has proved to be a most fruitful source of insights into the decision making process.

One aspect of our study has been investigation of knowledge structures that can be used to aggregate or "chunk" elements in a differential diagnosis. We have explored two conceptual frameworks that are commonly used to organize medical knowledge, both of which can be exploited to derive categories for classifying the alternatives in a diagnostic decision set.

One of these is the "causal graph," in which the concept of causality or pathophysiology is used to define a network of interrelated pathological states that might arise in the course of a disease. The other type of structure used to aggregate disease entities is the taxonomy of disease, also called a "nosology," which is used to classify disease entities on the basis of anatomical locus, etiological agent, or other discriminating characteristic.

The first programs developed in our laboratory employed an elaborate causal network to structure elements in a differential diagnosis: given a set of abnormal findings, a search procedure was used to scan the causal network, seeking opportunities to "synthesize" unifying hypotheses.(10) For reasons that will be discussed presently, this approach encountered serious computational barriers to the process of task synthesis: i.e., the combining of multiple discrete tasks into unified task formulations. In subsequent developments, the incorporation of explicit causal mechanisms was de-emphasized and an alternative approach to task structuring based on the use of nosologies, or taxonomies of disease categories, was pursued. This led to more efficient formulation of complex, synthesized task definitions, but proved deficient in the precision with which the attribution of findings to particular diseases could be assessed.

As our experience with both approaches to the structuring of medical knowledge has identified problems with each taken separately, current work on CADUCEUS is aimed at investigating the conceptualization of differential diagnostic problems in both causal and taxonomic terms.

To give somewhat more substance to the concepts being discussed, the following section provides some specifics concerning the implementation of the task structuring and decision making methods employed in the original INTERNIST-I system. This system constituted the point of departure for the investigation of issues in artificial intelligence and the representation of medical knowledge that has led to the general model of diagnostic reasoning presented in the preceding section. We shall return to consideration of the general model after sharpening the issues with a concrete case analysis and critique of INTERNIST-I.

INTERNIST-I: Example of the Model of Diagnostic Reasoning

The INTERNIST-I program employs a heuristic procedure that composes differential diagnoses, dynamically, on the basis of clinical evidence. It does this by assembling what in context appears to be an exhaustive and mutually exclusive subset of disease entities that can explain some significant portion of the observed manifestations of disease. Such a conjectured differential diagnosis then serves as the basis for selecting strategies of information acquisition and decision making relative to that diagnostic task. During the course of an INTERNIST-I consultation, it is not uncommon for a number of such conjectured problem foci to be proposed and investigated, with occasional major shifts taking place in the program's conceptualization of the task at hand. 

This system was demonstrated for the first time in 1974, and has since been used in the analysis of hundreds of difficult clinical problems, often with notable success. A review of the information structures and heuristic processes that underlie the performance achieved by this version of INTERNIST follows. Behavior of the system will be illustrated by means of a case run, followed by a critical appraisal of the system based on our experience over the past several years. Plans to deal with the problems that have been identified will then be discussed, suggesting the outline of the knowledge structures and procedures of a new version of the program.

The INTERNIST-I View of Medical Knowledge

The knowledge base underlying the INTERNIST system is composed of two basic types of elements: disease entities and manifestations (history items, symptoms, physical signs, laboratory data). In addition, there are a number of relations defined on these classes of elements. At present, there are over five hundred disease entities encoded in the knowledge base and thirty-five hundred manifestations. 

Each manifestation defines an elementary differential diagnostic task by means of the EVOKES data structure. This structure records with each manifestation the list of diseases in which that manifestation is known to occur, along with a weighting factor (on a 0-5 scale) intended to reflect the strength of association. We refer to this weight as the "evoking strength" by which a manifestation is related to each disease on its "evokes-list."

Although not strictly interpretable as a measure of probability, for reasons that will be discussed presently, this weighting factor can be viewed as somewhat analogous to a posterior probability because it describes the relative likelihood of disease entities evoked on the basis of a single observation. As suggested by this analogy, the ordering of a differential diagnosis provided by evoking strength tends to reflect the a priori probability or prevalence of disease in the population.

The inverse of the EVOKES relation is also recorded explicitly in the knowledge base. By means of the MANIFESTS data structure, each disease entity is profiled with an associated list of manifestations known to occur in that disease, recorded along with an estimate (in this case recorded on a scale of 1-5) of the frequency of occurrence. Although it is not employed in the usual way to calculate posterior probabilities (the actual calculus used in combining these weights will be sketched in the following section), this weighting factor is strictly analogous to the conditional probability of a manifestation given a disease.(11) 
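One way to picture the two relations is as a pair of mutually inverse mappings, each carrying its own weight. The sketch below is an assumption about representation only: the chapter does not specify how INTERNIST-I stores these structures, and the link data, names, and helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical links: (manifestation, disease, evoking strength 0-5, frequency 1-5).
LINKS = [
    ("finding_1", "disease_a", 1, 4),
    ("finding_1", "disease_b", 3, 2),
    ("finding_2", "disease_a", 4, 5),
]


def build_relations(links):
    """Build EVOKES (finding -> diseases) and MANIFESTS (disease -> findings)."""
    evokes = defaultdict(dict)
    manifests = defaultdict(dict)
    for finding, disease, strength, frequency in links:
        if not 0 <= strength <= 5:
            raise ValueError("evoking strength must lie on the 0-5 scale")
        if not 1 <= frequency <= 5:
            raise ValueError("frequency must lie on the 1-5 scale")
        evokes[finding][disease] = strength
        manifests[disease][finding] = frequency
    return dict(evokes), dict(manifests)


def evokes_list(evokes, finding):
    """Elementary differential for one finding, strongest association first."""
    return sorted(evokes.get(finding, {}).items(),
                  key=lambda item: item[1], reverse=True)


EVOKES, MANIFESTS = build_relations(LINKS)
fever_like_task = evokes_list(EVOKES, "finding_1")
```

Note that the two weights on a link are independent: a finding may occur very frequently in a disease (high frequency) while pointing only weakly toward it (low evoking strength), as with `finding_1` and `disease_a` in the hypothetical data.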

Other relations are defined on the set of disease entities to record the causal, temporal, and other patterns of association by which the various disease entities and distinguished pathological states are interrelated; these relations also incorporate weighting factors analogous to the evoking strength and frequency weights mentioned above.

The extent of pathophysiology represented in this fashion has been deliberately limited, however, because of the difficulties--alluded to earlier--which are encountered when the process of combining multiple elementary task definitions into unified task constructs requires search of a detailed causal graph.(12) Thus, for the most part, the INTERNIST-I knowledge base has been structured so that abnormal findings evoke disease hypotheses directly without calling for consideration of any intermediate pathological condition.

Conceptually, this can best be thought of as the result of embedding information concerning the manifestations of the various pathological states associated with each disease within the profile of that disease. There are exceptions, however, as certain important syndromes and pathological states have been recorded in the INTERNIST-I knowledge base as separate entities (e.g., "ascites," "portal hypertension," "heart failure," "nephrotic syndrome") because of their well established clinical significance as points of departure for differential diagnosis.

The INTERNIST-I knowledge base also contains a nosologic hierarchy of disease categories, organized primarily around the concept of organ systems, having at the top level such categories as "liver disease," "lung disease," "kidney disease," etc. Each of these areas is divided into more specific categories, which may in turn be further subdivided any number of times until the terminal level representing individual disease entities is reached.

There are also several auxiliary relations defined on the class of manifestations to record properties of interest, such as the type of a manifestation (history, symptom, sign, or one of three designations of laboratory finding that reflect progressively greater cost and/or danger to the patient), its global importance (a measure of how important it is to account for that manifestation in a final diagnosis, recorded on a scale of 1-5), and its relation to other manifestations (such as the derivability of one from another).

The Problem-Formation Method of INTERNIST-I

The heuristic problem structuring procedure of INTERNIST-I is invoked repeatedly during the course of a diagnostic consultation in order to deal sequentially with the component parts of a complex clinical problem.

The process is as follows. First, during the initial input phase, patient data consisting of both positive findings and pertinent negative findings may be entered in any order and any amount. Each positive finding is used to evoke an elementary differential diagnostic task that may contain a mixture of individual disease nodes and higher level nodes of the disease hierarchy. Where a manifestation evokes one of the higher level category nodes, it is because that finding can be explained by all subnodes of the given category.

In order to detect the possibility of combining multiple tasks, each associated with a single abnormal finding, the INTERNIST-I program uses a scoring procedure that awards credit to disease hypotheses on the basis of the number and the importance of elementary tasks that are unified by those hypotheses. In this scoring process, the evoking strength and importance of manifestations explained by a disease are counted in its favor; frequency weights count against those disease hypotheses in which the corresponding manifestations are expected but not found present in the case.(13)
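The general shape of such a scoring procedure can be sketched as below. The actual weighting calculus of INTERNIST-I is not given in this passage, so the simple unweighted sum is purely an assumption, as are the function name and all of the data. Only the direction of each contribution (evoking strength and importance in favor, frequency of expected-but-absent findings against) follows the description above.

```python
def score(disease, positives, negatives, evokes, manifests, importance):
    """Credit explained positive findings; penalize expected findings found absent."""
    credit = sum(
        evokes[m][disease] + importance.get(m, 0)
        for m in positives
        if disease in evokes.get(m, {})
    )
    penalty = sum(
        manifests[disease][m]
        for m in negatives
        if m in manifests.get(disease, {})
    )
    return credit - penalty


# Hypothetical knowledge base fragments and case data.
evokes = {"f1": {"a": 3, "b": 1}, "f2": {"a": 2}}
manifests = {"a": {"f1": 4, "f2": 3, "f3": 5}, "b": {"f1": 2, "f4": 2}}
importance = {"f1": 2, "f2": 4}
positives = {"f1", "f2"}   # findings observed in the case
negatives = {"f3"}         # pertinent negatives entered for the case

scores = {d: score(d, positives, negatives, evokes, manifests, importance)
          for d in manifests}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Here hypothesis `a` unifies both elementary tasks and so outranks `b`, even after being penalized for the absence of `f3`, which it would be expected to produce.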

Given a ranked list of disease hypotheses, a synthesized task definition is then formulated on the basis of the most highly rated of these items, using the following heuristic criterion: two disease entities are considered to be alternatives to one another (hence part of the same task definition) if, taken together, they explain no more of the observed findings than are explained by one or the other separately.(14)

The set of alternatives so determined, with scores within a fixed range of the top ranked disease hypothesis on the list, are then composed into a differential diagnostic task, which becomes the focus of problem-solving attention.
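The partitioning criterion and the composition of the task can be sketched with set operations. This is one reading of the criterion, offered as an assumption: two diseases are treated as alternatives when the union of the observed findings they explain is no larger than what one of them explains alone. The names, scores, and the `score_range` parameter are hypothetical.

```python
def are_alternatives(a, b, explains, observed):
    """True if a and b together explain no more than one of them does alone."""
    by_a = explains[a] & observed
    by_b = explains[b] & observed
    together = by_a | by_b
    return together == by_a or together == by_b


def compose_task(scores, explains, observed, score_range):
    """Top-ranked hypothesis plus its alternatives scoring within score_range."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    if not ranked:
        return []
    top = ranked[0]
    return [
        d for d in ranked
        if scores[top] - scores[d] <= score_range
        and (d == top or are_alternatives(top, d, explains, observed))
    ]


# Hypothetical case: c explains a finding that a does not, so it is treated
# as complementary to a (possibly a second problem) rather than a competitor.
observed = {"f1", "f2", "f3", "f4"}
explains = {"a": {"f1", "f2", "f3"}, "b": {"f1", "f2"}, "c": {"f4"}, "d": {"f1"}}
scores = {"a": 50, "b": 45, "c": 48, "d": 10}

task = compose_task(scores, explains, observed, score_range=10)
```

With these data the task contains `a` and `b` only: `c` is highly ranked but complementary, and `d` is an alternative whose score falls outside the range.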

The program then selects questions that will help to discriminate among entities in the problem set. There are three major strategies that may be employed in selecting questions to be asked and identifying tests to be performed. When the set of decision alternatives contains five or more elements, a RULEOUT strategy is invoked, whereby the program seeks negative findings that will help to disconfirm one or more elements in the set. If the decision set contains between two and four elements, the program fixates on the leading two contenders and seeks information that will help separate them in score; this is referred to as a DISCRIMINATE information acquisition strategy. When only one alternative remains in the decision set, the program embarks on a PURSUING strategy, whereby it seeks confirmatory data that will extend the separation of the leading contender from its nearest competitor above a given threshold value.
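The choice among the three strategies depends only on the size of the decision set, which can be written out directly. The function name is an assumption; the size boundaries are those stated above.

```python
def select_strategy(decision_set_size):
    """Map the number of alternatives in the decision set to a questioning strategy."""
    if decision_set_size >= 5:
        return "RULEOUT"        # seek negative findings to disconfirm members
    if 2 <= decision_set_size <= 4:
        return "DISCRIMINATE"   # separate the two leading contenders
    if decision_set_size == 1:
        return "PURSUING"       # seek confirmation of the sole contender
    raise ValueError("decision set must contain at least one alternative")


strategies = [select_strategy(n) for n in (1, 2, 4, 5, 12)]
```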

Once the program has inquired about the selected information items and acquired additional positive or negative patient data, it then re-evaluates all diseases evoked (whether in or out of the current problem focus) on the basis of new information obtained, and then reformulates the differential diagnosis. Depending on which disease entity emerges as most highly rated on successive iterations of the process, the focus of attention may shift from one diagnostic task to another--but at any one time, there is a single problem under active consideration.

Whenever a diagnostic task becomes solved, its result is entered into a list of concluded diagnoses, all manifestations explained by that disease are marked "accounted for," and the process recycles until all problems present in the case have been uncovered and solved. Because of causal, temporal, or other interrelationships, certain combinations of disease entities are more likely to occur than others. This fact is recognized in INTERNIST-I by the scoring algorithm, which, on each iteration of the process, gives additional weight to any disease entity that is in any way linked to some already concluded disease.

In the course of gathering additional information, it sometimes happens that the program runs out of questions deemed useful for the strategy being pursued. If this happens while engaged in a PURSUING strategy, the program renders a peremptory conclusion even though the desired threshold of separation has not been achieved. If the program exhausts its list of useful questions during a DISCRIMINATE or RULEOUT phase, however, it performs what we refer to as a "deferral." At this point, the set of findings used to define the current differential diagnostic task are set aside for the time being (as though a positive decision had been made that some unknown member of that decision set is actually correct) so that some other decision problem might be brought into consideration. The hope, often borne out in practice, is that by solving a second--possibly related--decision problem, the program might then go back to the deferred differential diagnostic task and make somewhat more progress in its resolution. 
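The behavior on running out of questions can be sketched as a small piece of bookkeeping. Everything here is an assumed representation rather than the program's own: the `CaseState` record, the function names, and the rule that the findings set aside on deferral are those still-unexplained findings that some member of the task can explain.

```python
from dataclasses import dataclass, field


@dataclass
class CaseState:
    unexplained: set                      # findings not yet accounted for
    concluded: list = field(default_factory=list)
    deferred: list = field(default_factory=list)


def on_questions_exhausted(state, strategy, task, explains):
    """Conclude peremptorily when PURSUING; otherwise defer the current task."""
    if strategy == "PURSUING":
        leader = task[0]
        state.concluded.append(leader)
        state.unexplained -= explains[leader]   # mark findings "accounted for"
        return "concluded"
    if strategy in ("DISCRIMINATE", "RULEOUT"):
        covered = set().union(*(explains[d] for d in task)) & state.unexplained
        state.deferred.append((tuple(task), frozenset(covered)))
        state.unexplained -= covered            # set aside for the time being
        return "deferred"
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical case with two separable problems.
explains = {"a": {"f1", "f2"}, "b": {"f1"}, "c": {"f3"}}
state = CaseState(unexplained={"f1", "f2", "f3"})
first = on_questions_exhausted(state, "DISCRIMINATE", ["a", "b"], explains)
second = on_questions_exhausted(state, "PURSUING", ["c"], explains)
```

After the second call the deferred task remains on record, so a fuller sketch could restore its findings and resume work on it once the related problem has been concluded.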

By virtue of this deferral mechanism, INTERNIST-I exhibits a limited capability for concurrent problem-formation and problem-solving, an important characteristic of human diagnostic behavior about which more will be said later in the critique of INTERNIST-I performance.

Example of INTERNIST-I Case Analysis

The following is a transcript of the interaction that took place during a recent INTERNIST-I case study. The data were taken from records of a patient admitted to the hospital with a severe acute febrile illness, which was correctly diagnosed by the house staff as systemic leptospirosis. Not a particularly challenging diagnostic problem, this case was selected for inclusion in this chapter because it illustrates a number of important aspects of the INTERNIST-I approach.

The lengthy transcript is linked here.

Critique: Key Elements of the INTERNIST-I Logic

Based on our observation of clinicians interacting with this model of diagnostic logic, we have come to understand what are the most important features of the model. The following elements have been found to affect in significant ways both the diagnostic behavior of the system and its potential acceptability to the intended user community.

Focus of Attention on the Most Highly Scored Hypothesis and Its Competitors. This feature has both advantages and disadvantages. Most of the time, this heuristic focusing scheme tends to single out for consideration that subset of diseases which can account for the most important patient data. Among those selected hypotheses, the more favored are the more common, other things being equal; this results from the use of evoking-strength weights of each observed manifestation for each disease hypothesized, and also the global importance weight of each finding, in the scoring of disease hypotheses.

The scoring mechanism does not always lead to an appropriate task definition, however, as the procedure can be sensitive to the preponderance of data as well as the more relevant measures of specificity and importance. Thus it sometimes happens that important, specific data are "disregarded" by INTERNIST's problem focusing heuristic while less significant facts of the clinical problem are selected for investigation on the basis of a large volume of data, much of which might be of limited importance.

This brings us to consideration of the second most significant aspect of the INTERNIST-I logic, namely:

Sequential Problem Formation and Problem Solving. The principal advantage of the single-problem focus of INTERNIST-I is the simplicity of control this permits. By virtue of its frequent reformulation of the task definition, INTERNIST-I exhibits a responsive "problem-hunting" behavior that almost always converges, eventually, on the appropriate conceptualization of a clinical problem.

The only difficulty with this approach is that in complex cases, the program often begins its analysis by considering wholly inappropriate tasks, on which it may spend an inordinate amount of time. This rarely leads to a false conclusion, but does prolong the sessions of terminal interaction unnecessarily. This phenomenon also dictates the use of a very conservative approach to the patient workup, as more aggressive strategies might seek costly items of information pertaining to an ill-advised initial conceptualization of a clinical problem.

Although it has the capacity to deal with multiple concurrent problems in due course, because of its sequential task formulation and decision making strategy, INTERNIST-I is generally not conscious at the outset of evidence concerning any problem other than the one initially formulated. This phenomenon is exemplified by the first few rounds of task definition in the preceding case analysis. Most clinicians reviewing the initial data would probably characterize the case as an acute febrile illness having hepatic, renal and systemic involvement. Given the further observation that the patient had recently been exposed to small wild animals, it is highly probable that most experienced physicians would "put it all together" and consider the possibility of leptospirosis involving multiple organ systems from the outset.

Unfortunately, one of the things that limits INTERNIST-I to the sequential formation and consideration of problems is also one of the features of the system most responsible for its robust behavior, namely:

The ad hoc Formation of Problem Structure. Although the INTERNIST-I knowledge base includes a hierarchy of disease categories which was originally intended to allow decision making to proceed from general characterization of disease process (via descriptors corresponding to higher-level nodes of the hierarchy) to the more specific characterizations corresponding to terminal level disease states, in practice this category structure is not effectively used to develop problem foci during the course of an INTERNIST-I analysis. Instead, as described previously, the program uses a heuristic partitioning algorithm in order to group together those diagnostic entities "thought" to be alternatives to one another.

An example might help clarify the nature of the difficulty involved in attempting to establish milestones in a diagnostic workup on the basis of categorical judgments concerning nodes of the disease hierarchy. Assume that a patient presents with jaundice, pruritus, light colored stools, and on laboratory evaluation is found to have significantly elevated alkaline phosphatase--in short, a fairly typical pattern of cholestatic jaundice. Despite the fact that the INTERNIST hierarchy of disease categories has a node called "cholestasis," it would not do for a program to conclude (or even entertain as a very serious hypothesis) that the patient's real problem is to be found among the diseases catalogued under the "cholestasis" node.

The reason for this is that under the heading "cholestasis" are recorded only those disease entities whose predominant mode of presentation is cholestatic. Other diseases that may present variously with some degree of cholestasis are scattered throughout the "liver disease" subtree and other regions of the hierarchy; e.g., alcoholic hepatitis resides under "hepatocellular inflammation" (its predominant facet), primary biliary cirrhosis is classified under "hepatic fibrosis," infectious mononucleosis and toxoplasmosis are structured under "infectious lymphadenopathies," etc.

As the INTERNIST-I hierarchy requires that each disease be recorded in one and only one place, the partitioning procedure was devised to group diagnoses when the basis for aggregation is other than the one used in structuring the hierarchy. Thus, whenever it postulates a differential diagnostic task, INTERNIST-I often gathers together disease entities from throughout the hierarchy, without regard for the a-priori groupings recorded there.

Limitations of the INTERNIST-I Decision Strategy. Experience has shown that the INTERNIST-I policy of decision making relative to a postulated task definition is one of the great strengths of the system. Diagnostic decisions are rendered whenever sufficient separation is obtained for the leading hypothesis, as compared with its putative competitors. Thus, provided an appropriate differential diagnosis set has been identified, it is possible for the program to come to the correct diagnosis by ruling out all but one of the diagnoses in the set, then recording the remaining contender as its default judgment. In this fashion, the program often manages to solve difficult clinical problems even in the absence of clinching data (obtainable only by biopsy or autopsy, perhaps) that are unavailable at the time of the workup.

The main difficulty with this decision strategy is that the program does not, and cannot in its present implementation, consciously question the validity of the postulated problem focus in the context of which its decision making process is being carried out. As mentioned previously, the program often generates incorrect task definitions at the outset, but the relative ease and frequency with which the problem focus is reformulated generally guarantees that eventually an appropriate decision problem will emerge. There is nothing in the logic to ensure that this will happen, however, and this sometimes leads to incorrect diagnostic results.

The reason that the problem cannot be dealt with effectively within the INTERNIST-I framework is the ephemeral nature of the task conjectures employed there. These ad hoc constructs have no substance outside the context of the particular case in which they emerge. Although clinicians viewing a decision problem formulated by INTERNIST-I may have no difficulty attaching an appropriate clinical label, and may even have very specific ideas about the proper way to work up such a problem, INTERNIST-I has no access to such problem-specific advice.

The only way to make such clinically relevant knowledge available to the program would be to incorporate new knowledge structures containing explicit information about well structured differential diagnostic tasks, the disease entities they comprise, criteria useful in ruling alternatives either in or out, and specific advice where useful concerning the decision process that is to be followed relative to the postulated diagnostic task.

Inability to Test Conditional Attributions in the Associative Knowledge Structure of INTERNIST-I. Finally, there is a fairly serious problem that derives from the fact that in the INTERNIST-I knowledge base, much of the detail has been "compiled out," especially concerning the mechanisms by which particular abnormal findings happen to occur in the various diseases with which they are associated. It is not uncommon, therefore, to find within the same disease profile references to manifestations that indicate presence of a particular pathological condition, and also references to findings that might be expected to occur in the given disease only if that pathological state obtains. For example, the profile of "sinusoidal portal hypertension" includes the finding of "esophageal varices" (a type of abnormal vasculature in the walls of the esophagus), and also the finding of "hematemesis" (vomiting of blood), which can be explained by the portal hypertension only if the patient does indeed have varices.

This is a serious problem for several reasons. At times, it can cause the program to give inordinate weight to a disease hypothesis because of a presumption concerning its ability to explain data, which a careful pathophysiological analysis would show to be incorrect. This can lead to inaccurate decisions. Moreover, even when it does not foster an erroneous result, mistaken attribution can prevent INTERNIST-I from pursuing additional diagnoses because it "thinks" everything has been adequately explained, when in fact this is not the case.
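The contrast between unconditional and conditional attribution can be made concrete with a minimal sketch. It is not the INTERNIST-I representation; the dictionary layout and the function names `explained_flat` and `explained_conditional` are assumptions introduced here for illustration, and only the portal hypertension example itself comes from the text.

```python
# Flat profile (illustrative): every finding listed in the disease profile
# is treated as explainable by the disease, with no intermediate conditions.
FLAT_PROFILE = {
    "sinusoidal portal hypertension": {"esophageal varices", "hematemesis"},
}

# Conditional profile (hypothetical structure): each finding maps to the
# intermediate state that must be established before the attribution is
# allowed, or to None when no such precondition applies.
CONDITIONAL_PROFILE = {
    "sinusoidal portal hypertension": {
        "esophageal varices": None,
        "hematemesis": "esophageal varices",
    },
}


def explained_flat(disease, findings):
    """Findings the disease is presumed to explain under the flat profile."""
    return set(findings) & FLAT_PROFILE[disease]


def explained_conditional(disease, findings, established_states):
    """Findings the disease explains once required states are checked."""
    profile = CONDITIONAL_PROFILE[disease]
    return {
        f
        for f in findings
        if f in profile
        and (profile[f] is None or profile[f] in established_states)
    }


disease = "sinusoidal portal hypertension"
# Without established varices, the flat profile still credits the disease
# with explaining hematemesis; the conditional profile does not.
print(explained_flat(disease, {"hematemesis"}))
print(explained_conditional(disease, {"hematemesis"}, set()))
print(explained_conditional(disease, {"hematemesis"}, {"esophageal varices"}))
```

In this sketch the flat profile overstates what the hypothesis explains whenever the intermediate state has not been established, which is the mistaken attribution described above.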

Another problem that results from the commingling of manifestations from many sources in a master disease profile is that this prevents the incorporation in INTERNIST-I of an effective time line, for assessing the fit of a patient's illness with the temporal pattern of a disease. For adequate characterization of many disease processes, it is essential to be able to express temporal constraints with respect to the various facets of the illness; e.g., those elements comprising the prodrome should precede those of the active stage of the disease, otherwise the pattern does not fit. To enforce such conditions on the attribution process would require the ability to make judgments concerning clinical and pathological descriptors which can be said to be true of the patient at certain points in time.

The problem with the INTERNIST-I representation is that there is not a convenient way to make the attribution of a finding to a disease conditional on the occurrence (or time of occurrence) of an intermediate state, without making the latter an explicit node in the causal graph. But as we have already seen, the incorporation of additional intermediate states tends to obscure the potential for discovering unifying hypotheses, because of the sequential nature of the INTERNIST-I problem formulation and problem solving procedure.

We shall return to this point in more detail in the following sections, but for now the experience with INTERNIST-I in this regard can be summed up in the following somewhat paradoxical observation: a diagnostic reasoning program must have access to detailed pathophysiological knowledge in order to permit the test of hypothesized attributions; however, if the program is forced into consideration of the detailed pathophysiology, there is the danger that unifying gestalts may fail to emerge.

Towards a More Highly Structured Knowledge Base

One of the major lessons learned from our experience with INTERNIST-I is that it is necessary for the program to be able to make decisions with respect to descriptors that partially constrain the space of alternatives; these decision points constitute loci of well structured sub-problems about which special procedures and other advice concerning decidability, significance, treatability, and other such matters may be organized.

In particular, it should be possible for the program to establish milestones in the workup of a clinical problem by making judgments concerning nosological categories into which the illness falls. Also, as is presently the case in a somewhat limited form, the program should be able to make judgments concerning pathological descriptors.

The design of a successor to INTERNIST-I that would deal with these issues has been under investigation for some time. In Pople [17], the general outline of an approach was provided. The idea proposed there was to exploit what came to be called "constrictor" relationships in the data that allow decisions to be made with respect to pathological and nosological descriptors, thereby setting in motion a number of parallel or concurrent differential diagnostic analyses.

Initially, it was hoped that the necessary constrictor patterns could be extracted from the original INTERNIST-I knowledge structure using heuristic mapping rules. This has not proved to be the case, however, for two reasons: use of a strict hierarchy in INTERNIST-I for defining categories of diseases, which has served to make diffuse what would otherwise be strong constrictor relations; and elimination of pathophysiological detail which cannot be restored without a considerable investment of medical expertise. Both of these call for major revision to the structure of medical knowledge underlying the INTERNIST program.

In this section, we review the major knowledge structures employed in clinical reasoning. We consider first a pure "causal" model of medical knowledge, detailing both the benefits provided by such a structure and the problems encountered in using this for conceptualizing the overall picture of what might be happening in a complex case.(15)

We then consider a nosological structure, which provides for better general characterization of a clinical problem but, as noted previously, fails to provide sufficient detail of pathophysiology to enable critical evaluation and justification of hypotheses. Finally, we show how these two approaches can be combined to achieve a synergistic blend, yielding much of the benefit provided by each of the knowledge structuring principles. Using this combined knowledge structure, a general approach to the problem of task synthesis is then provided.

The Causal Network

In a typical pathophysiological model, manifestations of disease are organized on the basis of the pathological states giving rise to such observations, and these states are in turn organized into a causal network (a fragment of such a network is illustrated in Figure 2).

Fig. 2.  A portion of a causal network, which depicts the attribution pathways by which manifestations of disease can be accounted for in pathophysiological terms.

In this graph, arcs (links) are marked by arrowheads in the "caused-by" direction; thus, the links emanating from the node "jaundice" indicate that jaundice can be caused by either conjugated or unconjugated hyperbilirubinemia (a condition characterized by elevated levels of bilirubin in the blood). In order to emphasize the incompleteness of the graph, most of the nodes have dashed arrows coming from them which are to be interpreted as "and so forth." In the few cases where such ellipses are not shown, it should be understood that the set of causes explicitly shown constitutes an exhaustive set; e.g., in the case of "jaundice," and of "caput medusae."(16)

Nodes in the network having no successors in the "caused-by" relation represent specific disease entities (e.g., "common duct stone," "hepatitis," "micronodal cirrhosis," etc.); those having no predecessors (e.g., "jaundice," "pallor," "caput medusae," etc.) stand for possible findings: symptoms, signs, abnormal lab values, etc. In between these extreme points of the network lie chains of nodes representing syndromes, various pathological derangements, and other entities that are significant facets of the disease process.

One of the major uses of such a knowledge structure is as a basis for aggregating elements in a differential diagnosis, thereby reducing the apparent number of alternatives to be considered at any one decision point. We refer to a differential diagnosis comprising pathological descriptors, which in turn point to further underlying causal diagnoses, as a "refined" differential. This is in contrast to the "raw" differential diagnosis, which for any finding can be found by compiling a list of all disease entities that appear as terminal nodes anywhere to the right of the point of entry of the finding into the causal graph.

Thus, the raw differential diagnosis for "jaundice," based on the simplified graph of Figure 2, would encompass the set of alternatives: ("common duct stone"? or "biliary cirrhosis"? or "Gilbert's disease"? or "hemolytic anemia"? or "viral hepatitis"? or "macronodal cirrhosis"? or "micronodal cirrhosis"? or "hepatic vein obstruction"?) However, the "refined" differential, based on descriptors of the causal graph, would encompass only the two items ("conjugated hyperbilirubinemia"? or "unconjugated hyperbilirubinemia"?). The advantage of such a refined differential diagnosis is the opportunity it provides for establishing milestones in a diagnostic workup. The presence of pathological states represented by many of the nodes in the network can often be suggested and even confirmed on the basis of fairly commonplace clinical data. This is typically not true of the more remote (i.e., less observable) disease entities that inhabit the far reaches of the causal net, which often require for their confirmation some definitive, often costly procedure such as biopsy, arteriography, CAT scan evaluation, etc.
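The distinction between the raw and the refined differential can be sketched as two operations on a "caused-by" graph. The fragment below is a minimal illustration, not a transcription of Figure 2: the two causes of "jaundice" and the two causes of conjugated hyperbilirubinemia follow the text, but the placement of the individual disease entities under the intermediate nodes is an assumption made here so that the example is self-contained. The function names are likewise hypothetical.

```python
# Illustrative "caused-by" adjacency list. Edges below the two
# hyperbilirubinemia nodes are ASSUMED for the sake of the example.
CAUSED_BY = {
    "jaundice": [
        "conjugated hyperbilirubinemia",
        "unconjugated hyperbilirubinemia",
    ],
    "conjugated hyperbilirubinemia": ["cholestasis", "hepatocellular dysfunction"],
    "unconjugated hyperbilirubinemia": ["Gilbert's disease", "hemolytic anemia"],
    "cholestasis": ["common duct stone", "biliary cirrhosis"],
    "hepatocellular dysfunction": [
        "viral hepatitis",
        "macronodal cirrhosis",
        "micronodal cirrhosis",
        "hepatic vein obstruction",
    ],
}


def refined_differential(node):
    """Immediate causes of a node: the next decision to be made."""
    return list(CAUSED_BY.get(node, []))


def raw_differential(node):
    """All disease entities (nodes with no further causes) reachable from node."""
    seen, stack, terminals = set(), [node], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        causes = CAUSED_BY.get(current, [])
        if not causes and current != node:
            terminals.add(current)
        stack.extend(causes)
    return terminals


print(refined_differential("jaundice"))      # two pathological descriptors
print(sorted(raw_differential("jaundice")))  # eight disease entities
```

Applying `refined_differential` again to whichever descriptor is confirmed yields the next decision problem, so that the list of alternatives under consideration at any one point stays short.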

In dealing with the pathological descriptors of a causal network, it is often the case that manifestations which are not diagnostic as to the decision among a subset of disease entities, are nonetheless sufficient to permit conclusions concerning intermediate pathological states. We refer to such one-to-one associations between manifestations and nodes of the network representing partial descriptions of a patient's illness as "constrictor" relationships, as they serve to constrict the range of alternatives in a decision set that will need to be pursued further.(17)

For example, if a patient is observed to be jaundiced, the initial decision suggested by the graph would be to choose between conjugated and unconjugated hyperbilirubinemia.(18) This can generally be decided by determining the so-called indirect to direct ratio, which if it is elevated suggests a preponderance of unconjugated bilirubin in the blood, or if low the converse. Assume that the result of this test indicates that the patient's jaundice is due to an excess of conjugated bilirubin in the blood. Then a new decision problem arises: to make a choice between "cholestasis" and "hepatocellular dysfunction" as cause of the conjugated hyperbilirubinemia. For this purpose, assays of liver enzymes (simple laboratory values often obtained on routine blood screening) may be able to settle the issue.

As this example shows, the confirmation of an intermediate node of the network often signals the resolution of one (refined) differential diagnostic task, and the initiation of another: namely, the task of identifying which of the disease entities or pathological states causally linked to the concluded node is actually responsible for the pathological condition that has been decided.

Such decisions can become "propagated" through the network by means of a sequence of choices among the immediate causes of each concluded pathological state. By proceeding in this systematic fashion, the search for an underlying cause can often be progressively constrained on the basis of relatively easy to acquire clinical or paraclinical data. This sequential decision making process can often serve to postpone the necessity for undertaking costly, invasive procedures in the search for pathognomonic data until the set of diagnostic alternatives has been reduced to a small number.

One of the difficulties encountered in using a detailed causal network is that the traversal of some nodes entails decisions that are difficult if not impossible to make, without engaging in some degree of "look ahead." For example, in Figure 2 the finding of "severe arterial hypotension" is associated with a refined differential diagnostic task encompassing three forms of shock. As it happens, the type of shock is determined, by definition, on the basis of the type of pathologic condition causing it. Thus it would be virtually impossible to solve the decision problem posed by the finding of severe hypotension without looking past the shock nodes so as to bring into consideration such underlying causes as myocardial infarction (a cause of cardiogenic shock), blood loss (a cause of hypovolemic shock) or bacteremia (a cause of pyrogenic shock).

Another difficulty encountered in using a detailed causal network is that the raw differential diagnostic tasks evoked by two or more abnormal findings might have common elements (at the level of disease entities, say) which are not easy to detect while focusing attention on the refined differential diagnosis. Thus the use of intermediate pathological descriptors to structure the differential diagnosis, while serving to reduce the cognitive load associated with processing the alternatives evoked by a single abnormal finding, actually makes more difficult the perception of opportunities for combination of task definitions when two or more abnormal findings are involved.

An example may help clarify the issue. Consider again the graph of Figure 2 and assume that a patient has been observed to exhibit both pallor and jaundice. Here, chains of causal links emanating from "jaundice" and "pallor" come together at the node "hemolytic anemia"; other chains of links converge on the three "cirrhosis" nodes, and on "hepatic vein obstruction."

Note that the nodes representing possible underlying causes of the observed findings are as many as five links removed from the point of entry of the findings in this simplified graph; in a more complete causal graph the distance could be a dozen links or more.(19)

In order to discover the potential to combine differential diagnoses on the basis of common causal nodes that are far removed from the starting nodes, a program would have to incorporate a search procedure that would scan the subgraphs emanating from each of the nodes selected as starting differential diagnoses, looking for points of convergence of the causal chains arising there. The problem with this is that the discovery procedure would entail multiple exponential searches of the causal network, which--depending on the level of resolution of the graph--could be extremely costly in terms of computation time, and the expenditure of other scarce resources such as physician waiting time.
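A search of the kind just described can be sketched on a toy graph. The edges below are assumptions chosen to reproduce the convergence pattern mentioned for "jaundice" and "pallor"; they are not a transcription of Figure 2, and the function names are hypothetical. The sketch scans the subgraph reachable from each finding and reports the disease entities common to all of them, together with their distance in links from each finding. On a graph this small the scan is trivial, so the example says nothing about the cost of the search on a network of realistic resolution.

```python
from collections import deque

# Illustrative "caused-by" edges (ASSUMED, for demonstration only).
CAUSED_BY = {
    "jaundice": [
        "conjugated hyperbilirubinemia",
        "unconjugated hyperbilirubinemia",
    ],
    "conjugated hyperbilirubinemia": ["cholestasis", "hepatocellular dysfunction"],
    "unconjugated hyperbilirubinemia": ["Gilbert's disease", "hemolytic anemia"],
    "cholestasis": ["common duct stone", "biliary cirrhosis"],
    "hepatocellular dysfunction": [
        "viral hepatitis",
        "macronodal cirrhosis",
        "micronodal cirrhosis",
        "hepatic vein obstruction",
    ],
    "pallor": ["anemia"],
    "anemia": ["hemolytic anemia", "blood loss"],
    "blood loss": ["esophageal varices"],
    "esophageal varices": [
        "sinusoidal portal hypertension",
        "postsinusoidal portal hypertension",
    ],
    "sinusoidal portal hypertension": [
        "biliary cirrhosis",
        "macronodal cirrhosis",
        "micronodal cirrhosis",
    ],
    "postsinusoidal portal hypertension": ["hepatic vein obstruction"],
}


def terminal_depths(finding):
    """Map each reachable disease entity to its minimum distance in links."""
    depths = {finding: 0}
    queue = deque([finding])
    terminals = {}
    while queue:
        node = queue.popleft()
        causes = CAUSED_BY.get(node, [])
        if not causes and node != finding:
            terminals[node] = depths[node]
        for cause in causes:
            if cause not in depths:
                depths[cause] = depths[node] + 1
                queue.append(cause)
    return terminals


def convergence_points(*findings):
    """Disease entities reachable from every finding, with per-finding depths."""
    reachable = [terminal_depths(f) for f in findings]
    shared = set(reachable[0]).intersection(*reachable[1:])
    return {d: tuple(r[d] for r in reachable) for d in shared}


for disease, depths in sorted(convergence_points("jaundice", "pallor").items()):
    print(disease, depths)
```

In this toy graph the cirrhosis nodes lie five links from "pallor," which illustrates how a program confined to the refined differential at each step would not bring the shared cause into view until late in the workup.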

In the following section, we turn our attention to consideration of an alternative knowledge structure, which can also serve to refine the elementary differential diagnoses associated with individual findings, but without complicating the discovery of synthesis opportunities as severely as in the case of a causal network.

The Taxonomy of Disease

As an alternative (or perhaps adjunct) to the detailed causal network, a taxonomy of disease categories can be used to develop partial characterizations of a clinical problem. Use of such a hierarchical structure--in medicine referred to as a nosology--would enable the development of differential diagnoses in a top down fashion, with higher level nodes of the hierarchy acting, like nodes of the causal network, as milestones in the diagnostic process. This structure has the inherent advantage of permitting the conceptualization of a clinical problem to be formulated in the most general terms consistent with the data, with refinement of the concept taking place as additional evidence is developed.

One of the lessons learned from our experience with INTERNIST-I is that it is too restrictive to require that a nosology be organized as a strict hierarchy, for there is no one right hierarchy of disease categories. The value of any such knowledge structure depends on the use for which that structure is intended, and for purposes of assisting in the diagnostic reasoning process, the value of any given hierarchic structure is dependent on the availability of strong diagnostic relationships between possible abnormal findings and the descriptive categories of the hierarchy. A structure is useful if it can be exploited to make explicit whatever significant diagnostic implications there may be contained in the patient data.

One basis for structuring a nosology would be around the concept of organ system involvement, with the highest level categories of the hierarchy comprising such descriptors as "hepatobiliary involvement," "cardiovascular involvement," "renal involvement," etc. Each of these could be further subdivided into more specific categories that would define more precisely the nature of the involvement. In the liver area for example, subcategories of "hepatobiliary involvement" might be "hepatocellular involvement," "hepatic vascular involvement," "biliary tract involvement," etc. (See Figure 3a.)

Fig. 3.  Portions of the nosologic "tangled hierarchy" of disease categories, organized on the basis of (a) organ system involvement, and (b) etiology.  (Blank cells denote one or more omitted nodes.)

In this, we have intentionally deviated from the terminology used in INTERNIST-I, where the nosology was defined in terms of categories of disease. Here, we speak of categories of involvement; thus, any given disease can be classified in as many descriptive categories of the nosology as are appropriate.(20) Figure 3a shows one example of a disease entity that is multiply classified with respect to the involvements hierarchy: biliary cirrhosis, which is a form of intrahepatic biliary tract involvement, is also classified as a form of hepatic fibrosis. 

Note that there are many other organizing principles that might be employed in structuring a taxonomy of disease types. An alternative to the organ system orientation of the preceding discussion would be a nosology based on etiological considerations. Such a structure might have as its most general categories such descriptors as "infection," "cancer," "abnormal immunity," etc. These categories may in turn be subdivided into more specific classifications; e.g., the "infection" node might have sub-categories such as "viral infections," "bacterial infections," and "fungal infections." (See Figure 3b.) The potential for multiple classification of diseases is also demonstrated in this figure, where "viral hepatitis" and "infectious mononucleosis"--previously classified in the taxonomy of Figure 3a as inflammatory hepatocellular disease--are here additionally characterized as viral infections.

Like the causal graph, a hierarchy of disease categories can be used as a basis for aggregating elements in a differential diagnosis, thereby reducing the apparent number of alternatives to be considered at any one decision point. For this purpose, we introduce a new type of link, referred to as a "generalized link" or "planning link": within any major sub-category of the nosology in which a manifestation M is to be causally associated with one or more nodes, a planning link is used to identify the most specific nosological descriptor (i.e., the lowest level, though typically non-terminal node) that subsumes all other nodes in that category which are causally linked with the given finding M. This can be interpreted as a graphical expression of existential quantification.

The advantage of the planning link structure is that it enables formulation of parsimonious "refined" differential diagnoses--in many cases permitting a single category to be selected as the scope of a manifestation (in which case, the relationship is that of constrictor). For example, the finding "liver enlargement," whatever else it may connote, surely suggests hepatobiliary involvement, and is a constrictor for this nosologic descriptor, based on a planning link association.

The use of planning links is illustrated in Figure 4. Here, the finding "jaundice"--which was shown in the previous section to be associated with a large number of diseases of the hepatobiliary system, as well as certain hematological disorders--has been linked into the hierarchy of organ system involvements at the top level of the hepatobiliary subtree (via planning link PL1) and at the level of hemolytic anemia in the hematological subtree (link PL2). Similarly, "pallor" has been linked into the hepatobiliary subtree at the level of "fibrotic hepatocellular involvement" (link PL3) and into the hematologic subtree (link PL4) at the level of "anemia," undifferentiated as to type.

Fig. 4.  Planning links (e.g., PL1) from manifestation nodes to superordinate nodes of the nosology connote the existence of one or more direct causal pathways (here L1, L2, & L3) directed to subordinate nodes.

For purposes of illustration, let us assume that jaundice can be caused by virtually any type of hepatobiliary involvement except in the area of vascular involvement, where the only associated descriptor is assumed to be hepatic vein obstruction. Thus the planning link PL1 can be considered to be an abstract summarization of the set of relations shown in the figure as links L1, L2, and L3; conversely, the latter may be regarded as instances, or possible "instantiations" of the general link PL1.

The set of possible instantiations of a planning link can comprise either additional lower level planning links, or as in this case "direct links." Note that links L1 and L2, which impinge on the nosologic structure at non terminal nodes, are not considered to be of the same genre of generalized link as PL1. This is because of the stated assumptions, which asserted that jaundice was to be considered compatible with all types of hepatobiliary involvement except for those in the vascular area. Thus, the links connecting "jaundice" to "hepatocellular involvement" (L1) and to "biliary tract involvement" (L2) are expressions of universal quantification. As in the case of the link structures employed in INTERNIST-I, these direct links are attached to a higher level node of the nosology if and only if they apply to all of the subordinate nodes as well.

The "raw" differential diagnosis associated with a finding would be found by compiling a list of all disease nodes that appear as terminals in the tree structures below the point of entry of direct links from the finding into the hierarchy. Thus the raw differential for jaundice would include the same set of disease entities as mentioned previously in our discussion of the causal graph. However, the refined differential for "jaundice" would encompass just the two items identified by the planning links, i.e. ("hepatobiliary disease"? or "hemolytic anemia"?).(21)
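The relation between a planning link and its direct-link instantiations can be sketched as a "most specific common subsumer" computation. The following is a minimal illustration under simplifying assumptions: it treats a single subtree of the nosology as a strict tree (ignoring the multiple classification permitted in the tangled hierarchy), it encodes only the nodes named in the text, and the names `PARENT`, `DIRECT_LINKS`, and `planning_link` are hypothetical.

```python
# Child -> parent relation for a fragment of the organ-system nosology.
# ASSUMED to be a strict tree here, purely to keep the sketch short.
PARENT = {
    "hepatocellular involvement": "hepatobiliary involvement",
    "hepatic vascular involvement": "hepatobiliary involvement",
    "biliary tract involvement": "hepatobiliary involvement",
    "hepatic vein obstruction": "hepatic vascular involvement",
    "fibrotic hepatocellular involvement": "hepatocellular involvement",
}

# Direct links from a finding into this subtree (L1, L2, L3 in the text).
DIRECT_LINKS = {
    "jaundice": [
        "hepatocellular involvement",   # L1
        "biliary tract involvement",    # L2
        "hepatic vein obstruction",     # L3
    ],
}


def ancestors(node):
    """The node followed by its successive parents, up to the subtree root."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def planning_link(finding):
    """Most specific node that subsumes every direct-link target of the finding."""
    chains = [ancestors(target) for target in DIRECT_LINKS[finding]]
    common = set(chains[0]).intersection(*chains[1:])
    # Walking upward from the first target, the first common node
    # encountered is the lowest one in the tree.
    for node in chains[0]:
        if node in common:
            return node
    return None


print(planning_link("jaundice"))  # the target of PL1 in this fragment
```

Under these assumptions the computed target for "jaundice" is the top of the hepatobiliary subtree, in agreement with the placement of PL1 described above; a finding with a single direct link would receive a planning link to that node itself.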

As in the case of pathological descriptors, decisions concerning these nosologic categories can often be made on the basis of constrictor relationships in the data that are not actually pathognomonic in the conventional sense of that term. Proceeding systematically in a top down fashion, it is often possible progressively to narrow the set of feasible solutions to a clinical problem by means of a sequence of decisions, each of which partially constrains the outcome on the basis of available clinical evidence.

Continuing the previous example of a patient with jaundice and pallor, as was mentioned, an observation of liver enlargement would be conclusive evidence of a hepatobiliary component to the patient's illness. If, moreover, it is further determined that the liver edge is hard and finely nodular, the character of hepatic involvement might be more specifically described as hepatocellular and even more precisely as fibrotic.

Such a sequence of decisions would be consistent with explanation of the jaundice and pallor in terms of cirrhosis; but it provides no ability to pin down the mechanism by which these manifestations are caused. Recall that in the discussion of the causal graph of Figure 2, we identified nine different attribution pathways by which pallor could be associated with some form of cirrhosis. With the elimination of pathophysiological detail, we have lost the ability to test the postulated attribution by seeking more specific evidence of the operative mechanism. Thus, while the use of a nosological structure can assist significantly in sharpening the problem solver's focus of attention, reliance on that structure alone prevents effective evaluation of proposed solutions.

In the following we consider ways to combine the pathophysiological and nosological knowledge structures in order to obtain both rapid focusing of attention, and critical assessment of postulated attribution pathways.

Task Structuring with Both Pathological and Nosological Descriptors

To summarize the elements of the model of diagnostic reasoning so far presented, it has been suggested that a diagnostic program could make use of a number of knowledge structures that enable the partial characterization of a patient's illness on the basis of cues in the clinical record. Each such partial description would identify an elementary differential diagnostic task. If based on a decision concerning a node in a causal graph, the differential diagnostic task would be to discover which of the possible causes of the decided pathological condition is in fact the case; in the following discussion, such tasks will be referred to as "causal tasks." If based on a nosological knowledge structure, the differential diagnostic task would be to discover which sub-category of that node in the hierarchy is in fact the correct classification; such tasks will be referred to as "subclassification tasks."

In the discussion of the preceding section, problems have been identified with both of the "pure" problem structuring and problem solving schemes. Use of too detailed a causal graph makes it hard to detect task synthesis opportunities and forces consideration of decision problems that may not, in the final analysis, be judged to have been appropriate. Use of a nosological structure allows for rapid convergence on unified task formulations, but does not allow for detailed examination of postulated attributions.

Throughout the foregoing discussion, one type of diagnostic problem solving procedure was considered that could be used to solve elementary differential diagnostic tasks. The approach contemplated was to explore the immediate successors of any decided node in order to move progressively, by means of a sequence of decisions, from what might start as a very general characterization of a clinical problem to the eventual determination of the precise disease or diseases accounting for the patient's illness. Such a procedure would resemble the branching logic of many contemporary computer based diagnostic programs.

One of the ground rules assumed in the description of these sequential decision making procedures was that one decision problem must be resolved before its successor problems might be undertaken. This had the effect of blinding the diagnostic reasoning program to the potential for synthesizing two or more task invocations on the basis of common nodes, unless those nodes had been brought into conscious consideration as part of a refined differential diagnosis.

One way to achieve somewhat more prescient behavior, while continuing to operate within these ground rules, would be to base the problem structuring component of the system on a knowledge base having both pathophysiological and nosological descriptors. The main advantage of such an enriched knowledge structure is that it would allow greater flexibility in the selection of diagnostic tasks used to negotiate portions of the network. 

This is particularly true in certain cases where individual descriptors can be associated with both causal and subclassification tasks. An example would be the pathological descriptor "portal hypertension," which denotes a state of elevated pressure in the portal vein that supplies blood to the liver. As shown previously in Figure 2, there are three recognized sub-types of portal hypertension: presinusoidal, sinusoidal, and postsinusoidal, where the qualifiers are used to designate the site of the lesion that is responsible for causing the elevated pressure. These sub-types of portal hypertension appeared in Figure 2 as constituents in the differential diagnosis of the finding "caput medusae" and the pathological descriptor "esophageal varices."

These variants of portal hypertension can also be viewed as a differential diagnosis of the subclassification type. This would require introduction of a higher level taxonomic descriptor "portal hypertension," as indicated in Figure 5a. Also shown in the figure is a second differential diagnostic task associated with portal hypertension: namely, the causal task of determining what disease process is responsible for producing the lesion. Among others, this differential diagnosis list would include such descriptors as "portal vein obstruction" (a frequent cause of presinusoidal portal hypertension), "fibrotic hepatocellular involvement" (often responsible for sinusoidal and also occasionally presinusoidal portal hypertension), and "hepatic vein obstruction" (a common cause of postsinusoidal portal hypertension).

 

Fig. 5.  Task constructs associated with (a) Caput Medusae; (b) Jaundice; (c) Pallor; (d) Hematemesis; (e) Arterial Hypotension; (f) Anemia

The advantage of this formulation is that the condition of portal hypertension (unqualified as to type) can often be determined to be present on the basis of a simple clinical observation. This "constrictor" relationship is indicated in the figure by the diamond studded line connecting the finding ("caput medusae") and the "portal hypertension" node. This special type of arc is used to denote what amounts to a diagnostic association, where if the finding is observed, the node pointed to can be concluded with a high degree of certainty.

Note that the finding of a caput medusae, while it permits the conclusion of portal hypertension, does not give any basis for deciding among the sub-types of this pathological condition.

Indeed, to resolve the subclassification task associated with this clinical descriptor would require use of an invasive procedure: i.e. the hepatic vein wedge pressure test that measures the venous pressure inside the liver by means of a catheter wedged into the hepatic vein. Because of the complexity of this test, it is often easier to solve the causal task associated with this descriptor, which in turn may eliminate all but one of the alternatives to be considered in the subclassification task. Sometimes, however, when a quandary arises as to the cause, the diagnostician will undertake the hepatic vein wedge pressure test in order to resolve the subclassification task, thereby constraining the feasible set of alternatives in the causal task.

The significant point about this example is that the introduction of high level taxonomic descriptors can sometimes provide a convenient way to enable a causal reasoning program to skip over what might be a difficult or costly decision problem, now viewed as a subclassification rather than a causal task. To some extent, the incorporation of the additional dimension allows for a foreshortening of the causal pathways, without eliminating the richness of detail that might be of importance in certain decision making situations.

Other pathological and nosological descriptors, many having both causal and subclassification tasks with associated constrictor findings, are illustrated in Figures 5b through 5f.(22) This set of structures captures most of the detail found in the original causal graph of Figure 2. These will be used in the following section to illustrate methods by which two or more such structured tasks may be synthesized on the basis of a set of heuristic combination operators.

Heuristic Operators for Combining Independent Tasks

The One-Step Operators

What distinguishes the general model being described in this chapter is the possibility of having evoked multiple concurrent tasks, possibly derived from a multiplicity of knowledge structures. While it would be possible to proceed as though each evoked task is independent of all the others, and to set in motion a number of parallel invocations of the task solving procedure sketched above, training and experience generally lead the physician to assume, until proven otherwise, that multiple partial descriptions of a clinical problem relate to the same underlying disease process. Therefore, multiple elementary tasks tend to be merged into a small number of unified tasks.

Operationally, the synthesis of two or more independent clinical descriptors requires a method for the combination of their corresponding differential diagnostic task definitions. As suggested previously, the basic operation involved in this process is that of set intersection: i.e., the sets of disease entities (or higher level descriptors) making up the differential diagnoses associated with each of the selected findings can be processed to discover which diagnostic entities occur in all of the selected sets. These common elements constitute the differential diagnosis of the synthesized complex task. 
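The intersection step can be sketched in a few lines of Python. This is only an illustration: the finding names follow the chapter, but the contents of each differential diagnosis list are placeholders rather than the actual lists of Figure 5, and `synthesize` is a hypothetical helper name.

```python
from functools import reduce

# Illustrative differential diagnosis lists (placeholders, not the lists of Figure 5).
DIFFERENTIALS = {
    "jaundice": {"hemolytic anemia", "cholestasis", "hepatocellular dysfunction"},
    "pallor": {"hemolytic anemia", "anemia of chronic disease", "hypovolemic shock"},
}


def synthesize(findings, differentials=DIFFERENTIALS):
    """Return the entities common to the differential of every selected finding."""
    sets = [differentials[f] for f in findings]
    if not sets:
        return set()
    return reduce(set.intersection, sets)


# The synthesized task keeps only the candidates shared by both findings.
combined = synthesize(["jaundice", "pallor"])
```

With these placeholder lists, the combined differential contains the single shared entry, "hemolytic anemia."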

Depending upon the types of descriptors used to define the differential diagnoses being combined, a number of special cases of the set-intersection synthesis operator can be distinguished (see Figure 6):(23)

Fig. 6. A variety of synthesis operators by which multiple task definitions may be combined into unified task complexes.

a) Descriptors P and Q might define sub-classification tasks that are related via some nosological structure; e.g.:

O1) P might be a sub-classifier of Q, in which case we say that P is a specialization of Q. In this case, the result of applying the intersection operator to P and Q is just the descriptor P. 

O2) If neither P nor Q is a specialization of the other, but their differential diagnosis lists have sub-nodes in common(24), then the result of intersection is just the list of common sub-nodes.

b) Descriptors P and Q might define causal tasks that are related via the pathophysiological network structure; e.g.: 

O3) P might describe a state that is a cause of Q. As in O1 above, the result of applying the intersection operator to these two descriptors would be just the descriptor P.

O4) P and Q might not be causally related to one another, but have common causes among elements of their differential diagnosis lists. The synthesized differential diagnosis list would contain all and only these common elements. 

c) Descriptors P and Q might be related through some combination of causal and subclassification tasks; e.g.,

O5) P might be causally linked to one or more sub-classifier nodes of Q. The resulting synthesized differential diagnostic task would be to decide among the selected sub-classifiers of Q. There would also be a reduced causal task associated with P.

O6) P and Q or their sub-classifiers might be causally related to identical nodes, or to nodes that are specializations of one another in some nosology. The causal tasks associated with P and Q resulting from application of the intersection operator would be the most specialized set of common causes. 
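To make the role of specialization concrete, the following Python sketch intersects two differential diagnosis lists while matching descriptors across nosological levels, keeping the more specialized member of each matching pair as described for operator O6. The nosology fragment, the two example lists, and the function names are assumptions made for illustration; they are not taken from Figures 5 or 6.

```python
# Hypothetical fragment of a nosology: child -> parent ("is a kind of").
PARENT = {
    "hemolytic anemia": "anemia",
    "anemia of chronic disease": "anemia",
}


def ancestors(node, parent=PARENT):
    """Yield every more general descriptor above `node`."""
    while node in parent:
        node = parent[node]
        yield node


def specializes(p, q, parent=PARENT):
    """True when p equals q or p is a sub-classifier of q (the condition of operator O1)."""
    return p == q or q in ancestors(p, parent)


def intersect_most_specific(diff_p, diff_q, parent=PARENT):
    """Intersect two differentials, matching descriptors across nosological levels.

    A pair matches when one member specializes the other; the more
    specialized member is kept.
    """
    result = set()
    for p in diff_p:
        for q in diff_q:
            if specializes(p, q, parent):
                result.add(p)
            elif specializes(q, p, parent):
                result.add(q)
    return result


# Assumed example lists: one holds a general descriptor, the other a specialization.
jaundice_causes = {"hemolytic anemia", "cholestasis"}
pallor_causes = {"anemia", "hypovolemic shock"}
unified = intersect_most_specific(jaundice_causes, pallor_causes)
```

Under these assumed lists, `unified` holds only "hemolytic anemia," the more specialized of the two matching descriptors.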

Two of these operators were employed in the task formulation process of INTERNIST-I discussed previously. In the case where two findings are associated with only terminal level disease nodes, the operation involved is that indicated in the discussion of operator O4 above. Operator O6 characterizes the case where one of the differential diagnoses contains a high level taxonomic descriptor while the other contains a lower level specialization of that concept.

To see how these operators would be applied using the structured task examples of the preceding section, consider first the possible results of combining the elementary tasks associated with the findings of "pallor" and "jaundice" (Figures 5b and 5c). The only operator that applies in this case is operator O6, which yields the synthesized task shown in Figure 7a. This would give rise to the conjecture that a hemolytic anemia is the cause of both the jaundice and the pallor. Though not indicated in this abbreviated structure, this descriptor, if it were to be concluded, would serve as the basis of additional differential diagnostic tasks: one causal: to find the cause of the hemolytic anemia; the other subclassification: e.g., to determine whether the hemolytic anemia is characterized by warm or cold reacting antibodies. Note that the other potential syntheses discussed in connection with Figure 2, the convergence of the somewhat longer causal chains from "jaundice" and "pallor" onto the "cirrhosis" nodes, cannot be detected on the basis of the one-step operators of Figure 6.

Assume now that an observation of hematemesis (vomiting blood) is also given. As shown in Figure 5d, hematemesis constricts to gastrointestinal blood loss, more particularly localized to the upper G.I. tract. If the hemorrhage is severe enough to bring on hypovolemic shock, this can be a cause of the pallor as well as the hematemesis--though this observation cannot be perceived on the basis of any of the operators of Figure 6. Indeed, the task structure associated with the finding of hematemesis cannot be combined with that of any other finding known at this point. It would not be possible, in particular, to discern the potential synthesis of the hematemesis and jaundice on the basis of some form of fibrotic liver involvement because nothing is known concerning the intervening descriptor of portal hypertension.

If now a finding of caput medusae is also asserted, leading to a conclusion of portal hypertension, several pieces of the puzzle begin to fall into place. The task structures of jaundice (Figure 5b) and the caput medusae (Figure 5a) can be combined using operator O6 to provide an alternate to the explanation of the jaundice hypothesized in Figure 7a. Moreover, this resulting structure can now be combined with the task structure of hematemesis (using operator O5) to obtain the synthesis shown in Figure 7b.

Fig. 7.  Examples of combined task structures resulting from the application of selected synthesis operators (from Fig. 6) to the elementary task constructs of Fig. 5.

At this point, a physician reviewing these data might be inclined to conjecture that this patient's liver disease could be responsible for the pallor as well as the other findings, either via hypersplenism or because of shock associated with the bleeding varices. The reason that this possibility cannot be perceived in the environment defined by the operators of Figure 6 is that the ground rules stated at the outset prevent the consideration of tasks associated with any such descriptor until that condition has been positively concluded. Thus, in order for the overall gestalt to emerge, one would have to establish either

  1. that the pallor is due to hypersplenism, then unify that portion of the structure in Figure 5c with the construct of Figure 7b (using detail provided by Figure 5f), or 
  2. that the patient is indeed in shock--which, once concluded, would justify the linkup of Figures 5c and 7b via the "hypovolemic shock" node. 

Note that at this point, most of these constructs remain largely conjectural, and it is entirely possible that neither of the above is true. Instead, it might be that the bleeding is due to a peptic ulcer (not an uncommon concomitant of cirrhosis, especially in alcoholics), and that the pallor is due to the anemia of chronic disease. Once again, these possible unifying concepts are not elicited by the given set of operators because of the appearance of intermediate nodes in the causal network about which neither positive nor negative conclusions have been reached.

Use of Multi-Step Operators

The combined pathophysiological/nosological knowledge structures outlined in the preceding section made it possible to frame several of the diagnostic tasks in such a way that difficult decision problems could be avoided, or at least postponed, while the search for unifying hypothetical formulations was conducted. However, the reorganization of knowledge contemplated did not carry all the way, for we found many situations where potential unifying constructs could not be detected because of intervening nodes that served to obscure the unification pathways. As it happens, in each of those instances where the potential unifying hypothesis failed to emerge, the opportunity for synthesis would have been detected if we were to allow use of the two-step operator shown in Figure 8a. In this, the nodes labeled P and Q are assumed to have been previously concluded, and the intervening node R stands for some pathological state lying between P and Q, for which no decision can be made on the basis of current information. What this operator would permit is the recognition of a potential unification of P and Q, with R treated as a qualifying assumption, or supposition.

Fig. 8.  A "Spanning Link" (SL1) can be used as a basis of a multi-step synthesis operator that can be used to combine task constructs P and Q even though the status of intervening node R is unknown.

For example, if we were to apply this new operator to the constructs of Figures 5c and 7b (using the additional detail of Figure 5e), it would be possible to derive the unifying hypothetical task construct of Figure 9a. To illustrate the range of possible formulations that can result when using this more remote synthesis operator, Figure 9b shows the various unifications that are possible having "anemia" as the conjectural missing link in the application of this two-step operator.

One way to implement such a "remote" combination operator would be on the basis of an enriched link structure that would include a set of explicit "spanning links" (not to be confused with the "planning links" discussed previously), one such link to connect each pair of nodes of the graph that were originally two steps removed from one another. If such a higher level link (SL1, say) were to be introduced to connect the nodes P and Q as in Figure 8b, we would speak of the jointed path (L1-L2) connecting P and Q as a sub-path instantiation of the spanning link SL1.
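A minimal Python sketch of this idea follows. It derives the two-step spanning links from a list of labeled one-step links and reports the intervening node as a supposition when both endpoints have been concluded. The three links and their endpoints are inferred from the later discussion of Figures 10 through 12; the function names and data layout are assumptions for illustration only.

```python
# Labeled one-step causal links as (label, effect, cause). Endpoints are
# inferred from the chapter's later discussion of Figures 10-12.
LINKS = [
    ("L6", "pallor", "shock"),
    ("L19", "shock", "upper gastrointestinal hemorrhage"),
    ("L10", "upper gastrointestinal hemorrhage", "portal hypertension"),
]


def two_step_spanning_links(links):
    """Return {(p, q): [(first_label, second_label, r), ...]} for nodes two steps apart.

    Each entry records one sub-path instantiation of the spanning link from
    p to q, together with the intervening node r.
    """
    spanning = {}
    for label1, p, r in links:
        for label2, r2, q in links:
            if r == r2 and p != q:
                spanning.setdefault((p, q), []).append((label1, label2, r))
    return spanning


def unify(p, q, concluded, spanning):
    """List the supposition nodes under which concluded descriptors p and q can be combined."""
    if p not in concluded or q not in concluded:
        return []
    return [r for (_, _, r) in spanning.get((p, q), []) if r not in concluded]


spanning = two_step_spanning_links(LINKS)
suppositions = unify(
    "shock", "portal hypertension", {"shock", "portal hypertension"}, spanning
)
```

Here the spanning link from "shock" to "portal hypertension" is instantiated by the jointed path (L19, L10), and "upper gastrointestinal hemorrhage" is returned as the supposition whose status remains to be determined.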

Fig. 9. Examples of combined task structures resulting from the use of multi-step operators.  Note the occurrence of "supposition" nodes (shown via dashed lines) having unknown status.

While the two step operator of Figure 8 is sufficient to enable the detection of all interesting synthesis opportunities in this simplified example, it is clearly not a general solution to the problem. One would not have to provide much more detail in the knowledge base in order to uncover the need for three-step, four-step, and indeed n-step operators, for arbitrary n. The alternative would be either to restrict the level of resolution of the causal graph to some arbitrary degree so that a given finite family of multi-step operators might satisfy the search requirements, or else to accept the somewhat anomalous behavior that was observed in INTERNIST-I, where the program was often unable to perceive unifying gestalts that were obvious to clinician observers of the system.

It may have occurred to the reader that the general solution to the problem might be obtained by extending the set of spanning links described above to encompass what in mathematical terms is referred to as the "transitive closure" of the relation defined by the original set of causal links. This set would include a spanning link connecting nodes X and Y of the graph any time there is an intermediate node Z such that X is linked to Z and Z is linked to Y, whether by direct causal links or by spanning links. Such an addition to the knowledge base would indeed facilitate the identification of all possible syntheses of task structures; however, it also presents some serious difficulties with respect to the definition of what constitutes a differential diagnostic task.

In our previous discussions, we have defined the causal task associated with a manifestation or pathological descriptor as the set of nodes associated with the source node via "caused-by" links. Such a set of nodes is considered to constitute a task in the sense that the decision maker must choose one of the elements in the decision set as the true cause of the abnormal finding. An anomaly arises, however, if the set of spanning links is used to define such decision sets. In this case the differential diagnostic task defined by a pathological descriptor becomes a strange mixture of causes and effects, including everything eventually reachable by following all causal paths leading outward from that node in the graph. Thus, the differential diagnosis of "jaundice" would include "conjugated hyperbilirubinemia," "cholestasis," "hepatocellular dysfunction," "cirrhosis," "common duct stone," etc. (See Figure 2.) 

The fallacy is that this list of nodes, although all appropriately identified as possible causes of jaundice, cannot be considered a decision set among which a single element is to be selected as the major cause of the observed finding. It would be a mistake, for example, to structure a decision problem to choose between "cholestasis" and "common duct stone" as the cause of a patient's jaundice, for the former is in reality the mechanism by which the latter produces the effect. Thus, there must be some means for singling out from the set of all spanning links a decision subset that is exhaustive, and if not mutually exclusive, at least not patently interdependent.
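The difficulty can be seen in a small Python sketch. The "caused-by" edges below are assumed for illustration and connect only the nodes named in the text; the actual graph of Figure 2 is larger and may link these nodes differently. The first function collects everything reachable from a node (its row of the transitive closure), and the second flags pairs of candidates in which one lies on a causal path to the other.

```python
# Assumed "caused-by" edges for illustration; the real Figure 2 graph is larger
# and may connect these nodes differently.
CAUSED_BY = {
    "jaundice": ["conjugated hyperbilirubinemia"],
    "conjugated hyperbilirubinemia": ["cholestasis", "hepatocellular dysfunction"],
    "cholestasis": ["common duct stone"],
    "hepatocellular dysfunction": ["cirrhosis"],
}


def reachable_causes(node, caused_by=CAUSED_BY):
    """All nodes reachable from `node` along caused-by links."""
    seen = set()
    stack = list(caused_by.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(caused_by.get(current, []))
    return seen


def interdependent_pairs(decision_set, caused_by=CAUSED_BY):
    """Pairs (a, b) of candidates where b is reachable as a deeper cause of a."""
    return sorted(
        (a, b)
        for a in decision_set
        for b in decision_set
        if a != b and b in reachable_causes(a, caused_by)
    )


closure_of_jaundice = reachable_causes("jaundice")
conflicts = interdependent_pairs(closure_of_jaundice)
```

In this toy graph, the closure for "jaundice" contains both "cholestasis" and "common duct stone," and the pair is flagged as interdependent, which is the kind of entry a well-formed decision set would need to exclude.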

The Generalized Multistep Unification Operator

By restricting the set of spanning links explicitly recorded in the knowledge base and providing additional information concerning the interdependencies among them, this problem in the formulation of differential diagnoses can be resolved without sacrificing any of the detail and task synthesis capability provided by the full transitive closure. For this purpose, we have developed a generalized multistep unification operator, which requires for each node of the network only that portion of the transitive closure containing the following two classes of links:

a. those that identify the various immediate causes of the pathological state represented by that node. These might, for example, correspond to the relations defined by the low level causal graph illustrated previously in Figure 2.

b. those that identify the various underlying causes of the abnormality. These correspond to the planning links introduced in connection with Figure 4; recall that planning links are one-step associations, used to connect nodes representing manifestations or pathological states to descriptors (often higher level) of the nosologic involvements structure. 

In addition to these classes of links, we introduce an additional data structure that expresses the hierarchy of subpath instantiations associated with any given planning link. This is simply a list, which for each high level planning link, enumerates the set of connected sequences of lower level link structures that reach from the same starting point to the same termination point as the given link.

This hierarchic link structure is illustrated in Figure 10, with the set of subpath relations defined as in Figure 11.(25)

Fig. 10. An extract of the new CADUCEUS model of medical knowledge, which combines the separate knowledge structures of Fig. 5 with a detailed causal graph, multiple nosologies, and a network of planning links.  (The right half of the figure is below the left half.)

Generalized Link    Subpath Instantiations
PL1                 (L6, PL12)
                    (L7, PL4)
PL2                 (L2)
                    (L3)
                    (L4)
                    (L5)
PL3                 (L1)
PL4                 (PL5)
PL5                 (L9)
                    (L8, PL7)
PL6                 (L10, PL7)
PL7                 (PL9)
                    (PL8)
PL8                 (L12)
                    (L14)
PL9                 (L11)
                    (L13)
PL10                (L15)
                    (L16)
                    (L17)
PL11                (L18)
PL12                (L19, PL6)
PL13                (L19, PL11)
PL14                (L6, PL13)

Fig. 11.  Generalized Links of Figure 10 with Subpath Instantiations.

Figure 10 includes the major components of the nosologic hierarchy of "hepatobiliary involvement," and a small portion of the gastrointestinal hierarchy, limited to "gastric involvement." In addition, there are hierarchic classification structures associated with many pathologic descriptors; e.g., "hyperbilirubinemia," "anemia," "portal hypertension," etc.

Basically, this figure combines the task structures of Figures 5a-5f. For the most part, the terminology, link identification, and pattern of associations recorded there have been maintained. The major new factor is the inclusion of planning links leading from each node representing a pathologic descriptor to every nosological category which subsumes disease entities capable of causing that pathological condition. Note that in many cases, these planning links span large portions of the network, often subsuming many alternate attribution pathways by which a given cause and effect may be associated.

For example, the path PL1 from "pallor" to "hepatobiliary involvement," if expanded progressively through the various subpath instantiations, ultimately encompasses all nine of the low level attribution pathways identified previously, some passing through the "anemia of chronic disease" node, others passing through "portal hypertension" then either "hypersplenism," or "gastrointestinal hemorrhage" and "shock."
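The progressive expansion described here can be sketched in Python. The dictionary below transcribes the subpath instantiations of Figure 11; the representation as tuples and the function name `expand` are assumptions for illustration, not a description of how CADUCEUS stores this structure.

```python
# Subpath instantiations transcribed from Figure 11. Names beginning with "PL"
# are planning links; names beginning with "L" are low-level causal links.
SUBPATHS = {
    "PL1": [("L6", "PL12"), ("L7", "PL4")],
    "PL2": [("L2",), ("L3",), ("L4",), ("L5",)],
    "PL3": [("L1",)],
    "PL4": [("PL5",)],
    "PL5": [("L9",), ("L8", "PL7")],
    "PL6": [("L10", "PL7")],
    "PL7": [("PL9",), ("PL8",)],
    "PL8": [("L12",), ("L14",)],
    "PL9": [("L11",), ("L13",)],
    "PL10": [("L15",), ("L16",), ("L17",)],
    "PL11": [("L18",)],
    "PL12": [("L19", "PL6")],
    "PL13": [("L19", "PL11")],
    "PL14": [("L6", "PL13")],
}


def expand(link, subpaths=SUBPATHS):
    """Return every fully expanded low-level path that instantiates `link`."""
    if link not in subpaths:  # a low-level link expands to itself
        return [(link,)]
    paths = []
    for sequence in subpaths[link]:
        partial = [()]
        for element in sequence:
            partial = [
                head + tail
                for head in partial
                for tail in expand(element, subpaths)
            ]
        paths.extend(partial)
    return paths


pallor_paths = expand("PL1")
```

Expanding PL1 in this way produces nine sequences of low-level links, which is consistent with the nine attribution pathways mentioned above.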

Using this knowledge structure, the process of task synthesis can be formulated as a two stage procedure. The first stage involves the detection of unification opportunities, as indicated by the convergence of planning links from two or more manifestations or pathologic descriptors onto the same node of the involvements hierarchy. Such coincident pointers give rise to a conjectural task synthesis, which is then subjected to a second stage of analysis by means of the path unification algorithm. This procedure examines the network of subpaths associated with any given pair of planning links in order to make more explicit the exact nature of the conjectured causal association.
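The first stage can be sketched in Python as a grouping of planning links by their target node. Only the planning links whose endpoints are stated in the chapter's discussion of Figure 12 are listed; the tuple layout and the function name are assumptions for illustration.

```python
from collections import defaultdict

# Planning links named in the text, as (label, source descriptor, target category).
PLANNING_LINKS = [
    ("PL1", "pallor", "hepatobiliary involvement"),
    ("PL4", "anemia", "hepatobiliary involvement"),
    ("PL6", "upper gastrointestinal hemorrhage", "hepatobiliary involvement"),
    ("PL7", "portal hypertension", "hepatobiliary involvement"),
    ("PL12", "shock", "hepatobiliary involvement"),
]


def conjectural_syntheses(concluded, planning_links=PLANNING_LINKS):
    """Stage one: find hierarchy nodes reached by planning links from
    two or more concluded descriptors."""
    by_target = defaultdict(list)
    for label, source, target in planning_links:
        if source in concluded:
            by_target[target].append((source, label))
    return {t: sorted(pairs) for t, pairs in by_target.items() if len(pairs) >= 2}


candidates = conjectural_syntheses({"pallor", "portal hypertension"})
```

Each entry in the result is only a conjecture; it names the planning links that would then be handed to the second-stage path unification analysis.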

There are several possible patterns of association that might be extracted on the basis of this second stage analysis, which can be construed as multi-step generalizations of the synthesis operators defined in Figure 6. Figure 12 illustrates three of the major patterns identified by the path unification algorithm; they differ primarily with respect to the degree of overlap exhibited by subpaths of the planning links being unified.

Fig. 12. Three patterns of planning link instantiation that can result from generalized multi-step synthesis operations of the path unification algorithm.

a. The two planning links may have subpath instantiations suggestive of the unified structure shown in Figure 12a. Here, the two descriptors "shock" and "portal hypertension" project via planning links PL12 and PL7 into the "hepatobiliary involvement" node. Closer examination reveals that the link from "shock" to "hepatobiliary involvement" (PL12) can be instantiated by means of the subpath (L19, L10, PL7), which overlaps in its last element the planning link from "portal hypertension" to "hepatobiliary involvement." This establishes that there is at least one valid pattern of association relating "shock" and "portal hypertension," namely the latter being the cause of the former (via subpath (L19, L10)). Unification of task structures on the basis of this pattern of association could be considered to be an operation that is the multi-step generalization of operator O3 (or in some cases operator O5) of Figure 6.

b. In Figure 12b, the pattern is somewhat different. Here, "pallor" and "upper gastrointestinal hemorrhage" project into the "hepatobiliary involvement" node via planning links PL1 and PL6. In this case, the unification algorithm would discover the subpaths (L7, L8, PL7) for PL1, and (L10, PL7) for PL6--which overlap in their rightmost element. This pattern of association reveals the existence of an intermediate pathological state (in this case "portal hypertension") capable of causing each of the initial observations; hence, this pathologic descriptor--rather than some unspecified liver disease--can be viewed as the real basis for unifying these tasks. A synthesis operation based on this pattern of association can be construed as a multistep generalization of operator O4.

c. Yet another pattern of association of two planning link structures is illustrated in Figure 12c. Here, the planning links from "anemia" (PL4) and "portal hypertension" (PL7), which converge on "hepatobiliary involvement," suggest that there might be a common cause of these two conditions. However, the convergence of planning links--being expressions only of the existence of a link to some, but not all lower level nodes--cannot be taken as guaranteeing that these task structures can actually be unified. In a situation such as that of Figure 12c, the path unification algorithm would discover the most general subpaths such that at least one of the members terminates in a direct link into their common target node. In this example, although planning link PL9 merely expresses the fact that there is some form of "fibrotic hepatocellular involvement" capable of causing "portal hypertension," direct link L9 expresses the generalization that all forms of "fibrotic hepatocellular involvement" can cause "anemia of chronic disease." Thus, it is assured that there is at least some subclassifier of "fibrotic hepatocellular involvement" that is a common cause of the given pathologic conditions. A synthesis operation based on this pattern of association can be viewed as a multistep generalization of operator O6. This generalized synthesis operation, dealing as it does with the coincidence of subclassification tasks, can also be construed as a generalization of operator O1 as well.(26)
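The overlap test behind patterns (a) and (b) can be sketched in Python as follows. This is not the path unification algorithm itself, which the chapter does not spell out; it is a simplified illustration that enumerates every partial expansion of a planning link and looks for planning links that terminate expansions of both inputs. The subpath table is the relevant subset of Figure 11, and the function names are hypothetical.

```python
# Subset of the Figure 11 subpath table needed for the examples in the text.
SUBPATHS = {
    "PL1": [("L6", "PL12"), ("L7", "PL4")],
    "PL4": [("PL5",)],
    "PL5": [("L9",), ("L8", "PL7")],
    "PL6": [("L10", "PL7")],
    "PL7": [("PL9",), ("PL8",)],
    "PL8": [("L12",), ("L14",)],
    "PL9": [("L11",), ("L13",)],
    "PL12": [("L19", "PL6")],
}


def partial_expansions(link, subpaths=SUBPATHS):
    """Every instantiation of `link`, at every level of expansion (including none)."""
    results = {(link,)}
    for sequence in subpaths.get(link, []):
        partial = {()}
        for element in sequence:
            partial = {
                head + tail
                for head in partial
                for tail in partial_expansions(element, subpaths)
            }
        results |= partial
    return results


def shared_final_links(link_a, link_b, subpaths=SUBPATHS):
    """Planning links that terminate some expansion of both link_a and link_b."""
    finals_a = {p[-1] for p in partial_expansions(link_a, subpaths) if p[-1].startswith("PL")}
    finals_b = {p[-1] for p in partial_expansions(link_b, subpaths) if p[-1].startswith("PL")}
    return finals_a & finals_b


shock_vs_portal = shared_final_links("PL12", "PL7")
pallor_vs_bleed = shared_final_links("PL1", "PL6")
```

For PL12 and PL7 the shared final link includes PL7 itself, one of the two inputs, which corresponds to the situation of Figure 12a; for PL1 and PL6 the shared set includes PL7 as a third link, as in Figure 12b. The sketch does not rank or select among overlaps, and it does not attempt the analysis needed for Figure 12c.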

Because of the rapid focusing potential provided by the planning links, use of these generalized synthesis operators (referred to in the following as G1-G6, to correspond to their one-step analogues of Figure 6) allows for incorporation of an arbitrarily fine grained causal graph in the knowledge base, thus enabling critical evaluation of the various attribution pathways that instantiate hypothesized unifications. This synergistic blend of the two basic knowledge structures is not obtained without some cost, however, as the potential for proliferation of unifying constructs requires use of a sophisticated control strategy, as discussed in the following section.

Searching for the Right Task Formulation

The knowledge structures and synthesis operators outlined in the preceding section provide a rich environment in which to conduct both the problem formulation and problem solving aspects of diagnostic reasoning. 

One of the major benefits of the knowledge structure is that it provides access to a large number of pathologic and nosologic descriptors, about which conclusions can often be reached on the basis of constrictor patterns derived from clinical or paraclinical data. Such concluded descriptors become accumulation points, where many elementary tasks--both causal and subclassification varietals--become provisionally "solved" on the strength of hypothesized attributions, which credit the concluded descriptor with the power to explain all manifestations causally associated with it. This has the potential to reduce significantly the number of separate tasks to be considered. 

Then, in a fashion resembling the progressive design process described by Simon, multiple partial descriptions of a clinical problem can be used mutually to constrain one another, in many cases yielding a significant reduction in the number of alternatives to be considered in the decision sets of each of the tasks remaining to be solved. In some cases, the application of a synthesis operator reduces the decision set to a unique alternative, proposing thereby a specific solution to the problem. More commonly, the range of alternatives is progressively constrained as more and more elementary task constructs are assembled into a unified structure.

Still there may remain a number of decisions to be made and issues to be investigated. For example, there may continue to be alternatives within the decision sets of those elementary task structures selected for synthesis. In the example of Figure 9a, three of the four original possibilities remain as explanations of the portal hypertension after all synthesis steps have been taken; this decision task therefore continues to require attention and further investigation. In addition, there are invariably in such a complex task structure a number of "supposition" nodes embedded in multistep pathways about which no conclusion can be made without additional data. These must also become the target of subsequent information gathering and decision making as confirmation of the overall pattern of attributions is pursued.

It is important to bear in mind the observation made earlier that the principle of parsimony is merely a heuristic guide; while it is useful in narrowing the focus of attention, it is not a valid basis for making final clinical judgments. Thus, for example, while a conclusion of portal hypertension may be taken as presumptive cause of a patient's ascites, if subsequent analysis of the ascitic fluid should show findings inconsistent with this being a transudate, then the tentative synthesis maneuver by which portal hypertension and ascites were combined must be undone (or at least substantially penalized).

It should be evident from the previous discussion that the operation of task synthesis is not an algorithmic process whereby the elements of a differential diagnostic hypothesis fall together as unambiguously as the pieces of a jigsaw puzzle. On the contrary, there are invariably a large number of options concerning the assembly of a unified construct, and the resulting formulation depends substantially upon the elements that are selected for application of synthesis procedures, and the order in which they are combined.

To keep track of the choices made and to make explicit the consequences of these various maneuvers, we can employ a model of search commonly used in artificial intelligence applications. Although, as noted earlier, differential diagnostic tasks are themselves not of the form usually thought of as requiring use of artificial intelligence techniques, the higher level process of task definition fits well the classical AI paradigm of "state space search."(27) Use of state space methods enables the maintenance of a concurrent set of complex hypothesis states in which alternative configurations of the set of diagnostic tasks can be created and explored systematically.

In developing a state space representation of a problem domain, it is necessary to identify three major components of the model:

  1. A specification of the elements that constitute an initial state description (S0) of the problem to be solved. 
  2. A set of transformation operators (Ti) by which any state may be transformed into one or more successor states. 
  3. An operational definition of the goal state (Sg), which can be used to test states as they are generated in order to discover whether the goal has been attained. 

In the context of the task-definition problem of medical diagnosis, the following associations can be made between elements of the state-space model and those of the problem domain. 

  1. An initial state (of the diagnostic task definition) consists of the set of partial descriptors that have been evoked on the basis of constrictor relationships in the data, along with their elementary causal and subclassification task definitions. 
  2. The set of transformation operators are those outlined at the end of the preceding section. 
  3. The goal is defined as a state in which every task is decided, with no decisions depending for their justification on heuristically imposed constraints. 
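
The three components listed above can be sketched as a generic best-first search. This is only an illustration of the classical paradigm, not the CADUCEUS control program: the function names, the toy state encoding (a set of task groupings), the single merge operator, and the stand-in goal test are all assumptions introduced here.

```python
import heapq
from itertools import count


def best_first_search(initial_state, operators, is_goal, merit, max_expansions=10_000):
    """Expand the highest-merit state first; return a goal state or None."""
    tie = count()  # tie-breaker so the heap never compares states directly
    frontier = [(-merit(initial_state), next(tie), initial_state)]
    seen = {initial_state}
    for _ in range(max_expansions):
        if not frontier:
            return None  # space exhausted without reaching a goal
        _, _, state = heapq.heappop(frontier)
        if is_goal(state):
            return state
        for operator in operators:
            for successor in operator(state):
                if successor not in seen:
                    seen.add(successor)
                    heapq.heappush(frontier, (-merit(successor), next(tie), successor))
    return None  # gave up after max_expansions


# Toy stand-ins (hypothetical): a state is a set of task groupings, and the
# single operator merges any two groupings into one.
def merge_any_two(state):
    groups = list(state)
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            rest = [g for k, g in enumerate(groups) if k not in (i, j)]
            yield frozenset(rest + [groups[i] | groups[j]])


TASKS = ["t1", "t2", "t3", "t4"]  # placeholder labels for elementary tasks
s0 = frozenset(frozenset([t]) for t in TASKS)
result = best_first_search(
    s0,
    operators=[merge_any_two],
    is_goal=lambda s: len(s) == 1,  # stand-in goal: one unified grouping
    merit=lambda s: -len(s),        # fewer groupings = higher merit
)
```

In the diagnostic setting the goal test would instead check that every task is decided without reliance on heuristically imposed constraints, which this toy does not attempt to model.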

The procedure involved in searching for a state satisfying the goal criteria can be interpreted in light of the Simon model of Figure 1 as a high level control program, which:

The space of alternative configurations of the set of diagnostic task definitions can be depicted graphically as in Figure 13. In this, it is assumed that four partial descriptors ("pallor," "gastrointestinal bleeding," "hyperbilirubinemia," and "portal hypertension") have been decided on the basis of constrictor relationships in the given patient data. In the initial state S0, the elementary differential diagnostic tasks associated with these descriptors are shown as unrelated to one another, as indeed they might prove to be in the final analysis. Other states, which assume certain allowable interdependencies among the elementary tasks, are generated by application of the synthesis state transformation operators one by one.

Fig. 13. Alternative conceptualizations of a clinical problem are maintained by means of a state space control structure, which keeps track of the incremental synthesis steps (here denoted as G3 or G4) by which the elementary task constructs of the initial state (S0) become combined into the unified "gestalts" of the terminal states (S7, S8, and S9).

The process of generating new states by applying selected synthesis transformations can--in principle--be continued until all allowable combinations are formed. It is important to realize, however, that in complicated clinical problems, it is generally not feasible to consider all possible combinations. With ten descriptors, the potential number of combinations is of the order of ten million; with twenty descriptors, this number increases to more than a trillion.

The impossibility of searching such enormous spaces exhaustively is the reason that heuristic evaluation rules are commonly used in AI programs to assess the merit of states as they are generated. The focus of attention is then directed towards exploration of those regions of the state space having the greatest presumed merit.

Although there are many criteria that might be employed to gauge the merit of a complex diagnostic task definition, the most commonly accepted heuristic criterion in diagnostic reasoning is Occam's razor. This is the rule that states that the simpler of competing hypotheses is to be preferred to the more complex. Operationally, this rule could be used to assign merit scores to states on the basis of their cardinality: i.e., the number of unconnected subgraphs they contain. Alternatively, the merit of a state might be made dependent on the number of elementary tasks unified there into a single complex task definition.
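
The cardinality measure can be sketched as a count of connected components in the graph of tasks and synthesis links. The function names and the example links below are hypothetical; the union-find bookkeeping is just one convenient way to count unconnected subgraphs.

```python
def count_components(nodes, links):
    """Number of unconnected subgraphs among `nodes` joined by `links`."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent pointers to the representative, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in links:
        parent[find(a)] = find(b)
    return len({find(n) for n in nodes})


def parsimony_merit(nodes, links):
    """Occam-style merit: fewer unconnected subgraphs scores higher."""
    return -count_components(nodes, links)


descriptors = ["pallor", "gastrointestinal bleeding",
               "hyperbilirubinemia", "portal hypertension"]
unrelated = []  # no synthesis applied: every task stands alone
partly_unified = [  # hypothetical synthesis links, for illustration only
    ("pallor", "gastrointestinal bleeding"),
    ("gastrointestinal bleeding", "portal hypertension"),
]
```

Here `parsimony_merit(descriptors, unrelated)` is -4 and `parsimony_merit(descriptors, partly_unified)` is -2, so the partly unified state would be preferred under this rule.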

By these measures, the most meritorious states in Figure 13 would be S7, S8, and S9. Note that the evaluation of merit can change as additional information becomes available. For example, in a clinical situation such as that represented in Figure 13, assume that esophagoscopy fails to reveal any sign of varices. Recall that in the simplified knowledge base of Figure 10, "esophageal varices" is a "supposition" state implicitly entailed in the attribution of hematemesis to portal hypertension. Thus if it is determined that confirmatory evidence for varices is not present, then the synthesis maneuver performed in going from state S1 to S4 must be judged invalid, eliminating from further consideration state S4 and all of its successors (here only S7). Similarly, the synthesis step applied to obtain state S5 from S2 must be rejected, thereby eliminating state S5 and its successors (S8 and S9). Under these revised circumstances, the evaluative criteria would cause the focus of attention to be redirected to states S3 or S6, where the gastrointestinal bleeding is no longer presumed to be part of the same clinical problem as the pallor, jaundice and portal hypertension.(28)
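
The elimination just described can be sketched as follows. Only the successor relationships stated in the text (S1 to S4, S4 to S7, S2 to S5, S5 to S8 and S9) are encoded; the remaining structure of Figure 13 is not reproduced, and the function names are hypothetical. The sketch also assumes that each state is reached by a single path, so that discarding a state discards everything generated from it.

```python
def descendants(successors, state):
    """All states reachable from `state` by following successor links."""
    found, stack = set(), [state]
    while stack:
        for nxt in successors.get(stack.pop(), ()):
            if nxt not in found:
                found.add(nxt)
                stack.append(nxt)
    return found


def retract(successors, live_states, invalid_steps):
    """Drop the product of each invalid synthesis step and its successors."""
    dead = set()
    for _parent, child in invalid_steps:
        dead.add(child)
        dead |= descendants(successors, child)
    return live_states - dead


# Successor links taken from the text; other links of Figure 13 are omitted.
successors = {"S1": ["S4"], "S4": ["S7"], "S2": ["S5"], "S5": ["S8", "S9"]}
all_states = {f"S{i}" for i in range(10)}

# Esophagoscopy shows no varices: steps S1->S4 and S2->S5 are judged invalid.
remaining = retract(successors, all_states, [("S1", "S4"), ("S2", "S5")])
```

After the retraction, `remaining` holds S0, S1, S2, S3, and S6, which leaves S3 and S6 among the candidates for renewed attention.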

As a further component in the assessment of merit of hypothesis states, it might be useful to incorporate a weighting scheme for links (analogous to that used in INTERNIST-I) so that consideration could be given to such measures as the average link weight, the strength of the weakest link, etc. Other candidate measures might incorporate some assessment of a priori probability of elements in a differential diagnostic decision set.
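
As a small illustration of the two link-based measures mentioned, assuming each link in a hypothesis state carries a numeric weight (the weights below are invented for the example and are not taken from any knowledge base):

```python
from statistics import mean


def link_measures(weights):
    """Summarize a hypothesis state's links by average and weakest weight."""
    return {"average": mean(weights), "weakest": min(weights)}


# Hypothetical link weights for one state.
measures = link_measures([3, 5, 4])
```

A state with a high average but one very weak link might deserve less confidence than the average alone suggests, which is why both figures are reported.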

In most applications of AI state space search methods, the only basis for comparing states is the merit score assigned by the heuristic evaluation function. States are ranked on the basis of merit, and the attention of the problem solver is directed to whatever state appears best on the basis of the assigned measure of merit. There is rarely the need to maintain explicit representation of more than one state (the others can always be regenerated as necessary), as the problem solver has no occasion to examine more than one state at a time.

Unlike these other applications of AI state space methods, in the diagnostic task formulation problem it is often useful to be able to make comparisons across state descriptions. For one thing, this might help in identifying discriminating questions that can assist in the meta level problem of deciding which state represents the right conceptualization of the diagnostic task. In addition it would be helpful to the physician using such a system to be able to see a comparison of the major alternative conceptualizations--perhaps evaluated on the basis of different heuristic criteria--complete with a critique, telling what fits, what does not fit, and what is required to settle the issues that remain.

Based on comments offered by a number of physicians who have used INTERNIST-I to solve real clinical problems, it seems reasonable to expect that this improved mode of user interaction would go a long way towards gaining acceptance of the program's recommendations and decisions. The physician needs to understand the reasons why the program is considering the problem in the way it does, and where its questions are coming from. Moreover, he needs to be reassured that reasonable alternative conceptualizations--most notably his own--have been considered and dealt with appropriately. Given the state space formalism discussed above, it would not be difficult to devise man/machine interaction facilities to enable the physician to express whatever interpretations he may have come to concerning a clinical case and ask the system for a critical evaluation of his conceptualization. In this way, the dialogue between the physician and the computer based consultant could begin to approach the familiar patterns of interaction between the physician and his professional colleagues.

Summary and Conclusions

In this chapter, I have attempted to provide a conceptual framework within which to characterize the many approaches to computer aided medical diagnosis. The discussion has largely centered around the type of task structure that physicians refer to as a differential diagnosis. As studies of clinical cognition have typically revealed evidence that physicians formulate such tasks early in the patient encounter and use them as contexts in which to organize their search for discriminating data, it is not surprising that builders of computer-based diagnostic systems have mistakenly come to believe that the essence of diagnostic expertise consists of a corpus of procedures for dealing with well structured differential diagnostic tasks.(29) In contrast, I have argued that it is the process whereby these differential diagnostic tasks are formulated which poses the interesting intellectual and computational challenge. Accordingly this chapter has dealt with general issues in the design of heuristic methods for achieving structure within this ill structured domain.

Two major approaches to machine aided formulation of diagnostic tasks have been outlined in this chapter. One of these uses a simple "generate and test" control strategy to evoke systematically a succession of binary choice tasks formulated not as differential diagnoses but rather as true/false decision problems. Because this leads to undifferentiated lists of hypotheses and fails to exploit the decision making power of differential diagnostic procedures, we concentrated instead on the articulation of methods for the direct evocation of differential diagnostic tasks.

One approach to this was illustrated by means of the INTERNIST-I program, which causes the evocation of a differential diagnostic task on the basis of each observed manifestation. These tasks are then merged into unified constructs using a partitioning algorithm that mimics the process of multiple set intersection. We noted many behavioral deficiencies of this system--not all of which impacted on the eventual decision reached--but which on the whole tended to render the system less than fully acceptable to the medical community. Many of these problems derived from the fact that the INTERNIST-I knowledge base employs a very shallow causal graph, deliberately limited in order to compensate for the program's sequential problem formation and problem solving procedure.

Through analysis of these shortcomings, we have come to a new design for the medical knowledge to be incorporated in the successor system, now renamed as CADUCEUS. This new knowledge representation provides multiple nosologic structures, by which disease entities may be classified in as many descriptive ways as appropriate. In addition, there is provision for a representation of detailed pathophysiology, by means of a causal graph having no restriction as to level of resolution. These basic structures are supplemented by a set of generalized links--a subset of the transitive closure of the causal graph--which provides for as rapid convergence on tentative unifying hypotheses as in INTERNIST-I, while at the same time enabling access--via a sub-path instantiation mechanism--to as much detail as is available in the underlying causal graph.

This result has been facilitated by means of a path unification algorithm used to combine elementary task definitions into unified complexes. As application of this synthesis operator cannot be considered irrevocable, it is necessary to envelop these heuristic maneuvers within a sophisticated control regime. Thus we have discovered within the task environment of medical diagnosis a core problem, the solution of which requires some of the most powerful methods available in the armamentarium of artificial intelligence.

It soon becomes apparent to anyone who sets about to study the medical reasoning process that there is a considerable intellectual component to medical expertise. Outstanding practitioners of the art exist, and they are known for their ability to converge rapidly on the essential nature of a diagnostic task, sensing "instinctively" the variety of ways in which the pieces of a complex puzzle can fall into place. A review of the record of clinical pathology conferences, where experts are asked to structure and solve difficult clinical problems, often reveals virtuoso performance in this creative endeavor.

In view of this richness of human behavior in diagnostic reasoning, it should come as no surprise that the simple algorithmic approach to computer aided diagnosis has proved unsatisfactory. What has misled many investigators is the undeniable prominence of the differential diagnosis as a primary structuring feature of the reasoning process. As has been demonstrated many times, this aspect of diagnostic reasoning does not require artificial intelligence methods. What I have tried to show in this chapter is that there is, indeed, an intellectually challenging aspect of clinical reasoning generally overlooked in the design of algorithmic diagnostic procedures. This is the meta level problem involved in the structuring of medical diagnostics: the heuristic search for the right task formulation.

References

1. Ben-Bassat, M., Carlson, R. W., Puri, V. K., Davenport, M.D., Schriver, J. A., Latif, M., Smith, R., Portigal, L. D., Lipnick, F. H., and Weil, M. H., "Pattern-Based Interactive Diagnosis of Multiple Disorders: the MEDAS System," IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-2, (2) (March 1980), 148-160. 

2. de Dombal, F. T., Leaper, J. R., Staniland, J. R., et al., "Computer-Aided Diagnosis of Abdominal Pain," Brit. Med. J. 12, (1972), 9-13. 

3. Doyle, J., "A Truth Maintenance System," Artificial Intelligence 12, (1979), 231-272. 

4. Elstein, A. S., Shulman, L. A., and Sprafka, S. A., Medical Problem Solving: An Analysis of Clinical Reasoning, Harvard University Press, Cambridge, Mass., (1978). 

5. Engle, R. L., Flehinger, B. J., Allen, S., Friedman, R., Lipkin, M., Davis, B. J. and Leveridge, L. L., "HEME: A Computer Aid to Diagnosis of Hematologic Disease," Bull. N. Y. Acad. Med. 52, (June 1976).

6. Ernst, G. and Newell, A., GPS: A Case Study in Generality and Problem Solving, Academic Press, New York, (1969). 

7. Feinstein, A. R., "Clinical Biostatistics XXXIX, The haze of Bayes, the aerial palaces of decision analysis, and the computerized Ouija board," Clinical Pharmacology and Therapeutics 21, (4) (April 1977), 482-496. 

8. Kassirer, J. P., and Gorry, G. A., "Clinical Problem Solving: A Behavioral Analysis," Ann. Int. Med. 89, (1978), 245. 

9. Kuhn, T. S., The Structure of Scientific Revolutions, Second Edition, Vol.2, No. 2, International Encyclopedia of Unified Science, U. of Chicago Press, Chicago, Ill. (1970). 

10. Ledley, R. S., and Lusted, L. B., "Reasoning Foundation of Medical Diagnosis: Symbolic logic, Probability and Value Theory and our Understanding of How Physicians Reason," Science 130, (1959). 

11. Nilsson, N. J., Problem Solving Methods in Artificial Intelligence, McGraw-Hill Book Company, New York, (1971). 

12. Nilsson, N. J., Principles of Artificial Intelligence, Tioga Publ. Co., Palo Alto, Cal, (1980). 

13. Patrick, E. A., Stelmack, F. P., and Shen, L. Y-L., "Review of Pattern Recognition in Medical Diagnosis and Consulting Relative to a New System Model," IEEE Trans. on Systems Man and Cybernetics SMC-4 1, (1974), 1-16. 

14. Patrick, E. A. and Shen, L. Y-L, A Systems Approach to Applying Pattern Recognition to Medical Diagnosis, TR-EE 75-12, Purdue University Medical Computing Program, (May 1975). 

15. Pauker, S. G., and Kassirer, J. P., "Therapeutic Decision Making: A Cost-Benefit Analysis," New Engl. J. Med. 293, (July, 1975), 229-234. 

16. Pople, H. E., Jr., "On the Mechanization of Abductive Logic," Proc. Third Intl. Joint Conf. on Artificial Intelligence, Stanford Research Institute, Menlo Park, California, (1973). 

17. Pople, H. E., Jr., "The Formation of Composite Hypotheses in Diagnostic Problem Solving: an Exercise in Synthetic Reasoning," Proc. Fifth Intl. Joint Conf. on Artificial Intelligence, Carnegie-Mellon U., Pittsburgh, Pa, (1977). 

18. Sherman, H., A Comparative Study of Computer-Aided Clinical Diagnosis of Birth Defects, S.M. thesis, Dept. of Electrical Engineering and Computer Science, MIT, Cambridge, Ma, (January 1981). 

19. Shortliffe, E. H., and Buchanan, B. G., "A Model of inexact reasoning in medicine," Mathematical Biosciences 23, (1975), 351-379. 

20. Simon, H. A., "The Structure of Ill Structured Problems," Artificial Intelligence 4, (1973), 181-201. 

21. Stallman, R. M. and Sussman, G. J., Forward Reasoning and Dependency-Directed Backtracking in a System for Computer-Aided Circuit Analysis, (1977), 135-196. 

22. Weiss, S. M., Kulikowski, C. A., Amarel, S., and Safir, A., "A Model-Based Method for Computer-Aided Medical Decision-Making," Artificial Intelligence 11, (1978), 145-172.

Notes

(1) This investigation has been a collaborative effort primarily involving the author, a computer scientist, and Dr. Jack D. Myers, a specialist in internal medicine, who--in addition to fabricating the INTERNIST knowledge base--has provided an exemplary model of clinical expertise and a standard against which system performance might be measured. Significant medical input has also been provided by Dr. Randolph A. Miller, who played a major role in defining the form and substance of the INTERNIST-I knowledge base, and the late Dr. Zachary Maraitis, whose skepticism and critical insight helped set the stage for INTERNIST-II. Implementation of the computer models has been greatly assisted by Kenneth W. Quayle, Craig D. Dean, and Charles E. Oleson--all graduate students in computer science at Pitt. Funds for this work have been provided in part by grants from the Division of Research Resources of the National Institutes of Health (grant no. R24 RR 01101), the Bureau of Health Manpower of the Health Resources Administration (grant no. R01 MB 00144) and the National Library of Medicine (grant no. R01 LM 03710).

(2) For a variety of reasons, including a request from an agency alleging a prior claim on the name, future generations of the diagnostic program originally called INTERNIST will subsequently be referred to as CADUCEUS. This universal symbol of the medical profession seems appropriate to the expanded role we see for this type of program in the years to come. To avoid confusion in this chapter, the original program will continue to be called INTERNIST-I, while references to the successor system, originally called INTERNIST-II, will now employ the new name.

(3) See Patrick, et al., [13] for a thorough review of this literature, circa 1974.

(4) See, for example, de Dombal, et al., [2]. 

(5) While based on my own observations, this characterization of the clinician's handling of discrepant data is consistent with the findings of Kassirer and Gorry [op. cit., p. 252].

(6) See Ledley [10] for a discussion of Bayesian decision methods and their application to medical diagnosis.

(7) This approach has been investigated by Engle et al. [5] and by Ben-Bassat et al. [1]. The hypothesis generation mechanism of MYCIN [19] can also be interpreted as a binary choice task formation strategy. 

(8) Although he refers to his program as "INTERNIST," Sherman's version does not actually use either the code or the knowledge base of INTERNIST-I.

(9) Nilsson [12] provides a good overview of a variety of control strategies useful in heuristic problem solving.

(10) The process of hypothesis formation using a synthesis operator viewed as a model of abductive reasoning was first introduced in Pople [16]. 

(11) Unlike the evoking strength, which tends to be a subjective estimate that requires extensive clinical experience, the frequency weights can often be supported on the basis of somewhat more objective data, obtained from a careful review of the literature. Still, these estimates must be characterized as largely subjective, due to the high degree of variability in the literature.

(12) The reason for this difficulty in task synthesis will be discussed in some detail in later sections.

(13) The scoring process employs a simple geometric mapping, used to reflect the intention of the knowledge base designers in assigning weights; namely, that two manifestations with an evoking strength of four would be considered equivalent to one with an evoking strength of five, two with evoking strength of three are equivalent to one of four, etc. Similar considerations apply in the assignment of the frequency and importance weighting factors; hence, they, too, are subjected to a geometric transformation before being combined algebraically as outlined above to obtain the total score assigned to each evoked disease hypothesis.
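
The equivalence stated in this note (two findings at one level count the same as one finding at the next level) is what a power-of-two mapping would give. The sketch below shows only that relationship; the function name is hypothetical and the actual constants used in INTERNIST-I are not given here.

```python
def geometric_weight(strength: int, base: int = 2) -> int:
    """Map an ordinal strength to a weight that doubles at each level."""
    return base ** strength


# Two manifestations of strength 4 weigh the same as one of strength 5,
# and two of strength 3 the same as one of strength 4.
pair_of_fours = 2 * geometric_weight(4)
single_five = geometric_weight(5)
```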

(14) It should be noted that this procedure accomplishes approximately the same effect as the performance of multiple set intersections (of the differential diagnosis lists of those findings explained by the most highly ranked disease hypothesis), but is much more efficient computationally. It differs in that certain diseases, capable of explaining a significant portion but not all of those findings explained by the leading contender, may also be included in the resulting synthesized differential diagnosis.

(15) The use of causal models in diagnostic reasoning is also discussed in Weiss et al. [22] in relation to the CASNET system.

(16) Medically knowledgeable readers may wonder at the choice of clinical and pathological descriptors employed in this graph. The purpose in selecting those elements was to provide a moderately realistic basis for illustration of a number of benefits and problems involved in the use of this type of knowledge representation: these examples have been chosen to highlight computer science--not medical--issues in the design of the next generation of an INTERNIST knowledge base.

(17) This definition of constrictor is a generalization of the concept of "pathognomonic" association, a term commonly used to characterize a finding that is distinctively characteristic of a particular disease.

(18) In some cases, where the total bilirubin in the blood is greatly elevated, the mechanism of jaundice may involve both conjugated and unconjugated forms of hyperbilirubinemia. In the present discussion, we assume for the sake of argument that this complication does not arise.

(19) Note, too, that between the entry node "pallor" and the three cirrhosis nodes, one can trace out nine distinct attribution pathways. One of our objectives in the following discussion will be to find ways of preserving this richness of detail without allowing it to obscure the problem focusing process.

(20) This type of acyclic graph structure, which allows any given node to have an arbitrary number of parent nodes, is sometimes referred to as a "tangled hierarchy."

(21) Compare these with the refined differential diagnoses given previously, which were based on pathological descriptors of the causal graph.

(22) As in all illustrative examples of this chapter, the knowledge structures of Figure 5 are intended to be suggestive, not definitive. Strictly speaking, for example, the pathologic descriptor "hypersplenism," which is shown in the figure to be a subclassifier of "anemia," should be classified under "splenic involvement." This would then be causally linked to a descriptor "pancytopenia," in the "hematologic involvement" area, which in turn would be linked to "anemia" as one of its major facets. This additional detail, essential in the definitive CADUCEUS knowledge base, was deleted here in the interest of clarity.

(23) In this figure, the small boxes are stylized representations of the differential task structures that might be associated with any pathologic or nosologic descriptor. The arrows emerging from the right hand edge of a box stand for elements in the associated causal task; branches leading downward from the bottom of a box (we refer to these as "descenders") represent the descriptor's subclassification task.

(24) This can only happen if the nosological structure is not strictly hierarchical, but allows lower level nodes to have more than one parent node. As noted previously, such a structure is sometimes referred to as a "tangled hierarchy."

(25) In Figure 10 constrictor links are shown as diamond studded arcs; planning (existential) links are shown as beaded arcs; direct (universal) links are unadorned. Hierarchical subclassification is indicated by means of shaded lines. In tracing out subpaths for any given planning link it is sometimes necessary to follow a direct link to a non-terminal node--then drop down along a subclassification descender to reach a more specific instance of that non-terminal before the continuation of the pathway being traced can be found. For example, one way to instantiate planning link PL1 would be to follow link L6 from "pallor" to "shock," then drop down to "hypovolemic shock" and proceed along L19 to "upper gastrointestinal bleeding." Here, follow the descender to "bleeding esophageal varices" and follow L15, L16, or L17 to some form of "portal hypertension." Assume that L15 is chosen, in which case the final leg in the pathway is L11. Summarizing, one complete low level pathway instantiating planning link PL1 is the sequence (L6, L19, L15, L11). The same sequence could have been discovered by recursively expanding the PL1 instance (L6, PL2) shown in Figure 11.

(26) The only multistep operator not specifically illustrated by the preceding examples is that which is a generalization of operator O2. This synthesis operation involves the detection of common nodes within the subclassification hierarchies of two high level nosologic descriptors. As this process is very similar to the process of detecting common causes in a causal graph, it follows that another application of the methods outlined above for the causal operators can be used to implement multi-step synthesis operations in the tangled hierarchy. For this purpose, a second set of planning structures--these projecting from higher level nodes of the nosologic hierarchy to appropriate lower level nodes--is being incorporated into the CADUCEUS knowledge base. These "planning descenders" permit use of the path unification algorithm described above to provide a generalization of operator O2.

(27) A good discussion of state space search methods can be found in Nilsson [11]. 

(28) This type of search--in a space of hypotheses, some of which are contingent on suppositions that may ultimately prove to be incorrect--is similar to that described by Stallman & Sussman in their paper on dependency directed backtracking [21]. For a detailed discussion of the methods required for this type of "non-monotonic" reasoning, see Doyle [3]. 

(29) No doubt a part of the reason for this emphasis on procedures for solving well structured problems is that in this arena, as in all areas of problem solving and decision making, the analytic process is much better understood than the synthetic.


This is part of a Web-based reconstruction of the book originally published as
   Szolovits, P. (Ed.).  Artificial Intelligence in Medicine. Westview Press, Boulder, Colorado. 1982.
The text was scanned, OCR'd, and re-set in HTML by Peter Szolovits in 2000.