[Reprinted from the New England Journal of Medicine 316:685-688 (March 12), 1987]
After hearing for several decades that computers will soon be able to assist with difficult diagnoses, the practicing physician may well wonder why the revolution has not occurred. Skepticism at this point is understandable. Few, if any, programs currently have active roles as consultants to physicians. The story behind these unfulfilled expectations is instructive and, we believe, offers hope for the future.
Research on computer-aided diagnosis began in the 1960s with high hopes that difficult clinical problems might yield to mathematical formalisms. Most work therefore centered on the application of flow charts, Boolean algebra, pattern matching, and decision analysis to the diagnostic process. Except in extremely narrow clinical domains, each of these techniques proved to have little or no practical value. Most observers came to believe that for a program to have expert capability, it must in some fashion mimic the behavior of experts. Early work on computer-aided diagnosis was thus largely discarded, and in the early 1970s attention shifted to the study of the actual problem-solving behavior of experienced clinicians.2,5 The resulting insights have subsequently been used to construct models of clinical problem solving that, in turn, have been converted into so-called artificial-intelligence programs or expert systems.1,6,7
The new generation of programs designed to emulate clinical expertise has developed along two quite different paths. One, called rule-based systems, has incorrectly become almost synonymous with the term "artificial intelligence" in the minds of most physicians. These systems, such as MYCIN,8 are based on the hypotheses that expert knowledge consists of a large number of independent, situation-specific rules and that computers can simulate expert reasoning by stringing these rules together in chains of deduction.9,10 Each rule consists of an If statement followed by a Then statement. The former identifies a situation in which the conclusion or action specified by the latter can be carried out. For example, in a patient who has an infection, the initial rule might be, If there is an organism that makes therapy necessary, Then determine the best recommendation for therapy. The system then seeks out other rules that will help it decide whether a pathogenic organism is present. If a pathogen is found, the system mobilizes other rules to arrive at a treatment recommendation. The conclusion reached by a group of rules is accepted when a numerical scoring factor exceeds some critical threshold.
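The chaining described above can be sketched as a small backward-chaining engine. This is a minimal illustration, not MYCIN's code: the rules, the organism and culture names, the 0.2 acceptance threshold, and the way certainty is propagated (rule strength times the weakest premise, with supporting rules combined as cf + e(1 - cf)) are all assumptions chosen to show the idea of goal-directed rule chaining gated by a numerical score.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    premises: tuple[str, ...]  # the If part: every premise must be established
    conclusion: str            # the Then part
    strength: float            # certainty attached to the rule, 0..1

# Hypothetical rules for illustration only; not MYCIN's knowledge base.
RULES = [
    Rule(("gram_negative", "rod_shaped"), "organism_is_enteric", 0.7),
    Rule(("organism_is_enteric", "blood_culture_positive"), "significant_pathogen", 0.8),
    Rule(("significant_pathogen",), "therapy_needed", 0.9),
]

THRESHOLD = 0.2  # assumed cutoff for accepting a premise or conclusion

def certainty(goal, facts, rules=RULES, seen=frozenset()):
    """Backward-chain from `goal`, returning a certainty in [0, 1]."""
    if goal in facts:                 # directly observed datum
        return facts[goal]
    if goal in seen:                  # guard against circular rule chains
        return 0.0
    cf = 0.0
    for rule in rules:
        if rule.conclusion != goal:
            continue
        # Seek out the rules that can establish each premise (recursive subgoals).
        premise_cf = min(certainty(p, facts, rules, seen | {goal}) for p in rule.premises)
        if premise_cf < THRESHOLD:    # premises not established; rule does not fire
            continue
        evidence = rule.strength * premise_cf
        cf = cf + evidence * (1 - cf)  # combine support from several rules
    return cf

def concludes(goal, facts):
    """Accept the conclusion only if its score exceeds the threshold."""
    return certainty(goal, facts) > THRESHOLD
```

With all three laboratory facts present, the chain yields a certainty of 0.9 × 0.8 × 0.7 = 0.504 for `therapy_needed`; remove the positive culture and the middle rule can no longer fire, so nothing downstream is concluded.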
The pioneering work on rule-based systems was designed to deal with clinical problems,9 but ironically, nearly all successes with such systems have occurred outside medicine. Rule-based systems have proved useful in a variety of commercial tasks, such as evaluating trouble with telephone lines, laying out preventive maintenance programs for large power stations, and configuring computer systems. But it is clear that these successes have been possible primarily because the domains that have been addressed are limited and because the programs are valuable despite their inability to perform at a nearly perfect level; failure to identify a defect in a telephone network can be tolerated more readily than a misdiagnosis in a seriously ill patient.
In contrast to phone systems and power stations, the domain of even a single major field such as internal medicine is so broad and complex that it is difficult, if not impossible, to capture the relevant information in rules. Furthermore, other important difficulties in rule-based systems prevent even relatively small programs from performing effectively and with an acceptable degree of reliability. Although rules appear to be free-standing entities, their interactions with other rules are not always consistent or predictable. To achieve the desired overall behavior from a system, the author of the rules must anticipate the ways in which each rule will interact with every other. Moreover, as the domain encompassed by a rule-based system is expanded, new knowledge often interferes with information already available, in ways that are unexpected and difficult to remedy.11,12 Such problems are hardly surprising, given that there is no explicit overall diagnostic strategy governing the flow of reasoning by the program. Rules capture only the surface behavior of experts, not the reasons they behave as they do. The obvious shortcomings of rule-based systems have constrained their practical application to a few limited clinical situations, such as the evaluation of pulmonary-function tests13 and the interpretation of findings obtained by electrophoresis.14
During the 1970s, in parallel with the work on rule-based systems, there evolved a second and very different approach to the modeling of human clinical expertise. This school of thought views diagnostic acumen as the ability to construct and evaluate hypotheses by matching a patient's characteristics with stored profiles of the findings in a given disease. A numerical value is used to indicate how often a particular finding is encountered in a given disease; a second value indicates how strongly a particular finding should arouse suspicion that the given disease is present. The finding is also weighted according to its clinical importance: e.g., massive gastrointestinal bleeding or a high fever is assigned more importance than low back pain or mild leukocytosis. Programs relying on the approach just described produce a diagnostic ranking of the various hypotheses by using a scoring method that considers these three weights.15
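A scoring method of this kind can be sketched as follows. The three weights come from the paragraph above (frequency, evoking strength, and importance), but the 0 to 5 scales, the simple additive formula, and the disease and finding names are assumptions for illustration; the cited programs used their own scales and combination rules. Here an observed finding credits a disease by its evoking strength, an expected finding known to be absent costs its frequency, and an observed finding the disease cannot explain costs its importance.

```python
# Hypothetical profiles: finding -> (evoking_strength, frequency), on an assumed 0-5 scale.
PROFILES = {
    "disease_A": {"fever": (2, 4), "rash": (3, 3), "joint_pain": (1, 2)},
    "disease_B": {"fever": (1, 5), "cough": (2, 4)},
}
# Hypothetical importance ("import") of each finding, on an assumed 1-5 scale.
IMPORTANCE = {"fever": 4, "rash": 2, "joint_pain": 2, "cough": 3, "back_pain": 1}

def score(disease, observed, absent):
    """Additive score combining the three weights (an assumed formula)."""
    profile = PROFILES[disease]
    total = 0
    for finding in observed:
        if finding in profile:
            total += profile[finding][0]         # credit: how strongly it evokes the disease
        else:
            total -= IMPORTANCE.get(finding, 1)  # penalty: important finding left unexplained
    for finding in absent:
        if finding in profile:
            total -= profile[finding][1]         # penalty: a commonly expected finding is absent
    return total

def rank(observed, absent=()):
    """Diagnostic ranking of all stored diseases, best first."""
    return sorted(PROFILES, key=lambda d: score(d, observed, absent), reverse=True)
```

For a patient with fever and rash but no cough, `disease_A` scores 2 + 3 = 5, while `disease_B` scores 1 − 2 − 4 = −5, so the ranking puts `disease_A` first.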
In one such program, the Present Illness Program,16 all disease entities are considered competitors—a weakness that leads to poor performance when two diseases coexist. But another program, INTERNIST, overcomes this problem by using a strategy that allows it to identify a set of competing hypotheses and to postpone consideration of other diseases that may be present.17
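The idea of separating true competitors from diseases that may coexist can be sketched with a simple set test. This is an assumed simplification, not INTERNIST's actual heuristic: a candidate is treated as competing with the leading hypothesis when the findings it explains are a subset or superset of those the leader explains (so that together they account for no more than one of them alone), and any other candidate is deferred as a possible second disease. The disease and finding names are hypothetical.

```python
# Hypothetical profiles: disease -> set of findings it can explain.
PROFILES = {
    "D1": {"f1", "f2", "f3"},
    "D2": {"f1", "f2"},
    "D3": {"f4"},
    "D4": {"f1", "f4"},
}

def partition(leader, observed, profiles=PROFILES):
    """Split candidates into competitors of `leader` and deferred (possibly coexisting) diseases."""
    lead = profiles[leader] & observed
    competitors, deferred = [], []
    for disease in sorted(profiles):
        if disease == leader:
            continue
        explains = profiles[disease] & observed
        # Competing: one disease explains nothing the other does not.
        if explains <= lead or lead <= explains:
            competitors.append(disease)
        else:
            deferred.append(disease)
    return competitors, deferred
```

With findings f1 to f4 observed and D1 leading, only D2 competes; D3 and D4 account for f4, which D1 leaves unexplained, so they are set aside for later consideration rather than ranked against D1.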
Programs that match clinical findings with stored profiles of diseases often perform in an impressive fashion but nevertheless demonstrate serious weaknesses, which preclude their practical application in consultation. First of all, they are virtually unable to cope with variations in the clinical picture. In particular, they have difficulty in recognizing variations in the way that a disease can present, in terms of both the spectrum of findings and severity. They are also unable to cope with the evolution of a disease over time, as in the case of acute glomerulonephritis. Furthermore, they cannot recognize how one disease may influence the presentation of a second, or how the effects of previous treatment can alter the patient's illness. Finally, programs based on simple matching strategies are unable to explain to the physician how they have reached their conclusions. Some of these deficiencies were noted in an editorial in the Journal several years ago.18
By the late 1970s, disappointments with both rule-based systems and matching strategies stimulated investigators to move in directions that have led to new insights, even if not to clinically useful programs. Studies of problem-solving strategies employed by experts made it ever more apparent that clinical expertise in difficult medical cases is to a considerable extent reliant on causal, pathophysiologic reasoning. Investigators became aware that the key deficiencies in most previous programs stemmed from their lack of pathophysiologic knowledge. Only programs relying on such reasoning would be able to cope with the enormous number of ways in which diseases can present, evolve, and interact with each other.
In an effort to simulate expert performance, new programs build specific models of a given patient's illness by linking clinical findings with pathophysiologic knowledge stored in the program's memory.19-21 Such knowledge is organized as nodes of information connected by links that specify causal relationships. Nodes contain knowledge such as the range of clinical and laboratory findings that can be anticipated at the onset of an illness and during its evolution. One program also incorporates information on how the clinical picture of disease varies with its severity.20 Some nodes deal with effects and others with causes, and the links between them allow the program to use patient-specific information about the severity and temporal stage of an illness to determine whether the findings in the nodes match the disease hypothesis. Links allow the program to reason not only forward, from cause to effect, but also backward, from observed effects to their expected causes. The program can thus use observed findings to reason about causation, and vice versa.
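The two directions of reasoning over causal links can be illustrated with a small directed graph. This sketch deliberately omits the severity and temporal information the paragraph describes; the node names are illustrative assumptions, and the traversal is a plain depth-first search over stored links, run forward along cause-to-effect edges and backward along the reverse edges.

```python
from collections import defaultdict

class CausalNet:
    """Nodes joined by cause -> effect links, traversable in both directions."""

    def __init__(self):
        self.effects = defaultdict(set)  # cause -> its direct effects
        self.causes = defaultdict(set)   # effect -> its direct causes

    def link(self, cause, effect):
        self.effects[cause].add(effect)
        self.causes[effect].add(cause)

    def _reach(self, start, edges):
        """All nodes reachable from `start` by following `edges` (depth-first)."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in edges[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def predict(self, cause):
        """Forward reasoning: effects to expect if `cause` is present."""
        return self._reach(cause, self.effects)

    def explain(self, finding):
        """Backward reasoning: causes that could account for an observed finding."""
        return self._reach(finding, self.causes)

# Illustrative links (hypothetical, greatly simplified).
net = CausalNet()
net.link("diarrhea", "volume_depletion")
net.link("volume_depletion", "decreased_renal_perfusion")
net.link("decreased_renal_perfusion", "oliguria")
net.link("acute_tubular_necrosis", "oliguria")
```

Here `net.predict("diarrhea")` yields volume depletion, decreased renal perfusion, and oliguria, while `net.explain("oliguria")` returns both the perfusion chain and acute tubular necrosis as candidate causes.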
When confronted with a case, be it a chief complaint or a larger body of information, the program constructs a small set of hypotheses that is consistent with the available information.19-21 Pathophysiologic knowledge provides a powerful mechanism for establishing constraints that allow identification of the most reasonable hypotheses. In one program, the chief organizing principle consists of linking all important abnormalities by a chain of causal reasoning. Only the limited set of hypotheses that is compatible with the resulting logical structure then need be considered.21
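One way to picture how causal links constrain the hypothesis set is a minimal-cover search: keep only the smallest sets of root causes whose causal chains reach every important abnormality. The minimal-cover criterion, the search limit, and the node names are assumptions made for this sketch; the cited program's organizing principle is richer than reachability in a graph.

```python
from itertools import combinations

# Hypothetical causal links: cause -> direct effects.
LINKS = {
    "disease_X": ["abnormality_1", "intermediate_state"],
    "intermediate_state": ["abnormality_2"],
    "disease_Y": ["abnormality_2"],
    "disease_Z": ["abnormality_3"],
}

def downstream(node):
    """Everything reachable from `node` along causal links."""
    seen, stack = set(), [node]
    while stack:
        for effect in LINKS.get(stack.pop(), []):
            if effect not in seen:
                seen.add(effect)
                stack.append(effect)
    return seen

def hypotheses(abnormalities, max_size=3):
    """Smallest sets of root causes whose causal chains link every abnormality."""
    all_effects = {e for effects in LINKS.values() for e in effects}
    roots = sorted(set(LINKS) - all_effects)  # nodes that nothing else causes
    targets = set(abnormalities)
    for size in range(1, max_size + 1):
        covers = [set(combo) for combo in combinations(roots, size)
                  if targets <= set().union(*(downstream(c) for c in combo))]
        if covers:
            return covers
    return []
```

Abnormalities 1 and 2 are linked by the single hypothesis `disease_X`; abnormalities 2 and 3 need two causes, giving the two candidate sets {X, Z} and {Y, Z}.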
To further the process of differential diagnosis, each hypothesis, as embodied in the computer-generated model of the patient's illness, is expanded to create a scenario that projects the consequences to be expected if that particular disease is present. On the basis of these scenarios, the program identifies additional information that could differentiate among the various diagnostic possibilities.20 For example, the urinary sodium concentration would be singled out as a feature that can help to distinguish between oliguria due to acute tubular necrosis and that due to volume depletion. Scenarios thus provide a powerful strategy for efficient acquisition of further information, be it historical data, laboratory findings, or other relevant data.
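Choosing the next datum from the scenarios can be sketched as finding the tests on which the competing hypotheses predict different results. The scenarios below encode only the qualitative example in the text (urinary sodium expected to be high in acute tubular necrosis and low in volume depletion); representing a scenario as a table of expected values is an assumption of this sketch.

```python
# Hypothetical scenarios: expected qualitative result of each test under each hypothesis.
SCENARIOS = {
    "acute_tubular_necrosis": {"urine_output": "low", "urinary_sodium": "high"},
    "volume_depletion":       {"urine_output": "low", "urinary_sodium": "low"},
}

def discriminating_tests(hypotheses, scenarios=SCENARIOS):
    """Tests whose predicted results differ among the hypotheses under consideration."""
    tests = set().union(*(scenarios[h] for h in hypotheses))
    useful = []
    for test in sorted(tests):
        predicted = {scenarios[h].get(test) for h in hypotheses}
        if len(predicted) > 1:  # the hypotheses disagree, so the result can discriminate
            useful.append(test)
    return useful
```

For the two causes of oliguria, only `urinary_sodium` is returned; urine output is low under both scenarios and so cannot help choose between them.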
Although detailed pathophysiologic knowledge has greatly increased the ability of a program to handle complexity, it has also added enormously to the computational task.20-22 When a program employing causal reasoning is asked to explore each case in great detail, including straightforward cases that do not merit such attention, the process is so slow that it is impractical even with modern high-speed computers. To deal with this difficulty, a strategy has been developed that allows reasoning at multiple levels of detail.22 In the straightforward cases the program begins by simply looking at shallow associational information (e.g., that pulmonary insufficiency causes hypercapnia). But when such a relatively simple strategy fails to resolve the problem, the program moves to deeper levels of reasoning that allow detailed evaluation of each observed abnormality and its contribution to the clinical picture. For example, a reported loss of blood that is not sufficient to account for an observed degree of anemia will alert such a program to look for other causes of bleeding or, in the absence of such a cause, to consider the possibility of a laboratory error.
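The escalation from shallow to deep reasoning in the anemia example can be sketched as a two-level check. The associational table, the trigger for escalation (whether quantitative data are available), and the `drop_per_unit_loss` conversion factor are all hypothetical placeholders rather than clinical values or the cited program's logic; the point is only that the deeper level compares magnitudes and reports an unexplained residual.

```python
# Hypothetical shallow, associational links: finding -> known causes.
ASSOCIATIONS = {"anemia": {"blood_loss", "hemolysis"}}

def explain_anemia(present, observed_drop=None, reported_loss=None, drop_per_unit_loss=1.0):
    """Two-level sketch; units and `drop_per_unit_loss` are placeholders, not clinical values."""
    # Level 1 (shallow): accept any associated cause when there are no magnitudes to check.
    causes = ASSOCIATIONS["anemia"] & set(present)
    if causes and (observed_drop is None or reported_loss is None):
        return "shallow", "attributed to " + ", ".join(sorted(causes))
    # Level 2 (deep): does the reported loss quantitatively account for the observed drop?
    expected = (reported_loss or 0.0) * drop_per_unit_loss
    if observed_drop is not None and observed_drop <= expected:
        return "deep", "accounted for by reported blood loss"
    return "deep", "unexplained residual: seek another source of bleeding or suspect laboratory error"
```

Called with `{"blood_loss"}` alone, the sketch stops at the shallow level; called with an observed drop of 3.0 against a reported loss that accounts for only 1.0, it reports the residual and suggests the alternatives named in the text.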
Programs based on causal, pathophysiologic reasoning also have the great virtue of leaving a trail that can be converted into an English-language explanation of their diagnostic activities.23,24 Without such explanations, it is obviously unreasonable for the physician to rely on such programs; ultimately, a program, like any consultant, must justify its conclusions to the physician responsible for the patient's care.25
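The "trail" that makes explanation possible can be illustrated by recording the path a search takes through causal links and rendering it as a sentence. This is a toy sketch: the links, the breadth-first search, and the fixed sentence template are assumptions, far simpler than the explanation facilities cited above.

```python
from collections import deque

# Hypothetical causal links: cause -> direct effects.
LINKS = {
    "diarrhea": ["volume_depletion"],
    "volume_depletion": ["decreased_renal_perfusion"],
    "decreased_renal_perfusion": ["oliguria"],
}

def trail(cause, finding):
    """Breadth-first search that records the causal path from `cause` to `finding`."""
    parent = {cause: None}
    queue = deque([cause])
    while queue:
        node = queue.popleft()
        if node == finding:
            path = []
            while node is not None:  # walk back along the recorded trail
                path.append(node)
                node = parent[node]
            return path[::-1]
        for effect in LINKS.get(node, []):
            if effect not in parent:
                parent[effect] = node
                queue.append(effect)
    return None

def explain(cause, finding):
    """Render the recorded trail as an English sentence."""
    path = trail(cause, finding)
    if path is None:
        return f"No causal path links {cause} to {finding}."
    names = [step.replace("_", " ") for step in path]
    return f"{names[-1].capitalize()} is attributed to {names[0]}: " + ", which leads to ".join(names) + "."
```

`explain("diarrhea", "oliguria")` produces "Oliguria is attributed to diarrhea: diarrhea, which leads to volume depletion, which leads to decreased renal perfusion, which leads to oliguria."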
Ironically, now that much of the artificial-intelligence research community has turned to causal, pathophysiologic reasoning, it has become apparent that some of the earlier, discarded diagnostic strategies may have important value in enhancing the performance of new programs. Programs that use causal reasoning typically have only general-purpose strategies for exploring competing hypotheses and playing out scenarios. But such strategies are often quite inefficient, because the exploration of the diagnostic possibilities in particular clinical situations, such as gastrointestinal bleeding or chest pain, can be embodied more directly in rules or flow charts that are highly specific to the problem at hand.26,27 Moreover, if further diagnostic information can be obtained only at the cost of some risk or pain, a component of decision analysis will almost certainly be a useful addition to a program, allowing systematic balancing of medical costs against medical benefits.28,29 An extensive research effort is required, however, before all these techniques can be incorporated into a single program.
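The decision-analytic component can be illustrated with a standard expected-utility comparison of three strategies: treat, withhold treatment, or perform a test and treat only if it is positive. The probabilities, test characteristics, and utilities in the usage note are invented for illustration; the test's risk or discomfort is modeled, as an assumption, by subtracting a fixed `test_cost` from the testing strategy.

```python
def expected_utilities(p_disease, sensitivity, specificity, u):
    """Expected utility of three strategies; `u` maps outcome names to utilities."""
    treat = p_disease * u["treat_diseased"] + (1 - p_disease) * u["treat_healthy"]
    withhold = p_disease * u["withhold_diseased"] + (1 - p_disease) * u["withhold_healthy"]
    # Test, then treat only if positive; the test's own risk or pain is a fixed utility cost.
    test = (p_disease * (sensitivity * u["treat_diseased"]
                         + (1 - sensitivity) * u["withhold_diseased"])
            + (1 - p_disease) * ((1 - specificity) * u["treat_healthy"]
                                 + specificity * u["withhold_healthy"])
            - u["test_cost"])
    return {"treat": treat, "withhold": withhold, "test": test}

# Illustrative numbers only.
UTILITIES = {"treat_diseased": 0.9, "treat_healthy": 0.95,
             "withhold_diseased": 0.5, "withhold_healthy": 1.0, "test_cost": 0.01}
```

With a 30 percent probability of disease and a test of 90 percent sensitivity and specificity, testing scores about 0.9445 against 0.935 for treating outright; raising `test_cost` to 0.05 makes the test no longer worth its burden, and treating without testing becomes the preferred strategy.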
It is obvious from the foregoing discussion that we have not reached the point at which artificial-intelligence programs can act as reliable consultants on a wide range of medical problems. But, quite logically, attempts are under way to use specific components of recent research in artificial intelligence to implement programs that can provide simple but potentially valuable diagnostic assistance to the physician. For example, the enormous data base of INTERNIST17 may provide a considerable advantage over textbooks to the physician who is searching for facts about a particular illness.30 The data base for a given disease in medical textbooks typically consists simply of a list of manifestations, accompanied by ambiguous and relatively unhelpful qualitative descriptions—e.g., that a given finding occurs "frequently" or "uncommonly." The INTERNIST data base has the advantage, as we have noted earlier, of including numerical information on the frequency of findings and on the diagnostic importance of each finding. Furthermore, by applying relatively simple computing strategies to such a data base, the program can generate a list of hypotheses that may deserve consideration. Unfortunately, such lists are usually quite long and thus do not greatly narrow the diagnostic focus; instead, they provide a checklist that helps the user make certain that no diagnostic possibility has been overlooked.
Relatively simple systems such as those just described can also be used for critiquing diagnostic hypotheses or plans for treatment.31,32 It is far easier for a program to examine the reasonableness of a plan constructed by a physician than to create such a plan itself.
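The asymmetry between checking and generating a plan can be seen in a critic that only tests a physician's proposal against stored constraints. The drug names and the single "nephrotoxic" property below are hypothetical placeholders, not real drug data, and the two checks stand in for what would be a much larger knowledge base.

```python
# Hypothetical knowledge for the critic; not real drug data.
DRUG_PROPERTIES = {"drug_A": {"nephrotoxic"}, "drug_B": set()}

def critique(plan, patient):
    """Check a physician-authored plan; return a list of concerns (empty if none)."""
    concerns = []
    for drug in plan.get("drugs", []):
        properties = DRUG_PROPERTIES.get(drug)
        if properties is None:
            concerns.append(f"{drug}: not in the critic's knowledge base")
        elif "nephrotoxic" in properties and patient.get("renal_impairment"):
            concerns.append(f"{drug}: nephrotoxic drug proposed despite renal impairment")
    return concerns
```

The critic never has to search for the best therapy; it merely flags, for example, that `drug_A` was proposed for a patient with renal impairment, and returns an empty list when it finds nothing to object to.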
In 1970 an article in the Journal predicted that by the year 2000 computers would have an entirely new role in medicine, acting as a powerful extension of the physician's intellect.33 At the halfway point, how realistic does this projection seem? It is now clear that great progress has been made in understanding how physicians solve difficult clinical problems and in implementing experimental programs that capture at least a portion of human expertise. On the other hand, it has become increasingly apparent that major intellectual and technical problems must be solved before we can produce truly reliable consulting programs. Nevertheless, assuming continued research, it still seems possible that by the year 2000 a range of programs will be available that can greatly assist the physician. It seems highly unlikely that such a goal will be achieved much before that time.
Tufts University School of Medicine
Boston, MA 02111
WILLIAM B. SCHWARTZ, M.D.
Massachusetts Institute of Technology
Cambridge, MA 02139
RAMESH S. PATIL, PH.D.
Supported by grants from the Commonwealth Fund and the Robert Wood Johnson Foundation (Dr. Schwartz) and by grants from the National Institutes of Health: R24-RR-01320 from the Division of Research Resources, R01-HL-33041 from the National Heart, Lung, and Blood Institute, and R01-LM-04493 from the National Library of Medicine (Drs. Patil and Szolovits). The views expressed are those of the authors and do not necessarily represent the views of any of the granting agencies.