A differential diagnosis is a set of hypotheses that could account for what is known about the patient. The process of differential diagnosis starts with the input findings and the nodes with known values as determined by the previous reasoning operators. Each hypothesis in the differential is an explanation of these findings in terms of causal pathways through the model. The differential consists of the hypotheses found that have the highest probability.
The appropriate mechanism for finding the differential depends on the nature of the diagnostic causal model. The model is similar to those investigated by Pearl[19] as Bayesian probability networks. The difference is that this model has forward loops (excluded by Pearl) and nodes with multiple paths between them (handled only in exponential time by Pearl's methods). We investigated modifications to Pearl's algorithm and to our model. However, even eliminating the forward loops in an earlier model version there were still about 40 links that would have to be cut to analyze multiple paths between nodes. Thus, Pearl's algorithm would require weighted summing of about solutions, which is completely infeasible. Lauritzen and Spiegelhalter's algorithm[11] might improve on this as well as more recent efforts to provide fast implementations[1], but the computation would remain infeasible. Indeed, Cooper has shown that the problem is NP-hard[3]. Thus heuristic methods are necessary to handle the complex networks required to represent this domain.
We have developed a mechanism to generate the differential diagnosis based on the causal paths from primary nodes to the findings. Since only about 50 of the 150 nodes in the model are primary (having a non-zero probability of existing without some other cause), all of the paths from these nodes to all others are generated and stored when the model is loaded. The probabilities along those paths are computed when the patient data is entered. Given these pathways, the task is to find sets of pathways that together account for all of the findings that need a causal explanation. In comparing hypotheses we discovered that the natural notion of different hypotheses requires that they differ in some significant node, nodes which we have labeled diagnostic. The algorithm is as follows: 1) collect the abnormal values from the input and the abnormal nodes asserted to be true in the PSM; 2) find all of the diagnostic or primary nodes that could account for each finding; 3) rank the diagnostic and primary nodes by the number of findings they account for; 4) use the nodes that account for the largest or nearly largest number of findings as seeds for small covering sets of primary nodes; 5) for each covering set, order the findings by the difference between the first and second highest probability path to it; 6) for each finding, find the best path from the partial hypothesis and add it to the hypothesis; 7) and then prune the hypothesis of unneeded primary nodes and extra paths that decrease the probability. Finally, the probabilities of the hypotheses are computed by multiplying the probabilities of the nodes given the other nodes in the hypothesis and they are rank ordered and presented to the user. These probabilities could be normalized by the probability of the findings but that is an intractable problem and unnecessary as long as we are only rank ordering hypotheses. The algorithm is discussed in detail in a paper[12].
This approach to diagnosis differs considerably from others that have appeared in the literature. Reggia's minimal set covering approach[20] ignores the fact that the best hypothesis may not be minimal and would not find the hypothesis in figure 2. Other approaches to diagnosis based on digital circuit analysis[21][4] assume that every node is primary and every node can be measured. If every node were treated that way, a network of this size would be computationally intractable.
Figure 2 goes here.
Our mechanism is effective for producing a meaningful set of hypotheses for the findings in the cases we have tested and it usually takes a couple of minutes on a Symbolics 3650 workstation. The user can compare the hypotheses, see explanations, and consider the differences. Figure 2 is the display of the first of five hypotheses in the differential generated for an actual patient with findings that included rales (fluid sounds in the lungs), pedal edema (ankle swelling), high BUN (poor renal function), nausea, S3 (abnormal heart sound), and runs of VT (arrhythmia). The display graphically presents the complete explanation for the findings and provides a textual summary of the case at the bottom of the screen. In the display the findings are in lower case, intermediate nodes in upper case, primary nodes in bold face, primary probabilities in parentheses, causal probabilities on links and W+ indicating worsening factors that increase the probability and P- indicating correcting factors that decrease it. This hypothesis accounts for the findings with congestive cardiomyopathy and renal insufficiency while the second hypothesis accounts for the findings with congestive cardiomyopathy alone. Those hypotheses nicely capture the actual physician's initial dilemma: whether the high BUN was the result of renal disease or inadequate blood flow to the kidney. Other hypotheses included valve disease, which is an important consideration. This hypothesis illustrates several features of the algorithm: 1) it handles multiple causes; 2) it handles multiple pathways between nodes; 3) findings can be left unexplained (the murmur); and 4) findings caused by therapies (digitalis toxicity here) are handled. (This example is discussed in more detail in[15].)
This method of generating hypotheses is heuristic and indeed it is possible to construct networks where it does not find the best answer. However, only the search is heuristic, not the use of probabilities in ranking hypotheses. Thus, if a better diagnosis is found, it is easy to test that it is better. As a result, it is possible to add hill-climbing techniques or even alternative diagnosis generators and be able to compare the probabilities of the diagnoses. We have tested over 100 actual cases thus far as well as many created cases and have found the algorithm to be effective. On one set of 42 cases, collected while developing the algorithm, the performance was tabulated. In 31 of these the program produced reasonable hypotheses. In five, the hypotheses were almost right but parts of the mechanisms were inappropriate. In the other six cases the best hypothesis was missed. There were two main reasons for these problems: 1) the program did not reason appropriately with the temporal relationships between cause and effect, and 2) it did not handle severity relations appropriately. These problems are part of our research agenda.