
Diagnosing Heart Disease

Since the diagnostic problem has been characterized as generating causal physiologic explanations for the given findings, the computational mechanism best suited to generating those explanations is a Bayesian probability network. To a first approximation, the best hypothesis or explanation is the subset of the network that is true in the maximum likelihood state of the network, given the findings. This computational characterization assumes that the links in the network, which represent the conditional probabilities in the domain, can also represent the causal relations needed to construct a scenario that accounts for the findings.
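
The idea of taking the maximum likelihood state as the explanation can be illustrated on a toy network. The sketch below is a hypothetical three-node chain (acute MI, low cardiac output, hypotension) with made-up probabilities, not the HDP knowledge base. It enumerates every full state consistent with the findings and returns the set of true nodes in the most probable one. Brute-force enumeration is only feasible for tiny networks; it is shown here purely to make the definition concrete.

```python
from itertools import product

# Hypothetical chain: acute MI -> low cardiac output (LCO) -> hypotension (HYPO).
# All node names and probabilities are illustrative assumptions, not HDP values.
P_MI = {True: 0.1, False: 0.9}
# P(LCO | MI), indexed as P_LCO[mi][lco]
P_LCO = {True: {True: 0.7, False: 0.3}, False: {True: 0.05, False: 0.95}}
# P(HYPO | LCO), indexed as P_HYPO[lco][hypo]
P_HYPO = {True: {True: 0.8, False: 0.2}, False: {True: 0.02, False: 0.98}}


def joint(mi, lco, hypo):
    """Joint probability via the chain-rule factorization of the network."""
    return P_MI[mi] * P_LCO[mi][lco] * P_HYPO[lco][hypo]


def most_probable_explanation(findings):
    """Brute-force search for the most probable full state consistent with
    the findings; returns the set of true nodes and that state's probability."""
    best_state, best_p = None, -1.0
    for mi, lco, hypo in product([True, False], repeat=3):
        state = {"MI": mi, "LCO": lco, "HYPO": hypo}
        if any(state[name] != value for name, value in findings.items()):
            continue  # inconsistent with an observed finding
        p = joint(mi, lco, hypo)
        if p > best_p:
            best_state, best_p = state, p
    true_nodes = frozenset(name for name, value in best_state.items() if value)
    return true_nodes, best_p


explanation, p = most_probable_explanation({"HYPO": True})
print(sorted(explanation), round(p, 4))
```

With hypotension observed, the most probable state in this toy model has MI, LCO, and HYPO all true, so that subset of nodes serves as the causal explanation of the finding.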

In the heart disease domain, this fit has some problems. First, in some situations the appropriate causal characterization is that A can cause B and (perhaps through intervening links) B can cause A. Keeping the Bayesian network faithful to this sense of causality produces forward loops (directed cycles) in the network, which are inconsistent with the mathematics of a Bayesian network, since such a network must be a directed acyclic graph. For this reason, the HDP uses a pseudo-Bayesian probability network: the knowledge base contains forward loops, although any particular hypothesis does not. Reasoning with such a network requires heuristic methods[9].
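
The distinction between a cyclic knowledge base and acyclic hypotheses can be sketched as a simple validity check. In the sketch below, the node names `A`, `B`, `X`, and the findings `F1`/`F2` are hypothetical, and treating a hypothesis as a subset of knowledge-base links is an assumption made for illustration. A hypothesis is accepted only if it uses links from the knowledge base and contains no directed cycle.

```python
from collections import defaultdict

# Hypothetical knowledge base: A -> B -> X -> A is a forward loop;
# F1 and F2 are findings. Names are illustrative only.
KNOWLEDGE_BASE = {("A", "B"), ("B", "X"), ("X", "A"), ("B", "F1"), ("X", "F2")}


def has_cycle(edges):
    """Detect a directed cycle with a three-colour depth-first search."""
    graph = defaultdict(list)
    nodes = set()
    for src, dst in edges:
        graph[src].append(dst)
        nodes.update((src, dst))
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {n: WHITE for n in nodes}

    def visit(n):
        colour[n] = GREY  # on the current DFS path
        for m in graph[n]:
            if colour[m] == GREY:  # back edge: we returned to the path
                return True
            if colour[m] == WHITE and visit(m):
                return True
        colour[n] = BLACK  # fully explored
        return False

    return any(colour[n] == WHITE and visit(n) for n in nodes)


def is_valid_hypothesis(hypothesis_edges, kb=KNOWLEDGE_BASE):
    """A hypothesis may only use knowledge-base links and must be acyclic."""
    return set(hypothesis_edges) <= kb and not has_cycle(hypothesis_edges)


print(has_cycle(KNOWLEDGE_BASE))                       # the KB has a loop
print(is_valid_hypothesis({("A", "B"), ("B", "F1")}))  # an acyclic explanation
```

The knowledge base as a whole fails the acyclicity test, but an individual explanation such as A causing B causing F1 passes, which is why standard Bayesian inference cannot be applied directly to the knowledge base.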

A probability network assumes that nodes are completely characterized by their truth values (the conditional independence assumption that gives the network its power). That is, once a node is known to be true (or to have one of a small fixed set of values), it isolates its causes from its effects unless other paths connect them. Thus, a node only needs to know about its immediate causes. Unfortunately, this assumption is false if links are intended to represent causality. For example, if low cardiac output has been present for only a few hours, its effects, whether immediate or reached through several causal steps, can have been present for at most a few hours. In other causal relationships, effects take time to develop, so a cause of short duration may rule out effects further down the causal chain. This problem could be solved by having multiple low cardiac output nodes representing different periods of time. However, in the heart disease domain this proliferation of nodes would have to occur throughout the whole model, enormously increasing its size and complexity.
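
The screening-off property, and the way time breaks it, can be checked numerically on a hypothetical chain (cause, low cardiac output, effect) with illustrative probabilities. Once LCO is known, the cause no longer changes the probability of the effect. The second function is an assumed duration-dependent link: if the effect needs LCO to persist for some hours, a Boolean LCO node cannot represent that, and knowing the cause's onset time matters after all.

```python
# Hypothetical chain: cause -> low cardiac output (LCO) -> effect.
# Probabilities are illustrative assumptions.
P_CAUSE = {True: 0.2, False: 0.8}
# P(LCO | cause), indexed as P_LCO[cause][lco]
P_LCO = {True: {True: 0.9, False: 0.1}, False: {True: 0.1, False: 0.9}}
# P(effect | LCO), indexed as P_EFFECT[lco][effect]
P_EFFECT = {True: {True: 0.7, False: 0.3}, False: {True: 0.05, False: 0.95}}


def joint(c, lco, e):
    return P_CAUSE[c] * P_LCO[c][lco] * P_EFFECT[lco][e]


def p_effect_given(lco, cause=None):
    """P(effect = True | LCO = lco [, cause = cause]) by summing the joint."""
    causes = [True, False] if cause is None else [cause]
    num = sum(joint(c, lco, True) for c in causes)
    den = sum(joint(c, lco, e) for c in causes for e in (True, False))
    return num / den


def p_effect_timed(lco_hours, onset_delay=24.0):
    """Hypothetical duration-dependent link: the effect can only be present
    once LCO has persisted for at least onset_delay hours."""
    return P_EFFECT[True][True] if lco_hours >= onset_delay else 0.0


# Screening off: given LCO, the cause adds no information.
print(p_effect_given(True, cause=True), p_effect_given(True, cause=False))
# Time breaks it: LCO from an MI 4 hours ago cannot yet have produced the effect.
print(p_effect_timed(4.0), p_effect_timed(72.0))
```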

The strategy of duplicating nodes to represent different times has been applied successfully in the domain of diabetes therapy, where the combination of diagnostic and temporal reasoning was handled by making a copy of the Bayesian network for each hour over a 24-hour period[6]. In the heart disease domain, there are no well-defined, convenient time periods into which to divide the past, since minutes, hours, days, and years are often all pertinent to the reasoning. Even if 20 or 30 suitable time periods could be devised, a model with a couple of hundred nodes in each period would make reasoning intractable at current computational speeds.
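
The size argument can be made concrete by unrolling a static network into time slices. This is a generic sketch, not the structure of the diabetes system cited above: it assumes each slice copies every node and intra-slice link, and that each node is linked to its own copy in the next slice. With an assumed 200 nodes per slice, 24 slices give 4,800 nodes and 30 slices give 6,000, and exact inference cost can grow far faster than the node count.

```python
def unroll(nodes, edges, n_slices):
    """Copy a static causal network into n_slices time slices.

    Each slice gets a copy of every node and every intra-slice edge, and each
    node is linked to its own copy in the next slice (a persistence edge)."""
    un_nodes = [(n, t) for t in range(n_slices) for n in nodes]
    un_edges = [((a, t), (b, t)) for t in range(n_slices) for a, b in edges]
    un_edges += [((n, t), (n, t + 1)) for t in range(n_slices - 1) for n in nodes]
    return un_nodes, un_edges


# Hypothetical 200-node model with a simple chain of 199 causal links.
NODES = [f"n{i}" for i in range(200)]
EDGES = [(NODES[i], NODES[i + 1]) for i in range(199)]

for slices in (24, 30):
    un_nodes, un_edges = unroll(NODES, EDGES, slices)
    print(slices, len(un_nodes), len(un_edges))
```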

For this reason we have added temporal relationships as constraints on the probabilistic network. For example, if the cause of low cardiac output is an acute MI that occurred over the past four hours, the temporal constraints also determine which effects low cardiac output, and anything further down the causal chain, can have. Since we already use heuristic methods to reason with the probability network, adding temporal reasoning does not sacrifice any mathematical guarantees that a formal method would otherwise provide.
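
One simple way to picture temporal constraints layered on top of the causal links is to propagate time bounds along a chain. The sketch below assumes each link has a hypothetical minimum delay before the effect can begin; the delays and the downstream names `effect_X` and `effect_Y` are illustrative, and this is not a description of the HDP's actual temporal representation. Given an acute MI four hours ago, it bounds how long each downstream node can have been present and flags effects that could not have appeared yet.

```python
# Hypothetical minimum delays (hours) between a cause and the onset of its effect.
# Link values and downstream node names are illustrative assumptions.
MIN_DELAY = {
    ("acute_MI", "low_cardiac_output"): 0.0,
    ("low_cardiac_output", "effect_X"): 1.0,
    ("effect_X", "effect_Y"): 48.0,
}


def max_possible_age(chain, cause_age_hours):
    """Upper bound on how long (in hours) each node on a causal chain can have
    been present, given how long ago the first cause began. An effect cannot
    start before its cause plus the link's minimum delay."""
    ages = {chain[0]: cause_age_hours}
    for cause, effect in zip(chain, chain[1:]):
        ages[effect] = ages[cause] - MIN_DELAY[(cause, effect)]
    return ages


def ruled_out(chain, cause_age_hours):
    """Nodes that could not have appeared yet (negative upper bound)."""
    ages = max_possible_age(chain, cause_age_hours)
    return [node for node in chain if ages[node] < 0]


def consistent(chain, cause_age_hours, observed_ages):
    """True if every observed duration fits within the bound the chain allows."""
    ages = max_possible_age(chain, cause_age_hours)
    return all(observed <= ages[node] for node, observed in observed_ages.items())


CHAIN = ["acute_MI", "low_cardiac_output", "effect_X", "effect_Y"]
print(ruled_out(CHAIN, 4.0))
print(consistent(CHAIN, 4.0, {"low_cardiac_output": 48.0}))
```

Here a four-hour-old MI rules out `effect_Y`, and low cardiac output reported as present for two days is inconsistent with that MI being its cause, even though the probabilistic links alone would allow both.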





wjl@MEDG.lcs.mit.edu
Fri Nov 3 16:57:00 EST 1995