Patil81, Chapter 3

Illness can be described as a change in the normal state or function of a patient. To describe an illness, we need a formalism for representing states, state changes, normal and abnormal functions, and their interactions in terms of the primitives known to the system. In the program this knowledge is organized into three components. (1) An anatomy component includes a part-of hierarchy for organ systems, contained-in and position relations for major anatomical features, and a connected-to relation that provides material flow information. (2) A physiology component, which so far concentrates only on fluids and electrolytes, describes the fluid compartments of the body, the spaces of distribution of various solutes, and the relative distribution of losses and gains among the compartments under different conditions. (3) A pathophysiology component contains some primitive knowledge about disease etiologies, a taxonomy of disease processes, and causal relations that describe how changes in a given state influence other states.

It is also important to recognize commonly occurring constellations of abnormal states as special composite situations. Conceptualizing these composite situations in a diagnostic system is important because it gives us the ability to reason at a high level of abstraction and to organize a large number of seemingly unrelated facts into a coherent whole. We have argued that it is crucial for any diagnostic system to be able to reason simultaneously at a high level of abstraction, using phenomenological knowledge, and at a physiological level. We accomplish this with a multi-level model for the representation of diseases and causal phenomena. This is motivated by observations made by Lynch while studying the conceptual maps of metropolitan regions. He notes

The structure of the cognitive map described above is a product of the necessity to cope with large-scale maps: maps that are too large to be perceived at once, too large to be stored in short-term memory by their users at a single instant, and too complex to be computationally tractable when solving problems (such as finding an efficient path between two points on the map). An important observation in formulating cognitive maps is that they are organized around landmarks. The conceptualization can be achieved by expanding the denotation of a landmark to subsume the local topology surrounding the designated location. If this conceptualization is carried out carefully, so that the areas subsumed by these landmarks overlap and cover the entire detailed map, it is possible to maintain sufficient coherence (mapping) to move between different levels of description.

Based on these observations and similar observations of a physician's use of medical knowledge, we have developed a hierarchical multi-level representation scheme to describe medical knowledge. The lowest level of this description consists of pathophysiological knowledge about diseases, which is successively aggregated (summarized) into higher level concepts and relations, gradually shifting the content of the description from physiological to syndromic knowledge. The aggregate syndromic knowledge provides us with a concise global perspective and helps in the efficient exploration of the diagnostic alternatives. The physiological knowledge, on the other hand, provides us the capabilities of handling complex clinical situations arising in patients with multiple disturbances, evaluating the physiological validity of the diagnostic possibilities being explored, and organizing a number of fragmented and seemingly unrelated facts into a coherent causal description.

3.1 Anatomical Knowledge

The anatomical knowledge of the system includes (1) a part-of hierarchy for organ systems, (2) connected-to relations, which provide the material flow information, and (3) contained-in and position relations which provide gross anatomical relations between anatomical entities.

3.1.1 Anatomical Taxonomy

The part-of hierarchy defines the various anatomical parts of the body by defining each organ system in relation to the body, and each sub-organ in relation to the organ system containing it. The part-of hierarchy provides us with the taxonomic hierarchy for anatomical parts. A small section of the part-of hierarchy⁸ and its graphical representation are shown in figure 7.

Fig. 7. The part-of hierarchy

[body
   [urinary-system = (anat-entity*s "urinary-system")^u
      [kidney = (anat-entity*s "kidney")^u
         [cortex = (anat-entity*s "cortex")^u]
         [medulla = (anat-entity*s "medulla")^u]
         [nephron = (anat-entity*s "nephron")^u
            [tubule = (anat-entity*s "tubule")^u
               [proximal-tubule
                = (anat-entity*s "proximal-tubule")^u]
               [loop-of-henle
                = (anat-entity*s "loop-of-henle")^u]
               [distal-tubule
                = (anat-entity*s "distal-tubule")^u]]
            [glomerulus = (anat-entity*s "glomerulus")^u]]
         [collecting-duct
          = (anat-entity*s "collecting-duct")^u]]
      [ureter = (anat-entity*s "ureter")^u]
      [bladder = (anat-entity*s "bladder")^u]
      [urethra = (anat-entity*s "urethra")^u]]]
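To make the taxonomy in figure 7 concrete, here is a minimal Python sketch that encodes it as a child-to-parent dictionary. The encoding and the helper names `ancestors` and `is_part_of` are hypothetical illustrations, not ABEL's own frame language; the sketch only shows how such a hierarchy answers "is X part of Y?" by walking upward.

```python
# Hypothetical encoding of the part-of hierarchy of figure 7:
# each anatomical entity maps to the entity it is directly part of.
PART_OF = {
    "urinary-system": "body",
    "kidney": "urinary-system",
    "cortex": "kidney",
    "medulla": "kidney",
    "nephron": "kidney",
    "tubule": "nephron",
    "proximal-tubule": "tubule",
    "loop-of-henle": "tubule",
    "distal-tubule": "tubule",
    "glomerulus": "nephron",
    "collecting-duct": "kidney",
    "ureter": "urinary-system",
    "bladder": "urinary-system",
    "urethra": "urinary-system",
}


def ancestors(entity: str) -> list:
    """Return the chain of enclosing parts, nearest first."""
    chain = []
    while entity in PART_OF:
        entity = PART_OF[entity]
        chain.append(entity)
    return chain


def is_part_of(part: str, whole: str) -> bool:
    """True if `part` lies anywhere below `whole` in the hierarchy."""
    return whole in ancestors(part)


print(ancestors("loop-of-henle"))
# ['tubule', 'nephron', 'kidney', 'urinary-system', 'body']
print(is_part_of("glomerulus", "kidney"))   # True
print(is_part_of("ureter", "kidney"))       # False
```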

3.1.2 Material Flow Pathways

Material flow (e.g. the flow of glomerular filtrate) is represented by the connected-to relation. For example, the path of the filtrate in the kidney can be described as shown in figure 8. As can be seen from the figure, the material flow relation is specified at various levels of detail. The rationale for this multiple-level description is provided later in this chapter.

Fig. 8. Material flow relations

[((connected*b nephron)*e collecting-duct)]
[((connected*b glomerulus)*e tubule)]
[((connected*b tubule)*e collecting-duct)]

[((connected*b glomerulus)*e proximal-tubule)]
[((connected*b proximal-tubule)*e loop-of-henle)]
[((connected*b loop-of-henle)*e distal-tubule)]
[((connected*b distal-tubule)*e collecting-duct)]
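The following Python sketch shows one way the connected-to relations of figure 8 could be queried at each level of detail. Grouping the relations into "coarse", "intermediate" and "detailed" levels, and the function `flow_path`, are our assumptions for illustration; the point is that the same filtrate route is one, two or four links long depending on the level chosen.

```python
from typing import List, Optional

# Hypothetical encoding of figure 8: the same filtrate route, described at
# three levels of detail (coarsest first).
CONNECTED_TO = {
    "coarse": [("nephron", "collecting-duct")],
    "intermediate": [("glomerulus", "tubule"), ("tubule", "collecting-duct")],
    "detailed": [
        ("glomerulus", "proximal-tubule"),
        ("proximal-tubule", "loop-of-henle"),
        ("loop-of-henle", "distal-tubule"),
        ("distal-tubule", "collecting-duct"),
    ],
}


def flow_path(start: str, end: str, level: str) -> Optional[List[str]]:
    """Follow connected-to links from start to end at one level of detail.

    Assumes each segment has a single downstream neighbor, which holds
    for the filtrate route in figure 8.
    """
    downstream = dict(CONNECTED_TO[level])
    path = [start]
    while path[-1] != end:
        nxt = downstream.get(path[-1])
        if nxt is None or nxt in path:
            return None  # no route at this level of detail (or a cycle)
        path.append(nxt)
    return path


print(flow_path("glomerulus", "collecting-duct", "intermediate"))
# ['glomerulus', 'tubule', 'collecting-duct']
print(len(flow_path("glomerulus", "collecting-duct", "detailed")) - 1)  # 4 links
```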

The anatomical knowledge that follows in the remainder of this section has been included to provide a fuller description of ABEL's knowledge base. However, this knowledge is currently not used by the program in its diagnostic reasoning.

3.1.3 Anatomical Spaces

Various anatomical parts of the body are distributed in different spaces. These spaces are generally isolated from one another by membrane barriers that prevent the free flow of electrolytes, proteins, etc. Thus, the composition of the fluid surrounding organs in a given compartment can differ from that in other compartments. These general characteristics of a compartment can be useful in the diagnosis and management of various diseases. Examples of such compartments are the cranial-cavity and the peritoneal-cavity. Although the anatomical part-of relation and the spatial containment relation are very similar, a distinction between the two must be made. For example, the cortex and the nephron are two different parts of the kidney, and the nephron has two parts, the glomerulus and the tubule; however, the glomerulus is contained in the anatomical space of the cortex, while the tubule is contained in the anatomical space of the medulla. A graphical representation of this can be seen in figure 9.

Fig. 9. The containment relation

[body-space
    [((contains*b body-space)*e cranial-cavity)]
    [((contains*b body-space)*e abdominal-cavity)]
    [((contains*b body-space)*e oropharynx-cavity)]
    [((contains*b body-space)*e thoracic-cavity)]
     ....]

[abdominal-cavity
    [((contains*b abdominal-cavity)*e stomach-space)]
    [((contains*b abdominal-cavity)*e spleen-space)]
    [((contains*b abdominal-cavity)*e liver-space)]
    [((contains*b abdominal-cavity)*e kidney-space)]
     ....]

[kidney-space
    [((contains*b kidney-space)*e cortex-space)]
    [((contains*b kidney-space)*e medulla-space)]]

[cortex-space
    [((contains*b cortex-space)*e glomerular-space)]]

[medulla-space
    [((contains*b medulla-space)*e tubular-space)]]
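The distinction between part-of and containment can be illustrated with a small Python sketch that keeps the two relations separate. The `LOCATED_IN` table (which space directly holds each entity) and the helper `enclosing_spaces` are hypothetical; the space nesting follows figure 9. The glomerulus and the tubule share a parent in the part-of hierarchy but end up in different anatomical spaces.

```python
# Hypothetical encoding that keeps part-of (figure 7) and spatial
# containment (figure 9) as two separate relations.
PART_OF = {
    "glomerulus": "nephron",
    "tubule": "nephron",
    "nephron": "kidney",
    "cortex": "kidney",
    "medulla": "kidney",
}

# Assumed mapping from an entity to the innermost space that holds it.
LOCATED_IN = {"glomerulus": "glomerular-space", "tubule": "tubular-space"}

# Space nesting from figure 9: inner space -> enclosing space.
CONTAINED_IN = {
    "glomerular-space": "cortex-space",
    "tubular-space": "medulla-space",
    "cortex-space": "kidney-space",
    "medulla-space": "kidney-space",
    "kidney-space": "abdominal-cavity",
    "abdominal-cavity": "body-space",
}


def enclosing_spaces(entity: str) -> list:
    """All anatomical spaces containing `entity`, innermost first."""
    space = LOCATED_IN[entity]
    spaces = [space]
    while space in CONTAINED_IN:
        space = CONTAINED_IN[space]
        spaces.append(space)
    return spaces


# Same parent in the part-of hierarchy ...
print(PART_OF["glomerulus"] == PART_OF["tubule"])         # True
# ... but different anatomical spaces.
print("cortex-space" in enclosing_spaces("glomerulus"))   # True
print("cortex-space" in enclosing_spaces("tubule"))       # False
```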

3.1.4 Miscellaneous Gross Anatomical Relations

Fig. 10. Gross anatomical relations

A few additional anatomical relations are useful in commonsense reasoning in medicine. An example of such a relation is the relative positioning of various anatomical spaces in the supine position (lying face up in bed), the erect position (standing up or ambulatory), etc. The use of this information can be illustrated by the following example. Consider a patient with nephrotic syndrome. A common symptom in nephrotic syndrome is periorbital edema (accumulation of fluid under the skin around the eye). In ambulatory patients, periorbital edema can be observed only in the morning (after the patient has been lying down for some time); the accumulated fluid can move under gravity into other spaces once the patient has been up and around for some hours. Thus the symptom is observable only in the morning and tends to disappear later in the day. The opposite effect can be observed in the case of pedal edema (accumulation of fluid in the feet), which tends to appear towards the evening and disappear by morning. This information can be used to explain away the absence of pedal edema in an edematous patient who is comatose. It is encoded in the program with positional relations, as shown in figure 10.
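The explaining-away step in this example can be sketched in a few lines of Python. The rule table `EXPECTED_EDEMA_SITE` and the function `absence_explained_by_posture` are hypothetical simplifications distilled from the nephrotic-syndrome example, not the positional relations of figure 10 themselves.

```python
# Hypothetical rule distilled from the nephrotic-syndrome example:
# where edema fluid is expected to collect, given the recent posture.
EXPECTED_EDEMA_SITE = {
    "supine": "periorbital",  # e.g. on waking, after lying down overnight
    "erect": "pedal",         # e.g. in the evening, after hours up and about
}


def absence_explained_by_posture(site: str, recent_posture: str) -> bool:
    """True if a missing edema finding at `site` is accounted for because
    the patient's recent posture moves fluid toward a different site."""
    return EXPECTED_EDEMA_SITE[recent_posture] != site


# A comatose patient stays supine, so absent pedal edema is explained away.
print(absence_explained_by_posture("pedal", "supine"))   # True
# An ambulatory patient seen in the evening: absent pedal edema is not explained.
print(absence_explained_by_posture("pedal", "erect"))    # False
```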

We would like to note that the use of the anatomical knowledge in the current implementation of ABEL is limited to the use of anatomic taxonomy. However, we believe that the knowledge described here will be useful for further development of the diagnostic component as well as the therapy and prognosis components of the project.

3.2 Etiological Knowledge

Disease categories are primarily organized around the organ systems; e.g., renal diseases, pulmonary diseases, liver diseases. In the previous section we have provided the basic framework of anatomical knowledge needed to provide such a categorization. The diseases of a given organ system tend to produce many symptoms associated with the loss of function of that system. For example, regardless of the cause of renal failure, all the diseases causing renal failure share common symptoms.

Another important criterion for organizing diseases is the underlying mechanism causing the clinical disorder, i.e., the etiology of the disease. Similar to the anatomical categorization, the diseases with common etiology share symptoms common to the disease mechanism. For example, most infectious diseases cause fever. The taxonomy of etiologies in the program is shown in figure 11.

Fig. 11. Etiological hierarchy

[etiology = (medical-entity*s "etiology")
    [infectious = (etiology*s "infectious")]
    [immunologic = (etiology*s "immunologic")]
    [degenerative = (etiology*s "degenerative")]
    [toxic = (etiology*s "toxic")
        [biologic-toxins = (toxic*s "biologic-toxins")]
        [chemical-toxins = (toxic*s "chemical-toxins")]]
    [metabolic = (etiology*s "metabolic")
        [genetic = (metabolic*s "genetic")]
        [congenital = (metabolic*s "congenital")]
        [endocrine = (metabolic*s "endocrine")]]

    ....]

3.3 Physiological Knowledge

Knowledge about the normal functioning of the body and its adaptive response to abnormalities in body function plays an important role in the understanding and recognition of diseases. The need for this understanding is even more acutely felt in complex clinical settings involving the simultaneous presence of multiple abnormalities. Emphasizing this need, Dr. Jordan Cohen notes:

In the physiological component of the program we have concentrated on the knowledge necessary in dealing with fluid, electrolyte and acid-base disorders. The physiological knowledge about fluids and electrolytes in the program deals with fluid compartments of the body and the distribution of body fluids in various fluid compartments, the composition of fluid in each compartment, the space of distribution of solutes, exchange of fluid and electrolytes between these compartments, and the homeostatic mechanism for regulating the quantity and composition of the body fluids.

For example, let us look at the definition of the Serum-Potassium concentration:

The above expression defines serum-K (serum potassium) to be the concentration of potassium ion (K) in the extracellular fluid compartment (ecf), which is one of the components of the body fluid (body-fluid). Serum-K is further categorized as low (i.e., [(serum-k*f low)]), normal or high. Each of these categories is also associated with a default value, a range and an acceptable amount of variance (standard-error, in this case ±1.0). The next example shows the encoding of the normal composition of the lower-GI-fluid. In addition to water, the lower-GI-fluid contains Na, K, Cl and HCO3. The quantities of these electrolytes and their variations are specified relative to the total quantity of the fluid. For example, the quantity of K is specified to be 40.0 ± 10.0 mEq/L of water in the lower-GI-fluid.
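A minimal Python sketch of a quantity with an acceptable variance, and of mapping a serum-K value onto the low/normal/high categories. The class `Quantity`, the function `serum_k_category` and the category boundaries are assumptions: the boundaries are taken from the plasma column of figure 14 (4-5 mEq/L), since the program's own ranges are not reproduced here. The lower-GI-fluid K value (40.0 ± 10.0 mEq/L) comes from the text.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Quantity:
    """A value with its acceptable variance (standard error)."""
    value: float
    std_error: float

    def interval(self) -> Tuple[float, float]:
        return (self.value - self.std_error, self.value + self.std_error)

    def accepts(self, x: float) -> bool:
        low, high = self.interval()
        return low <= x <= high


# Assumption: category boundaries taken from the plasma column of
# figure 14 (K 4-5 mEq/L); the program's own boundaries are not shown here.
SERUM_K_NORMAL_RANGE = (4.0, 5.0)


def serum_k_category(k: float) -> str:
    """Map a serum-K value (mEq/L) to the low / normal / high categories."""
    low, high = SERUM_K_NORMAL_RANGE
    if k < low:
        return "low"
    if k > high:
        return "high"
    return "normal"


# Normal composition of lower-GI-fluid, per litre of water (K value from the text).
LOWER_GI_FLUID_K = Quantity(40.0, 10.0)

print(serum_k_category(3.2))            # low
print(LOWER_GI_FLUID_K.interval())      # (30.0, 50.0)
print(LOWER_GI_FLUID_K.accepts(35.0))   # True
```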

In the previous three sections we have described the anatomical, physiological and etiological knowledge which, along with the temporal characterization, forms a basis for the taxonomic organization of diseases discussed in the next section.

3.4 Disease Knowledge

This section deals with the use of anatomical, physiological, etiological and temporal knowledge in defining a taxonomic disease hierarchy. With this taxonomic hierarchy in place, we will have completed the study of the basic medical concepts needed in ABEL for the description of disease pathophysiology.

A disease is defined in terms of its anatomical involvement, its temporal characteristics, its etiologic characterization and its pathophysiology. Because the anatomic, etiologic and physiologic knowledge are each hierarchically organized, the locus of a disease along each of these dimensions can be selected at an appropriate level. A hierarchic organization for the disease definitions can then be derived from these loci.⁹ For example, acute renal failure caused by nephrotoxic drugs could be specified as

The example above defines renal-disease to be a disease of the renal-system (anatomical locus). Renal-failure is then defined as a renal disease characterized by low urine output (physiological locus). Acute-renal-failure is defined to be renal-failure with an acute temporal characteristic, and finally, the drug-induced-acute-renal-failure is defined to be acute-renal-failure of chemically-toxic etiology. Note that each step of the above definition defines a disease which is further specialized by one of its primary characterizations. This provides a more specific placing of the diseases in the taxonomic hierarchy. In the next example we show how the disease definitions can be taxonomically organized along a single locus:
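The stepwise specialization described above can be sketched in Python, with each disease inheriting its parent's loci and adding one of its own. The `Disease` class, its field names and the locus values ("low-urine-output", "chemical-toxins", taken from figure 11) are hypothetical stand-ins for ABEL's definitions, which are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, Optional

LOCI = ("anatomy", "physiology", "temporal", "etiology")


@dataclass
class Disease:
    """Hypothetical disease frame: a parent disease plus the loci it adds."""
    name: str
    parent: Optional["Disease"] = None
    anatomy: Optional[str] = None
    physiology: Optional[str] = None
    temporal: Optional[str] = None
    etiology: Optional[str] = None

    def loci(self) -> Dict[str, str]:
        """Loci inherited from ancestors, refined by this disease's own."""
        result = self.parent.loci() if self.parent else {}
        for key in LOCI:
            value = getattr(self, key)
            if value is not None:
                result[key] = value
        return result

    def is_a(self, other: "Disease") -> bool:
        """True if `other` is this disease or one of its ancestors."""
        node = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False


renal_disease = Disease("renal-disease", anatomy="renal-system")
renal_failure = Disease("renal-failure", renal_disease, physiology="low-urine-output")
acute_rf = Disease("acute-renal-failure", renal_failure, temporal="acute")
drug_arf = Disease("drug-induced-acute-renal-failure", acute_rf, etiology="chemical-toxins")

print(drug_arf.loci())
print(drug_arf.is_a(renal_disease))   # True
```

Each level adds exactly one locus, so the position of a disease in the taxonomy follows directly from which loci it specifies.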

As can be seen from the above two examples, the basic medical knowledge about anatomy, etiology, etc., provides us with a framework for describing and organizing the disease hierarchy. We believe that such a knowledge structure is needed in any medical consulting program capable of expert-level performance. However, we must note that this development is tentative, and the details of the knowledge representation described above are likely to evolve considerably as its use in the diagnostic and therapeutic algorithms is better understood.¹⁰

In the next section we will study the representation mechanisms for describing the causal (pathophysiological) knowledge relating different diseases.

3.5 Causal Link

A causal link specifies the cause-effect relation between the cause (the antecedent) and the effect (the consequent) states. In the previous generation of programs (i.e., PIP, INTERNIST and GLAUCOMA), causal relations were described by links specifying the type of causality (e.g., may-be-caused-by, complication-of, etc.), and a number or a set of numbers representing in some form the likelihood (conditional probability), importance, etc., of observing the effect given the cause or vice versa. We believe that this simple representation of the relation between states is inadequate. The form of presentation of an effect and the likelihood of observing it depend upon various aspects of the presentation of the cause instance such as severity and duration, as well as on other factors in the context in which the causal phenomenon is manifested (such as the patient's age, sex and weight, and the current hypothesis about the patient's illness). To illustrate this, let us consider a (simplified) causal relation between diarrhea and dehydration. A rule-based description of this causal relation can be specified as follows:

From the above simple example, it is apparent that the conditional probability of observing dehydration and its severity and duration depend on the severity and duration of diarrhea and the fluid replacement therapy. Even this simplified example clearly demonstrates the need for information on how a cause relates to an effect, as well as other contextual information influencing the causal relation. To capture this information, the description of a causal link has associated with it a multivariate relation between attributes of the cause and the effect, the context, and the assumptions which constrain the causal relation. A schematic description of a causal link and its representation in the data-base are shown in figure 12.

An example of the causal relation between total extracellular stores of potassium (ecf-K) and its serum concentration (serum-K) is described below.

The causal relation between ecf-K stores and serum-K is specified by a causal link with cause (source) ecf-K and effect (destination) serum-K. The mapping relation describing this link is divided into two parts. The first part is associated with the source of the link and describes procedures for computing the attributes of the source (cause) given the attributes of the destination (effect); the second part is associated with the destination and describes procedures for computing the attributes of the destination given the attributes of the source. For example, the total quantity of potassium in the extracellular compartment (value of the source) is characterized as the product of the quantity of extracellular water (value of the context, total-ecf-water) and the concentration of potassium ion in it (destination, serum-K).
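A minimal Python sketch of a causal link whose mapping relation has a source half and a destination half, instantiated for ecf-K and serum-K. The `CausalLink` class, the attribute dictionaries and the context key `total-ecf-water` are assumptions for illustration, and the 14 L of extracellular water is an illustrative value only. The two halves are the product relation described above and its inverse.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Attrs = Dict[str, float]


@dataclass
class CausalLink:
    """A causal link whose mapping relation has two halves."""
    source: str
    destination: str
    # Source half: source attributes from destination attributes + context.
    to_source: Callable[[Attrs, Attrs], Attrs]
    # Destination half: destination attributes from source attributes + context.
    to_destination: Callable[[Attrs, Attrs], Attrs]


ecf_k_link = CausalLink(
    source="ecf-K",
    destination="serum-K",
    # total ecf K (mEq) = ecf water (L) * serum K concentration (mEq/L)
    to_source=lambda dest, ctx: {"value": ctx["total-ecf-water"] * dest["value"]},
    # serum K concentration (mEq/L) = total ecf K (mEq) / ecf water (L)
    to_destination=lambda src, ctx: {"value": src["value"] / ctx["total-ecf-water"]},
)

context = {"total-ecf-water": 14.0}  # litres; an illustrative value only
stores = ecf_k_link.to_source({"value": 4.0}, context)
print(stores)                                       # {'value': 56.0}
print(ecf_k_link.to_destination(stores, context))   # {'value': 4.0}
```

Keeping the context as an explicit argument reflects the point of the section: the same link gives different answers when the surrounding state (here, extracellular water) changes.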

Strictly speaking, it would not be appropriate to call all relations of this kind “causal,” as some of the relationships are more matters of definition or association than cause. A more rigorous analysis, perhaps following the lines of [Rieger77], would further distinguish potential cause from actual, enabling conditions from true causations, etc. Such an expansion would, however, be orthogonal to our present argument, that any such link must connect several aspects of its source and destination.

3.6 Multi-Level Causal Description

Medical knowledge about different diseases and their pathophysiology is understood to varying degrees of detail. Our understanding of medical expert reasoning also suggests that an expert physician may understand a difficult case at several levels of detail. For example, “serum creatinine concentration of 1.2 mg per cent” is at a distinctly different level than “high serum creatinine”,¹¹ and “lower gastrointestinal loss” is at a different level than “salmonellosis”. For our program to reason at a sophisticated level of competence, it will need to share such a range of representations. To be effective, the program must be able to describe the problem briefly yet still take low-level detail into consideration. We have attacked this problem by representing the program's medical and case-specific knowledge at five distinct levels of detail,¹² ranging from a pathophysiological level to a phenomenological level of knowledge.

Fig. 13. Schematic description of the node structure

The patient description developed here provides us with the ability to describe the patient's illness at various levels of detail. Each level of the description can be viewed as a semantic net describing a network of relations between diseases and findings. Each node represents a normal or abnormal state of a physiological parameter, and each link represents some relation (causal, associational, etc.) between different states. Associated with each node is a set of attributes describing its temporal characteristics, severity or value, and other relevant properties. A node is called primitive if it has no internal structure, and composite if it can be defined in terms of a causal network of states at the next more detailed level of description. One of the nodes at that more detailed level is designated as the focus node, and the causal network is called the elaboration structure of the composite node. Figure 13 shows a schematic of the elaboration structure for a composite node labeled X. Nodes A through F and the links between them form X's elaboration structure. Nodes X and F are connected by a focus link, making F the focus of the elaboration structure. The focus node identifies the essential part of the causal structure of the node above it. The collection of focal nodes acts to align the causal networks represented by different levels of the PSM. We note that very often a composite node and its focal description at the next level share the same name.¹³ Nodes that do not serve as the focal definition of any node at a higher level are called non-aggregable nodes. They represent a detailed aspect of the causal model that is subsumed under other nodes with different foci at less detailed levels of description.
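The node structure of figure 13 can be sketched in Python as follows. The `Node` class, its fields and the helper `non_aggregable` are hypothetical; because the causal links among A through F are not recoverable from the figure, the sketch records only the elaboration structure and its focus. It checks that the focus belongs to the elaboration one level down and then finds the non-aggregable nodes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical node: primitive if it has no elaboration structure."""
    name: str
    level: int  # 0 = most detailed (pathophysiological) level
    elaboration: List["Node"] = field(default_factory=list)
    focus: Optional["Node"] = None  # target of the focus link

    def __post_init__(self) -> None:
        if self.focus is not None and not any(n is self.focus for n in self.elaboration):
            raise ValueError("the focus must belong to the elaboration structure")
        if any(n.level != self.level - 1 for n in self.elaboration):
            raise ValueError("the elaboration must lie exactly one level down")

    @property
    def composite(self) -> bool:
        return bool(self.elaboration)


def non_aggregable(detailed: List[Node], aggregate: List[Node]) -> List[str]:
    """Names of detailed nodes that are not the focus of any aggregate node."""
    foci = {id(n.focus) for n in aggregate if n.focus is not None}
    return [n.name for n in detailed if id(n) not in foci]


# Figure 13's composite node X: nodes A-F one level down, with F as focus.
level0 = [Node(name, 0) for name in "ABCDEF"]
x = Node("X", 1, elaboration=level0, focus=level0[-1])

print(x.composite, level0[0].composite)   # True False
print(non_aggregable(level0, [x]))        # ['A', 'B', 'C', 'D', 'E']
```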

Fig. 14. Comparison of lower GI fluid and of plasma

Ion     Lower GI fluid   Plasma     Units
Na      100-110          138-145    mEq/L
K       30-40            4-5        mEq/L
Cl      80-90            100-110    mEq/L
HCO3    30-60            24-28      mEq/L

To illustrate the description of a state at various levels of aggregation, let us consider the electrolyte and acid-base disturbances that occur with salmonellosis, which causes excessive loss of lower gastrointestinal fluid (lower GI fluid loss). In comparison with plasma, the lower GI fluid is rich in bicarbonate (HCO3) and potassium (K) and is deficient in sodium (Na) and chloride (Cl). The compositions of lower gastrointestinal fluid and plasma are shown in figure 14. The loss of lower GI fluid leads to the loss of corresponding quantities of its constituents, as shown in figure 15.

Fig. 15. The loss of electrolytes in lower GI fluid

Therefore, an excessive loss of lower GI fluid without adequate replacement of fluid and electrolytes leads to a net reduction in the total quantity of fluid in the extracellular compartment (called hypovolemia). Because the concentrations of K and HCO3 in the lower GI fluid are greater than in the plasma, there is also a corresponding reduction in the serum concentration of K (called hypokalemia) and HCO3 (called hypobicarbonatemia) in the extracellular fluid. Finally, because the concentrations of Cl and Na in the lower GI fluid are lower than those in the plasma, there is a corresponding increase in the concentration of Cl (called hyperchloremia) and Na (called hypernatremia) in the extracellular fluid. A graphic representation of this information at the next higher level of aggregation is shown in figure 16.
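The direction of each change can be checked with a crude single-compartment mixing model. This is our simplification, not the program's physiology: it ignores intracellular shifts (which dominate potassium balance) and renal compensation, so it supports only the direction argument, not magnitudes. Concentrations are the midpoints of the ranges in figure 14; the function names are hypothetical.

```python
# Midpoints of the ranges in figure 14 (mEq/L).
LOWER_GI_FLUID = {"Na": 105.0, "K": 35.0, "Cl": 85.0, "HCO3": 45.0}
PLASMA = {"Na": 141.5, "K": 4.5, "Cl": 105.0, "HCO3": 26.0}


def concentration_after_loss(c_ecf: float, c_lost: float,
                             v_ecf: float, v_lost: float) -> float:
    """Concentration left in a single well-mixed compartment of volume
    v_ecf after v_lost litres of fluid at concentration c_lost leave it."""
    if not 0.0 <= v_lost < v_ecf:
        raise ValueError("lost volume must be smaller than the compartment")
    return (c_ecf * v_ecf - c_lost * v_lost) / (v_ecf - v_lost)


def predicted_change(ion: str) -> str:
    # (c*V - g*L) / (V - L) < c  exactly when  g > c, for any 0 < L < V,
    # so the direction depends only on comparing the two concentrations.
    lost, plasma = LOWER_GI_FLUID[ion], PLASMA[ion]
    if lost > plasma:
        return "decrease"
    if lost < plasma:
        return "increase"
    return "unchanged"


for ion in ("Na", "K", "Cl", "HCO3"):
    print(ion, predicted_change(ion))
# Na increase, K decrease, Cl increase, HCO3 decrease
```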

Fig. 16. Consequences of lower GI loss described at next higher level

Figure 17 shows the aggregation of this information, along with some additional causes and consequences of lower GI loss, at the next more aggregate level of detail. Hypobicarbonatemia is interpreted as metabolic acidosis at this higher level. Note that hypernatremia and hyperchloremia have not been encoded at this level.¹⁴ Hyperchloremia was omitted because it is not clinically significant; hypernatremia was omitted because it is not a common finding in the presentation of lower GI loss. The lower GI loss at this level is a non-aggregable state and therefore does not have a focal aggregation at the next level above. Figure 18 shows the description of the aggregate effects of salmonellosis (one of the causes of lower GI loss).

Like nodes, links can be categorized into two types: primitive links and composite links. To illustrate the concept of elaborating causal links to form a causal chain, let us consider the causal relation between salmonellosis and dehydration shown in figure 19. The causal mechanism of dehydration caused by salmonellosis can be elaborated as follows: salmonellosis causes lower GI loss, which in turn causes dehydration. Expressed at the next level of greater detail, the lower GI loss leads to water loss, which results in a reduction in the extracellular volume. The state of reduced extracellular volume is called dehydration.
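The successive elaboration of the salmonellosis-dehydration link can be sketched in Python as a table lookup. The `ELABORATION` table and the `elaborate` function are hypothetical; node names are reused across levels where the text does so, and "reduced-ecf-volume" is an assumed name for the detailed-level counterpart of dehydration.

```python
from typing import Dict, List, Tuple

# Hypothetical table: a composite link -> its causal chain one level down.
ELABORATION: Dict[Tuple[str, str], List[str]] = {
    ("salmonellosis", "dehydration"): ["salmonellosis", "lower-GI-loss", "dehydration"],
    ("lower-GI-loss", "dehydration"): ["lower-GI-loss", "water-loss", "reduced-ecf-volume"],
}


def elaborate(chain: List[str]) -> List[str]:
    """Replace every composite link in `chain` by its elaboration;
    primitive links (no entry in the table) are kept as they are."""
    result = [chain[0]]
    for cause, effect in zip(chain, chain[1:]):
        sub_chain = ELABORATION.get((cause, effect), [cause, effect])
        result.extend(sub_chain[1:])
    return result


step1 = elaborate(["salmonellosis", "dehydration"])
print(step1)   # ['salmonellosis', 'lower-GI-loss', 'dehydration']
print(elaborate(step1))
# ['salmonellosis', 'lower-GI-loss', 'water-loss', 'reduced-ecf-volume']
```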

Because the causal relations specified by links are not guaranteed to be true under all circumstances (they represent strong associations, not logical truth), the validity of deductions degrades with every additional intermediate link. That is, a causal pathway containing a large number of links is less likely to be valid than one using only a few links. Therefore, in order to explore a large diagnostic space, we must reduce the lengths of commonly occurring chains of causal relations. One way of achieving this is through the multi-level description proposed in this chapter. The multi-level description scheme allows us to aggregate the diagnostic space to a level where each link represents an aggregate causal phenomenon covering large distances and thus minimizing the possibility of error in the deduction.
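The degradation argument can be made concrete with a one-line calculation. Under the simplifying assumption (ours, not the text's) that each link holds independently with some probability, a chain holds only if every link does, so its validity is the product of the link probabilities. The 0.9 per link used below is an illustrative number only.

```python
from math import prod
from typing import Sequence


def chain_validity(link_probabilities: Sequence[float]) -> float:
    """Probability that every link of a causal chain holds, under the
    simplifying assumption that the links fail independently."""
    return prod(link_probabilities)


# Each link is right 90% of the time (an illustrative number).
print(chain_validity([0.9]))       # one aggregate link: 0.9
print(chain_validity([0.9] * 3))   # three intermediate links: about 0.729
print(chain_validity([0.9] * 6))   # six intermediate links: about 0.53
```

Replacing a long chain by a single aggregate link at a higher level avoids this multiplicative loss, which is the motivation given for the multi-level description.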

However, the multi-level description proposed above cannot solve this problem completely. For example, there are situations where not all the intermediate nodes in a given causal chain can be suppressed, because of the limited number of levels of description. Stated differently, because of the fixed number of levels in the multi-level description, the program's ability to aggregate causal descriptions is limited. To overcome this problem we introduce the notion of a compiled link, which represents a causal pathway.¹⁵ Compiled links provide us with the ability to selectively explore commonly occurring causal paths more deeply than others without degrading the quality of deduction. They also provide the additional ability to activate¹⁶ nodes that are not immediate neighbors of the node under consideration. For example, severe salmonellosis causes dehydration sufficient to cause hypotension (lowering of blood pressure). This fact can be represented in the data base by the compiled causal link shown in figure 20.

Fig. 20. Compiled link

[((caused-by*b salmonellosis)*e hypotension)
    #path [((caused-by*b salmonellosis)*e dehydration)],
          [((caused-by*b dehydration)*e hypotension)]]
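A minimal Python sketch of a compiled link in the spirit of figure 20: the link records the path it summarizes, the path is checked for continuity, and the compiled link lets hypotension be activated directly from salmonellosis. The `Link` class, its `present` field (reflecting that links are objects whose presence can be hypothesized) and `directly_activated` are hypothetical names, not ABEL's data base format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)
class Link:
    """Hypothetical link object; a compiled link records the path it summarizes."""
    cause: str
    effect: str
    path: Optional[List["Link"]] = None
    # Because links are objects, their presence can itself be hypothesized:
    # None = undecided, True = hypothesized present, False = ruled out.
    present: Optional[bool] = None

    def __post_init__(self) -> None:
        if self.path:
            ok = (self.path[0].cause == self.cause
                  and self.path[-1].effect == self.effect
                  and all(a.effect == b.cause for a, b in zip(self.path, self.path[1:])))
            if not ok:
                raise ValueError("path must run contiguously from cause to effect")

    @property
    def compiled(self) -> bool:
        return self.path is not None


def directly_activated(node: str, links: List[Link]) -> List[str]:
    """Effects reachable from `node` in one step, counting compiled links."""
    return [link.effect for link in links if link.cause == node]


s_d = Link("salmonellosis", "dehydration")
d_h = Link("dehydration", "hypotension")
s_h = Link("salmonellosis", "hypotension", path=[s_d, d_h])  # figure 20

print(s_h.compiled)                                      # True
print(directly_activated("salmonellosis", [s_d, d_h, s_h]))
# ['dehydration', 'hypotension']
```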

An important function of diagnostic reasoning is to relate causally the diseases and symptoms observed in a patient. These causal relations play a central role in identifying clusters that can be meaningfully aggregated in developing coherent diagnoses. The presence or absence of a causal relation between a pair of states can change their diagnostic and prognostic interpretations. Therefore, the system should have, and does have, the capability of hypothesizing the presence or absence of a causal relation. This is the primary reason why links are treated as objects in their own right rather than simply as ordered pairs of states.

In this chapter we have described the representation of the anatomic, physiologic and etiologic medical knowledge around which the disease taxonomy and pathophysiology are organized. We have also discussed a multi-level hierarchic description of causal knowledge. In the next chapter we will study the operations that use this knowledge to describe the individual patient's illness, by patient-specific instantiation of relevant medical knowledge and by combining the effects of multiple disease phenomena.

3. Representation of Medical Knowledge