Classification Trees: What They Are and an Example from a Clinical Domain

2/20/98


Table of Contents

Classification Trees: What They Are and an Example from a Clinical Domain

Outline

An Example (from Winston, Artificial Intelligence, 3rd ed.)

Beach Data

Name   Hair    Height  Weight  Lotion  Result
Sarah  blonde  avg     light   no      sunburned
Dana   blonde  tall    avg     yes     none
Alex   brown   short   avg     yes     none
Annie  blonde  short   avg     no      sunburned
Emily  red     avg     heavy   no      sunburned
Pete   brown   tall    heavy   no      none
John   brown   avg     heavy   no      none
Katie  blonde  short   light   yes     none

How to Classify New Cases?

Can Conveniently Arrange Tests in a Tree Format

Classification Tree: Definition

Classification Tree: Definition, continued

Classification Trees A.K.A.

Another Possible Tree for Beach Data (No Prior Sunburn Knowledge)

Occam’s Razor, Specialized to Classification Trees

How to Construct the Smallest Classification Tree?

Understanding the Disorder Formula

Example 1

Example 2

Extending Measure of Disorder in One Set to All Sets of a Branch
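For reference, here is the disorder measure in conventional notation. The symbols are assumptions and may not match those on the slides. For one set, n is the number of cases and n_c the number in class c. For a test, n_t is the number of cases reaching the test, n_b the number sent down branch b, and n_{bc} the number of those in class c. The usual convention 0 log 0 = 0 applies. The worked arithmetic uses the Hair test on the beach data.

```latex
D(S) = -\sum_{c} \frac{n_c}{n}\,\log_2 \frac{n_c}{n}
\qquad
\bar{D}(\text{test}) = \sum_{b} \frac{n_b}{n_t}
  \left( -\sum_{c} \frac{n_{bc}}{n_b}\,\log_2 \frac{n_{bc}}{n_b} \right)

% A pure set has D = 0; a set split evenly between two classes has
% D = -(\tfrac12 \log_2 \tfrac12 + \tfrac12 \log_2 \tfrac12) = 1.
% Hair: blonde = {2 sunburned, 2 none}, brown = {3 none}, red = {1 sunburned}
\bar{D}(\text{Hair}) = \tfrac{4}{8}(1) + \tfrac{3}{8}(0) + \tfrac{1}{8}(0) = 0.5
```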

Deciding Amongst Possible Tests

Split on Height

Split on Weight

Split on Lotion

Greedy Selection of Tests

Repeat Partitioning for Subsets Containing More than One Class
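The greedy procedure in the slides above can be sketched in Python on the beach data. It scores every test by average disorder, splits on the best one, and repeats on any subset that still mixes classes. This is a minimal ID3-style illustration that assumes categorical attributes and does no pruning. The function names are hypothetical, and ties are broken by attribute order.

```python
from collections import Counter
from math import log2

# Beach data from the slide (Winston's example).
FIELDS = ["name", "hair", "height", "weight", "lotion", "result"]
ROWS = [
    ("Sarah", "blonde", "avg", "light", "no", "sunburned"),
    ("Dana", "blonde", "tall", "avg", "yes", "none"),
    ("Alex", "brown", "short", "avg", "yes", "none"),
    ("Annie", "blonde", "short", "avg", "no", "sunburned"),
    ("Emily", "red", "avg", "heavy", "no", "sunburned"),
    ("Pete", "brown", "tall", "heavy", "no", "none"),
    ("John", "brown", "avg", "heavy", "no", "none"),
    ("Katie", "blonde", "short", "light", "yes", "none"),
]
CASES = [dict(zip(FIELDS, row)) for row in ROWS]
ATTRIBUTES = ["hair", "height", "weight", "lotion"]


def partition(cases, attr):
    """Group cases by their value of attr (one group per branch of the test)."""
    groups = {}
    for case in cases:
        groups.setdefault(case[attr], []).append(case)
    return groups


def disorder(cases):
    """Disorder of one set: entropy of its class labels, in bits."""
    n = len(cases)
    counts = Counter(case["result"] for case in cases)
    return sum(k / n * log2(n / k) for k in counts.values())


def average_disorder(cases, attr):
    """Branch disorders weighted by the fraction of cases in each branch."""
    n = len(cases)
    return sum(len(g) / n * disorder(g) for g in partition(cases, attr).values())


def build_tree(cases, attrs):
    """Greedy, unpruned tree: a class label (leaf) or {attr: {value: subtree}}."""
    labels = Counter(case["result"] for case in cases)
    if len(labels) == 1 or not attrs:
        return labels.most_common(1)[0][0]  # homogeneous set, or no tests left
    best = min(attrs, key=lambda a: average_disorder(cases, a))
    rest = [a for a in attrs if a != best]
    return {best: {value: build_tree(group, rest)
                   for value, group in partition(cases, best).items()}}


for attr in ATTRIBUTES:
    print(attr, round(average_disorder(CASES, attr), 2))
print(build_tree(CASES, ATTRIBUTES))
```

On this data the loop prints hair 0.5, height 0.69, weight 0.94 and lotion 0.61, so Hair is tested first. Only the blonde branch still mixes classes, and Lotion separates it completely.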

Overfitting Data

Tree Simplification

Tree Pruning

Tree Pruning, continued

Pruning Example (Quinlan, C4.5: Programs for Machine Learning, p. 38)

Predicting Error Rates

Error Rate Prediction Methods

Using a New Set of Cases

Using Only the Training Set from Which the Tree was Built
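One well-known way to predict error rates from the training set alone is C4.5's pessimistic estimate. For a leaf that misclassifies E of its N training cases, C4.5 uses the upper limit U_CF(E, N) of a binomial confidence interval, with CF = 25% by default. The sketch below computes that limit exactly by bisection. C4.5's own code may use approximations, so its printed values can differ slightly when E > 0. The function names are hypothetical, and so are the leaf counts in the pruning check.

```python
from math import comb


def binom_cdf(e: int, n: int, p: float) -> float:
    """P(X <= e) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(e + 1))


def upper_error_limit(e: int, n: int, cf: float = 0.25) -> float:
    """U_CF(E, N): the error rate p at which observing e or fewer errors in
    n cases has probability cf. The CDF falls as p rises, so bisect on p."""
    if e >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_cdf(e, n, mid) > cf:
            lo = mid  # so few errors is still too likely: the limit is higher
        else:
            hi = mid
    return (lo + hi) / 2


def predicted_errors(leaves, cf: float = 0.25) -> float:
    """Sum of N * U_CF(E, N) over (N, E) leaf counts."""
    return sum(n * upper_error_limit(e, n, cf) for n, e in leaves)


# Hypothetical counts: a subtree with three error-free leaves of 6, 9 and 1
# cases, versus one leaf covering the same 16 cases with 1 error.
subtree = predicted_errors([(6, 0), (9, 0), (1, 0)])
leaf = predicted_errors([(16, 1)])
print(f"subtree {subtree:.2f}, leaf {leaf:.2f}, prune: {leaf <= subtree}")
```

The subtree is replaced by a leaf when the leaf's predicted errors are no larger than the subtree's. Small error-free leaves are penalized heavily: a 1-case leaf is charged an error rate of 0.75.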

From Trees to Rules

Can Conveniently Arrange Tests in a Tree Format

Beach Example: Trees to Rules

Rule Simplification

Rule Simplification, continued
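Each root-to-leaf path of a tree can be read as an if-then rule, and conditions that the training data show to be unnecessary can then be dropped. The sketch below rests on two assumptions. First, the tree is the one the disorder-based procedure produces for the beach data: hair first, then lotion for blonde cases. Second, the simplification test is a deliberately simple stand-in, not necessarily the statistical test used on the slides. It drops a condition if the shorter rule still misclassifies none of the eight cases. Function names are hypothetical.

```python
# Beach data from the slide, and the tree that the disorder-based greedy
# procedure produces for it (hair first, then lotion for blonde cases).
FIELDS = ["name", "hair", "height", "weight", "lotion", "result"]
ROWS = [
    ("Sarah", "blonde", "avg", "light", "no", "sunburned"),
    ("Dana", "blonde", "tall", "avg", "yes", "none"),
    ("Alex", "brown", "short", "avg", "yes", "none"),
    ("Annie", "blonde", "short", "avg", "no", "sunburned"),
    ("Emily", "red", "avg", "heavy", "no", "sunburned"),
    ("Pete", "brown", "tall", "heavy", "no", "none"),
    ("John", "brown", "avg", "heavy", "no", "none"),
    ("Katie", "blonde", "short", "light", "yes", "none"),
]
CASES = [dict(zip(FIELDS, row)) for row in ROWS]
TREE = {"hair": {"blonde": {"lotion": {"no": "sunburned", "yes": "none"}},
                 "brown": "none",
                 "red": "sunburned"}}


def tree_to_rules(tree, conditions=()):
    """One rule per root-to-leaf path: (((attribute, value), ...), class)."""
    if isinstance(tree, str):  # leaf
        return [(conditions, tree)]
    attr, branches = next(iter(tree.items()))
    rules = []
    for value, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, conditions + ((attr, value),)))
    return rules


def makes_no_errors(conditions, label, cases):
    """True if the rule covers at least one case and every covered case has label."""
    covered = [c for c in cases if all(c[a] == v for a, v in conditions)]
    return bool(covered) and all(c["result"] == label for c in covered)


def simplify(rule, cases):
    """Drop conditions one at a time while the shorter rule stays error-free."""
    conditions, label = rule
    changed = True
    while changed:
        changed = False
        for cond in conditions:
            shorter = tuple(c for c in conditions if c != cond)
            if shorter and makes_no_errors(shorter, label, cases):
                conditions, changed = shorter, True
                break
    return conditions, label


for conds, label in (simplify(r, CASES) for r in tree_to_rules(TREE)):
    print("if " + " and ".join(f"{a} = {v}" for a, v in conds) + " then " + label)
```

Only one rule changes: "hair = blonde and lotion = yes" shrinks to "lotion = yes", because every lotion user in the data escaped sunburn.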

Application in a Clinical Domain

Motivation

Motivation, continued

Decision Aids for Diagnosis of MI

Previous Work

Limitations of Previous Work

Goal

Roadmap

Methods: Data Collection

Patient Attributes Collected

Listing of Patient Attributes

age, smoker, ex-smoker, family history of MI, diabetes, high blood pressure, lipids, retrosternal pain, chest pain major symptom, left chest pain, right chest pain, back pain, left arm pain, right arm pain, pain affected by breathing, postural pain, chest wall tenderness, sharp pain, tight pain, sweating, shortness of breath, nausea, vomiting, syncope, episodic pain, worsening of pain, duration of pain, previous angina, previous MI, pain worse than previous angina, crackles, added heart sounds, hypoperfusion, heart rhythm, left ventricular hypertrophy, left bundle branch block, ST elevation, new Q waves, right bundle branch block, ST depression, T wave changes, ST or T waves abnormal, old ischemia, old MI, sex

Final Diagnosis

Tree Building: Splitting of Data

Tree Building: Specifics

Confidence Level

Tree Comparisons

Logistic Regression Model Building

Logistic Regression Comparisons

Performance Metrics

Sensitivity

Specificity

Positive Predictive Value

Accuracy
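The four metrics above all come from one 2x2 table of predictions against the final diagnosis. Here is a minimal Python sketch; the function name and the counts in the example are hypothetical.

```python
def performance(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Metrics from a 2x2 table of predictions vs. final diagnosis.
    tp: MI predicted, MI present   fp: MI predicted, MI absent
    tn: MI not predicted, absent   fn: MI not predicted, present"""
    return {
        "sensitivity": tp / (tp + fn),  # fraction of true MIs detected
        "specificity": tn / (tn + fp),  # fraction of non-MIs correctly cleared
        "ppv": tp / (tp + fp),          # fraction of MI predictions that are right
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts for illustration only.
print(performance(tp=40, fp=15, tn=135, fn=10))
```

Unlike sensitivity and specificity, PPV also depends on how common MI is in the population being tested. It can therefore differ between test sets even for the same classifier.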

Receiver Operating Characteristic (ROC) Curve

ROC Curves: Details

ROC Curves, continued
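The following sketch shows how an ROC curve and its area can be computed from scored cases, with label 1 meaning MI. Lowering the threshold through each distinct score gives one (1 - specificity, sensitivity) point. The trapezoidal area under those points equals the probability that a randomly chosen MI case outscores a randomly chosen non-MI case, with ties counted as one half. The function names and scores are hypothetical, and both classes must be present. For a tree, one possible score is the confidence of the leaf a case reaches, such as the percentages in the FT tree. The slides do not say how the tree curves were produced.

```python
def roc_points(scores, labels):
    """(false-positive rate, true-positive rate) points obtained by lowering
    the threshold through every distinct score; label 1 = MI."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Cases tied at this score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_area(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores and diagnoses.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
print(roc_area(roc_points(scores, labels)))  # prints 0.75
```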

Results

PPT Slide

PPT Slide

ST elevation = 1: 1 (40.7/49.0 = 83.1%)
ST elevation = 0:
| New Q waves = 1: 1 (4.1/7.0 = 58.6%)
| New Q waves = 0:
| | ST depression = 0: 0 (329.4/345.0 = 95.5%)
| | ST depression = 1:
| | | Old ischemia = 1: 0 (3.2/6.0 = 53.3%)
| | | Old ischemia = 0:
| | | | Family history of MI = 1: 1 (6.8/11.0 = 61.8%)
| | | | Family history of MI = 0:
| | | | | age <= 61: 1 (4.0/8.0 = 50.0%)
| | | | | age > 61:
| | | | | | Duration of pain (hours) <= 2: 0 (14.1/22.0 = 64.1%)
| | | | | | Duration of pain (hours) > 2:
| | | | | | | T wave changes = 1: 1 (7.0/10.0 = 70.0%)
| | | | | | | T wave changes = 0:
| | | | | | | | Right arm pain = 1: 0 (3.4/5.0 = 68.0%)
| | | | | | | | Right arm pain = 0:
| | | | | | | | | Crackles = 0: 0 (3.0/8.0 = 37.5%)
| | | | | | | | | Crackles = 1: 1 (4.9/9.0 = 54.4%)

ST elevation = 1: MI
ST elevation = 0:
| New Q waves = 1: MI
| New Q waves = 0:
| | ST depression = 0: not MI
| | ST depression = 1:
| | | Old ischemia = 1: not MI
| | | Old ischemia = 0:
| | | | Family history of MI = 1: MI
| | | | Family history of MI = 0:
| | | | | age <= 61 years: MI
| | | | | age > 61 years:
| | | | | | Duration <= 2 hours: not MI
| | | | | | Duration > 2 hours:
| | | | | | | T wave changes = 1: MI
| | | | | | | T wave changes = 0:
| | | | | | | | Right arm pain = 1: not MI
| | | | | | | | Right arm pain = 0:
| | | | | | | | | Crackles = 0: not MI
| | | | | | | | | Crackles = 1: MI
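Every split in the FT tree above has one leaf branch, so the tree reads as a chain of if-statements. Below is a sketch that transcribes it. The attribute keys are hypothetical names. Binary findings are assumed to be coded 0/1 and age is assumed to be in years. Duration of pain is in hours, as labeled in the version of the tree with confidence values.

```python
def ft_tree(p: dict) -> str:
    """Classify one patient with the FT tree (hypothetical attribute keys)."""
    if p["st_elevation"] == 1:
        return "MI"
    if p["new_q_waves"] == 1:
        return "MI"
    if p["st_depression"] == 0:
        return "not MI"
    if p["old_ischemia"] == 1:
        return "not MI"
    if p["family_history_mi"] == 1:
        return "MI"
    if p["age"] <= 61:            # years (assumed)
        return "MI"
    if p["duration_hours"] <= 2:  # hours of pain
        return "not MI"
    if p["t_wave_changes"] == 1:
        return "MI"
    if p["right_arm_pain"] == 1:
        return "not MI"
    return "MI" if p["crackles"] == 1 else "not MI"


# Hypothetical patient: ST depression only, aged 70, pain for 3 hours.
patient = {"st_elevation": 0, "new_q_waves": 0, "st_depression": 1,
           "old_ischemia": 0, "family_history_mi": 0, "age": 70,
           "duration_hours": 3, "t_wave_changes": 0, "right_arm_pain": 0,
           "crackles": 0}
print(ft_tree(patient))  # prints "not MI"
```

This patient follows the longest path to the bottom of the tree. Setting crackles to 1 flips the result to "MI".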

STelev or Qwave = 1: MI
STelev or Qwave = 0:
| Duration >= 42 hr = 1:
| | STorTwave = 1: MI
| | STorTwave = 0: notMI
| Duration >= 42 hr = 0:
| | Shoulder,neck,arms = 1:
| | | LocalPressure = 1: notMI
| | | LocalPressure = 0:
| | | | age >= 40 = 1:
| | | | | PrevAngina = 1:
| | | | | | Duration >= 10 = 1: MI
| | | | | | Duration >= 10 = 0: notMI
| | | | | PrevAngina = 0:
| | | | | | LeftShoulder = 1: MI
| | | | | | LeftShoulder = 0:
| | | | | | | age >= 50 = 1: MI
| | | | | | | age >= 50 = 0: notMI
| | | | age >= 40 = 0: notMI
| | Shoulder,neck,arms = 0:
| | | PainWorse = 1: MI
| | | PainWorse = 0:
| | | | Diaphoresis = 1:
| | | | | age >= 70 = 1: MI
| | | | | age >= 70 = 0: notMI
| | | | Diaphoresis = 0: notMI

STchange = +2
STchange = normal
| ncpnitro = yes
| | chpainer = no
| | chpainer = yes
| | | s1 = arm,neck,shoulders
| | | s1 = SOB
| | | s1 = stomach
| | | | sex = male
| | | | sex = female
| | | s1 = pressure,pain,discomfort in chest
| | | | sex = female
| | | | sex = male
| | | | | age > 81.5 years
| | | | | age < 81.5 years
| | | | | | age < 45.5 years
| | | | | | age > 45.5 years
| ncpnitro = no
| | chpainer = no
| | chpainer = yes
| | | twave = normal
| | | twave = -1
| | | | sex = female
| | | | sex = male
| | | | | age < 73.5 years
| | | | | age > 73.5 years

STchange = -2
| ncpnitro = yes
| ncpnitro = no
| | systolic BP > 202 mmHg
| | systolic BP < 202 mmHg
| | | qwave = asmi
| | | qwave = normal
| | | | systolic BP > 178 mmHg
| | | | systolic BP < 178 mmHg
| | | | | age > 83.5 years
| | | | | age < 83.5 years
| | | | | | heart rate < 77 bpm
| | | | | | heart rate > 77 bpm
| | | | | | | heart rate < 89 bpm
| | | | | | | heart rate > 89 bpm
STchange = -1
| s1 = stomach
| s1 = rapid,skipping heartbeats
| s1 = pain in arms,neck,shoulders
| s1 = SOB
| s1 = fainted,dizzy,lightheaded
| | age > 74 years
| | age < 74 years
| | | hxmi = yes
| | | hxmi = no
| s1 = pressure,pain,discomfort in chest
| | heart rate > 131 bpm
| | heart rate < 131 bpm
| | | systolic BP > 197 mmHg
| | | systolic BP < 197 mmHg
| | | | heart rate < 111 bpm
| | | | heart rate > 111 bpm
STchange = -0.5
| ncpnitro = yes
| ncpnitro = no
STchange = flat
| ncpnitro = yes
| ncpnitro = no
STchange = +1
| age > 87.5 years
| age < 87.5 years
| | chpainer = yes
| | chpainer = no
| | | qwave = ami
| | | qwave = normal
| | | | heart rate < 69 bpm
| | | | heart rate > 69 bpm

Snapshot from the Long Tree
| | systolic BP > 202 mmHg
| | systolic BP < 202 mmHg
| | | qwave = asmi
| | | qwave = normal
| | | | systolic BP > 178 mmHg
| | | | systolic BP < 178 mmHg
| | | | | age > 83.5 years
| | | | | age < 83.5 years
| | | | | | heart rate < 77 bpm
| | | | | | heart rate > 77 bpm
| | | | | | | heart rate < 89 bpm
| | | | | | | heart rate > 89 bpm

Tree Attributes

Goldman                  FT               Long
ST elevation or Q waves  ST elevation     ST change
                         New Q waves      Q waves
Duration                 Duration
ST or T wave             T wave           T wave
Shoulder, neck, arm      Right arm        Arm, neck, shoulder
Age                      Age              Age
Local Pressure           ST depression    Stomach pain
Previous angina          Old ischemia     Fainted, dizzy, lightheaded
Left shoulder            Family history   Systolic BP
Pain worse               Crackles         Heart rate
Diaphoresis                               Rapid/skipping beats
                                          Chest pain
                                          History of MI
                                          Nitroglycerin use
                                          Shortness of breath
                                          Sex

Goldman, FT, and Long Trees: Performance on Each’s OWN Test Set

              Goldman   FT Tree   Long Tree
Sensitivity   90.9%     81.4%     66.1%
Specificity   69.7%     92.1%     85.8%
PPV           35.4%     72.9%     68.3%
Accuracy      73.1%     89.9%     80.1%

Goldman Tree vs. FT Tree on Edinburgh data, p < 0.0001

Goldman Tree vs. FT Tree on Sheffield data, p < 0.01

Logistic Regression Results

FT LR Equation Coefficients:
Constant          -2.14
ST elevation       2.96
New Q waves        2.00
ST depression      1.76
Crackles           0.807
Old ischemia      -0.86
Family history     0.43
Age               -0.016
Duration          -0.0046
T wave changes     0.805
Right arm pain    -0.22
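The FT LR coefficients plug into the usual logistic form P(MI) = 1 / (1 + e^(-z)), where z is the constant plus the sum of each coefficient times its attribute value. Here is a minimal sketch. It assumes binary findings are coded 0/1, and that age and duration are in the units used when the model was fit. The slides do not state those units, so the years and hours in the example are guesses. The dictionary keys are hypothetical names.

```python
import math

# FT LR coefficients as listed on the slide.
FT_LR = {
    "constant": -2.14,
    "st_elevation": 2.96,
    "new_q_waves": 2.00,
    "st_depression": 1.76,
    "crackles": 0.807,
    "old_ischemia": -0.86,
    "family_history": 0.43,
    "age": -0.016,
    "duration": -0.0046,
    "t_wave_changes": 0.805,
    "right_arm_pain": -0.22,
}


def mi_probability(patient: dict, coef: dict = FT_LR) -> float:
    """Logistic-regression estimate of P(MI). Attributes missing from
    `patient` are treated as 0 (finding absent)."""
    z = coef["constant"] + sum(
        w * patient.get(name, 0) for name, w in coef.items() if name != "constant"
    )
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical patient: ST elevation present, age 60, pain for 2 (assumed) hours.
print(round(mi_probability({"st_elevation": 1, "age": 60, "duration": 2}), 3))
```

Unlike the tree, the model returns a continuous probability. That makes it straightforward to sweep a threshold and trace an ROC curve.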

Comparison of LR Models (Kennedy, FT LR, Selker LR)

Constant: -3.07, -2.14
ST elevation: 3.16, 2.96
New Q waves: 1.37, 2.00
ST depression: 1.95, 1.76
LV Failure (Crackles): 1.54, 0.807
Old ischemia: -0.86
Family history of MI: 0.43
Age: -0.016
Duration: -0.0046
T wave: 0.805
Right arm: -0.22
Vomiting: 0.68
Hypoperfusion: 0.47
Chest pain #1 Sx: 0.71
Chest pain/24h: 1.00
T wave nl/flat: 1.13
Nitroglycerin use: 0.51
Previous MI: 0.42
STchange nl/flat: 0.77
STchange normal: 0.83

LR Results: Comparison of ROC Areas

             FT LR   Kennedy
Edinburgh:   94%     94%       p = 0.50
Sheffield:   89%     91%       p = 0.17

(ROC curve area for Selker LR model = 89%)

ROC Curves for Trees vs. LR on Edinburgh data

ROC Curve for Trees vs. LR on Sheffield data

Trees vs. Logistic Regression Models (ROC curve areas)

               Edinburgh   Sheffield
FT Tree        94%         90%
Goldman Tree   84%         84%
FT LR          94%         89%
Kennedy LR     94%         91%

- Differences between the FT Tree and Kennedy LR are not significant (p = 0.41 Edinburgh; p = 0.17 Sheffield)

Discussion

Additional Benefits of Classification Trees

Clinical Benefits

Future Work

Acknowledgments

Author: Clinical Decision Making Group

Email: chris@medg.lcs.mit.edu

Home Page: http://medg.lcs.mit.edu/people/

Other information:
Medical Computing Class 2/19/98 Lecture Slides