Using Only the Training Set from Which the Tree Was Built
Quinlan: the raw resubstitution estimate of the error rate is adjusted to compensate for its optimistic bias
- heuristic method
- views N training cases at a leaf, E incorrectly classified, as observing E “events” in N trials
- the probability of error over the entire population of cases cannot be determined exactly, but it has a probability distribution of its own, usually summarized by a pair of confidence limits
- C4.5 equates the predicted error rate at a leaf with the upper limit on this probability (Quinlan, C4.5: Programs for Machine Learning, pp. 35-42)
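
The upper limit described above can be sketched numerically. The Python below is an illustration, not Quinlan's code: it treats the E errors among N cases as a binomial sample and searches for the error probability p at which seeing E or fewer errors has probability CF. The function names and the bisection search are assumptions made for this sketch. The default CF = 0.25 matches C4.5's default confidence level, and the C4.5 program itself may use approximations in place of this exact calculation.

```python
from math import comb


def binom_cdf(e, n, p):
    """P(X <= e) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(e + 1))


def upper_error_limit(e, n, cf=0.25, tol=1e-9):
    """Upper confidence limit on the error probability at a leaf.

    e  : training cases at the leaf that are misclassified
    n  : training cases at the leaf
    cf : confidence level (0.25 is the C4.5 default)

    Hypothetical helper: solves binom_cdf(e, n, p) == cf for p by bisection.
    """
    if e >= n:
        return 1.0  # every case is wrong, so the limit is 1
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # The CDF falls as p grows; if it is still above cf, p is too small.
        if binom_cdf(e, n, mid) > cf:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def predicted_errors(e, n, cf=0.25):
    """Pessimistic number of errors at the leaf: N times the upper limit."""
    return n * upper_error_limit(e, n, cf)


if __name__ == "__main__":
    # A leaf with 6 training cases and no errors still gets a nonzero estimate.
    print(round(upper_error_limit(0, 6), 3))
```

For a leaf with N = 6 and E = 0, the resubstitution error is 0, but the upper limit at CF = 0.25 is 1 - 0.25^(1/6), about 0.206. The adjustment is largest for small leaves and shrinks as N grows.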