How to Construct the Smallest Classification Tree?

Need a way to measure the total disorder, or inhomogeneity, in the subsets produced by each test

From information theory:

$$\text{Avg disorder} = \sum_{b} \frac{n_b}{n_t} \times \left[ \sum_{c} -\frac{n_{bc}}{n_b} \log_2 \frac{n_{bc}}{n_b} \right]$$

where
  n_b = # samples in branch b
  n_t = total # samples in all branches
  n_bc = total # samples in branch b of class c
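As a rough illustration, the average disorder formula can be sketched in Python. The function name `average_disorder` and the input layout (one list of class labels per branch) are assumptions made for this example, not part of the original slides. Empty branches are skipped, and a class with no samples in a branch contributes nothing to the sum.

```python
from collections import Counter
from math import log2


def average_disorder(branches):
    """Average disorder of a test, given one list of class labels per branch.

    `branches` is a hypothetical input format chosen for this sketch.
    """
    n_t = sum(len(labels) for labels in branches)  # total samples in all branches
    total = 0.0
    for labels in branches:
        n_b = len(labels)  # samples in this branch
        if n_b == 0:
            continue  # an empty branch contributes no disorder
        disorder = 0.0
        for n_bc in Counter(labels).values():  # samples of class c in branch b
            p = n_bc / n_b
            disorder -= p * log2(p)
        total += (n_b / n_t) * disorder  # weight by the branch's share of samples
    return total


# A test that separates the classes perfectly has zero disorder;
# a test that leaves every branch evenly mixed has disorder 1 (two classes).
perfect = average_disorder([["yes", "yes"], ["no", "no"]])
mixed = average_disorder([["yes", "no"], ["yes", "no"]])
```

Under this measure, the test with the lowest average disorder is the one whose branches are closest to homogeneous.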
