There are 7 positive examples and 4 negative ones overall, so the
overall entropy is -(7/11)*log2(7/11) - (4/11)*log2(4/11) which is about
0.94566.
There are 3 values for the texture attribute: smooth,
wavy and rough.
For smooth, there are 4 positive examples and 1 negative, so the entropy
is -(4/5)*log2(4/5) - (1/5)*log2(1/5) which is about 0.7219.
For wavy there is one positive and one negative example, so the
entropy is -(1/2)*log2(1/2) - (1/2)*log2(1/2) = 1.
For rough there are two positive and two negative examples, so the
entropy is -(1/2)*log2(1/2) - (1/2)*log2(1/2) = 1.
Expected information for a split on texture is thus:
Attribute temperature:
There are 4 values for the temperature attribute: cold,
cool, warm and hot.
For cold, there is 1 positive and 3 negative examples, so the
entropy is -(1/4)*log2(1/4) - (3/4)*log2(3/4) which is about 0.811278.
For cool, there are 3 positive and no negative examples, so the
entropy is -(0/3)*log2(0/3) - (3/3)*log2(3/3) = 0.
For warm, there is one positive and no negative example - again
the entropy is 0.
For hot, there are 2 positive and 1 negative examples, so the
entropy is -(1/3)*log2(1/3) - (2/3)*log2(2/3) which is about 0.918296.
Thus the expected information for a split on temperature is:
0.811278*(4/11) + 0*(3/11) + 0*(1/11) + 0.918296*(3/11) which is
about 0.54545.
Thus
expected information gain for a split on temperature is about
0.94566 - 0.54545 = 0.40021.
P: ( [8, 5] ) / \ / \ / \ L: [6,2] R: [4,1]The nodes have been labelled P(parent), L(left) and R(right) for ease of reference.
The Laplace (static) error for L is 0.3 (obtained by putting N = 8, n = 6,
and k = 2 into the Laplace error estimate formula.)
The Laplace error for R is 0.285714 (obtained by setting N = 5, n = 4,
& k = 2).
Hence BackedUpError(P) = (8/13)*0.3 + (5/13)*0.285714 = 0.2945 or so.
The Laplace (static) error for P (set N=13, n=8, k=2) is 0.4. This is
larger than the backed up error, so we do not prune