European Policy Analysis Volume 2, Number 1, Spring 2016 | Page 113
To find a more robust model, the described bootstrapping procedure is applied. A bagging model and a random forest model are fitted on the training set. As described earlier, both models are very similar: the random forest considers only a random subset of the possible predictor variables at each split of each tree, while the bagging model always chooses the best split from among all predictors.

First, the number of bootstrapped trees has to be defined. If the number of trees is too low, the model is most likely underperforming; an ensemble with more trees would be more accurate.14 To evaluate the performance with different numbers of trees, we can plot the error rates as a function of the number of trees. Figure 8 shows the classification error rates (i.e., the percentage of wrongly classified observations) for both classes, "TRUE" and "FALSE." The black curve is the OOB rate. Since each bootstrapped sample does not contain all observations, we can compute the classification error rate for those observations that have been randomly excluded from the sample the tree is built on. The OOB error rate, therefore, is a very good measure of the robustness of the model. If the OOB error rate no longer decreases when more trees are added to the ensemble, we have reached a sufficient number of trees. Figure 8 shows that the OOB error rate seems quite stable with 200 and more trees, so 300 trees should be a safe choice.

Table 2: Cross-table Decision Tree/Test Set

                Prediction
  Real     FALSE   TRUE   Total
  FALSE      178     21     199
  TRUE         4      1       5
  Total      182     22     204

In the next step, the performance of the models has to be evaluated. One way would be to compare the cross-tables of the models, as has been done with the decision tree model (Table 2). But this comparison is not trivial. Do we prefer a model that is highly sensitive, that is, one that identifies a high number of the rare class "punctuation"? Or do we look for a model that is very specific, that is, one that is only seldom wrong when predicting punctuations? There is, however, a way to deal with this trade-off.

The ROC curve is a popular graphic for comparing the performance of different classification models. "The name 'ROC' is historic, and comes from communications theory. It is an acronym for receiver operating characteristics" (James et al. 2013, 147). As described before, the models do not only predict the most likely outcome for every observation but also calculate a probability for this assignment. The decision tree in Figure 7, for example, includes the proportion of right classifications for every node, which can be translated into a probability value. For an ROC curve, the predictions are ordered by these probabilities. In some cases, the model "is very sure" that there was no punctuation (e.g., probability 0.1); in other cases, TRUE is the more likely prediction (e.g., probability 0.8). For every probability threshold, the true positive rate (i.e., the share of cases rightly labeled "TRUE", also called sensitivity) and the false positive rate (i.e., the share of "FALSE" cases wrongly labeled "TRUE", which equals one minus the specificity) are counted. Plotting these values against each other results in an ROC curve (see Figure 9). A perfect model that predicts every observation right would be represented by an ROC curve that hugged the top-left corner of the plot. The bigger the area under the
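The out-of-bag logic described above — every bootstrap sample leaves some observations out, and those left-out rows act as a built-in test set for the tree fitted on that sample — can be sketched in a few lines. This is a minimal illustration that uses a one-rule "stump" in place of a full decision tree; the toy data, function names, and classifier are assumptions, not the article's code:

```python
import random

def fit_stump(xs):
    """One-rule classifier: predict 1 when x exceeds the sample mean."""
    cut = sum(xs) / len(xs)
    return lambda x: 1 if x > cut else 0

def oob_error(xs, ys, n_trees, seed=0):
    """Share of observations misclassified by the majority vote of the
    trees in whose bootstrap sample they did NOT appear."""
    rng = random.Random(seed)
    n = len(xs)
    votes = [[] for _ in range(n)]
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # draw with replacement
        model = fit_stump([xs[i] for i in sample])
        in_bag = set(sample)
        for i in range(n):
            if i not in in_bag:            # only out-of-bag rows get a vote
                votes[i].append(model(xs[i]))
    scored = wrong = 0
    for i, v in enumerate(votes):
        if v:                              # skip rows that were never left out
            scored += 1
            majority = 1 if 2 * sum(v) >= len(v) else 0
            wrong += majority != ys[i]
    return wrong / scored

# Toy data: class 1 sits above x = 0.5, so the OOB error should settle
# near zero once enough stumps have been aggregated, as in Figure 8.
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
print([oob_error(xs, ys, b) for b in (10, 100, 300)])
```

Tracking the error for several ensemble sizes, as in the last line, is the same diagnostic as the error-versus-trees plot discussed for Figure 8: once the curve flattens, adding trees buys nothing.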
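The ROC construction described above — ordering the predictions by their probabilities and counting true and false positives at each threshold — can be sketched as follows. This is a minimal illustration: the function name and the toy labels and probabilities are made up, not the article's data, and the Table 2 figures are read with real outcomes in the rows, as printed:

```python
# Sensitivity and specificity read off Table 2 (real outcomes in rows):
tp, fn = 1, 4      # real TRUE:  predicted TRUE / predicted FALSE
fp, tn = 21, 178   # real FALSE: predicted TRUE / predicted FALSE
sensitivity = tp / (tp + fn)   # true positive rate, 1/5 = 0.20
specificity = tn / (tn + fp)   # true negative rate, 178/199

def roc_points(y_true, y_prob):
    """Return (false positive rate, true positive rate) pairs obtained by
    lowering the classification threshold one observation at a time."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    # Order observations from most to least likely "TRUE".
    ranked = sorted(zip(y_prob, y_true), reverse=True)
    points = [(0.0, 0.0)]      # threshold above every score: nothing flagged
    tp_count = fp_count = 0
    for _, label in ranked:
        if label:
            tp_count += 1      # a real positive crosses the threshold
        else:
            fp_count += 1      # a real negative is wrongly flagged
        points.append((fp_count / neg, tp_count / pos))
    return points

# Toy scores: a model that ranks most real positives near the top.
labels = [1, 0, 1, 0, 0]
probs = [0.8, 0.6, 0.55, 0.3, 0.1]
for fpr, tpr in roc_points(labels, probs):
    print(fpr, tpr)
```

A perfect ranking would reach a true positive rate of 1.0 while the false positive rate is still 0.0, which is exactly the top-left-corner curve the text describes.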