European Policy Analysis Volume 2, Number 1, Spring 2016 | Page 113
To find a more robust model, the described bootstrapping procedure is applied. A bagging model and a random forest model are fitted on the training set. As described earlier, both models are very similar: the random forest considers only a random subset of the possible predictor variables at each split of each tree, while the bagging model always chooses the best split from among all predictors.

First, the number of bootstrapped trees has to be defined. If the number of trees is too low, the model is most likely underperforming; an ensemble with more trees would be more accurate.14 To evaluate the performance with different numbers of trees, we can plot the error rates as a function of the number of trees. Figure 8 shows the classification error rates (i.e., the percentage of wrongly classified observations) for both classes, "TRUE" and "FALSE." The black curve is the OOB rate. Since each bootstrapped sample does not contain all observations, we can compute the classification error rate for those observations that have been randomly excluded from the sample the tree is built on. The OOB error rate, therefore, is a very good measure of the robustness of the model. If the OOB error rate no longer decreases when more trees are added to the ensemble, we have reached a sufficient number of trees. Figure 8 shows that the OOB error rate seems quite stable with 200 and more trees, so 300 trees should be a safe choice.

Table 2: Cross-table Decision Tree/Test Set

                Prediction
  Real     FALSE   TRUE   Total
  FALSE      178     21     199
  TRUE         4      1       5
  Total      182     22     204

In the next step, the performance of the models has to be evaluated. One way would be to compare the cross-tables of the models, as has been done with the decision tree model (Table 2). But this comparison is not trivial. Do we prefer a model that is highly sensitive, that is, one that identifies a high number of the rare class "punctuation"? Or do we look for a model that is very specific, that is, one that is only seldom wrong when predicting punctuations? There is, however, a way to deal with this trade-off.

The ROC curve is a popular graphic for comparing the performance of different classification models. "The name 'ROC' is historic, and comes from communications theory. It is an acronym for receiver operating characteristics" (James et al. 2013, 147). As described before, the models do not only predict the most likely outcome for every observation but also calculate a probability for this assignment. The decision tree in Figure 7, for example, includes the proportion of right classifications for every node, which can be translated into a probability value. For an ROC curve, the predictions are ordered by these probabilities. In some cases, the model "is very sure" that there was no punctuation (e.g., probability 0.1); in other cases, TRUE is the more likely prediction (e.g., probability 0.8). For every probability threshold, the true positive rate (i.e., the share of cases rightly labeled "TRUE", also called sensitivity) and the false positive rate (i.e., the share of "FALSE" cases wrongly labeled "TRUE", which equals one minus the specificity) are counted. Plotting these values against each other results in an ROC curve (see Figure 9). A perfect model that predicts every observation right would be represented by an ROC curve that hugged the top-left corner of the plot. The bigger the area under the
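The out-of-bag logic described above — every bootstrap sample leaves some observations out, and those left-out rows act as a built-in test set for the tree fitted on that sample — can be sketched in a few lines. This is a minimal illustration that uses a one-rule "stump" in place of a full decision tree; the toy data, function names, and classifier are assumptions, not the article's code:

```python
import random

def fit_stump(xs):
    """One-rule classifier: predict 1 when x exceeds the sample mean."""
    cut = sum(xs) / len(xs)
    return lambda x: 1 if x > cut else 0

def oob_error(xs, ys, n_trees, seed=0):
    """Share of observations misclassified by the majority vote of the
    trees in whose bootstrap sample they did NOT appear."""
    rng = random.Random(seed)
    n = len(xs)
    votes = [[] for _ in range(n)]
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # draw with replacement
        model = fit_stump([xs[i] for i in sample])
        in_bag = set(sample)
        for i in range(n):
            if i not in in_bag:            # only out-of-bag rows get a vote
                votes[i].append(model(xs[i]))
    scored = wrong = 0
    for i, v in enumerate(votes):
        if v:                              # skip rows that were never left out
            scored += 1
            majority = 1 if 2 * sum(v) >= len(v) else 0
            wrong += majority != ys[i]
    return wrong / scored

# Toy data: class 1 sits above x = 0.5, so the OOB error should settle
# near zero once enough stumps have been aggregated, as in Figure 8.
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
print([oob_error(xs, ys, b) for b in (10, 100, 300)])
```

Tracking the error for several ensemble sizes, as in the last line, is the same diagnostic as the error-versus-trees plot discussed for Figure 8: once the curve flattens, adding trees buys nothing.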
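The ROC construction described above — ordering the predictions by their probabilities and counting true and false positives at each threshold — can be sketched as follows. This is a minimal illustration: the function name and the toy labels and probabilities are made up, not the article's data, and the Table 2 figures are read with real outcomes in the rows, as printed:

```python
# Sensitivity and specificity read off Table 2 (real outcomes in rows):
tp, fn = 1, 4      # real TRUE:  predicted TRUE / predicted FALSE
fp, tn = 21, 178   # real FALSE: predicted TRUE / predicted FALSE
sensitivity = tp / (tp + fn)   # true positive rate, 1/5 = 0.20
specificity = tn / (tn + fp)   # true negative rate, 178/199

def roc_points(y_true, y_prob):
    """Return (false positive rate, true positive rate) pairs obtained by
    lowering the classification threshold one observation at a time."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    # Order observations from most to least likely "TRUE".
    ranked = sorted(zip(y_prob, y_true), reverse=True)
    points = [(0.0, 0.0)]      # threshold above every score: nothing flagged
    tp_count = fp_count = 0
    for _, label in ranked:
        if label:
            tp_count += 1      # a real positive crosses the threshold
        else:
            fp_count += 1      # a real negative is wrongly flagged
        points.append((fp_count / neg, tp_count / pos))
    return points

# Toy scores: a model that ranks most real positives near the top.
labels = [1, 0, 1, 0, 0]
probs = [0.8, 0.6, 0.55, 0.3, 0.1]
for fpr, tpr in roc_points(labels, probs):
    print(fpr, tpr)
```

A perfect ranking would reach a true positive rate of 1.0 while the false positive rate is still 0.0, which is exactly the top-left-corner curve the text describes.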