
logic. The traditional approach is closer to our ideas about how politics works: researchers have expectations about causal relationships and try to verify or falsify these expectations empirically. Most of the time, they end up with two values, one describing the strength of an effect (e.g., R²) and one the certainty of the result (e.g., the significance level). In this approach, data has to be carefully selected in order to allow generalization, which is the aim of the whole procedure. Machine learning, by contrast, takes data as given. With hardly any theoretical assumptions about the relationships between variables, the computer tries to identify patterns and transfers these findings into a computational model. The researcher is interested in the accuracy (the comparison of predicted and real values) and the robustness (the performance on new data) of the model.

With algorithmic methods, there is no statistical model in the usual sense; no effort has been made to represent how the data were generated. And no apologies are offered for the absence of a model. There is a practical data analysis problem to solve that is attacked directly with procedures designed specifically for that purpose. (Berk 2006, 263)

These differences can be seen as strengths as well as weaknesses of the two approaches. The traditional approach is more general because it is based on expected causal relationships. On the other hand, it will hardly detect any patterns that are not connected to these prior expectations. In addition, expected causal relationships might even bias the results, because data collection, analysis, and interpretation are guided by the research interest. Machine learning often outperforms traditional approaches in accuracy and might reveal relations that are not intuitive. On the other hand, generalizations from machine learning can easily lead to misjudgments, especially when correlations are taken for causation. Machine learning, therefore, is not just a new toolbox for the same problems. It should rather be seen as a different way of thinking about political science issues, one that is appropriate where data is complex and theoretical expectations are missing or have been drawn into question.

The paper is structured as follows. In the Statistical Theory and Data Explanation section, the applied machine learning methods are presented. The section starts with a closer look at the idea behind machine learning and discusses why machine learning is useful in political science. To provide an example on which the methods can be discussed, the paper takes a test case from the mainstream of policy studies: the federal budget of the United States and the attention of Congress and the President. Researchers working with punctuated equilibrium theory (PET) have studied budgets and attention intensively. On this basis, the machine learning methods, decision trees and random forest, are introduced. The section ends with an explanation of the concept of cross-validation. The second part of the paper shows an empirical application in detail: the discussed methods are used to predict punctuations in annual budgets. In this
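The article itself contains no code; the following minimal sketch, using scikit-learn on synthetic data, is only meant to illustrate the workflow described above: a tree-based model is fitted, judged by its accuracy on held-out observations, and checked for robustness with cross-validation. The predictor variables, the synthetic "punctuation" label, and all parameter values are invented for illustration and are not taken from the paper's data.

```python
# Minimal sketch (not from the paper): fit a tree-based model, evaluate its
# accuracy on data it has not seen, and use cross-validation for robustness.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Synthetic stand-in for annual budget data: two invented predictors and a
# binary label marking whether a yearly budget change counts as a "punctuation".
X = rng.normal(size=(500, 2))
y = (np.abs(X[:, 0]) + rng.normal(scale=0.5, size=500) > 1.2).astype(int)

# Hold out part of the data: accuracy is judged on observations the model
# never saw during fitting, not on the training data itself.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Accuracy: share of correctly predicted values on the held-out data.
print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))

# Cross-validation: repeat the train/test split several times to check how
# robust the accuracy is to the particular split of the data.
scores = cross_val_score(model, X, y, cv=5)
print("5-fold cross-validation accuracy:", scores.mean())
```

Replacing RandomForestClassifier with sklearn.tree.DecisionTreeClassifier would give the simpler decision-tree variant mentioned above; the evaluation logic, held-out accuracy plus cross-validation, stays the same.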