European Policy Analysis Volume 2, Number 1, Spring 2016 | Page 99
logic. The traditional approach is closer
to our ideas about how politics work.
Researchers have some expectations
about causalities and try to verify or
falsify these expectations empirically.
Most of the time, they will end up with
two values: one describing the strength of
the effect (e.g., R2) and one the certainty
of these results (e.g., the significance
level). In this approach, the data have to be carefully selected to allow generalization, which is the aim of the whole procedure.
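A minimal sketch of this traditional logic can make the two values concrete. The snippet below (using scipy and synthetic data, both illustrative assumptions not taken from the paper) estimates an expected relationship and reports exactly the two quantities described above: effect strength (R²) and a significance level (p-value):

```python
# Sketch of the traditional approach: start from an expected
# causal relationship, then report effect strength (R^2) and
# certainty (p-value). The toy data are an assumption.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=200)               # hypothetical predictor
y = 0.8 * x + rng.normal(size=200)     # outcome with a built-in effect

result = stats.linregress(x, y)
r_squared = result.rvalue ** 2         # strength of the effect
p_value = result.pvalue                # certainty of the result

print(f"R^2 = {r_squared:.2f}, p = {p_value:.3g}")
```

Note that the whole setup is driven by the researcher's prior expectation (here, that x affects y); the estimation only verifies or falsifies it.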
Machine learning takes data as given. With hardly any theoretical assumptions about the relationship between different variables, the computer tries to identify patterns and transfers these findings into a computational model. The researcher is interested in
the accuracy (comparison of predicted
values and real values) and the robustness
(performance on new data) of the model.
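A minimal sketch of this machine-learning logic, using scikit-learn and synthetic data (both illustrative assumptions not taken from the paper), shows accuracy as the comparison of predicted and real values and robustness as performance on new, held-out data:

```python
# Sketch of the machine-learning logic: let the algorithm find
# patterns without theoretical assumptions, then judge the model
# by accuracy and robustness. Data and settings are assumptions.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                # hypothetical features
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # pattern to be discovered

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = DecisionTreeClassifier(max_depth=4, random_state=0)
model.fit(X_train, y_train)

# Accuracy: comparison of predicted and real values.
train_acc = accuracy_score(y_train, model.predict(X_train))
# Robustness: performance on data the model has never seen.
test_acc = accuracy_score(y_test, model.predict(X_test))

print(f"training accuracy = {train_acc:.2f}, held-out accuracy = {test_acc:.2f}")
```

The gap between the two scores is what the robustness check is meant to expose: a model that merely memorizes its training data will score high on the first number and poorly on the second.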
As Berk puts it:

    With algorithmic methods, there is no statistical model in the usual sense; no effort has been made to represent how the data were generated. And no apologies are offered for the absence of a model. There is a practical data analysis problem to solve that is attacked directly with procedures designed specifically for that purpose. (Berk 2006, 263)

These differences can be seen as strengths as well as weaknesses of the two approaches. The traditional approach is more general because it is based on expected causalities. On the other hand, it will hardly detect any patterns that are not connected to those prior expectations. In addition, the expected causalities might even bias the results because data collection, analysis, and interpretation are guided by the research interest. Machine learning often outperforms traditional approaches in accuracy and might reveal relations that are nonintuitive. On the other hand, generalizations from machine learning can easily lead to misjudgments, especially when correlations are taken for causalities.

Machine learning, therefore, is not just a new toolbox for the same problems. It should rather be seen as a different way of thinking about political science issues, one that is adequate where the data are complex and theoretical expectations are missing or drawn into question.

The paper is structured as follows. In the Statistical Theory and Data Explanation section, the applied machine learning methods are presented. The section starts with a closer look at the idea behind machine learning and discusses why machine learning is useful in political science. To have an example on which the methods can be discussed, this paper takes a test case from the mainstream of policy studies: the federal budget of the United States and the attention of Congress and the President. Researchers working with punctuated equilibrium theory (PET) have studied budgets and attention intensively.

Based on this, the machine learning methods—decision trees and random forest—are introduced. The section ends with an explanation of the concept of cross-validation.

The second part of the paper shows an empirical application in detail. The discussed methods are used to predict punctuations in annual budgets. In this