European Policy Analysis Volume 2, Number 1, Spring 2016 | Page 103
The budget data is then linked
to the attention data. The construction
of this combined dataset can be studied
with the attached R-code (see supporting
information) and is, therefore, only
explained in general terms here. For
each topic, the following variables are
calculated from the PAP data:
• the annual number of congressional
hearings on each topic (congress),
• the annual number of public laws
passed by the Congress on each
topic (laws),
• the annual number of executive
orders issued by the President on
each topic (eo),
• the annual number of mentions of
each topic in the President’s State
of the Union speech (sou),
• and the annual percentage of
mentions of each topic as Gallup’s
most important problem (gallup).
These variables measure the
attention paid to different topics within
the policy process at a given time. For
example, if the President refers to
environmental issues six times in his
annual State of the Union speech, the
variable sou has the value 6 for the topic
“Natural Resources and Environment” in
that year. All variables cover the period
from 1948 to 2014.
In addition, there are four
variables derived from the budget data:
• whether the budget change is a
punctuation, TRUE or FALSE (Punc),
• the year for which the budget was
proposed (Year),
• and the budget function
(TopicCode).
At the beginning of each year,
the President reports what the budget
actually was in the previous year (this is
the data in the PAP dataset), how the
budget is distributed in the current year,
and what his budget plans are for the
coming year (True 2009). To capture the
effect of attention on budget decisions, it
is therefore necessary to apply a time lag
of two years: for all 610 data points, the
budget is compared with the attention
variables from two years earlier.
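The two-year lag can be sketched as a simple lookup: each budget observation for a given Year and TopicCode is joined with the attention variables recorded for the same topic two years earlier. This Python sketch is not the attached R code. The function name `link_with_lag`, the topic code 300, and all sample values are hypothetical, and dropping budget rows without matching attention data is an assumption, not something the text specifies.

```python
LAG_YEARS = 2  # time lag described in the text


def link_with_lag(budget_rows, attention, lag=LAG_YEARS):
    """Join each budget row with the attention variables from `lag` years earlier.

    budget_rows: list of dicts with keys "Year", "TopicCode", "Punc".
    attention:   dict mapping (year, topic_code) to a dict of attention variables.
    Rows without matching attention data are dropped (an assumption).
    """
    linked = []
    for row in budget_rows:
        key = (row["Year"] - lag, row["TopicCode"])
        if key not in attention:
            continue
        linked.append({**row, **attention[key]})
    return linked


# Made-up sample data for illustration only.
budget_rows = [
    {"Year": 1992, "TopicCode": 300, "Punc": False},
    {"Year": 1949, "TopicCode": 300, "Punc": True},  # would need 1947 data
]
attention = {
    (1990, 300): {"congress": 12, "laws": 3, "eo": 1, "sou": 6, "gallup": 2.5},
}

linked = link_with_lag(budget_rows, attention)
print(linked)
```

In this sample, the budget for 1992 is paired with attention measured in 1990, while the 1949 row is dropped because the attention variables only start in 1948.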
As can be seen in Figure 2,
punctuations are rare compared
with incremental budget shifts.
Figure 3 gives an overview of the
variables at hand. The diagonal panels
show histograms of the variables with
density plots. The other panels show
pairwise comparisons of the variables.
In the lower left panels, the variables
are plotted against each other with a
linear regression fit. In the upper right
panels, the correlation coefficient of the
pairwise comparison is reported. For
example, there is a correlation of 0.66
between congress and eo.6 Figure 3 gives
a good overview of the complexity of
the dataset. We find a combination of
categorical and numerical variables, of
which the latter do not seem to follow
a normal distribution (see histograms).
There are no strong correlations with the
response variable Punc, but some of the
predictor variables are highly correlated.
These features would make the analysis
with conventional methods quite tricky.
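For readers who want to reproduce a single pairwise coefficient, the following Python sketch computes a correlation between two variables. It assumes the coefficients in Figure 3 are Pearson correlations, which the text does not state; the function name `pearson` and the sample values are hypothetical and unrelated to the actual dataset.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and the square roots of the summed squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


# Made-up annual values for two attention variables.
congress = [10, 12, 15, 20]
eo = [1, 2, 2, 4]

r = pearson(congress, eo)
print(round(r, 2))
```

A coefficient like this only captures linear association, which is one reason the non-normal, partly categorical variables described above are awkward for conventional methods.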
This data is now the starting point
for predicting punctuations in
the annual budget with machine learning
algorithms.