2018-2019 exchange Winter 2019 Newsletter FINAL | Page 15
– 49 versus 50-50, well then you need more samples. How much data
you need will depend on the type of signal you are trying to detect,
and also on how good the data is.
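To make the point about small signals concrete, here is a minimal sketch of a standard sample-size approximation for telling a proportion apart from 50-50. It is not taken from the interview: the 51% and 60% rates, the 5% significance level, the 80% power target, and the function name samples_needed are all assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def samples_needed(p_null, p_alt, alpha=0.05, power=0.8):
    """Approximate sample size for a two-sided one-sample proportion test.

    Uses the usual normal approximation. p_null is the baseline rate
    (for example 0.50) and p_alt is the rate we hope to detect.
    All default values are illustrative assumptions.
    """
    if p_null == p_alt:
        raise ValueError("p_null and p_alt must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    spread = (z_alpha * math.sqrt(p_null * (1 - p_null))
              + z_beta * math.sqrt(p_alt * (1 - p_alt)))
    return math.ceil((spread / (p_alt - p_null)) ** 2)


# A faint signal (hypothetical 51% versus 50%) needs far more data
# than a strong one (hypothetical 60% versus 50%).
n_faint = samples_needed(0.50, 0.51)
n_strong = samples_needed(0.50, 0.60)
print(n_faint, n_strong)
```

Because the required sample size grows with the inverse square of the gap between the two rates, shrinking the gap tenfold raises the data requirement roughly a hundredfold.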
Oftentimes in business you want to solve a quick problem, so you
don’t need that much. A lot of times, though, you want to create some
competitive advantage; you want to find some data that helps you be a
little better than everybody else. And for that, having a large data set
and having additional input data that helps you predict something a
little better is crucial.
Jen: Are there certain types of data that would not interest you? That
is, data you don’t see any value in having—for example, data that is
50 years old.
Glenn: I do get that question a lot. There are two kinds of data that I
would not consider valuable. One is data that is no longer relevant or
was never relevant. From a data age perspective, it depends on how
quickly the phenomenon that you are trying to predict changes. For
life insurance mortality data, we easily look at data that is 25 or 30
years old, and it is still quite relevant. This is because, luckily, death
is a rare phenomenon and the major causes of death haven’t really
changed over the past few decades; they have changed a little
bit, but not dramatically. That data would be valuable. If you are
looking at who is going to buy the next iPhone and you want to do a
marketing campaign, your 10-year-old data would be quite useless
because the market for smartphones has changed dramatically over
the past 10 years. The other kind is data that has been corrupted or
cannot be used because of regulation.
Jen: Is there data on certain media that you would not use because it
would be cost prohibitive? For example, historical data on microfilm.
Glenn: Cost certainly does matter, and sometimes data can be
hard to get to if it’s on microfiche and more than a couple of
decades old, or if it’s handwritten on paper. For an underwriting
application, digitizing it might be worthwhile, but for a marketing
application it is probably too expensive. So one does have to weigh the
cost of digitizing the data and making it useful against the value it
actually has for that particular application.