
A (Brief) Primer in Risk Statistics
Just as the descriptive statistics used in reporting clinical trials have been watered down for the masses and tend to rely on simple, easy-to-understand measures, so do the stats used in risk prediction. Here’s a quick primer on risk score statistics:
Risk prediction models generally use various patient characteristics measured at one time point (e.g., age, sex, race, smoking status, blood pressure, cholesterol levels) to estimate the probability of an outcome occurring within a given time period (e.g., 10-year risk of an atherosclerotic cardiovascular disease event).
Most risk scores depend on regression analysis, the area of statistics that attempts to predict or estimate the value of a response (dependent) variable from the known values of one or more explanatory (independent) variables. Logistic regression, as an example, is used when the response variable is a binary categorical variable (i.e., diseased or not diseased). The Framingham Risk Score uses a Cox proportional hazards model to stratify cardiovascular risk into three categories: low, intermediate, and high.
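To make the regression idea concrete, here is a minimal sketch in Python using scikit-learn. The predictors, coefficients, and simulated data are illustrative assumptions only, not taken from the Framingham model or any published risk score.

```python
# A minimal sketch of logistic regression for a binary outcome
# (diseased vs. not diseased). All predictors and coefficients are
# invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.normal(55, 10, n),    # age (years)
    rng.normal(130, 15, n),   # systolic blood pressure (mm Hg)
    rng.normal(200, 35, n),   # total cholesterol (mg/dL)
])
# Simulate a binary outcome whose log-odds rise with each predictor.
logit = -17 + 0.15 * X[:, 0] + 0.05 * X[:, 1] + 0.01 * X[:, 2]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)
# predict_proba returns each subject's estimated probability of
# disease -- the "risk" a score would report.
risk = model.predict_proba(X)[:, 1]
print(risk[:5])
```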
The validity of a prediction model is assessed by evaluating its discrimination and calibration. Discrimination refers to the model’s ability to separate those who develop events (or a disease) from those who do not.
Calibration measures how well the model’s predicted risk matches the observed event rates. So, for example, if 15% of a group of people with a particular risk profile develop heart disease over a 10-year period, the risk score should produce a 10-year risk (probability) of 15% for subjects from that group.
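A simple way to inspect calibration is to bin subjects by predicted risk and compare the mean predicted risk with the observed event rate in each bin. A minimal sketch, with simulated data standing in for any model’s predictions and outcomes:

```python
# Group subjects into deciles of predicted risk and compare mean
# predicted risk with the observed event rate in each decile.
import numpy as np

def calibration_table(risk, y, n_bins=10):
    order = np.argsort(risk)
    for i, idx in enumerate(np.array_split(order, n_bins), 1):
        print(f"decile {i:2d}: predicted {risk[idx].mean():.3f}, "
              f"observed {y[idx].mean():.3f}")

rng = np.random.default_rng(1)
risk = rng.uniform(0.01, 0.60, 2000)       # predicted 10-year risks (simulated)
y = (rng.random(2000) < risk).astype(int)  # outcomes drawn at the predicted rate

# In a well-calibrated model the two columns track each other:
# a decile predicted at ~15% risk should show ~15% observed events.
calibration_table(risk, y)
```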
The C-statistic is a global measure of model discrimination. When outcomes are binary, it reflects the probability that a randomly selected case is assigned a higher predicted risk than a randomly selected non-case. It is equivalent to the area under a receiver operating characteristic (ROC) curve.
The C stands for concordance and, indeed, the statistic measures concordance between model-based risk estimates and observed events. A C-statistic of 0.50 indicates that the model is no better than chance at making a prediction (i.e., random concordance), while a value of 1.0 indicates that the model perfectly separates those who experience events from those who do not (perfect concordance). Values above 0.5 suggest some predictive value, but somewhat weak predictive models may generate C-statistics in the 0.75 range.1 When the C-statistic is greater than 0.8, the model is considered strong.
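Because the binary-outcome C-statistic equals the area under the ROC curve, it can be computed with standard tools. A minimal sketch, with made-up risks and outcomes:

```python
# Computing the C-statistic for a binary outcome via the ROC AUC.
import numpy as np
from sklearn.metrics import roc_auc_score

y = np.array([0, 0, 1, 0, 1, 1, 0, 1])       # observed events
risk = np.array([0.10, 0.30, 0.15, 0.20,
                 0.80, 0.70, 0.25, 0.60])    # model-predicted risks

# 0.8125 here: 13 of the 16 case/non-case pairs are concordant
# (the case's predicted risk exceeds the non-case's).
print(roc_auc_score(y, risk))
```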
The C-statistic is undoubtedly the most popular means of measuring model accuracy and has several virtues, including that it is easy to compute using statistical programs. It is also easy to explain, understand, and adapt to different situations.
However, as discussed by Michael J. Pencina, PhD, and Ralph B. D’Agostino, Sr., PhD, in a 2015 JAMA article, the C-statistic also has several limitations and is prone to misuse and misunderstanding.2 They should know: Dr. Pencina is the director of biostatistics and faculty associate director at the Duke Clinical Research Institute, and Dr. D’Agostino is the chairman of the Mathematics and Statistics Department at Boston University. Both are experts in the development and performance assessment of cardiovascular disease risk prediction models.
The C-statistic, they said, does not communicate as much clinically relevant information as negative and positive predictive values, sensitivity, and specificity. Nor does it balance misclassification errors.
“In addition, the C-statistic is only a measure of discrimination, not calibration, so it provides no information regarding whether the overall magnitude of risk is predicted accurately,” they wrote. There are several “appealing single-number alternatives” to the C-statistic, including the discrimination slope, the Brier score, and the difference between sensitivity and 1 minus specificity evaluated at the event rate.
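As one illustration of those alternatives, the Brier score is straightforward to compute: it is the mean squared difference between predicted probabilities and observed 0/1 outcomes (lower is better). A minimal sketch with made-up data:

```python
# The Brier score, via scikit-learn and by hand.
import numpy as np
from sklearn.metrics import brier_score_loss

y = np.array([0, 0, 1, 0, 1, 1, 0, 1])       # observed events
risk = np.array([0.10, 0.30, 0.15, 0.20,
                 0.80, 0.70, 0.25, 0.60])    # model-predicted risks

print(brier_score_loss(y, risk))   # library call
print(np.mean((risk - y) ** 2))    # same quantity, by definition
```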
The C-statistic is best used as a “familiar first-glance summary,” they concluded, and is best supplemented with other statistical and clinical measures when a more in-depth evaluation of a risk model’s discriminatory value is desired.
Another way to compare the discriminatory performance of models is with net reclassification improvement (NRI; also called the net reclassification index). First introduced in 2008 by Drs. Pencina and D’Agostino and colleagues,3 the NRI was expanded upon in 2011.4
NRI is an index that attempts to quantify how well a new model reclassifies subjects, appropriately or inappropriately, as compared to an old model. Typically, this comparison is between the original model (e.g., myocardial infarction as a function of age and sex) and a new model, which is the original model plus one additional component (e.g., myocardial infarction as a function of age, sex, and weight).
NRI is composed of two components, one for subjects without events and one for subjects with events, and the calculation works the same way for both groups. Subjects without events who were correctly reclassified lower are assigned a +1; subjects without events who were incorrectly reclassified higher are assigned a -1; subjects not reclassified are assigned a 0. Then do the same, but reversed, for subjects with events: assign a +1 for subjects correctly reclassified higher and a -1 for those incorrectly reclassified lower. Add the scores in each group, divide by the number of subjects in that group, and the sum of these two values is the NRI.
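A minimal sketch of that bookkeeping in Python, assuming risk categories are coded as ordered integers; the data are invented for illustration:

```python
# Categorical NRI: +1/-1/0 scoring as described above, summed and
# averaged within the event and non-event groups.
import numpy as np

def net_reclassification_improvement(old_cat, new_cat, event):
    old_cat, new_cat, event = map(np.asarray, (old_cat, new_cat, event))
    up = new_cat > old_cat      # moved to a higher risk category
    down = new_cat < old_cat    # moved to a lower risk category

    cases = event == 1
    # Events: +1 for moving up (correct), -1 for moving down.
    nri_events = (up[cases].sum() - down[cases].sum()) / cases.sum()
    # Non-events: +1 for moving down (correct), -1 for moving up.
    nri_nonevents = (down[~cases].sum() - up[~cases].sum()) / (~cases).sum()
    return nri_events + nri_nonevents

# Categories: 0 = low, 1 = intermediate, 2 = high (illustrative data).
old = [0, 1, 1, 2, 0, 1, 2, 1]
new = [1, 1, 2, 1, 0, 2, 2, 0]
evt = [1, 0, 1, 0, 0, 1, 1, 0]
print(net_reclassification_improvement(old, new, evt))  # 0.75 + 0.50 = 1.25
```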
REFERENCES
1. Hermansen SW. Evaluating predictive models: computing and interpreting the c statistic. SAS Global Forum 2008; Paper 143-2008. Available at: http://www2.sas.com/proceedings/forum2008/143-2008.pdf. Accessed August 7, 2016.
2. Pencina MJ, D’Agostino RB Sr. JAMA. 2015;314:1063-4.
3. Pencina MJ, et al. Stat Med. 2008;27:157-72.
4. Pencina MJ, et al. Stat Med. 2011;30:11-21.

“We want to make sure we’re asking a relevant question instead of just blindly calculating a score for everybody or recalculating it even though an action has already been taken,” said Dr. Maddox. With greater automation of risk scores, he is also concerned they may impede clinical flow, unnecessarily adding time to a clinic visit.
“With the amount of automation that is available at the bedside, through your smartphone, through the EHR at the hospital or the clinic, it’s no longer an issue that physicians don’t have the time to start mentally doing the math to calculate a risk score,” said Dr. Maddox.
In May, Benjamin A. Goldstein, PhD, (see sidebar on “Machine Learning”) and colleagues published in the Journal of the American Medical Informatics Association a systematic review of studies that have utilized EHR data as the primary source to build and validate risk prediction models.9 Searching over a 6-year period, they found more than 100 papers from 15 different countries. Not surprisingly, most of the studies had large sample sizes (median n = 26,100); 39 studies had sample sizes exceeding 100,000 patients.
However, despite sample sizes that are the stuff of statistical dreams, the authors found that most of the studies failed to fully leverage the breadth of EHR data and employed relatively few predictor variables (a median of 27). Also, even though the data were conveniently electronic, fewer than half of the studies were multicenter and fewer than one-quarter performed validation across multiple sites.
“Overall, we found room for improvement in maximizing the advantages of EHR-data for risk modeling and addressing inherent challenges,” wrote Goldstein et al.
Many of the presumed advantages of using EHR data (the sample sizes, the large number of variables available, the fact that the data are not disease-specific like registry data, and the opportunity to create external validation sets) were all largely under-exploited, according to Dr. Goldstein. Add that to some needed evolution in study design, the issues of missing data and loss to follow-up, and the difficulty of quantifying the impact of informed presence, and the field remains a work in progress.