
Also called data dredging, data fishing, data snooping, and equation fitting, p-hacking appears to be relatively widespread11 and may be especially likely in today's world, where big data offer tantalizing opportunities to draw conclusions where, in fact, none may lie. Remember: looking for patterns in data is legitimate. Applying hypothesis testing to the same data from which the pattern was detected is data dredging.
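To make that distinction concrete, here is a minimal simulation sketch (our own construction, not from the article; every variable and number is hypothetical): it "discovers" the noise biomarker most correlated with a noise outcome, then tests that same association in the same data, which drives the false-positive rate far above the nominal 5%.

```python
# A minimal sketch (hypothetical data, not from the article) of why
# "find a pattern, then test it on the same data" inflates false positives.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_patients, n_biomarkers, n_sims = 50, 20, 2000
false_positives = 0

for _ in range(n_sims):
    outcome = rng.normal(size=n_patients)                  # pure-noise outcome
    markers = rng.normal(size=(n_patients, n_biomarkers))  # pure-noise predictors
    corrs, pvals = [], []
    for j in range(n_biomarkers):
        r, p = stats.pearsonr(markers[:, j], outcome)
        corrs.append(abs(r))
        pvals.append(p)
    # "Dredge": pick the biomarker most correlated with the outcome ...
    best = int(np.argmax(corrs))
    # ... then apply the hypothesis test to that same biomarker in the same data.
    if pvals[best] < 0.05:
        false_positives += 1

print(f"False-positive rate after dredging: {false_positives / n_sims:.2f}")
# Far above the nominal 5%, even though every variable is pure noise.
```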
Dr. Pocock noted that a good amount of cardiology research is observational in nature, which does lend itself to less strict analysis. "Because you probably didn't have a clearly defined a priori research hypothesis in the first place, observational research and epidemiological research tend to be much looser on these things."
It's important to understand that p-hacking isn't just one thing; there are several ways to manipulate data. Leif D. Nelson, PhD, from the Haas School of Business at the University of California, Berkeley, gave a talk on the topic in 2014. His slides were posted on Twitter by the UC Berkeley Initiative for Transparency in the Social Sciences (@UCBITSS) and listed easy ways to p-hack (see list).
Six Ways to p-Hack
1. Stop collecting data once p < 0.05 (see the sketch after this list).
2. Analyze many measures, but report only those with p < 0.05.
3. Collect and analyze many conditions, but report only those with p < 0.05.
4. Use covariates to get p < 0.05.
5. Exclude participants to get p < 0.05.
6. Transform the data to get p < 0.05.

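As an illustration of the first tactic, here is a hedged simulation sketch (our own setup, not from Dr. Nelson's slides): both groups are drawn from the same distribution, so there is no true effect, yet peeking after every few observations and stopping as soon as p < 0.05 produces "significant" results far more often than 5% of the time.

```python
# A minimal sketch (assumed setup, not from Dr. Nelson's slides) of tactic 1:
# keep testing as data accumulate and stop the moment p < 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, start_n, step, max_n = 2000, 10, 5, 100
hits = 0

for _ in range(n_sims):
    # Two groups from the SAME distribution: there is no true effect.
    a = list(rng.normal(size=start_n))
    b = list(rng.normal(size=start_n))
    while len(a) <= max_n:
        if stats.ttest_ind(a, b).pvalue < 0.05:  # peek at the running p value
            hits += 1                            # stop as soon as it "works"
            break
        a.extend(rng.normal(size=step))          # otherwise collect a few more
        b.extend(rng.normal(size=step))

print(f"False-positive rate with optional stopping: {hits / n_sims:.2f}")
# Well above the nominal 5% that a single, fixed-sample test would give.
```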
Dr. Nelson, who holds an endowed professorship in Business Administration and Marketing, suggested that while we accept the threshold of p < 0.05 (in other words, a 5% false-positive rate), if we allow p-hacking, the false-positive rate you can calculate is actually 61%. That makes p-hacking, he said, a "potential catastrophe to scientific inference" that, at least in part, can be solved through complete and transparent reporting.
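The sketch below (hypothetical numbers, not Dr. Nelson's calculation) illustrates just one of the flexibilities behind that arithmetic: measuring several correlated outcomes and reporting whichever one clears p < 0.05. Each tactic alone inflates the nominal 5% rate; stacking several of them together is what pushes the false-positive rate toward the figure Dr. Nelson describes.

```python
# A minimal sketch (hypothetical data, not Dr. Nelson's calculation) of tactic 2:
# analyze several outcome measures but report only the significant one.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_sims, n_per_group, n_measures = 2000, 20, 5
# Five moderately correlated outcome measures (correlation 0.5).
cov = np.full((n_measures, n_measures), 0.5) + 0.5 * np.eye(n_measures)
any_significant = 0

for _ in range(n_sims):
    # No true treatment effect: both groups share the same distribution.
    control = rng.multivariate_normal(np.zeros(n_measures), cov, size=n_per_group)
    treated = rng.multivariate_normal(np.zeros(n_measures), cov, size=n_per_group)
    pvals = [stats.ttest_ind(treated[:, j], control[:, j]).pvalue
             for j in range(n_measures)]
    any_significant += min(pvals) < 0.05  # report whichever measure "worked"

print(f"Chance of at least one p < 0.05: {any_significant / n_sims:.2f}")
```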
Dr. Nelson was coauthor of a paper that introduced the "p curve," defined as the distribution of statistically significant p values for a set of studies.12 As the authors explained, "Because only true effects are expected to generate right-skewed p curves — containing more low (0.01s) than high (0.04s) significant p values — only right-skewed p curves are diagnostic of evidential value."
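A rough sketch of the idea (our own simulation, not the authors' code): simulate many two-group studies with and without a true effect, keep only the significant p values, and look at how they are distributed across the 0 to 0.05 range.

```python
# A minimal sketch (our own simulation, not the p-curve authors' code) of how
# significant p values distribute themselves with and without a true effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def significant_pvalues(effect, n_studies=5000, n_per_group=30):
    """Collect the p < 0.05 results from simulated two-group studies."""
    pvals = []
    for _ in range(n_studies):
        treated = rng.normal(loc=effect, size=n_per_group)
        control = rng.normal(loc=0.0, size=n_per_group)
        p = stats.ttest_ind(treated, control).pvalue
        if p < 0.05:
            pvals.append(p)
    return np.array(pvals)

for label, effect in [("null (no effect)", 0.0), ("true effect (d = 0.5)", 0.5)]:
    p = significant_pvalues(effect)
    # Share of significant results falling in each 0.01-wide bin up to 0.05.
    share = np.histogram(p, bins=[0, .01, .02, .03, .04, .05])[0] / len(p)
    print(label, np.round(share, 2))
# With a true effect the curve is right-skewed (more p's near .01 than .04);
# under the null it is roughly flat; p-hacked results tend to pile up near .05.
```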
Who's to Blame for This p-Fiasco?
In a blog post commenting on the ASA statement,
Jan G.P. Tijssen, PhD, is emeritus professor of clinical epidemiology and biostatistics in the Department of Cardiology at the Academic Medical Center - University of Amsterdam. His interests are in study design and data analysis of clinical research, with a special focus on cardiology and vascular medicine. He teaches courses on Clinical Trial Methodology and Data Analysis at the Erasmus Summer Program. The author or coauthor of more than 600 papers, Dr. Tijssen is a statistical co-editor for the Journal of the American College of Cardiology. As this issue of CSWN was going to press, JACC editors were reviewing an editorial on p values in biomedical research written by Dr. Tijssen and Dr. G. Paul Kolm, a statistical editor for JACC: Interventions, which should be available online at JACC ahead of print by the time you receive this issue of CardioSource WorldNews.
What needs to be done to improve this situation with p values and statistical analysis?
Writers of journal papers need to adjust to the idea that when you report your results, you first have to describe the findings of your study, that is, the descriptive statistics; then you need effect estimates to interpret the data; and, at the end of the day, you need a p value to check whether the data as observed are compatible with the play of chance, or are less compatible with chance and much more compatible with evidence of an underlying treatment benefit or treatment effect.
What about the journals? As I understand it, if your p value is not < 0.05, you may not get published, or you may be very restricted in how you can present your findings.
We all know that there is a tendency among journals to publish significant differences and not to publish non-significant differences, but we really have to get rid of the idea that if p < 0.05 there is evidence of an effect, and if p > 0.05 there is no benefit at all. The p value is determined by the size of your investigation, so if you have a relatively small study, the p value could be greater than 0.05 even though, in reality, there is an effect of the determinant you are studying.
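A small sketch of that point, with invented event rates: the same 8% versus 12% difference between arms lands on either side of p = 0.05 purely as a function of how many patients per arm were enrolled.

```python
# A minimal sketch (invented event rates) of how sample size alone moves the
# p value for an identical observed treatment difference.
from scipy import stats

def two_by_two(n_per_arm, rate_treated=0.08, rate_control=0.12):
    """Build a 2x2 table of events / non-events for the two arms."""
    events_t = round(rate_treated * n_per_arm)
    events_c = round(rate_control * n_per_arm)
    return [[events_t, n_per_arm - events_t],
            [events_c, n_per_arm - events_c]]

for n in (200, 1000, 3000):
    _, p = stats.fisher_exact(two_by_two(n))
    print(f"n per arm = {n:>4}: identical 8% vs 12% event rates, p = {p:.3f}")
# A small trial can return p > 0.05 even when a clinically real difference exists.
```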
Do you think journals are becoming more receptive to authors presenting these descriptive statistics but not showing statistical significance?
I can only speak for the journal with which I am affiliated [JACC], but I know that the editorial board and the chief editor are very sensitive to this issue. We really have to ensure that the data, as reported, are clearly understandable. Of course, the statistical interpretation is important, but a guarded interpretation of the observations is just as important.
The process can be much improved if authors are aware of the fact that we as journal editors are not interested only in p values. We are interested in the data and the observations of the study, and we are interested in translating the findings to patients: whether the treatment should or should not be given to patients. And that translation is based on effect estimates and confidence intervals, and much less on whether the p value is < 0.05 or > 0.05.
So this is an important message for researchers to hear.
That is my perspective. Quite often I have to ask authors to rewrite their results section because they've jumped to p values far too fast. If I'm judging the outcomes of a clinical trial, I'm not looking at the p value first; I'm looking at the event rates in the two treatment groups and the differences in outcomes. At the end, I'll look at the p value, but I let my interpretation be guided much more by the 95% confidence interval than by the p value itself.
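As a concrete sketch of that reporting order (the trial numbers below are invented for illustration): describe the event rates first, then the effect estimate with its 95% confidence interval, and only then the p value.

```python
# A minimal sketch (invented trial numbers) of reporting event rates first,
# then the effect estimate with a 95% CI, and the p value last.
import math
from scipy import stats

events_treated, n_treated = 90, 1200
events_control, n_control = 120, 1200

# 1. Descriptive statistics: event rates in the two arms.
rate_t = events_treated / n_treated
rate_c = events_control / n_control

# 2. Effect estimate with a 95% CI (absolute risk difference, normal approximation).
diff = rate_t - rate_c
se = math.sqrt(rate_t * (1 - rate_t) / n_treated + rate_c * (1 - rate_c) / n_control)
ci_low, ci_high = diff - 1.96 * se, diff + 1.96 * se

# 3. The p value comes last, as a compatibility check rather than the verdict.
_, p = stats.fisher_exact([[events_treated, n_treated - events_treated],
                           [events_control, n_control - events_control]])

print(f"Event rates: {rate_t:.1%} vs {rate_c:.1%}")
print(f"Risk difference: {diff:.1%} (95% CI {ci_low:.1%} to {ci_high:.1%})")
print(f"p = {p:.3f}")
```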
What about cherry picking p values for the abstract? I've noticed that sometimes when the primary endpoint is significant and only some of the components of the primary endpoint are significant, only those components will be included in the abstract. The nonsignificant components are buried in the text.
If the primary endpoint is significant, the issue for me is not whether every component is significant, but whether the components of the combined endpoint are moving in the same direction, with the same order of magnitude or similar hazard ratios as the overall hazard ratio for the primary endpoint. I'm looking for consistency in the hazard ratios, rather than checking whether one is significant and the other one isn't.
Any other comments?
Yes, one more. My epidemiology textbooks of 20 or 25 years ago conveyed the same message that the ASA statement is trying to make now. These issues were made clear by my teachers back then, in particular Professor Olli Miettinen, who was then at Harvard and is now at McGill, and also by Professor Kenneth Rothman in his textbook Modern Epidemiology. So, this is not new.
— DLB