Increased risks of false-positive or false-negative findings are common in outcomes graded as high certainty of evidence




Nussbaumer-Streit B1, Gartlehner G2, Wagner G1, Patel S3, Swinson-Evans T3, Dobrescu AI4, Gluud C5
1 Cochrane Austria, Austria
2 Cochrane Austria and RTI International, Research Triangle Park, Austria
3 RTI International, Research Triangle Park, USA
4 Genetics Department, Victor Babes University of Medicine and Pharmacy, Romania
5 Copenhagen Trial Unit, Centre for Clinical Intervention Research, Rigshospitalet, Copenhagen University Hospital, Denmark
Barbara Nussbaumer-Streit

Abstract text
GRADE (Grading of Recommendations Assessment, Development and Evaluation) has become a commonly used tool to convey the certainty of evidence (CoE) in systematic reviews. For decision-makers, such assessments are crucial because they convey the confidence that review authors have in the results. However, previous research has shown that 20% of outcomes graded as high CoE changed substantially as new studies were added. This raises concerns because high CoE, by definition, means that the effect estimate should remain stable when new studies are added to a systematic review. Possible explanations for the limited predictive value of high CoE outcomes could be a lack of adherence to the GRADE guidance, or the conceptual approach to grading CoE, which may not adequately take into consideration the risk of false-positive or false-negative conclusions.

We aimed to identify the factors responsible for the limited predictive value of high CoE grades; specifically, whether an increased risk of type I or type II errors could be the reason.

We randomly selected 100 Cochrane Reviews with dichotomous outcomes rated as high CoE using GRADE. To detect increased risks of random errors, two investigators independently conducted Trial Sequential Analysis (TSA) employing conventional thresholds for type I (α = 0.05) and type II (β = 0.10) errors. We re-graded, in duplicate, all outcomes with increased risks of random errors and conducted multivariate logistic regression analyses to determine predictors of increased risks.
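As a rough illustration of the sample-size reasoning behind TSA, the sketch below computes a required information size for a dichotomous outcome using the conventional two-proportion formula with the thresholds stated above (α = 0.05, β = 0.10). It is not the authors' analysis: the function name, the parameters `p_control`, `rrr` and `diversity`, and the example event proportions are all assumptions chosen for illustration, and the sketch omits the sequential monitoring boundaries that a full TSA also applies.

```python
from math import ceil
from statistics import NormalDist


def required_information_size(p_control, rrr, alpha=0.05, beta=0.10, diversity=0.0):
    """Approximate total participants (both arms) a meta-analysis needs.

    p_control : assumed event proportion in the control group (hypothetical input)
    rrr       : assumed relative risk reduction to detect (hypothetical input)
    diversity : assumed between-trial diversity, 0 <= diversity < 1
    """
    if not 0.0 <= diversity < 1.0:
        raise ValueError("diversity must be in [0, 1)")
    p_exp = p_control * (1.0 - rrr)       # anticipated proportion, experimental arm
    delta = p_control - p_exp             # absolute risk difference to detect
    p_mean = (p_control + p_exp) / 2.0
    variance = p_mean * (1.0 - p_mean)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided type I error
    z_beta = NormalDist().inv_cdf(1.0 - beta)          # power = 1 - beta
    unadjusted = 4.0 * (z_alpha + z_beta) ** 2 * variance / delta ** 2
    # Inflate for heterogeneity between trials (diversity adjustment).
    return ceil(unadjusted / (1.0 - diversity))


# Same relative effect, but a common versus a rare event (illustrative values).
common = required_information_size(p_control=0.20, rrr=0.20)
rare = required_information_size(p_control=0.02, rrr=0.20)
print(common, rare)
```

Because the absolute risk difference enters the formula squared in the denominator, a rare event with the same relative effect needs roughly ten times as many participants in this example. A meta-analysis that falls short of this size is more exposed to random errors.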

Overall, 38% (95% confidence interval 28% to 47%) of high CoE outcomes had increased risks of random errors. Outcomes measuring harms were more frequently affected than outcomes assessing benefits (47% versus 12%). Re-grading of outcomes with increased risks of random errors showed that 74% should not have been rated as high CoE based on current guidance. Regression analyses identified a small absolute risk difference (P = 0.009) and a low number of events (P = 0.001) as significant predictors of increased risks of random errors.
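For readers who want to see where an interval of this width comes from, the following sketch computes a normal-approximation (Wald) confidence interval for a proportion. It assumes that 38 of 100 outcomes were affected, which is inferred from the percentages above rather than stated in the abstract, and the authors may have used a different interval method. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(events, total, level=0.95):
    """Normal-approximation confidence interval for a proportion."""
    p = events / total
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sqrt(p * (1.0 - p) / total)
    # Clamp to the valid range for a proportion.
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Assumed counts: 38 affected outcomes out of 100 (inferred, not reported).
low, high = wald_interval(38, 100)
print(f"{low:.1%} to {high:.1%}")
```

Under these assumptions the interval spans roughly 28% to 48%, close to the reported one. Small differences from the published figures can arise from rounding or from the choice of interval method.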

Decision-makers need to be aware that outcomes rated as high CoE often have increased risks of false-positive or false-negative findings.

Patient or healthcare consumer involvement:
Assessments of CoE are important for informed decision-making by healthcare consumers and they should be reliable.

Relevance to patients and consumers: 

Assessments of certainty of evidence (CoE) are important for informed decision-making because they convey the confidence that reviewers have in the correctness and stability of results. Our research has shown that even an outcome rated as high CoE does not automatically mean that the result will remain stable over time. To support good clinical and health policy decision-making, the predictive value of high CoE ratings needs to improve. Decision-makers must be able to rely on the assumption that results rated as high CoE are correct and stable, because their decisions directly influence patients’ health.