CHIMERAS showed better inter-rater reliability and inter-consensus reliability than GRADE in grading quality of evidence: a randomized controlled trial

Session: 

Oral session: Investigating bias (3)

Date: 

Tuesday 18 September 2018 - 11:20 to 11:40

Location: 

All authors in correct order:

Wu X1, Chung VC2, Wong CHL2, Yip BHK2, Cheung WKW2, Wu JCY2
1 Central South University, China
2 The Chinese University of Hong Kong, China
Presenting author and contact person

Presenting author:

Xin-Yin Wu

Contact person:

Abstract text
Background: To inform decision making and guideline development, appraising quality of evidence (QoE) is an essential step in performing a systematic review. The Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) approach is one of the tools for assessing QoE; however, concerns about its reliability and comprehensiveness have been raised.

Objective: To address these shortcomings, we developed the Clinical and Health Intervention Meta-analysis Evidence RAting System (CHIMERAS). This randomized controlled trial aims to assess and compare the reliability of CHIMERAS and GRADE.

Methods: A single-center, parallel randomized controlled trial was conducted to assess and compare the inter-rater (IR) reliability of CHIMERAS and GRADE, covering both IR reliability among individual raters and inter-consensus reliability across pairs of raters. Raters were randomly assigned to two groups and trained to use either GRADE or CHIMERAS for assessing QoE. QoE from 100 Cochrane systematic reviews (SRs) was assessed with GRADE in group 1 and with CHIMERAS in group 2. IR reliability and inter-consensus reliability were evaluated by calculating the two-way random, single-measures intra-class correlation (ICC).
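For readers unfamiliar with the statistic, the following is a minimal sketch of a two-way random, single-measures ICC, assuming the Shrout and Fleiss ICC(2,1) form (absolute agreement). The abstract does not state which software or exact variant the authors used, so the function name `icc_2_1` and the toy ratings matrix are illustrative assumptions, not the trial's analysis code or data.

```python
import numpy as np

def icc_2_1(ratings):
    """Two-way random, single-measures ICC (Shrout & Fleiss ICC(2,1)).

    ratings: 2-D array-like, rows = rated items (e.g. SRs), columns = raters.
    Assumes a complete matrix with no missing ratings.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape                      # n items, k raters
    grand_mean = x.mean()
    row_means = x.mean(axis=1)          # per-item means
    col_means = x.mean(axis=0)          # per-rater means

    # Sums of squares from a two-way ANOVA without replication
    ss_rows = k * ((row_means - grand_mean) ** 2).sum()
    ss_cols = n * ((col_means - grand_mean) ** 2).sum()
    ss_total = ((x - grand_mean) ** 2).sum()
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                 # between-item mean square
    ms_cols = ss_cols / (k - 1)                 # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Invented toy example: 5 items graded by 3 raters on an ordinal 1-4 scale
toy_ratings = [
    [1, 2, 1],
    [3, 3, 4],
    [2, 2, 2],
    [4, 3, 4],
    [1, 1, 2],
]
print(round(icc_2_1(toy_ratings), 2))
```

Under this form, systematic differences between raters (the between-rater mean square) lower the ICC, which is why it suits a design in which raters are treated as a random sample and absolute agreement matters.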

Results: The 100 SRs covered 17 different categories of conditions and included both pharmacological (37.0%) and non-pharmacological (63.0%) interventions. For IR reliability among individual raters, CHIMERAS showed moderate agreement (ICC = 0.54, 95% confidence interval (CI) 0.44 to 0.64), while GRADE showed fair agreement (ICC = 0.38, 95% CI 0.28 to 0.49). For inter-consensus reliability across pairs of raters, CHIMERAS showed substantial agreement (ICC = 0.78, 95% CI 0.69 to 0.84), while GRADE showed moderate agreement (ICC = 0.52, 95% CI 0.36 to 0.65). With GRADE, 77.0% of SRs were judged as having low or very low QoE and 11.0% as having high QoE. With CHIMERAS, 10.0% of SRs were judged as having low or very low QoE and 54.0% as having high or very high QoE.
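The abstract does not say which scale was used to translate ICC values into labels such as "fair" or "substantial". The labels reported above are consistent with the commonly cited Landis and Koch bands, so the sketch below assumes those cut-offs; the function name `agreement_label` is hypothetical.

```python
def agreement_label(icc):
    """Map a reliability coefficient to a Landis and Koch style label.

    Assumed bands (not confirmed by the abstract):
    < 0 poor, 0.00-0.20 slight, 0.21-0.40 fair, 0.41-0.60 moderate,
    0.61-0.80 substantial, 0.81-1.00 almost perfect.
    """
    if icc < 0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if icc <= upper:
            return label
    raise ValueError("coefficient must not exceed 1")

# Point estimates reported in the Results
for name, value in [
    ("CHIMERAS, individual raters", 0.54),
    ("GRADE, individual raters", 0.38),
    ("CHIMERAS, consensus pairs", 0.78),
    ("GRADE, consensus pairs", 0.52),
]:
    print(f"{name}: ICC = {value} ({agreement_label(value)})")
```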

Conclusions: CHIMERAS outperformed GRADE in terms of IR reliability and inter-consensus reliability. CHIMERAS and GRADE also showed substantial disagreement in grading QoE, suggesting that the choice of rating approach may affect decision making.

Patient or healthcare consumer involvement: No patients or healthcare consumers were involved in this trial.

Relevance to patients and consumers: 

Evidence-based clinical practice depends in part on the best available evidence to support decision making. Identifying the best available evidence is an essential process in evidence-based practice. A sensitive approach to appraising quality of evidence, one that helps evidence users distinguish relatively better evidence from a pool of imperfect evidence, will support the adoption of evidence in real-world decision making. The development of the Clinical and Health Intervention Meta-analysis Evidence RAting System serves this objective and is intended to facilitate the translation of evidence into better health care for patients.