Background: Natural experiments are of increasing interest because of their potential to address confounding and selection, leading to stronger evidence of causal effects. Regression discontinuity (RD) designs can be used to evaluate an intervention or exposure in which a cut-off rule determines treatment allocation. In order to make the best use of these studies in systematic reviews, critical appraisal tools need to reflect the strengths and limitations specific to the design.
Objectives: We aimed to:
1) test the US Department of Education What Works Clearinghouse (WWC) Standards for RD using a sample of public health studies;
2) develop a critical appraisal checklist for RD that is applicable to health research; and
3) apply the checklist in a comprehensive review of RD studies.
Methods: We conducted a systematic review of RD studies of health outcomes following a protocol registered with PROSPERO (CRD42015025117). We searched textbooks, methodological papers and the Internet for potential quality assessment tools. We piloted the only published RD-specific assessment tool (WWC) by applying it to 17 studies of minimum legal drinking age (MLDA) legislation. We then developed a 10-item checklist for RD appraisal (RD-10) and used it to assess the quality of the 181 studies included in the systematic review.
Results: Of the 17 MLDA studies, 16 failed to meet WWC standard 2 (reporting of attrition) and therefore failed the overall quality standard. WWC produced a limited description of study quality, was of little use in distinguishing between higher and lower quality studies, and made no allowance for cross-sectional or retrospective applications of the design. We adapted the WWC standards to produce 10 individual criteria, each of which could be judged as yes, no, or unclear. The majority of studies (160/181; 88.4%) provided a narrative explaining how the cut-off rule was implemented. Just over half of the studies (93/181; 51.4%) specified whether the cut-off value was used only to assign participants to the treatment of interest. Fewer than half of the studies (74/181; 40.9%) reported a density test or a falsification test. Only a small minority of studies (9/181; 5.0%) fully met all 10 criteria.
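As an arithmetic check, the proportions reported in the Results can be recomputed from the stated counts. The following is a minimal sketch in Python; the dictionary labels and the helper name `percent` are hypothetical and are not taken from the review itself, and only the counts and the denominator of 181 come from the text above.

```python
# Counts as reported in the Results, out of 181 reviewed studies.
# The labels below are illustrative shorthand, not official RD-10 item names.
TOTAL_STUDIES = 181

reported_counts = {
    "narrative of cut-off rule implementation": 160,
    "cut-off used only for treatment of interest": 93,
    "density or falsification test": 74,
    "all 10 criteria fully met": 9,
}


def percent(count: int, total: int) -> float:
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# Recompute each proportion from its count.
percentages = {
    label: percent(count, TOTAL_STUDIES)
    for label, count in reported_counts.items()
}

for label, pct in percentages.items():
    print(f"{label}: {reported_counts[label]}/{TOTAL_STUDIES} = {pct}%")
```

The recomputed values agree with the reported figures to one decimal place; the last proportion is 5.0%, which the text rounds to 5%.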
Conclusions: Compared to WWC, RD-10 is easier to use, produces a more detailed description of quality, and is more applicable to the retrospective RD designs commonly seen in health research.