Facing Challenging Situations When Grading Strength of Evidence
AHRQ's 2012 Annual Conference Slide Presentation
Slide 1

Facing Challenging Situations When Grading Strength of Evidence
Presenters:
Nancy Santesso, RD, MLIS, McMaster University
Nancy Berkman, PhD, RTI International
Slide 2

Background
- Systematic reviewers need to provide clear judgments about the evidence that underlies conclusions of the review to enable decisionmakers to use them effectively.
- "Strength of evidence" grading is a key indicator of a review team's level of confidence that the studies included in the review collectively reflect the true effect of an intervention on a health outcome.
- Deciding on the appropriate strength of evidence grades can be challenging because of the complexity and unique characteristics of the evidence included in the review.
Slide 3

Session Approach and Goals
- Briefly review the AHRQ approach to grading the strength of evidence.
- Assume some prior experience in grading.
- Present a series of strength of evidence grading challenges.
- There is not necessarily one "right answer"; session participants are invited to share their thoughts with a neighbor and then discuss them with the full group.
- Nancy S. will review how GRADE would approach the decision.
Slide 4

Steps in AHRQ EPC Approach to Grading SOE
- Grade each outcome separately for RCT and observational study evidence, aggregating across studies.
- Score 5 required domains:
  - Risk of bias (study limitations), consistency, directness, precision.
  - Possibly publication bias.
- Consider, and possibly score, 3 additional domains:
  - Dose-response association.
  - Plausible confounding.
  - Strength of association.
- Combine the domain scores into separate SOE grades for RCTs and for observational studies, then combine these into a final grade.
Slide 5

Risk of Bias Domain Score
- Concerns adequate control for bias, based on both the study design and the conduct of individual studies.
- Assesses the aggregate risk of bias of studies, separately for RCTs and observational studies.
- Scores: high, medium, or low.
  - Based on design, RCTs start at low risk of bias and observational studies start at a higher risk of bias.
  - Scores may be adjusted based on the conduct of individual studies.
Slide 6

Consistency Domain Score
- Degree of similarity in the magnitude (or direction) of effect across the different studies within the evidence base.
- Consistent: same direction of effect (same side of "no effect") and narrow range of effect sizes.
- Inconsistent: non-overlapping confidence intervals, significant unexplained clinical or statistical heterogeneity, etc.
- Unknown or not applicable: single study so cannot be assessed.
Slide 7

Directness Domain Score
- Whether evidence reflects a single, direct link between the intervention of interest and the ultimate health outcome under consideration.
- Direct: single direct link between the intervention and health outcome.
- Indirect: evidence relies on:
  - Surrogate or proxy outcomes.
  - More than one body of evidence (no head-to-head studies).
Slide 8

Precision Domain Score
- Degree of certainty surrounding the estimate of effect for a specific outcome.
- Precise: estimate allows a clinically useful decision.
- Imprecise: confidence interval is so wide that it could include clinically distinct (even conflicting) conclusions.
- Unknown: measures of dispersion not provided.
Slide 9

Reporting Bias Domain Score
- Publication bias: nonreporting of results.
- Selective outcome reporting: nonreporting of planned outcomes.
- Selective analysis reporting: reporting only the most favorable analyses.
- Scores: suspected or undetected.
Slide 10

Additional "Discretionary" Domains
- Dose-response association (pattern of larger effect with greater exposure): present, not present, NA.
- Plausible confounding (confounding that would work in the opposite direction, i.e., would weaken the observed effect): present, absent.
- Strength of association (effect so large that it cannot have occurred solely as a result of bias from confounders): strong, weak.
- Applicability is considered separately.
Slide 11

Integrating Domain Scores Into a SOE Grade
- EPCs can use different approaches to incorporating multiple domains into an overall strength of evidence grade, but:
  - The approach must be consistent within the review and transparent.
  - Evaluation needs to be made by at least 2 reviewers.
  - The approach used must be documented.
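As a minimal sketch of one transparent, documentable scheme, the Python below assumes a GRADE-style "start and rate down" rule: RCT evidence starts at HIGH, observational evidence at LOW, each domain the reviewers flag as a concern lowers the grade one level, and each discretionary domain judged to strengthen confidence raises it one level. This is a hypothetical illustration only, not an AHRQ-prescribed algorithm; the function and domain names are invented for the example.

```python
from __future__ import annotations

from enum import IntEnum
from typing import Mapping


class Grade(IntEnum):
    """AHRQ strength of evidence categories, ordered from lowest to highest."""
    INSUFFICIENT = 0
    LOW = 1
    MODERATE = 2
    HIGH = 3


def grade_body_of_evidence(
    is_rct: bool,
    concerns: Mapping[str, bool],
    upgrades: int = 0,
) -> Grade:
    """Hypothetical scheme: start by design, rate down per concern, rate up per upgrade.

    concerns maps a domain name (e.g. "risk_of_bias", "precision") to True when
    reviewers judged that domain serious enough to lower confidence.
    upgrades counts discretionary domains (dose-response, plausible confounding,
    strength of association) judged to raise confidence.
    """
    start = Grade.HIGH if is_rct else Grade.LOW
    downgrades = sum(1 for flagged in concerns.values() if flagged)
    level = start - downgrades + upgrades
    # Clamp to the valid range of categories.
    level = min(max(level, Grade.INSUFFICIENT), Grade.HIGH)
    return Grade(level)


# Example: RCT evidence with a single concern (imprecision).
print(grade_body_of_evidence(True, {"risk_of_bias": False, "precision": True}).name)
# MODERATE
```

Whether, for example, an "unknown" consistency score for a single study counts as a concern is a reviewer judgment that this sketch leaves to the caller.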
Slide 12

AHRQ and GRADE Grading Categories
AHRQ
- HIGH.
- MODERATE.
- LOW.
- INSUFFICIENT.
Slide 13

Challenge 1: 1 Study, Continuous Outcome, "Significant Effects"
- Question: What are the effects of a "fasting followed by vegan" diet for rheumatoid arthritis in adults?
- Outcome: pain at 13 months, measured on a 10 cm visual analogue scale (VAS).
- Kjeldsen-Kragh 1991: population aged 18-75 with mild to severe rheumatoid arthritis.
Image: A fragment of a table shows the outcomes of the Kjeldsen-Kragh 1991 study:
| Group | Mean | SD | Total |
|---|---|---|---|
| Treatment | 3.6 | 2.69 | 17 |
| Control | 5.49 | 2.44 | 17 |
Mean difference (IV, fixed, 95% CI): -1.89 [-3.62, -0.16].
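The reported mean difference and its 95% confidence interval can be reproduced from the summary statistics. The sketch below assumes the standard inverse-variance (normal-approximation) formula for a single study, with the standard error taken as the square root of SD²/n summed over both arms; the function name is illustrative.

```python
import math


def mean_difference_ci(m1, sd1, n1, m2, sd2, n2, z=1.96):
    """Mean difference (group 1 minus group 2) with a normal-approximation CI."""
    md = m1 - m2
    # Standard error of the difference between two independent means.
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return md, md - z * se, md + z * se


# Kjeldsen-Kragh 1991: treatment vs control, pain on a 10 cm VAS.
md, lower, upper = mean_difference_ci(3.6, 2.69, 17, 5.49, 2.44, 17)
print(f"MD {md:.2f} [{lower:.2f}, {upper:.2f}]")  # MD -1.89 [-3.62, -0.16]
```

The interval excludes 0, so the precision question for this challenge turns on whether 34 participants are enough rather than on the interval alone.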
- Examples:
- Developing a fair cost-sharing structure (Ginsburg et al., 2012).
- Priority-setting social and health interventions (Pesce et al., 2011).
Slide 14

Challenge 1: 1 Study, Continuous Outcome, "Significant Effects"
Risk of Bias: LOW
- Allocation concealment: adequate.
- Random sequence generation by computerised random number generator.
- Blinding: participants not blinded; outcome assessors, investigators, and data analysts blinded.
- No loss to follow-up.
- Other biases: none.
Reporting bias: UNDETECTED. A comprehensive search of major databases and grey literature, contact with authors in the field, and a check of government funding found no additional studies.
Slide 15

Challenge 1: 1 Study, Continuous Outcome, "Significant Effects"
Image: A fragment of a table shows the outcomes of the Kjeldsen-Kragh 1991 study, described on Slide 13.
Slide 16

Challenge 1: What Is the Strength of Evidence and Why?
Discuss with your neighbor.
Vote! Strength of evidence.
- High.
- Moderate.
- Low.
- Insufficient.
Slide 17

Challenge 1: Assessment
- Risk of bias LOW.
- Consistency: Unknown (one study).
- Reporting bias: Undetected.
- Directness: Direct (outcome, population, intervention).
- Precision?
- Confidence intervals?
- 34 people?
Slide 18

Optimal Information Size
- We suggest the following: if the total number of patients included in a systematic review is less than the number of patients generated by a conventional sample size calculation for a single adequately powered trial, consider rating down for imprecision. Authors have referred to this threshold as the "optimal information size" (OIS).
- http://stat.ubc.ca/~rollin/stats/ssize/
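A conventional sample size calculation for a two-arm trial comparing means gives one way to compute the OIS. The Python sketch below uses the normal-approximation formula n per arm = 2 × (z for 1 - alpha/2 + z for power)² × SD² / MID². The SD of 2.5 cm (roughly the SDs in the two Kjeldsen-Kragh arms) and the 1 cm minimally important difference are assumptions chosen for illustration, not values from the presentation.

```python
import math
from statistics import NormalDist


def ois_continuous(sd, mid, alpha=0.05, power=0.80):
    """Total sample size (both arms) to detect a difference `mid` in means.

    Two-sided test, equal allocation, common standard deviation `sd`.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    per_arm = 2 * ((z_alpha + z_beta) * sd / mid) ** 2
    return 2 * math.ceil(per_arm)


# Hypothetical inputs: SD of 2.5 cm and a 1 cm minimally important difference.
total = ois_continuous(sd=2.5, mid=1.0)
print(total, "participants needed vs. 34 randomized")
# 198 participants needed vs. 34 randomized
```

Under these assumptions the OIS is roughly 200, well above the 34 participants in the single trial; a smaller assumed MID would raise it further.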

Slide 19

Rule of Thumb
- For continuous outcomes, a total sample size of at least 400 is suggested.
- More empirical evidence needed.
- Minimally Important Differences.
Slide 20

Challenge 1 Assessment (Modification)
- Risk of bias: MEDIUM (no allocation concealment; 30% loss to follow-up, mostly treatment related but evenly distributed).
- Consistency: Unknown (single study).
- Reporting bias: Undetected.
- Directness: Indirect (outcome; population: age >65 only; intervention).
- Precision: Imprecise.
- Rating?
Slide 21

Challenge 2: 1 Study, Dichotomous Outcome, "Non-significant Effects"
- Question: What are the effects of over-the-counter medications in acute pneumonia in children?
- Outcome: not cured or not improved.
- Principi 1986: population of inpatients aged 2-16.
Image: A fragment of a table shows the outcomes of the Principi 1986 study:
| Group | Events | Total |
|---|---|---|
| Mucolytic (Ambroxol) | 3 | 60 |
| Placebo | 7 | 60 |
Odds ratio (M-H, fixed, 95% CI): 0.40 [0.10, 1.62].
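For a single 2×2 table the Mantel-Haenszel odds ratio reduces to the crude odds ratio, and its variance reduces to Woolf's formula (the sum of the reciprocals of the four cell counts). Assuming that, the reported estimate can be checked with a short Python sketch; the function name is illustrative.

```python
import math


def odds_ratio_ci(events1, total1, events2, total2, z=1.96):
    """Odds ratio (group 1 vs group 2) with a Woolf (log-scale) 95% CI."""
    a, b = events1, total1 - events1
    c, d = events2, total2 - events2
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Principi 1986: ambroxol 3/60 vs placebo 7/60 not cured or not improved.
or_, lower, upper = odds_ratio_ci(3, 60, 7, 60)
print(f"OR {or_:.2f} [{lower:.2f}, {upper:.2f}]")  # OR 0.40 [0.10, 1.62]
```

The interval runs from a 90% reduction to a 62% increase in the odds, so it includes no effect as well as appreciable benefit and harm.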
Slide 22

Challenge 2: Assessment
Risk of bias: LOW
- Allocation concealment: unclear?
- Adequate sequence generation—computer generated random numbers.
- Blinding of participants and outcome assessors; unclear for data analysts.
- Complete outcome data.
Reporting bias
- Undetected. Selective outcome reporting bias: not suspected; only one study was found for this medication, and it reported this outcome.
Slide 23

Challenge 2: What Is the Strength of Evidence and Why?
Discuss with your neighbor.
Vote! Strength of evidence.
- High.
- Moderate.
- Low.
- Insufficient.
Slide 24

Precision?
- Confidence intervals.
- Power calculation.
- Rules of thumb.
Image: A fragment of a table shows the outcomes of the Principi 1986 study, described on Slide 21.
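One commonly cited GRADE rule of thumb for dichotomous outcomes is to consider rating down for imprecision when the 95% CI includes no effect and also crosses a threshold of appreciable benefit or harm; a relative risk reduction or increase of 25% (ratios of 0.75 and 1.25) is often used as a default. The sketch below treats those thresholds as adjustable assumptions; the function name is hypothetical.

```python
def imprecision_flags(lower, upper, benefit=0.75, harm=1.25, null=1.0):
    """Which thresholds a ratio-scale 95% CI (RR or OR) crosses.

    Assumes lower values mean benefit (an undesirable outcome); `benefit` and
    `harm` are assumed thresholds for an appreciable effect.
    """
    return {
        "includes_no_effect": lower <= null <= upper,
        "includes_appreciable_benefit": lower <= benefit,
        "includes_appreciable_harm": upper >= harm,
    }


# Principi 1986 odds ratio 95% CI: 0.10 to 1.62.
flags = imprecision_flags(0.10, 1.62)
print(flags)  # all three flags are True
```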
Slide 25

Table: Total Number of Events and Relative Risk Reduction Needed to Meet the OIS Threshold
| Total Number of Events | Relative Risk Reduction | Implications for meeting OIS threshold |
|---|---|---|
| 100 or less | ≤30% | Will almost never meet threshold whatever control event rate |
| 200 | 30% | Will meet threshold for control group risks of ~25% or greater |
| 200 | 25% | Will meet threshold for control group risks of ~50% or greater |
| 200 | 20% | Will meet threshold only for control group risks of ~80% or greater |
| 300 | ≥30% | Will meet threshold |
| 300 | 25% | Will meet threshold for control group risks of ~25% or greater |
| 300 | 20% | Will meet threshold for control group risks of ~60% or greater |
| 400 or more | ≥25% | Will meet threshold for any control group risks |
| 400 or more | 20% | Will meet threshold for control group risks of ~40% or greater |
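The thresholds in this table can be approximated with a standard two-proportion sample size calculation. The sketch below assumes the normal-approximation formula with a pooled variance under the null, a two-sided alpha of 0.05 and power of 0.80 (beta of 0.2, as stated for Slide 26), equal arms, and expected events counted as total sample size times the average event rate. The exact method behind the published table may differ, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def ois_dichotomous(control_rate, rrr, alpha=0.05, power=0.80):
    """Total sample size and expected events to detect a relative risk reduction.

    control_rate: event risk in the control group (0-1).
    rrr: relative risk reduction (0-1); treatment risk = control_rate * (1 - rrr).
    """
    p_c = control_rate
    p_t = control_rate * (1 - rrr)
    p_bar = (p_c + p_t) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_c * (1 - p_c) + p_t * (1 - p_t))
    ) ** 2
    per_arm = math.ceil(numerator / (p_c - p_t) ** 2)
    total_n = 2 * per_arm
    expected_events = total_n * p_bar
    return total_n, expected_events


# Control risk 25%, RRR 30%: compare with the "200 events, 30%" row.
n, events = ois_dichotomous(0.25, 0.30)
print(n, round(events))  # 932 198
```

Under the same assumptions, a 40% control risk with a 20% RRR needs about 400 events, in line with the last row of the table.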
Slide 26

Sample Size
Optimal information size given alpha of 0.05 and beta of 0.2 for varying control event rates and relative risks
Image: A line graph shows the total sample size required across control group event rates. A note reads, "For any chosen line, evidence meets optimal information size criterion if sample size above the line."
Slide 27

Number of Events
Image: A line graph shows the total number of events needed across control group event rates. A note reads, "For any chosen relative risk reduction, the available evidence meets optimal information size criterion if the number of events is above the associated line."
Slide 28

Precision:
Image: A fragment of a table shows the outcomes of the Principi 1986 study, described on Slide 21.
- Confidence intervals.
- Power calculation.
- Rules of thumb.
Slide 29

Challenge 3: Inconsistency and Precision
Question: Effects of taxane chemotherapy in early breast cancer.
Outcome: febrile neutropaenia (adverse event).
A priori exploration of heterogeneity (type of cancer, age, dose of taxane) could not explain the heterogeneity.
Image: A table shows the outcomes of several studies on the effects of taxane chemotherapy.
Slide 30

Challenge 3: Assessment
Risk of bias: LOW
Reporting bias: undetected
Directness: direct (population, intervention, outcome)
Discuss with your neighbor.
Vote! Strength of evidence.
- High.
- Moderate.
- Low.
- Insufficient.
Slide 31

Consistency and Precision
Image: A table shows the outcomes of several studies on the effects of taxane chemotherapy.
Confidence intervals: non-significant?
Rules of thumb
Optimal information size: power calculation
Unexplained inconsistency
Overlapping confidence intervals
I², p value of the chi² test
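The I² statistic and the chi² (Cochran's Q) heterogeneity test can be computed from each study's effect estimate and standard error, usually on the log scale for ratio measures. The sketch below assumes fixed-effect inverse-variance weights and computes the chi² p value with a standard-library closed form valid for integer degrees of freedom. The taxane study data are not reproduced in this transcript, so the example inputs are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for chi-squared with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^k / k!.
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail term plus a finite series.
    chi = math.sqrt(x)
    total = math.erfc(chi / math.sqrt(2))
    pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term = chi  # chi^(2r-1) / (1*3*...*(2r-1)) for r = 1
    for r in range(1, (df - 1) // 2 + 1):
        total += 2 * pdf * term
        term *= x / (2 * r + 1)
    return total


def heterogeneity(effects, ses):
    """Fixed-effect pooled estimate, Cochran's Q, df, I², and the chi² p value."""
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, df, i2, chi2_sf(q, df)


# Hypothetical log risk ratios and standard errors for three studies.
pooled, q, df, i2, p = heterogeneity([0.2, 0.9, 1.5], [0.25, 0.30, 0.35])
print(f"Q = {q:.2f} on {df} df, p = {p:.3f}, I2 = {i2:.0%}")
```

A high I² with a small p value signals statistical heterogeneity; whether it is "unexplained" still depends on the a priori subgroup exploration described for this challenge.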
Slide 32

Challenge 4: RCT and Observational Study Data
- Major bleeding: cold knife conization vs. LEEP for women with confirmed cervical abnormalities.
- What is the overall SOE grade?
| Number of studies (subjects) | RoB | Consistency | Directness | Precision | Reporting Bias | RR | SOE |
|---|---|---|---|---|---|---|---|
| 3 RCTs | Low | Consistent | Direct | Imprecise | Undetected | 0.79 (0.23 to 2.58) | Moderate |
| 2 obs studies | Low | Consistent | Direct | Imprecise | Undetected | 3.42 | Low |
Slide 33

Challenge 4: Are You More or Less Confident in the RCT Data Given the Observational Data?
Discuss with your neighbor: Does the addition of the observational study data make you more or less confident?
Vote! Overall strength of evidence.
- High.
- Moderate.
- Low.
- Insufficient.
Slide 34

Challenge 5: Telephone Counselling to Improve Adherence to Diet
Narrative synthesis
Total number of studies: 4
Total number of participants: 255
| Study | Statistic/Measure | Results | Effect estimate |
|---|---|---|---|
| Chui 2005 | Median (IQR); scale 0 (none) to 3 (complete adherence) | Telephone (n=31): 1 (0-1); Control (n=32): 0 (0-0). "Slight improvement with telephone intervention" | P value 0.32; "calculate effect" |
| Stewart 2005 | Number adhering to diet | Telephone (n=40): 26/40; Control (n=38): 15/38. "Slight improvement with telephone intervention" | RR 1.65 (1.05, 2.59) |
| Racelis 1998 | Diet score | Telephone (n=11): improved; Control (n=10): improved. "No effect of telephone intervention vs control" | "No significant difference" |
| Cummings 1981 | Number compliant | All groups (n=93): compliance pre = 86%, post = 90%. "Study cannot be used" | Not reported |
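For the one study in this synthesis that reports counts (Stewart 2005), the risk ratio and its 95% CI can be recomputed. The sketch below assumes the usual log-scale standard error for a risk ratio; the function name is illustrative.

```python
import math


def risk_ratio_ci(events1, total1, events2, total2, z=1.96):
    """Risk ratio (group 1 vs group 2) with a log-scale 95% CI."""
    rr = (events1 / total1) / (events2 / total2)
    se_log = math.sqrt(1 / events1 - 1 / total1 + 1 / events2 - 1 / total2)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Stewart 2005: 26/40 adhering with telephone counselling vs 15/38 with control.
rr, lower, upper = risk_ratio_ci(26, 40, 15, 38)
print(f"RR {rr:.2f} ({lower:.2f}, {upper:.2f})")  # RR 1.65 (1.05, 2.59)
```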
Slide 35

Challenge 5: Assessment
- Risk of bias: medium.
- Precision: 162 participants in total, with a small effect; OIS not met.
- Consistency: some inconsistency.
- Directness: no concern.
- Reporting bias: no small negative study?
Slide 36

Challenge 5: What Is the Strength of Evidence and Why?
Discuss with your neighbor.
Vote! Strength of evidence.
- High.
- Moderate.
- Low.
- Insufficient.
Slide 37

More Information
Nancy Santesso
RD, MLIS, PhD Candidate
Department of Clinical Epidemiology and Biostatistics
McMaster University
santesna@mcmaster.ca
Nancy Berkman, PhD
Senior Health Policy Research Analyst
Program on Healthcare Quality and Outcomes
berkman@rti.org


5600 Fishers Lane, Rockville, MD 20857