
Appraising Other Study Types

Because other study types have different features, the same validity criteria will not apply to every article. In the sections below, you will learn the techniques needed to appraise articles on diagnosis, harm/etiology, prognosis, and systematic reviews.

Evaluating the Validity of Types of Studies

Evaluating the Validity of a Diagnostic Test Study

 

Are the results valid?

1. Did participating patients present a diagnostic dilemma?

The group of patients in which the test was conducted should include patients with high, medium, and low probabilities of having the target disease. The clinical usefulness of a test lies in its ability to distinguish obvious illness from cases where the diagnosis is not so obvious or might otherwise be confused. The patients in the study should resemble those expected in clinical practice.

2. Did investigators compare the test to an appropriate, independent reference standard?

The reference (or gold) standard is the commonly accepted proof that the target disorder is present or absent. It might be an autopsy or biopsy, objective criteria (e.g., a laboratory test not requiring subjective interpretation), or a current clinical standard (e.g., a venogram for deep venous thrombosis). Sometimes there is no widely accepted reference standard; the authors must then clearly justify their choice of reference test.

3. Were those interpreting the test and reference standard blind to the other results?

To avoid potential bias, those interpreting each test should not be aware of the results of the other test.

4. Did the investigators apply the same reference standard to all patients regardless of the results of the test under investigation?

Researchers should conduct both tests (the study test and the reference standard) on all patients in the study, regardless of the results of the test in question. Researchers should not forego either test based on the results of the other, nor apply a different reference standard to patients with a negative result on the study test.

Key issues for Diagnostic Studies:

  • diagnostic uncertainty
  • blind comparison to gold standard
  • each patient gets both tests

 


What are the results?

 

 

                          Reference Standard     Reference Standard
                          Disease Positive       Disease Negative

Study Test Positive       True Positive          False Positive
Study Test Negative       False Negative         True Negative

 

Sensitivity = true positives / all disease positives

Sensitivity measures the proportion of patients with the disease who test positive in this study. It is the probability that a person with the disease will have a positive test result.
 

Specificity = true negatives / all disease negatives

Specificity measures the proportion of patients without the disease who test negative in this study. It is the probability that a person without the disease will have a negative test result.
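
To make these formulas concrete, here is a minimal Python sketch that computes both quantities from the four cells of the table above; the counts are hypothetical.

# Hypothetical 2x2 counts for a study test against the reference standard.
tp = 90   # true positives: disease present, test positive
fp = 25   # false positives: disease absent, test positive
fn = 10   # false negatives: disease present, test negative
tn = 175  # true negatives: disease absent, test negative

sensitivity = tp / (tp + fn)  # disease-positive patients who test positive
specificity = tn / (tn + fp)  # disease-negative patients who test negative

print(f"Sensitivity = {sensitivity:.3f}")  # 0.900
print(f"Specificity = {specificity:.3f}")  # 0.875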
 

Sensitivity and specificity are characteristics of the test itself, but they do not give the clinician enough information to act on a test result. Likelihood ratios can be used to adapt the results of a study to specific patients: they help determine the probability of disease in an individual patient.

Likelihood ratios (LR):

LR+ = probability of a positive test in patients with the disease / probability of a positive test in patients without the disease = sensitivity / (1 - specificity)

LR- = probability of a negative test in patients with the disease / probability of a negative test in patients without the disease = (1 - sensitivity) / specificity

Likelihood ratios indicate the likelihood that a given test result would be expected in a patient with the target disorder compared to the likelihood that the same result would be expected in a patient without that disorder.

The likelihood ratio of a positive test result (LR+) tells you how much a positive result increases the odds of having the disease.

The likelihood ratio of a negative test result (LR-) tells you how much a negative result decreases the odds of having the disease.

 

How much do LRs change disease likelihood?

  • LRs greater than 10 or less than 0.1 cause large changes
  • LRs 5-10 or 0.1-0.2 cause moderate changes
  • LRs 2-5 or 0.2-0.5 cause small changes
  • LRs 1-2 or 0.5-1 cause tiny changes
  • An LR of 1.0 causes no change at all
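
The following Python sketch (hypothetical numbers, continuing the example above) shows how the likelihood ratios follow from sensitivity and specificity, and how an LR moves a pretest probability to a posttest probability by way of odds.

sensitivity = 0.90
specificity = 0.875

# LR+ and LR- from the definitions above.
lr_pos = sensitivity / (1 - specificity)   # 7.2: a moderate change per the chart
lr_neg = (1 - sensitivity) / specificity   # about 0.11: a moderate change

def posttest_probability(pretest_prob, lr):
    """Convert a pretest probability to a posttest probability via odds."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1 + posttest_odds)

pretest = 0.30  # hypothetical pretest probability of disease
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
print(f"After a positive test: {posttest_probability(pretest, lr_pos):.2f}")  # about 0.76
print(f"After a negative test: {posttest_probability(pretest, lr_neg):.2f}")  # about 0.05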

 

More about likelihood ratios: Deeks JJ, Altman DG. Diagnostic tests 4: likelihood ratios. BMJ. 2004;329:168-169.

 


How can I apply the results to patient care?

Will the reproducibility of the test result and its interpretation be satisfactory in your clinical setting?
Does the test yield the same result when reapplied to stable participants?

Do different observers agree about the test results?

Are the study results applicable to the patients in your practice?
Does the test perform differently (different LRs) for different severities of disease?
Does the test perform differently for populations with different mixes of competing conditions?

Will the test results change your management strategy?
What are the test and treatment thresholds for the health condition to be detected?

Are the test LRs high or low enough to shift posttest probability across a test or treatment threshold?

Will patients be better off as a result of the test?
Will patient care differ for different test results?
Will the anticipated changes in care do more good than harm?

Based on: Guyatt G, Rennie D, Meade MO, Cook DJ. Users' Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice. 2nd ed. 2008.

Evaluating the Validity of a Prognosis Study

 

Are the results valid?

1. Was the sample of patients representative?

The patient groups should be clearly defined and representative of the spectrum of disease found in most practices. Failure to clearly define the patients who entered the study increases the risk that the sample is unrepresentative. To judge the appropriateness of the sample, look for a clear description of which patients were included in and excluded from the study. The way the sample was selected should be clearly specified, along with the objective criteria used to diagnose the patients with the disorder.

2. Were the patients sufficiently homogeneous with respect to prognostic factors?

Prognostic factors are characteristics of a particular patient that can be used to predict the course of a disease more accurately. These factors can be demographic (e.g., age, gender, race), disease-specific (e.g., the stage of a tumor), or comorbid (other conditions existing in the patient at the same time), and they help predict good or bad outcomes.

In comparing the prognosis of the two study groups, researchers should consider whether the patients' clinical characteristics are similar. Adjustments may have to be made for prognostic factors to get a true picture of the clinical outcome. Deciding whether all relevant factors were considered may require clinical experience or knowledge of the underlying biology.

3. Was the follow-up sufficiently complete?

Follow-up should be complete, with all patients accounted for at the end of the study. Patients who are lost to follow-up often suffer the adverse outcome of interest and, if not accounted for, may bias the results of the study. Whether the number of patients lost to follow-up threatens validity depends on the proportion of patients lost and the proportion suffering the adverse outcome.

Patients should be followed until they fully recover or one of the disease outcomes occurs. The follow-up should be long enough to develop a valid picture of the extent of the outcome of interest, and it should include at least 80% of participants until the occurrence of a major study end point or the end of the study.

4. Were objective and unbiased outcome criteria used?

Some outcomes, such as death or full recovery, are clearly defined; between these extremes lies a wide range of outcomes that may be less clearly defined. Investigators should establish specific criteria that define each possible outcome of the disease and apply those same criteria during patient follow-up. Investigators judging the clinical outcomes may have to be blinded to the patient characteristics and prognostic factors in order to eliminate possible bias in their observations.

 

Key issues for Prognosis Studies:
  • well-defined sample
  • similar prognosis
  • follow-up complete
  • objective and unbiased outcome criteria

 


 

What are the results?

 

How likely are the outcomes over time?

  • What are the event rates at different points in time?
  • If event rates vary with time, are the results shown using a survival curve?

How precise are the estimates of likelihood?

  • What is the confidence interval for the principal event rate?
  • How do confidence intervals change over time?

Prognostic Results are the numbers of events that occur over time, expressed in:

  • absolute terms: e.g., 5-year survival rate
  • relative terms: e.g., risk associated with a prognostic factor
  • survival curves: cumulative events over time (see the sketch below)
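
As an illustration of how a survival curve is built from individual follow-up times, below is a minimal Python sketch of the Kaplan-Meier method, the usual estimator behind published survival curves. The data are invented and the code is a teaching sketch, not a validated implementation.

def kaplan_meier(times, events):
    """Return (time, survival) points of the Kaplan-Meier curve."""
    curve = [(0.0, 1.0)]
    survival = 1.0
    for t in sorted(set(times)):
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)  # events at time t
        n = sum(1 for ti in times if ti >= t)                               # patients still at risk
        if d > 0:
            survival *= 1 - d / n  # conditional probability of surviving past t
            curve.append((t, survival))
    return curve

# Invented follow-up data in years; event 1 = outcome occurred, 0 = censored (lost to follow-up).
times  = [1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 5.0, 5.0]
events = [1,   1,   0,   1,   0,   1,   0,   0]
for t, s in kaplan_meier(times, events):
    print(f"t = {t:.1f} years: estimated survival = {s:.2f}")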

Evaluating the Validity of a Harm Study

 

Are the results of this article valid?

 

FOR COHORT STUDIES:   Aside from the exposure of interest, did the exposed and control groups start and finish with the same risk for the outcome?

1. Were patients similar for prognostic factors that are known to be associated with the outcome (or did statistical adjustment level the playing field)?
The two groups, those exposed to the harm and those not exposed, must begin with the same prognosis. The characteristics of the exposed and non-exposed patients need to be carefully documented, and their similarity (except for the exposure) needs to be demonstrated. The choice of comparison groups has a significant influence on the credibility of the study results, so researchers should identify an appropriate control population before making a strong inference about a harmful agent. If baseline characteristics differ, investigators should use statistical techniques to adjust or correct for the differences.

2. Were the circumstances and methods for detecting the outcome similar? 
In cohort studies, determination of the outcome is critical. It is important to define the outcome and use objective measures to avoid possible bias. Detection bias may be an issue in these studies, as unblinded researchers may look harder for disease or an outcome in one group than in the other.

3. Was follow-up sufficiently complete? 
Patients unavailable for complete follow-up may compromise the validity of the research, because these patients often have very different outcomes from those who remained in the study. This information must be factored into the study results.

 

FOR CASE-CONTROL STUDIES: Did the cases and controls have the same risk (chance) of being exposed in the past?

1. Were cases and controls similar with respect to the indication or circumstances that would lead to exposure?
The characteristics of the cases and controls need to be carefully documented and their similarity needs to be demonstrated. The choice of comparison groups has a significant influence on the credibility of the study results. The researchers should identify an appropriate control population that would be eligible or likely to have the same exposure as the cases.

2. Were the circumstances and methods for determining exposure similar for cases and controls? 
In a case-control study, determination of the exposure is critical. The exposure in the two groups should be identified by the same method, and the identification should avoid any kind of bias, such as recall bias. Using objective data, such as medical records, or blinding the interviewer can help eliminate bias.

 

Key issues for Harm Studies:
  • similarity of comparison groups
  • outcomes and exposures measured same for both groups
  • follow-up of sufficient length (80% or better)

 


What are the results?

How strong is the association between exposure and outcome?
  • What is the risk ratio or odds ratio?
  • Is there a dose-response relationship between exposure and outcome?

How precise was the estimate of the risk?
  • What is the confidence interval for the relative risk or odds ratio?

 

Strength of inference:

For RCTs or prospective cohort studies: Relative Risk

 

                Outcome present     Outcome not present

Exposure Yes    a                   b
Exposure No     c                   d

 

Relative Risk (RR) = [a / (a + b)] / [c / (c + d)]

RR is the risk of the outcome in the exposed group divided by the risk of the outcome in the unexposed group:

RR = (exposed with outcome / all exposed) / (unexposed with outcome / all unexposed)

Example: “RR of 3.0 means that the outcome occurs 3 times more often in those exposed versus unexposed.”
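
To ground this example, here is a minimal Python sketch with hypothetical counts chosen to give RR = 3.0; the 95% confidence interval uses the standard log-relative-risk approximation (confidence intervals are explained below).

import math

# Hypothetical cohort counts, laid out as in the table above.
a, b = 30, 70   # exposed: outcome present / outcome not present
c, d = 10, 90   # unexposed: outcome present / outcome not present

rr = (a / (a + b)) / (c / (c + d))   # 0.30 / 0.10 = 3.0

# 95% CI from the standard error of log(RR).
se_log_rr = math.sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))
low  = math.exp(math.log(rr) - 1.96 * se_log_rr)
high = math.exp(math.log(rr) + 1.96 * se_log_rr)
print(f"RR = {rr:.1f} (95% CI {low:.1f}-{high:.1f})")  # RR = 3.0 (95% CI 1.6-5.8)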

 

For case-control or retrospective studies: Odds Ratio

 

                Cases                 Controls
                (outcome present)     (outcome not present)

Exposure Yes    a                     b
Exposure No     c                     d

 

Odds Ratio (OR) = (a / c) / (b / d)

OR is the odds of previous exposure in a case divided by the odds of exposure in a control patient:

OR = (exposed cases / unexposed cases) / (exposed controls / unexposed controls)

Example: “OR of 3.0 means that cases were 3 times more likely to have been exposed than were control patients.”
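
A matching sketch for the odds ratio, again in Python with hypothetical counts chosen to give OR = 3.0; the 95% confidence interval uses the standard log-odds-ratio approximation.

import math

# Hypothetical case-control counts, laid out as in the table above.
a, b = 60, 40   # exposure yes: cases / controls
c, d = 40, 80   # exposure no: cases / controls

odds_ratio = (a / c) / (b / d)   # 1.5 / 0.5 = 3.0

# 95% CI from the standard error of log(OR).
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
low  = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
print(f"OR = {odds_ratio:.1f} (95% CI {low:.1f}-{high:.1f})")  # OR = 3.0 (95% CI 1.7-5.2)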

 

Confidence intervals are a measure of the precision of a study's results. For example, in "36 (95% CI 27-51)", the 95% confidence interval means that if the same study were repeated many times, 95% of the intervals calculated in this way would contain the true value; here, the plausible range is 27-51. Wider intervals indicate lower precision; narrower intervals indicate greater precision.

A confounding variable is one whose influence distorts the true relationship between a potential risk factor and the clinical outcome of interest.

Read more on odds ratios: Altman DG, Bland JM. The odds ratio. BMJ. 2000;320:1468.

 


How can I apply the results to patient care?

Were the study subjects similar to your patients or population?
Is your patient so different from those included in the study that the results may not apply?

Was the follow-up sufficiently long?
Were study participants followed up long enough for important harmful effects to be detected?

Is the exposure similar to what might occur in your patient?
Are there important differences in exposures (dose, duration, etc.) for your patients?

What is the magnitude of the risk?
What level of baseline risk for the harm is amplified by the exposure studied?

Are there any benefits known to be associated with the exposure?
What is the balance between benefits and harms for patients like yours?

 

Source: Guyatt G, Rennie D, Meade MO, Cook DJ. Users' Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice. 2nd ed. 2008.

Evaluating the Validity of a Systematic Review

 

Are the results of this article valid?

1. Did the review explicitly address a sensible question?

The systematic review should address a specific question that identifies the patient problem, the exposure or intervention, and one or more outcomes. General reviews, which usually do not address specific questions, may be too broad to answer the clinical question for which you are seeking information.

2. Was the search for relevant studies detailed and exhaustive?

Researchers should conduct a thorough search of appropriate bibliographic databases, and the databases and search strategies should be outlined in the methodology section. Researchers should also show evidence of searching for unpublished studies, for example by contacting experts in the field, and should check the references cited at the end of the included articles.

3. Were the primary studies of high methodological quality?

Researchers should evaluate the validity of each study included in the systematic review, using the same EBP criteria applied when critically appraising individual studies. Differences in study results may be explained by differences in methodology and study design.

4. Were selection and assessments of the included studies reproducible?

More than one researcher should evaluate each study and make decisions about its validity and inclusion. Bias (systematic error) and mistakes (random error) are reduced when judgment is shared. A third reviewer should be available to break ties.

Key issues for Systematic Reviews:
  • focused question
  • thorough literature search
  • included studies appraised for validity
  • selection of studies reproducible

 


What are the results?

Were the results similar from study to study?
How similar were the point estimates?
Do confidence intervals overlap between studies?

What are the overall results of the review?
Were results weighted both quantitatively and qualitatively in summary estimates?

How precise were the results?
What is the confidence interval for the summary or cumulative effect size?
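
As a rough illustration of where a summary estimate and its confidence interval come from, the Python sketch below pools hypothetical study results using fixed-effect inverse-variance weighting. Real reviews rely on dedicated software and often random-effects models, so treat this as a sketch of the idea only.

import math

# Hypothetical per-study effect estimates (e.g., log odds ratios) with standard errors.
effects = [0.40, 0.25, 0.55]
ses     = [0.20, 0.15, 0.30]

# Inverse-variance weights: more precise studies count more.
weights = [1 / se**2 for se in ses]
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

low  = pooled - 1.96 * pooled_se
high = pooled + 1.96 * pooled_se
print(f"Pooled effect = {pooled:.2f} (95% CI {low:.2f} to {high:.2f})")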

 

More information on reading forest plots:

Ried K. Interpreting and understanding meta-analysis graphs: a practical guide. Aust Fam Physician. 2006;35(8):635-638. PMID: 16894442.

Greenhalgh T. Papers that summarise other papers (systematic reviews and meta-analyses). BMJ. 1997;315(7109):672-675. PMID: 9310574.


 

How can I apply the results to patient care?

Were all patient-important outcomes considered?
Did the review omit outcomes that could change decisions?

Are any postulated subgroup effects credible?
Were subgroup differences postulated before data analysis?
Were subgroup differences consistent across studies?

What is the overall quality of the evidence?
Were prevailing study design, size, and conduct reflected in a summary of the quality of evidence?

Are the benefits worth the costs and potential risks?
Does the cumulative effect size cross a test or therapeutic threshold?


Based on: Guyatt G, Rennie D, Meade MO, Cook DJ. Users' Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice. 2nd ed. 2008.
