Critical appraisal of evidence
Appraising systematic reviews
Appraising randomised controlled trials
Appraising cohort studies
Appraising qualitative studies


Critical appraisal of the evidence

Critical appraisal (How good is the study?)

Before considering critical appraisal and the quality of an article, every article should first be assessed according to its level in the ‘hierarchy of evidence’, as shown in Step 2. The hierarchy of evidence reflects the potential of each study design to answer a particular type of question (i.e. treatment, diagnosis, prognosis, aetiology, screening intervention). However, each design within the hierarchy has its own inherent strengths and weaknesses, and its usefulness can vary according to what you want to find out.

Critical appraisal is the process of systematically examining research evidence to assess its validity and relevance before using it to inform a clinical decision (Hill & Spittlehouse 2003).  It helps clinicians identify the strengths and weaknesses of a research article, judge the usefulness of its findings, and decide whether a piece of reported research should influence their clinical practice.  In other words, the underlying reason for critically appraising articles is to make an informed decision about whether the results of a paper are believable and whether they apply to your clinical situation.  If you are to practise according to the principles of EBP, it makes sense to underpin your practice with good quality evidence rather than poor quality evidence, which may be biased and unreliable; there is no point changing practice on the basis of poor evidence.

Critical appraisal is typically done using standard appraisal forms or checklists. The type of critical appraisal tool chosen to appraise a research article will depend on how the research was designed (i.e. study design). Some tools are generic and can therefore be used for many different types of research designs. Reading the “overview” of each critical appraisal tool will provide information about its scope/coverage.

For a comprehensive list of critical appraisal tools, visit this site: International Centre for Allied Health Evidence repository of critical appraisal tools.



Appraising a Systematic Review

How to appraise a systematic review using the CASP (Critical Appraisal Skills Programme) tool

Screening Questions

Did the review address a clearly focused question?

Hint: An issue can be focused in terms of: the population studied; the intervention (exposure) given; the outcome considered

This question determines the relevance of the study to your clinical practice.  Examine whether the study clearly defined its population of interest, the nature of the intervention (or exposure) and the outcomes of interest.  If these are not clearly defined, you may find it difficult to determine which patients the results apply to, what intervention the study proposes and whether this intervention produces outcomes which you (as a practitioner) and your patients/clients consider important.

Did the authors look for the appropriate sort of papers?  

Hint: The ‘best sort of studies’ would address the review’s question, and have an appropriate study design.

This question looks at whether the study designs included in the systematic review are appropriate for addressing the research question. For example, if the review is asking about the effectiveness of an intervention, a randomised controlled trial is considered the gold standard. On the other hand, if the review question is about prognosis or aetiology, then cohort studies are considered appropriate. For a guide as to which study designs are appropriate for a review question, you may refer to the ‘hierarchy of evidence’ table by NHMRC.

Your answers to these screening questions will help you determine whether or not it is worth continuing with the paper.  Answering yes to both questions means it is worth continuing; otherwise, you may have to search for another paper.

Detailed Questions

Do you think the important, relevant studies were included? Hint: Look for: which bibliographic databases were used; follow up from reference lists; personal contact with experts; search for unpublished as well as published studies; search for non-English language studies

Here you are looking at how thoroughly the search was done, and whether other potentially important sources were explored. A good systematic review should have undertaken a comprehensive search of the literature, making it unlikely that any relevant articles were missed.

Did the review’s authors do enough to assess the quality of the included studies?

The authors need to consider the rigour of the studies they have identified. Lack of rigour may affect the studies’ results.

If the results of the review have been combined, was it reasonable to do so?  Hint: Consider whether: the results of each study are clearly displayed; the results were similar from study to study (look for tests of heterogeneity); the reasons for any variations in results are discussed

This question looks at the appropriateness of meta-analysis (i.e. a statistical technique for combining the findings from independent studies) in synthesising the results of a systematic review. If a meta-analysis was not undertaken and a narrative synthesis was done instead, this should be justified by the authors.
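As a rough illustration of what a test of heterogeneity measures, the sketch below computes Cochran's Q and the I² statistic from study effect estimates and their standard errors. The function name and the example numbers are hypothetical and are not taken from any review; published reviews normally rely on dedicated meta-analysis software.

```python
def heterogeneity(effects, std_errors):
    """Return Cochran's Q, its degrees of freedom, and I-squared (%).

    effects    -- one effect estimate per study (e.g. log risk ratios)
    std_errors -- the standard error of each estimate
    """
    # Inverse-variance weights: more precise studies count for more.
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)

    # Q sums the weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # I-squared: share of the variation beyond what chance alone would give.
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i_squared


# Illustrative (made-up) data: two studies that disagree noticeably.
q, df, i2 = heterogeneity([0.1, 0.5], [0.1, 0.1])
print(f"Q = {q:.2f} on {df} df, I-squared = {i2:.1f}%")
```

A high I² suggests the studies differ by more than chance, which is when you would expect the authors to discuss the reasons for the variation.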

For more information about meta-analysis: Crombie I and Davies H. What is meta-analysis? 2009.

What are the overall results of the review?

Consider:

  • if you are clear about the review’s bottom line results
  • what these are
  • how the results were expressed (odds ratio, mean difference, etc.)

How precise are the results? Hint: Look at the confidence intervals, if given (i.e. if a meta-analysis was undertaken)

Precision of the results can be determined based on the width of the confidence intervals. The confidence interval describes a range of values within which you can be reasonably sure that the true effect actually lies.  If the confidence interval is relatively narrow (e.g. 0.70 to 0.80), the effect size is known precisely.  If the interval is wider (e.g. 0.60 to 0.93) the uncertainty is greater, although there may still be enough precision to make decisions about the utility of the intervention.

Intervals that are very wide (e.g. 0.50 to 1.10) indicate that there is little knowledge about the effect, and that further information is needed [Cochrane Library]. The width of a confidence interval for a meta-analysis depends on the precision of the individual study estimates and on the number of studies combined. As more studies are added to a meta-analysis the width of the confidence interval usually decreases.  However, if the additional studies increase the heterogeneity in the meta-analysis and a random-effects model is used, it is possible that the confidence interval width will increase [Cochrane Library].
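To see why adding studies usually narrows the interval, here is a minimal sketch of fixed-effect, inverse-variance pooling. It assumes a simple fixed-effect model and a normal approximation (z = 1.96 for a 95% interval); the function names and numbers are hypothetical. Because it does not model between-study variance, it cannot reproduce the widening that a random-effects model may show.

```python
import math


def pooled_ci(effects, std_errors, z=1.96):
    """Fixed-effect pooled estimate with an approximate 95% confidence interval."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # The pooled standard error shrinks as the total weight grows.
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled - z * pooled_se, pooled + z * pooled_se


def ci_width(effects, std_errors):
    """Width of the pooled confidence interval."""
    _, low, high = pooled_ci(effects, std_errors)
    return high - low


# Illustrative (made-up) data: the same precision per study, more studies.
two_studies = ci_width([0.2, 0.3], [0.2, 0.2])
four_studies = ci_width([0.2, 0.3, 0.25, 0.25], [0.2, 0.2, 0.2, 0.2])
print(f"width with 2 studies: {two_studies:.3f}")
print(f"width with 4 studies: {four_studies:.3f}")
```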

For more information about interpreting meta-analysis:

Ried K. Interpreting and understanding meta-analysis graphs: a practical guide. Aust Fam Physician 2006; 35(8): 635-638.

Can the results be applied to the local population?

Hint: Consider whether:

  • the patients covered by the review could be sufficiently different to your population to cause concern
  • your local setting is likely to differ much from that of the review

Were all outcomes considered?

Hint: Consider whether the review considered the outcomes that the clinician and the patient are likely to view as important.

Are the benefits worth the harms and costs?

 

 


Activity

Read the following article and complete the CASP for systematic review tool.

Baker PRA, Francis DP, Soares J, Weightman AL, Foster C. Community wide interventions for increasing physical activity. Cochrane Database of Systematic Reviews 2011, Issue 4.

To check your appraisal, open the link below:

CASP Appraisal for Baker et al 2011



Appraising randomised controlled trials

How to appraise randomised controlled trials using the PEDro Scale

The PEDro scale is a tool for assessing the methodological quality of randomised controlled trials. Please click the link below to access the PEDro scale.  There are explicit criteria for addressing each of the questions in the scale.

PEDro scale: guidelines for appraisal


Activity

Read the following article and complete the PEDro scale.

Hansson EE, Jonsson-Lundgren M, Ronnheden AM, Sorensson E, Bjarnung A, Dahlberg LE. Effect of an education programme for patients with osteoarthritis in primary care — a randomized controlled trial. BMC Musculoskeletal Disorders 2010; 11(244).

To check your appraisal, open the link below:

PEDro Appraisal for Hansson et al 2010


How to appraise randomised controlled trials using the CASP (Critical Appraisal Skills Programme) tool

Screening Questions

Did the trial address a clearly focused question? Hint: An issue can be focused in terms of: The population studied; The intervention (exposure) given; The comparator given; The outcome considered

This question determines the relevance of the study to your clinical practice.  Examine whether the study clearly defined its population of interest, the nature of the intervention (or exposure), the comparator and the outcomes of interest.  If these are not clearly defined, you may find it difficult to determine which patients the results apply to, what intervention the study proposes and whether this intervention produces outcomes which you (as a practitioner) and your patients/clients consider important.

Was the assignment of patients to treatments randomized?

Randomisation minimises bias and confounding, which adds to the strength of the study.

Why randomise: Beller EM, Gebski V, Keech AC. Randomisation in clinical trials. Med J Aust 2002; 177(10): 565-567.

Were all of the patients who entered the trial properly accounted for at its conclusion?

Hint:

  • Was follow up complete?
  • Were patients analysed in the groups to which they were randomised?

Loss to follow up can lead to attrition bias in randomised trials.

Attrition bias: Dumville JC, Torgerson DJ, Hewitt CE. Reporting attrition in randomised controlled trials. BMJ 2006; 332(7547): 969-971.

Your answers to these screening questions will help you determine whether or not it is worth continuing with the paper.  Answering yes to these questions means it is worth continuing; otherwise, you may have to search for another paper.

Detailed Questions

Were patients, health workers and study personnel ‘blind’ to treatment?  Hint: Look for: Were the patients blinded; Were the health workers blinded; Were the study personnel blinded?

This adds to the rigour of the study and minimises bias. Studies can be single, double or triple blinded. Triple blinding is often seen in pharmaceutical research but is difficult to execute in allied health research.

Blinding:

Day SJ and Altman DG. Blinding in clinical trials and other studies. BMJ 2000; 321:504.

Schulz KF and Grimes DA. Blinding in randomised trials: hiding who got what. Lancet 2002; 359:696-700.

Were the groups similar at the start of the trial?

This question helps determine whether the groups were comparable at baseline in terms of any factors that might affect the outcome, such as age, sex, social class, level of health/fitness and education.

Baseline imbalance: Roberts C and Torgerson DJ. Baseline imbalance in randomised controlled trials. BMJ 1999; 319:185.

Aside from the experimental intervention, were the groups treated equally?

This question asks whether anything apart from the intervention could have influenced the performance of one group relative to the other.

How large was the treatment effect? What outcomes are measured?

The treatment effect in a randomised trial can be estimated from the sample by comparing the mean outcomes of the treated and untreated groups.

What is effect size: Using effect size, or why the P value is not enough
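The comparison of group means can be sketched as follows. This is an illustration only: the function names and outcome scores are hypothetical, and Cohen's d (the mean difference divided by the pooled standard deviation) is just one common way of standardising an effect size, not necessarily the one a given trial reports.

```python
import math
from statistics import mean, stdev


def mean_difference(treated, control):
    """Raw treatment effect: difference in mean outcome between groups."""
    return mean(treated) - mean(control)


def cohens_d(treated, control):
    """Standardised effect size: mean difference in pooled-SD units."""
    n1, n2 = len(treated), len(control)
    s1, s2 = stdev(treated), stdev(control)
    pooled_sd = math.sqrt(
        ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    )
    return mean_difference(treated, control) / pooled_sd


# Illustrative (made-up) outcome scores for two small groups.
treated_scores = [4, 6, 8]
control_scores = [1, 3, 5]
print("mean difference:", mean_difference(treated_scores, control_scores))
print("Cohen's d:", cohens_d(treated_scores, control_scores))
```

The raw difference is in the units of the outcome measure, so whether it is large enough to matter is a clinical judgement rather than a statistical one.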

How precise was the estimate of the treatment effect?  Hint: Look at the confidence intervals, if given.

Precision of the results can be determined based on the width of the confidence intervals. The confidence interval describes a range of values within which you can be reasonably sure that the true effect actually lies.  If the confidence interval is relatively narrow (e.g. 0.70 to 0.80), the effect size is known precisely.  If the interval is wider (e.g. 0.60 to 0.93) the uncertainty is greater, although there may still be enough precision to make decisions about the utility of the intervention.  Intervals that are very wide (e.g. 0.50 to 1.10) indicate that there is little knowledge about the effect, and that further information is needed [Cochrane Library].
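As a minimal sketch of where such an interval comes from, the code below computes an approximate 95% confidence interval for a difference in means. It assumes a normal approximation (z = 1.96); small trials would normally use a t-distribution instead. The function name and data are hypothetical.

```python
import math
from statistics import mean, variance


def mean_difference_ci(treated, control, z=1.96):
    """Approximate 95% CI for the difference in group means."""
    diff = mean(treated) - mean(control)
    # Standard error of the difference between two independent means.
    se = math.sqrt(
        variance(treated) / len(treated) + variance(control) / len(control)
    )
    return diff - z * se, diff + z * se


# Illustrative (made-up) data: the same scores, with 3 and then 30 per group.
small = mean_difference_ci([4, 6, 8], [1, 3, 5])
large = mean_difference_ci([4, 6, 8] * 10, [1, 3, 5] * 10)
print("small trial:", small)
print("large trial:", large)
```

With the same observed difference, the larger trial gives a narrower interval, which is what "more precise" means here.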

Can the results be applied to the local population?

Consider whether

  • the patients covered by the trial could be sufficiently different to your population to cause concern
  • your local setting is likely to differ much from that of the trial

Were all clinically important outcomes considered? If not, does this affect the decision?

Consider whether the trial considered the outcomes that the clinician and the patient are likely to view as important.

Are the benefits worth the harms and costs?

This is unlikely to be addressed by the trial.  But what do you think?

 


Activity

Read the following article and complete the CASP scale.

Jull G, Trott P, Potter H, Niere K, Shirley D, Emberson J, Marschner I et al. A randomized controlled trial of exercise and manipulative therapy for cervicogenic headache. Spine 2002; 27(17):1835-1843.

To check your appraisal, open the link below:

CASP Appraisal for Jull et al 2002



Appraising cohort studies

How to appraise cohort studies using the CASP (Critical Appraisal Skills Programme) tool

Screening Questions

Did the study address a clearly focused issue? Hint: An issue can be focused in the following terms: the population studied; the risk factors studied; the outcomes considered; whether it is clear that the study tried to detect a beneficial or harmful effect

This question determines the relevance of the study to your clinical practice.  Examine whether the study clearly defined its population of interest and the outcomes of interest.  If these are not clearly defined, you may find it difficult to determine which patients the results apply to, and whether the outcomes produced are ones which you (as a practitioner) and your patients/clients consider important.

Did the authors use an appropriate method to answer their question?

Consider:

  • is a cohort study a good way of answering the question under the circumstances?
  • did it address the study question?

Your answers to these screening questions will help you determine whether or not it is worth continuing with the paper.  Answering yes to both questions means it is worth continuing; otherwise, you may have to search for another paper.

Detailed Questions

Was the cohort recruited in an acceptable way?

Here you are looking for selection bias which might compromise the generalisability of the findings:

  • Was the cohort representative of a defined population?
  • Was there something special about the cohort?
  • Was everybody included who should have been included?

Was the exposure accurately measured to minimize bias?

For this question you are looking for measurement or classification bias:

  • Did they use subjective or objective measurements?
  • Do the measures truly reflect what you want them to (have they been validated)?
  • Were all the subjects classified into exposure groups using the same procedure?

Was the outcome accurately measured to minimize bias?

For this question you are looking for measurement or classification bias:

 

  • Did they use subjective or objective measurements?
  • Do the measures truly reflect what you want them to (have they been validated)?
  • Has a reliable system been established for detecting all the cases (for measuring disease occurrence)?
  • Were the measurement methods similar in the different groups?
  • Were the subjects and/or the outcome assessor blinded to exposure (does this matter)?

 

A. Have the authors identified all important confounding factors? List the ones you think might be important that the authors missed.

B. Have they taken account of the confounding factors in the design and/or analysis?

Consider:

  • Look for restriction in design, and for techniques (e.g. modelling, stratified, regression or sensitivity analysis) used to correct, control or adjust for confounding factors

What is a confounding factor: Dealing with confounding in the analysis

A. Was the follow up of subjects complete enough?

 B.  Was the follow up of subjects long enough?

Consider:

  • The good or bad effects should have had long enough to reveal themselves
  • People who are lost to follow-up may have different outcomes from those available for assessment
  • In an open or dynamic cohort, was there anything special about the outcome of the people leaving, or the exposure of the people entering the cohort?

What are the results of this study?

Consider:

 

  • What are the bottom line results?
  • Have they reported the rate or the proportion between the exposed/unexposed, the ratio/the rate difference?
  • How strong is the association between exposure and outcome (relative risk, RR)?
  • What is the absolute risk reduction (ARR)?
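The measures listed above can be worked out from the counts in each group. The sketch below is illustrative only; the function name and the counts are hypothetical, and it defines the absolute risk reduction as the risk in the unexposed group minus the risk in the exposed group, so a negative value indicates that the exposure increased risk.

```python
def risk_measures(exposed_events, exposed_total,
                  unexposed_events, unexposed_total):
    """Risks in each group, relative risk (RR) and absolute risk reduction (ARR)."""
    risk_exposed = exposed_events / exposed_total
    risk_unexposed = unexposed_events / unexposed_total
    return {
        "risk_exposed": risk_exposed,
        "risk_unexposed": risk_unexposed,
        # RR below 1 means the outcome was less common in the exposed group.
        "relative_risk": risk_exposed / risk_unexposed,
        # ARR is the difference in risk expressed on the absolute scale.
        "absolute_risk_reduction": risk_unexposed - risk_exposed,
    }


# Illustrative (made-up) counts: 10 of 100 exposed and 20 of 100 unexposed
# subjects developed the outcome.
for name, value in risk_measures(10, 100, 20, 100).items():
    print(f"{name}: {value:.2f}")
```

The same relative risk can correspond to very different absolute differences depending on how common the outcome is, which is why both are worth looking for.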

 

How precise are the results?

Are confidence intervals reported? Precision of the results can be determined based on the width of the confidence intervals. If the confidence interval is relatively narrow (e.g. 0.70 to 0.80), the effect size is known precisely.  If the interval is wider (e.g. 0.60 to 0.93) the uncertainty is greater, although there may still be enough precision to make decisions about the utility of the intervention.  Intervals that are very wide (e.g. 0.50 to 1.10) indicate that there is little knowledge about the effect, and that further information is needed [Cochrane Library].
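For a cohort study reporting a relative risk, an approximate 95% confidence interval can be sketched on the log scale as below. This assumes the usual large-sample log method with z = 1.96 and unadjusted counts; the function name and numbers are hypothetical, and published studies typically report intervals adjusted for confounders.

```python
import math


def relative_risk_ci(exposed_events, exposed_total,
                     unexposed_events, unexposed_total, z=1.96):
    """Relative risk with an approximate 95% CI (log method)."""
    rr = (exposed_events / exposed_total) / (unexposed_events / unexposed_total)
    # Standard error of ln(RR) from the four counts.
    se_log_rr = math.sqrt(
        1 / exposed_events - 1 / exposed_total
        + 1 / unexposed_events - 1 / unexposed_total
    )
    low = math.exp(math.log(rr) - z * se_log_rr)
    high = math.exp(math.log(rr) + z * se_log_rr)
    return rr, low, high


# Illustrative (made-up) counts: the same risks in a small and a large cohort.
print("small cohort:", relative_risk_ci(10, 100, 20, 100))
print("large cohort:", relative_risk_ci(100, 1000, 200, 1000))
```

In this made-up example both cohorts give the same relative risk, but only the larger one yields an interval that excludes 1.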

Do you believe the results?

Points to consider:

  • A big effect is hard to ignore!
  • Can it be due to bias, chance or confounding?
  • Are the design and methods of this study sufficiently flawed to make the results unreliable?

Can the results be applied to the local population?

Consider whether:

  • The subjects covered in the study could be sufficiently different from your population to cause concern
  • Your local setting is likely to differ much from that of the study
  • Can you quantify the local benefits and harms?

Do the results of this study fit with other available evidence?

Consider if the findings of this study contradict other studies on the same topic and how that would influence your application of the findings.


Activity

Read the following article and complete the CASP tool for cohort studies.

Howard JS, Sparkman CR, Cohen HG, Green G, Stanislaw H. A comparison of intensive behaviour analytic and eclectic treatments for young children with autism. Res Dev Disabil 2005; 26(4):359.

To check your appraisal, open the link below:

CASP Appraisal for Howard 2005



Appraising qualitative studies

How to appraise qualitative studies using the CASP (Critical Appraisal Skills Programme) tool

Screening Questions

Was there a clear statement of the aims of the research?

Consider:

  • What the goal of the research was
  • Why it is important
  • Its relevance to your practice and setting

Is a qualitative methodology appropriate?

Consider if the research seeks to interpret or illuminate the actions and/or subjective experiences of research participants

Your answers to these screening questions will help you determine whether or not it is worth continuing with the paper.  Answering yes to both questions means it is worth continuing; otherwise, you may have to search for another paper.

Detailed Questions

Was the research design appropriate to address the aims of the research?

Consider if the researcher has justified their choice of research design (e.g. have they discussed how they decided which method to use)?

Was the recruitment strategy appropriate to the aims of the research?

Consider:

  • If the researcher has explained how the participants were selected
  • If they explained why the participants they selected were the most appropriate to provide access to the type of knowledge sought by the study
  • If there are any discussions around recruitment (e.g. why some people chose not to take part)

Were the data collected in a way that addressed the research issue?

Consider:

  • If the setting for data collection was justified
  • If it is clear how data were collected (e.g. focus group, semi-structured interview etc.)
  • If the researcher has justified the methods chosen
  • If the researcher has made the methods explicit (e.g. for interview method, is there an indication of how interviews were conducted, or did they use a topic guide)?
  • If methods were modified during the study.  If so, has the researcher explained how and why?
  • If the form of data is clear (e.g. tape recordings, video material, notes etc.)
  • If the researcher has discussed saturation of data

Has the relationship between researcher and participants been adequately considered?

Consider:

  • If the researcher critically examined their own role, potential bias and influence during:
    • Formulation of the research questions
    • Data collection, including sample recruitment and choice of location
  • How the researcher responded to events during the study and whether they considered the implications of any changes in the research design

Have ethical issues been taken into consideration?

Consider:

  • If there are sufficient details of how the research was explained to participants for the reader to assess whether ethical standards were maintained
  • If the researcher has discussed issues raised by the study (e.g. issues around informed consent or confidentiality or how they have handled the effects of the study on the participants during and after the study)
  • If approval has been sought from the ethics committee

Was the data analysis sufficiently rigorous?

Consider:

  • If there is an in-depth description of the analysis process
  • If thematic analysis is used. If so, is it clear how the categories/themes were derived from the data?
  • Whether the researcher explains how the data presented were selected from the original sample to demonstrate the analysis process
  • If sufficient data are presented to support the findings
  • To what extent contradictory data are taken into account
  • Whether the researcher critically examined their own role, potential bias and influence during analysis and selection of data for presentation

Is there a clear statement of findings?

Consider:

  • If the findings are explicit
  • If there is adequate discussion of the evidence both for and against the researcher’s arguments
  • If the researcher has discussed the credibility of their findings (e.g. triangulation, respondent validation, more than one analyst)
  • If the findings are discussed in relation to the original research question

How valuable is the research?

Consider:

 

  • If the researcher discusses the contribution the study makes to existing knowledge or understanding e.g. do they consider the findings in relation to current practice or policy, or relevant research-based literature?
  • If they identify new areas where research is necessary
  • If the researchers have discussed whether or how the findings can be transferred to other populations or considered other ways the research may be used

 


Activity

Read the following article and complete the CASP scale.

Sommerseth R, Dysvik E. Health professionals’ experiences of person-centred collaboration in mental health care. Patient Prefer Adherence 2008; 2:259-269.

To check your appraisal, open the link below:

CASP Appraisal for Sommerseth & Dysvik 2008

