Conducting a Literature Review

  • Literature Review
  • Developing a Topic
  • Planning Your Literature Review
  • Developing a Search Strategy
  • Managing Citations
  • Critical Appraisal Tools
  • Writing a Literature Review

Appraise Your Research Articles

The structure of a literature review should include the following:

  • An overview of the subject, issue, or theory under consideration, along with the objectives of the literature review,
  • Division of works under review into themes or categories [e.g. works that support a particular position, those against, and those offering alternative approaches entirely],
  • An explanation of how each work is similar to and how it varies from the others,
  • Conclusions as to which works make the strongest arguments, are most convincing in their positions, and make the greatest contribution to the understanding and development of their area of research.

The critical evaluation of each work should consider:

  • Provenance -- what are the author's credentials? Are the author's arguments supported by evidence [e.g. primary historical material, case studies, narratives, statistics, recent scientific findings]?
  • Methodology -- were the techniques used to identify, gather, and analyze the data appropriate to addressing the research problem? Was the sample size appropriate? Were the results effectively interpreted and reported?
  • Objectivity -- is the author's perspective even-handed or prejudicial? Is contrary data considered, or is certain pertinent information ignored to prove the author's point?
  • Persuasiveness -- which of the author's theses are most or least convincing?
  • Value -- are the author's arguments and conclusions convincing? Does the work ultimately contribute in any significant way to an understanding of the subject?

Reviewing the Literature

While conducting a review of the literature, make the most of the time you devote to this part of your paper by thinking broadly about what you should be looking for and evaluating. Review not just what the articles are saying, but how they are saying it.

Some questions to ask:

  • How are they organizing their ideas?
  • What methods have they used to study the problem?
  • What theories have been used to explain, predict, or understand their research problem?
  • What sources have they cited to support their conclusions?
  • How have they used non-textual elements [e.g., charts, graphs, figures] to illustrate key points?

When you begin to write your literature review section, you'll be glad you dug deeper into how the research was designed and constructed, because that groundwork establishes a means for developing a more substantial analysis and interpretation of the research problem.

Tools for Critical Appraisal

Now that you have found articles based on your research question, you can appraise their quality. The following resources can help you appraise different study designs:

  • Centre for Evidence-Based Medicine (Oxford)
  • University of Glasgow

"AFP uses the Strength-of-Recommendation Taxonomy (SORT), to label key recommendations in clinical review articles."

  • SORT: Rating the Strength of Evidence    American Family Physician and other family medicine journals use the Strength of Recommendation Taxonomy (SORT) system for rating bodies of evidence for key clinical recommendations.



Choosing the Best Systematic Review Critical Appraisal Tool


What is a Critical Appraisal?

Critical appraisal is the evaluation of the quality, reliability, and relevance of studies. Studies are assessed against quality measures specific to the research question and its related topics, and to the design, methodology, data analysis, and reporting of the different types of systematic reviews.

Planning a critical appraisal starts with identifying or developing checklists. Several critical appraisal tools can guide the process, adapting evaluation measures to the specific research. It is important to pilot test these checklists and to ensure they are comprehensive enough to cover all aspects of your systematic review.

What is the Purpose of a Critical Appraisal?

A critical appraisal is an integral part of a systematic review because it helps determine which studies can support the research. Here are some additional reasons why critical appraisals are important.

Assessing Quality

Critical appraisals employ measures specific to the systematic review. Through these, researchers can assess the quality of studies: their trustworthiness, value, and reliability. This helps weed out substandard studies, saving researchers time that would otherwise be wasted reading full texts.

Determining Relevance

By appraising studies, researchers can determine whether they are relevant to the systematic review: whether they address the topic, whether their results support the research, and so on. Relevance judgments of this kind also settle questions such as whether a systematic review can be included in a scoping review.

Identifying Flaws

Critical appraisals aim to identify methodological flaws in the literature, helping researchers and readers make informed decisions about the research evidence. They also help reduce the risk of bias when selecting studies.

What to Consider in a Critical Appraisal

Critical appraisals vary because they are specific to the topic, nature, and methodology of each systematic review. They generally share the same goal, however: answering the following questions about the studies under consideration (a minimal scoring sketch follows the list):

  • Is the study relevant to the research question?
  • Is the study valid?
  • Did the study use appropriate methods to address the research question?
  • Does the study support the findings and evidence claims of the review?
  • Are the valid results of the study important?
  • Are the valid results of the study applicable to the research?
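
These questions can be turned into a simple screening record. The Python sketch below is illustrative only; the abridged question wording and the all-or-nothing retention rule are assumptions made for the example, not part of any published tool:

    # Illustrative appraisal checklist, abridged from the questions above
    QUESTIONS = [
        "relevant to the research question",
        "valid",
        "appropriate methods for the research question",
        "supports the review's findings and evidence claims",
        "valid results are important",
        "valid results are applicable to the research",
    ]

    def appraise(study_id, answers):
        """Retain a study only if every appraisal question is answered 'yes'.

        `answers` maps question text to True/False; unanswered counts as False.
        """
        failed = [q for q in QUESTIONS if not answers.get(q, False)]
        if failed:
            print(f"{study_id}: excluded (failed: {'; '.join(failed)})")
            return False
        print(f"{study_id}: retained")
        return True

    appraise("Smith 2021", {q: True for q in QUESTIONS})  # retained
    appraise("Jones 2019", {"valid": True})               # excluded

In practice a review team would record these judgements per reviewer and reconcile disagreements before excluding a study.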


Critical Appraisal Tools

There are hundreds of tools and worksheets that can serve as a guide through the critical appraisal process. Here are just some of the most common ones to consider:

  • AMSTAR – to assess the methodological quality of systematic reviews of interventions.
  • CASP – to appraise randomized controlled trials, systematic reviews, cohort studies, case-control studies, qualitative research, economic evaluations, diagnostic tests, and clinical prediction rules.
  • Cochrane Risk of Bias Tool – to assess the risk of bias of randomized controlled trials (RCTs).
  • GRADE – to grade the quality of evidence in healthcare research and policy.
  • JBI Critical Appraisal Tools – to assess the trustworthiness, relevance, and results of published papers.
  • NOS (Newcastle–Ottawa Scale) – to assess the quality of non-randomized studies in meta-analyses.
  • ROBIS – to assess the risk of bias in systematic reviews (of interventions, diagnosis, prognosis, and etiology).
  • STROBE – to guide the reporting of cohort, case-control, and cross-sectional studies.

What is the Best Critical Appraisal Tool?

There is no single best critical appraisal tool for any study design, nor is there a generic one that can be expected to consistently do well when used across different study types.

Critical appraisal tools vary considerably in intent, components, and construction. The right one for your systematic review is the one that addresses the components you need to cover and helps ensure that your research produces comprehensive, unbiased, and valid findings.


University of Texas Libraries
Systematic Reviews & Evidence Synthesis Methods

Critical Appraisal

Some reviews require a critical appraisal of each study that makes it through the screening process. This involves a risk of bias assessment and/or a quality assessment. The goal of these reviews is not just to find all of the studies, but to determine their methodological rigor and, therefore, their credibility.

"Critical appraisal is the balanced assessment of a piece of research, looking for its strengths and weaknesses and them coming to a balanced judgement about its trustworthiness and its suitability for use in a particular context." 1

It's important to consider the impact that poorly designed studies could have on your findings and to rule out inaccurate or biased work.

Selection of a valid critical appraisal tool, testing the tool with several of the selected studies, and involving two or more reviewers in the appraisal are good practices to follow.

1. Purssell E, McCrae N. How to Perform a Systematic Literature Review: A Guide for Healthcare Researchers, Practitioners and Students. 1st ed. Springer; 2020.
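
When two or more reviewers appraise the same studies, it helps to quantify how well their judgements agree before reconciling differences. Below is a minimal Python sketch of Cohen's kappa, a standard chance-corrected agreement statistic; it is offered as an illustration, not a tool from the sources above, and the ratings are invented:

    from collections import Counter

    def cohens_kappa(rater_a, rater_b):
        """Chance-corrected agreement between two reviewers' categorical ratings."""
        n = len(rater_a)
        observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
        # Agreement expected if each reviewer assigned categories independently
        freq_a, freq_b = Counter(rater_a), Counter(rater_b)
        expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
        return (observed - expected) / (1 - expected)

    # Hypothetical risk-of-bias judgements for eight screened studies
    reviewer_1 = ["low", "low", "high", "low", "high", "high", "low", "low"]
    reviewer_2 = ["low", "high", "high", "low", "high", "low", "low", "low"]
    print(f"kappa = {cohens_kappa(reviewer_1, reviewer_2):.2f}")  # 0.47: moderate agreement

A low kappa on a pilot batch of studies is a signal to clarify the appraisal criteria before proceeding.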

Evaluation Tools

  • The Appraisal of Guidelines for Research & Evaluation Instrument (AGREE II): developed to address the issue of variability in the quality of practice guidelines.
  • Critical Appraisal Skills Programme (CASP) Checklists Critical Appraisal checklists for many different study types
  • Critical Review Form for Qualitative Studies Version 2, developed out of McMaster University
  • Development of a critical appraisal tool to assess the quality of cross-sectional studies (AXIS) Downes MJ, Brennan ML, Williams HC, et al. Development of a critical appraisal tool to assess the quality of cross-sectional studies (AXIS). BMJ Open 2016;6:e011458. doi:10.1136/bmjopen-2016-011458
  • Downs & Black Checklist for Assessing Studies Downs, S. H., & Black, N. (1998). The Feasibility of Creating a Checklist for the Assessment of the Methodological Quality Both of Randomised and Non-Randomised Studies of Health Care Interventions. Journal of Epidemiology and Community Health (1979-), 52(6), 377–384.
  • GRADE The Grading of Recommendations Assessment, Development and Evaluation (GRADE) working group "has developed a common, sensible and transparent approach to grading quality (or certainty) of evidence and strength of recommendations."
  • Grade Handbook Full handbook on the GRADE method for grading quality of evidence.
  • MAGIC (Making GRADE the Irresistible choice) Clear succinct guidance in how to use GRADE
  • Joanna Briggs Institute. Critical Appraisal Tools "JBI’s critical appraisal tools assist in assessing the trustworthiness, relevance and results of published papers." Includes checklists for 13 types of articles.
  • Latitudes Network This is a searchable library of validity assessment tools for use in evidence syntheses. This website also provides access to training on the process of validity assessment.
  • Mixed Methods Appraisal Tool A tool that can be used to appraise a mix of studies that are included in a systematic review - qualitative research, RCTs, non-randomized studies, quantitative studies, mixed methods studies.
  • RoB 2 Tool Higgins JPT, Sterne JAC, Savović J, Page MJ, Hróbjartsson A, Boutron I, Reeves B, Eldridge S. A revised tool for assessing risk of bias in randomized trials In: Chandler J, McKenzie J, Boutron I, Welch V (editors). Cochrane Methods. Cochrane Database of Systematic Reviews 2016, Issue 10 (Suppl 1). dx.doi.org/10.1002/14651858.CD201601.
  • ROBINS-I Risk of Bias for non-randomized (observational) studies or cohorts of interventions Sterne J A, Hernán M A, Reeves B C, Savović J, Berkman N D, Viswanathan M et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions BMJ 2016; 355 :i4919 doi:10.1136/bmj.i4919
  • Scottish Intercollegiate Guidelines Network. Critical Appraisal Notes and Checklists "Methodological assessment of studies selected as potential sources of evidence is based on a number of criteria that focus on those aspects of the study design that research has shown to have a significant effect on the risk of bias in the results reported and conclusions drawn. These criteria differ between study types, and a range of checklists is used to bring a degree of consistency to the assessment process."
  • The TREND Statement (CDC) Des Jarlais DC, Lyles C, Crepaz N, and the TREND Group. Improving the reporting quality of nonrandomized evaluations of behavioral and public health interventions: The TREND statement. Am J Public Health. 2004;94:361-366.
  • Assembling the Pieces of a Systematic Review, Chapter 8: Evaluating: Study Selection and Critical Appraisal.
  • How to Perform a Systematic Literature Review, Chapter: Critical Appraisal: Assessing the Quality of Studies.

Other library guides

  • Duke University Medical Center Library. Systematic Reviews: Assess for Quality and Bias
  • UNC Health Sciences Library. Systematic Reviews: Assess Quality of Included Studies


How to critically appraise a systematic review: an aide for the reader and reviewer


Conflicts of interest: H.W. founded the Cochrane Skin Group in 1987 and was coordinating editor until 2018. The other authors declare they have no conflicts of interest.


John Frewen, Marianne de Brito, Anjali Pathak, Richard Barlow, Hywel C Williams, How to critically appraise a systematic review: an aide for the reader and reviewer, Clinical and Experimental Dermatology , Volume 48, Issue 8, August 2023, Pages 854–859, https://doi.org/10.1093/ced/llad141


The number of published systematic reviews has soared in recent years. Sadly, the quality of most systematic reviews in dermatology is substandard. With the continued increase in exposure to systematic reviews, and their potential to influence clinical practice, we sought to describe a sequence of useful tips for the busy clinician reader to determine study quality and clinical utility. Important factors to consider when assessing systematic reviews include: determining the motivation for performing the study, establishing whether the study protocol was prepublished, assessing quality of reporting using the PRISMA checklist, assessing study quality using the AMSTAR 2 critical appraisal checklist, assessing for evidence of spin, and summarizing the main strengths and limitations of the study to determine whether it could change clinical practice. Having a set of heuristics to consider when reading systematic reviews saves time, enables quality to be assessed in a structured way, and supports a prompt conclusion about the merits of a review article in order to inform the care of dermatology patients.

A systematic review aims to systematically and transparently summarize the available data on a defined clinical question, via a rigorous search for studies, a critique of the quality of included studies and a qualitative and/or quantitative synthesis. 1 Systematic reviews are at the top of the pyramid in most evidence hierarchies for informing evidence-based healthcare as they are considered of greater validity and clinical applicability than those study types lower down, such as case series or individual trials. 2

A good systematic review should provide an unbiased overview of studies to inform clinical practice. Systematic reviews can reconcile apparently conflicting results, add precision to estimating smaller treatment effects, highlight the evidence’s limitations and biases and identify research gaps. Guidelines are available to assist systematic reviewers to transparently report why the review was done, the authors’ methods and findings via the PRISMA checklist. 3

The sharp rise in systematic review publications over time raises concern that the majority are unnecessary, misleading and/or conflicted. 4 A review of dermatology systematic reviews noted that 93% failed to report at least one PRISMA checklist item. 5 Another review of a random sample of 140/732 dermatology systematic reviews in 2017 found 90% were low quality. 6 Some improvements have occurred: reporting standards compliance has improved slightly (between 2013 and 2017), 5 and several leading dermatology journals including the British Journal of Dermatology have changed editorial policies, mandating authors to preregister review protocols.

Given the surge in poor-quality systematic review publications, we sought to describe a checklist of seven practical tips from the authors’ collective experience of writing and critically appraising systematic reviews, hoping that they will assist busy clinicians to critically appraise systematic reviews both as manuscript reviewers and as readers and research users.

  • Read the abstract to develop a sense of the subject.
  • What was the motivation for completing the review?
  • Has the review protocol been published, and have changes been made to it?
  • Review the reporting quality.
  • Review the quality of the article and the depth of the review question.
  • Consider the authors' interpretation and assess for spin.
  • Summarize and come to a position.

Read the abstract to develop a sense of the subject

From the abstract, use the PICO (population, intervention, comparator and outcome) framework to establish if the subject, intervention and outcomes are relevant to clinical practice. Is the review question clear and appropriate?
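
One low-tech way to apply this screen is to jot the four PICO elements down as structured fields while reading the abstract. The record below is a hypothetical illustration of that habit, not a tool described in the article:

    from dataclasses import dataclass

    @dataclass
    class PICO:
        """The four elements to extract from a systematic review's abstract."""
        population: str
        intervention: str
        comparator: str
        outcome: str

    # Invented example of framing a review question from an abstract
    question = PICO(
        population="adults with moderate-to-severe plaque psoriasis",
        intervention="biologic therapy",
        comparator="methotrexate",
        outcome="75% improvement in PASI score at 16 weeks",
    )
    print(question)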

Inspect the authors’ conflicts of interest and funding sources. Self-disclosed financial conflicts are often insufficiently described or not declared at all. 7 If you suspect conflicts for authors with no stated conflicts, briefly searching the senior authors’ names on PubMed, or the Open Payments website (for US authors) may reveal hidden conflicts. 8 Is the motivation for the systematic review justified in the introduction? Can new insights be formed by combining studies? If the systematic review is an update, what new available data justifies this? Search for similar recent systematic reviews (which may have been omitted intentionally). Is it a redundant duplicate review that adds little new useful information? 9 Has the author recently published reviews on similar subjects? Salami publications refer to authors chopping up a topic into smaller pieces to obtain maximum publications. 10

Search PROSPERO for publication of the review protocol. 11 A prepublished review protocol in a publicly accessible site offers reassurance that the systematic review followed a clear plan with prespecified PICO elements. Put bluntly, it reduces authors’ opportunity for deception by selective analysis and highlighting of results that are more likely to get published. If a protocol is found, assess deviation from this protocol and justification, if present. Protocol registration allows improved PRISMA reporting. 12 A registered protocol with reporting of deviations allows the reader to judge whether any modifications are justified, for example adjusting for unexpected challenges during analysis. 10

Review the reporting quality

Look for supplementary material detailing the PRISMA checklist. Commonly under-reported PRISMA items include protocol and registration, risk of bias across studies, risk of bias in individual studies, the data collection process and review objectives. 5 Adequate reporting quality using PRISMA does not necessarily indicate the review is clinically useful; however, it allows the reader to assess the study’s utility (see Table 1 ). Additional assessments of review quality are described below.

Table 1. The relationship between systematic review reporting quality and study quality. Adapted with permission from Williams. 21

Review the quality of the article and the depth of the review question

Distinct from completeness of reporting, assessing the review's quality allows for assessment of the overall clinical meaningfulness of the results. Does the PICO make sense with respect to this? The AMSTAR 2 critical appraisal instrument is useful in determining the quality of quantitative systematic reviews. 13 This checklist marks the key aspects of a systematic review and computes an overall rating of review quality. 14 If meta-analysis was performed, did the authors justify and use appropriate methods for statistical combination of results? Were weighted techniques used to combine results, adjusted for heterogeneity if present? If heterogeneity was present, were its sources investigated? Did the authors assess each individual study's risk of bias (RoB) and perform analysis to investigate the impact of RoB on the summary estimates of effect? See Table 2 for an example of a completed AMSTAR 2 checklist on a recently published poor-quality systematic review. 15
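
The statistical questions above have concrete counterparts. The following Python sketch shows inverse-variance pooling with a DerSimonian-Laird random-effects adjustment and an I-squared heterogeneity estimate, one common (but not the only) way reviews combine results and adjust for heterogeneity; the study data are invented:

    import math

    def pool(effects, std_errors):
        """Inverse-variance pooling (DerSimonian-Laird random effects) with I^2."""
        w = [1 / se ** 2 for se in std_errors]               # fixed-effect weights
        fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
        q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))  # Cochran's Q
        df = len(effects) - 1
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - df) / c)                        # between-study variance
        w_re = [1 / (se ** 2 + tau2) for se in std_errors]   # random-effects weights
        pooled = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
        se_pooled = math.sqrt(1 / sum(w_re))
        i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0  # % variation due to heterogeneity
        return pooled, se_pooled, i2

    # Invented log odds ratios and standard errors from three trials
    est, se, i2 = pool([-0.35, -0.10, -0.60], [0.12, 0.20, 0.25])
    print(f"pooled log OR = {est:.2f} (SE {se:.2f}), I^2 = {i2:.0f}%")

A review that reports only a pooled estimate, with no weights, no heterogeneity statistic, and no investigation of its sources, leaves exactly the questions above unanswered.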

Table 2. An example of assessment of the quality of a systematic review (Drake et al.) 15 using the AMSTAR 2 checklist; an explanation of the checklist can be found at https://amstar.ca/Amstar-2.php

N/A, not applicable; PICO, population, intervention, comparator and outcome. a Denotes an AMSTAR 2 critical domain. The overall confidence in the results of the review depends on these critical domains. When one critical domain is not satisfied, confidence is rated as ‘low’ and the review may not provide an accurate and comprehensive summary of the available studies that address the question of interest. When more than one critical domain is not satisfied, confidence in the results of the review is rated as ‘critically low’ and the review should not be relied on to provide an accurate and comprehensive summary of the available studies.
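
The rating rule in this footnote is mechanical enough to write out. A minimal sketch of the overall-confidence logic follows; the handling of non-critical weaknesses (more than one downgrades ‘high’ to ‘moderate’) follows the published AMSTAR 2 guidance, which the footnote itself does not spell out:

    def amstar2_confidence(critical_flaws, noncritical_weaknesses):
        """Overall confidence in a review's results under the AMSTAR 2 scheme."""
        if critical_flaws > 1:
            return "critically low"  # should not be relied on
        if critical_flaws == 1:
            return "low"             # may not give an accurate, comprehensive summary
        if noncritical_weaknesses > 1:
            return "moderate"
        return "high"

    print(amstar2_confidence(critical_flaws=2, noncritical_weaknesses=0))  # critically low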

Quality checklists for assessment of qualitative research include Consolidated Criteria for Reporting Qualitative research (COREQ), Standards for Reporting Qualitative Research (SRQR) and Critical Appraisal Skills Programme (CASP). 16 Such checklists aim to improve identification of high-quality qualitative research in journal articles, as well as acting as a guide for conducting research. 16

Consider the authors’ interpretation and assess for spin

Spin is a distorted interpretation of results. It manifests in studies as (i) misleading reporting, (ii) misleading interpretation and (iii) inappropriate extrapolation. 14 Ask: does the conclusion make clinical practice recommendations that are not supported by the studies' findings? Is the title misleading? Is there selective reporting? These are the three most severe forms of spin occurring in systematic reviews. 17

Summarize and come to a position

Summarize the review's main strengths and weaknesses and establish whether its quality is sufficient to merit changing clinical practice, or whether fatal flaws nullify its clinical utility. Consider internal validity (are the results true?) and external validity (are the results applicable to my patient group?). When applying the systematic review results to a particular patient, it may help to consider these points: (i) how similar are the study participants to my patient?; (ii) do the outcomes make sense to me?; (iii) what was the magnitude of treatment benefit? (work out the number needed to treat); 18 (iv) what are the adverse events?; and (v) what are my patient's values and preferences? 19
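
For point (iii), the number needed to treat (NNT) is simply the reciprocal of the absolute risk reduction between the control and treatment arms. A worked Python sketch with invented event rates:

    def number_needed_to_treat(control_event_rate, treatment_event_rate):
        """NNT = 1 / absolute risk reduction (ARR); rates are proportions in [0, 1]."""
        arr = control_event_rate - treatment_event_rate
        if arr <= 0:
            raise ValueError("no absolute risk reduction; NNT is undefined")
        return 1 / arr

    # e.g., if 40% of controls but only 25% of treated patients fail to clear,
    # ARR = 0.15, so about 7 patients must be treated for one extra good outcome
    print(round(number_needed_to_treat(0.40, 0.25)))  # 7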

Although systematic reviews have potential for summarizing evidence for dermatological interventions in a systematic and unbiased way, the rapid expansion of poorly reported and poor-quality reviews (Table 3 ) is regrettable. We do not claim our checklist items (Table 4 ) are superior to other checklists such as those suggested by CASP, 20 but they are based on the practical experience of critical appraisal of dermatology systematic reviews conducted by the authors.

Table 3. The top seven ‘sins’ of dermatology systematic reviews. Adapted with permission from Williams. 10

Table 4. Checklist of questions, considerations and tips for critical appraisal of systematic reviews. NNT, number needed to treat.

Considering each question in our checklist when faced with yet another systematic review allows a timely conclusion about its quality and applicability to clinical practice, whether you are acting as a reviewer or a reader. Although the checklist may sound exhaustive and time-consuming, we recommend cutting it short if there are major red flags early on, such as the absence of a protocol or of an assessment of RoB. Given the growing number of systematic reviews, an efficient and succinct aide for appraising articles saves the reader time and energy, while simplifying the decision regarding what merits a change in clinical practice. Our intention is not to criticize others' well-intentioned efforts, but to improve standards of reliable evidence to inform patient care.

Learning points

  • Systematic reviews of randomized controlled trials offer one of the best methods to summarize the evidence surrounding therapeutic interventions for skin conditions.
  • The number of systematic reviews in the dermatology literature is increasing rapidly.
  • The quality of dermatology systematic reviews is generally poor.
  • We describe a checklist for the busy clinician or reviewer to consider when faced with a systematic review.
  • Key factors to consider include: determining the review motivation, establishing whether the study protocol was prepublished, assessing quality of reporting with the PRISMA checklist and study quality with the AMSTAR 2 critical appraisal checklist, and assessing for evidence of spin.
  • Summarizing the main qualities and limitations of a systematic review will help to determine whether the review is robust enough to potentially change clinical practice for patient benefit.

Funding sources: This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

Data availability: No new data were generated.

Ethics statement: Ethical approval: not applicable. Informed consent: not applicable.

References

1. Moher D, Liberati A, Tetzlaff J et al. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Ann Intern Med 2009; 151: 264–9.
2. Murad MH, Asi N, Alsawas M, Alahdab F. New evidence pyramid. Evid Based Med 2016; 21: 125–7.
3. Page MJ, McKenzie JE, Bossuyt PM et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 2021; 372: n71.
4. Ioannidis JP. The mass production of redundant, misleading, and conflicted systematic reviews and meta-analyses. Milbank Q 2016; 94: 485–514.
5. Croitoru DO, Huang Y, Kurdina A et al. Quality of reporting in systematic reviews published in dermatology journals. Br J Dermatol 2020; 182: 1469–76.
6. Smires S, Afach S, Mazaud C et al. Quality and reporting completeness of systematic reviews and meta-analyses in dermatology. J Invest Dermatol 2021; 141: 64–71.
7. Baraldi JH, Picozzo SA, Arnold JC et al. A cross-sectional examination of conflict-of-interest disclosures of physician-authors publishing in high-impact US medical journals. BMJ Open 2022; 12: e057598.
8. Centers for Medicare & Medicaid Services. Open Payments Search Tool. About. Available at: https://openpaymentsdata.cms.gov/about (last accessed 22 April 2023).
9. Guelimi R, Afach S, Régnaux JP et al. Overlapping network meta-analyses on psoriasis systemic treatments, an overview: quantity does not make quality. Br J Dermatol 2022; 187: 29–41.
10. Williams HC. Are dermatology systematic reviews spinning out of control? Dermatology 2021; 237: 493–5.
11. National Institute for Health and Care Research. About PROSPERO. Available at: https://www.crd.york.ac.uk/prospero/#aboutpage (last accessed 22 April 2023).
12. Barbieri JS, Wehner MR. Systematic reviews in dermatology: opportunities for improvement. Br J Dermatol 2020; 182: 1329–30.
13. Shea BJ, Reeves BC, Wells G et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ 2017; 358: j4008.
14. AMSTAR. AMSTAR checklist. Available at: https://amstar.ca/Amstar_Checklist.php (last accessed 22 April 2023).
15. Drake L, Reyes-Hadsall S, Martinez J et al. Evaluation of the safety and effectiveness of nutritional supplements for treating hair loss: a systematic review. JAMA Dermatol 2023; 159: 79–86.
16. Stenfors T, Kajamaa A, Bennett D. How to … assess the quality of qualitative research. Clin Teach 2020; 17: 596–9.
17. Yavchitz A, Ravaud P, Altman DG et al. A new classification of spin in systematic reviews and meta-analyses was developed and ranked according to the severity. J Clin Epidemiol 2016; 75: 56–65.
18. Manriquez JJ, Villouta MF, Williams HC. Evidence-based dermatology: number needed to treat and its relation to other risk measures. J Am Acad Dermatol 2007; 56: 664–71.
19. Williams HC. Applying trial evidence back to the patient. Arch Dermatol 2003; 139: 1195–200.
20. CASP. CASP checklists. Available at: https://casp-uk.net/casp-tools-checklists/ (last accessed 22 April 2023).
21. Williams HC. Cars, CONSORT 2010, and clinical practice. Trials 2010; 11: 33.

Learning objective

To demonstrate up-to-date knowledge on assessing systematic reviews.

Which of the following critical appraisal checklists is useful for assessment of items that should be reported in a systematic review?

Which one of the following statements is correct?

  • The number of published systematic reviews in the dermatology literature is falling.
  • The quality of published dermatology systematic reviews is generally very good.
  • Publishing details of the PRISMA checklist in a systematic review indicates that the study quality is high.
  • External validity refers to the applicability of results to your patient group.
  • Internal validity refers to the applicability of results to your patient group.

Spin in systematic reviews can be described by which one of the following measures?

  • Authors declaring all conflicts of interest.
  • Title suggesting beneficial effect not supported by findings.
  • Adequate reporting of study limitations.
  • Conclusion formulating recommendations for clinical practice supported by findings.
  • Reporting a departure from study protocol that may modify interpretation of results.

PICO stands for which of the following?

  • PubMed, inclusion, comparator, outcome.
  • Population, items, comparator, outcome.
  • Population, intervention, context, observations.
  • Protocol, intervention, certainty, outcome.
  • Population, intervention, comparator, outcome.

Publication of a systematic review study protocol can be found at which source?

  • Cochrane Library.
  • ClinicalTrials.gov.

This learning activity is freely available online at https://oupce.rievent.com/a/TWWDCK

Users are encouraged to:

  • Read the article in print or online, paying particular attention to the learning points and any author conflict of interest disclosures.
  • Reflect on the article.
  • Register or login online at https://oupce.rievent.com/a/TWWDCK and answer the CPD questions.
  • Complete the required evaluation component of the activity.

Once the test is passed, you will receive a certificate and the learning activity can be added to your RCP CPD diary as a self-certified entry.

This activity will be available for CPD credit for 5 years following its publication date. At that time, it will be reviewed and potentially updated and extended for an additional period.


Open access. Published: 08 June 2023.

Guidance to best tools and practices for systematic reviews

Kat Kolaski, Lynne Romeiser Logan & John P. A. Ioannidis

Systematic Reviews volume 12, Article number: 96 (2023)


Data continue to accumulate indicating that many systematic reviews are methodologically flawed, biased, redundant, or uninformative. Some improvements have occurred in recent years based on empirical methods research and standardization of appraisal tools; however, many authors do not routinely or consistently apply these updated methods. In addition, guideline developers, peer reviewers, and journal editors often disregard current methodological standards. Although extensively acknowledged and explored in the methodological literature, most clinicians seem unaware of these issues and may automatically accept evidence syntheses (and clinical practice guidelines based on their conclusions) as trustworthy.

A plethora of methods and tools are recommended for the development and evaluation of evidence syntheses. It is important to understand what these are intended to do (and cannot do) and how they can be utilized. Our objective is to distill this sprawling information into a format that is understandable and readily accessible to authors, peer reviewers, and editors. In doing so, we aim to promote appreciation and understanding of the demanding science of evidence synthesis among stakeholders. We focus on well-documented deficiencies in key components of evidence syntheses to elucidate the rationale for current standards. The constructs underlying the tools developed to assess reporting, risk of bias, and methodological quality of evidence syntheses are distinguished from those involved in determining overall certainty of a body of evidence. Another important distinction is made between those tools used by authors to develop their syntheses as opposed to those used to ultimately judge their work.

Exemplar methods and research practices are described, complemented by novel pragmatic strategies to improve evidence syntheses. The latter include preferred terminology and a scheme to characterize types of research evidence. We organize best practice resources in a Concise Guide that can be widely adopted and adapted for routine implementation by authors and journals. Appropriate, informed use of these is encouraged, but we caution against their superficial application and emphasize their endorsement does not substitute for in-depth methodological training. By highlighting best practices with their rationale, we hope this guidance will inspire further evolution of methods and tools that can advance the field.

Part 1. The state of evidence synthesis

Evidence syntheses are commonly regarded as the foundation of evidence-based medicine (EBM). They are widely accredited for providing reliable evidence and, as such, they have significantly influenced medical research and clinical practice. Despite their uptake throughout health care and ubiquity in contemporary medical literature, some important aspects of evidence syntheses are generally overlooked or not well recognized. Evidence syntheses are mostly retrospective exercises, they often depend on weak or irreparably flawed data, and they may use tools that have acknowledged or yet unrecognized limitations. They are complicated and time-consuming undertakings prone to bias and errors. Production of a good evidence synthesis requires careful preparation and high levels of organization in order to limit potential pitfalls [ 1 ]. Many authors do not recognize the complexity of such an endeavor and the many methodological challenges they may encounter. Failure to do so is likely to result in research and resource waste.

Given their potential impact on people’s lives, it is crucial for evidence syntheses to correctly report on the current knowledge base. In order to be perceived as trustworthy, reliable demonstration of the accuracy of evidence syntheses is equally imperative [ 2 ]. Concerns about the trustworthiness of evidence syntheses are not recent developments. From the early years when EBM first began to gain traction until recent times, when thousands of systematic reviews are published monthly, [ 3 ] the rigor of evidence syntheses has always varied. Many systematic reviews and meta-analyses had obvious deficiencies because original methods and processes had gaps, lacked precision, and/or were not widely known. The situation has improved with empirical research concerning which methods to use and standardization of appraisal tools. However, given the geometrical increase in the number of evidence syntheses being published, a relatively larger pool of unreliable evidence syntheses is being published today.

Publication of methodological studies that critically appraise the methods used in evidence syntheses is increasing at a fast pace. This reflects the availability of tools specifically developed for this purpose [ 4 , 5 , 6 ]. Yet many clinical specialties report that alarming numbers of evidence syntheses fail on these assessments. The syntheses identified report on a broad range of common conditions including, but not limited to, cancer, [ 7 ] chronic obstructive pulmonary disease, [ 8 ] osteoporosis, [ 9 ] stroke, [ 10 ] cerebral palsy, [ 11 ] chronic low back pain, [ 12 ] refractive error, [ 13 ] major depression, [ 14 ] pain, [ 15 ] and obesity [ 16 , 17 ]. The situation is even more concerning with regard to evidence syntheses included in clinical practice guidelines (CPGs) [ 18 , 19 , 20 ]. Astonishingly, in a sample of CPGs published in 2017–18, more than half did not apply even basic systematic methods in the evidence syntheses used to inform their recommendations [ 21 ].

These reports, while not widely acknowledged, suggest there are pervasive problems not limited to evidence syntheses that evaluate specific kinds of interventions or include primary research of a particular study design (eg, randomized versus non-randomized) [ 22 ]. Similar concerns about the reliability of evidence syntheses have been expressed by proponents of EBM in highly circulated medical journals [ 23 , 24 , 25 , 26 ]. These publications have also raised awareness about redundancy, inadequate input of statistical expertise, and deficient reporting. These issues plague primary research as well; however, there is heightened concern for the impact of these deficiencies given the critical role of evidence syntheses in policy and clinical decision-making.

Methods and guidance to produce a reliable evidence synthesis

Several international consortiums of EBM experts and national health care organizations currently provide detailed guidance (Table 1 ). They draw criteria from the reporting and methodological standards of currently recommended appraisal tools, and regularly review and update their methods to reflect new information and changing needs. In addition, they endorse the Grading of Recommendations Assessment, Development and Evaluation (GRADE) system for rating the overall quality of a body of evidence [ 27 ]. These groups typically certify or commission systematic reviews that are published in exclusive databases (eg, Cochrane, JBI) or are used to develop government or agency sponsored guidelines or health technology assessments (eg, National Institute for Health and Care Excellence [NICE], Scottish Intercollegiate Guidelines Network [SIGN], Agency for Healthcare Research and Quality [AHRQ]). They offer developers of evidence syntheses various levels of methodological advice, technical and administrative support, and editorial assistance. Use of specific protocols and checklists are required for development teams within these groups, but their online methodological resources are accessible to any potential author.

Notably, Cochrane is the largest single producer of evidence syntheses in biomedical research; however, these only account for 15% of the total [ 28 ]. The World Health Organization requires Cochrane standards be used to develop evidence syntheses that inform their CPGs [ 29 ]. Authors investigating questions of intervention effectiveness in syntheses developed for Cochrane follow the Methodological Expectations of Cochrane Intervention Reviews [ 30 ] and undergo multi-tiered peer review [ 31 , 32 ]. Several empirical evaluations have shown that Cochrane systematic reviews are of higher methodological quality compared with non-Cochrane reviews [ 4 , 7 , 9 , 11 , 14 , 32 , 33 , 34 , 35 ]. However, some of these assessments have biases: they may be conducted by Cochrane-affiliated authors, and they sometimes use scales and tools developed and used in the Cochrane environment and by its partners. In addition, evidence syntheses published in the Cochrane database are not subject to space or word restrictions, while non-Cochrane syntheses are often limited. As a result, information that may be relevant to the critical appraisal of non-Cochrane reviews is often removed or is relegated to online-only supplements that may not be readily or fully accessible [ 28 ].

Influences on the state of evidence synthesis

Many authors are familiar with the evidence syntheses produced by the leading EBM organizations but can be intimidated by the time and effort necessary to apply their standards. Instead of following their guidance, authors may employ methods that are discouraged or outdated [ 28 ]. Suboptimal methods described in the literature may then be taken up by others. For example, the Newcastle–Ottawa Scale (NOS) is a commonly used tool for appraising non-randomized studies [ 36 ]. Many authors justify their selection of this tool with reference to a publication that describes the unreliability of the NOS and recommends against its use [ 37 ]. Obviously, the authors who cite this report for that purpose have not read it. Authors and peer reviewers have a responsibility to use reliable and accurate methods and not copycat previous citations or substandard work [ 38 , 39 ]. Similar cautions may potentially extend to automation tools. These have concentrated on evidence searching [ 40 ] and selection, given how demanding it is for humans to maintain truly up-to-date evidence [ 2 , 41 ]. Cochrane has deployed machine learning to identify randomized controlled trials (RCTs) and studies related to COVID-19, [ 2 , 42 ] but such tools are not yet commonly used [ 43 ]. The routine integration of automation tools in the development of future evidence syntheses should not displace the interpretive part of the process.

Editorials about unreliable or misleading systematic reviews highlight several of the intertwining factors that may contribute to continued publication of unreliable evidence syntheses: shortcomings and inconsistencies of the peer review process, lack of endorsement of current standards on the part of journal editors, the incentive structure of academia, industry influences, publication bias, and the lure of “predatory” journals [ 44 , 45 , 46 , 47 , 48 ]. At this juncture, clarification of the extent to which each of these factors contribute remains speculative, but their impact is likely to be synergistic.

Over time, the generalized acceptance of the conclusions of systematic reviews as incontrovertible has affected trends in the dissemination and uptake of evidence. Reporting of the results of evidence syntheses and recommendations of CPGs has shifted beyond medical journals to press releases and news headlines and, more recently, to the realm of social media and influencers. The lay public and policy makers may depend on these outlets for interpreting evidence syntheses and CPGs. Unfortunately, communication to the general public often reflects intentional or non-intentional misrepresentation or “spin” of the research findings [ 49 , 50 , 51 , 52 ]. News and social media outlets also tend to reduce conclusions on a body of evidence and recommendations for treatment to binary choices (eg, “do it” versus “don’t do it”) that may be assigned an actionable symbol (eg, red/green traffic lights, smiley/frowning face emoji).

Strategies for improvement

Many authors and peer reviewers are volunteer health care professionals or trainees who lack formal training in evidence synthesis [ 46 , 53 ]. Informing them about research methodology could increase the likelihood they will apply rigorous methods [ 25 , 33 , 45 ]. We tackle this challenge, from both a theoretical and a practical perspective, by offering guidance applicable to any specialty. It is based on recent methodological research that is extensively referenced to promote self-study. However, the information presented is not intended to be a substitute for committed training in evidence synthesis methodology; instead, we hope to inspire our target audience to seek such training. We also hope to inform a broader audience of clinicians and guideline developers influenced by evidence syntheses. Notably, these communities often include the same members who serve in different capacities.

In the following sections, we highlight methodological concepts and practices that may be unfamiliar, problematic, confusing, or controversial. In Part 2, we consider various types of evidence syntheses and the types of research evidence summarized by them. In Part 3, we examine some widely used (and misused) tools for the critical appraisal of systematic reviews and reporting guidelines for evidence syntheses. In Part 4, we discuss how to meet methodological conduct standards applicable to key components of systematic reviews. In Part 5, we describe the merits and caveats of rating the overall certainty of a body of evidence. Finally, in Part 6, we summarize suggested terminology, methods, and tools for development and evaluation of evidence syntheses that reflect current best practices.

Part 2. Types of syntheses and research evidence

A good foundation for the development of evidence syntheses requires an appreciation of their various methodologies and the ability to correctly identify the types of research potentially available for inclusion in the synthesis.

Types of evidence syntheses

Systematic reviews have historically focused on the benefits and harms of interventions; over time, various types of systematic reviews have emerged to address the diverse information needs of clinicians, patients, and policy makers [ 54 ]. Systematic reviews with traditional components have become defined by the different topics they assess (Table 2.1 ). In addition, other distinctive types of evidence syntheses have evolved, including overviews or umbrella reviews, scoping reviews, rapid reviews, and living reviews. The popularity of these has been increasing in recent years [ 55 , 56 , 57 , 58 ]. A summary of the development, methods, available guidance, and indications for these unique types of evidence syntheses is available in Additional File 2 A.

Both Cochrane [ 30 , 59 ] and JBI [ 60 ] provide methodologies for many types of evidence syntheses; they describe these with different terminology, but there is obvious overlap (Table 2.2 ). The majority of evidence syntheses published by Cochrane (96%) and JBI (62%) are categorized as intervention reviews. This reflects the earlier development and dissemination of their intervention review methodologies; these remain well-established [ 30 , 59 , 61 ] as both organizations continue to focus on topics related to treatment efficacy and harms. In contrast, intervention reviews represent only about half of the total published in the general medical literature, and several non-intervention review types contribute to a significant proportion of the other half.

Types of research evidence

There is consensus on the importance of using multiple study designs in evidence syntheses; at the same time, there is a lack of agreement on methods to identify included study designs. Authors of evidence syntheses may use various taxonomies and associated algorithms to guide selection and/or classification of study designs. These tools differentiate categories of research and apply labels to individual study designs (eg, RCT, cross-sectional). A familiar example is the Design Tree endorsed by the Centre for Evidence-Based Medicine [ 70 ]. Such tools may not be helpful to authors of evidence syntheses for multiple reasons.

Suboptimal levels of agreement and accuracy even among trained methodologists reflect challenges with the application of such tools [ 71 , 72 ]. Problematic distinctions or decision points (eg, experimental or observational, controlled or uncontrolled, prospective or retrospective) and design labels (eg, cohort, case control, uncontrolled trial) have been reported [ 71 ]. The variable application of ambiguous study design labels to non-randomized studies is common, making them especially prone to misclassification [ 73 ]. In addition, study labels do not denote the unique design features that make different types of non-randomized studies susceptible to different biases, including those related to how the data are obtained (eg, clinical trials, disease registries, wearable devices). Given this limitation, it is important to be aware that design labels preclude the accurate assignment of non-randomized studies to a “level of evidence” in traditional hierarchies [ 74 ].

These concerns suggest that available tools and nomenclature used to distinguish types of research evidence may not uniformly apply to biomedical research and non-health fields that utilize evidence syntheses (eg, education, economics) [ 75 , 76 ]. Moreover, primary research reports often do not describe study design or do so incompletely or inaccurately; thus, indexing in PubMed and other databases does not address the potential for misclassification [ 77 ]. Yet proper identification of research evidence has implications for several key components of evidence syntheses. For example, search strategies limited by index terms using design labels or study selection based on labels applied by the authors of primary studies may cause inconsistent or unjustified study inclusions and/or exclusions [ 77 ]. In addition, because risk of bias (RoB) tools consider attributes specific to certain types of studies and study design features, results of these assessments may be invalidated if an inappropriate tool is used. Appropriate classification of studies is also relevant for the selection of a suitable method of synthesis and interpretation of those results.

An alternative to these tools and nomenclature involves application of a few fundamental distinctions that encompass a wide range of research designs and contexts. While these distinctions are not novel, we integrate them into a practical scheme (see Fig. 1 ) designed to guide authors of evidence syntheses in the basic identification of research evidence. The initial distinction is between primary and secondary studies. Primary studies are then further distinguished by: 1) the type of data reported (qualitative or quantitative); and 2) two defining design features (group or single-case and randomized or non-randomized). The different types of studies and study designs represented in the scheme are described in detail in Additional File 2 B. It is important to conceptualize their methods as complementary as opposed to contrasting or hierarchical [ 78 ]; each offers advantages and disadvantages that determine their appropriateness for answering different kinds of research questions in an evidence synthesis.

Figure 1. Distinguishing types of research evidence
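
The scheme's distinctions map naturally onto a small data structure. The Python sketch below assumes the features described above (type of data, group versus single-case, randomized versus non-randomized); the class and labels are illustrative, not part of the published scheme:

    from dataclasses import dataclass

    @dataclass
    class PrimaryStudy:
        """Basic identification of a primary study, per the distinctions in Fig. 1."""
        quantitative: bool   # type of data reported: quantitative vs qualitative
        group_design: bool   # group vs single-case
        randomized: bool     # randomized vs non-randomized

        def describe(self):
            return ", ".join([
                "quantitative" if self.quantitative else "qualitative",
                "group" if self.group_design else "single-case",
                "randomized" if self.randomized else "non-randomized",
            ])

    rct = PrimaryStudy(quantitative=True, group_design=True, randomized=True)
    print(rct.describe())  # quantitative, group, randomized (e.g., an RCT)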

Application of these basic distinctions may avoid some of the potential difficulties associated with study design labels and taxonomies. Nevertheless, debatable methodological issues are raised when certain types of research identified in this scheme are included in an evidence synthesis. We briefly highlight those associated with inclusion of non-randomized studies, case reports and series, and a combination of primary and secondary studies.

Non-randomized studies

When investigating an intervention’s effectiveness, it is important for authors to recognize the uncertainty of observed effects reported by studies with high RoB. Results of statistical analyses that include such studies need to be interpreted with caution in order to avoid misleading conclusions [ 74 ]. Review authors may consider excluding randomized studies with high RoB from meta-analyses. Non-randomized studies of intervention (NRSI) are affected by a greater potential range of biases and thus vary more than RCTs in their ability to estimate a causal effect [ 79 ]. If data from NRSI are synthesized in meta-analyses, it is helpful to separately report their summary estimates [ 6 , 74 ].

Nonetheless, certain design features of NRSI (eg, which parts of the study were prospectively designed) may help to distinguish stronger from weaker ones. Cochrane recommends that authors of a review including NRSI focus on relevant study design features when determining eligibility criteria instead of relying on non-informative study design labels [ 79 , 80 ]. This process is facilitated by a study design feature checklist; guidance on using the checklist is included with the developers’ description of the tool [ 73 , 74 ]. Authors collect information about these design features during data extraction and then consider it when making final study selection decisions and when performing RoB assessments of the included NRSI.

Case reports and case series

Correctly identified case reports and case series can contribute evidence not well captured by other designs [ 81 ]; in addition, some topics may be limited to a body of evidence that consists primarily of uncontrolled clinical observations. Murad and colleagues offer a framework for how to include case reports and series in an evidence synthesis [ 82 ]. Distinguishing between cohort studies and case series in these syntheses is important, especially for those that rely on evidence from NRSI. Additional data obtained from studies misclassified as case series can potentially increase the confidence in effect estimates. Mathes and Pieper provide authors of evidence syntheses with specific guidance on distinguishing between cohort studies and case series, but emphasize the increased workload involved [ 77 ].

Primary and secondary studies

Synthesis of combined evidence from primary and secondary studies may provide a broad perspective on the entirety of available literature on a topic. This is, in fact, the recommended strategy for scoping reviews that may include a variety of sources of evidence (eg, CPGs, popular media). However, except for scoping reviews, the synthesis of data from primary and secondary studies is discouraged unless there are strong reasons to justify doing so.

Combining primary and secondary sources of evidence is challenging for authors of other types of evidence syntheses for several reasons [ 83 ]. Assessments of RoB for primary and secondary studies are derived from conceptually different tools, thus obfuscating the ability to make an overall RoB assessment of a combination of these study types. In addition, authors who include primary and secondary studies must devise non-standardized methods for synthesis. Note this contrasts with well-established methods available for updating existing evidence syntheses with additional data from new primary studies [ 84 , 85 , 86 ]. However, a new review that synthesizes data from primary and secondary studies raises questions of validity and may unintentionally support a biased conclusion because no existing methodological guidance is currently available [ 87 ].

Recommendations

We suggest that journal editors require authors to identify which type of evidence synthesis they are submitting and reference the specific methodology used for its development. This will clarify the research question and methods for peer reviewers and potentially simplify the editorial process. Editors should announce this practice and include it in the instructions to authors. To decrease bias and apply correct methods, authors must also accurately identify the types of research evidence included in their syntheses.

Part 3. Conduct and reporting

The need to develop criteria to assess the rigor of systematic reviews was recognized soon after the EBM movement began to gain international traction [ 88 , 89 ]. Systematic reviews rapidly became popular, but many were very poorly conceived, conducted, and reported. These problems remain highly prevalent [ 23 ] despite development of guidelines and tools to standardize and improve the performance and reporting of evidence syntheses [ 22 , 28 ]. Table 3.1  provides some historical perspective on the evolution of tools developed specifically for the evaluation of systematic reviews, with or without meta-analysis.

These tools are often interchangeably invoked when referring to the “quality” of an evidence synthesis. However, quality is a vague term that is frequently misused and misunderstood; more precisely, these tools specify different standards for evidence syntheses. Methodological standards address how well a systematic review was designed and performed [ 5 ]. RoB assessments refer to systematic flaws or limitations in the design, conduct, or analysis of research that distort the findings of the review [ 4 ]. Reporting standards help systematic review authors describe the methodology they used and the results of their synthesis in sufficient detail [ 92 ]. It is essential to distinguish between these evaluations: a systematic review may be biased, it may fail to report sufficient information on essential features, or it may exhibit both problems; a thoroughly reported evidence synthesis may still be biased and flawed, while an otherwise unbiased one may suffer from deficient documentation.

We direct attention to the currently recommended tools listed in Table 3.1  but concentrate on AMSTAR-2 (update of AMSTAR [A Measurement Tool to Assess Systematic Reviews]) and ROBIS (Risk of Bias in Systematic Reviews), which evaluate methodological quality and RoB, respectively. For comparison and completeness, we include PRISMA 2020 (update of the 2009 Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement), which offers guidance on reporting standards. The exclusive focus on these three tools is by design; it addresses concerns related to the considerable variability in tools used for the evaluation of systematic reviews [ 28 , 88 , 96 , 97 ]. We highlight the underlying constructs these tools were designed to assess, then describe their components and applications. Their known (or potential) uptake, impact, and limitations are also discussed.

Evaluation of conduct

Development

AMSTAR [ 5 ] was in use for a decade prior to the 2017 publication of AMSTAR-2; both provide a broad evaluation of methodological quality of intervention systematic reviews, including flaws arising through poor conduct of the review [ 6 ]. ROBIS, published in 2016, was developed to specifically assess RoB introduced by the conduct of the review; it is applicable to systematic reviews of interventions and several other types of reviews [ 4 ]. Both tools reflect a shift to a domain-based approach as opposed to generic quality checklists. There are a few items unique to each tool; however, similarities between items have been demonstrated [ 98 , 99 ]. AMSTAR-2 and ROBIS are recommended for use by: 1) authors of overviews or umbrella reviews and CPGs to evaluate systematic reviews considered as evidence; 2) authors of methodological research studies to appraise included systematic reviews; and 3) peer reviewers for appraisal of submitted systematic review manuscripts. For authors, these tools may function as teaching aids and inform conduct of their review during its development.

Description

Systematic reviews that include randomized and/or non-randomized studies as evidence can be appraised with AMSTAR-2 and ROBIS. Other characteristics of AMSTAR-2 and ROBIS are summarized in Table 3.2 . Both tools define categories for an overall rating; however, neither tool is intended to generate a total score by simply calculating the number of responses satisfying criteria for individual items [ 4 , 6 ]. AMSTAR-2 focuses on the rigor of a review’s methods irrespective of the specific subject matter. ROBIS places emphasis on a review’s results section; this suggests it may be optimally applied by appraisers with some knowledge of the review’s topic, as they may be better equipped to determine if certain procedures (or lack thereof) would impact the validity of a review’s findings [ 98 , 100 ]. Reliability studies show AMSTAR-2 overall confidence ratings strongly correlate with the overall RoB ratings in ROBIS [ 100 , 101 ].

Interrater reliability has been shown to be acceptable for AMSTAR-2 [ 6 , 11 , 102 ] and ROBIS [ 4 , 98 , 103 ] but neither tool has been shown to be superior in this regard [ 100 , 101 , 104 , 105 ]. Overall, variability in reliability for both tools has been reported across items, between pairs of raters, and between centers [ 6 , 100 , 101 , 104 ]. The effects of appraiser experience on the results of AMSTAR-2 and ROBIS require further evaluation [ 101 , 105 ]. Updates to both tools should address items shown to be prone to individual appraisers’ subjective biases and opinions [ 11 , 100 ]; this may involve modifications of the current domains and signaling questions as well as incorporation of methods to make an appraiser’s judgments more explicit. Future revisions of these tools may also consider the addition of standards for aspects of systematic review development currently lacking (eg, rating overall certainty of evidence, [ 99 ] methods for synthesis without meta-analysis [ 105 ]) and removal of items that assess aspects of reporting that are thoroughly evaluated by PRISMA 2020.

Application

A good understanding of what is required to satisfy the standards of AMSTAR-2 and ROBIS involves study of the accompanying guidance documents written by the tools’ developers; these contain detailed descriptions of each item’s standards. In addition, accurate appraisal of a systematic review with either tool requires training. Most experts recommend independent assessment by at least two appraisers with a process for resolving discrepancies as well as procedures to establish interrater reliability, such as pilot testing, a calibration phase or exercise, and development of predefined decision rules [ 35 , 99 , 100 , 101 , 103 , 104 , 106 ]. These methods may, to some extent, address the challenges associated with the diversity in methodological training, subject matter expertise, and experience using the tools that are likely to exist among appraisers.

The standards of AMSTAR, AMSTAR-2, and ROBIS have been used in many methodological studies and epidemiological investigations. However, the increased publication of overviews or umbrella reviews and CPGs has likely been a greater influence on the widening acceptance of these tools. Critical appraisal of the secondary studies considered as evidence is essential to the trustworthiness of both the recommendations of CPGs and the conclusions of overviews. Currently both Cochrane [ 55 ] and JBI [ 107 ] recommend AMSTAR-2 and ROBIS in their guidance for authors of overviews or umbrella reviews. However, ROBIS and AMSTAR-2 were released in 2016 and 2017, respectively; thus, to date, limited data have been reported about the uptake of these tools or which of the two may be preferred [ 21 , 106 ]. Currently, in relation to CPGs, AMSTAR-2 appears to be overwhelmingly popular compared to ROBIS. A Google Scholar search of this topic (search terms “AMSTAR 2 AND clinical practice guidelines” and “ROBIS AND clinical practice guidelines,” 13 May 2022) found 12,700 hits for AMSTAR-2 and 1,280 for ROBIS. The apparent greater appeal of AMSTAR-2 may relate to its longer track record, given the original version of the tool was in use for 10 years prior to its update in 2017.

Barriers to the uptake of AMSTAR-2 and ROBIS include the real or perceived time and resources necessary to complete the items they include and appraisers’ confidence in their own ratings [ 104 ]. Reports from comparative studies available to date indicate that appraisers find AMSTAR-2 questions, responses, and guidance to be clearer and simpler compared with ROBIS [ 11 , 101 , 104 , 105 ]. This suggests that for appraisal of intervention systematic reviews, AMSTAR-2 may be a more practical tool than ROBIS, especially for novice appraisers [ 101 , 103 , 104 , 105 ]. The unique characteristics of each tool, as well as their potential advantages and disadvantages, should be taken into consideration when deciding which tool should be used for an appraisal of a systematic review. In addition, the choice of one or the other may depend on how the results of an appraisal will be used; for example, a peer reviewer’s appraisal of a single manuscript versus an appraisal of multiple systematic reviews in an overview or umbrella review, CPG, or systematic methodological study.

Authors of overviews and CPGs report results of AMSTAR-2 and ROBIS appraisals for each of the systematic reviews they include as evidence. Ideally, an independent judgment of their appraisals can be made by the end users of overviews and CPGs; however, most stakeholders, including clinicians, are unlikely to have a sophisticated understanding of these tools. Nevertheless, they should at least be aware that AMSTAR-2 and ROBIS ratings reported in overviews and CPGs may be inaccurate if the tools are not applied as intended by their developers. This can result from inadequate training of the overview or CPG authors who perform the appraisals, or from modifications of the appraisal tools imposed by them. The potential variability in overall confidence and RoB ratings highlights why appraisers applying these tools need to support their judgments with explicit documentation; this allows readers to judge for themselves whether they agree with the criteria used by appraisers [ 4 , 108 ]. When these judgments are explicit, the underlying rationale used when applying these tools can be assessed [ 109 ].

Theoretically, we would expect an association of AMSTAR-2 with improved methodological rigor and an association of ROBIS with lower RoB in recent systematic reviews compared to those published before 2017. To our knowledge, this has not yet been demonstrated; however, as with reports about the actual uptake of these tools, time will tell. Additional data on user experience are also needed to further elucidate the practical challenges and methodological nuances encountered with the application of these tools. This information could potentially inform the creation of unifying criteria to guide and standardize the appraisal of evidence syntheses [ 109 ].

Evaluation of reporting

Complete reporting is essential for users to establish the trustworthiness and applicability of a systematic review’s findings. Efforts to standardize and improve the reporting of systematic reviews resulted in the 2009 publication of the PRISMA statement [ 92 ] with its accompanying explanation and elaboration document [ 110 ]. This guideline was designed to help authors prepare a complete and transparent report of their systematic review. In addition, adherence to PRISMA is often used to evaluate the thoroughness of reporting of published systematic reviews [ 111 ]. The updated version, PRISMA 2020 [ 93 ], and its guidance document [ 112 ] were published in 2021. Items on the original and updated versions of PRISMA are organized by the six basic review components they address (title, abstract, introduction, methods, results, discussion). The PRISMA 2020 update is a considerably expanded version of the original; it includes standards and examples for the 27 original and 13 additional reporting items that capture methodological advances and may enhance the replicability of reviews [ 113 ].

The original PRISMA statement fostered the development of various PRISMA extensions (Table 3.3 ). These include reporting guidance for scoping reviews and reviews of diagnostic test accuracy, and for intervention reviews that report on the following: harms outcomes, equity issues, the effects of acupuncture, the results of network meta-analyses, and analyses of individual participant data. Detailed reporting guidance for specific systematic review components (abstracts, protocols, literature searches) is also available.

Uptake and impact

The 2009 PRISMA standards [ 92 ] for reporting have been widely endorsed by authors, journals, and EBM-related organizations. We anticipate the same for PRISMA 2020 [ 93 ] given its co-publication in multiple high-impact journals. However, to date, there is a lack of strong evidence for an association between improved systematic review reporting and endorsement of the PRISMA 2009 standards [ 43 , 111 ]. Most journals require that a PRISMA checklist accompany submissions of systematic review manuscripts. However, the accuracy of information presented on these self-reported checklists is not necessarily verified. It remains unclear which strategies (eg, authors’ self-report of checklists, peer reviewer checks) might improve adherence to the PRISMA reporting standards; in addition, the feasibility of any potentially effective strategies must be taken into consideration given the structure and limitations of current research and publication practices [ 124 ].

Pitfalls and limitations of PRISMA, AMSTAR-2, and ROBIS

Misunderstanding of the roles of these tools and their misapplication may be widespread problems. PRISMA 2020 is a reporting guideline that is most beneficial if consulted when developing a review, as opposed to merely completing a checklist when submitting to a journal; at that point, the review is finished, with good or bad methodological choices. Moreover, PRISMA checklists evaluate how completely an element of review conduct was reported; they do not evaluate the caliber of conduct or performance of a review. Thus, review authors and readers should not think that a rigorous systematic review can be produced by simply following the PRISMA 2020 guidelines. Similarly, it is important to recognize that AMSTAR-2 and ROBIS are tools to evaluate the conduct of a review but do not substitute for conceptual methodological guidance. In addition, they are not intended to be simple checklists. In fact, they have the potential for misuse or abuse if applied as such; for example, by calculating a total score to make a judgment about a review’s overall confidence or RoB. Proper selection of a response for the individual items on AMSTAR-2 and ROBIS requires training or at least reference to their accompanying guidance documents.

Not surprisingly, it has been shown that compliance with the PRISMA checklist is not necessarily associated with satisfying the standards of ROBIS [ 125 ]. AMSTAR-2 and ROBIS were not available when PRISMA 2009 was developed; however, they were considered in the development of PRISMA 2020 [ 113 ]. Therefore, future studies may show a positive relationship between fulfillment of PRISMA 2020 standards for reporting and meeting the standards of tools evaluating methodological quality and RoB.

Choice of an appropriate tool for the evaluation of a systematic review first involves identification of the underlying construct to be assessed. For systematic reviews of interventions, recommended tools include AMSTAR-2 and ROBIS for appraisal of conduct and PRISMA 2020 for completeness of reporting. All three tools were developed rigorously and provide easily accessible and detailed user guidance, which is necessary for their proper application and interpretation. When considering a manuscript for publication, training in these tools can sensitize peer reviewers and editors to major issues that may affect the review’s trustworthiness and completeness of reporting. Judgment of the overall certainty of a body of evidence and formulation of recommendations rely, in part, on AMSTAR-2 or ROBIS appraisals of systematic reviews. Therefore, training on the application of these tools is essential for authors of overviews and developers of CPGs. Peer reviewers and editors considering an overview or CPG for publication must hold their authors to a high standard of transparency regarding both the conduct and reporting of these appraisals.

Part 4. Meeting conduct standards

Many authors, peer reviewers, and editors erroneously equate fulfillment of the items on the PRISMA checklist with superior methodological rigor. For direction on methodology, we refer them to available resources that provide comprehensive conceptual guidance [ 59 , 60 ] as well as primers with basic step-by-step instructions [ 1 , 126 , 127 ]. This section is intended to complement study of such resources by facilitating use of AMSTAR-2 and ROBIS, tools specifically developed to evaluate methodological rigor of systematic reviews. These tools are widely accepted by methodologists; however, in the general medical literature, they are not uniformly selected for the critical appraisal of systematic reviews [ 88 , 96 ].

To enable their uptake, Table 4.1  links review components to the corresponding appraisal tool items. Expectations of AMSTAR-2 and ROBIS are concisely stated, and the reasoning behind them is provided.

Issues involved in meeting the standards for seven review components (identified in bold in Table 4.1 ) are addressed in detail. These were chosen for elaboration for one (or both) of two reasons: 1) the component has been identified as potentially problematic for systematic review authors based on consistent reports of their frequent AMSTAR-2 or ROBIS deficiencies [ 9 , 11 , 15 , 88 , 128 , 129 ]; and/or 2) the review component is judged by standards of an AMSTAR-2 “critical” domain. These have the greatest implications for how a systematic review will be appraised: under the developers’ rating scheme, one unmet critical domain lowers a review’s rating to “low confidence,” and more than one to “critically low confidence” [ 6 ].
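To make the domain-based rating concrete, here is a minimal sketch in Python of the overall confidence rating scheme described by AMSTAR-2’s developers [ 6 ]; the domain identifiers are illustrative stand-ins for the tool’s 16 items (7 of which are critical by default), and the sketch is no substitute for the official guidance document.

```python
# Minimal sketch of AMSTAR-2's overall confidence rating logic, based on the
# scheme published by the tool's developers (Shea et al., 2017) [6].
# Domain names below are illustrative shorthand for the 7 default critical items.

CRITICAL_DOMAINS = {
    "protocol_registered",        # item 2
    "adequate_search",            # item 4
    "justified_exclusions",       # item 7
    "rob_assessment",             # item 9
    "appropriate_meta_analysis",  # item 11
    "rob_considered_in_results",  # item 13
    "publication_bias_assessed",  # item 15
}

def overall_confidence(unmet_domains: set) -> str:
    """Map the set of unmet domains to one of the four AMSTAR-2 ratings."""
    critical_flaws = len(unmet_domains & CRITICAL_DOMAINS)
    non_critical = len(unmet_domains - CRITICAL_DOMAINS)
    if critical_flaws > 1:
        return "critically low"
    if critical_flaws == 1:
        return "low"
    # No critical flaws: rating depends only on non-critical weaknesses.
    return "high" if non_critical <= 1 else "moderate"

print(overall_confidence({"adequate_search"}))                    # low
print(overall_confidence({"adequate_search", "rob_assessment"}))  # critically low
```

Note that no count of satisfied items enters the logic; as emphasized above, these tools are not scored by totaling responses.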

Research question

Specific and unambiguous research questions may have more value for reviews that deal with hypothesis testing. Mnemonics for the various elements of research questions are suggested by JBI and Cochrane (Table 2.1 ). These prompt authors to consider the specialized methods involved in developing different types of systematic reviews; however, while inclusion of the suggested elements makes a review compliant with the methods of a particular review type, it does not necessarily make a research question appropriate. Table 4.2  lists acronyms that may aid in developing the research question. They include overlapping concepts of importance in this time of proliferating reviews of uncertain value [ 130 ]. If these issues are not prospectively contemplated, systematic review authors may establish an overly broad scope, or develop a runaway scope that allows them to stray from predefined choices relating to key comparisons and outcomes.

Once a research question is established, searching on registry sites and databases for existing systematic reviews addressing the same or a similar topic is necessary in order to avoid contributing to research waste [ 131 ]. Repeating an existing systematic review must be justified, for example, if previous reviews are out of date or methodologically flawed. A full discussion on replication of intervention systematic reviews, including a consensus checklist, can be found in the work of Tugwell and colleagues [ 84 ].

Protocol

Protocol development is considered a core component of systematic reviews [ 125 , 126 , 132 ]. Review protocols may allow researchers to plan and anticipate potential issues, assess validity of methods, prevent arbitrary decision-making, and minimize bias that can be introduced by the conduct of the review. Registration of a protocol that allows public access promotes transparency of the systematic review’s methods and processes and reduces the potential for duplication [ 132 ]. Thinking early and carefully about all the steps of a systematic review is pragmatic and logical and may mitigate the influence of the authors’ prior knowledge of the evidence [ 133 ]. In addition, the protocol stage is when the scope of the review can be carefully considered by authors, reviewers, and editors; this may help to avoid production of overly ambitious reviews that include excessive numbers of comparisons and outcomes or are undisciplined in their study selection.

Systematic reviews with published prospective protocols have been reported to better attain AMSTAR standards [ 134 ]. However, completeness of reporting does not seem to differ between reviews with a protocol and those without one [ 135 ]. PRISMA-P [ 116 ] and its accompanying elaboration and explanation document [ 136 ] can be used to guide and assess the reporting of protocols. A final version of the review should fully describe any protocol deviations. Peer reviewers may compare the submitted manuscript with any available pre-registered protocol; this is required if AMSTAR-2 or ROBIS are used for critical appraisal.

There are multiple options for the recording of protocols (Table 4.3 ). Some journals will peer review and publish protocols. In addition, many online sites offer date-stamped and publicly accessible protocol registration. Some of these are exclusively for protocols of evidence syntheses; others are less restrictive and offer researchers the capacity for data storage, sharing, and other workflow features. These sites document protocol details to varying extents and have different requirements [ 137 ]. For example, the most popular site for systematic reviews, the International Prospective Register of Systematic Reviews (PROSPERO), only registers reviews that report on an outcome with direct relevance to human health. The PROSPERO record documents protocols for all types of reviews except literature and scoping reviews. Of note, PROSPERO requires that authors register their review protocols prior to any data extraction [ 133 , 138 ]. The electronic records of most of these registry sites allow authors to update their protocols and facilitate transparent tracking of protocol changes, which are not unexpected during the progress of the review [ 139 ].

Study design inclusion

For most systematic reviews, broad inclusion of study designs is recommended [ 126 ]. This may allow comparison of results between contrasting study design types [ 126 ]. Certain study designs may be considered preferable depending on the type of review and nature of the research question. However, prevailing stereotypes about what each study design does best may not be accurate. For example, in systematic reviews of interventions, randomized designs are typically thought to answer highly specific questions, while non-randomized designs are often expected to reveal greater information about harms or real-world evidence [ 126 , 140 , 141 ]. This may be a false distinction; randomized trials may be pragmatic [ 142 ], they may offer important (and more unbiased) information on harms [ 143 ], and data from non-randomized trials may not necessarily be more real-world-oriented [ 144 ].

Moreover, there may not be any available evidence reported by RCTs for certain research questions; in some cases, there may not be any RCTs or NRSI. When the available evidence is limited to case reports and case series, it is not possible to test hypotheses nor provide descriptive estimates or associations; however, a systematic review of these studies can still offer important insights [ 81 , 145 ]. When authors anticipate that limited evidence of any kind may be available to inform their research questions, a scoping review can be considered. Alternatively, decisions regarding inclusion of indirect as opposed to direct evidence can be addressed during protocol development [ 146 ]. Including indirect evidence at an early stage of intervention systematic review development allows authors to decide if such studies offer any additional and/or different understanding of treatment effects for their population or comparison of interest. Issues of indirectness of included studies are accounted for later in the process, during determination of the overall certainty of evidence (see Part 5 for details).

Evidence search

Both AMSTAR-2 and ROBIS require systematic and comprehensive searches for evidence. This is essential for any systematic review. Both tools discourage search restrictions based on language and publication source. Given increasing globalism in health care, the practice of including English-only literature should be avoided [ 126 ]. There are many examples in which language bias (different results in studies published in different languages) has been documented [ 147 , 148 ]. This does not mean that all literature, in all languages, is equally trustworthy [ 148 ]; however, the only way to formally probe for the potential of such biases is to consider all languages in the initial search. Searches of the gray literature and trial registries may also reveal important details about topics that would otherwise be missed [ 149 , 150 , 151 ]. Again, inclusiveness will allow review authors to investigate whether results differ in gray literature and trial registries [ 41 , 151 , 152 , 153 ].

Authors should make every attempt to complete their review within one year, as that is the likely viable life of a search. If that is not possible, the search should be updated close to the time of completion [ 154 ]. Some research topics may warrant even less of a delay; for example, in rapidly changing fields (as in the case of the COVID-19 pandemic), even one month may radically change the available evidence.

Excluded studies

AMSTAR-2 requires authors to provide references for any studies excluded at the full text phase of study selection along with reasons for exclusion; this allows readers to feel confident that all relevant literature has been considered for inclusion and that exclusions are defensible.

Risk of bias assessment of included studies

The design of the studies included in a systematic review (eg, RCT, cohort, case series) should not be equated with appraisal of their RoB. To meet AMSTAR-2 and ROBIS standards, systematic review authors must examine RoB issues specific to the design of each primary study they include as evidence. It is unlikely that a single RoB appraisal tool will be suitable for all research designs. In addition to tools for randomized and non-randomized studies, specific tools are available for evaluation of RoB in case reports and case series [ 82 ] and single-case experimental designs [ 155 , 156 ]. Note the RoB tools selected must meet the standards of the appraisal tool used to judge the conduct of the review. For example, AMSTAR-2 identifies four sources of bias specific to RCTs and NRSI that must be addressed by the RoB tool(s) chosen by the review authors. The Cochrane RoB 2 tool [ 157 ] for RCTs and ROBINS-I [ 158 ] for NRSI meet the AMSTAR-2 standards for RoB assessment. Appraisers on the review team should not modify any RoB tool without complete transparency and acknowledgment that they have invalidated the interpretation of the tool as intended by its developers [ 159 ]. Conduct of RoB assessments is not addressed by AMSTAR-2; to meet ROBIS standards, two independent reviewers should complete RoB assessments of included primary studies.

Implications of the RoB assessments must be explicitly discussed and considered in the conclusions of the review. Discussion of the overall RoB of included studies may consider the weight of the studies at high RoB, the importance of the sources of bias in the studies being summarized, and if their importance differs in relationship to the outcomes reported. If a meta-analysis is performed, serious concerns for RoB of individual studies should be accounted for in these results as well. If the results of the meta-analysis for a specific outcome change when studies at high RoB are excluded, readers will have a more accurate understanding of this body of evidence. However, while investigating the potential impact of specific biases is a useful exercise, it is important to avoid over-interpretation, especially when there are sparse data.
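To illustrate this kind of sensitivity analysis, the sketch below uses a simple fixed-effect inverse-variance pool (Python; the study data, RoB ratings, and helper name are all hypothetical) to compare the pooled result with and without studies at high RoB.

```python
import math

# Hypothetical per-study data: log odds ratio (yi), its variance (vi),
# and an RoB rating assigned during critical appraisal.
studies = [
    {"id": "A", "yi": -0.35, "vi": 0.04, "rob": "low"},
    {"id": "B", "yi": -0.10, "vi": 0.02, "rob": "some concerns"},
    {"id": "C", "yi": -0.80, "vi": 0.09, "rob": "high"},
    {"id": "D", "yi": -0.60, "vi": 0.06, "rob": "high"},
]

def pool_fixed_effect(data):
    """Fixed-effect inverse-variance pooled estimate with a 95% CI."""
    weights = [1 / s["vi"] for s in data]
    pooled = sum(w * s["yi"] for w, s in zip(weights, data)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

all_est, all_ci = pool_fixed_effect(studies)
low_rob = [s for s in studies if s["rob"] != "high"]
sens_est, sens_ci = pool_fixed_effect(low_rob)

print(f"All studies:        {all_est:.2f} (95% CI {all_ci[0]:.2f} to {all_ci[1]:.2f})")
print(f"Excluding high RoB: {sens_est:.2f} (95% CI {sens_ci[0]:.2f} to {sens_ci[1]:.2f})")
```

If the two pooled estimates diverge materially, the conclusions of the review should acknowledge that the body of evidence is sensitive to the inclusion of studies at high RoB.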

Synthesis methods for quantitative data

Syntheses of quantitative data reported by primary studies are broadly categorized as one of two types: meta-analysis, and synthesis without meta-analysis (Table 4.4 ). Before deciding on one of these methods, authors should seek methodological advice about whether reported data can be transformed or used in other ways to provide a consistent effect measure across studies [ 160 , 161 ].

Meta-analysis

Systematic reviews that employ meta-analysis should not be referred to simply as “meta-analyses.” The term meta-analysis strictly refers to a specific statistical technique used when study effect estimates and their variances are available, yielding a quantitative summary of results. In general, methods for meta-analysis involve use of a weighted average of effect estimates from two or more studies. When applied appropriately, meta-analysis increases the precision of the estimated magnitude of effect and can offer useful insights about heterogeneity among effect estimates. We refer to standard references for a thorough introduction and formal training [ 165 , 166 , 167 ].

There are three common approaches to meta-analysis in current health care–related systematic reviews (Table 4.4 ). Aggregate data meta-analysis is the most familiar to authors of evidence syntheses and their end users. This standard meta-analysis combines data on effect estimates reported by studies that investigate similar research questions involving direct comparisons of an intervention and comparator. Results of these analyses provide a single summary intervention effect estimate. If the included studies in a systematic review measure an outcome differently, their reported results may be transformed to make them comparable [ 161 ]. Forest plots visually present essential information about the individual studies and the overall pooled analysis (see Additional File 4  for details).
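As a worked example of the weighted-average principle underlying aggregate data meta-analysis, the following sketch (Python, with hypothetical effect estimates) computes a DerSimonian-Laird random-effects pooled estimate and the Q, tau-squared, and I-squared heterogeneity statistics that typically annotate a forest plot; real analyses would normally use established software such as the programs mentioned later in this section.

```python
import math

# Hypothetical study effect estimates (eg, standardized mean differences)
# and their variances, as extracted for an aggregate data meta-analysis.
yi = [0.42, 0.31, 0.55, 0.10, 0.48]
vi = [0.05, 0.03, 0.08, 0.04, 0.06]

# Fixed-effect inverse-variance weights, used to compute Cochran's Q.
w = [1 / v for v in vi]
fe = sum(wi * y for wi, y in zip(w, yi)) / sum(w)
q = sum(wi * (y - fe) ** 2 for wi, y in zip(w, yi))
df = len(yi) - 1

# DerSimonian-Laird estimate of between-study variance (tau^2).
c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
tau2 = max(0.0, (q - df) / c)

# Random-effects weights incorporate tau^2 in addition to within-study variance.
w_re = [1 / (v + tau2) for v in vi]
re = sum(wi * y for wi, y in zip(w_re, yi)) / sum(w_re)
se = math.sqrt(1 / sum(w_re))
i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

print(f"Pooled effect: {re:.2f} (95% CI {re - 1.96*se:.2f} to {re + 1.96*se:.2f})")
print(f"Heterogeneity: Q={q:.2f} (df={df}), tau^2={tau2:.3f}, I^2={i2:.0f}%")
```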

Less familiar and more challenging meta-analytical approaches used in secondary research include individual participant data (IPD) and network meta-analyses (NMA); PRISMA extensions provide reporting guidelines for both [ 117 , 118 ]. In IPD, the raw data on each participant from each eligible study are re-analyzed as opposed to the study-level data analyzed in aggregate data meta-analyses [ 168 ]. This may offer advantages, including the potential for limiting concerns about bias and allowing more robust analyses [ 163 ]. As suggested by the description in Table 4.4 , NMA is a complex statistical approach. It combines aggregate data [ 169 ] or IPD [ 170 ] for effect estimates from direct and indirect comparisons reported in two or more studies of three or more interventions. This makes it a potentially powerful statistical tool; while multiple interventions are typically available to treat a condition, few have been evaluated in head-to-head trials [ 171 ]. Both IPD and NMA facilitate a broader scope, and potentially provide more reliable and/or detailed results; however, compared with standard aggregate data meta-analyses, their methods are more complicated, time-consuming, and resource-intensive, and they have their own biases, so one needs sufficient funding, technical expertise, and preparation to employ them successfully [ 41 , 172 , 173 ].

Several items in AMSTAR-2 and ROBIS address meta-analysis; thus, understanding the strengths, weaknesses, assumptions, and limitations of methods for meta-analyses is important. According to the standards of both tools, plans for a meta-analysis must be addressed in the review protocol, including reasoning, description of the type of quantitative data to be synthesized, and the methods planned for combining the data. This should not consist of stock statements describing conventional meta-analysis techniques; rather, authors are expected to anticipate issues specific to their research questions. Concern for the lack of training in meta-analysis methods among systematic review authors cannot be overstated. For those with training, the use of popular software (eg, RevMan [ 174 ], MetaXL [ 175 ], JBI SUMARI [ 176 ]) may facilitate exploration of these methods; however, such programs cannot substitute for the accurate interpretation of the results of meta-analyses, especially for more complex meta-analytical approaches.

Synthesis without meta-analysis

There are varied reasons a meta-analysis may not be appropriate or desirable [ 160 , 161 ]. Syntheses that informally use statistical methods other than meta-analysis are variably referred to as descriptive, narrative, or qualitative syntheses or summaries; these terms are also applied to syntheses that make no attempt to statistically combine data from individual studies. However, use of such imprecise terminology is discouraged; in order to fully explore the results of any type of synthesis, some narration or description is needed to supplement the data visually presented in tabular or graphic forms [ 63 , 177 ]. In addition, the term “qualitative synthesis” is easily confused with a synthesis of qualitative data in a qualitative or mixed methods review. “Synthesis without meta-analysis” is currently the preferred description of other ways to combine quantitative data from two or more studies. Use of this specific terminology when referring to these types of syntheses also implies the application of formal methods (Table 4.4 ).

Methods for syntheses without meta-analysis involve structured presentations of the data in tables and plots. In comparison to narrative descriptions of each study, these are designed to more effectively and transparently show patterns and convey detailed information about the data; they also allow informal exploration of heterogeneity [ 178 ]. In addition, acceptable quantitative statistical methods (Table 4.4 ) are formally applied; however, it is important to recognize these methods have significant limitations for the interpretation of the effectiveness of an intervention [ 160 ]. Nevertheless, when meta-analysis is not possible, the application of these methods is less prone to bias compared with an unstructured narrative description of included studies [ 178 , 179 ].

Vote counting is commonly used in systematic reviews and involves a tally of studies reporting results that meet some threshold of importance applied by review authors. Until recently, it has not typically been identified as a method for synthesis without meta-analysis. Guidance on an acceptable vote counting method based on direction of effect is currently available [ 160 ] and should be used instead of narrative descriptions of such results (eg, “more than half the studies showed improvement”; “only a few studies reported adverse effects”; “7 out of 10 studies favored the intervention”). Unacceptable methods include vote counting by statistical significance or magnitude of effect or some subjective rule applied by the authors.
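A minimal sketch of the acceptable approach, assuming hypothetical per-study directions of effect: tally the directions (ignoring magnitude and statistical significance) and apply an exact two-sided sign test.

```python
from math import comb

# Hypothetical directions of effect: True if the study favored the
# intervention, False otherwise. Magnitude and p-values are ignored.
favors_intervention = [True, True, False, True, True, True, False, True]

k = sum(favors_intervention)   # studies favoring the intervention
n = len(favors_intervention)

def binom_cdf(x, n, p=0.5):
    """P(X <= x) for a binomial(n, p) count."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(x + 1))

# Two-sided exact sign test against the null of no preferred direction (p = 0.5).
p_two_sided = min(1.0, 2 * min(binom_cdf(k, n), 1 - binom_cdf(k - 1, n)))

print(f"{k}/{n} studies favor the intervention (sign test p = {p_two_sided:.3f})")
```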

AMSTAR-2 and ROBIS standards do not explicitly address conduct of syntheses without meta-analysis, although AMSTAR-2 items 13 and 14 might be considered relevant. Guidance for the complete reporting of syntheses without meta-analysis for systematic reviews of interventions is available in the Synthesis without Meta-analysis (SWiM) guideline [ 180 ] and methodological guidance is available in the Cochrane Handbook [ 160 , 181 ].

Familiarity with AMSTAR-2 and ROBIS makes sense for authors of systematic reviews as these appraisal tools will be used to judge their work; however, training is necessary for authors to truly appreciate and apply methodological rigor. Moreover, judgment of the potential contribution of a systematic review to the current knowledge base goes beyond meeting the standards of AMSTAR-2 and ROBIS. These tools do not explicitly address some crucial concepts involved in the development of a systematic review; this further emphasizes the need for author training.

We recommend that systematic review authors incorporate specific practices or exercises when formulating a research question at the protocol stage. These should be designed to raise the review team’s awareness of how to prevent research and resource waste [ 84 , 130 ] and to stimulate careful contemplation of the scope of the review [ 30 ]. Authors’ training should also focus on justifiably choosing a formal method for the synthesis of quantitative and/or qualitative data from primary research; both types of data require specific expertise. For typical reviews that involve syntheses of quantitative data, statistical expertise is necessary, initially for decisions about appropriate methods [ 160 , 161 ] and then to inform any meta-analyses [ 167 ] or other statistical methods applied [ 160 ].

Part 5. Rating overall certainty of evidence

Report of an overall certainty of evidence assessment in a systematic review is an important new reporting standard of the updated PRISMA 2020 guidelines [ 93 ]. Systematic review authors are well acquainted with assessing RoB in individual primary studies, but much less familiar with assessment of overall certainty across an entire body of evidence. Yet a reliable way to evaluate this broader concept is now recognized as a vital part of interpreting the evidence.

Historical systems for rating evidence are based on study design and usually involve hierarchical levels or classes of evidence that use numbers and/or letters to designate the level/class. These systems were endorsed by various EBM-related organizations. Professional societies and regulatory groups then widely adopted them, often with modifications for application to the available primary research base in specific clinical areas. In 2002, a report issued by the AHRQ identified 40 systems to rate quality of a body of evidence [ 182 ]. A critical appraisal of systems used by prominent health care organizations published in 2004 revealed limitations in sensibility, reproducibility, applicability to different questions, and usability to different end users [ 183 ]. Persistent use of hierarchical rating schemes to describe overall quality continues to complicate the interpretation of evidence. This is indicated by recent reports of poor interpretability of systematic review results by readers [ 184 , 185 , 186 ] and misleading interpretations of the evidence related to the “spin” systematic review authors may put on their conclusions [ 50 , 187 ].

Recognition of the shortcomings of hierarchical rating systems raised concerns that misleading clinical recommendations could result even if based on a rigorous systematic review. In addition, the number and variability of these systems were considered obstacles to quick and accurate interpretations of the evidence by clinicians, patients, and policymakers [ 183 ]. These issues contributed to the development of the GRADE approach. An international working group, which continues to actively evaluate and refine it, first introduced GRADE in 2004 [ 188 ]. Currently, more than 110 organizations from 19 countries around the world have endorsed or are using GRADE [ 189 ].

GRADE approach to rating overall certainty

GRADE offers a consistent and sensible approach for two separate processes: rating the overall certainty of a body of evidence and the strength of recommendations. The former is the expected conclusion of a systematic review, while the latter is pertinent to the development of CPGs. As such, GRADE provides a mechanism to bridge the gap from evidence synthesis to application of the evidence for informed clinical decision-making [ 27 , 190 ]. We briefly examine the GRADE approach but only as it applies to rating overall certainty of evidence in systematic reviews.

In GRADE, “certainty” of a body of evidence is the preferred term over “quality” [ 191 ]. Certainty refers to the level of confidence systematic review authors have that, for each outcome, an effect estimate represents the true effect. The GRADE approach to rating confidence in estimates begins with identifying the study type (RCT or NRSI) and then systematically considers criteria to rate the certainty of evidence up or down (Table 5.1 ).

This process results in assignment of one of the four GRADE certainty ratings to each outcome; these are clearly conveyed with the use of basic interpretation symbols (Table 5.2 ) [ 192 ]. Notably, when multiple outcomes are reported in a systematic review, each outcome is assigned a unique certainty rating; thus different levels of certainty may exist in the body of evidence being examined.

GRADE’s developers acknowledge some subjectivity is involved in this process [ 193 ]. In addition, they emphasize that both the criteria for rating evidence up and down (Table 5.1 ) as well as the four overall certainty ratings (Table 5.2 ) reflect a continuum as opposed to discrete categories [ 194 ]. Consequently, deciding whether a study falls above or below the threshold for rating up or down may not be straightforward, and preliminary overall certainty ratings may be intermediate (eg, between low and moderate). Thus, the proper application of GRADE requires systematic review authors to take an overall view of the body of evidence and explicitly describe the rationale for their final ratings.
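For illustration of the mechanics only, the sketch below (Python) encodes the usual GRADE starting levels and the rating-down and rating-up criteria of Table 5.1 ; treating each criterion as a discrete one- or two-level adjustment is a simplification of the continuum described above, and the underlying judgments cannot be automated.

```python
# Illustrative sketch of GRADE's arithmetic for a single outcome.
# Levels: 4 = high, 3 = moderate, 2 = low, 1 = very low.
LABELS = {4: "high", 3: "moderate", 2: "low", 1: "very low"}

def grade_certainty(study_type: str, downgrades: dict, upgrades: dict) -> str:
    """study_type: 'RCT' starts at high; 'NRSI' starts at low.
    downgrades/upgrades map each criterion to 0, 1, or 2 levels."""
    level = 4 if study_type == "RCT" else 2
    # Rate down for: risk of bias, inconsistency, indirectness,
    # imprecision, and publication bias.
    level -= sum(downgrades.values())
    # Rate up (mainly relevant to NRSI) for: large effect, dose-response
    # gradient, and opposing plausible residual confounding.
    level += sum(upgrades.values())
    return LABELS[max(1, min(4, level))]

# A body of RCT evidence rated down one level each for risk of bias
# and imprecision ends up rated "low":
print(grade_certainty("RCT", {"risk_of_bias": 1, "imprecision": 1}, {}))
```

Because each outcome is rated separately, running such logic across all outcomes of a review naturally yields the mix of certainty levels described above.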

Advantages of GRADE

Outcomes important to the individuals who experience the problem of interest maintain a prominent role throughout the GRADE process [ 191 ]. These outcomes must inform the research questions (eg, PICO [population, intervention, comparator, outcome]) that are specified a priori in a systematic review protocol. Evidence for these outcomes is then investigated and each critical or important outcome is ultimately assigned a certainty of evidence as the end point of the review. Notably, limitations of the included studies have an impact at the outcome level. Ultimately, the certainty ratings for each outcome reported in a systematic review are considered by guideline panels. They use a different process to formulate recommendations that involves assessment of the evidence across outcomes [ 201 ]. It is beyond our scope to describe the GRADE process for formulating recommendations; however, it is critical to understand how these two outcome-centric concepts of certainty of evidence in the GRADE framework are related and distinguished. An in-depth illustration using examples from recently published evidence syntheses and CPGs is provided in Additional File 5 A (Table AF5A-1).

The GRADE approach is applicable irrespective of whether the certainty of the primary research evidence is high or very low; in some circumstances, indirect evidence of higher certainty may be considered if direct evidence is unavailable or of low certainty [ 27 ]. In fact, most interventions and outcomes in medicine have low or very low certainty of evidence based on GRADE and there seems to be no major improvement over time [ 202 , 203 ]. This is still a very important (even if sobering) realization for calibrating our understanding of medical evidence. A major appeal of the GRADE approach is that it offers a common framework that enables authors of evidence syntheses to make complex judgments about evidence certainty and to convey these with unambiguous terminology. This prevents some common mistakes made by review authors, including overstating results (or under-reporting harms) [ 187 ] and making recommendations for treatment. This is illustrated in Table AF5A-2 (Additional File 5 A), which compares the concluding statements made about overall certainty in a systematic review with and without application of the GRADE approach.

Theoretically, application of GRADE should improve consistency of judgments about certainty of evidence, both between authors and across systematic reviews. In one empirical evaluation conducted by the GRADE Working Group, interrater reliability of two individual raters assessing certainty of the evidence for a specific outcome increased from ~ 0.3 without using GRADE to ~ 0.7 by using GRADE [ 204 ]. However, others report variable agreement among those experienced in GRADE assessments of evidence certainty [ 190 ]. Like any other tool, GRADE requires training in order to be properly applied. The intricacies of the GRADE approach and the necessary subjectivity involved suggest that improving agreement may require strict rules for its application; alternatively, use of general guidance and consensus among review authors may result in less consistency but provide important information for the end user [ 190 ].
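The reliability figures reported in such evaluations are chance-corrected agreement coefficients; the cited study’s exact statistic may differ, but a minimal sketch of Cohen’s kappa for two raters (Python, hypothetical ratings) shows how such a coefficient is computed.

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Chance-corrected agreement between two raters over the same items."""
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    freq1, freq2 = Counter(rater1), Counter(rater2)
    expected = sum(freq1[c] * freq2[c] for c in freq1) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical certainty ratings assigned by two appraisers to ten outcomes.
r1 = ["high", "low", "low", "moderate", "low",
      "very low", "low", "moderate", "high", "low"]
r2 = ["high", "low", "moderate", "moderate", "low",
      "very low", "low", "low", "high", "low"]

print(f"kappa = {cohens_kappa(r1, r2):.2f}")  # ~0.70 for this example
```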

GRADE caveats

Simply invoking “the GRADE approach” does not automatically ensure GRADE methods were employed by authors of a systematic review (or developers of a CPG). Table 5.3 lists the criteria the GRADE Working Group has established for this purpose. These criteria highlight the specific terminology and methods that apply to rating the certainty of evidence for outcomes reported in a systematic review [ 191 ], which is different from rating overall certainty across outcomes considered in the formulation of recommendations [ 205 ]. Modifications of standard GRADE methods and terminology are discouraged as these may detract from GRADE’s objectives to minimize conceptual confusion and maximize clear communication [ 206 ].

Nevertheless, GRADE is prone to misapplications [ 207 , 208 ], which can distort a systematic review’s conclusions about the certainty of evidence. Systematic review authors without proper GRADE training are likely to misinterpret the terms “quality” and “grade” and to misunderstand the constructs assessed by GRADE versus other appraisal tools. For example, review authors may reference the standard GRADE certainty ratings (Table 5.2 ) to describe evidence for their outcome(s) of interest. However, these ratings are invalidated if authors omit or inadequately perform RoB evaluations of each included primary study. Such deficiencies in RoB assessments are unacceptable but not uncommon, as reported in methodological studies of systematic reviews and overviews [ 104 , 186 , 209 , 210 ]. GRADE ratings are also invalidated if review authors do not formally address and report on the other criteria (Table 5.1 ) necessary for a GRADE certainty rating.

Other caveats pertain to application of a GRADE certainty of evidence rating in various types of evidence syntheses. Current adaptations of GRADE are described in Additional File 5 B and included in Table 6.3 , which is introduced in the next section.

The expected culmination of a systematic review should be a rating of overall certainty of a body of evidence for each outcome reported. The GRADE approach is recommended for making these judgments for outcomes reported in systematic reviews of interventions and can be adapted for other types of reviews. This represents the initial step in the process of making recommendations based on evidence syntheses. Peer reviewers should ensure authors meet the minimal criteria for supporting the GRADE approach when reviewing any evidence synthesis that reports certainty ratings derived using GRADE. Authors and peer reviewers of evidence syntheses unfamiliar with GRADE are encouraged to seek formal training and take advantage of the resources available on the GRADE website [ 211 , 212 ].

Part 6. Concise Guide to best practices

Accumulating data in recent years suggest that many evidence syntheses (with or without meta-analysis) are not reliable. This relates in part to the fact that their authors, who are often clinicians, can be overwhelmed by the plethora of ways to evaluate evidence. They tend to resort to familiar but often inadequate, inappropriate, or obsolete methods and tools and, as a result, produce unreliable reviews. These manuscripts may not be recognized as such by peer reviewers and journal editors, who may disregard current standards. When such a systematic review is published or included in a CPG, clinicians and stakeholders tend to believe that it is trustworthy. This supports a vicious cycle in which inadequate methodology is rewarded and potentially misleading conclusions are accepted. There is no quick or easy way to break this cycle; however, increasing awareness of best practices among all these stakeholder groups, who often have minimal (if any) training in methodology, may begin to mitigate it. This is the rationale for inclusion of Parts 2 through 5 in this guidance document. These sections present core concepts and important methodological developments that inform current standards and recommendations. We conclude by taking a direct and practical approach.

Inconsistent and imprecise terminology used in the context of development and evaluation of evidence syntheses is problematic for authors, peer reviewers, and editors, and may lead to the application of inappropriate methods and tools. In response, we endorse use of the basic terms (Table 6.1 ) defined in the PRISMA 2020 statement [ 93 ]. In addition, we have identified several problematic expressions and nomenclature. In Table 6.2 , we compile suggestions for preferred terms less likely to be misinterpreted.

We also propose a Concise Guide (Table 6.3 ) that summarizes the methods and tools recommended for the development and evaluation of nine types of evidence syntheses. Suggestions for specific tools are based on the rigor of their development as well as the availability of detailed guidance from their developers to ensure their proper application. The formatting of the Concise Guide addresses a well-known source of confusion by clearly distinguishing the underlying methodological constructs that these tools were designed to assess. Important clarifications and explanations follow in the guide’s footnotes; associated websites, if available, are listed in Additional File 6 .

To encourage uptake of best practices, journal editors may consider adopting or adapting the Concise Guide in their instructions to authors and peer reviewers of evidence syntheses. Given the evolving nature of evidence synthesis methodology, the suggested methods and tools are likely to require regular updates. Authors of evidence syntheses should monitor the literature to ensure they are employing current methods and tools. Some types of evidence syntheses (eg, rapid, economic, methodological) are not included in the Concise Guide; for these, authors are advised to obtain recommendations for acceptable methods by consulting with their target journal.

We encourage the appropriate and informed use of the methods and tools discussed throughout this commentary and summarized in the Concise Guide (Table 6.3 ). However, we caution against their application in a perfunctory or superficial fashion. This is a common pitfall among authors of evidence syntheses, especially as the standards of such tools become associated with acceptance of a manuscript by a journal. Consequently, published evidence syntheses may show improved adherence to the requirements of these tools without necessarily making genuine improvements in their performance.

In line with our main objective, the suggested tools in the Concise Guide address the reliability of evidence syntheses; however, we recognize that the utility of systematic reviews is an equally important concern. An unbiased and thoroughly reported evidence synthesis may still not be highly informative if the evidence it summarizes is sparse, weak, and/or biased [ 24 ]. Many intervention systematic reviews, including those developed by Cochrane [ 203 ] and those applying GRADE [ 202 ], ultimately find no evidence, or find the evidence to be inconclusive (eg, “weak,” “mixed,” or of “low certainty”). This often reflects the primary research base; however, it is important to know what is known (or not known) about a topic when considering an intervention for patients and discussing treatment options with them.

Alternatively, the frequency of “empty” and inconclusive reviews published in the medical literature may relate to limitations of conventional methods that focus on hypothesis testing; these have emphasized the importance of statistical significance in primary research and effect sizes from aggregate meta-analyses [ 183 ]. It is becoming increasingly apparent that this approach may not be appropriate for all topics [ 130 ]. Development of the GRADE approach has facilitated a better understanding of significant factors (beyond effect size) that contribute to the overall certainty of evidence. Other notable responses include the development of integrative synthesis methods for the evaluation of complex interventions [ 230 , 231 ], the incorporation of crowdsourcing and machine learning into systematic review workflows (eg, the Cochrane Evidence Pipeline) [ 2 ], the paradigm shift to living systematic review and NMA platforms [ 232 , 233 ], and the proposal of a new evidence ecosystem that fosters bidirectional collaborations and interactions among a global network of evidence synthesis stakeholders [ 234 ]. These evolutions in data sources and methods may ultimately make evidence syntheses more streamlined and less duplicative and, more importantly, more useful for timely policy and clinical decision-making; however, that will only be the case if they are rigorously conducted and reported.

We look forward to others’ ideas and proposals for the advancement of methods for evidence syntheses. For now, we encourage dissemination and uptake of the currently accepted best tools and practices for their development and evaluation; at the same time, we stress that uptake of appraisal tools, checklists, and software programs cannot substitute for proper education in the methodology of evidence syntheses and meta-analysis. Authors, peer reviewers, and editors must strive to make accurate and reliable contributions to the present evidence knowledge base; online alerts, upcoming technology, and accessible education may make this more feasible than ever before. Our intention is to improve the trustworthiness of evidence syntheses across disciplines, topics, and types of evidence syntheses. All of us must continue to study, teach, and act cooperatively for that to happen.

Muka T, Glisic M, Milic J, Verhoog S, Bohlius J, Bramer W, et al. A 24-step guide on how to design, conduct, and successfully publish a systematic review and meta-analysis in medical research. Eur J Epidemiol. 2020;35(1):49–60.

Article   PubMed   Google Scholar  

Thomas J, McDonald S, Noel-Storr A, Shemilt I, Elliott J, Mavergames C, et al. Machine learning reduced workload with minimal risk of missing studies: development and evaluation of a randomized controlled trial classifier for cochrane reviews. J Clin Epidemiol. 2021;133:140–51.

Article   PubMed   PubMed Central   Google Scholar  

Fontelo P, Liu F. A review of recent publication trends from top publishing countries. Syst Rev. 2018;7(1):147.

Whiting P, Savović J, Higgins JPT, Caldwell DM, Reeves BC, Shea B, et al. ROBIS: a new tool to assess risk of bias in systematic reviews was developed. J Clin Epidemiol. 2016;69:225–34.

Shea BJ, Grimshaw JM, Wells GA, Boers M, Andersson N, Hamel C, et al. Development of AMSTAR: a measurement tool to assess the methodological quality of systematic reviews. BMC Med Res Methodol. 2007;7:1–7.

Article   Google Scholar  

Shea BJ, Reeves BC, Wells G, Thuku M, Hamel C, Moran J, et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ. 2017;358: j4008.

Goldkuhle M, Narayan VM, Weigl A, Dahm P, Skoetz N. A systematic assessment of Cochrane reviews and systematic reviews published in high-impact medical journals related to cancer. BMJ Open. 2018;8(3): e020869.

Ho RS, Wu X, Yuan J, Liu S, Lai X, Wong SY, et al. Methodological quality of meta-analyses on treatments for chronic obstructive pulmonary disease: a cross-sectional study using the AMSTAR (Assessing the Methodological Quality of Systematic Reviews) tool. NPJ Prim Care Respir Med. 2015;25:14102.

Tsoi AKN, Ho LTF, Wu IXY, Wong CHL, Ho RST, Lim JYY, et al. Methodological quality of systematic reviews on treatments for osteoporosis: a cross-sectional study. Bone. 2020;139(June): 115541.

Arienti C, Lazzarini SG, Pollock A, Negrini S. Rehabilitation interventions for improving balance following stroke: an overview of systematic reviews. PLoS ONE. 2019;14(7):1–23.

Kolaski K, Romeiser Logan L, Goss KD, Butler C. Quality appraisal of systematic reviews of interventions for children with cerebral palsy reveals critically low confidence. Dev Med Child Neurol. 2021;63(11):1316–26.

Almeida MO, Yamato TP, Parreira PCS, do Costa LOP, Kamper S, Saragiotto BT. Overall confidence in the results of systematic reviews on exercise therapy for chronic low back pain: a cross-sectional analysis using the Assessing the Methodological Quality of Systematic Reviews (AMSTAR) 2 tool. Braz J Phys Ther. 2020;24(2):103–17.

Mayo-Wilson E, Ng SM, Chuck RS, Li T. The quality of systematic reviews about interventions for refractive error can be improved: a review of systematic reviews. BMC Ophthalmol. 2017;17(1):1–10.

Matthias K, Rissling O, Pieper D, Morche J, Nocon M, Jacobs A, et al. The methodological quality of systematic reviews on the treatment of adult major depression needs improvement according to AMSTAR 2: a cross-sectional study. Heliyon. 2020;6(9): e04776.

Riado Minguez D, Kowalski M, Vallve Odena M, Longin Pontzen D, Jelicic Kadic A, Jeric M, et al. Methodological and reporting quality of systematic reviews published in the highest ranking journals in the field of pain. Anesth Analg. 2017;125(4):1348–54.

Churuangsuk C, Kherouf M, Combet E, Lean M. Low-carbohydrate diets for overweight and obesity: a systematic review of the systematic reviews. Obes Rev. 2018;19(12):1700–18.

Storman M, Storman D, Jasinska KW, Swierz MJ, Bala MM. The quality of systematic reviews/meta-analyses published in the field of bariatrics: a cross-sectional systematic survey using AMSTAR 2 and ROBIS. Obes Rev. 2020;21(5):1–11.

Franco JVA, Arancibia M, Meza N, Madrid E, Kopitowski K. [Clinical practice guidelines: concepts, limitations and challenges]. Medwave. 2020;20(3):e7887 ([Spanish]).

Brito JP, Tsapas A, Griebeler ML, Wang Z, Prutsky GJ, Domecq JP, et al. Systematic reviews supporting practice guideline recommendations lack protection against bias. J Clin Epidemiol. 2013;66(6):633–8.

Zhou Q, Wang Z, Shi Q, Zhao S, Xun Y, Liu H, et al. Clinical epidemiology in China series. Paper 4: the reporting and methodological quality of Chinese clinical practice guidelines published between 2014 and 2018: a systematic review. J Clin Epidemiol. 2021;140:189–99.

Lunny C, Ramasubbu C, Puil L, Liu T, Gerrish S, Salzwedel DM, et al. Over half of clinical practice guidelines use non-systematic methods to inform recommendations: a methods study. PLoS ONE. 2021;16(4):1–21.

Faber T, Ravaud P, Riveros C, Perrodeau E, Dechartres A. Meta-analyses including non-randomized studies of therapeutic interventions: a methodological review. BMC Med Res Methodol. 2016;16(1):1–26.

Ioannidis JPA. The mass production of redundant, misleading, and conflicted systematic reviews and meta-analyses. Milbank Q. 2016;94(3):485–514.

Møller MH, Ioannidis JPA, Darmon M. Are systematic reviews and meta-analyses still useful research? We are not sure. Intensive Care Med. 2018;44(4):518–20.

Moher D, Glasziou P, Chalmers I, Nasser M, Bossuyt PMM, Korevaar DA, et al. Increasing value and reducing waste in biomedical research: who’s listening? Lancet. 2016;387(10027):1573–86.

Barnard ND, Willet WC, Ding EL. The misuse of meta-analysis in nutrition research. JAMA. 2017;318(15):1435–6.

Guyatt G, Oxman AD, Akl EA, Kunz R, Vist G, Brozek J, et al. GRADE guidelines: 1. Introduction - GRADE evidence profiles and summary of findings tables. J Clin Epidemiol. 2011;64(4):383–94.

Page MJ, Shamseer L, Altman DG, Tetzlaff J, Sampson M, Tricco AC, et al. Epidemiology and reporting characteristics of systematic reviews of biomedical research: a cross-sectional study. PLoS Med. 2016;13(5):1–31.

World Health Organization. WHO handbook for guideline development, 2nd edn. WHO; 2014. Available from: https://www.who.int/publications/i/item/9789241548960 . Cited 2022 Jan 20

Higgins J, Lasserson T, Chandler J, Tovey D, Thomas J, Flemying E, et al. Methodological expectations of Cochrane intervention reviews. Cochrane; 2022. Available from: https://community.cochrane.org/mecir-manual/key-points-and-introduction . Cited 2022 Jul 19

Cumpston M, Chandler J. Chapter II: Planning a Cochrane review. In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook . Cited 2022 Jan 30

Henderson LK, Craig JC, Willis NS, Tovey D, Webster AC. How to write a Cochrane systematic review. Nephrology. 2010;15(6):617–24.

Page MJ, Altman DG, Shamseer L, McKenzie JE, Ahmadzai N, Wolfe D, et al. Reproducible research practices are underused in systematic reviews of biomedical interventions. J Clin Epidemiol. 2018;94:8–18.

Lorenz RC, Matthias K, Pieper D, Wegewitz U, Morche J, Nocon M, et al. AMSTAR 2 overall confidence rating: lacking discriminating capacity or requirement of high methodological quality? J Clin Epidemiol. 2020;119:142–4.

Posadzki P, Pieper D, Bajpai R, Makaruk H, Könsgen N, Neuhaus AL, et al. Exercise/physical activity and health outcomes: an overview of Cochrane systematic reviews. BMC Public Health. 2020;20(1):1–12.

Wells G, Shea B, O’Connell D, Peterson J, Welch V, Losos M. The Newcastle-Ottawa Scale (NOS) for assessing the quality of nonrandomized studies in meta-analyses. The Ottawa Hospital; 2009. Available from: https://www.ohri.ca/programs/clinical_epidemiology/oxford.asp . Cited 2022 Jul 19

Stang A. Critical evaluation of the Newcastle-Ottawa scale for the assessment of the quality of nonrandomized studies in meta-analyses. Eur J Epidemiol. 2010;25(9):603–5.

Stang A, Jonas S, Poole C. Case study in major quotation errors: a critical commentary on the Newcastle-Ottawa scale. Eur J Epidemiol. 2018;33(11):1025–31.

Ioannidis JPA. Massive citations to misleading methods and research tools: Matthew effect, quotation error and citation copying. Eur J Epidemiol. 2018;33(11):1021–3.

Khalil H, Ameen D, Zarnegar A. Tools to support the automation of systematic reviews: a scoping review. J Clin Epidemiol. 2022;144:22–42.

Crequit P, Boutron I, Meerpohl J, Williams H, Craig J, Ravaud P. Future of evidence ecosystem series: 2. Current opportunities and need for better tools and methods. J Clin Epidemiol. 2020;123:143–52.

Shemilt I, Noel-Storr A, Thomas J, Featherstone R, Mavergames C. Machine learning reduced workload for the Cochrane COVID-19 study register: development and evaluation of the Cochrane COVID-19 study classifier. Syst Rev. 2022;11(1):15.

Nguyen P-Y, Kanukula R, McKenzie J, Alqaidoom Z, Brennan SE, Haddaway N, et al. Changing patterns in reporting and sharing of review data in systematic reviews with meta-analysis of the effects of interventions: a meta-research study. medRxiv; 2022. Available from: https://doi.org/10.1101/2022.04.11.22273688v3 . Cited 2022 Nov 18

Afshari A, Møller MH. Broken science and the failure of academics—resignation or reaction? Acta Anaesthesiol Scand. 2018;62(8):1038–40.

Butler E, Granholm A, Aneman A. Trustworthy systematic reviews–can journals do more? Acta Anaesthesiol Scand. 2019;63(4):558–9.

Negrini S, Côté P, Kiekens C. Methodological quality of systematic reviews on interventions for children with cerebral palsy: the evidence pyramid paradox. Dev Med Child Neurol. 2021;63(11):1244–5.

Page MJ, Moher D. Mass production of systematic reviews and meta-analyses: an exercise in mega-silliness? Milbank Q. 2016;94(3):515–9.

Clarke M, Chalmers I. Reflections on the history of systematic reviews. BMJ Evid Based Med. 2018;23(4):121–2.

Alnemer A, Khalid M, Alhuzaim W, Alnemer A, Ahmed B, Alharbi B, et al. Are health-related tweets evidence based? Review and analysis of health-related tweets on Twitter. J Med Internet Res. 2015;17(10): e246.

Haber N, Smith ER, Moscoe E, Andrews K, Audy R, Bell W, et al. Causal language and strength of inference in academic and media articles shared in social media (CLAIMS): a systematic review. PLoS ONE. 2018;13(5): e196346.

Swetland SB, Rothrock AN, Andris H, Davis B, Nguyen L, Davis P, et al. Accuracy of health-related information regarding COVID-19 on Twitter during a global pandemic. World Med Health Policy. 2021;13(3):503–17.

Nascimento DP, Almeida MO, Scola LFC, Vanin AA, Oliveira LA, Costa LCM, et al. Letter to the editor – not even the top general medical journals are free of spin: a wake-up call based on an overview of reviews. J Clin Epidemiol. 2021;139:232–4.

Ioannidis JPA, Fanelli D, Dunne DD, Goodman SN. Meta-research: evaluation and improvement of research methods and practices. PLoS Biol. 2015;13(10):1–7.

Munn Z, Stern C, Aromataris E, Lockwood C, Jordan Z. What kind of systematic review should I conduct? A proposed typology and guidance for systematic reviewers in the medical and health sciences. BMC Med Res Methodol. 2018;18(1):1–9.

Pollock M, Fernandez R, Becker LA, Pieper D, Hartling L. Chapter V: Overviews of reviews. In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook/current/chapter-v . Cited 2022 Mar 7

Tricco AC, Lillie E, Zarin W, O’Brien K, Colquhoun H, Kastner M, et al. A scoping review on the conduct and reporting of scoping reviews. BMC Med Res Methodol. 2016;16(1):1–10.

Garritty C, Gartlehner G, Nussbaumer-Streit B, King VJ, Hamel C, Kamel C, et al. Cochrane rapid reviews methods group offers evidence-informed guidance to conduct rapid reviews. J Clin Epidemiol. 2021;130:13–22.

Elliott JH, Synnot A, Turner T, Simmonds M, Akl EA, McDonald S, et al. Living systematic review: 1. Introduction—the why, what, when, and how. J Clin Epidemiol. 2017;91:23–30.

Higgins JPT, Thomas J, Chandler J. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook . Cited 2022 Jan 25

Aromataris E, Munn Z. JBI Manual for Evidence Synthesis [internet]. JBI; 2020 [cited 2022 Jan 15]. Available from: https://synthesismanual.jbi.global .

Tufanaru C, Munn Z, Aromataris E, Campbell J, Hopp L. Chapter 3: Systematic reviews of effectiveness. In: Aromataris E, Munn Z, editors. JBI Manual for Evidence Synthesis [internet]. JBI; 2020 [cited 2022 Jan 25]. Available from: https://synthesismanual.jbi.global .

Leeflang MMG, Davenport C, Bossuyt PM. Defining the review question. In: Deeks JJ, Bossuyt PM, Leeflang MMG, Takwoingi Y, editors. Cochrane handbook for systematic reviews of diagnostic test accuracy [internet]. Cochrane; 2022 [cited 2022 Mar 30]. Available from: https://training.cochrane.org/6-defining-review-question .

Noyes J, Booth A, Cargo M, Flemming K, Harden A, Harris J, et al. Qualitative evidence. In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions [internet]. Cochrane; 2022 [cited 2022 Mar 30]. Available from: https://training.cochrane.org/handbook/current/chapter-21#section-21-5 .

Lockwood C, Porritt K, Munn Z, Rittenmeyer L, Salmond S, Bjerrum M, et al. Chapter 2: Systematic reviews of qualitative evidence. In: Aromataris E, Munn Z, editors. JBI Manual for Evidence Synthesis [internet]. JBI; 2020 [cited 2022 Jul 11]. Available from: https://synthesismanual.jbi.global .

Debray TPA, Damen JAAG, Snell KIE, Ensor J, Hooft L, Reitsma JB, et al. A guide to systematic review and meta-analysis of prediction model performance. BMJ. 2017;356:i6460.

Moola S, Munn Z, Tufanaru C, Aromataris E, Sears K, Sfetcu R, et al. Systematic reviews of etiology and risk. In: Aromataris E, Munn Z, editors. JBI Manual for Evidence Synthesis [internet]. JBI; 2020 [cited 2022 Mar 30]. Available from: https://synthesismanual.jbi.global/ .

Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res. 2010;19(4):539–49.

Prinsen CAC, Mokkink LB, Bouter LM, Alonso J, Patrick DL, de Vet HCW, et al. COSMIN guideline for systematic reviews of patient-reported outcome measures. Qual Life Res. 2018;27(5):1147–57.

Munn Z, Moola S, Lisy K, Riitano D, Tufanaru C. Chapter 5: Systematic reviews of prevalence and incidence. In: Aromataris E, Munn Z, editors. JBI Manual for Evidence Synthesis [internet]. JBI; 2020 [cited 2022 Mar 30]. Available from: https://synthesismanual.jbi.global/ .

Centre for Evidence-Based Medicine. Study designs. CEBM; 2016. Available from: https://www.cebm.ox.ac.uk/resources/ebm-tools/study-designs . Cited 2022 Aug 30

Hartling L, Bond K, Santaguida PL, Viswanathan M, Dryden DM. Testing a tool for the classification of study designs in systematic reviews of interventions and exposures showed moderate reliability and low accuracy. J Clin Epidemiol. 2011;64(8):861–71.

Crowe M, Sheppard L, Campbell A. Reliability analysis for a proposed critical appraisal tool demonstrated value for diverse research designs. J Clin Epidemiol. 2012;65(4):375–83.

Reeves BC, Wells GA, Waddington H. Quasi-experimental study designs series—paper 5: a checklist for classifying studies evaluating the effects on health interventions—a taxonomy without labels. J Clin Epidemiol. 2017;89:30–42.

Reeves BC, Deeks JJ, Higgins JPT, Shea B, Tugwell P, Wells GA. Chapter 24: including non-randomized studies on intervention effects.  In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook/current/chapter-24 . Cited 2022 Mar 1

Reeves B. A framework for classifying study designs to evaluate health care interventions. Forsch Komplementarmed Kl Naturheilkd. 2004;11(Suppl 1):13–7.

Rockers PC, Røttingen J, Shemilt I. Inclusion of quasi-experimental studies in systematic reviews of health systems research. Health Policy. 2015;119(4):511–21.

Mathes T, Pieper D. Clarifying the distinction between case series and cohort studies in systematic reviews of comparative studies: potential impact on body of evidence and workload. BMC Med Res Methodol. 2017;17(1):8–13.

Jhangiani R, Cuttler C, Leighton D. Single subject research. In: Jhangiani R, Cuttler C, Leighton D, editors. Research methods in psychology, 4th edn. Pressbooks KPU; 2019. Available from: https://kpu.pressbooks.pub/psychmethods4e/part/single-subject-research/ . Cited 2022 Aug 15

Higgins JP, Ramsay C, Reeves BC, Deeks JJ, Shea B, Valentine JC, et al. Issues relating to study design and risk of bias when including non-randomized studies in systematic reviews on the effects of interventions. Res Synth Methods. 2013;4(1):12–25.

Cumpston M, Lasserson T, Chandler J, Page M. 3.4.1 Criteria for considering studies for this review, Chapter III: Reporting the review. In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook/current/chapter-iii#section-iii-3-4-1 . Cited 2022 Oct 12

Kooistra B, Dijkman B, Einhorn TA, Bhandari M. How to design a good case series. J Bone Jt Surg. 2009;91(Suppl 3):21–6.

Murad MH, Sultan S, Haffar S, Bazerbachi F. Methodological quality and synthesis of case series and case reports. Evid Based Med. 2018;23(2):60–3.

Robinson K, Chou R, Berkman N, Newberry S, Fu R, Hartling L, et al. Methods guide for comparative effectiveness reviews integrating bodies of evidence: existing systematic reviews and primary studies. AHRQ; 2015. Available from: https://archive.org/details/integrating-evidence-report-150226 . Cited 2022 Aug 7

Tugwell P, Welch VA, Karunananthan S, Maxwell LJ, Akl EA, Avey MT, et al. When to replicate systematic reviews of interventions: consensus checklist. BMJ. 2020;370: m2864.

Tsertsvadze A, Maglione M, Chou R, Garritty C, Coleman C, Lux L, et al. Updating comparative effectiveness reviews: current efforts in AHRQ’s effective health care program. J Clin Epidemiol. 2011;64(11):1208–15.

Cumpston M, Chandler J. Chapter IV: Updating a review. In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook . Cited 2022 Aug 2

Pollock M, Fernandes RM, Newton AS, Scott SD, Hartling L. A decision tool to help researchers make decisions about including systematic reviews in overviews of reviews of healthcare interventions. Syst Rev. 2019;8(1):1–8.

Pussegoda K, Turner L, Garritty C, Mayhew A, Skidmore B, Stevens A, et al. Identifying approaches for assessing methodological and reporting quality of systematic reviews: a descriptive study. Syst Rev. 2017;6(1):1–12.

Bhaumik S. Use of evidence for clinical practice guideline development. Trop Parasitol. 2017;7(2):65–71.

Moher D, Eastwood S, Olkin I, Rennie D, Stroup D. Improving the quality of reports of meta-analyses of randomised controlled trials: the QUOROM statement. Lancet. 1999;354:1896–900.

Stroup D, Berlin J, Morton S, Olkin I, Williamson G, Rennie D, et al. Meta-analysis of observational studies in epidemiology: a proposal for reporting. JAMA. 2000;283(15):2008–12.

Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. J Clin Epidemiol. 2009;62(10):1006–12.

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372: n71.

Oxman AD, Guyatt GH. Validation of an index of the quality of review articles. J Clin Epidemiol. 1991;44(11):1271–8.

Centre for Evidence-Based Medicine. Critical appraisal tools. CEBM; 2015. Available from: https://www.cebm.ox.ac.uk/resources/ebm-tools/critical-appraisal-tools . Cited 2022 Apr 10

Page MJ, McKenzie JE, Higgins JPT. Tools for assessing risk of reporting biases in studies and syntheses of studies: a systematic review. BMJ Open. 2018;8(3):1–16.

Ma LL, Wang YY, Yang ZH, Huang D, Weng H, Zeng XT. Methodological quality (risk of bias) assessment tools for primary and secondary medical studies: what are they and which is better? Mil Med Res. 2020;7(1):1–11.

Banzi R, Cinquini M, Gonzalez-Lorenzo M, Pecoraro V, Capobussi M, Minozzi S. Quality assessment versus risk of bias in systematic reviews: AMSTAR and ROBIS had similar reliability but differed in their construct and applicability. J Clin Epidemiol. 2018;99:24–32.

Swierz MJ, Storman D, Zajac J, Koperny M, Weglarz P, Staskiewicz W, et al. Similarities, reliability and gaps in assessing the quality of conduct of systematic reviews using AMSTAR-2 and ROBIS: systematic survey of nutrition reviews. BMC Med Res Methodol. 2021;21(1):1–10.

Pieper D, Puljak L, González-Lorenzo M, Minozzi S. Minor differences were found between AMSTAR 2 and ROBIS in the assessment of systematic reviews including both randomized and nonrandomized studies. J Clin Epidemiol. 2019;108:26–33.

Lorenz RC, Matthias K, Pieper D, Wegewitz U, Morche J, Nocon M, et al. A psychometric study found AMSTAR 2 to be a valid and moderately reliable appraisal tool. J Clin Epidemiol. 2019;114:133–40.

Leclercq V, Hiligsmann M, Parisi G, Beaudart C, Tirelli E, Bruyère O. Best-worst scaling identified adequate statistical methods and literature search as the most important items of AMSTAR2 (A measurement tool to assess systematic reviews). J Clin Epidemiol. 2020;128:74–82.

Bühn S, Mathes T, Prengel P, Wegewitz U, Ostermann T, Robens S, et al. The risk of bias in systematic reviews tool showed fair reliability and good construct validity. J Clin Epidemiol. 2017;91:121–8.

Gates M, Gates A, Duarte G, Cary M, Becker M, Prediger B, et al. Quality and risk of bias appraisals of systematic reviews are inconsistent across reviewers and centers. J Clin Epidemiol. 2020;125:9–15.

Perry R, Whitmarsh A, Leach V, Davies P. A comparison of two assessment tools used in overviews of systematic reviews: ROBIS versus AMSTAR-2. Syst Rev. 2021;10(1):273.

Gates M, Gates A, Guitard S, Pollock M, Hartling L. Guidance for overviews of reviews continues to accumulate, but important challenges remain: a scoping review. Syst Rev. 2020;9(1):1–19.

Aromataris E, Fernandez R, Godfrey C, Holly C, Khalil H, Tungpunkom P. Chapter 10: umbrella reviews. In: Aromataris E, Munn Z, editors. JBI Manual for Evidence Synthesis. JBI; 2020. Available from: https://synthesismanual.jbi.global . Cited 2022 Jul 11

Pieper D, Lorenz RC, Rombey T, Jacobs A, Rissling O, Freitag S, et al. Authors should clearly report how they derived the overall rating when applying AMSTAR 2—a cross-sectional study. J Clin Epidemiol. 2021;129:97–103.

Franco JVA, Meza N. Authors should also report the support for judgment when applying AMSTAR 2. J Clin Epidemiol. 2021;138:240.

Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JPA, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. PLoS Med. 2009;6(7): e1000100.

Page MJ, Moher D. Evaluations of the uptake and impact of the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) statement and extensions: a scoping review. Syst Rev. 2017;6(1):263.

Page MJ, Moher D, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews. BMJ. 2021;372: n160.

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. Updating guidance for reporting systematic reviews: development of the PRISMA 2020 statement. J Clin Epidemiol. 2021;134:103–12.

Welch V, Petticrew M, Petkovic J, Moher D, Waters E, White H, et al. Extending the PRISMA statement to equity-focused systematic reviews (PRISMA-E 2012): explanation and elaboration. J Clin Epidemiol. 2016;70:68–89.

Beller EM, Glasziou PP, Altman DG, Hopewell S, Bastian H, Chalmers I, et al. PRISMA for abstracts: reporting systematic reviews in journal and conference abstracts. PLoS Med. 2013;10(4): e1001419.

Moher D, Shamseer L, Clarke M. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement. Syst Rev. 2015;4(1):1.

Hutton B, Salanti G, Caldwell DM, Chaimani A, Schmid CH, Cameron C, et al. The PRISMA extension statement for reporting of systematic reviews incorporating network meta-analyses of health care interventions: checklist and explanations. Ann Intern Med. 2015;162(11):777–84.

Stewart LA, Clarke M, Rovers M, Riley RD, Simmonds M, Stewart G, et al. Preferred reporting items for a systematic review and meta-analysis of individual participant data: The PRISMA-IPD statement. JAMA. 2015;313(16):1657–65.

Zorzela L, Loke YK, Ioannidis JP, Golder S, Santaguida P, Altman DG, et al. PRISMA harms checklist: Improving harms reporting in systematic reviews. BMJ. 2016;352: i157.

McInnes MDF, Moher D, Thombs BD, McGrath TA, Bossuyt PM, Clifford T, et al. Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy studies The PRISMA-DTA statement. JAMA. 2018;319(4):388–96.

Tricco AC, Lillie E, Zarin W, O’Brien KK, Colquhoun H, Levac D, et al. PRISMA extension for scoping reviews (PRISMA-ScR): checklist and explanation. Ann Intern Med. 2018;169(7):467–73.

Wang X, Chen Y, Liu Y, Yao L, Estill J, Bian Z, et al. Reporting items for systematic reviews and meta-analyses of acupuncture: the PRISMA for acupuncture checklist. BMC Complement Altern Med. 2019;19(1):1–10.

Rethlefsen ML, Kirtley S, Waffenschmidt S, Ayala AP, Moher D, Page MJ, et al. PRISMA-S: An extension to the PRISMA statement for reporting literature searches in systematic reviews. J Med Libr Assoc. 2021;109(2):174–200.

Blanco D, Altman D, Moher D, Boutron I, Kirkham JJ, Cobo E. Scoping review on interventions to improve adherence to reporting guidelines in health research. BMJ Open. 2019;9(5): e26589.

Koster TM, Wetterslev J, Gluud C, Keus F, van der Horst ICC. Systematic overview and critical appraisal of meta-analyses of interventions in intensive care medicine. Acta Anaesthesiol Scand. 2018;62(8):1041–9.

Johnson BT, Hennessy EA. Systematic reviews and meta-analyses in the health sciences: best practice methods for research syntheses. Soc Sci Med. 2019;233:237–51.

Pollock A, Berge E. How to do a systematic review. Int J Stroke. 2018;13(2):138–56.

Gagnier JJ, Kellam PJ. Reporting and methodological quality of systematic reviews in the orthopaedic literature. J Bone Jt Surg. 2013;95(11):1–7.

Martinez-Monedero R, Danielian A, Angajala V, Dinalo JE, Kezirian EJ. Methodological quality of systematic reviews and meta-analyses published in high-impact otolaryngology journals. Otolaryngol Head Neck Surg. 2020;163(5):892–905.

Boutron I, Crequit P, Williams H, Meerpohl J, Craig J, Ravaud P. Future of evidence ecosystem series 1. Introduction-evidence synthesis ecosystem needs dramatic change. J Clin Epidemiol. 2020;123:135–42.

Ioannidis JPA, Bhattacharya S, Evers JLH, Der Veen F, Van SE, Barratt CLR, et al. Protect us from poor-quality medical research. Hum Reprod. 2018;33(5):770–6.

Lasserson T, Thomas J, Higgins J. Section 1.5 Protocol development, Chapter 1: Starting a review. In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook/archive/v6/chapter-01#section-1-5 . Cited 2022 Mar 20

Stewart L, Moher D, Shekelle P. Why prospective registration of systematic reviews makes sense. Syst Rev. 2012;1(1):7–10.

Allers K, Hoffmann F, Mathes T, Pieper D. Systematic reviews with published protocols compared to those without: more effort, older search. J Clin Epidemiol. 2018;95:102–10.

Ge L, Tian J, Li Y, Pan J, Li G, Wei D, et al. Association between prospective registration and overall reporting and methodological quality of systematic reviews: a meta-epidemiological study. J Clin Epidemiol. 2018;93:45–55.

Shamseer L, Moher D, Clarke M, Ghersi D, Liberati A, Petticrew M, et al. Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) 2015: elaboration and explanation. BMJ. 2015;350: g7647.

Pieper D, Rombey T. Where to prospectively register a systematic review. Syst Rev. 2022;11(1):8.

PROSPERO. PROSPERO will require earlier registration. NIHR; 2022. Available from: https://www.crd.york.ac.uk/prospero/ . Cited 2022 Mar 20

Kirkham JJ, Altman DG, Williamson PR. Bias due to changes in specified outcomes during the systematic review process. PLoS ONE. 2010;5(3):3–7.

Victora CG, Habicht JP, Bryce J. Evidence-based public health: moving beyond randomized trials. Am J Public Health. 2004;94(3):400–5.

Peinemann F, Kleijnen J. Development of an algorithm to provide awareness in choosing study designs for inclusion in systematic reviews of healthcare interventions: a method study. BMJ Open. 2015;5(8): e007540.

Loudon K, Treweek S, Sullivan F, Donnan P, Thorpe KE, Zwarenstein M. The PRECIS-2 tool: designing trials that are fit for purpose. BMJ. 2015;350: h2147.

Junqueira DR, Phillips R, Zorzela L, Golder S, Loke Y, Moher D, et al. Time to improve the reporting of harms in randomized controlled trials. J Clin Epidemiol. 2021;136:216–20.

Hemkens LG, Contopoulos-Ioannidis DG, Ioannidis JPA. Routinely collected data and comparative effectiveness evidence: promises and limitations. CMAJ. 2016;188(8):E158–64.

Murad MH. Clinical practice guidelines: a primer on development and dissemination. Mayo Clin Proc. 2017;92(3):423–33.

Abdelhamid AS, Loke YK, Parekh-Bhurke S, Chen Y-F, Sutton A, Eastwood A, et al. Use of indirect comparison methods in systematic reviews: a survey of Cochrane review authors. Res Synth Methods. 2012;3(2):71–9.

Jüni P, Holenstein F, Sterne J, Bartlett C, Egger M. Direction and impact of language bias in meta-analyses of controlled trials: empirical study. Int J Epidemiol. 2002;31(1):115–23.

Vickers A, Goyal N, Harland R, Rees R. Do certain countries produce only positive results? A systematic review of controlled trials. Control Clin Trials. 1998;19(2):159–66.

Jones CW, Keil LG, Weaver MA, Platts-Mills TF. Clinical trials registries are under-utilized in the conduct of systematic reviews: a cross-sectional analysis. Syst Rev. 2014;3(1):1–7.

Baudard M, Yavchitz A, Ravaud P, Perrodeau E, Boutron I. Impact of searching clinical trial registries in systematic reviews of pharmaceutical treatments: methodological systematic review and reanalysis of meta-analyses. BMJ. 2017;356: j448.

Fanelli D, Costas R, Ioannidis JPA. Meta-assessment of bias in science. Proc Natl Acad Sci USA. 2017;114(14):3714–9.

Hartling L, Featherstone R, Nuspl M, Shave K, Dryden DM, Vandermeer B. Grey literature in systematic reviews: a cross-sectional study of the contribution of non-English reports, unpublished studies and dissertations to the results of meta-analyses in child-relevant reviews. BMC Med Res Methodol. 2017;17(1):64.

Hopewell S, McDonald S, Clarke M, Egger M. Grey literature in meta-analyses of randomized trials of health care interventions. Cochrane Database Syst Rev. 2007;2:MR000010.

Shojania K, Sampson M, Ansari MT, Ji J, Garritty C, Rader T, et al. Updating systematic reviews. AHRQ Technical Reviews. 2007: Report 07–0087.

Tate RL, Perdices M, Rosenkoetter U, Wakim D, Godbee K, Togher L, et al. Revision of a method quality rating scale for single-case experimental designs and n-of-1 trials: The 15-item Risk of Bias in N-of-1 Trials (RoBiNT) Scale. Neuropsychol Rehabil. 2013;23(5):619–38.

Tate RL, Perdices M, McDonald S, Togher L, Rosenkoetter U. The design, conduct and report of single-case research: Resources to improve the quality of the neurorehabilitation literature. Neuropsychol Rehabil. 2014;24(3–4):315–31.

Sterne JAC, Savović J, Page MJ, Elbers RG, Blencowe NS, Boutron I, et al. RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ. 2019;366: l4894.

Sterne JA, Hernán MA, Reeves BC, Savović J, Berkman ND, Viswanathan M, et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ. 2016;355: i4919.

Igelström E, Campbell M, Craig P, Katikireddi SV. Cochrane’s risk of bias tool for non-randomized studies (ROBINS-I) is frequently misapplied: a methodological systematic review. J Clin Epidemiol. 2021;140:22–32.

McKenzie JE, Brennan SE. Chapter 12: Synthesizing and presenting findings using other methods. In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook/current/chapter-12 . Cited 2022 Apr 10

Ioannidis J, Patsopoulos N, Rothstein H. Reasons or excuses for avoiding meta-analysis in forest plots. BMJ. 2008;336(7658):1413–5.

Stewart LA, Tierney JF. To IPD or not to IPD? Eval Health Prof. 2002;25(1):76–97.

Tierney JF, Stewart LA, Clarke M. Chapter 26: Individual participant data. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook/current/chapter-26 . Cited 2022 Oct 12

Chaimani A, Caldwell D, Li T, Higgins J, Salanti G. Chapter 11: Undertaking network meta-analyses. In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook . Cited 2022 Oct 12.

Cooper H, Hedges L, Valentine J. The handbook of research synthesis and meta-analysis. 3rd ed. Russell Sage Foundation; 2019.

Sutton AJ, Abrams KR, Jones DR, Sheldon T, Song F. Methods for meta-analysis in medical research. John Wiley & Sons; 2000.

Deeks J, Higgins JPT, Altman DG. Chapter 10: Analysing data and undertaking meta-analyses. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: http://www.training.cochrane.org/handbook . Cited 2022 Mar 20.

Clarke MJ. Individual patient data meta-analyses. Best Pract Res Clin Obstet Gynaecol. 2005;19(1):47–55.

Catalá-López F, Tobías A, Cameron C, Moher D, Hutton B. Network meta-analysis for comparing treatment effects of multiple interventions: an introduction. Rheumatol Int. 2014;34(11):1489–96.

Debray T, Schuit E, Efthimiou O, Reitsma J, Ioannidis J, Salanti G, et al. An overview of methods for network meta-analysis using individual participant data: when do benefits arise? Stat Methods Med Res. 2016;27(5):1351–64.

Tonin FS, Rotta I, Mendes AM, Pontarolo R. Network meta-analysis: a technique to gather evidence from direct and indirect comparisons. Pharm Pract (Granada). 2017;15(1):943.

Tierney JF, Vale C, Riley R, Smith CT, Stewart L, Clarke M, et al. Individual participant data (IPD) meta-analyses of randomised controlled trials: guidance on their use. PLoS Med. 2015;12(7): e1001855.

Rouse B, Chaimani A, Li T. Network meta-analysis: an introduction for clinicians. Intern Emerg Med. 2017;12(1):103–11.

Cochrane Training. Review Manager RevMan Web. Cochrane; 2022. Available from: https://training.cochrane.org/online-learning/core-software/revman . Cited 2022 Jun 24

MetaXL. EpiGear; 2016. Available from: http://epigear.com/index_files/metaxl.html . Cited 2022 Jun 24.

JBI. JBI SUMARI. JBI; 2019. Available from: https://sumari.jbi.global/ . Cited 2022 Jun 24.

Ryan R. Cochrane Consumers and Communication Review Group: data synthesis and analysis. Cochrane Consumers and Communication Review Group; 2013. Available from: http://cccrg.cochrane.org . Cited 2022 Jun 24

McKenzie JE, Beller EM, Forbes AB. Introduction to systematic reviews and meta-analysis. Respirology. 2016;21(4):626–37.

Campbell M, Katikireddi SV, Sowden A, Thomson H. Lack of transparency in reporting narrative synthesis of quantitative data: a methodological assessment of systematic reviews. J Clin Epidemiol. 2019;105:1–9.

Campbell M, McKenzie JE, Sowden A, Katikireddi SV, Brennan SE, Ellis S, et al. Synthesis without meta-analysis (SWiM) in systematic reviews: reporting guideline. BMJ. 2020;368: l6890.

McKenzie JE, Brennan S, Ryan R. Summarizing study characteristics and preparing for synthesis. In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook . Cited 2022 Oct 12

AHRQ. Systems to rate the strength of scientific evidence. Evidence report/technology assessment no. 47. AHRQ; 2002. Available from: https://archive.ahrq.gov/clinic/epcsums/strengthsum.htm . Cited 2022 Apr 10.

Atkins D, Eccles M, Flottorp S, Guyatt GH, Henry D, Hill S, et al. Systems for grading the quality of evidence and the strength of recommendations I: critical appraisal of existing approaches. BMC Health Serv Res. 2004;4(1):38.

Ioannidis JPA. Meta-research: the art of getting it wrong. Res Synth Methods. 2010;1(3–4):169–84.

Lai NM, Teng CL, Lee ML. Interpreting systematic reviews: are we ready to make our own conclusions? A cross sectional study. BMC Med. 2011;9(1):30.

Glenton C, Santesso N, Rosenbaum S, Nilsen ES, Rader T, Ciapponi A, et al. Presenting the results of Cochrane systematic reviews to a consumer audience: a qualitative study. Med Decis Making. 2010;30(5):566–77.

Yavchitz A, Ravaud P, Altman DG, Moher D, Hrobjartsson A, Lasserson T, et al. A new classification of spin in systematic reviews and meta-analyses was developed and ranked according to the severity. J Clin Epidemiol. 2016;75:56–65.

Atkins D, Best D, Briss PA, Eccles M, Falck-Ytter Y, Flottorp S, et al.; GRADE Working Group. Grading quality of evidence and strength of recommendations. BMJ. 2004;328(7454):1490.

GRADE Working Group. Organizations. GRADE; 2022 [cited 2023 May 2]. Available from: www.gradeworkinggroup.org .

Hartling L, Fernandes RM, Seida J, Vandermeer B, Dryden DM. From the trenches: a cross-sectional study applying the GRADE tool in systematic reviews of healthcare interventions. PLoS ONE. 2012;7(4):e34697.

Hultcrantz M, Rind D, Akl EA, Treweek S, Mustafa RA, Iorio A, et al. The GRADE working group clarifies the construct of certainty of evidence. J Clin Epidemiol. 2017;87:4–13.

Schünemann H, Brozek J, Guyatt G, Oxman AD, editors. Section 6.3.2: Symbolic representation. GRADE Handbook [internet]. GRADE; 2013 [cited 2022 Jan 27]. Available from: https://gdt.gradepro.org/app/handbook/handbook.html#h.lr8e9vq954 .

Siemieniuk R, Guyatt G. What is GRADE? [internet] BMJ Best Practice; 2017 [cited 2022 Jul 20]. Available from: https://bestpractice.bmj.com/info/toolkit/learn-ebm/what-is-grade/ .

Guyatt G, Oxman AD, Sultan S, Brozek J, Glasziou P, Alonso-Coello P, et al. GRADE guidelines: 11. Making an overall rating of confidence in effect estimates for a single outcome and for all outcomes. J Clin Epidemiol. 2013;66(2):151–7.

Guyatt GH, Oxman AD, Sultan S, Glasziou P, Akl EA, Alonso-Coello P, et al. GRADE guidelines: 9. Rating up the quality of evidence. J Clin Epidemiol. 2011;64(12):1311–6.

Guyatt GH, Oxman AD, Vist G, Kunz R, Brozek J, Alonso-Coello P, et al. GRADE guidelines: 4. Rating the quality of evidence - Study limitations (risk of bias). J Clin Epidemiol. 2011;64(4):407–15.

Guyatt GH, Oxman AD, Kunz R, Brozek J, Alonso-Coello P, Rind D, et al. GRADE guidelines 6. Rating the quality of evidence - Imprecision. J Clin Epidemiol. 2011;64(12):1283–93.

Guyatt GH, Oxman AD, Kunz R, Woodcock J, Brozek J, Helfand M, et al. GRADE guidelines: 7. Rating the quality of evidence - Inconsistency. J Clin Epidemiol. 2011;64(12):1294–302.

Guyatt GH, Oxman AD, Kunz R, Woodcock J, Brozek J, Helfand M, et al. GRADE guidelines: 8. Rating the quality of evidence - Indirectness. J Clin Epidemiol. 2011;64(12):1303–10.

Guyatt GH, Oxman AD, Montori V, Vist G, Kunz R, Brozek J, et al. GRADE guidelines: 5. Rating the quality of evidence - Publication bias. J Clin Epidemiol. 2011;64(12):1277–82.

Andrews JC, Schünemann HJ, Oxman AD, Pottie K, Meerpohl JJ, Coello PA, et al. GRADE guidelines: 15. Going from evidence to recommendation - Determinants of a recommendation’s direction and strength. J Clin Epidemiol. 2013;66(7):726–35.

Fleming PS, Koletsi D, Ioannidis JPA, Pandis N. High quality of the evidence for medical and other health-related interventions was uncommon in Cochrane systematic reviews. J Clin Epidemiol. 2016;78:34–42.

Howick J, Koletsi D, Pandis N, Fleming PS, Loef M, Walach H, et al. The quality of evidence for medical interventions does not improve or worsen: a metaepidemiological study of Cochrane reviews. J Clin Epidemiol. 2020;126:154–9.

Mustafa RA, Santesso N, Brozek J, Akl EA, Walter SD, Norman G, et al. The GRADE approach is reproducible in assessing the quality of evidence of quantitative evidence syntheses. J Clin Epidemiol. 2013;66(7):736-742.e5.

Schünemann H, Brozek J, Guyatt G, Oxman A, editors. Section 5.4: Overall quality of evidence. GRADE Handbook. GRADE; 2013. Available from: https://gdt.gradepro.org/app/handbook/handbook.html#h.lr8e9vq954a . Cited 2022 Mar 25.

GRADE Working Group. Criteria for using GRADE. GRADE; 2016. Available from: https://www.gradeworkinggroup.org/docs/Criteria_for_using_GRADE_2016-04-05.pdf . Cited 2022 Jan 26

Werner SS, Binder N, Toews I, Schünemann HJ, Meerpohl JJ, Schwingshackl L. Use of GRADE in evidence syntheses published in high-impact-factor nutrition journals: a methodological survey. J Clin Epidemiol. 2021;135:54–69.

Zhang S, Wu QJ, Liu SX. A methodologic survey on use of the GRADE approach in evidence syntheses published in high-impact factor urology and nephrology journals. BMC Med Res Methodol. 2022;22(1):220.

Li L, Tian J, Tian H, Sun R, Liu Y, Yang K. Quality and transparency of overviews of systematic reviews. J Evid Based Med. 2012;5(3):166–73.

Pieper D, Buechter R, Jerinic P, Eikermann M. Overviews of reviews often have limited rigor: a systematic review. J Clin Epidemiol. 2012;65(12):1267–73.

Cochrane Editorial Unit. Appendix 1: Checklist for auditing GRADE and SoF tables in protocols of intervention reviews. Cochrane Training; 2022. Available from: https://training.cochrane.org/gomo/modules/522/resources/8307/Checklist for GRADE and SoF methods in Protocols for Gomo.pdf. Cited 2022 Mar 12

Ryan R, Hill S. How to GRADE the quality of the evidence. Cochrane Consumers and Communication Group. Cochrane; 2016. Available from: https://cccrg.cochrane.org/author-resources .

Cunningham M, France EF, Ring N, Uny I, Duncan EA, Roberts RJ, et al. Developing a reporting guideline to improve meta-ethnography in health research: the eMERGe mixed-methods study. Health Serv Deliv Res. 2019;7(4):1–116.

Tong A, Flemming K, McInnes E, Oliver S, Craig J. Enhancing transparency in reporting the synthesis of qualitative research: ENTREQ. BMC Med Res Methodol. 2012;12:181.

Gates M, Gates G, Pieper D, Fernandes R, Tricco A, Moher D, et al. Reporting guideline for overviews of reviews of healthcare interventions: development of the PRIOR statement. BMJ. 2022;378:e070849.

Whiting PF, Reitsma JB, Leeflang MMG, Sterne JAC, Bossuyt PMM, Rutjes AWSS, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(4):529–36.

Hayden JA, van der Windt DA, Cartwright JL, Côté P. Assessing bias in studies of prognostic factors. Ann Intern Med. 2013;158(4):280–6.

Critical Appraisal Skills Programme. CASP qualitative checklist. CASP; 2018. Available from: https://casp-uk.net/images/checklist/documents/CASP-Qualitative-Studies-Checklist/CASP-Qualitative-Checklist-2018_fillable_form.pdf . Cited 2022 Apr 26

Hannes K, Lockwood C, Pearson A. A comparative analysis of three online appraisal instruments’ ability to assess validity in qualitative research. Qual Health Res. 2010;20(12):1736–43.

Munn Z, Moola S, Riitano D, Lisy K. The development of a critical appraisal tool for use in systematic reviews addressing questions of prevalence. Int J Health Policy Manag. 2014;3(3):123–8.

Lewin S, Bohren M, Rashidian A, Munthe-Kaas H, Glenton C, Colvin CJ, et al. Applying GRADE-CERQual to qualitative evidence synthesis findings-paper 2: how to make an overall CERQual assessment of confidence and create a Summary of Qualitative Findings table. Implement Sci. 2018;13(suppl 1):10.

Munn Z, Porritt K, Lockwood C, Aromataris E, Pearson A. Establishing confidence in the output of qualitative research synthesis: the ConQual approach. BMC Med Res Methodol. 2014;14(1):108.

Flemming K, Booth A, Hannes K, Cargo M, Noyes J. Cochrane Qualitative and Implementation Methods Group guidance series—paper 6: reporting guidelines for qualitative, implementation, and process evaluation evidence syntheses. J Clin Epidemiol. 2018;97:79–85.

Lockwood C, Munn Z, Porritt K. Qualitative research synthesis: methodological guidance for systematic reviewers utilizing meta-aggregation. Int J Evid Based Healthc. 2015;13(3):179–87.

Schünemann HJ, Mustafa RA, Brozek J, Steingart KR, Leeflang M, Murad MH, et al. GRADE guidelines: 21 part 1. Study design, risk of bias, and indirectness in rating the certainty across a body of evidence for test accuracy. J Clin Epidemiol. 2020;122:129–41.

Schünemann HJ, Mustafa RA, Brozek J, Steingart KR, Leeflang M, Murad MH, et al. GRADE guidelines: 21 part 2. Test accuracy: inconsistency, imprecision, publication bias, and other domains for rating the certainty of evidence and presenting it in evidence profiles and summary of findings tables. J Clin Epidemiol. 2020;122:142–52.

Foroutan F, Guyatt G, Zuk V, Vandvik PO, Alba AC, Mustafa R, et al. GRADE Guidelines 28: use of GRADE for the assessment of evidence about prognostic factors: rating certainty in identification of groups of patients with different absolute risks. J Clin Epidemiol. 2020;121:62–70.

Janiaud P, Agarwal A, Belbasis L, Tzoulaki I. An umbrella review of umbrella reviews for non-randomized observational evidence on putative risk and protective factors [internet]. OSF protocol; 2021 [cited 2022 May 28]. Available from: https://osf.io/xj5cf/ .

Mokkink LB, Prinsen CA, Patrick DL, Alonso J, Bouter LM, et al. COSMIN methodology for systematic reviews of Patient-Reported Outcome Measures (PROMs) - user manual. COSMIN; 2018 [cited 2022 Feb 15]. Available from: http://www.cosmin.nl/ .

Thomas J, Petticrew M, Noyes J, Chandler J, Rehfuess E, Tugwell P, et al. Chapter 17: Intervention complexity. In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook/current/chapter-17 . Cited 2022 Oct 12

Guise JM, Chang C, Butler M, Viswanathan M, Tugwell P. AHRQ series on complex intervention systematic reviews—paper 1: an introduction to a series of articles that provide guidance and tools for reviews of complex interventions. J Clin Epidemiol. 2017;90:6–10.

Riaz IB, He H, Ryu AJ, Siddiqi R, Naqvi SAA, Yao Y, et al. A living, interactive systematic review and network meta-analysis of first-line treatment of metastatic renal cell carcinoma. Eur Urol. 2021;80(6):712–23.

Créquit P, Trinquart L, Ravaud P. Live cumulative network meta-analysis: protocol for second-line treatments in advanced non-small-cell lung cancer with wild-type or unknown status for epidermal growth factor receptor. BMJ Open. 2016;6(8):e011841.

Ravaud P, Créquit P, Williams HC, Meerpohl J, Craig JC, Boutron I. Future of evidence ecosystem series: 3. From an evidence synthesis ecosystem to an evidence ecosystem. J Clin Epidemiol. 2020;123:153–61.

Acknowledgements

The authors thank Michelle Oakman Hayes for her assistance with the graphics, Mike Clarke for his willingness to answer our seemingly arbitrary questions, and Bernard Dan for his encouragement of this project.

The work of John Ioannidis has been supported by an unrestricted gift from Sue and Bob O’Donnell to Stanford University.

Author information

Authors and affiliations

Departments of Orthopaedic Surgery, Pediatrics, and Neurology, Wake Forest School of Medicine, Winston-Salem, NC, USA

Kat Kolaski

Department of Physical Medicine and Rehabilitation, SUNY Upstate Medical University, Syracuse, NY, USA

Lynne Romeiser Logan

Departments of Medicine, of Epidemiology and Population Health, of Biomedical Data Science, and of Statistics, and Meta-Research Innovation Center at Stanford (METRICS), Stanford University School of Medicine, Stanford, CA, USA

John P. A. Ioannidis

Contributions

All authors participated in the development of the ideas, writing, and review of this manuscript. The author(s) read and approved the final manuscript.

Corresponding author

Correspondence to Kat Kolaski .

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article has been published simultaneously in BMC Systematic Reviews, Acta Anaesthesiologica Scandinavica, BMC Infectious Diseases, British Journal of Pharmacology, JBI Evidence Synthesis, the Journal of Bone and Joint Surgery Reviews, and the Journal of Pediatric Rehabilitation Medicine.

Supplementary Information

Additional file 2A. Overviews, scoping reviews, rapid reviews and living reviews.

Additional file 2B. Practical scheme for distinguishing types of research evidence.

Additional file 4. Presentation of forest plots.

Additional file 5A. Illustrations of the GRADE approach.

Additional file 5B. Adaptations of GRADE for evidence syntheses.

Additional file 6. Links to Concise Guide online resources.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Kolaski, K., Logan, L.R. & Ioannidis, J.P.A. Guidance to best tools and practices for systematic reviews. Syst Rev 12, 96 (2023). https://doi.org/10.1186/s13643-023-02255-9

Received: 03 October 2022

Accepted: 19 February 2023

Published: 08 June 2023

DOI: https://doi.org/10.1186/s13643-023-02255-9

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

Keywords

  • Certainty of evidence
  • Critical appraisal
  • Methodological quality
  • Risk of bias
  • Systematic review

  • Research article
  • Open access
  • Published: 16 September 2004

A systematic review of the content of critical appraisal tools

  • Persis Katrak 1 ,
  • Andrea E Bialocerkowski 2 ,
  • Nicola Massy-Westropp 1 ,
  • VS Saravana Kumar 1 &
  • Karen A Grimmer 1  

BMC Medical Research Methodology volume 4, Article number: 22 (2004)

Consumers of research (researchers, administrators, educators and clinicians) frequently use standard critical appraisal tools to evaluate the quality of published research reports. However, there is no consensus regarding the most appropriate critical appraisal tool for allied health research. We summarized the content, intent, construction and psychometric properties of published, currently available critical appraisal tools to identify common elements and their relevance to allied health research.

A systematic review was undertaken of 121 published critical appraisal tools sourced from 108 papers located on electronic databases and the Internet. The tools were classified according to the study design for which they were intended. Their items were then classified into one of 12 criteria based on their intent. Commonly occurring items were identified. The empirical basis for construction of the tool, the method by which overall quality of the study was established, the psychometric properties of the critical appraisal tools and whether guidelines were provided for their use were also recorded.

Eighty-seven percent of critical appraisal tools were specific to a research design, with most tools having been developed for experimental studies. There was considerable variability in items contained in the critical appraisal tools. Twelve percent of available tools were developed using specified empirical research. Forty-nine percent of the critical appraisal tools summarized the quality appraisal into a numeric summary score. Few critical appraisal tools had documented evidence of validity of their items, or reliability of use. Guidelines regarding administration of the tools were provided in 43% of cases.

Conclusions

There was considerable variability in intent, components, construction and psychometric properties of published critical appraisal tools for research reports. There is no "gold standard" critical appraisal tool for any study design, nor is there any widely accepted generic tool that can be applied equally well across study types. No tool was specific to allied health research requirements. Thus, interpretation of critical appraisal of research reports currently needs to be considered in light of the properties and intent of the critical appraisal tool chosen for the task.

Consumers of research (clinicians, researchers, educators, administrators) frequently use standard critical appraisal tools to evaluate the quality and utility of published research reports [1]. Critical appraisal tools provide analytical evaluations of the quality of the study, in particular the methods applied to minimise biases in a research project [2]. As these factors potentially influence study results, and the way that the study findings are interpreted, this information is vital for consumers of research to ascertain whether the results of the study can be believed, and transferred appropriately into other environments, such as policy, further research studies, education or clinical practice. Hence, choosing an appropriate critical appraisal tool is an important component of evidence-based practice.

Although the importance of critical appraisal tools has been acknowledged [1, 3–5], there appears to be no consensus regarding the 'gold standard' tool for any medical evidence. In addition, it seems that consumers of research are faced with a large number of critical appraisal tools from which to choose. This is evidenced by the recent report by the Agency for Healthcare Research and Quality, in which 93 critical appraisal tools for quantitative studies were identified [6]. Such choice may pose problems for research consumers, as dissimilar findings may well result when different critical appraisal tools are used to evaluate the same research report [6].

Critical appraisal tools can be broadly classified into those that are research design-specific and those that are generic. Design-specific tools contain items that address methodological issues that are unique to the research design [5, 7]. However, this precludes comparison of the quality of different study designs [8]. To attempt to overcome this limitation, generic critical appraisal tools have been developed in an attempt to enhance the ability of research consumers to synthesise evidence from a range of quantitative and/or qualitative study designs (for instance, [9]). There is no evidence that generic critical appraisal tools and design-specific tools provide comparable evaluations of research designs.

Moreover, there appears to be little consensus regarding the most appropriate items that should be contained within any critical appraisal tool. This paper is concerned primarily with critical appraisal tools that address the unique properties of allied health care and research [10]. This approach was taken because of the unique nature of allied health contacts with patients, and because evidence-based practice is an emerging area in allied health [10]. The availability of so many critical appraisal tools (for instance, [6]) may well prove daunting for allied health practitioners who are learning to critically appraise research in their area of interest. For the purposes of this evaluation, allied health is defined as encompassing "...all occasions of service to non-admitted patients where services are provided at units/clinics providing treatment/counseling to patients. These include units primarily concerned with physiotherapy, speech therapy, family planning, dietary advice, optometry, occupational therapy..." [11].

The unique nature of allied health practice needs to be considered in allied health research. Allied health research thus differs from most medical research, with respect to:

• the paradigm underpinning comprehensive and clinically-reasoned descriptions of diagnosis (including validity and reliability). An example of this is in research into low back pain, where instead of diagnosis being made on location and chronicity of pain (as is common) [12], it would be made on the spinal structure and the nature of the dysfunction underpinning the symptoms, arrived at by a staged and replicable clinical reasoning process [10, 13];

• the frequent use of multiple interventions within the one contact with the patient (an occasion of service), each of which requires appropriate description in terms of relationship to the diagnosis, nature, intensity, frequency, type of instruction provided to the patient, and the order in which the interventions were applied [13];

• the timeframe and frequency of contact with the patient (as many allied health disciplines treat patients in episodes of care that contain multiple occasions of service, and which can span many weeks, or even years in the case of chronic problems [14]);

• measures of outcome, including appropriate methods and timeframes of measuring change in impairment, function, disability and handicap that address the needs of different stakeholders (patients, therapists, funders etc) [10, 12, 13].

Search strategy

The search strategy is provided in supplementary data [see Additional file 1].

Data organization and extraction

Two independent researchers (PK, NMW) participated in all aspects of this review, and they compared and discussed their findings with respect to inclusion of critical appraisal tools, their intent, components, data extraction and item classification, construction and psychometric properties. Disagreements were resolved by discussion with a third member of the team (KG).

Data extraction consisted of a four-stage process. First, identical (replica) critical appraisal tools were identified and removed prior to analysis. Second, the remaining critical appraisal tools were classified according to the study design for which they were intended to be used [1, 2]. Third, the manner in which each tool had been constructed was classified according to whether an empirical research approach had been used and, if so, which type of research had been undertaken. Finally, the items contained in each critical appraisal tool were extracted and classified into one of eleven groups, based on the criteria described by Clarke and Oxman [4] (a classification sketch follows the list below):

• Study aims and justification;

• Methodology used, which encompassed method of identification of relevant studies and adherence to study protocol;

• Sample selection, which ranged from inclusion and exclusion criteria to homogeneity of groups;

• Method of randomization and allocation blinding;

• Attrition: response and drop-out rates;

• Blinding of the clinician, assessor, patient and statistician, as well as the method of blinding;

• Outcome measure characteristics;

• Intervention or exposure details;

• Method of data analyses;

• Potential sources of bias; and

• Issues of external validity, which ranged from application of evidence to other settings to the relationship between benefits, cost and harm.

An additional group, "miscellaneous", was used to describe items that could not be classified into any of the groups listed above.
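The grouping step described above is, in effect, a mapping from free-text appraisal items to a fixed set of categories. The sketch below shows how such a classification could be mechanised with simple keyword matching; the keyword lists and item texts are illustrative assumptions only, since the authors classified items manually by consensus.

```python
# Illustrative sketch only: the review classified items by hand, by consensus.
# The keyword lists below are hypothetical stand-ins for those manual rules.
GROUP_KEYWORDS = {
    "Study aims and justification": ["aim", "objective", "justification"],
    "Methodology": ["protocol", "identification of relevant studies"],
    "Sample selection": ["inclusion", "exclusion", "homogeneity"],
    "Randomization": ["random", "allocation"],
    "Attrition": ["drop-out", "dropout", "response rate"],
    "Blinding": ["blind"],
    "Outcome measures": ["outcome"],
    "Intervention or exposure": ["intervention", "exposure"],
    "Data analyses": ["statistic", "power calculation", "analysis"],
    "Sources of bias": ["bias"],
    "External validity": ["generalis", "applicab", "cost"],
}

def classify_item(item: str) -> str:
    """Return the first group whose keywords occur in the item text,
    falling back to 'Miscellaneous' when nothing matches."""
    text = item.lower()
    for group, keywords in GROUP_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return group
    return "Miscellaneous"

print(classify_item("Was the assessor blinded to group membership?"))    # Blinding
print(classify_item("Were appropriate statistical analyses performed?"))  # Data analyses
```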

Data synthesis

Data were synthesized in MS Excel spreadsheets and in narrative format, describing the number of critical appraisal tools per study design and the types of items they contained. Descriptions were also made of the method by which the overall quality of the study was determined, the evidence regarding the psychometric properties of the tools (validity and reliability), and whether guidelines were provided for use of the critical appraisal tool.
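The same tallies (tools per study design, and item-type frequencies such as those reported below) can be reproduced with a few lines of code rather than a spreadsheet. A minimal sketch, with invented records standing in for the extracted data:

```python
from collections import Counter

# Hypothetical extraction records: (tool, target study design, items it contains).
tools = [
    ("Tool A", "experimental", ["eligibility criteria", "assessor blinding"]),
    ("Tool B", "systematic review", ["search strategy", "eligibility criteria"]),
    ("Tool C", "experimental", ["random allocation", "eligibility criteria"]),
]

tools_per_design = Counter(design for _, design, _ in tools)
item_frequency = Counter(item for _, _, items in tools for item in items)

print(tools_per_design.most_common())  # [('experimental', 2), ('systematic review', 1)]
print(item_frequency.most_common(1))   # [('eligibility criteria', 3)]
```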

One hundred and ninety-three research reports that potentially provided a description of a critical appraisal tool (or process) were identified from the search strategy. Fifty-six of these papers were unavailable for review due to outdated Internet links, or inability to source the relevant journal through Australian university and Government library databases. Of the 127 papers retrieved, 19 were excluded from this review, as they did not provide a description of the critical appraisal tool used, or were published in languages other than English. As a result, 108 papers were reviewed, which yielded 121 different critical appraisal tools [1–5, 7, 9, 15–102, 116].

Empirical basis for tool construction

We identified 14 instruments (12% of all tools) which were reported as having been constructed using a specified empirical approach [20, 29, 30, 32, 35, 40, 49, 51, 70–72, 79, 103, 116]. The empirical research reflected descriptive and/or qualitative approaches: critical review of existing tools [40, 72], Delphi techniques to identify and then refine data items [32, 51, 71], questionnaires and other forms of written surveys to identify and refine data items [70, 79, 103], facilitated structured consensus meetings [20, 29, 30, 35, 40, 49, 70, 72, 79, 116], and pilot validation testing [20, 40, 72, 103, 116]. In all the studies that reported developing critical appraisal tools using a consensus approach, a range of stakeholder input was sought, reflecting researchers and clinicians in a range of health disciplines, students, educators and consumers. A further 31 papers cited other studies as the source of the tool used in the review, but provided no information on why individual items had been chosen, or whether (or how) they had been modified. Moreover, for 21 of these tools, the cited sources of the critical appraisal tool did not report the empirical basis on which the tool had been constructed.

Critical appraisal tools per study design

Seventy-eight percent (N = 94) of the critical appraisal tools were developed for use on primary research [1–5, 7, 9, 18, 19, 25–27, 34, 37–41], while the remainder (N = 26) were for secondary research (systematic reviews and meta-analyses) [2–5, 15–36, 116]. Eighty-seven percent (N = 104) of all critical appraisal tools were design-specific [2–5, 7, 9, 15–90], with over one third (N = 45) developed for experimental studies (randomized controlled trials, clinical trials) [2–4, 25–27, 34, 37–73]. Sixteen critical appraisal tools were generic. Of these, six were developed for use on both experimental and observational studies [9, 91–95], whereas ten were purported to be useful for any qualitative and quantitative research design [1, 18, 41, 96–102, 116] (see Figure 1, Table 1).

Figure 1. Number of critical appraisal tools per study design [1, 2].

Critical appraisal items

One thousand, four hundred and seventy-five items were extracted from these critical appraisal tools. After grouping like items together, 173 different item types were identified; the most frequently reported items focused on assessing the external validity of the study (N = 35) and the method of data analyses (N = 28) (Table 2). The most frequently reported items across all critical appraisal tools were:

Eligibility criteria (inclusion/exclusion criteria) (N = 63)

Appropriate statistical analyses (N = 47)

Random allocation of subjects (N = 43)

Consideration of outcome measures used (N = 43)

Sample size justification/power calculations (N = 39)

Study design reported (N = 36)

Assessor blinding (N = 36)

Design-specific critical appraisal tools

Systematic reviews.

Eighty-seven different items were extracted from the 26 critical appraisal tools designed to evaluate the quality of systematic reviews. These critical appraisal tools frequently contained items regarding data analyses and issues of external validity (Tables 2 and 3).

Items assessing data analyses focused on the methods used to summarize the results, assessment of the sensitivity of results and whether heterogeneity was considered, whereas external validity was frequently assessed through the reporting of the main results, their interpretation and their generalizability. Moreover, systematic review critical appraisal tools tended to contain items, such as identification of relevant studies, search strategy used, number of studies included and protocol adherence, that would not be relevant for other study designs. Blinding and randomisation procedures were rarely included in these critical appraisal tools.

Experimental studies

One hundred and thirteen different items were extracted from the 45 experimental critical appraisal tools. These items most frequently assessed aspects of data analyses and blinding (Tables 1 and 2). Data analyses items focused on whether appropriate statistical analysis was performed, whether a sample size justification or power calculation was provided, and whether side effects of the intervention were recorded and analysed. Blinding items focused on whether the participant, clinician and assessor were blinded to the intervention.

Diagnostic studies

Forty-seven different items were extracted from the seven diagnostic critical appraisal tools. These items frequently addressed issues involving data analyses, external validity of results and sample selection that were specific to diagnostic studies (whether the diagnostic criteria were defined, definition of the "gold" standard, the calculation of sensitivity and specificity) (Tables 1 and 2 ).
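As a reminder of the quantities these diagnostic items refer to, sensitivity and specificity are computed from the standard 2×2 table of test results against the "gold" standard. A minimal sketch (the counts are invented):

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """True-positive rate: proportion of diseased cases the test detects."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg: int, false_pos: int) -> float:
    """True-negative rate: proportion of disease-free cases the test clears."""
    return true_neg / (true_neg + false_pos)

# Hypothetical 2x2 table: 90 true positives, 10 false negatives,
# 80 true negatives, 20 false positives.
print(sensitivity(90, 10))  # 0.9
print(specificity(80, 20))  # 0.8
```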

Observational studies

Seventy-four different items were extracted from the 19 critical appraisal tools for observational studies. These items primarily focused on aspects of data analyses (see Tables 1 and 2), such as whether confounders were considered in the analysis, whether a sample size justification or power calculation was provided, and whether appropriate statistical analyses were performed.

Qualitative studies

Thirty-six different items were extracted from the seven qualitative study critical appraisal tools. The majority of these items assessed issues regarding external validity, methods of data analyses and the aims and justification of the study (Tables 1 and 2). Specifically, items focused on whether the study question was clearly stated, whether data analyses were clearly described and appropriate, and the application of the study findings to the clinical setting. Qualitative critical appraisal tools did not contain items regarding sample selection, randomization, blinding, intervention or bias, perhaps because these issues are not relevant to the qualitative paradigm.

Generic critical appraisal tools

Experimental and observational studies.

Forty-two different items were extracted from the six critical appraisal tools that could be used to evaluate experimental and observational studies. These tools most frequently contained items addressing aspects of sample selection (such as inclusion/exclusion criteria of participants, homogeneity of participants at baseline) and data analyses (such as whether appropriate statistical analyses were performed, and whether a justification of the sample size or a power calculation was provided).

All study designs

Seventy-eight different items were contained in the ten critical appraisal tools that could be used for all study designs (quantitative and qualitative). The majority of these items focused on whether appropriate data analyses were undertaken (such as whether confounders were considered in the analysis, whether a sample size justification or power calculation was provided, and whether appropriate statistical analyses were performed) and on external validity issues (generalization of results to the population, value of the research findings) (see Tables 1 and 2).

Allied health critical appraisal tools

We found no critical appraisal instrument specific to allied health research, despite finding at least seven critical appraisal instruments associated with allied health topics (mostly physiotherapy management of orthopedic conditions) [37, 39, 52, 58, 59, 65]. One critical appraisal development group proposed two instruments [9], specific to quantitative and qualitative research respectively. The core elements of allied health research quality (specific diagnosis criteria, intervention descriptions, nature of patient contact and appropriate outcome measures) were not addressed in any one tool sourced for this evaluation. We identified 152 different ways of considering quality reporting of outcome measures in the 121 critical appraisal tools, and 81 ways of considering description of interventions. Very few tools not specifically targeted to diagnostic studies (less than 10% of the remaining tools) addressed diagnostic criteria. The critical appraisal instrument that seemed most related to allied health research quality [39] sought comprehensive evaluation of elements of intervention and outcome; however, this instrument was relevant only to physiotherapeutic orthopedic experimental research.

Overall study quality

Forty-nine percent (N = 58) of critical appraisal tools summarised the results of the quality appraisal into a single numeric summary score [5, 7, 15–25, 37–59, 74–77, 80–83, 87, 91–93, 96, 97] (Figure 2). This was achieved by one of two methods (a scoring sketch follows the list):

Figure 2. Number of critical appraisal tools with, and without, summary quality scores.

An equal weighting system, where one point was allocated to each item fulfilled; or

A weighted system, where fulfilled items were allocated various points depending on their perceived importance.
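Both schemes above amount to summing a weight over the items a study fulfils, with equal weighting being the special case where every weight is one. A minimal sketch (the item weights are illustrative; as noted below, none of the published schemes justified its weights):

```python
def summary_score(fulfilled, weights=None):
    """Single numeric quality score: one point per fulfilled item under
    equal weighting, or the sum of assigned weights otherwise."""
    if weights is None:  # equal weighting system
        weights = [1.0] * len(fulfilled)
    return sum(w for met, w in zip(fulfilled, weights) if met)

items_met = [True, False, True, True]           # appraisal of one study
print(summary_score(items_met))                  # 3.0 under equal weighting
print(summary_score(items_met, [2, 1, 0.5, 3]))  # 5.5 under a weighted system
```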

However, there was no justification provided for any of the scoring systems used. In the remaining critical appraisal tools (N = 62), a single numerical summary score was not provided [1–4, 9, 25–36, 60–73, 78, 79, 84–90, 94, 95, 98–102]. This left the research consumer to summarize the results of the appraisal in a narrative manner, without the assistance of a standard approach.

Psychometric properties of critical appraisal tools

Few critical appraisal tools had documented evidence of their validity and reliability. Face validity was established in nine critical appraisal tools, seven of which were developed for use on experimental studies [38, 40, 45, 49, 51, 63, 70] and two for systematic reviews [32, 103]. Intra-rater reliability was established for only one critical appraisal tool as part of its empirical development process [40], whereas inter-rater reliability was reported for two systematic review tools [20, 36] (for one of these as part of the developmental process [20]) and seven experimental critical appraisal tools [38, 40, 45, 51, 55, 56, 63] (for two of these as part of the developmental process [40, 51]).
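Inter- and intra-rater reliability of the kind reported here are typically quantified with a chance-corrected agreement statistic such as Cohen's kappa. A minimal sketch for two raters making yes/no judgements on the same set of items (the ratings are invented):

```python
def cohens_kappa(rater1, rater2):
    """Chance-corrected agreement between two raters on binary judgements."""
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    p1, p2 = sum(rater1) / n, sum(rater2) / n  # each rater's 'yes' rate
    expected = p1 * p2 + (1 - p1) * (1 - p2)   # agreement expected by chance
    return (observed - expected) / (1 - expected)

r1 = [True, True, False, True, False, True]
r2 = [True, False, False, True, False, True]
print(round(cohens_kappa(r1, r2), 2))  # 0.67: agreement well beyond chance
```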

Critical appraisal tool guidelines

Forty-three percent (N = 52) of critical appraisal tools had guidelines that informed the user of the interpretation of each item contained within them (Table 2). These guidelines were most frequently in the form of a handbook or published paper (N = 31) [2, 4, 9, 15, 20, 25, 28, 29, 31, 36, 37, 41, 50, 64–67, 69, 80, 84–87, 89, 90, 95, 100, 116], whereas in 14 critical appraisal tools explanations accompanied each item [16, 26, 27, 40, 49, 51, 57, 59, 79, 83, 91, 102].

Our search strategy identified a large number of published critical appraisal tools currently available for appraising research reports. In most cases there was a distinct lack of information on tool development processes. Many of the tools were reported to be modifications of other published tools, or reflected specialty concerns in specific clinical or research areas, without attempts to justify inclusion criteria. Fewer than ten of these tools were relevant to evaluation of the quality of allied health research, and none of these was based on an empirical research approach. Although our search was systematic and extensive [104, 105], our broad keywords and our lack of ready access to 29% of potentially useful papers (N = 56) may have prevented us from identifying all published critical appraisal tools. However, consumers of research seeking critical appraisal instruments are unlikely to pursue outdated Internet links and unobtainable journals, so we believe that we identified the most readily available instruments. Despite the limitations on sourcing all possible tools, we believe that this paper presents a useful synthesis of the readily available critical appraisal tools.

The majority of the critical appraisal tools were developed for a specific research design (87%), with most designed for use on experimental studies (38% of all critical appraisal tools sourced). This finding is not surprising as, according to the medical model, experimental studies sit at or near the top of the hierarchy of evidence [2, 8]. In recent years, allied health researchers have strived to apply the medical model of research to their own discipline by conducting experimental research, often using the randomized controlled trial design [106]. This trend may be the reason for the development of the experimental critical appraisal tools reported in allied health-specific research topics [37, 39, 52, 58, 59, 65].

We also found a considerable number of critical appraisal tools for systematic reviews (N = 26), which reflects the trend to synthesize research evidence to make it relevant for clinicians [105, 107]. Systematic review critical appraisal tools contained unique items (such as identification of relevant studies, search strategy used, number of studies included, protocol adherence) compared with tools used for primary studies, a reflection of the secondary nature of data synthesis and analysis.

In contrast, we identified very few qualitative study critical appraisal tools, despite the presence of many journal-specific guidelines that outline important methodological aspects required in a manuscript submitted for publication [108–110]. This finding may reflect the more traditional, quantitative focus of allied health research [111]. Alternatively, qualitative researchers may view the robustness of their research findings in different terms compared with quantitative researchers [112, 113]. Hence the use of critical appraisal tools may be less appropriate for the qualitative paradigm. This requires further consideration.

Of the small number of generic critical appraisal tools, we found few that could be usefully applied (to any health research, and specifically to the allied health literature), because of the generalist nature of their items, variable interpretation (and applicability) of items across research designs, and/or lack of summary scores. Whilst these types of tools potentially facilitate the synthesis of evidence across allied health research designs for clinicians, their lack of specificity in asking the 'hard' questions about research quality related to research design also potentially precludes their adoption for allied health evidence-based practice. At present, the gold standard study design when synthesizing evidence is the randomized controlled trial [4], which underpins our finding that experimental critical appraisal tools predominated in the allied health literature [37, 39, 52, 58, 59, 65]. However, as more systematic literature reviews are undertaken on allied health topics, it may become more accepted that evidence in the form of other research design types requires acknowledgement, evaluation and synthesis. This may result in the development of more appropriate and clinically useful allied health critical appraisal tools.

A major finding of our study was the volume of, and variation in, available critical appraisal tools. We found no gold standard critical appraisal tool for any type of study design. Therefore, consumers of research are faced with frustrating decisions when attempting to select the most appropriate tool for their needs. Variable quality evaluations may be produced when different critical appraisal tools are used on the same literature [6]. Thus, interpretation of critical analysis must be carefully considered in light of the critical appraisal tool used.

The variability in the content of critical appraisal tools could be accounted for by the lack of any empirical basis for tool construction, the lack of established validity of item construction, and the lack of a gold standard against which to compare new critical appraisal tools. As such, consumers of research cannot be certain that the content of published critical appraisal tools reflects the most important aspects of the quality of the studies they assess [114]. Moreover, there was little evidence of intra- or inter-rater reliability of the critical appraisal tools. Coupled with the lack of protocols for use, this means that critical appraisers could interpret instrument items differently over repeated occasions of use, which may produce variable results [123].

Based on the findings of this evaluation, we recommend that consumers of research select critical appraisal tools carefully for their needs. The selected tools should have published evidence of the empirical basis for their construction, the validity of their items and the reliability of their interpretation, as well as guidelines for use, so that they can be applied and interpreted in a standardized manner. Our findings highlight the need for consensus on the important, core items for critical appraisal tools, which would produce a more standardized environment for critical appraisal of research evidence. As a consequence, allied health research will particularly benefit from critical appraisal tools that reflect best-practice research approaches and embed the specific research requirements of allied health disciplines.

National Health and Medical Research Council: How to Review the Evidence: Systematic Identification and Review of the Scientific Literature. Canberra. 2000

National Health and Medical Research Council: How to Use the Evidence: Assessment and Application of Scientific Evidence. Canberra. 2000

Joanna Briggs Institute. [ http://www.joannabriggs.edu.au ]

Clarke M, Oxman AD: Cochrane Reviewer's Handbook 4.2.0. 2003, Oxford: The Cochrane Collaboration

Crombie IK: The Pocket Guide to Critical Appraisal: A Handbook for Health Care Professionals. 1996, London: BMJ Publishing Group

Agency for Healthcare Research and Quality: Systems to Rate the Strength of Scientific Evidence. Evidence Report/Technology Assessment No. 47, Publication No. 02-E016. Rockville. 2002

Elwood JM: Critical Appraisal of Epidemiological Studies and Clinical Trials. 1998, Oxford: Oxford University Press, 2

Sackett DL, Richardson WS, Rosenberg W, Haynes RB: Evidence Based Medicine. How to Practice and Teach EBM. 2000, London: Churchill Livingstone

Critical literature reviews. [ http://www.cotfcanada.org/cotf_critical.htm ]

Bialocerkowski AE, Grimmer KA, Milanese SF, Kumar S: Application of current research evidence to clinical physiotherapy practice. J Allied Health Res Dec.

The National Health Data Dictionary – Version 10. http://www.aihw.gov.au/publications/hwi/nhdd12/nhdd12-v1.pdf and http://www.aihw.gov.au/publications/hwi/nhdd12/nhdd12-v2.pdf

Grimmer K, Bowman P, Roper J: Episodes of allied health outpatient care: an investigation of service delivery in acute public hospital settings. Disability and Rehabilitation. 2000, 22 (1/2): 80-87.

Grimmer K, Milanese S, Bialocerkowski A: Clinical guidelines for low back pain: A physiotherapy perspective. Physiotherapy Canada. 2003, 55 (4): 1-9.

Grimmer KA, Milanese S, Bialocerkowski AE, Kumar S: Producing and implementing evidence in clinical practice: the therapies' dilemma. Physiotherapy. 2004,

Greenhalgh T: How to read a paper: papers that summarize other papers (systematic reviews and meta-analysis). BMJ. 1997, 315: 672-675.

Auperin A, Pignon J, Poynard T: Review article: critical review of meta-analysis of randomised clinical trials in hepatogastroenterology. Alimentary Pharmacol Therapeutics. 1997, 11: 215-225. 10.1046/j.1365-2036.1997.131302000.x.

Barnes DE, Bero LA: Why review articles on the health effects of passive smoking reach different conclusions. J Am Med Assoc. 1998, 279: 1566-1570. 10.1001/jama.279.19.1566.

Beck CT: Use of meta-analysis as a teaching strategy in nursing research courses. J Nurs Educat. 1997, 36: 87-90.

Carruthers SG, Larochelle P, Haynes RB, Petrasovits A, Schiffrin EL: Report of the Canadian Hypertension Society Consensus Conference: 1. Introduction. Can Med Assoc J. 1993, 149: 289-293.

Oxman AD, Guyatt GH, Singer J, Goldsmith CH, Hutchinson BG, Milner RA, Streiner DL: Agreement among reviewers of review articles. J Clin Epidemiol. 1991, 44: 91-98. 10.1016/0895-4356(91)90205-N.

Sacks HS, Reitman D, Pagano D, Kupelnick B: Meta-analysis: an update. Mount Sinai Journal of Medicine. 1996, 63: 216-224.

Smith AF: An analysis of review articles published in four anaesthesia journals. Can J Anaesth. 1997, 44: 405-409.

L'Abbe KA, Detsky AS, O'Rourke K: Meta-analysis in clinical research. Ann Intern Med. 1987, 107: 224-233.

Mulrow CD, Antonio S: The medical review article: state of the science. Ann Intern Med. 1987, 106: 485-488.

Continuing Professional Development: A Manual for SIGN Guideline Developers. [ http://www.sign.ac.uk ]

Learning and Development Public Health Resources Unit. [ http://www.phru.nhs.uk/ ]

FOCUS Critical Appraisal Tool. [ http://www.focusproject.org.uk ]

Cook DJ, Sackett DL, Spitzer WO: Methodologic guidelines for systematic reviews of randomized control trials in health care from the Potsdam Consultation on meta-analysis. J Clin Epidemiol. 1995, 48: 167-171. 10.1016/0895-4356(94)00172-M.

Cranney A, Tugwell P, Shea B, Wells G: Implications of OMERACT outcomes in arthritis and osteoporosis for Cochrane metaanalysis. J Rheumatol. 1997, 24: 1206-1207.

Guyatt GH, Sackett DL, Sinclair JC, Hoyward R, Cook DJ, Cook RJ: User's guide to the medical literature. IX. A method for grading health care recommendations. J Am Med Assoc. 1995, 274: 1800-1804. 10.1001/jama.274.22.1800.

Gyorkos TW, Tannenbaum TN, Abrahamowicz M, Oxman AD, Scott EAF, Milson ME, Rasooli Iris, Frank JW, Riben PD, Mathias RG: An approach to the development of practice guidelines for community health interventions. Can J Public Health. 1994, 85: S8-13.

Moher D, Cook DJ, Eastwood S, Olkin I, Rennie D, Stroup DF: Improving the quality of reports of meta-analyses of randomised controlled trials: the QUOROM statement. Quality of reporting of meta-analyses. Lancet. 1999, 354: 1896-1900. 10.1016/S0140-6736(99)04149-5.

Oxman AD, Cook DJ, Guyatt GH: Users' guides to the medical literature. VI. How to use an overview. Evidence-Based Medicine Working Group. J Am Med Assoc. 1994, 272: 1367-1371. 10.1001/jama.272.17.1367.

Pogue J, Yusuf S: Overcoming the limitations of current meta-analysis of randomised controlled trials. Lancet. 1998, 351: 47-52. 10.1016/S0140-6736(97)08461-4.

Stroup DF, Berlin JA, Morton SC, Olkin I, Williamson GD, Rennie D, Moher D, Becker BJ, Sipe TA, Thacker SB: Meta-analysis of observational studies in epidemiology: a proposal for reporting. Meta-analysis of observational studies in epidemiology (MOOSE) group. J Am Med Assoc. 2000, 283: 2008-2012. 10.1001/jama.283.15.2008.

Irwig L, Tosteson AN, Gatsonis C, Lau J, Colditz G, Chalmers TC, Mostellar F: Guidelines for meta-analyses evaluating diagnostic tests. Ann Intern Med. 1994, 120: 667-676.

Moseley AM, Herbert RD, Sherrington C, Maher CG: Evidence for physiotherapy practice: A survey of the Physiotherapy Evidence Database. Physiotherapy Evidence Database (PEDro). Australian Journal of Physiotherapy. 2002, 48: 43-50.

Cho MK, Bero LA: Instruments for assessing the quality of drug studies published in the medical literature. J Am Med Assoc. 1994, 272: 101-104. 10.1001/jama.272.2.101.

De Vet HCW, De Bie RA, Van der Heijden GJ, Verhagen AP, Sijpkes P, Kipschild PG: Systematic reviews on the basis of methodological criteria. Physiotherapy. 1997, 83: 284-289.

Downs SH, Black N: The feasibility of creating a checklist for the assessment of the methodological quality both of randomised and non-randomised studies of health care interventions. J Epidemiol Community Health. 1998, 52: 377-384.

Evans M, Pollock AV: A score system for evaluating random control clinical trials of prophylaxis of abdominal surgical wound infection. Br J Surg. 1985, 72: 256-260.

Fahey T, Hyde C, Milne R, Thorogood M: The type and quality of randomized controlled trials (RCTs) published in UK public health journals. J Public Health Med. 1995, 17: 469-474.

Gotzsche PC: Methodology and overt and hidden bias in reports of 196 double-blind trials of nonsteroidal antiinflammatory drugs in rheumatoid arthritis. Control Clin Trials. 1989, 10: 31-56. 10.1016/0197-2456(89)90017-2.

Imperiale TF, McCullough AJ: Do corticosteroids reduce mortality from alcoholic hepatitis? A meta-analysis of the randomized trials. Ann Int Med. 1990, 113: 299-307.

Jadad AR, Moore RA, Carroll D, Jenkinson C, Reynolds DJ, Gavaghan DJ, McQuay HJ: Assessing the quality of reports of randomized clinical trials: is blinding necessary?. Control Clin Trials. 1996, 17: 1-12. 10.1016/0197-2456(95)00134-4.

Khan KS, Daya S, Collins JA, Walter SD: Empirical evidence of bias in infertility research: overestimation of treatment effect in crossover trials using pregnancy as the outcome measure. Fertil Steril. 1996, 65: 939-945.

Kleijnen J, Knipschild P, ter Riet G: Clinical trials of homoeopathy. BMJ. 1991, 302: 316-323.

Liberati A, Himel HN, Chalmers TC: A quality assessment of randomized control trials of primary treatment of breast cancer. J Clin Oncol. 1986, 4: 942-951.

Moher D, Schulz KF, Altman DG, for the CONSORT Group: The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. J Am Med Assoc. 2001, 285: 1987-1991. 10.1001/jama.285.15.1987.

Reisch JS, Tyson JE, Mize SG: Aid to the evaluation of therapeutic studies. Pediatrics. 1989, 84: 815-827.

Sindhu F, Carpenter L, Seers K: Development of a tool to rate the quality assessment of randomized controlled trials using a Delphi technique. J Advanced Nurs. 1997, 25: 1262-1268. 10.1046/j.1365-2648.1997.19970251262.x.

Van der Heijden GJ, Van der Windt DA, Kleijnen J, Koes BW, Bouter LM: Steroid injections for shoulder disorders: a systematic review of randomized clinical trials. Br J Gen Pract. 1996, 46: 309-316.

Van Tulder MW, Koes BW, Bouter LM: Conservative treatment of acute and chronic nonspecific low back pain. A systematic review of randomized controlled trials of the most common interventions. Spine. 1997, 22: 2128-2156. 10.1097/00007632-199709150-00012.

Garbutt JC, West SL, Carey TS, Lohr KN, Crews FT: Pharmacotherapy for Alcohol Dependence. Evidence Report/Technology Assessment No. 3, AHCPR Publication No. 99-E004. Rockville. 1999

Oremus M, Wolfson C, Perrault A, Demers L, Momoli F, Moride Y: Interrater reliability of the modified Jadad quality scale for systematic reviews of Alzheimer's disease drug trials. Dement Geriatr Cognit Disord. 2001, 12: 232-236. 10.1159/000051263.

Clark O, Castro AA, Filho JV, Djubelgovic B: Interrater agreement of Jadad's scale. Annual Cochrane Colloqium Abstracts. 2001, [ http://www.biomedcentral.com/abstracts/COCHRANE/1/op031 ]October Lyon

Jonas W, Anderson RL, Crawford CC, Lyons JS: A systematic review of the quality of homeopathic clinical trials. BMC Alternative Medicine. 2001, 1: 12-10.1186/1472-6882-1-12.

Van Tulder M, Malmivaara A, Esmail R, Koes B: Exercises therapy for low back pain: a systematic review within the framework of the Cochrane Collaboration back review group. Spine. 2000, 25: 2784-2796. 10.1097/00007632-200011010-00011.

Van Tulder MW, Ostelo R, Vlaeyen JWS, Linton SJ, Morley SJ, Assendelft WJJ: Behavioral treatment for chronic low back pain: a systematic review within the framework of the cochrane back. Spine. 2000, 25: 2688-2699. 10.1097/00007632-200010150-00024.

Aronson N, Seidenfeld J, Samson DJ, Aronson N, Albertson PC, Bayoumi AM, Bennett C, Brown A, Garber ABA, Gere M, Hasselblad V, Wilt T, Ziegler MPHK, Pharm D: Relative Effectiveness and Cost Effectiveness of Methods of Androgen Suppression in the Treatment of Advanced Prostate Cancer. Evidence Report/Technology Assessment No. 4, AHCPR Publication No.99-E0012. Rockville. 1999

Chalmers TC, Smith H, Blackburn B, Silverman B, Schroeder B, Reitman D, Ambroz A: A method for assessing the quality of a randomized control trial. Control Clin Trials. 1981, 2: 31-49. 10.1016/0197-2456(81)90056-8.

DerSimonian R, Charette LJ, McPeek B, Mosteller F: Reporting on methods in clinical trials. New Eng J Med. 1982, 306: 1332-1337.

Detsky AS, Naylor CD, O'Rourke K, McGeer AJ, L'Abbe KA: Incorporating variations in the quality of individual randomized trials into meta-analysis. J Clin Epidemiol. 1992, 45: 255-265. 10.1016/0895-4356(92)90085-2.

Goudas L, Carr DB, Bloch R, Balk E, Ioannidis JPA, Terrin MN: Management of Cancer Pain. Evidence Report/Technology Assessment No. 35 (Contract 290-97-0019 to the New England Medical Center), AHCPR Publication No. 99-E004. Rockville. 2000

Guyatt GH, Sackett DL, Cook DJ: Users' guides to the medical literature. II. How to use an article about therapy or prevention. A. Are the results of the study valid? Evidence-Based Medicine Working Group. J Am Med Assoc. 1993, 270: 2598-2601. 10.1001/jama.270.21.2598.

Khan KS, Ter Riet G, Glanville J, Sowden AJ, Kleijnen J: Undertaking Systematic Reviews of Research on Effectiveness: Centre of Reviews and Dissemination's Guidance for Carrying Out or Commissioning Reviews: York. 2000

McNamara R, Bass EB, Marlene R, Miller J: Management of New Onset Atrial Fibrillation. Evidence Report/Technology Assessment No.12, AHRQ Publication No. 01-E026. Rockville. 2001

Prendiville W, Elbourne D, Chalmers I: The effects of routine oxytocic administration in the management of the third stage of labour: an overview of the evidence from controlled trials. Br J Obstet Gynaecol. 1988, 95: 3-16.

Schulz KF, Chalmers I, Hayes RJ, Altman DG: Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. J Am Med Assoc. 1995, 273: 408-412. 10.1001/jama.273.5.408.

The Standards of Reporting Trials Group: A proposal for structured reporting of randomized controlled trials. J Am Med Assoc. 1994, 272: 1926-1931. 10.1001/jama.272.24.1926.

Verhagen AP, de Vet HC, de Bie RA, Kessels AGH, Boers M, Bouter LM, Knipschild PG: The Delphi list: a criteria list for quality assessment of randomized clinical trials for conducting systematic reviews developed by Delphi consensus. J Clin Epidemiol. 1998, 51: 1235-1241. 10.1016/S0895-4356(98)00131-0.

Zaza S, Wright-De Aguero LK, Briss PA, Truman BI, Hopkins DP, Hennessy MH, Sosin DM, Anderson L, Carande-Kullis VG, Teutsch SM, Pappaioanou M: Data collection instrument and procedure for systematic reviews in the guide to community preventive services. Task force on community preventive services. Am J Prevent Med. 2000, 18: 44-74. 10.1016/S0749-3797(99)00122-1.

Haynes BB, Wilczynski N, McKibbon A, Walker CJ, Sinclair J: Developing optimal search strategies for detecting clinically sound studies in MEDLINE. J Am Informatics Assoc. 1994, 1: 447-458.

Greenhalgh T: How to read a paper: papers that report diagnostic or screening tests. BMJ. 1997, 315: 540-543.

Arroll B, Schechter MT, Sheps SB: The assessment of diagnostic tests: a comparison of medical literature in 1982 and 1985. J Gen Int Med. 1988, 3: 443-447.

Lijmer JG, Mol BW, Heisterkamp S, Bonsel GJ, Prins MH, van der Meulen JH, Bossuyt PM: Empirical evidence of design-related bias in studies of diagnostic tests. J Am Med Assoc. 1999, 282: 1061-1066. 10.1001/jama.282.11.1061.

Sheps SB, Schechter MT: The assessment of diagnostic tests. A survey of current medical research. J Am Med Assoc. 1984, 252: 2418-2422. 10.1001/jama.252.17.2418.

McCrory DC, Matchar DB, Bastian L, Dutta S, Hasselblad V, Hickey J, Myers MSE, Nanda K: Evaluation of Cervical Cytology. Evidence Report/Technology Assessment No. 5, AHCPR Publication No.99-E010. Rockville. 1999

Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, Lijmer JG, Moher D, Rennie D, DeVet HCW: Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. Clin Chem. 2003, 49: 1-6. 10.1373/49.1.1.

Greenhalgh T: How to Read a Paper: Assessing the methodological quality of published papers. BMJ. 1997, 315: 305-308.

Angelillo I, Villari P: Residential exposure to electromagnetic fields and childhood leukaemia: a meta-analysis. Bull World Health Org. 1999, 77: 906-915.

Ariens G, Mechelen W, Bongers P, Bouter L, Van der Wal G: Physical risk factors for neck pain. Scand J Work Environ Health. 2000, 26: 7-19.

Hoogendoorn WE, van Poppel MN, Bongers PM, Koes BW, Bouter LM: Physical load during work and leisure time as risk factors for back pain. Scand J Work Environ Health. 1999, 25: 387-403.

Laupacis A, Wells G, Richardson WS, Tugwell P: Users' guides to the medical literature. V. How to use an article about prognosis. Evidence-Based Medicine Working Group. J Am Med Assoc. 1994, 272: 234-237. 10.1001/jama.272.3.234.

Levine M, Walter S, Lee H, Haines T, Holbrook A, Moyer V: Users' guides to the medical literature. IV. How to use an article about harm. Evidence-Based Medicine Working Group. J Am Med Assoc. 1994, 271: 1615-1619. 10.1001/jama.271.20.1615.

Carey TS, Boden SD: A critical guide to case series reports. Spine. 2003, 28: 1631-1634. 10.1097/00007632-200308010-00001.

Greenhalgh T, Taylor R: How to read a paper: papers that go beyond numbers (qualitative research). BMJ. 1997, 315: 740-743.

Hoddinott P, Pill R: A review of recently published qualitative research in general practice. More methodological questions than answers?. Fam Pract. 1997, 14: 313-319. 10.1093/fampra/14.4.313.

Mays N, Pope C: Quality research in health care: Assessing quality in qualitative research. BMJ. 2000, 320: 50-52. 10.1136/bmj.320.7226.50.

Mays N, Pope C: Rigour and qualitative research. BMJ. 1995, 311: 109-112.

Colditz GA, Miller JN, Mosteller F: How study design affects outcomes in comparisons of therapy. I: Medical. Stats Med. 1989, 8: 441-454.

Turlik MA, Kushner D: Levels of evidence of articles in podiatric medical journals. J Am Pod Med Assoc. 2000, 90: 300-302.

Borghouts JAJ, Koes BW, Bouter LM: The clinical course and prognostic factors of non-specific neck pain: a systematic review. Pain. 1998, 77: 1-13. 10.1016/S0304-3959(98)00058-X.

Spitzer WO, Lawrence V, Dales R, Hill G, Archer MC, Clark P, Abenhaim L, Hardy J, Sampalis J, Pinfold SP, Morgan PP: Links between passive smoking and disease: a best-evidence synthesis. A report of the working group on passive smoking. Clin Invest Med. 1990, 13: 17-46.

Sutton AJ, Abrams KR, Jones DR, Sheldon TA, Song F: Systematic reviews of trials and other studies. Health Tech Assess. 1998, 2: 1-276.

Chestnut RM, Carney N, Maynard H, Patterson P, Mann NC, Helfand M: Rehabilitation for Traumatic Brain Injury. Evidence Report/Technology Assessment No. 2, Agency for Health Care Research and Quality Publication No. 99-E006. Rockville. 1999

Lohr KN, Carey TS: Assessing best evidence: issues in grading the quality of studies for systematic reviews. Joint Commission J Qual Improvement. 1999, 25: 470-479.

Greer N, Mosser G, Logan G, Halaas GW: A practical approach to evidence grading. Joint Commission J Qual Improvement. 2000, 26: 700-712.

Harris RP, Helfand M, Woolf SH, Lohr KN, Mulrow CD, Teutsch SM, Atkins D: Current methods of the U.S. Preventive Services Task Force: a review of the process. Am J Prevent Med. 2001, 20: 21-35. 10.1016/S0749-3797(01)00261-6.

Anonymous: How to read clinical journals: IV. To determine etiology or causation. Can Med Assoc J. 1981, 124: 985-990.

Whitten PS, Mair FS, Haycox A, May CR, Williams TL, Hellmich S: Systematic review of cost effectiveness studies of telemedicine interventions. BMJ. 2002, 324: 1434-1437. 10.1136/bmj.324.7351.1434.

Forrest JL, Miller SA: Evidence-based decision making in action: Part 2-evaluating and applying the clinical evidence. J Contemp Dental Pract. 2002, 4: 42-52.

Oxman AD, Guyatt GH: Validation of an index of the quality of review articles. J Clin Epidemiol. 1991, 44: 1271-1278. 10.1016/0895-4356(91)90160-B.

Jones T, Evans D: Conducting a systematic review. Aust Crit Care. 2000, 13: 66-71.

Papadopoulos M, Rheeder P: How to do a systematic literature review. South African J Physiother. 2000, 56: 3-6.

Selker LG: Clinical research in Allied Health. J Allied Health. 1994, 23: 201-228.

Stevens KR: Systematic reviews: the heart of evidence-based practice. AACN Clin Issues. 2001, 12: 529-538.

Devers KJ, Frankel RM: Getting qualitative research published. Ed Health. 2001, 14: 109-117. 10.1080/13576280010021888.

Canadian Journal of Public Health: Review guidelines for qualitative research papers submitted for consideration to the Canadian Journal of Public Health. Can J Pub Health. 2000, 91: I2-

Malterud K: Shared understanding of the qualitative research process: guidelines for the medical researcher. Fam Pract. 1993, 10: 201-206.

Higgs J, Titchen A: Research and knowledge. Physiotherapy. 1998, 84: 72-80.

Maggs-Rapport F: Best research practice: in pursuit of methodological rigour. J Advan Nurs. 2001, 35: 373-383. 10.1046/j.1365-2648.2001.01853.x.

Cutcliffe JR, McKenna HP: Establishing the credibility of qualitative research findings: the plot thickens. J Advan Nurs. 1999, 30: 374-380. 10.1046/j.1365-2648.1999.01090.x.

Andresen EM: Criteria for assessing the tools of disability outcomes research. Arch Phys Med Rehab. 2000, 81: S15-S20. 10.1053/apmr.2000.20619.

Beatie P: Measurement of health outcomes in the clinical setting: applications to physiotherapy. Phys Theory Pract. 2001, 17: 173-185. 10.1080/095939801317077632.

Charnock DF, (Ed): The DISCERN Handbook: Quality criteria for consumer health information on treatment choices. 1998, Radcliffe Medical Press

Pre-publication history

The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/4/22/prepub

Author information

Authors and affiliations

Centre for Allied Health Evidence: A Collaborating Centre of the Joanna Briggs Institute, City East Campus, University of South Australia, North Terrace, Adelaide, 5000, Australia

Persis Katrak, Nicola Massy-Westropp, VS Saravana Kumar & Karen A Grimmer

School of Physiotherapy, The University of Melbourne, Melbourne, 3010, Australia

Andrea E Bialocerkowski

Corresponding author

Correspondence to Karen A Grimmer.

Additional information

Competing interests

No competing interests.

Authors' contributions

PK: sourced critical appraisal tools; categorized the content and psychometric properties of critical appraisal tools.

AEB: synthesis of findings; drafted manuscript.

NMW: sourced critical appraisal tools.

VSK: sourced critical appraisal tools.

KAG: study conception and design; assisted with critiquing critical appraisal tools and categorization of the content and psychometric properties of critical appraisal tools; drafted and reviewed manuscript; addressed reviewers' comments and re-submitted the article.

Electronic supplementary material

Additional file 1: Search strategy. (DOC 30 KB)

About this article

Cite this article

Katrak, P., Bialocerkowski, A.E., Massy-Westropp, N. et al. A systematic review of the content of critical appraisal tools. BMC Med Res Methodol 4, 22 (2004). https://doi.org/10.1186/1471-2288-4-22

Received: 10 May 2004

Accepted: 16 September 2004

Published: 16 September 2004

DOI: https://doi.org/10.1186/1471-2288-4-22

  • Research Consumer
  • Empirical Basis
  • Allied Health Literature
  • Critical Appraisal Tool
  • Sample Size Justification

BMC Medical Research Methodology

ISSN: 1471-2288

Published: 08 April 2022

How to appraise the literature: basic principles for the busy clinician - part 1: randomised controlled trials

Aslam Alkadhimi, Samuel Reeves & Andrew T. DiBiase

British Dental Journal volume 232, pages 475–481 (2022)

Critical appraisal is the process of carefully, judiciously and systematically examining research to adjudicate its trustworthiness and its value and relevance in clinical practice. The first part of this two-part series will discuss the principles of critically appraising randomised controlled trials. The second part will discuss the principles of critically appraising systematic reviews and meta-analyses.

Evidence-based dentistry (EBD) is the integration of the dentist's clinical expertise, the patient's needs and preferences and the most current, clinically relevant evidence. Critical appraisal of the literature is an invaluable and indispensable skill that dentists should possess to help them deliver EBD.

This article seeks to act as a refresher and guide for generalists, specialists and the wider readership, so that they can efficiently and confidently appraise research - specifically, randomised controlled trials - that may be pertinent to their daily clinical practice.

Key points:

• Evidence-based dentistry is discussed.

• Efficient techniques for critically appraising randomised controlled trials are described.

• Important methodological and statistical considerations are explicated.



Author information

Authors and affiliations

Senior Registrar in Orthodontics, The Royal London Hospital Barts Health NHS Trust and East Kent Hospitals University NHS Foundation Trust, London, UK

Aslam Alkadhimi

Dental Core Trainee, East Kent Hospitals University NHS Foundation Trust, UK

Samuel Reeves

Consultant Orthodontist, East Kent Hospitals University NHS Foundation Trust, UK

Andrew T. DiBiase

Contributions

Aslam Alkadhimi contributed to conceptualisation, literature search, original draft preparation and drafting and critically revising the manuscript; Samuel Reeves contributed to original draft preparation and editing; and Andrew DiBiase contributed to supervision, draft editing and critically revising the manuscript.

Corresponding author

Correspondence to Aslam Alkadhimi.

Ethics declarations

The authors declare no competing interests.

Ethical approval and consent to participate did not apply to this study.

About this article

Cite this article

Alkadhimi, A., Reeves, S. & DiBiase, A. How to appraise the literature: basic principles for the busy clinician - part 1: randomised controlled trials. Br Dent J 232, 475–481 (2022). https://doi.org/10.1038/s41415-022-4096-y

Received: 31 January 2021

Accepted: 25 April 2021

Published: 08 April 2022

Issue Date: 08 April 2022

DOI: https://doi.org/10.1038/s41415-022-4096-y



Critical Appraisal Checklists

We offer a number of free downloadable checklists to help you more easily and accurately perform critical appraisal across a number of different study types.

The CASP checklists are easy to understand but in case you need any further guidance on how they are structured, take a look at our guide on how to use our CASP checklists .

CASP Randomised Controlled Trial Checklist

CASP Systematic Review Checklist

CASP Qualitative Studies Checklist

CASP Cohort Study Checklist

CASP Diagnostic Study Checklist

CASP Case Control Study Checklist

CASP Economic Evaluation Checklist

CASP Clinical Prediction Rule Checklist

Checklist Archive

  • CASP Randomised Controlled Trial Checklist 2018 fillable form
  • CASP Randomised Controlled Trial Checklist 2018



Volume 24, Issue 2

Five tips for developing useful literature summary tables for writing review articles

  • Ahtisham Younas 1, 2
  • Parveen Ali 3, 4
  • 1 Memorial University of Newfoundland, St John's, Newfoundland, Canada
  • 2 Swat College of Nursing, Pakistan
  • 3 School of Nursing and Midwifery, University of Sheffield, Sheffield, South Yorkshire, UK
  • 4 Sheffield University Interpersonal Violence Research Group, Sheffield University, Sheffield, UK
  • Correspondence to Ahtisham Younas, Memorial University of Newfoundland, St John's, NL A1C 5C4, Canada; ay6133@mun.ca

https://doi.org/10.1136/ebnurs-2021-103417


Introduction

Literature reviews offer a critical synthesis of empirical and theoretical literature to assess the strength of evidence, develop guidelines for practice and policymaking, and identify areas for future research. 1 A review is often essential, and usually the first task, in any research endeavour, particularly in masters or doctoral level education. For effective data extraction and rigorous synthesis in reviews, the use of literature summary tables is of utmost importance. A literature summary table provides a synopsis of an included article: it succinctly presents the article's purpose, methods, findings and other information pertinent to the review, so that the reader can take in this information at one glance. Since there are multiple types of reviews (eg, systematic, integrative, scoping, critical and mixed methods) with distinct purposes and techniques, 2 there can be various approaches to developing literature summary tables, making it a complex task, especially for novice researchers or reviewers. Here, we offer five tips for authors of review articles, relevant to all types of reviews, for creating useful and relevant literature summary tables. We also provide examples from our published reviews to illustrate how useful literature summary tables can be developed and what sort of information should be provided.
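In code terms, a literature summary table is simply one structured record per included article. The sketch below builds such a table with pandas; the column names follow the elements recommended in this article and in the tips below, and the row content is entirely hypothetical.

```python
import pandas as pd

# One hypothetical record per reviewed article; columns mirror the elements
# this article recommends capturing in a literature summary table.
summary = pd.DataFrame([
    {
        "Citation": "Doe et al. (2020)",
        "Purpose": "Explore women's care-seeking behaviours",
        "Framework/Methods": "Hermeneutic phenomenology; interviews (n=12)",
        "Key findings": "Three themes describing delays in seeking care",
        "Strengths/Limitations": "Rich data; single-site convenience sample",
        "Conceptual contribution": "Socio-cultural factors act as precursors of delay",
    },
])

print(summary.to_string(index=False))
```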

Tip 1: provide detailed information about frameworks and methods

Figure 1. Tabular literature summaries from a scoping review. Source: Rasheed et al. 3

The provision of information about conceptual and theoretical frameworks and methods is useful for several reasons. First, in quantitative reviews (reviews synthesising the results of quantitative studies) and mixed reviews (reviews synthesising the results of both qualitative and quantitative studies to address a mixed review question), it allows the readers to assess the congruence of the core findings and methods with the adopted framework and tested assumptions. In qualitative reviews (reviews synthesising results of qualitative studies), this information helps readers recognise the underlying philosophical and paradigmatic stance of the authors of the included articles. For example, imagine the authors of an article included in a review used phenomenological inquiry for their research. In that case, the review authors and the readers of the review need to know what kind of philosophical stance (transcendental or hermeneutic) guided the inquiry. Review authors should, therefore, include the philosophical stance in their literature summary for that particular article. Second, information about frameworks and methods enables review authors and readers to judge the quality of the research, which allows for discerning the strengths and limitations of the article. For example, suppose the authors of an included article intended to develop a new scale and test its psychometric properties, and to achieve this aim they used a convenience sample of 150 participants and performed exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) on the same sample. Such an approach would indicate a flawed methodology, because EFA and CFA should not be conducted on the same sample. The review authors must include this information in their summary table; omitting it could lead to the inclusion of a flawed article in the review, thereby jeopardising the review's rigour.
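The EFA/CFA pitfall above has a simple procedural fix: split the sample once, derive the factor structure on one half, and confirm it on the other. A minimal sketch of the split (the response matrix is simulated, and run_efa/run_cfa are hypothetical placeholders for whichever factor-analysis routines are used):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Simulated item responses: 150 participants x 20 Likert items (1-5).
rng = np.random.default_rng(0)
responses = rng.integers(1, 6, size=(150, 20))

# Split once, so the exploratory and confirmatory analyses never see the
# same participants; running EFA and CFA on one sample is the flaw above.
efa_half, cfa_half = train_test_split(responses, test_size=0.5, random_state=0)

# run_efa(efa_half)  # hypothetical: derive the factor structure on one half
# run_cfa(cfa_half)  # hypothetical: confirm that structure on the held-out half
```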

Tip 2: include strengths and limitations for each article

Critical appraisal of individual articles included in a review is crucial for increasing the rigour of the review. Despite using various templates for critical appraisal, authors often do not provide detailed information about each reviewed article’s strengths and limitations. Merely noting the quality score based on standardised critical appraisal templates is not adequate, because readers should be able to identify the reasons for assigning a weak or moderate rating. Many recent critical appraisal checklists (eg, the Mixed Methods Appraisal Tool) discourage review authors from assigning a quality score and instead recommend noting the main strengths and limitations of included studies. It is also vital to report both methodological and conceptual strengths and limitations, because not all reviews include empirical research papers; some synthesise the theoretical aspects of articles. Providing information about conceptual limitations also helps readers judge the quality of the foundations of the research. For example, if you included a mixed-methods study in the review, reporting the methodological and conceptual limitations around ‘integration’ is critical for evaluating the study’s strength. Suppose the authors only collected qualitative and quantitative data and did not state the intent and timing of integration. In that case, the study is weak, because integration occurred only at the level of data collection and may not have occurred at the analysis, interpretation and reporting levels.

Tip 3: write conceptual contribution of each reviewed article

While reading and evaluating review papers, we have observed that many review authors only provide the core results of the articles included in a review and do not explain their conceptual contribution. We refer to conceptual contribution as a description of how the article’s key results contribute towards the development of potential codes, themes or subthemes, or emerging patterns that are reported as the review findings. For example, the authors of a review article noted that one of the research articles included in their review demonstrated the usefulness of case studies and reflective logs as strategies for fostering compassion in nursing students. The conceptual contribution of this research article could be that experiential learning is one way to teach compassion to nursing students, as supported by case studies and reflective logs. This conceptual contribution should be mentioned in the literature summary table. Delineating each reviewed article’s conceptual contribution is particularly beneficial in qualitative reviews, mixed-methods reviews, and critical reviews that often focus on developing models and describing or explaining various phenomena. Figure 2 offers an example of a literature summary table. 4

Figure 2. Tabular literature summaries from a critical review. Source: Younas and Maddigan. 4

Tip 4: compose potential themes from each article during summary writing

While developing literature summary tables, many authors use themes or subthemes reported in the given articles as the key results of their own review. Such an approach prevents the review authors from understanding the article’s conceptual contribution, developing a rigorous synthesis and drawing reasonable interpretations of results from an individual article. Ultimately, it affects the generation of novel review findings. For example, one of the articles about women’s healthcare-seeking behaviours in developing countries reported a theme ‘social-cultural determinants of health as precursors of delays’. Instead of using this theme as one of the review findings, the reviewers should read and interpret beyond the description given in the article, comparing and contrasting themes and findings from one article with those from another to identify similarities and differences and to understand and explain the bigger picture for their readers. Therefore, while developing literature summary tables, think twice before using predeveloped themes. Including your own themes in the summary tables (see figure 1) demonstrates to the readers that a robust method of data extraction and synthesis has been followed.

Tip 5: create your personalised template for literature summaries

Templates are often available for data extraction and the development of literature summary tables. The available templates may be in the form of a table, chart or structured framework that extracts essential information about every article. The commonly used information includes authors, purpose, methods, key results and quality scores. While extracting all relevant information is important, such templates should be tailored to meet the needs of the individual review. For example, for a review about the effectiveness of healthcare interventions, a literature summary table must include information about the intervention: its type, content, timing, duration, setting, effectiveness, negative consequences, and receivers’ and implementers’ experiences of its use. Similarly, literature summary tables for articles included in a meta-synthesis must include information about the participants’ characteristics, the research context and the conceptual contribution of each reviewed article, so that the reader can make an informed decision about the usefulness of each individual article to the review as a whole.
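
To make this tip concrete, here is a minimal sketch of such a personalised template in Python. The field names are illustrative assumptions drawn from the commonly used items above and from Tips 1 to 4, not a prescribed standard; they should be tailored to the individual review.

```python
from dataclasses import dataclass, field

# Illustrative row of a literature summary table; the fields are assumptions
# based on the commonly used items named above plus Tips 1-4.
@dataclass
class SummaryRow:
    authors: str
    year: int
    purpose: str
    framework: str                      # conceptual/theoretical framework (Tip 1)
    methods: str                        # design, sample, analysis (Tip 1)
    key_results: str
    strengths: list[str] = field(default_factory=list)         # Tip 2
    limitations: list[str] = field(default_factory=list)       # Tip 2
    conceptual_contribution: str = ""                           # Tip 3
    potential_themes: list[str] = field(default_factory=list)  # Tip 4

# Placeholder values only, to show how a completed row might look.
row = SummaryRow(
    authors="Author A, Author B",
    year=2020,
    purpose="Explore strategies for teaching compassion",
    framework="Experiential learning",
    methods="Qualitative case study, n=15",
    key_results="Reflective logs fostered compassion",
    strengths=["Clear philosophical stance"],
    limitations=["Single-site convenience sample"],
    conceptual_contribution="Experiential learning as a way to teach compassion",
    potential_themes=["learning by doing"],
)
print(row.authors, "->", row.potential_themes)
```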

In conclusion, narrative or systematic reviews are almost always conducted as a part of any educational project (thesis or dissertation) or academic or clinical research. Literature reviews are the foundation of research on a given topic. Robust and high-quality reviews play an instrumental role in guiding research, practice and policymaking. However, the quality of reviews is also contingent on rigorous data extraction and synthesis, which require developing literature summaries. We have outlined five tips that could enhance the quality of the data extraction and synthesis process by developing useful literature summaries.

Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

Competing interests None declared.

Patient consent for publication Not required.

Provenance and peer review Not commissioned; externally peer reviewed.

  • Open access
  • Published: 17 November 2023

Reversing frailty in older adults: a scoping review

  • Aurélie Tonjock Kolle 1 ,
  • Krystina B. Lewis 1 , 2 , 3 ,
  • Michelle Lalonde 1 , 4 &
  • Chantal Backman 1 , 2 , 5  

BMC Geriatrics, volume 23, Article number: 751 (2023)

Individuals 65 years or older are presumably more susceptible to becoming frail, which increases their risk of multiple adverse health outcomes. Reversing frailty has received recent attention; however, little is understood about what it means and how to achieve it. Thus, the purpose of this scoping review is to synthesize the evidence regarding the impact of frailty-related interventions on older adults living with frailty, identify which interventions resulted in frailty reversal and clarify the concept of reverse frailty.

We followed Arksey and O’Malley’s five-stage scoping review approach and conducted searches in CINAHL, EMBASE, PubMed, and Web of Science. We hand-searched the reference lists of included studies and conducted a grey literature search. Two independent reviewers completed title and abstract screening and full-text review using the eligibility criteria, and independently extracted data from approximately 10% of the studies. We critically appraised studies using the Joanna Briggs Institute critical appraisal checklists, and we used a descriptive and narrative method to synthesize and analyze the data.

Of 7499 articles, thirty met the criteria and three studies were identified in the references of included studies. Seventeen studies (56.7%) framed frailty as a reversible condition, with 11 studies (36.7%) selecting it as their primary outcome. Reversing frailty varied from either frail to pre-frail, frail to non-frail, and severe to mild frailty. We identified different types of single and multi-component interventions each targeting various domains of frailty. The physical domain was most frequently targeted (n = 32, 97%). Interventions also varied in their frequencies of delivery, intensities, and durations, and targeted participants from different settings, most commonly from community dwellings (n = 23; 69.7%).

Some studies indicated that it is possible to reverse frailty. However, this depended on how the researchers assessed or measured frailty. The current understanding of reverse frailty is a shift from a frail or severely frail state to at least a pre-frail or mildly frail state. To gain further insight into reversing frailty, we recommend a concept analysis. Furthermore, we recommend more primary studies considering the participant’s lived experiences to guide intervention delivery.

Background

Within the next few decades, the population of people aged 65 and over will continue to rise more than all other age groups, with roughly one in six people over 65 by 2050, compared to one in eleven in 2019 [ 1 ]. Individuals over 65 years are presumably at greater risk of becoming frail [ 2 , 3 , 4 ]. Theoretically, frailty is considered a clinically recognized state of vulnerability that results from an age-related decline in reserve and function, compromising an individual’s ability to cope with the daily challenges of life [ 5 , 6 ]. The Frailty Phenotype (FP), which is the most dominant conceptual model in literature [ 3 , 7 , 8 , 9 , 10 ], considers an individual frail by the presence of at least three of five phenotypes: weakness, low levels of physical activity, unintentional weight loss, slow walking speed, and exhaustion. Physical, cognitive, psychological, and social impairments often characterize the different domains of frailty [ 11 ]. The physical domain is devoted to FP-related conditions [ 12 ], the cognitive domain is the co-existence of physical deficits and mild cognitive impairments [ 13 ], the psychological domain focuses on an individual’s coping mechanisms based on their own experiences [ 14 ], and the social domain looks at a person’s limited participation in social activities and limitations in social support [ 15 ]. Frail older adults are prone to adverse outcomes such as frequent falls, hospitalizations, disabilities, loneliness, cognitive decline, depression, poor quality of life, and even death [ 16 , 17 , 18 ]. In response, researchers have proposed various interventions to prevent or slow frailty progression by either targeting a single domain (e.g., physical, social, cognitive, etc.) using single component interventions or targeting two or more domains using multi-component interventions.
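
As a minimal sketch of the FP classification just described: an individual is frail with at least three of the five phenotype criteria. Treating one or two criteria as pre-frail and zero as non-frail is an assumption here, consistent with common usage of Fried's criteria rather than stated explicitly above.

```python
# The five FP criteria named in the text.
FP_CRITERIA = ("weakness", "low_physical_activity",
               "unintentional_weight_loss", "slow_walking_speed", "exhaustion")

def classify_fp(present: set[str]) -> str:
    # Count how many of the five criteria are present.
    n = sum(1 for c in FP_CRITERIA if c in present)
    if n >= 3:
        return "frail"          # stated in the text
    return "pre-frail" if n >= 1 else "non-frail"  # assumed cut-offs

print(classify_fp({"weakness", "exhaustion", "slow_walking_speed"}))  # frail
```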

For example, Hergott and colleagues investigated the effects of a single-component intervention, functional exercise, on acromegaly-induced frailty [19]. Abizanda and colleagues examined the effects of a multi-component intervention, composed of nutrition and physical activity, on frail older people’s physical function and quality of life [20]. Some studies indicate that certain single or multi-component interventions can reduce frailty, slow its progression, or possibly reverse it [3, 21, 22]. The current understanding of reverse frailty lacks clarity, and the characteristics of interventions related to frailty reversal have not yet been examined in a systematic manner.

Authors have determined the reversal of frailty using various measures. For instance, Kim and colleagues’ study evaluating an intervention composed of exercise and nutritional supplementation in frail elderly community-dwellers demonstrated reversals in FP components such as fatigue, low physical activity and slow walking speed: an improvement from five frailty components present (according to the FP) to two, considered a pre-frail state [23]. Conversely, De Souto and colleagues demonstrated frailty reversal based on changes in frailty index (FI) scores, a measure of the accumulation of deficits [24]. An FI score of 0.22 or greater indicates frailty, whereas a score of 0.10 or less indicates a non-frail state [25, 26, 27, 28, 29]. Hergott et al. (2020) used frailty severity to indicate frailty reversal: participants in their study reversed frailty from a severe state to a mild state [19]. These studies demonstrate the variability in how reversing frailty is measured and understood. For a more comprehensive understanding of reverse frailty and the characteristics of interventions associated with it, a comprehensive review of the literature on this topic is needed. Therefore, through a scoping review, the aim of this study is to provide an overview and synthesis of interventions that have been implemented for frail older adults, to determine whether some interventions have had an impact on reversing frailty.
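
A small sketch of how an FI-based judgement works, following the accumulation-of-deficits idea (the FI is the proportion of considered deficits that are present) and the cut-offs cited above. Labelling the intermediate range "pre-frail" is our assumption for illustration.

```python
def frailty_index(deficits_present: int, deficits_considered: int) -> float:
    # FI = proportion of considered deficits that are present.
    return deficits_present / deficits_considered

def classify_fi(fi: float) -> str:
    if fi >= 0.22:               # cut-off cited in the text
        return "frail"
    if fi <= 0.10:               # cut-off cited in the text
        return "non-frail"
    return "pre-frail"           # assumed label for the intermediate range

fi = frailty_index(9, 40)                 # e.g., 9 of 40 deficits -> 0.225
print(round(fi, 3), classify_fi(fi))      # 0.225 frail
```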

A scoping review is ideal because it encompasses a broad scope and can comprehensively analyze and synthesize data on a subject [30]. Findings from this review will synthesize the evidence regarding the impact of frailty-related interventions on older adults living with frailty, identify which interventions resulted in frailty reversal and clarify the concept of reverse frailty.

Guiding conceptual framework

The deficit accumulation model framework, unlike the FP, considers frailty as more than a physical deficit but rather an accumulation of health-related deficits across multiple domains [ 31 ]. For this reason, the deficit accumulation model framework serves as our guiding conceptual framework. Through this framework, we recognize frailty as a complex phenomenon, strengthening the case for interventions addressing other health and personal concerns, such as illness, environmental disturbance, social dysfunction, cognitive decline, and psychosocial distress. This framework provides a helpful lens through which we can examine the number of domains addressed in the reported interventions and their relationship to one another.

Methods

We followed Arksey and O’Malley’s [30] five-stage approach, elaborated by Levac et al. [32] and the Joanna Briggs Institute (JBI) for scoping reviews [33], which together propose six stages: (1) identifying the research question, (2) identifying relevant studies, (3) selecting studies, (4) charting the data, (5) summarizing and reporting results, and (6) consulting with stakeholders. We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews (PRISMA-ScR) checklist [34] to guide study reporting. Refer to Additional file 1.

Stage one: identifying the research question

According to Levac and colleagues, fundamental research questions should be broad enough to enable comprehensive analysis and appropriate mapping of relevant literature [ 32 ]. Following this, our three research questions are as follows:

1. What is the available literature on the impact of interventions for frail older adults?
2. Did any of these interventions result in frailty reversal?
3. What does it mean to reverse frailty?

Stage two: identifying relevant studies

Using the research questions as a guide, we engaged in an iterative process that involved searching the literature, identifying search terms, and developing and refining search strategies to identify appropriate studies. We also sought the assistance of an experienced librarian, who gave guidance on the use of various electronic databases, validated the appropriateness of the methodology for this study, and peer-reviewed the search strategies. An overview of each step is provided below.

Eligibility criteria

JBI’s PCC mnemonic guided the eligibility criteria, where P (population) was frail older people over 65 years of age, C (concept) was frailty outcome, and C (context) was all contexts. We included French- and English-language studies of frail older adults over 65 years because most studies focused on frailty target this age group [35, 36, 37, 38]. All types of interventions for frail older adults were included, except for interventions intended to prevent frailty. We did not apply any limitations on study dates or settings. All study designs (quantitative, qualitative, and mixed methods) were considered for inclusion. We excluded conference abstracts, theses, dissertations, and knowledge syntheses, but did refer to their reference lists for potential studies. Lastly, we performed a grey literature scan to identify relevant primary studies and ensure a comprehensive literature search.

Search terms

An a priori concept analysis [39] of frailty and frailty interventions revealed relevant search terms regarding the population of interest, which included ‘frail elderly, frail, aged hospital patient, institutionalized elderly, very elderly, geriatrics, senior, and aged’. These keywords were presented to and approved by an academic librarian (VL). To capture a comprehensive list of potentially relevant studies, we looked at all types of interventions on frail older adults aimed at either reducing, improving, managing, enhancing, treating, or reversing frailty. Medical Subject Headings (MeSH) and Boolean operators of these terms were used in different databases to identify relevant studies.
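
As an illustration only, the population keywords listed above could be combined with Boolean operators as sketched below. This is not the authors' actual MEDLINE strategy (which appears in Fig. 1), and the intervention terms are hypothetical placeholders.

```python
# Population keywords quoted in the text; intervention terms are placeholders.
population = ["frail elderly", "frail", "aged hospital patient",
              "institutionalized elderly", "very elderly", "geriatrics",
              "senior", "aged"]
intervention = ["reduce frailty", "reverse frailty", "treat frailty"]

def or_block(terms: list[str]) -> str:
    # Join synonymous terms with OR inside one parenthesised block.
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

# AND the two concept blocks together, as a typical Boolean search would.
query = f"{or_block(population)} AND {or_block(intervention)}"
print(query)
```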

Search strategy

Two academic librarians (VL & VC) guided the development of the search strategy and selected databases. We conducted the searches between August 6th and August 9th, 2021, using MEDLINE (OVID interface), Embase (OVID interface), Cumulative Index to Nursing and Allied Health Literature (CINAHL), and Web of Science. We first implemented the search in MEDLINE (Fig. 1), which we later adapted for the other three databases. We manually searched for relevant studies from the reference lists of included/eligible articles and reviewed conference abstracts and secondary analyses to identify primary studies. A third academic librarian (LS) peer-reviewed the search strategy using the Peer Review of Electronic Search Strategies (PRESS) guidelines [40] on August 19th, 2021; no modifications were suggested. On August 23rd, 2021, we imported the results in RIS format into Covidence, a web-based systematic review platform [41, 42], which also removed duplicates. We did not import the articles identified via hand-searching the reference lists into Covidence for screening; instead, two reviewers independently assessed those articles’ eligibility according to our eligibility criteria.

Figure 1. Ovid MEDLINE search strategy

Stage three: study selection

Two reviewers (AK, OB) were involved in this stage, which comprised two screening levels. At the first level, each reviewer independently screened titles and abstracts, voting ‘yes’, ‘no’, or ‘maybe’. To qualify for full-text screening, a study had to receive two ‘yes’ or two ‘maybe’ votes. Two ‘no’ votes moved the study to exclusion, and one ‘no’ vote along with one ‘yes’ or ‘maybe’ vote moved it to conflicts, pending resolution. The first author (AK) and the second reviewer (OB) resolved the conflicts together. Following this first-level screen, the second level involved a full-text review of all studies included at the title-abstract level. Using the same principles as the first-level screening, the first author (AK) and another reviewer (MA) completed this stage [41, 42]. In cases where full-text articles could not be located or had to be purchased, the corresponding authors were contacted once by email to request copies. We excluded the articles if we did not receive a response after two weeks. We also searched Google Scholar for conference abstracts to see whether the full text of the papers had been published and was accessible. This process was largely ineffective, leading to the exclusion of all conference abstracts. Articles excluded with reasons can be found in Additional file 2.
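
The vote-resolution rule described above can be summarised in a short sketch. How a mixed 'yes'/'maybe' pair was handled is not stated in the text, so routing it to conflict resolution here is an assumption.

```python
def screening_decision(vote1: str, vote2: str) -> str:
    # Each vote is 'yes', 'no', or 'maybe'; sorting makes pairs order-free.
    votes = sorted([vote1, vote2])
    if votes == ["yes", "yes"] or votes == ["maybe", "maybe"]:
        return "advance to full-text review"   # two 'yes' or two 'maybe'
    if votes == ["no", "no"]:
        return "exclude"                       # two 'no'
    return "conflict (resolve by discussion)"  # all other pairings (assumed
                                               # to include 'yes'+'maybe')

print(screening_decision("yes", "no"))        # conflict (resolve by discussion)
print(screening_decision("maybe", "maybe"))   # advance to full-text review
```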

Stage four: charting the data

To extract essential information from the articles, we developed a standard Microsoft Excel form a priori. We used the Template for Intervention Description and Replication (TIDieR) checklist [43] to guide the extraction of the interventions. The form was pilot tested with five articles and revised following recommendations from the research team. After establishing the information to be extracted, we imported the data into Google Forms to facilitate the extraction process for the reviewers. To ensure consistency and reliability in data extraction, two reviewers (AK and MA) independently extracted data from at least 10% of the included studies and compared the results, as recommended by Levac and colleagues [32]. Once we established consistency, the first author (AK) extracted data from the remaining studies.

Data extracted

Data extraction items included bibliographic information (authors, journal title and year of publication), setting, study population (frailty status, number and age of participants), aims of the study, the conceptual framework of frailty used, domains of frailty considered, details on interventions that reduce, enhance, treat or reverse frailty, the framework used to develop the interventions, assessment tools or instruments used to assess the frailty outcome before and/or after the intervention, and outcomes (frailty completely, partially, or not reversed). Data extraction items can be found in Additional file 3.

Quality appraisal (QA)

We critically appraised the strengths and limitations of the included studies (e.g., randomized controlled trials, quasi-experimental studies, case reports, case series, and cohort studies) using the corresponding JBI checklist for quality appraisal. Checklists ranged from eight to 13 items [35]. Answers to the questions in each scale were ‘yes’, ‘no’, or ‘unclear’. Three reviewers (YA, MA, and AK) independently appraised the included studies. After completing the assessment, the first author (AK) sorted the answers to determine any discrepancies. When two reviewers reported the same answer, agreement was achieved. When answers differed, the first author extensively reviewed the study and discussed the differences with the other two to reach a consensus. After completion, we converted all the answers into descriptive variables, with ‘yes’ coded as 1 and ‘no’ or ‘unclear’ coded as 0. Following recommendations from some studies [44, 45], we used these variables to generate a total score, which we further used to classify a study as having a ‘low’, ‘moderate’, or ‘high’ risk of bias. The quality appraisal interpretation scale can be found in Additional file 4.
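
A sketch of this scoring step follows. The thresholds mapping a total score to a risk-of-bias category are placeholders, since the authors' actual interpretation scale is in Additional file 4 and is not reproduced in the text.

```python
# 'yes' -> 1, 'no'/'unclear' -> 0, summed into a total score per study.
def qa_score(answers: list[str]) -> int:
    return sum(1 for a in answers if a == "yes")

# Placeholder cut-offs only; the authors' scale is in Additional file 4.
def risk_of_bias(score: int, n_items: int) -> str:
    p = score / n_items                 # proportion of criteria met
    if p >= 0.75:
        return "low"
    return "moderate" if p >= 0.50 else "high"

answers = ["yes", "yes", "unclear", "yes", "no", "yes", "yes", "yes"]
s = qa_score(answers)                   # 6 of 8 items met
print(s, risk_of_bias(s, len(answers)))  # -> 6 low
```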

Stage five: summarizing and reporting the results

Data analysis

To summarize and elaborate on the first research question, we used a narrative synthesis. Initially, we developed a preliminary synthesis by grouping studies that focused on similar concepts (such as, but not limited to, types of interventions, domains of frailty targeted, and outcomes of interventions) into a tabular format. Next, using Excel, we created bar graphs to explore relationships between and within studies. Through the use of conceptual mapping, we linked multiple pieces of evidence from individual studies to highlight key concepts and ideas [46, 47].

Our approach to answering the second research question, comparing study demographics and participant characteristics, was descriptive in nature. Using Excel, we calculated the counts and frequencies of variables in each category and compared their percentages across studies [ 48 ].
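
For instance, the counts-and-percentages step could look like the following pandas sketch; the column name and values are placeholders, not study data.

```python
import pandas as pd

# Placeholder data: one row per included study, one categorical variable.
studies = pd.DataFrame({
    "setting": ["community", "community", "hospital",
                "nursing home", "community"],
})

counts = studies["setting"].value_counts()      # counts per category
pct = (counts / len(studies) * 100).round(1)    # percentages across studies
print(pd.DataFrame({"n": counts, "%": pct}))
```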

Results

Study selection

We identified 7499 potential records, of which thirty met eligibility criteria. In addition, our hand search of references of included studies revealed three eligible studies, reaching a total of thirty-three. We illustrate the screening and selection process for the included studies using the PRISMA 2020 flow diagram for systematic reviews (Fig.  2 ).

Figure 2. PRISMA flow diagram of the search process for studies

Study characteristics

Sample sizes ranged from one to 250,428 participants across the studies. The most common study designs were randomized controlled trials (RCTs) (n = 23) [22, 23, 24, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68], quasi-experimental studies (n = 4) [69, 70, 71, 72], cohort studies (n = 3) [20, 73, 74], case series (n = 2) [75, 76] and a case report (n = 1) [19]. Geographically, the studies took place in fifteen different countries: Japan (n = 6) [23, 49, 53, 58, 72, 74], Spain (n = 6) [20, 59, 60, 62, 70, 75], United States of America (n = 4) [19, 63, 64, 68], China (n = 3) [51, 52, 69], Sweden (n = 2) [50, 55], South Korea (n = 2) [71, 76], Singapore (n = 2) [22, 54], Australia (n = 1) [66], Netherlands (n = 1) [65], Canada (n = 1) [73], France (n = 1) [24], Brazil (n = 1) [67], Thailand (n = 1) [56], Turkey (n = 1) [57], and Denmark (n = 1) [61]. Publication dates ranged from June 23rd, 1994, to January 2nd, 2021, with most articles (n = 24) published after 2015.

Critical appraisal results

The quality assessment scores of the studies ranged from seven to twelve, and the risk of bias was low to moderate for all included studies (Additional file 4). Given that scoping reviews do not mandate the inclusion of studies based on critical appraisal results [77], we did not exclude studies based on their quality assessment scores.

Participant characteristics

Twelve studies (36.4%) included participants over 65 years of age, 11 studies (33.3%) over 70 years of age, and 10 studies (30.3%) over 75 years of age. Most authors referred to participants as male or female without definition, making it difficult to distinguish between gender and sex; consequently, we present the results as reported in the studies. All but one study reported the sex/gender of participants [57], with one study having only male participants [19] and two studies having only female participants as per their eligibility criteria [23, 61]. In many studies (n = 27), the presence of comorbidities beyond frailty was not a requirement for participation. Some studies, however, required comorbid conditions for inclusion, such as acromegaly (n = 1) [19], cardiovascular disease (n = 1) [72], chronic obstructive pulmonary disease/lung disease (n = 1) [60], fatigue (n = 1) [69], and risk of mobility disability and sedentary lifestyle (n = 1) [64]. Table 1 presents a summary of participant characteristics.

Most and least common domains targeted

Twenty-six studies involved intervention and control groups, and each study’s intervention targeted at least one domain of frailty: a single domain (n = 23) [19, 20, 23, 49, 50, 52, 53, 55, 56, 57, 59, 60, 61, 62, 63, 64, 65, 67, 68, 70, 72, 73, 74], two domains (n = 6) [4, 22, 54, 56, 57, 78], three domains (n = 2) [58, 66], or four domains (n = 2) [51, 71]. Counts per domain are presented in Fig. 3. The most targeted domains were the physical and cognitive domains; the social domain was the least targeted.

Figure 3. Breakdown of the domains identified in studies

Single and multi-component interventions

Thirteen studies (39.4%) focused on single-component interventions; twelve were physical activity interventions [52, 53, 56, 60, 62, 63, 64, 67, 70, 73, 76], and one was a social intervention [74]. These activities were either individually tailored or performed in a group. Over 50% of the studies focused on multi-component interventions [19, 20, 22, 23, 24, 49, 50, 51, 54, 55, 58, 59, 65, 66, 68, 69, 71, 72, 75]. The number of components varied across interventions: two (n = 10) [20, 23, 49, 50, 55, 59, 65, 68, 69, 75], three (n = 8) [19, 22, 24, 54, 58, 66, 71, 72], or four (n = 2) [51, 71]. Characteristics of the interventions are included in Table 2.

Most and least common frailty definitions used

Frailty was defined in all but three studies (n = 30) [49, 61, 68]. Two definitions of frailty dominated: Fried’s phenotype (n = 20) [20, 22, 23, 51, 52, 53, 54, 56, 57, 59, 62, 64, 66, 67, 69, 70, 71, 72, 75, 76] and the Frailty Index (n = 4) [24, 60, 71, 73]. Other definitions involved the use of the Clinical Frailty Scale [19] and checklists such as the Kihon checklist [74].

Studies without frailty reversal outcome

Of the 33 included studies, 22 did not indicate reversal of frailty. Among these, 36.4% (n = 8) focused solely on physical interventions [53, 57, 60, 61, 62, 63, 64, 76], while 63.6% (n = 14) combined physical activity with nutritional, cognitive, social, pharmaceutical, or behavioral interventions [20, 24, 49, 50, 51, 54, 55, 58, 65, 66, 68, 69, 71, 75]. Although physical activity remains a significant factor in these studies, the types of physical activity (aerobic, strengthening, gait, resistance, etc.) varied. Research suggests that resistance exercise performed at high intensity over a minimum of 12 weeks has the most beneficial effect on physical frailty [68, 79]. When done regularly over the course of six months, it has the potential to improve both the physical and physiological aspects of frailty [80]. In this context, we noted that resistance exercise was more prevalent than other forms of physical activity. Although similar physical activities were often implemented, their characteristics often differed: for example, frequency varied from daily to three times per week, intensity from moderate to high, and duration from 6 weeks to 6 months.

In addition to physical activity, other types of interventions were also used, including cognitive interventions such as memory and reasoning training, pharmaceutical interventions such as medication reconciliation, social interventions such as improving social lifestyles, and behavioral interventions such as goal setting, action plans, and goal execution. Similarly, the characteristics of these interventions were heterogeneous across studies, with some provided as group therapies, and others designed as per the needs of participants.

Studies indicating frailty reversal outcome

Eleven studies reported frailty reversal as an outcome [19, 22, 52, 56, 59, 67, 70, 72, 73, 74, 81]. The physical domain was targeted in over 80% of these studies (n = 9) [19, 23, 52, 56, 59, 67, 70, 72, 73], while the social [74] and cognitive domains [22] were each targeted in one study. Among single-component interventions such as physical activities (n = 5) [52, 56, 67, 70, 73], resistance exercise appeared to be the most common, done on its own or in combination with other physical exercises. Meanwhile, the social intervention enhanced the patients’ social capital, a social network that facilitates access to benefits and helps individuals solve problems through association [74].

The multi-component interventions consisted of physical activity combined with either nutritional counselling/advice or supplements (n = 5). Other combinations included physical activity and nutrition plus a pharmaceutical intervention in one study [72]; physical activity and nutrition plus a cognitive intervention in another study [22]; and physical activity combined with occupational and speech therapy [19], with intervention characteristics varying across studies.

Definition/clarity about the concept of reverse frailty

Authors of 17 studies referred to frailty as a reversible condition. However, the concept of reversing frailty was not defined or explained in six studies [22, 54, 57, 58, 63, 64]. When defined, definitions varied. Some authors defined it as a shift from a frail to a pre-frail state (n = 1) [56], frail to non-frail (n = 2) [24, 59], frail to pre- or non-frail (n = 7) [23, 52, 67, 70, 72, 73, 74], and severe frailty to mild frailty (n = 1) [19]. What was common across all definitions is that the direction of reversal was from a more severe state of frailty to a less severe or pre-frail state. What differed is the degree of frailty at baseline, given that some definitions required a participant to be frail while others required participants to be severely frail. This suggests the use of different definitions, criteria, methods, and measures to determine whether frailty reversal occurred. For example, seven of the studies that showed reversal used the definition of Fried et al. [23, 52, 56, 59, 67, 70, 72], one study used the frailty index [73], and another study used the clinical frailty scale [19]. Finally, one study used the Kihon checklist, consisting of 25 yes-or-no questions on daily-life-related activities, motor functions, nutritional status, oral functions, homebound status, cognitive functions, and depressed mood [74].
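
The shared direction across these definitions can be captured in a small sketch. The ordinal coding of states below is an assumption for illustration, since the studies assigned states with different instruments.

```python
# Assumed ordinal severity scale, from least to most severe.
SEVERITY = {"non-frail": 0, "pre-frail": 1, "mildly frail": 2,
            "frail": 3, "severely frail": 4}

def frailty_reversed(baseline: str, follow_up: str) -> bool:
    # Common thread of the definitions: the participant is at least frail
    # at baseline, and the follow-up state is less severe than baseline.
    return (SEVERITY[baseline] >= SEVERITY["mildly frail"]
            and SEVERITY[follow_up] < SEVERITY[baseline])

print(frailty_reversed("frail", "pre-frail"))               # True
print(frailty_reversed("severely frail", "mildly frail"))   # True
```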

Discussion

Our study aimed to summarize and synthesize evidence on the impact of interventions on frail older adults, to identify those that resulted in frailty reversal and those that did not. In cases where frailty reversal was indicated, we explored the meaning of the concept of reversing frailty. Among the 33 studies included, frailty was revealed to be a complex syndrome encompassing multiple domains, indicating the need for interventions targeting different aspects. Even though some interventions were more prevalent, we observed similarities between types of interventions across studies that showed frailty reversal and those that did not. We noted that the physical domain received the most attention across all studies, whereas the social domain received the least attention in studies with frailty reversal outcomes. Considering that frailty has been defined, addressed, or assessed in multiple ways throughout the studies, further exploration will contribute to clarifying the concept of reversing frailty. These findings lead us to the following points.

Frailty reversal may depend on targeted domains

To the best of our knowledge, the present study is the first to systematically map interventions that indicate frailty reversal as an outcome and relate these interventions to the targeted frailty domains. Using the deficit accumulation model as our conceptual framework, we anticipated that interventions would target multiple domains of frailty to achieve frailty reversal. However, this was not the case. We identified that the physical domain of frailty is the most frequently targeted compared with the cognitive, social, and psychological domains. This is supported by the findings of other reviews whose authors perceived frailty as primarily a physical impairment, measured by the Fried criteria [82, 83, 84, 85, 86, 87]. This finding suggests that reversing frailty may depend on the domain targeted by the intervention, or on the conceptual framework used to identify and measure its outcome.

Definition of reverse frailty remains unclear

There is no standard definition of reverse frailty, yet the concept appears in several research studies. We used a descriptive approach (e.g., percentages) to examine the differences and similarities between the various definitions. A fundamental similarity is that the individual must be deemed frail at baseline. However, the process of determining an individual’s frailty score or status differed among the studies because of the different assessment instruments used. Another similarity was that, to reverse frailty, frailty scores or status must not progress to a more severe state but rather improve to a pre-frail or milder state of frailty. Further research is required to clarify this concept, preferably through a concept analysis.

Absence of a universal method to reverse frailty

This review included a heterogeneous group of studies with a diverse range of participant characteristics, intervention types, and intervention durations. Both single-component and multi-component interventions have shown efficacy in reversing frailty, with more studies of single-component interventions (i.e., physical activity or social interventions) than of multi-component interventions.

Use of single-component interventions to reverse frailty

Our study identified physical activity as the most used intervention across studies that reversed frailty. This fits with previous findings that physical activity is essential in interventions for frail older adults [ 85 , 86 , 87 , 88 ]. The activities were performed together (combination exercises) or separately (resistance only). In one study, frailty was reversed as early as six weeks [ 70 ]. The authors attributed this to the combination of resistance, strength training and aerobic exercises. Therefore, when combined with other types of exercise, resistance exercise could promote the rapid improvement of physical frailty.

According to a recent scoping review, social frailty has not received adequate attention [15]. Based on the findings of our review, we agree with this notion, given that we identified only one study [74] that explored frailty reversal through a single social intervention. Using an established checklist of items, the study monitored the effects of enhanced social capital (including interaction with neighbours, trust in the community, and social participation in activities) on frailty reversal over two years. The results showed that 31.8% of the participants’ frailty statuses reversed to pre-frail or non-frail. Another study [58] showed that increasing participants’ social capital improved their adherence to activities and encouraged them to continue interventions even after the study had ended. Thus, interventions that consider this approach may have better outcomes when it comes to frailty reversal.

Use of multi-component interventions to reverse frailty

The studies (n = 11) that showed frailty reversal as an outcome employed a combination of two or more intervention components tailored to participant needs or conducted in small groups. Physical activity, particularly resistance exercise, is recommended in conjunction with nutritional interventions as a preventative measure against muscle atrophy in older adults [58], which may explain why this combination was the most common among the multi-component interventions. We also noted other physical activities, such as strength, balance, gait and aerobic exercise, performed in combination with resistance exercise at varying frequencies and durations. Nutritional interventions included dietary supplements and nutritional education (advice and counselling) on healthy food choices, with the latter being the most reportedly used. The advantage of this approach has been reported in other studies, where interventions that aimed to empower participants by soliciting and incorporating their input (e.g., choosing meals) were more likely to result in participants feeling in control of and autonomous over their dietary choices [89, 90]. This may explain how nutritional education may provide older adults with more food variety and improved food intake compared with dietary supplements [58]. In addition to nutritional education and physical activity, Ushijima et al. [72] also provided medication guidance to mitigate the effects of polypharmacy, which have been shown to negate the effects of physical and nutritional interventions [91, 92].

Recommendations

The results and discussion points above guide our research, practice, and policy recommendations.

In this scoping review, the reporting of the interventions was suboptimal. For example, not all studies reported whether interventions were modified, whether personalization of interventions was planned, whether fidelity and adherence were measured, or how intervention fidelity was maintained or improved. Therefore, we recommend that authors use the Template for Intervention Description and Replication (TIDieR) checklist [43] or the Standards for Reporting Implementation Studies (StaRI) [93] whenever possible to improve intervention reporting. These checklists facilitate clinicians’ use of interventions and researchers’ synthesis and replication. Additionally, we recommend that authors of future studies provide details on the definition and components of frailty. Clinically, this may help identify groups of individuals in need of care and facilitate understanding among researchers.

Despite having no study design restrictions, we did not identify any qualitative or mixed method studies about frailty reversal interventions. None of the included studies reported engaging participants in decision-making or incorporating participant experiences into intervention delivery. A recent scoping review [ 94 ] echoes this concern, as older adults worry that they are not involved in health and well-being decisions. It is known that engaging older adults in decision-making improves health outcomes [ 95 ]. Therefore, we recommend qualitative and mixed methods studies aiming to integrate the older adults’ perspective regarding intervention development, evaluation, or implementation.

Acknowledging that frailty is complex in nature, RCTs with a large sample size could be beneficial to investigate the social, psychological, and cognitive aspects of frailty, which have received little attention to date.

Among the studies that did not report frailty reversal as an outcome, behavioural enhancement was one of the interventions implemented. The use of behavioral enhancement has been associated with the development of self-management skills and the maintenance of long-term changes [ 69 ]. It is therefore our recommendation that more studies consider a behavioural enhancement approach to facilitate adherence to interventions and maintain the benefits of interventions over the long-term. Lastly, given that frailty assessments and measurements are inconsistent, there is a need for more work to standardize them.

Further to considering the perspectives of older adults with frailty, we recommend tailoring interventions to fit the needs and capabilities of individuals rather than generalizing them across an entire population. For example, Latham and colleagues [96] conducted a resistance training program with vitamin D supplements over ten weeks for participants with certain functional limitations, such as dependence on others for activities of daily living, prolonged bed rest, or impaired mobility. Contrary to other studies reporting positive effects of resistance exercise, such as improved functional outcomes and decreased frailty scores during this period [53, 58, 67, 68], Latham and colleagues reported increased fatigue and musculoskeletal injury risks, which may be related to the participants’ functional limitations. We therefore recommend tailoring interventions to match participants’ needs and abilities rather than having set durations, frequencies, or intensities of interventions. Another reason is that some older adults may have functional limitations, or may experience adverse effects of polypharmacy, affecting their ability to adhere to prescribed interventions or the interventions’ effectiveness [92].

Research results influence guidelines and expectations for delivering care, services, and programs [97]. Frailty is a growing public and global health concern, as indicated by the inclusion of studies from North America, Europe, Asia, and Australia. This reinforces the need to prevent or reverse this geriatric syndrome. Future studies should investigate frailty on all continents to increase our understanding of the global challenges of expectations, implementation, and care delivery for frail older adults. Such information can facilitate the transfer of healthcare professionals between continents by bridging the knowledge gap concerning frailty, its interventions, and potential strategies for reversing the condition.

Strengths and limitations

Our study has strengths and limitations. We established a reproducible, systematic approach, from the literature search to screening and data extraction. Furthermore, the search strategy was guided and peer-reviewed by academic librarians with extensive knowledge of scoping and systematic reviews. We quality-appraised the included articles, giving us a better sense of the quality of the evidence on this topic. Although not formally published or registered, an a priori protocol approved by the research team guided this study. In comparison to the protocol, a few changes were made, such as not obtaining expert consultation and revising the research questions.

In terms of limitations, included studies were heterogeneous in their study objectives, frailty definition, frailty domain targeted, and intervention characteristics. Some studies used self-administered questionnaires as outcome measures to assess frailty, potentially increasing the risk of bias and making replication difficult because there is no guarantee of having the same responses among different participants. In addition, two studies did not report the characteristics of the intervention [ 19 , 73 ], and one indicated that participants were frail but did not specify how frailty was determined [ 68 ]. Lastly, we acknowledge that using only a few databases may have limited the number of studies we were able to find.

Conclusions

We used a narrative and descriptive approach to synthesize the included studies. Despite the lack of a standard definition of frailty, we observed similar interventions across studies that reported an outcome of frailty reversal and those that did not. When frailty reversal was indicated, we explored the meaning of the concept. We noted that the physical domain received the most attention across all studies. In contrast, the social domain received the least attention in studies with frailty reversal outcomes.

This study confirms that frailty is a complex and worrying geriatric syndrome. As the world’s population ages, frailty is becoming a serious issue for public and global health. Thus, it is crucial for frailty to be considered a holistic phenomenon with a multi-factor approach rather than merely a physical condition. This requires more research addressing multiple domains to target its prevention and reversal. Our findings indicate that reversing frailty requires that a person first be considered frail, regardless of how frailty is assessed. Although we discovered different ways of assessing frailty among the studies, a key highlight is the fact that the ability to reverse frailty may depend on how frailty is defined and measured. Hence, a consensus on what reverse frailty means is necessary. A promising but challenging area for future research could be qualitative analysis that explores frail older adults’ lived experiences and perspectives. This will guide the development and implementation of possible interventions to reverse this critical geriatric syndrome.

Data availability

Data supporting the findings of this study are available in the article [and its supplementary information files].

Abbreviations

BMI: Body Mass Index
CES-D: Center for Epidemiologic Studies Depression Scale
MEDLINE: Medical Literature Analysis and Retrieval System Online
COPD: Chronic obstructive pulmonary disease
CPR: Cardiopulmonary resuscitation
FI: Frailty Index
FP: Frailty Phenotype
GDS: Geriatric Depression Scale
HI: High intensity
IADL: Instrumental Activities of Daily Living
JBI: Joanna Briggs Institute
KCL: Kihon checklist
Multi-component Exercise Program
MeSH: Medical Subject Headings
Post-intervention follow-up
PRESS: Peer Review of Electronic Search Strategies
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
RAI-HC: Resident Assessment Instrument-Home Care
RCT: Randomized controlled trial
RMR: Resting metabolic rate
RT: Resistance training
SPPB: Short Physical Performance Battery
TIDieR: Template for Intervention Description and Replication
StaRI: Standards for Reporting Implementation Studies

References

United Nations. World Population Ageing 2019. 2019.

Markle-Reid M, Browne G, Gafni A. Nurse-led health promotion interventions improve quality of life in frail older home care clients: Lessons learned from three randomized trials in Ontario, Canada. J Eval Clin Pract. 2013;19:118–31.

Kojima G, Liljas AEM, Iliffe S. Frailty syndrome: implications and challenges for health care policy. Risk Manag Healthc Policy. 2019;12:23–30.

Sacha M, Sacha J, Wieczorowska-Tobis K. Multidimensional and physical frailty in elderly people: participation in senior organizations does not prevent social frailty and most prevalent psychological deficits. Front Public Health. 2020;8:1–7.

Hoogendijk EO, Afilalo J, Ensrud KE, et al. Frailty: implications for clinical practice and public health. Lancet. 2019;394:1365–75.

Kamaruzzaman SB. Frailty in older people. Geriatric medicine. Springer, Singapore, 2018, 27–41.

Bahat G, Ilhan B, Tufan A, et al. Success of simpler modified Fried Frailty Scale to predict mortality among nursing home residents. J Nutr Health Aging. 2021;1–5.

McIsaac DI, Macdonald DB, Aucoin SD. Frailty for Perioperative Clinicians: a narrative review. Anesth Analg. 2020;130:1450–60.

Vella Azzopardi R, Beyer I, Vermeiren S, et al. Increasing use of cognitive measures in the operational definition of frailty—A systematic review. Ageing Res Rev. 2018;43:10–6.

Dent E, Kowal P, Hoogendijk EO. Frailty measurement in research and clinical practice: a review. Eur J Intern Med. 2016;31:3–10.

Van Oostrom SH, Van Der ADL, Rietman ML, et al. A four-domain approach of frailty explored in the Doetinchem Cohort Study. BMC Geriatr. 2017;17:1–12.

Fried LP, Tangen CM, Walston J, et al. Frailty in older adults: evidence for a phenotype. Journals Gerontol - Ser A Biol Sci Med Sci. 2001;56:146–57.

Kwan RYC, Leung AYM, Yee A, et al. Cognitive frailty and its association with nutrition and depression in community-dwelling older people. J Nutr Health Aging. 2019;23:943–8.

Hoeyberghs LJ, Schols JMGA, Verté D, et al. Psychological Frailty and Quality of Life of Community Dwelling Older People: a qualitative study. Appl Res Qual Life. 2020;15:1395–412.

Bunt S, Steverink N, Olthof J, et al. Social frailty in older adults: a scoping review. Eur J Ageing. 2017;14:323–34.

Clegg A, Young J, Iliffe S, et al. Frailty in older people. Lancet. 2013;381:752–62.

Hao Q, Zhou L, Dong B, et al. The role of frailty in predicting mortality and readmission in older adults in acute care wards: a prospective study. Sci Rep. 2019;9:1–8.

Langlois F, Vu TTM, Chassé K, et al. Benefits of physical Exercise training on Cognition and Quality of Life in Frail older adults. Journals Gerontol Ser B Psychol Sci Soc Sci. 2013;68:400–4.

Hergott CG, Lovins J. The impact of functional exercise on the reversal of acromegaly induced frailty: a case report. Physiother Theory Pract. 2020. https://doi.org/10.1080/09593985.2020.1768456 . Epub ahead of print.

Abizanda P, López MD, García VP, et al. Effects of an oral Nutritional Supplementation Plus Physical Exercise intervention on the physical function, Nutritional Status, and quality of life in Frail Institutionalized older adults: the ACTIVNES Study. J Am Med Dir Assoc. 2015;16:439e9–16.

Marcucci M, Damanti S, Germini F, et al. Interventions to prevent, delay or reverse frailty in older people: a journey towards clinical guidelines. BMC Med. 2019;17:1–11.

Ng TP, Feng L, Nyunt MSZ, et al. Nutritional, Physical, Cognitive, and combination interventions and frailty reversal among older adults: a Randomized Controlled Trial. Am J Med. 2015;128:1225–1236e1.

Kim H, Kim M, Kojima N, et al. Effects of exercise and nutritional supplementation in community-dwelling frail elderly women in Japan: a randomized placebo controlled trial. J Am Geriatr Soc. 2016;64:107.

de Souto Barreto P, Rolland Y, Maltais M, et al. Associations of Multidomain Lifestyle intervention with Frailty: secondary analysis of a Randomized Controlled Trial. Am J Med. 2018;131:NPAG–NPAG.

Kulminski A, Yashin A, Arbeev K, et al. Cumulative index of health disorders as an indicator of aging-associated processes in the elderly: results from analyses of the National Long Term Care Survey. Mech Ageing Dev. 2007;128:250–8.

Hoover M, Rotermann M, Sanmartin C, et al. Validation of an index to estimate the prevalence of frailty among community-dwelling seniors. Health Rep. 2013;24:10–7.

Martínez-Velilla N, Herce PA, Herrero ÁC, et al. Heterogeneity of different tools for detecting the prevalence of frailty in nursing homes: feasibility and meaning of different approaches. J Am Med Dir Assoc. 2017;18:898.e1–898.e8.

Rockwood K, Mitnitski A. Frailty in relation to the accumulation of deficits. Journals Gerontol - Ser A Biol Sci Med Sci. 2007;62:722–7.

Theou O, Tan ECK, Bell JS, et al. Frailty levels in residential aged care Facilities measured using the Frailty Index and FRAIL-NH Scale. J Am Geriatr Soc. 2016;64:e207–12.

Arksey H, O’Malley L. Scoping studies: towards a methodological framework. Int J Soc Res Methodol Theory Pract. 2005;8:19–32.

Mitnitski AB, Mogilner AJ, Rockwood K. Accumulation of deficits as a proxy measure of aging. ScientificWorldJournal. 2001;1:323–36.

Levac D, Colquhoun H, O’Brien KK, et al. Scoping studies: advancing the methodology. Implement Sci. 2010;5:1–9.

Joanna Briggs Institute. The scoping review framework.

Tricco AC, Lillie E, Zarin W, et al. PRISMA extension for scoping reviews (PRISMA-ScR): Checklist and explanation. Ann Intern Med. 2018;169:467–73.

Apóstolo J, Cooke R, Bobrowicz-Campos E, et al. Effectiveness of interventions to prevent pre-frailty and frailty progression in older adults: a systematic review. JBI Database Syst Rev Implement Reports. 2018;16:140–232.

Chu W, Chang S, Ho H, et al. The relationship between Depression and Frailty in Community-Dwelling Older People: a systematic review and Meta‐analysis of 84,351 older adults. J Nurs Scholarsh. 2019;51:547–59.

García-García FJ, Carcaillon L, Fernandez-Tresguerres J, et al. A new operational definition of frailty: the Frailty Trait Scale. J Am Med Dir Assoc. 2014;15:371.e7–371.e13.

Taube E, Kristensson J, Midlöv P, et al. The use of case management for community-dwelling older people: the effects on loneliness, symptoms of depression and life satisfaction in a randomised controlled trial. Scand J Caring Sci. 2018;32:889–901.

Andrade AN, Fernandes MGM, da Nóbrega MML. Frailty in the elderly: conceptual analysis. 2012;21:748–56.

McGowan J, Sampson M, Salzwedel DM, et al. PRESS peer review of electronic search strategies: 2015 Guideline Statement. J Clin Epidemiol. 2016;75:40–6.

Kellermeyer L, Harnke B, Knight S. Covidence and Rayyan. J Med Libr Association: JMLA. 2018;106:580–3.

Veritas Health Innovation. Covidence systematic review software. www.covidence.org; 2021.

Hoffmann TC, Glasziou PP, Boutron I, et al. Better reporting of interventions: template for intervention description and replication (TIDieR) checklist and guide. BMJ. 2014;348:1–12.

Foe-Essomba JR, Kenmoe S, Tchatchouang S, et al. Diabetes mellitus and tuberculosis, a systematic review and meta-analysis with sensitivity analysis for studies comparable for confounders. PLoS ONE. 2021;16:e0261246.

Adalbert JR, Varshney K, Tobin R, et al. Clinical outcomes in patients co-infected with COVID-19 and Staphylococcus aureus: a scoping review. BMC Infect Dis. 2021;21:985.

Green BN, Johnson CD, Adams A. Writing narrative literature reviews for peer-reviewed journals: secrets of the trade. J Chiropr Med. 2006;5:101–17.

Popay J, Roberts H, Sowden A et al. Narrative synthesis in systematic reviews: a product from the ESRC Methods Programme. ESRC Methods Program 2006; 93.

Nassaji H. Qualitative and descriptive research: data type versus data analysis. Lang Teach Res. 2015;19:129–32.

Imaoka M, Higuchi Y, Todo E, et al. Low-frequency Exercise and vitamin D supplementation reduce Falls among Institutionalized Frail Elderly. Int J Gerontol. 2016;10:202–6.

Lammes E, Rydwik E, Akner G. Effects of nutritional intervention and physical training on energy intake, resting metabolic rate and body composition in frail elderly. A randomised, controlled pilot study. J Nutr Heal AGING. 2012;16:162–7.

Li C-M, Chen C-Y, Li C-Y, et al. The effectiveness of a comprehensive geriatric assessment intervention program for frailty in community-dwelling older people: a randomized, controlled trial. Arch Gerontol Geriatr. 2010;50(Suppl 1):39–42.

Liao YY, Chen IH, Wang RY. Effects of Kinect-based exergaming on frailty status and physical performance in prefrail and frail elderly: a randomized controlled trial. Sci Rep; 9. Epub ahead of print 2019. https://doi.org/10.1038/s41598-019-45767-y .

Nagai K, Miyamato T, Okamae A, et al. Physical activity combined with resistance training reduces symptoms of frailty in older adults: a randomized controlled trial. Arch Gerontol Geriatr. 2018;76:41–7.

Ng T-P, Chan G, Nyunt M, et al. Multi-domains lifestyle interventions reduces depressive symptoms among frail and pre-frail older persons: Randomized controlled trial. J Nutr Health Aging. 2017;21:918–26.

Rydwik E, Frändin K, Akner G. Effects of a physical training and nutritional intervention program in frail elderly people regarding habitual physical activity level and activities of daily living—A randomized controlled pilot study. Arch Gerontol Geriatr. 2010;51:283–9.

Sadjapong U, Yodkeeree S, Sungkarat S, et al. Multicomponent Exercise Program reduces Frailty and inflammatory biomarkers and improves physical performance in Community-Dwelling older adults: a Randomized Controlled Trial. Int J Environ Res Public Health. 2020;17. https://doi.org/10.3390/ijerph17113760 . Epub ahead of print.

Sahin UK, Kirdi N, Bozoglu E, et al. Effect of low-intensity versus high-intensity resistance training on the functioning of the institutionalized frail elderly. Int J Rehabil Res. 2018;41:211–7.

Seino S, Nishi M, Murayama H, et al. Effects of a multifactorial intervention comprising resistance exercise, nutritional and psychosocial programs on frailty and functional health in community-dwelling older adults: a randomized, controlled, cross-over trial. Geriatr Gerontol Int. 2017;17:2034–45.

Tarazona-Santabalbina FJ, Gómez-Cabrera MC, Pérez-Ros P, et al. A Multicomponent Exercise intervention that reverses Frailty and improves cognition, emotion, and Social networking in the Community-Dwelling Frail Elderly: a Randomized Clinical Trial. J Am Med Dir Assoc. 2016;17:426–33.

Torres-Sanchez I, Valenza MC, Cabrera-Martos I, et al. Effects of an Exercise intervention in Frail older patients with chronic obstructive Pulmonary Disease hospitalized due to an exacerbation: a Randomized Controlled Trial. COPD-JOURNAL CHRONIC Obstr Pulm Dis. 2017;14:37–42.

Vestergaard S, Kronborg C, Puggaard L. Home-based video exercise intervention for community-dwelling frail older women: a randomized controlled trial. Aging Clin Exp Res. 2008;20:479–86.

Arrieta H, Rezola-Pardo C, Gil SM, et al. Effects of Multicomponent Exercise on Frailty in Long‐Term nursing Homes: a Randomized Controlled Trial. J Am Geriatr Soc. 2019;67:1145–51.

Brown M, DR S. Low-intensity exercise as a modifier of physical frailty in older adults. Arch Phys Med Rehabil. 2000;81:960–5.

Cesari M, Vellas B, Hsu F-C, et al. A physical activity intervention to treat the frailty syndrome in older persons-results from the LIFE-P study. Journals Gerontol Ser a Biol Sci Med Sci. 2015;70:216–22.

Chin A, Paw MJM, De Jong N, Schouten EG, et al. Physical exercise and/or enriched foods for functional improvement in frail, independently living elderly: a randomized controlled trial. Arch Phys Med Rehabil. 2001;82:811–7.

Cameron ID, Fairhall N, Langron C, et al. A multifactorial interdisciplinary intervention reduces frailty in older people: randomized trial. BMC Med. 2013;11:65.

Coelho-Júnior HJ, Uchida MC. Effects of Low-Speed and High-Speed Resistance Training Programs on Frailty Status, physical performance, cognitive function, and blood pressure in Prefrail and Frail older adults. Front Med. 2021;8:1–19.

Fiatarone MA, O’Neill EF, Ryan ND, et al. Exercise training and nutritional supplementation for physical frailty in very elderly people. N Engl J Med. 1994;330:1769–75.

Liu JY-W, Lai CKY, Siu PM, et al. An individualized exercise programme with and without behavioural change enhancement strategies for managing fatigue among frail older people: a quasi-experimental pilot study. Clin Rehabil. 2017;31:521–31.

Losa-Reyna J, Baltasar-Fernandez I, Alcazar J, et al. Effect of a short multicomponent exercise intervention focused on muscle power in frail and pre frail elderly: a pilot trial. Exp Gerontol. 2019;115:114–21.

Oh G, Jang I-Y, Lee H, et al. Long-term effect of a Multicomponent intervention on physical performance and Frailty in older adults. Innov Aging. 2019;3:919–S920.

Ushijima A, Morita N, Hama T, et al. Effects of cardiac rehabilitation on physical function and exercise capacity in elderly cardiovascular patients with frailty. J Cardiol. 2021;77:424–31.

Larsen RT, Turcotte LA, Westendorp R, et al. Frailty Index Status of Canadian Home Care clients improves with Exercise Therapy and declines in the Presence of Polypharmacy. J Am Med Dir Assoc. 2020;21:766.

Takatori K, Matsumoto D. Social factors associated with reversing frailty progression in community-dwelling late-stage elderly people: An observational study. PLoS One ; 16. Epub ahead of print 2021. https://doi.org/10.1371/journal.pone.0247296 .

Cadore EL, Moneo ABB, Mensat MM, et al. Positive effects of resistance training in frail elderly patients with dementia after long-term physical restraint. Age (Omaha). 2014;36:801–11.

Kim YJ, Park H, Park JH, et al. Effects of Multicomponent Exercise on cognitive function in Elderly korean individuals. J Clin Neurol. 2020;16:612–23.

Munn Z, Peters MDJ, Stern C, et al. Systematic review or scoping review? Guidance for authors when choosing between a systematic or scoping review approach. BMC Med Res Methodol. 2018;18:143.

Uchmanowicz I, Jankowska-Polańska B, Wleklik M, et al. Frailty Syndrome: nursing interventions. SAGE Open Nurs. 2018;4:1–11.

Marcos-Pardo PJ, Orquin-Castrillón FJ, Gea-García GM, et al. Effects of a moderate-to-high intensity resistance circuit training on fat mass, functional capacity, muscular strength, and quality of life in elderly: a randomized controlled trial. Sci Rep. 2019;9:7830.

Saragih ID, Saragih IS, Batubara SO et al. Effects of resistance bands exercise for frail older adults: A systematic review and meta-analysis of randomised controlled studies. J Clin Nurs . Epub ahead of print 2021. https://doi.org/10.1111/jocn.15950 .

Kim H, Suzuki T, Kim M, et al. Effects of exercise and milk fat globule membrane (MFGM) supplementation on body composition, physical function, and hematological parameters in community-dwelling frail japanese women: a randomized double blind, placebo-controlled, follow-up trial. PLoS ONE. 2015;10:1–20.

Campbell E, Petermann-Rocha F, Welsh P et al. The effect of exercise on quality of life and activities of daily life in frail older adults: A systematic review of randomised control trials. Exp Gerontol ; 147. Epub ahead of print 2021. https://doi.org/10.1016/j.exger.2021.111287 .

Stookey AD, Katzel LI. Home Exercise Interventions in Frail older adults. Curr Geriatr REPORTS. 2020;9:163–75.

Kelaiditi E, van Kan GA, Cesari M. Frailty: role of nutrition and exercise. Curr Opin Clin Nutr Metab Care. 2014;17:32–9.

PubMed   Google Scholar  

Labra C, De, Guimaraes-pinheiro C, Maseda A et al. Effects of physical exercise interventions in frail older adults: a systematic review of randomized controlled trials. BMC Geriatr. Epub ahead of print 2015. https://doi.org/10.1186/s12877-015-0155-4 .

Arantes PMM, Alencar MA, Pereira LSM. Physical therapy treatment on frailty syndrome: systematic review. 2009; 13: 365–75.

Chin A, Paw MJM, Uffelen JGZ, Van, Riphagen I et al. The functional Effects of Physical Exercise Training in Frail a systematic review. 2008; 38: 781–93.

Dedeyne L, Deschodt M, Verschueren S, et al. Effects of multi-domain interventions in (pre)frail elderly on frailty, functional, and cognitive status: a systematic review. Clin Interv Aging. 2017;12:873–96.

Bombard Y, Baker GR, Orlando E et al. Engaging patients to improve quality of care: a systematic review. Implement Sci; 13. Epub ahead of print 2018. https://doi.org/10.1186/s13012-018-0784-z .

Krist AH, Tong ST, Aycock RA, et al. Engaging patients in decision-making and Behavior Change to Promote Prevention. Stud Health Technol Inform. 2017;240:284–302.

PubMed   PubMed Central   Google Scholar  

Dagli RJ, Sharma A. Polypharmacy: a global risk factor for elderly people. J Int oral Heal JIOH. 2014;6:i–ii.

Katsimpris A, Linseisen J, Meisinger C, et al. The Association between Polypharmacy and physical function in older adults: a systematic review. J Gen Intern Med. 2019;34:1865–73.

Duncan E, O’Cathain A, Rousseau N, et al. Guidance for reporting intervention development studies in health research (GUIDED): an evidence-based consensus study. BMJ Open. 2020;10:1–12.

Durepos P, Sakamoto M, Alsbury K et al. Older adults’ perceptions of Frailty Language: a scoping review. Can J Aging. Epub ahead of print 2021. https://doi.org/10.1017/S0714980821000180 .

Elliott J, McNeil H, Ashbourne J, et al. Engaging older adults in Health Care Decision-Making: a Realist synthesis. Patient. 2016;9:383–93.

Latham NK, Anderson CS, Lee A, et al. A randomized, controlled trial of quadriceps resistance exercise and vitamin D in frail older people: the frailty interventions trial in elderly subjects (FITNESS). J Am Geriatr Soc. 2003;51:291–9.

Erismann S, Pesantes MA, Beran D, et al. How to bring research evidence into policy? Synthesizing strategies of five research projects in low-and middle-income countries. Heal Res Policy Syst. 2021;19:1–13.

Download references

Acknowledgements

We would like to thank the University of Ottawa Health Sciences librarians, Valentina Ly (VL), Victoria Cole (VC), and Lindsey Sikora (LS), for their guidance with the search for relevant studies. Special thanks also go to Ojongetakah Enokenwa Baa and Mbi Ayuk Solange, who acted as secondary screeners during study selection.

Funding

Not applicable.

Author information

Authors and affiliations

School of Nursing, Faculty of Health Sciences, University of Ottawa, Ottawa, ON, Canada

Aurélie Tonjock Kolle, Krystina B. Lewis, Michelle Lalonde & Chantal Backman

The Ottawa Hospital Research Institute, Ottawa, ON, Canada

Krystina B. Lewis & Chantal Backman

University of Ottawa Heart Institute, Ottawa, ON, Canada

Krystina B. Lewis

Institut du Savoir Montfort, Montfort Hospital, Ottawa, ON, Canada

Michelle Lalonde

Bruyère Research Institute, Ottawa, ON, Canada

Chantal Backman


Contributions

AK, the principal investigator, initiated the project, designed the search strategy, carried out data extraction, and performed the analysis of the findings. KL critiqued and guided the project’s direction, including the research questions, methodology, and results. ML offered suggestions on the design and results, and critiqued and provided feedback as needed. CB guided the development of the research topic, provided regular feedback, and edited and approved every stage of the project. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Chantal Backman.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Supplementary Material 3

Supplementary Material 4

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Kolle, A.T., Lewis, K.B., Lalonde, M. et al. Reversing frailty in older adults: a scoping review. BMC Geriatr 23, 751 (2023). https://doi.org/10.1186/s12877-023-04309-y


Received: 21 December 2022

Accepted: 12 September 2023

Published: 17 November 2023

DOI: https://doi.org/10.1186/s12877-023-04309-y


Keywords

  • Multi-component interventions
  • Single-component interventions
  • Reverse frailty
  • Frailty domains

BMC Geriatrics

ISSN: 1471-2318


A scoping review on quality assessment tools used in systematic reviews and meta-analysis of real-world studies

Tadesse Gebrye

1 Department of Health Professions, Faculty of Health, Psychology, and Social Care, Manchester Metropolitan University, Brooks Building, Birley Fields Campus, Bonsall Street, 53 Bonsall Street, Manchester, M15 6GX UK

Francis Fatoye

2 Lifestyle Diseases, Faculty of Health Sciences, North-West University, Mahikeng, South Africa

Chidozie Mbada

Zalmai Hakimi

3 Apellis Pharmaceuticals, Zug, Switzerland

Associated Data

All results from our analyses are published in the Supplementary Material, available at Rheumatology International online. The items/domains employed in the included studies and extracted by our investigators are available upon reasonable request.

Abstract

Risk of bias tools are important for identifying inherent methodological flaws and for generating evidence in systematic reviews (SRs) and meta-analyses (MAs); hence the need for sensitive, study-specific tools. This study aimed to review the quality assessment (QA) tools used in SRs and MAs involving real-world data. Electronic databases comprising PubMed, Allied and Complementary Medicine Database, Cumulated Index to Nursing and Allied Health Literature, and MEDLINE were searched for SRs and MAs involving real-world data. The search was limited to articles published in English, from inception to 20 November 2022, and followed the PRISMA extension for scoping reviews (PRISMA-ScR) checklist. Sixteen articles on real-world data, published between 2016 and 2021, that reported their methodological quality met the inclusion criteria. Seven of these articles were observational studies, while the others were interventional. Overall, 16 QA tools were identified. Except for one, all the QA tools employed in SRs and MAs involving real-world data are generic, and only three were validated. Generic QA tools are thus mostly used for SRs and MAs of real-world data, while no validated and reliable specific tool currently exists; a standardized, specific QA tool for SRs and MAs of real-world data is therefore needed.

Supplementary Information

The online version contains supplementary material available at 10.1007/s00296-023-05354-x.

Introduction

Systematic reviews (SRs), evidence-based medicine, and clinical guidelines bring together trustworthy information by systematically acquiring, analysing, and transferring research findings into clinical, management, and policy arenas [1]. Findings from different works in the medical literature on related topics are evaluated using SRs and meta-analyses (MAs) through the application of scientific strategies that limit bias and errors that occur by chance [2]. The availability of the best evidence, obtained through SRs and MAs, is necessary to help clinicians, policy makers and patients reach the best health care decisions [3]. However, SRs and MAs require resources, take time, and are labour-intensive, and they may not always be warranted or possible. For example, one study estimated the cost of a single SR at approximately $141,194.80, with the total annual cost of all SRs amounting to $18,660,304.77 for academic institutions and $16,761,234.71 for pharmaceutical companies [4]. Unnecessary duplication of SRs should therefore be avoided, both on cost grounds and given the large unmet need for SRs of a wide range of questions and the need to keep reviews up to date [5].

To use the results of SRs and MAs, it is important to assess the methodological quality of the primary studies [6]. Methodological quality assessment (QA) is the process of assessing the design and conduct of the included studies; it establishes the transparency of evidence synthesis and helps guarantee the certainty of the body of evidence for the review objective [7, 8]. The main reason for assessing the methodological quality of primary studies is to identify risks of bias [9], which may arise from poor reporting or from design features that depend on the research question. Poor reporting may prevent assessment of key design features, making it difficult to evaluate whether the study methodology was adequate [10]. According to the National Health and Medical Research Council [11], “risks of bias refer to the likelihood that features of the study design or conduct of the study will give misleading results”, which can in turn lead to wasted resources, missed opportunities to deploy effective interventions, or harm to consumers [11].

A systematic review of methodological assessment tools for preclinical and clinical studies and clinical practice guidelines showed that a variety of methodological assessment tools exist for different types of study design [12]. It is therefore critical to identify the study type before choosing the corresponding QA tool. Accordingly, Zeng and colleagues [12] submit that further efforts in the development of critical appraisal tools are warranted for areas that currently lack such tools. However, there is an apparent dearth of specific QA tools for real-world evidence (RWE) studies. According to the Food and Drug Administration [13], “RWE is the clinical evidence about the usage and potential benefits, or risks of a medical product derived from analysis of real-world data (RWD)”, whereas RWD are “routinely collected data pertaining to health status and/or health care delivery of the patient which are collected from a range of sources” [14], including claims, clinical studies, clinical settings, pharmaceuticals, and patient-powered platforms [15, 16].

The increasing use of electronic health records and health information systems has led to repositories of large volumes of complex longitudinal RWD [17]. RWD are diverse, but generally comprise medical records, prescription data and lifestyle-related information from health care providers, hospitals, and pharmacies [18]. For primary studies based on RWD, the quality of the data should be defined in context, clearly represented, and accessible [15, 19]. For example, Hyrich [20] concludes that RWD play a significant role in rheumatology because they help researchers understand disease progression and treatment outcomes beyond the conclusions of a clinical trial, providing a platform to "test" outcomes in an uncontrolled, real-life environment. The author further posits that generating trustworthy conclusions from RWD requires appropriate methodological and ethical considerations in handling the data. Given the importance of RWD in research, population health, quality improvement, clinical decision support, and personalised medicine [21], it is necessary to explore the existing QA tools that have been used in SRs and MAs involving RWD; hence this scoping review.

Scoping review

We conducted a scoping review, a type of literature review used when it is difficult to identify a narrow review question; when no prior synthesis has been undertaken on the topic; when the studies under review are likely to have employed a range of data collection and analysis techniques; and when a quality assessment of the reviewed sources is not going to be conducted [22].

Search strategy

An electronic database search was carried out by the reviewers through November 2022 using the following databases: PubMed, Allied and Complementary Medicine Database (AMED), Cumulated Index to Nursing and Allied Health Literature (CINAHL), and MEDLINE. The keywords used in the search included combinations of RWE, RWD, routinely collected data, electronic health records, claims and billing activities, registries, meta-analysis, and systematic review (Appendix 2). A manual search of the reference sections of the included studies was also carried out to identify additional studies. The search was limited to articles published in English.
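
As a rough illustration of how such a strategy is assembled, the sketch below (Python) joins synonyms within a concept group with OR and links the concept groups with AND. The term lists are placeholders for illustration only, not the authors' published strategy, which is given in Appendix 2.

    # Illustrative only: assemble a Boolean search string from concept groups.
    real_world_terms = [
        '"real-world evidence"', '"real-world data"',
        '"routinely collected data"', '"electronic health records"',
        'claims', 'registries',
    ]
    review_terms = ['"systematic review"', '"meta-analysis"']

    def or_group(terms):
        # Synonyms within a concept are ORed and wrapped in parentheses.
        return "(" + " OR ".join(terms) + ")"

    # Concept groups are ANDed together to form the final query.
    query = " AND ".join([or_group(real_world_terms), or_group(review_terms)])
    print(query)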

Study selection and data extraction

One reviewer screened the abstracts of all publications obtained by the search strategies. Studies meeting the following inclusion criteria were selected for further review: interventional or observational studies using real-world data that employed methodological QA tools. SRs or MAs that were not based on RWD, or whose methodological quality was not assessed, were excluded. Potentially eligible papers were retrieved, and the full articles were obtained and assessed for relevance by two reviewers (TG & CEM) based on the preplanned inclusion criteria. Any disagreement in study selection was resolved through discussion and, where necessary, consultation with a third reviewer (FF).
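
Purely as a hedged illustration, these inclusion criteria can be encoded as a screening predicate; the record fields below are hypothetical names chosen to mirror the criteria, not part of any software the authors describe.

    # Illustrative screening predicate mirroring the stated criteria.
    def is_eligible(record):
        acceptable_designs = {"interventional", "observational"}
        return (record.get("design") in acceptable_designs
                and record.get("uses_real_world_data", False)
                and record.get("reports_qa_tool", False))

    candidate = {"design": "observational",
                 "uses_real_world_data": True,
                 "reports_qa_tool": True}
    print(is_eligible(candidate))  # True -> retrieve the full text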

A summary table was used to display the extracted data. The following data were extracted: authors and date, type of study, type of QA tool, number of items, domains, whether the tool is generic or specific, time to complete the tool, psychometric properties (validity and reliability), population/studies used to validate the tool, and name of the unit that developed the tool. The reviewers resolved differences through discussion to achieve consensus.
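
One possible structured template for these extraction fields is sketched below; the field names are illustrative assumptions, not the authors' actual extraction form.

    from dataclasses import dataclass
    from typing import Optional

    # Illustrative extraction template mirroring the fields listed above.
    @dataclass
    class ExtractionRecord:
        authors_and_date: str
        study_type: str                      # interventional / observational
        qa_tool: str
        n_items: int                         # number of QA items
        domains: Optional[str]               # core domains, if any
        generic_or_specific: str
        time_to_complete: Optional[str]
        psychometric_properties: Optional[str]  # validity and reliability
        validation_population: Optional[str]    # studies used to validate
        developer: Optional[str]             # unit that developed the tool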

Data synthesis

Study data were extracted by three reviewers into a template. Findings for each study, focusing on the QA tools used in SRs and MAs of RWD, were then summarized by one reviewer, and the summaries were discussed and modified by the research team as necessary to generate an overall conclusion about the QA tools used in SRs and MAs involving real-world data.

Results

The search strategy retrieved 4,954 articles (PubMed = 4,369; AMED = 5; CINAHL = 182; MEDLINE = 398) from the four databases (Fig. 1). After duplicate removal, the titles and abstracts of 4,153 publications were screened. Of these, 75 studies were taken forward for full-text screening, and 16 articles met the inclusion criteria.

[Fig. 1. Flow diagram of publications included and excluded in the review]
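
The reported flow can be cross-checked with simple arithmetic; a minimal sketch, assuming only the counts quoted above:

    # Cross-check of the reported screening flow.
    per_database = {"PubMed": 4369, "AMED": 5, "CINAHL": 182, "MEDLINE": 398}
    retrieved = sum(per_database.values())      # 4954
    screened = 4153                             # titles/abstracts after dedup
    duplicates_removed = retrieved - screened   # 801
    full_text, included = 75, 16
    print(retrieved, duplicates_removed, full_text, included)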

Characteristics of included studies

The characteristics of the included studies are presented in Table 1. The included studies were published between 2016 and December 2021. Seven were observational, while the remainder were of interventional and observational types. The included studies applied various QA tools, with the number of QA items ranging from 4 to 22. Seven of the included studies used tools comprising core domains, each containing the questions employed for quality assessment. Only one of the included studies [23] utilised a tool specific to real-world data for methodological quality assessment. Three of the included studies [24-26] employed validated QA tools, which had been validated using 39 non-randomised studies [24], 131 cohort studies [25] and 30 cost-effectiveness studies [26], respectively. The QA tools used in the remaining thirteen included studies were not validated.

[Table 1. Characteristics of the tools used in the included studies. Abbreviations: GRADE, Grades of Recommendation, Assessment, Development, and Evaluation; ROBINS-I, Risk of Bias in Non-randomized Studies of Interventions; N/A, not available; STROBE, Strengthening the Reporting of Observational Studies in Epidemiology; CASP, Critical Appraisal Skills Programme. Ω These tools have not been independently published.]

Non-summative four-point system

Only one of the included studies used a QA tool specific to real-world data: a non-summative four-point system [23]. The tool was developed by Wylde and colleagues [19]. It consists of four items used to assess selection bias (inclusion of consecutive patients and representativeness), bias due to missing data (follow-up rates) and bias due to inadequate consideration of confounding (multivariable or univariable analysis). Each item is rated as adequate, not adequate or not reported.
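
A hypothetical encoding of this four-item, non-summative system is sketched below; the item labels paraphrase the description above, and the code is illustrative rather than the published tool.

    from enum import Enum

    class Rating(Enum):
        ADEQUATE = "adequate"
        NOT_ADEQUATE = "not adequate"
        NOT_REPORTED = "not reported"

    # The four items; each is rated independently and no overall
    # score is computed (hence "non-summative").
    ITEMS = ("inclusion of consecutive patients",      # selection bias
             "representativeness",                     # selection bias
             "follow-up rates",                        # missing data
             "multivariable or univariable analysis")  # confounding

    def appraise(ratings):
        # ratings: {item: Rating}; reported per item, never aggregated.
        return {item: ratings[item].value for item in ITEMS}

    print(appraise({item: Rating.NOT_REPORTED for item in ITEMS}))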

Discussion

In this paper, we reviewed the methodological QA tools used in SRs and MAs of RWE studies. The included studies were published between 2016 and 2021, a period that coincides with the recent surge in the use of methodological QA tools in real-world data studies. Even so, QA tools are used less consistently in SRs and MAs of RWD than in those of randomised clinical trials [39]. The use of appropriate QA tools in SRs and MAs involving RWD is needed to generate trustworthy conclusions and acceptable evidence and recommendations for use in health care [40]. A key consideration when utilising evidence from SRs and MAs is whether critical appraisal was carried out [41]. For example, a study [42] that assessed the methodological, reporting and evidence quality of SRs and MAs of total glucosides of paeony for rheumatoid arthritis found that, although the included studies concluded that glucosides of paeony were effective and safe in treating rheumatoid arthritis, the methodological and reporting quality and the quality of the evidence were poor. The study accordingly recommended that decision-makers be prudent when using glucosides of paeony to treat rheumatoid arthritis. Hyrich [20], in highlighting the key role of RWD in rheumatology, noted that the methodological challenges of analysing RWD are a significant obstacle to generating reliable scientific output.

Variation was observed within the QA tools used in the SRs and MAs with regard to the content of domains, checklists, and scales. For example, QA criteria such as inclusion of consecutive patients, representativeness, and follow-up were frequently reported across tools. The absence of a specific QA tool can thus impede consistent and reliable appraisal of SRs and MAs that have used RWD. In the current review, we observed that some of the QA tools were adapted or modified [23, 32, 34, 36], whereas others were used in their generic form. Overall, there was little consensus around the QA tools used in SRs and MAs of RWE studies.

The absence of a standard, specific QA tool for SRs and MAs involving RWE studies has resulted in the use of QA tools developed for studies with different methodologies, such as randomised controlled studies and cross-sectional studies. Except for one [23], all the included studies used generic QA tools. The specific tool, developed by Evans and colleagues [23], consists of four items: inclusion of consecutive patients, representativeness, percentage of follow-up, and minimisation of potential confounding. However, this QA tool has not been validated, and its psychometric properties are lacking. Psychometric properties identify and define the critical aspects of an instrument, including its adequacy, relevance, and usefulness (that is, its validity) [43]. Other authors have argued that there should be a QA tool specific to SRs and MAs of RWE that has been psychometrically tested for feasibility, reliability, and validity [44].

The criteria used for QA differ from tool to tool, and no single tool covers all methodological aspects. It is because of these methodological differences that evaluation tools are developed to match the characteristics of different study types. Some evaluation tools, for example, are used without recommendations for the critical appraisal of evidence [45]. There are also many study designs, such as before-after (time series) studies and nested case-control studies, for which no QA tools exist [46]. Efforts should therefore be directed at developing QA tools for SRs and MAs of RWD.

This scoping review has certain strengths and limitations. We used a systematic approach, including the screening of numerous databases and the involvement of multiple reviewers. However, only studies published in English were included, so some relevant studies in other languages may have been missed. Nevertheless, this review serves as a foundation for further work on QA tools in SRs and MAs using RWD. Identifying the appropriate QA tool for a specific type of study should be a priority for those utilising evidence from such studies, because it will increase the transparency and reproducibility of scientific work in real-world evidence. This study summarises the existing QA tools while pointing out potential improvements to be adopted in the future.

Conclusions

The findings of the present scoping review indicate that many different types of QA tools are currently used in SRs and MAs of RWD, while no validated and reliable specific tool currently exists. Thus, there is a need for a standardized, specific QA tool for SRs and MAs of RWD.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Author Contributions

TG participated in the design of the study, carried out the literature search and selection process, charted and modelled the data and drafted the paper. FF, CEM and ZH also participated in the design of the study, the literature selection process and the modelling of the data and helped to draft the paper. All the authors participated in modelling the data, drafting the paper and reading and approving the final version of this manuscript.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Data availability

Declarations

The authors have no conflicts of interest to declare.

Ethical approval was not required for this study.

Written informed consent from patients was not required, as this was a review of published studies.

No part of this review has been copied from or published elsewhere, in whole or in part, in any language. Appendix 1 lists the items/domains employed in the included studies; these are the specific criteria developed for use in quality assessment.

All data related to this work are available in this research article.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Tadesse Gebrye, Email: [email protected] .

Francis Fatoye, Email: [email protected] .

Chidozie Mbada, Email: [email protected] .

Zalmai Hakimi, Email: [email protected] .
