Systematic Review | Definition, Example & Guide

Published on June 15, 2022 by Shaun Turney. Revised on November 20, 2023.

A systematic review is a type of review that uses repeatable methods to find, select, and synthesize all available evidence. It answers a clearly formulated research question and explicitly states the methods used to arrive at the answer.

For example, Boyle and colleagues conducted a systematic review, used as a running example throughout this article, that answered the question “What is the effectiveness of probiotics in reducing eczema symptoms and improving quality of life in patients with eczema?”

In this context, a probiotic is a health product that contains live microorganisms and is taken by mouth. Eczema is a common skin condition that causes red, itchy skin.

Table of contents

  • What is a systematic review?
  • Systematic review vs. meta-analysis
  • Systematic review vs. literature review
  • Systematic review vs. scoping review
  • When to conduct a systematic review
  • Pros and cons of systematic reviews
  • Step-by-step example of a systematic review
  • Other interesting articles
  • Frequently asked questions about systematic reviews

What is a systematic review?

A review is an overview of the research that’s already been completed on a topic.

What makes a systematic review different from other types of reviews is that the research methods are designed to reduce bias. The methods are repeatable, and the approach is formal and systematic:

  • Formulate a research question
  • Develop a protocol
  • Search for all relevant studies
  • Apply the selection criteria
  • Extract the data
  • Synthesize the data
  • Write and publish a report

Although multiple sets of guidelines exist, the Cochrane Handbook for Systematic Reviews is among the most widely used. It provides detailed guidelines on how to complete each step of the systematic review process.

Systematic reviews are most commonly used in medical and public health research, but they can also be found in other disciplines.

Systematic reviews typically answer their research question by synthesizing all available evidence and evaluating the quality of the evidence. Synthesizing means bringing together different information to tell a single, cohesive story. The synthesis can be narrative (qualitative), quantitative, or both.

Systematic review vs. meta-analysis

Systematic reviews often quantitatively synthesize the evidence using a meta-analysis. A meta-analysis is a statistical analysis, not a type of review.

A meta-analysis is a technique to synthesize results from multiple studies. It’s a statistical analysis that combines the results of two or more studies, usually to estimate an effect size.
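
To make this concrete, here is a minimal sketch of the most common pooling method, fixed-effect inverse-variance weighting, in Python. The effect sizes and standard errors below are invented for illustration; in a real review they would be extracted from the included studies.

```python
import math

# Hypothetical effect sizes (e.g., mean differences) and their
# standard errors, one pair per included study. Invented values.
effects = [0.30, 0.10, 0.25]
std_errors = [0.12, 0.15, 0.10]

# Inverse-variance weighting: each study is weighted by the inverse
# of its variance, so more precise studies count for more.
weights = [1 / se**2 for se in std_errors]
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

# 95% confidence interval for the pooled effect.
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"Pooled effect: {pooled:.3f} (95% CI {low:.3f} to {high:.3f})")
```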

Systematic review vs. literature review

A literature review is a type of review that uses a less systematic and formal approach than a systematic review. Typically, an expert in a topic will qualitatively summarize and evaluate previous work, without using a formal, explicit method.

Although literature reviews are often less time-consuming and can be insightful or helpful, they have a higher risk of bias and are less transparent than systematic reviews.

Systematic review vs. scoping review

Similar to a systematic review, a scoping review is a type of review that tries to minimize bias by using transparent and repeatable methods.

However, a scoping review isn’t a type of systematic review. The most important difference is the goal: rather than answering a specific question, a scoping review explores a topic. The researcher tries to identify the main concepts, theories, and evidence, as well as gaps in the current research.

Sometimes scoping reviews are an exploratory preparation step for a systematic review, and sometimes they are a standalone project.

When to conduct a systematic review

A systematic review is a good choice of review if you want to answer a question about the effectiveness of an intervention, such as a medical treatment.

To conduct a systematic review, you’ll need the following:

  • A precise question, usually about the effectiveness of an intervention. The question needs to be about a topic that’s previously been studied by multiple researchers. If there’s no previous research, there’s nothing to review.
  • If you’re doing a systematic review on your own (e.g., for a research paper or thesis), you should take appropriate measures to ensure the validity and reliability of your research.
  • Access to databases and journal archives. Often, your educational institution provides you with access.
  • Time. A professional systematic review is a time-consuming process: it will take the lead author about six months of full-time work. If you’re a student, you should narrow the scope of your systematic review and stick to a tight schedule.
  • Bibliographic, word-processing, spreadsheet, and statistical software. For example, you could use EndNote, Microsoft Word, Excel, and SPSS.

Pros and cons of systematic reviews

Systematic reviews have many pros:

  • They minimize research bias by considering all available evidence and evaluating each study for bias.
  • Their methods are transparent , so they can be scrutinized by others.
  • They’re thorough : they summarize all available evidence.
  • They can be replicated and updated by others.

Systematic reviews also have a few cons:

  • They’re time-consuming.
  • They’re narrow in scope: they only answer the precise research question.

Step-by-step example of a systematic review

The 7 steps for conducting a systematic review are explained below with an example.

Step 1: Formulate a research question

Formulating the research question is probably the most important step of a systematic review. A clear research question will:

  • Allow you to more effectively communicate your research to other researchers and practitioners
  • Guide your decisions as you plan and conduct your systematic review

A good research question for a systematic review has four components, which you can remember with the acronym PICO:

  • Population(s) or problem(s)
  • Intervention(s)
  • Comparison(s)
  • Outcome(s)

You can rearrange these four components to write your research question:

  • What is the effectiveness of I versus C for O in P?

Sometimes, you may want to include a fifth component, the type of study design. In this case, the acronym is PICOT:

  • Type of study design(s)

In the example review, Boyle and colleagues identified the following PICOT components:

  • The population of patients with eczema
  • The intervention of probiotics
  • In comparison to no treatment, placebo, or non-probiotic treatment
  • The outcome of changes in participant-, parent-, and doctor-rated symptoms of eczema and quality of life
  • Randomized control trials, a type of study design

Their research question was:

  • What is the effectiveness of probiotics versus no treatment, a placebo, or a non-probiotic treatment for reducing eczema symptoms and improving quality of life in patients with eczema?

Step 2: Develop a protocol

A protocol is a document that contains your research plan for the systematic review. This is an important step because having a plan allows you to work more efficiently and reduces bias.

Your protocol should include the following components:

  • Background information: Provide the context of the research question, including why it’s important.
  • Research objective(s): Rephrase your research question as an objective.
  • Selection criteria: State how you’ll decide which studies to include or exclude from your review.
  • Search strategy: Discuss your plan for finding studies.
  • Analysis: Explain what information you’ll collect from the studies and how you’ll synthesize the data.

If you’re a professional seeking to publish your review, it’s a good idea to bring together an advisory committee. This is a group of about six people who have experience in the topic you’re researching. They can help you make decisions about your protocol.

It’s highly recommended to register your protocol. Registering your protocol means submitting it to a database such as PROSPERO or ClinicalTrials.gov.

Step 3: Search for all relevant studies

Searching for relevant studies is the most time-consuming step of a systematic review.

To reduce bias, it’s important to search for relevant studies very thoroughly. Your strategy will depend on your field and your research question, but sources generally fall into these four categories:

  • Databases: Search multiple databases of peer-reviewed literature, such as PubMed or Scopus. Think carefully about how to phrase your search terms and include multiple synonyms of each word. Use Boolean operators if relevant (see the sample search string after this list).
  • Handsearching: In addition to searching the primary sources using databases, you’ll also need to search manually. One strategy is to scan relevant journals or conference proceedings. Another strategy is to scan the reference lists of relevant studies.
  • Gray literature: Gray literature includes documents produced by governments, universities, and other institutions that aren’t published by traditional publishers. Graduate student theses are an important type of gray literature, which you can search using the Networked Digital Library of Theses and Dissertations (NDLTD). In medicine, clinical trial registries are another important type of gray literature.
  • Experts: Contact experts in the field to ask if they have unpublished studies that should be included in your review.
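
As an illustration of combining synonyms with Boolean operators, a database query for the eczema example might look like the following. This is a hypothetical search string (exact syntax varies by database), not the strategy Boyle and colleagues actually used:

```
(probiotic* OR lactobacillus OR bifidobacterium)
AND (eczema OR "atopic dermatitis")
AND ("randomized controlled trial" OR "randomised controlled trial")
```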

At this stage of your review, you won’t read the articles yet. Simply save any potentially relevant citations using bibliographic software, such as Scribbr’s APA or MLA Generator.

In the example review, Boyle and colleagues searched the following sources:

  • Databases: EMBASE, PsycINFO, AMED, LILACS, and ISI Web of Science
  • Handsearch: Conference proceedings and reference lists of articles
  • Gray literature: The Cochrane Library, the metaRegister of Controlled Trials, and the Ongoing Skin Trials Register
  • Experts: Authors of unpublished registered trials, pharmaceutical companies, and manufacturers of probiotics

Step 4: Apply the selection criteria

Applying the selection criteria is a three-person job. Two of you will independently read the studies and decide which to include in your review based on the selection criteria you established in your protocol. The third person’s job is to break any ties.

To increase inter-rater reliability, ensure that everyone thoroughly understands the selection criteria before you begin.
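
Inter-rater reliability between the two screeners is often quantified with Cohen’s kappa. Here is a minimal sketch in Python, using invented screening decisions for illustration:

```python
# Screening decisions from two reviewers for the same set of records:
# True = include, False = exclude. Invented data for illustration.
rater_a = [True, True, False, False, True, False, False, True]
rater_b = [True, False, False, False, True, False, True, True]

n = len(rater_a)
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Chance agreement: probability both say "include" plus probability
# both say "exclude", assuming the raters decide independently.
p_a, p_b = sum(rater_a) / n, sum(rater_b) / n
expected = p_a * p_b + (1 - p_a) * (1 - p_b)

# Kappa rescales observed agreement to correct for chance agreement.
kappa = (observed - expected) / (1 - expected)
print(f"Observed agreement: {observed:.2f}, Cohen's kappa: {kappa:.2f}")
```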

If you’re writing a systematic review as a student for an assignment, you might not have a team. In this case, you’ll have to apply the selection criteria on your own; you can mention this as a limitation in your paper’s discussion.

You should apply the selection criteria in two phases:

  • Based on the titles and abstracts: Decide whether each article potentially meets the selection criteria based on the information provided in the abstracts.
  • Based on the full texts: Download the articles that weren’t excluded during the first phase. If an article isn’t available online or through your library, you may need to contact the authors to ask for a copy. Read the articles and decide which articles meet the selection criteria.

It’s very important to keep a meticulous record of why you included or excluded each article. When the selection process is complete, you can summarize what you did using a PRISMA flow diagram.
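
One lightweight way to keep such a record is a simple decision log from which the counts for a PRISMA flow diagram can be tallied. A minimal Python sketch, with invented entries:

```python
from collections import Counter

# Each entry: (record id, screening phase, decision or exclusion reason).
# Invented data for illustration.
screening_log = [
    ("rec001", "title/abstract", "include"),
    ("rec002", "title/abstract", "exclude: not an RCT"),
    ("rec003", "title/abstract", "include"),
    ("rec001", "full text", "include"),
    ("rec003", "full text", "exclude: no eczema outcome"),
]

# Tally decisions per phase, as needed for the flow diagram boxes.
for phase in ("title/abstract", "full text"):
    counts = Counter(dec for _, p, dec in screening_log if p == phase)
    print(phase, dict(counts))
```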

In the example review, after screening titles and abstracts, Boyle and colleagues found the full texts for each of the remaining studies. Boyle and Tang read through the articles to decide if any more studies needed to be excluded based on the selection criteria.

When Boyle and Tang disagreed about whether a study should be excluded, they discussed it with Varigos until the three researchers came to an agreement.

Step 5: Extract the data

Extracting the data means collecting information from the selected studies in a systematic way. There are two types of information you need to collect from each study:

  • Information about the study’s methods and results. The exact information will depend on your research question, but it might include the year, study design, sample size, context, research findings, and conclusions. If any data are missing, you’ll need to contact the study’s authors.
  • Your judgment of the quality of the evidence, including risk of bias.

You should collect this information using forms. You can find sample forms in The Registry of Methods and Tools for Evidence-Informed Decision Making and the Grading of Recommendations, Assessment, Development and Evaluations (GRADE) Working Group.

Extracting the data is also a three-person job. Two people should do this step independently, and the third person will resolve any disagreements.

In the example review, Boyle and colleagues extracted data about each study’s methods and results. They also collected data about possible sources of bias, such as how the study participants were randomized into the control and treatment groups.

Step 6: Synthesize the data

Synthesizing the data means bringing together the information you collected into a single, cohesive story. There are two main approaches to synthesizing the data:

  • Narrative (qualitative): Summarize the information in words. You’ll need to discuss the studies and assess their overall quality.
  • Quantitative: Use statistical methods to summarize and compare data from different studies. The most common quantitative approach is a meta-analysis, which allows you to combine results from multiple studies into a summary result.

Generally, you should use both approaches together whenever possible. If you don’t have enough data, or the data from different studies aren’t comparable, then you can take just a narrative approach. However, you should justify why a quantitative approach wasn’t possible.
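
One way to check whether data from different studies are comparable enough to pool is to quantify heterogeneity with Cochran’s Q and the I² statistic. A minimal Python sketch, reusing the invented effect sizes from the earlier meta-analysis example:

```python
# Invented effect sizes and standard errors, as in the earlier sketch.
effects = [0.30, 0.10, 0.25]
std_errors = [0.12, 0.15, 0.10]
weights = [1 / se**2 for se in std_errors]
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)

# Cochran's Q: weighted squared deviations from the pooled effect.
q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
df = len(effects) - 1

# I^2: share of variability beyond chance. Values above roughly 50%
# are often read as substantial heterogeneity (a rule of thumb only).
i_squared = max(0.0, (q - df) / q) * 100
print(f"Q = {q:.2f} on {df} df, I^2 = {i_squared:.0f}%")
```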

In the example review, Boyle and colleagues divided the studies into subgroups, such as studies about babies, children, and adults, and analyzed the effect sizes within each group.

Step 7: Write and publish a report

The purpose of writing a systematic review article is to share the answer to your research question and explain how you arrived at this answer.

Your article should include the following sections:

  • Abstract: A summary of the review
  • Introduction: Including the rationale and objectives
  • Methods: Including the selection criteria, search method, data extraction method, and synthesis method
  • Results: Including results of the search and selection process, study characteristics, risk of bias in the studies, and synthesis results
  • Discussion: Including interpretation of the results and limitations of the review
  • Conclusion: The answer to your research question and implications for practice, policy, or research

To verify that your report includes everything it needs, you can use the PRISMA checklist.

Once your report is written, you can publish it in a systematic review database, such as the Cochrane Database of Systematic Reviews, and/or in a peer-reviewed journal.

In their report, Boyle and colleagues concluded that probiotics cannot be recommended for reducing eczema symptoms or improving quality of life in patients with eczema.

Note: Generative AI tools like ChatGPT can be useful at various stages of the writing and research process and can help you to write your systematic review. However, we strongly advise against trying to pass AI-generated text off as your own work.

Other interesting articles

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

  • Student’s t-distribution
  • Normal distribution
  • Null and Alternative Hypotheses
  • Chi square tests
  • Confidence interval
  • Quartiles & Quantiles
  • Cluster sampling
  • Stratified sampling
  • Data cleansing
  • Reproducibility vs Replicability
  • Peer review
  • Prospective cohort study

Research bias

  • Implicit bias
  • Cognitive bias
  • Placebo effect
  • Hawthorne effect
  • Hindsight bias
  • Affect heuristic
  • Social desirability bias

Frequently asked questions about systematic reviews

A literature review is a survey of scholarly sources (such as books, journal articles, and theses) related to a specific topic or research question.

It is often written as part of a thesis, dissertation, or research paper, in order to situate your work in relation to existing knowledge.

A literature review is a survey of credible sources on a topic, often used in dissertations, theses, and research papers. Literature reviews give an overview of knowledge on a subject, helping you identify relevant theories and methods, as well as gaps in existing research. Literature reviews are set up similarly to other academic texts, with an introduction, a main body, and a conclusion.

An annotated bibliography is a list of source references that has a short description (called an annotation) for each of the sources. It is often assigned as part of the research process for a paper.

A systematic review is secondary research because it uses existing research. You don’t collect new data yourself.


Systematic Review


Steps of a Systematic Review

  • Framing a Research Question
  • Developing a Search Strategy
  • Searching the Literature
  • Managing the Process
  • Meta-analysis
  • Publishing your Systematic Review

Forms and templates


  • PICO Template
  • Inclusion/Exclusion Criteria
  • Database Search Log
  • Review Matrix
  • Cochrane Tool for Assessing Risk of Bias in Included Studies

  • PRISMA Flow Diagram - Record the numbers of retrieved references and included/excluded studies. You can use the Create Flow Diagram tool to automate the process.

  • PRISMA Checklist - Checklist of items to include when reporting a systematic review or meta-analysis

PRISMA 2020 and PRISMA-S: Common Questions on Tracking Records and the Flow Diagram

  • PROSPERO Template
  • Manuscript Template
  • Steps of SR (text)
  • Steps of SR (visual)
  • Steps of SR (PIECES)

Adapted from A Guide to Conducting Systematic Reviews: Steps in a Systematic Review by Cornell University Library

Source: Cochrane Consumers and Communications (infographics are free to use and licensed under Creative Commons)

Check the following visual resources titled “What Are Systematic Reviews?”

  • Video with closed captions available
  • Animated Storyboard


The PRISMA 2020 statement: an updated guideline for reporting systematic reviews

  • Matthew J Page , senior research fellow 1 ,
  • Joanne E McKenzie , associate professor 1 ,
  • Patrick M Bossuyt , professor 2 ,
  • Isabelle Boutron , professor 3 ,
  • Tammy C Hoffmann , professor 4 ,
  • Cynthia D Mulrow , professor 5 ,
  • Larissa Shamseer , doctoral student 6 ,
  • Jennifer M Tetzlaff , research product specialist 7 ,
  • Elie A Akl , professor 8 ,
  • Sue E Brennan , senior research fellow 1 ,
  • Roger Chou , professor 9 ,
  • Julie Glanville , associate director 10 ,
  • Jeremy M Grimshaw , professor 11 ,
  • Asbjørn Hróbjartsson , professor 12 ,
  • Manoj M Lalu , associate scientist and assistant professor 13 ,
  • Tianjing Li , associate professor 14 ,
  • Elizabeth W Loder , professor 15 ,
  • Evan Mayo-Wilson , associate professor 16 ,
  • Steve McDonald , senior research fellow 1 ,
  • Luke A McGuinness , research associate 17 ,
  • Lesley A Stewart , professor and director 18 ,
  • James Thomas , professor 19 ,
  • Andrea C Tricco , scientist and associate professor 20 ,
  • Vivian A Welch , associate professor 21 ,
  • Penny Whiting , associate professor 17 ,
  • David Moher , director and professor 22
  • 1 School of Public Health and Preventive Medicine, Monash University, Melbourne, Australia
  • 2 Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Amsterdam University Medical Centres, University of Amsterdam, Amsterdam, Netherlands
  • 3 Université de Paris, Centre of Epidemiology and Statistics (CRESS), Inserm, F 75004 Paris, France
  • 4 Institute for Evidence-Based Healthcare, Faculty of Health Sciences and Medicine, Bond University, Gold Coast, Australia
  • 5 University of Texas Health Science Center at San Antonio, San Antonio, Texas, USA; Annals of Internal Medicine
  • 6 Knowledge Translation Program, Li Ka Shing Knowledge Institute, Toronto, Canada; School of Epidemiology and Public Health, Faculty of Medicine, University of Ottawa, Ottawa, Canada
  • 7 Evidence Partners, Ottawa, Canada
  • 8 Clinical Research Institute, American University of Beirut, Beirut, Lebanon; Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
  • 9 Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon, USA
  • 10 York Health Economics Consortium (YHEC Ltd), University of York, York, UK
  • 11 Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Canada; School of Epidemiology and Public Health, University of Ottawa, Ottawa, Canada; Department of Medicine, University of Ottawa, Ottawa, Canada
  • 12 Centre for Evidence-Based Medicine Odense (CEBMO) and Cochrane Denmark, Department of Clinical Research, University of Southern Denmark, Odense, Denmark; Open Patient data Exploratory Network (OPEN), Odense University Hospital, Odense, Denmark
  • 13 Department of Anesthesiology and Pain Medicine, The Ottawa Hospital, Ottawa, Canada; Clinical Epidemiology Program, Blueprint Translational Research Group, Ottawa Hospital Research Institute, Ottawa, Canada; Regenerative Medicine Program, Ottawa Hospital Research Institute, Ottawa, Canada
  • 14 Department of Ophthalmology, School of Medicine, University of Colorado Denver, Denver, Colorado, United States; Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
  • 15 Division of Headache, Department of Neurology, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA; Head of Research, The BMJ , London, UK
  • 16 Department of Epidemiology and Biostatistics, Indiana University School of Public Health-Bloomington, Bloomington, Indiana, USA
  • 17 Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
  • 18 Centre for Reviews and Dissemination, University of York, York, UK
  • 19 EPPI-Centre, UCL Social Research Institute, University College London, London, UK
  • 20 Li Ka Shing Knowledge Institute of St. Michael's Hospital, Unity Health Toronto, Toronto, Canada; Epidemiology Division of the Dalla Lana School of Public Health and the Institute of Health Management, Policy, and Evaluation, University of Toronto, Toronto, Canada; Queen's Collaboration for Health Care Quality Joanna Briggs Institute Centre of Excellence, Queen's University, Kingston, Canada
  • 21 Methods Centre, Bruyère Research Institute, Ottawa, Ontario, Canada; School of Epidemiology and Public Health, Faculty of Medicine, University of Ottawa, Ottawa, Canada
  • 22 Centre for Journalology, Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Canada; School of Epidemiology and Public Health, Faculty of Medicine, University of Ottawa, Ottawa, Canada
  • Correspondence to: M J Page matthew.page{at}monash.edu
  • Accepted 4 January 2021

The Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) statement, published in 2009, was designed to help systematic reviewers transparently report why the review was done, what the authors did, and what they found. Over the past decade, advances in systematic review methodology and terminology have necessitated an update to the guideline. The PRISMA 2020 statement replaces the 2009 statement and includes new reporting guidance that reflects advances in methods to identify, select, appraise, and synthesise studies. The structure and presentation of the items have been modified to facilitate implementation. In this article, we present the PRISMA 2020 27-item checklist, an expanded checklist that details reporting recommendations for each item, the PRISMA 2020 abstract checklist, and the revised flow diagrams for original and updated reviews.

Systematic reviews serve many critical roles. They can provide syntheses of the state of knowledge in a field, from which future research priorities can be identified; they can address questions that otherwise could not be answered by individual studies; they can identify problems in primary research that should be rectified in future studies; and they can generate or evaluate theories about how or why phenomena occur. Systematic reviews therefore generate various types of knowledge for different users of reviews (such as patients, healthcare providers, researchers, and policy makers). 1 2 To ensure a systematic review is valuable to users, authors should prepare a transparent, complete, and accurate account of why the review was done, what they did (such as how studies were identified and selected) and what they found (such as characteristics of contributing studies and results of meta-analyses). Up-to-date reporting guidance facilitates authors achieving this. 3

The Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) statement published in 2009 (hereafter referred to as PRISMA 2009) 4 5 6 7 8 9 10 is a reporting guideline designed to address poor reporting of systematic reviews. 11 The PRISMA 2009 statement comprised a checklist of 27 items recommended for reporting in systematic reviews and an “explanation and elaboration” paper 12 13 14 15 16 providing additional reporting guidance for each item, along with exemplars of reporting. The recommendations have been widely endorsed and adopted, as evidenced by its co-publication in multiple journals, citation in over 60 000 reports (Scopus, August 2020), endorsement from almost 200 journals and systematic review organisations, and adoption in various disciplines. Evidence from observational studies suggests that use of the PRISMA 2009 statement is associated with more complete reporting of systematic reviews, 17 18 19 20 although more could be done to improve adherence to the guideline. 21

Many innovations in the conduct of systematic reviews have occurred since publication of the PRISMA 2009 statement. For example, technological advances have enabled the use of natural language processing and machine learning to identify relevant evidence, 22 23 24 methods have been proposed to synthesise and present findings when meta-analysis is not possible or appropriate, 25 26 27 and new methods have been developed to assess the risk of bias in results of included studies. 28 29 Evidence on sources of bias in systematic reviews has accrued, culminating in the development of new tools to appraise the conduct of systematic reviews. 30 31 Terminology used to describe particular review processes has also evolved, as in the shift from assessing “quality” to assessing “certainty” in the body of evidence. 32 In addition, the publishing landscape has transformed, with multiple avenues now available for registering and disseminating systematic review protocols, 33 34 disseminating reports of systematic reviews, and sharing data and materials, such as preprint servers and publicly accessible repositories. To capture these advances in the reporting of systematic reviews necessitated an update to the PRISMA 2009 statement.

Summary points

To ensure a systematic review is valuable to users, authors should prepare a transparent, complete, and accurate account of why the review was done, what they did, and what they found

The PRISMA 2020 statement provides updated reporting guidance for systematic reviews that reflects advances in methods to identify, select, appraise, and synthesise studies

The PRISMA 2020 statement consists of a 27-item checklist, an expanded checklist that details reporting recommendations for each item, the PRISMA 2020 abstract checklist, and revised flow diagrams for original and updated reviews

We anticipate that the PRISMA 2020 statement will benefit authors, editors, and peer reviewers of systematic reviews, and different users of reviews, including guideline developers, policy makers, healthcare providers, patients, and other stakeholders

Development of PRISMA 2020

A complete description of the methods used to develop PRISMA 2020 is available elsewhere. 35 We identified PRISMA 2009 items that were often reported incompletely by examining the results of studies investigating the transparency of reporting of published reviews. 17 21 36 37 We identified possible modifications to the PRISMA 2009 statement by reviewing 60 documents providing reporting guidance for systematic reviews (including reporting guidelines, handbooks, tools, and meta-research studies). 38 These reviews of the literature were used to inform the content of a survey with suggested possible modifications to the 27 items in PRISMA 2009 and possible additional items. Respondents were asked whether they believed we should keep each PRISMA 2009 item as is, modify it, or remove it, and whether we should add each additional item. Systematic review methodologists and journal editors were invited to complete the online survey (110 of 220 invited responded). We discussed proposed content and wording of the PRISMA 2020 statement, as informed by the review and survey results, at a 21-member, two-day, in-person meeting in September 2018 in Edinburgh, Scotland. Throughout 2019 and 2020, we circulated an initial draft and five revisions of the checklist and explanation and elaboration paper to co-authors for feedback. In April 2020, we invited 22 systematic reviewers who had expressed interest in providing feedback on the PRISMA 2020 checklist to share their views (via an online survey) on the layout and terminology used in a preliminary version of the checklist. Feedback was received from 15 individuals and considered by the first author, and any revisions deemed necessary were incorporated before the final version was approved and endorsed by all co-authors.

The PRISMA 2020 statement

Scope of the guideline.

The PRISMA 2020 statement has been designed primarily for systematic reviews of studies that evaluate the effects of health interventions, irrespective of the design of the included studies. However, the checklist items are applicable to reports of systematic reviews evaluating other interventions (such as social or educational interventions), and many items are applicable to systematic reviews with objectives other than evaluating interventions (such as evaluating aetiology, prevalence, or prognosis). PRISMA 2020 is intended for use in systematic reviews that include synthesis (such as pairwise meta-analysis or other statistical synthesis methods) or do not include synthesis (for example, because only one eligible study is identified). The PRISMA 2020 items are relevant for mixed-methods systematic reviews (which include quantitative and qualitative studies), but reporting guidelines addressing the presentation and synthesis of qualitative data should also be consulted. 39 40 PRISMA 2020 can be used for original systematic reviews, updated systematic reviews, or continually updated (“living”) systematic reviews. However, for updated and living systematic reviews, there may be some additional considerations that need to be addressed. Where there is relevant content from other reporting guidelines, we reference these guidelines within the items in the explanation and elaboration paper 41 (such as PRISMA-Search 42 in items 6 and 7, Synthesis without meta-analysis (SWiM) reporting guideline 27 in item 13d). Box 1 includes a glossary of terms used throughout the PRISMA 2020 statement.

Glossary of terms

Systematic review —A review that uses explicit, systematic methods to collate and synthesise findings of studies that address a clearly formulated question 43

Statistical synthesis —The combination of quantitative results of two or more studies. This encompasses meta-analysis of effect estimates (described below) and other methods, such as combining P values, calculating the range and distribution of observed effects, and vote counting based on the direction of effect (see McKenzie and Brennan 25 for a description of each method)

Meta-analysis of effect estimates —A statistical technique used to synthesise results when study effect estimates and their variances are available, yielding a quantitative summary of results 25

Outcome —An event or measurement collected for participants in a study (such as quality of life, mortality)

Result —The combination of a point estimate (such as a mean difference, risk ratio, or proportion) and a measure of its precision (such as a confidence/credible interval) for a particular outcome

Report —A document (paper or electronic) supplying information about a particular study. It could be a journal article, preprint, conference abstract, study register entry, clinical study report, dissertation, unpublished manuscript, government report, or any other document providing relevant information

Record —The title or abstract (or both) of a report indexed in a database or website (such as a title or abstract for an article indexed in Medline). Records that refer to the same report (such as the same journal article) are “duplicates”; however, records that refer to reports that are merely similar (such as a similar abstract submitted to two different conferences) should be considered unique.

Study —An investigation, such as a clinical trial, that includes a defined group of participants and one or more interventions and outcomes. A “study” might have multiple reports. For example, reports could include the protocol, statistical analysis plan, baseline characteristics, results for the primary outcome, results for harms, results for secondary outcomes, and results for additional mediator and moderator analyses
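
As a worked illustration of the “Meta-analysis of effect estimates” entry above, one common estimator (the fixed-effect, inverse-variance method) can be written as follows. This is a standard textbook form, not a method prescribed by PRISMA 2020:

```latex
\hat{\theta} = \frac{\sum_{i=1}^{k} w_i \hat{\theta}_i}{\sum_{i=1}^{k} w_i},
\qquad w_i = \frac{1}{\mathrm{Var}(\hat{\theta}_i)},
\qquad \mathrm{SE}(\hat{\theta}) = \sqrt{\frac{1}{\sum_{i=1}^{k} w_i}}
```

where the effect estimates from the k included studies are weighted by the inverse of their variances, so more precise studies contribute more to the summary result.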

PRISMA 2020 is not intended to guide systematic review conduct, for which comprehensive resources are available. 43 44 45 46 However, familiarity with PRISMA 2020 is useful when planning and conducting systematic reviews to ensure that all recommended information is captured. PRISMA 2020 should not be used to assess the conduct or methodological quality of systematic reviews; other tools exist for this purpose. 30 31 Furthermore, PRISMA 2020 is not intended to inform the reporting of systematic review protocols, for which a separate statement is available (PRISMA for Protocols (PRISMA-P) 2015 statement 47 48 ). Finally, extensions to the PRISMA 2009 statement have been developed to guide reporting of network meta-analyses, 49 meta-analyses of individual participant data, 50 systematic reviews of harms, 51 systematic reviews of diagnostic test accuracy studies, 52 and scoping reviews 53 ; for these types of reviews we recommend authors report their review in accordance with the recommendations in PRISMA 2020 along with the guidance specific to the extension.

How to use PRISMA 2020

The PRISMA 2020 statement (including the checklists, explanation and elaboration, and flow diagram) replaces the PRISMA 2009 statement, which should no longer be used. Box 2 summarises noteworthy changes from the PRISMA 2009 statement. The PRISMA 2020 checklist includes seven sections with 27 items, some of which include sub-items (table 1). A checklist for journal and conference abstracts for systematic reviews is included in PRISMA 2020. This abstract checklist is an update of the 2013 PRISMA for Abstracts statement, 54 reflecting new and modified content in PRISMA 2020 (table 2). A template PRISMA flow diagram is provided, which can be modified depending on whether the systematic review is original or updated (fig 1).

Noteworthy changes to the PRISMA 2009 statement

Inclusion of the abstract reporting checklist within PRISMA 2020 (see item #2 and table 2).

Movement of the ‘Protocol and registration’ item from the start of the Methods section of the checklist to a new Other section, with addition of a sub-item recommending authors describe amendments to information provided at registration or in the protocol (see item #24a-24c).

Modification of the ‘Search’ item to recommend authors present full search strategies for all databases, registers and websites searched, not just at least one database (see item #7).

Modification of the ‘Study selection’ item in the Methods section to emphasise the reporting of how many reviewers screened each record and each report retrieved, whether they worked independently, and if applicable, details of automation tools used in the process (see item #8).

Addition of a sub-item to the ‘Data items’ item recommending authors report how outcomes were defined, which results were sought, and methods for selecting a subset of results from included studies (see item #10a).

Splitting of the ‘Synthesis of results’ item in the Methods section into six sub-items recommending authors describe: the processes used to decide which studies were eligible for each synthesis; any methods required to prepare the data for synthesis; any methods used to tabulate or visually display results of individual studies and syntheses; any methods used to synthesise results; any methods used to explore possible causes of heterogeneity among study results (such as subgroup analysis, meta-regression); and any sensitivity analyses used to assess robustness of the synthesised results (see item #13a-13f).

Addition of a sub-item to the ‘Study selection’ item in the Results section recommending authors cite studies that might appear to meet the inclusion criteria, but which were excluded, and explain why they were excluded (see item #16b).

Splitting of the ‘Synthesis of results’ item in the Results section into four sub-items recommending authors: briefly summarise the characteristics and risk of bias among studies contributing to the synthesis; present results of all statistical syntheses conducted; present results of any investigations of possible causes of heterogeneity among study results; and present results of any sensitivity analyses (see item #20a-20d).

Addition of new items recommending authors report methods for and results of an assessment of certainty (or confidence) in the body of evidence for an outcome (see items #15 and #22).

Addition of a new item recommending authors declare any competing interests (see item #26).

Addition of a new item recommending authors indicate whether data, analytic code and other materials used in the review are publicly available and if so, where they can be found (see item #27).

Table 1: PRISMA 2020 item checklist

Table 2: PRISMA 2020 for Abstracts checklist

Fig 1: PRISMA 2020 flow diagram template for systematic reviews. The new design is adapted from flow diagrams proposed by Boers, 55 Mayo-Wilson et al. 56 and Stovold et al. 57 The boxes in grey should only be completed if applicable; otherwise they should be removed from the flow diagram. Note that a “report” could be a journal article, preprint, conference abstract, study register entry, clinical study report, dissertation, unpublished manuscript, government report or any other document providing relevant information.

We recommend authors refer to PRISMA 2020 early in the writing process, because prospective consideration of the items may help to ensure that all the items are addressed. To help keep track of which items have been reported, the PRISMA statement website (http://www.prisma-statement.org/) includes fillable templates of the checklists to download and complete (also available in the data supplement on bmj.com). We have also created a web application that allows users to complete the checklist via a user-friendly interface 58 (available at https://prisma.shinyapps.io/checklist/ and adapted from the Transparency Checklist app 59). The completed checklist can be exported to Word or PDF. Editable templates of the flow diagram can also be downloaded from the PRISMA statement website.

We have prepared an updated explanation and elaboration paper, in which we explain why reporting of each item is recommended and present bullet points that detail the reporting recommendations (which we refer to as elements). 41 The bullet-point structure is new to PRISMA 2020 and has been adopted to facilitate implementation of the guidance. 60 61 An expanded checklist, which comprises an abridged version of the elements presented in the explanation and elaboration paper, with references and some examples removed, is available in the data supplement on bmj.com. Consulting the explanation and elaboration paper is recommended if further clarity or information is required.

Journals and publishers might impose word and section limits, and limits on the number of tables and figures allowed in the main report. In such cases, if the relevant information for some items already appears in a publicly accessible review protocol, referring to the protocol may suffice. Alternatively, placing detailed descriptions of the methods used or additional results (such as for less critical outcomes) in supplementary files is recommended. Ideally, supplementary files should be deposited to a general-purpose or institutional open-access repository that provides free and permanent access to the material (such as Open Science Framework, Dryad, figshare). A reference or link to the additional information should be included in the main report. Finally, although PRISMA 2020 provides a template for where information might be located, the suggested location should not be seen as prescriptive; the guiding principle is to ensure the information is reported.

Use of PRISMA 2020 has the potential to benefit many stakeholders. Complete reporting allows readers to assess the appropriateness of the methods, and therefore the trustworthiness of the findings. Presenting and summarising characteristics of studies contributing to a synthesis allows healthcare providers and policy makers to evaluate the applicability of the findings to their setting. Describing the certainty in the body of evidence for an outcome and the implications of findings should help policy makers, managers, and other decision makers formulate appropriate recommendations for practice or policy. Complete reporting of all PRISMA 2020 items also facilitates replication and review updates, as well as inclusion of systematic reviews in overviews (of systematic reviews) and guidelines, so teams can leverage work that is already done and decrease research waste. 36 62 63

We updated the PRISMA 2009 statement by adapting the EQUATOR Network’s guidance for developing health research reporting guidelines. 64 We evaluated the reporting completeness of published systematic reviews, 17 21 36 37 reviewed the items included in other documents providing guidance for systematic reviews, 38 surveyed systematic review methodologists and journal editors for their views on how to revise the original PRISMA statement, 35 discussed the findings at an in-person meeting, and prepared this document through an iterative process. Our recommendations are informed by the reviews and survey conducted before the in-person meeting, theoretical considerations about which items facilitate replication and help users assess the risk of bias and applicability of systematic reviews, and co-authors’ experience with authoring and using systematic reviews.

Various strategies to increase the use of reporting guidelines and improve reporting have been proposed. They include educators introducing reporting guidelines into graduate curricula to promote good reporting habits of early career scientists 65 ; journal editors and regulators endorsing use of reporting guidelines 18 ; peer reviewers evaluating adherence to reporting guidelines 61 66 ; journals requiring authors to indicate where in their manuscript they have adhered to each reporting item 67 ; and authors using online writing tools that prompt complete reporting at the writing stage. 60 Multi-pronged interventions, where more than one of these strategies are combined, may be more effective (such as completion of checklists coupled with editorial checks). 68 However, of 31 interventions proposed to increase adherence to reporting guidelines, the effects of only 11 have been evaluated, mostly in observational studies at high risk of bias due to confounding. 69 It is therefore unclear which strategies should be used. Future research might explore barriers and facilitators to the use of PRISMA 2020 by authors, editors, and peer reviewers, designing interventions that address the identified barriers, and evaluating those interventions using randomised trials. To inform possible revisions to the guideline, it would also be valuable to conduct think-aloud studies 70 to understand how systematic reviewers interpret the items, and reliability studies to identify items where there is varied interpretation of the items.

We encourage readers to submit evidence that informs any of the recommendations in PRISMA 2020 (via the PRISMA statement website: http://www.prisma-statement.org/ ). To enhance accessibility of PRISMA 2020, several translations of the guideline are under way (see available translations at the PRISMA statement website). We encourage journal editors and publishers to raise awareness of PRISMA 2020 (for example, by referring to it in journal “Instructions to authors”), endorsing its use, advising editors and peer reviewers to evaluate submitted systematic reviews against the PRISMA 2020 checklists, and making changes to journal policies to accommodate the new reporting recommendations. We recommend existing PRISMA extensions 47 49 50 51 52 53 71 72 be updated to reflect PRISMA 2020 and advise developers of new PRISMA extensions to use PRISMA 2020 as the foundation document.

We anticipate that the PRISMA 2020 statement will benefit authors, editors, and peer reviewers of systematic reviews, and different users of reviews, including guideline developers, policy makers, healthcare providers, patients, and other stakeholders. Ultimately, we hope that uptake of the guideline will lead to more transparent, complete, and accurate reporting of systematic reviews, thus facilitating evidence based decision making.

Acknowledgments

We dedicate this paper to the late Douglas G Altman and Alessandro Liberati, whose contributions were fundamental to the development and implementation of the original PRISMA statement.

We thank the following contributors who completed the survey to inform discussions at the development meeting: Xavier Armoiry, Edoardo Aromataris, Ana Patricia Ayala, Ethan M Balk, Virginia Barbour, Elaine Beller, Jesse A Berlin, Lisa Bero, Zhao-Xiang Bian, Jean Joel Bigna, Ferrán Catalá-López, Anna Chaimani, Mike Clarke, Tammy Clifford, Ioana A Cristea, Miranda Cumpston, Sofia Dias, Corinna Dressler, Ivan D Florez, Joel J Gagnier, Chantelle Garritty, Long Ge, Davina Ghersi, Sean Grant, Gordon Guyatt, Neal R Haddaway, Julian PT Higgins, Sally Hopewell, Brian Hutton, Jamie J Kirkham, Jos Kleijnen, Julia Koricheva, Joey SW Kwong, Toby J Lasserson, Julia H Littell, Yoon K Loke, Malcolm R Macleod, Chris G Maher, Ana Marušic, Dimitris Mavridis, Jessie McGowan, Matthew DF McInnes, Philippa Middleton, Karel G Moons, Zachary Munn, Jane Noyes, Barbara Nußbaumer-Streit, Donald L Patrick, Tatiana Pereira-Cenci, Ba’ Pham, Bob Phillips, Dawid Pieper, Michelle Pollock, Daniel S Quintana, Drummond Rennie, Melissa L Rethlefsen, Hannah R Rothstein, Maroeska M Rovers, Rebecca Ryan, Georgia Salanti, Ian J Saldanha, Margaret Sampson, Nancy Santesso, Rafael Sarkis-Onofre, Jelena Savović, Christopher H Schmid, Kenneth F Schulz, Guido Schwarzer, Beverley J Shea, Paul G Shekelle, Farhad Shokraneh, Mark Simmonds, Nicole Skoetz, Sharon E Straus, Anneliese Synnot, Emily E Tanner-Smith, Brett D Thombs, Hilary Thomson, Alexander Tsertsvadze, Peter Tugwell, Tari Turner, Lesley Uttley, Jeffrey C Valentine, Matt Vassar, Areti Angeliki Veroniki, Meera Viswanathan, Cole Wayant, Paul Whaley, and Kehu Yang. We thank the following contributors who provided feedback on a preliminary version of the PRISMA 2020 checklist: Jo Abbott, Fionn Büttner, Patricia Correia-Santos, Victoria Freeman, Emily A Hennessy, Rakibul Islam, Amalia (Emily) Karahalios, Kasper Krommes, Andreas Lundh, Dafne Port Nascimento, Davina Robson, Catherine Schenck-Yglesias, Mary M Scott, Sarah Tanveer and Pavel Zhelnov. We thank Abigail H Goben, Melissa L Rethlefsen, Tanja Rombey, Anna Scott, and Farhad Shokraneh for their helpful comments on the preprints of the PRISMA 2020 papers. We thank Edoardo Aromataris, Stephanie Chang, Toby Lasserson and David Schriger for their helpful peer review comments on the PRISMA 2020 papers.

Contributors: JEM and DM are joint senior authors. MJP, JEM, PMB, IB, TCH, CDM, LS, and DM conceived this paper and designed the literature review and survey conducted to inform the guideline content. MJP conducted the literature review, administered the survey and analysed the data for both. MJP prepared all materials for the development meeting. MJP and JEM presented proposals at the development meeting. All authors except for TCH, JMT, EAA, SEB, and LAM attended the development meeting. MJP and JEM took and consolidated notes from the development meeting. MJP and JEM led the drafting and editing of the article. JEM, PMB, IB, TCH, LS, JMT, EAA, SEB, RC, JG, AH, TL, EMW, SM, LAM, LAS, JT, ACT, PW, and DM drafted particular sections of the article. All authors were involved in revising the article critically for important intellectual content. All authors approved the final version of the article. MJP is the guarantor of this work. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

Funding: There was no direct funding for this research. MJP is supported by an Australian Research Council Discovery Early Career Researcher Award (DE200101618) and was previously supported by an Australian National Health and Medical Research Council (NHMRC) Early Career Fellowship (1088535) during the conduct of this research. JEM is supported by an Australian NHMRC Career Development Fellowship (1143429). TCH is supported by an Australian NHMRC Senior Research Fellowship (1154607). JMT is supported by Evidence Partners Inc. JMG is supported by a Tier 1 Canada Research Chair in Health Knowledge Transfer and Uptake. MML is supported by The Ottawa Hospital Anaesthesia Alternate Funds Association and a Faculty of Medicine Junior Research Chair. TL is supported by funding from the National Eye Institute (UG1EY020522), National Institutes of Health, United States. LAM is supported by a National Institute for Health Research Doctoral Research Fellowship (DRF-2018-11-ST2-048). ACT is supported by a Tier 2 Canada Research Chair in Knowledge Synthesis. DM is supported in part by a University Research Chair, University of Ottawa. The funders had no role in considering the study design or in the collection, analysis, interpretation of data, writing of the report, or decision to submit the article for publication.

Competing interests: All authors have completed the ICMJE uniform disclosure form at http://www.icmje.org/conflicts-of-interest/ and declare: EL is head of research for the BMJ ; MJP is an editorial board member for PLOS Medicine ; ACT is an associate editor and MJP, TL, EMW, and DM are editorial board members for the Journal of Clinical Epidemiology ; DM and LAS were editors in chief, LS, JMT, and ACT are associate editors, and JG is an editorial board member for Systematic Reviews . None of these authors were involved in the peer review process or decision to publish. TCH has received personal fees from Elsevier outside the submitted work. EMW has received personal fees from the American Journal for Public Health , for which he is the editor for systematic reviews. VW is editor in chief of the Campbell Collaboration, which produces systematic reviews, and co-convenor of the Campbell and Cochrane equity methods group. DM is chair of the EQUATOR Network, IB is adjunct director of the French EQUATOR Centre and TCH is co-director of the Australasian EQUATOR Centre, which advocates for the use of reporting guidelines to improve the quality of reporting in research articles. JMT received salary from Evidence Partners, creator of DistillerSR software for systematic reviews; Evidence Partners was not involved in the design or outcomes of the statement, and the views expressed solely represent those of the author.

Provenance and peer review: Not commissioned; externally peer reviewed.

Patient and public involvement: Patients and the public were not involved in this methodological research. We plan to disseminate the research widely, including to community participants in evidence synthesis organisations.

This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY 4.0) license, which permits others to distribute, remix, adapt and build upon this work, for commercial use, provided the original work is properly cited. See: http://creativecommons.org/licenses/by/4.0/ .



Conducting a Systematic Review: A Practical Guide

  • Reference work entry
  • First Online: 13 January 2019

  • Freya MacMillan 2 ,
  • Kate A. McBride 3 ,
  • Emma S. George 4 &
  • Genevieve Z. Steiner 5  


It can be challenging to conduct a systematic review with limited experience and skills in undertaking such a task. This chapter provides a practical guide to undertaking a systematic review, with step-by-step instructions to take the reader through the process from start to finish. The chapter begins by defining what a systematic review is and reviewing its various components, then covers turning a research question into a search strategy, developing a systematic review protocol, searching for relevant literature, and managing citations. Next, the chapter focuses on documenting the characteristics of included studies and summarizing findings, extracting data, methods for assessing risk of bias and considering heterogeneity, and undertaking meta-analyses. Last, the chapter explores creating a narrative and interpreting findings. Practical tips and examples from the existing literature are used throughout the chapter to assist readers in their learning. By the end of this chapter, the reader will have the knowledge to conduct their own systematic review.

  • Systematic review
  • Search strategy
  • Risk of bias
  • Heterogeneity
  • Meta-analysis
  • Forest plot
  • Funnel plot
  • Meta-synthesis



Author information

Authors and Affiliations

School of Science and Health and Translational Health Research Institute (THRI), Western Sydney University, Penrith, NSW, Australia

Freya MacMillan

School of Medicine and Translational Health Research Institute, Western Sydney University, Sydney, NSW, Australia

Kate A. McBride

School of Science and Health, Western Sydney University, Sydney, NSW, Australia

Emma S. George

NICM and Translational Health Research Institute (THRI), Western Sydney University, Penrith, NSW, Australia

Genevieve Z. Steiner


Corresponding author

Correspondence to Freya MacMillan .

Editor information

Editors and Affiliations

School of Science and Health, Western Sydney University, Penrith, NSW, Australia

Pranee Liamputtong


Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this entry

Cite this entry

MacMillan, F., McBride, K.A., George, E.S., Steiner, G.Z. (2019). Conducting a Systematic Review: A Practical Guide. In: Liamputtong, P. (eds) Handbook of Research Methods in Health Social Sciences. Springer, Singapore. https://doi.org/10.1007/978-981-10-5251-4_113


DOI : https://doi.org/10.1007/978-981-10-5251-4_113

Published : 13 January 2019

Publisher Name : Springer, Singapore

Print ISBN : 978-981-10-5250-7

Online ISBN : 978-981-10-5251-4



Systematic Review | Definition, Examples & Guide

Published on 15 June 2022 by Shaun Turney . Revised on 17 October 2022.


Similar to a systematic review, a scoping review is a type of review that tries to minimise bias by using transparent and repeatable methods.

However, a scoping review isn’t a type of systematic review. The most important difference is the goal: rather than answering a specific question, a scoping review explores a topic. The researcher tries to identify the main concepts, theories, and evidence, as well as gaps in the current research.

Sometimes scoping reviews are an exploratory preparation step for a systematic review, and sometimes they are a standalone project.

A systematic review is a good choice of review if you want to answer a question about the effectiveness of an intervention , such as a medical treatment.

To conduct a systematic review, you’ll need the following:

  • A precise question , usually about the effectiveness of an intervention. The question needs to be about a topic that’s previously been studied by multiple researchers. If there’s no previous research, there’s nothing to review.
  • A team of at least three people . If you’re doing a systematic review on your own (e.g., for a research paper or thesis), you should take appropriate measures to ensure the validity and reliability of your research.
  • Access to databases and journal archives. Often, your educational institution provides you with access.
  • Time. A professional systematic review is a time-consuming process: it will take the lead author about six months of full-time work. If you’re a student, you should narrow the scope of your systematic review and stick to a tight schedule.
  • Bibliographic, word-processing, spreadsheet, and statistical software . For example, you could use EndNote, Microsoft Word, Excel, and SPSS.

A systematic review has many pros .

  • They minimise research bias by considering all available evidence and evaluating each study for bias.
  • Their methods are transparent , so they can be scrutinised by others.
  • They’re thorough : they summarise all available evidence.
  • They can be replicated and updated by others.

Systematic reviews also have a few cons .

  • They’re time-consuming .
  • They’re narrow in scope : they only answer the precise research question.

The 7 steps for conducting a systematic review are explained with an example.

Step 1: Formulate a research question

Formulating the research question is probably the most important step of a systematic review. A clear research question will:

  • Allow you to more effectively communicate your research to other researchers and practitioners
  • Guide your decisions as you plan and conduct your systematic review

A good research question for a systematic review has four components, which you can remember with the acronym PICO :

  • Population(s) or problem(s)
  • Intervention(s)
  • Comparison(s)
  • Outcome(s)

You can rearrange these four components to write your research question:

  • What is the effectiveness of I versus C for O in P ?

Sometimes, you may want to include a fifth component, the type of study design . In this case, the acronym is PICOT .

  • Type of study design(s)

In the example eczema review, Boyle and colleagues’ question contained the following components:

  • The population of patients with eczema
  • The intervention of probiotics
  • In comparison to no treatment, placebo , or non-probiotic treatment
  • The outcome of changes in participant-, parent-, and doctor-rated symptoms of eczema and quality of life
  • Randomised controlled trials, a type of study design

Their research question was:

  • What is the effectiveness of probiotics versus no treatment, a placebo, or a non-probiotic treatment for reducing eczema symptoms and improving quality of life in patients with eczema?

Step 2: Develop a protocol

A protocol is a document that contains your research plan for the systematic review. This is an important step because having a plan allows you to work more efficiently and reduces bias.

Your protocol should include the following components:

  • Background information : Provide the context of the research question, including why it’s important.
  • Research objective(s) : Rephrase your research question as an objective.
  • Selection criteria: State how you’ll decide which studies to include or exclude from your review.
  • Search strategy: Discuss your plan for finding studies.
  • Analysis: Explain what information you’ll collect from the studies and how you’ll synthesise the data.

If you’re a professional seeking to publish your review, it’s a good idea to bring together an advisory committee . This is a group of about six people who have experience in the topic you’re researching. They can help you make decisions about your protocol.

It’s highly recommended to register your protocol. Registering your protocol means submitting it to a database such as PROSPERO or ClinicalTrials.gov .
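To make the plan concrete, the protocol components above can also be kept in a structured, machine-readable form alongside the written document. The sketch below is purely illustrative: the field names and values are hypothetical, not part of any registration standard.

```python
# A minimal, hypothetical encoding of the protocol components listed above.
protocol = {
    "background": "Probiotics are hypothesised to reduce eczema symptoms.",
    "objective": "Assess the effectiveness of probiotics for eczema.",
    "selection_criteria": {
        "include": ["randomised controlled trials", "patients with eczema"],
        "exclude": ["no eczema-related outcome reported"],
    },
    "search_strategy": ["databases", "handsearching", "grey literature", "experts"],
    "analysis": "Narrative synthesis, plus meta-analysis where data allow.",
}

# A structured protocol makes it easy to check later steps against the plan.
assert "include" in protocol["selection_criteria"]
```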

Step 3: Search for all relevant studies

Searching for relevant studies is the most time-consuming step of a systematic review.

To reduce bias, it’s important to search for relevant studies very thoroughly. Your strategy will depend on your field and your research question, but sources generally fall into these four categories:

  • Databases: Search multiple databases of peer-reviewed literature, such as PubMed or Scopus . Think carefully about how to phrase your search terms and include multiple synonyms of each word. Use Boolean operators if relevant (see the sketch after this list).
  • Handsearching: In addition to searching the primary sources using databases, you’ll also need to search manually. One strategy is to scan relevant journals or conference proceedings. Another strategy is to scan the reference lists of relevant studies.
  • Grey literature: Grey literature includes documents produced by governments, universities, and other institutions that aren’t published by traditional publishers. Graduate student theses are an important type of grey literature, which you can search using the Networked Digital Library of Theses and Dissertations (NDLTD) . In medicine, clinical trial registries are another important type of grey literature.
  • Experts: Contact experts in the field to ask if they have unpublished studies that should be included in your review.
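To illustrate the database bullet above, here is a minimal sketch of how synonym blocks can be combined with Boolean operators into a single search string. The terms are examples only, and real database syntax (field tags, truncation rules) varies by provider.

```python
# Combine synonym blocks with OR, then join blocks with AND (illustrative terms).
population = ["eczema", "atopic dermatitis"]
intervention = ["probiotic*", "lactobacillus"]

def or_block(terms):
    """Quote multi-word phrases and join synonyms with OR."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"

query = " AND ".join(or_block(block) for block in (population, intervention))
print(query)
# (eczema OR "atopic dermatitis") AND (probiotic* OR lactobacillus)
```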

At this stage of your review, you won’t read the articles yet. Simply save any potentially relevant citations using bibliographic software, such as Scribbr’s APA or MLA Generator .

  • Databases: EMBASE, PsycINFO, AMED, LILACS, and ISI Web of Science
  • Handsearch: Conference proceedings and reference lists of articles
  • Grey literature: The Cochrane Library, the metaRegister of Controlled Trials, and the Ongoing Skin Trials Register
  • Experts: Authors of unpublished registered trials, pharmaceutical companies, and manufacturers of probiotics

Step 4: Apply the selection criteria

Applying the selection criteria is a three-person job. Two of you will independently read the studies and decide which to include in your review based on the selection criteria you established in your protocol . The third person’s job is to break any ties.

To increase inter-rater reliability , ensure that everyone thoroughly understands the selection criteria before you begin.
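Inter-rater agreement on screening decisions is often quantified with Cohen’s kappa, which corrects raw agreement for agreement expected by chance. The counts below are made up for illustration.

```python
# Cohen's kappa for two independent screeners (hypothetical counts).
both_include = 40    # both raters voted "include"
both_exclude = 430   # both raters voted "exclude"
r1_only = 15         # rater 1 include, rater 2 exclude
r2_only = 15         # rater 2 include, rater 1 exclude

n = both_include + both_exclude + r1_only + r2_only
p_o = (both_include + both_exclude) / n      # observed agreement
p1 = (both_include + r1_only) / n            # rater 1 "include" rate
p2 = (both_include + r2_only) / n            # rater 2 "include" rate
p_e = p1 * p2 + (1 - p1) * (1 - p2)          # agreement expected by chance
kappa = (p_o - p_e) / (1 - p_e)
print(f"kappa = {kappa:.2f}")  # ~0.69 for these counts
```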

If you’re writing a systematic review as a student for an assignment, you might not have a team. In this case, you’ll have to apply the selection criteria on your own; you can mention this as a limitation in your paper’s discussion.

You should apply the selection criteria in two phases:

  • Based on the titles and abstracts : Decide whether each article potentially meets the selection criteria based on the information provided in the abstracts.
  • Based on the full texts: Download the articles that weren’t excluded during the first phase. If an article isn’t available online or through your library, you may need to contact the authors to ask for a copy. Read the articles and decide which articles meet the selection criteria.

It’s very important to keep a meticulous record of why you included or excluded each article. When the selection process is complete, you can summarise what you did using a PRISMA flow diagram .
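A PRISMA flow diagram tracks how many records survive each stage: identification, screening, eligibility, and inclusion. The tally is simple arithmetic, as in this sketch with made-up numbers.

```python
# Record counts for a PRISMA-style flow diagram (hypothetical numbers).
identified = 1120           # records retrieved from all sources
duplicates = 226            # removed before screening
excluded_on_abstract = 801  # phase 1: titles and abstracts
excluded_on_fulltext = 62   # phase 2: full texts, with reasons recorded

screened = identified - duplicates
included = screened - excluded_on_abstract - excluded_on_fulltext
print(f"screened: {screened}, included in review: {included}")  # 894, 31
```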

In the example review, Boyle and colleagues retrieved the full texts of the studies that remained after title and abstract screening. Boyle and Tang then read through the articles to decide if any more studies needed to be excluded based on the selection criteria.

When Boyle and Tang disagreed about whether a study should be excluded, they discussed it with Varigos until the three researchers came to an agreement.

Step 5: Extract the data

Extracting the data means collecting information from the selected studies in a systematic way. There are two types of information you need to collect from each study:

  • Information about the study’s methods and results . The exact information will depend on your research question, but it might include the year, study design , sample size, context, research findings , and conclusions. If any data are missing, you’ll need to contact the study’s authors.
  • Your judgement of the quality of the evidence, including risk of bias .

You should collect this information using forms. You can find sample forms in The Registry of Methods and Tools for Evidence-Informed Decision Making and the Grading of Recommendations, Assessment, Development and Evaluations Working Group .

Extracting the data is also a three-person job. Two people should do this step independently, and the third person will resolve any disagreements.
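A data extraction form can be mirrored by a simple record type so that every study yields the same fields. The structure below is a hypothetical sketch based on the items listed above; the study details are invented.

```python
from dataclasses import dataclass

# One row of a hypothetical extraction form, mirroring the items above.
@dataclass
class ExtractionRecord:
    study_id: str
    year: int
    design: str          # e.g. "RCT"
    sample_size: int
    findings: str
    risk_of_bias: str    # reviewer judgement, e.g. "low" / "some concerns" / "high"

record = ExtractionRecord(
    study_id="StudyA-2005", year=2005, design="RCT", sample_size=120,
    findings="No significant reduction in symptom scores", risk_of_bias="low",
)
print(record.study_id, record.risk_of_bias)
```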

They also collected data about possible sources of bias, such as how the study participants were randomised into the control and treatment groups.

Step 6: Synthesise the data

Synthesising the data means bringing together the information you collected into a single, cohesive story. There are two main approaches to synthesising the data:

  • Narrative ( qualitative ): Summarise the information in words. You’ll need to discuss the studies and assess their overall quality.
  • Quantitative : Use statistical methods to summarise and compare data from different studies. The most common quantitative approach is a meta-analysis , which allows you to combine results from multiple studies into a summary result.

Generally, you should use both approaches together whenever possible. If you don’t have enough data, or the data from different studies aren’t comparable, then you can take just a narrative approach. However, you should justify why a quantitative approach wasn’t possible.
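For the quantitative route, the core of a basic meta-analysis is inverse-variance weighting: each study’s effect is weighted by the inverse of its variance, so more precise studies count for more. The sketch below shows a fixed-effect version with invented effect sizes; real analyses usually also consider random-effects models and heterogeneity statistics.

```python
import math

# Fixed-effect, inverse-variance pooling (all numbers are made up).
effects   = [-0.30, -0.10, 0.05, -0.20]  # per-study effect sizes (e.g. SMDs)
variances = [0.04, 0.02, 0.05, 0.03]     # per-study variances

weights = [1.0 / v for v in variances]
pooled  = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
se      = math.sqrt(1.0 / sum(weights))
low, high = pooled - 1.96 * se, pooled + 1.96 * se
print(f"pooled effect = {pooled:.3f}, 95% CI ({low:.3f}, {high:.3f})")
```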

Boyle and colleagues also divided the studies into subgroups, such as studies about babies, children, and adults, and analysed the effect sizes within each group.

Step 7: Write and publish a report

The purpose of writing a systematic review article is to share the answer to your research question and explain how you arrived at this answer.

Your article should include the following sections:

  • Abstract : A summary of the review
  • Introduction : Including the rationale and objectives
  • Methods : Including the selection criteria, search method, data extraction method, and synthesis method
  • Results : Including results of the search and selection process, study characteristics, risk of bias in the studies, and synthesis results
  • Discussion : Including interpretation of the results and limitations of the review
  • Conclusion : The answer to your research question and implications for practice, policy, or research

To verify that your report includes everything it needs, you can use the PRISMA checklist .

Once your report is written, you can publish it in a systematic review database, such as the Cochrane Database of Systematic Reviews , and/or in a peer-reviewed journal.

A systematic review is secondary research because it uses existing research. You don’t collect new data yourself.

A literature review is a survey of scholarly sources (such as books, journal articles, and theses) related to a specific topic or research question .

It is often written as part of a dissertation , thesis, research paper , or proposal .

There are several reasons to conduct a literature review at the beginning of a research project:

  • To familiarise yourself with the current state of knowledge on your topic
  • To ensure that you’re not just repeating what others have already done
  • To identify gaps in knowledge and unresolved problems that your research can address
  • To develop your theoretical framework and methodology
  • To provide an overview of the key findings and debates on the topic

Writing the literature review shows your reader how your work relates to existing research and what new insights it will contribute.

Cite this Scribbr article

If you want to cite this source, you can copy and paste the citation or click the ‘Cite this Scribbr article’ button to automatically add the citation to our free Reference Generator.

Turney, S. (2022, October 17). Systematic Review | Definition, Examples & Guide. Scribbr. Retrieved 2 April 2024, from https://www.scribbr.co.uk/research-methods/systematic-reviews/


Introduction to Systematic Reviews


What is a Systematic Review?

Knowledge synthesis is a term used to describe the method of synthesizing results from individual studies and interpreting these results within the larger body of knowledge on the topic. It requires highly structured, transparent and reproducible methods using quantitative and/or qualitative evidence. Systematic reviews, meta-analyses, scoping reviews, rapid reviews, narrative syntheses, and practice guidelines, among others, are all forms of knowledge synthesis.

A systematic review differs from an ordinary literature review in that it uses a comprehensive, methodical, transparent and reproducible search strategy to ensure conclusions are as unbiased and as close to the truth as possible. The Cochrane Handbook for Systematic Reviews of Interventions  defines a systematic review as:

"A systematic review attempts to identify, appraise and synthesize all the empirical evidence that meets pre-specified eligibility criteria to answer a given research question. Researchers conducting systematic reviews use explicit methods aimed at minimizing bias, in order to produce more reliable findings that can be used to inform decision making [...] This involves: the a priori specification of a research question; clarity on the scope of the review and which studies are eligible for inclusion; making every effort to find all relevant research and to ensure that issues of bias in included studies are accounted for; and analysing the included studies in order to draw conclusions based on all the identified research in an impartial and objective way." ( Chapter 1: Starting a review )

Video: “What are systematic reviews?” from Cochrane on YouTube .

  • Last Updated: Nov 1, 2023 2:50 PM
  • URL: https://laneguides.stanford.edu/systematicreviews

Methodology of a systematic review

Affiliations

  • 1 Hospital Universitario La Paz, Madrid, España. Electronic address: [email protected].
  • 2 Hospital Universitario Fundación Alcorcón, Madrid, España.
  • 3 Instituto Valenciano de Oncología, Valencia, España.
  • 4 Hospital Universitario de Cabueñes, Gijón, Asturias, España.
  • 5 Hospital Universitario Ramón y Cajal, Madrid, España.
  • 6 Hospital Universitario Gregorio Marañón, Madrid, España.
  • 7 Hospital Universitario de Canarias, Tenerife, España.
  • 8 Hospital Clínic, Barcelona, España; EAU Guidelines Office Board Member.
  • PMID: 29731270
  • DOI: 10.1016/j.acuro.2018.01.010

Context: The objective of evidence-based medicine is to employ the best scientific information available to apply to clinical practice. Understanding and interpreting the scientific evidence involves understanding the available levels of evidence, where systematic reviews and meta-analyses of clinical trials are at the top of the levels-of-evidence pyramid.

Acquisition of evidence: The review process should be well developed and planned to reduce biases and eliminate irrelevant and low-quality studies. The steps for implementing a systematic review include (i) correctly formulating the clinical question to answer (PICO), (ii) developing a protocol (inclusion and exclusion criteria), (iii) performing a detailed and broad literature search and (iv) screening the abstracts of the studies identified in the search and subsequently of the selected complete texts (PRISMA).

Synthesis of the evidence: Once the studies have been selected, we need to (v) extract the necessary data into a form designed in the protocol to summarise the included studies, (vi) assess the biases of each study, identifying the quality of the available evidence, and (vii) develop tables and text that synthesise the evidence.

Conclusions: A systematic review involves a critical and reproducible summary of the results of the available publications on a particular topic or clinical question. To improve scientific writing, the methodology is shown in a structured manner to implement a systematic review.

Keywords: Meta-analysis; Methodology; Systematic review.

Copyright © 2018 AEU. Published by Elsevier España, S.L.U. All rights reserved.

  • Research article
  • Open access
  • Published: 03 February 2011

Methodology in conducting a systematic review of systematic reviews of healthcare interventions

  • Valerie Smith 1 ,
  • Declan Devane 2 ,
  • Cecily M Begley 1 &
  • Mike Clarke 3  

BMC Medical Research Methodology , volume 11, Article number: 15 (2011)


Hundreds of studies of maternity care interventions have been published, too many for most people involved in providing maternity care to identify and consider when making decisions. It became apparent that systematic reviews of individual studies were required to appraise, summarise and bring together existing studies in a single place. However, decision makers are increasingly faced by a plethora of such reviews and these are likely to be of variable quality and scope, with more than one review of important topics. Systematic reviews (or overviews) of reviews are a logical and appropriate next step, allowing the findings of separate reviews to be compared and contrasted, providing clinical decision makers with the evidence they need.

We describe the methods used to identify and appraise published and unpublished reviews systematically, drawing on our experiences and on good practice in the conduct and reporting of systematic reviews. The process of identifying and appraising all published reviews allows researchers to describe the quality of this evidence base, summarise and compare the reviews’ conclusions and discuss the strength of these conclusions.

Methodological challenges and possible solutions are described within the context of (i) sources, (ii) study selection, (iii) quality assessment (i.e. the extent of searching undertaken for the reviews, description of study selection and inclusion criteria, comparability of included studies, assessment of publication bias and assessment of heterogeneity), (iv) presentation of results, and (v) implications for practice and research.

Conducting a systematic review of reviews highlights the usefulness of bringing together a summary of reviews in one place, where there is more than one review on an important topic. The methods described here should help clinicians to review and appraise published reviews systematically, and aid evidence-based clinical decision-making.


The healthcare literature contains hundreds of thousands of studies of healthcare interventions, growing at tens of thousands per year [ 1 ]. In most areas of health care, there are too many studies for people involved in providing care to identify and consider when making decisions. Researchers have recognised this problem and many have accepted the challenge of preparing systematic reviews of individual studies in order to appraise, summarise and bring together existing studies in a single place. More recently, calls have been made for 'rapid reviews' to provide decision-makers with the evidence they need in a shorter time frame, but the possible limitations of such 'rapid reviews', compared to full systematic reviews, require further research [ 2 ]. There are now several organisations dedicated to the preparation of systematic reviews, including the National Institute for Health and Clinical Excellence (NICE) in the UK, the Evidence-based Practice Centre Program, funded by AHRQ in the USA, the Joanna Briggs Institute, and the international Campbell and Cochrane Collaborations, with the latter being the largest single producer of systematic reviews in health care, with more than 4200 published by the end of 2010 [ 3 ].

In recent years, however, decision makers who were once overwhelmed by the number of individual studies have become faced by a plethora of reviews [ 4 , 5 ]. These reviews are likely to be of variable quality and scope, with more than one systematic review on important topics. For example, a comprehensive search of twelve health-related citation databases (using database-specific search strategies) identified over thirty reviews evaluating the effectiveness of nurse- and midwife-led interventions on clinical outcomes, as part of an ongoing study into the impact of the role of nurse and midwife specialist and advanced practitioners in Ireland. A logical and appropriate next step is to conduct a systematic review of reviews of the topic under consideration, allowing the findings of separate reviews to be compared and contrasted, thereby providing clinical decision makers with the evidence they need. We have been involved in several examples of systematic reviews (or overviews) of reviews [ 6 – 9 ] and The Cochrane Collaboration introduced a new type of Cochrane review in 2009 [ 10 ], the overview of Cochrane reviews, with two full overviews [ 11 , 12 ] and protocols for five more [ 13 – 17 ] published by October 2010. These reviews of reviews aim to provide a summary of evidence from more than one systematic review at a variety of different levels, including the combination of different interventions, different outcomes, different conditions, problems or populations, or the provision of a summary of evidence on the adverse effects of an intervention [ 10 ].

This paper describes the conduct and methods used to identify and appraise published and unpublished systematic reviews systematically. It draws on our experience of conducting several of these reviews of reviews in recent years. The purpose of such an overview, in identifying and appraising all published reviews is to describe their quality, summarise and compare their conclusions and discuss the strength of these conclusions, so that best evidence is made available to clinical decision-makers. During the review process a number of methodological challenges can arise. We describe these challenges and offer possible solutions to overcome them. We hope to provide a guide to clinicians and researchers who wish to conduct a systematic review of reviews and to share our experiences.

The objective and the reasons for conducting a systematic review of reviews should be made explicit at the start of the process, as this is likely to influence the methods used for the review. In formulating the scope for the review of reviews, the PICOS (participants, interventions, comparators, outcomes, and study design) structure may be helpful. This can help the reviewers to delineate clearly if they wish, for example, to compare and summarise systematic reviews that address the same treatment comparison or a particular intervention for a population or condition, or a range of interventions for people with a specific condition. Following this, the methods in conducting a systematic review of reviews require consideration of the following aspects, akin to the planning for a systematic review of individual studies: sources, review selection, quality assessment of reviews, presentation of results and implications for practice and research.

Sources and searching

Locating and retrieving relevant literature is challenging, yet crucial to the success of a systematic review. The material sourced provides the information from which evidence, conclusions and recommendations are drawn. For many, the literature search may appear overwhelming, given the sheer volume of material to check through. However, establishing a systematic search strategy, before commencing the literature search, is fundamental to appropriate and successful information retrieval. This planning assists in meeting the requirements of the systematic review and in answering the research question. In developing a search strategy, the scope of the search, its thoroughness and the time available to conduct it, all need to be considered. The aim is to ensure that the systematic review of reviews is comprehensive, thorough and objective.

The methods used in sourcing relevant literature to conduct a systematic review of reviews are similar to those adopted in conducting a systematic review of individual studies, with some subtle differences described here. A realistic time-frame to conduct the systematic review of reviews should be established. It has been estimated that a typical systematic review will take between six and eighteen months [ 18 ], but this is very dependent on the research question and the staffing, funding and other resources available. The process might be faster for a systematic review of reviews if the time-frame to complete the literature search is significantly reduced through the ability to target the searching of articles most likely to be reports of a systematic review. In a systematic review of individual studies, the search should be as wide as possible to maximise the likelihood of capturing all relevant data and minimising the effects of reporting biases. A search of a variety of electronic databases relevant to the topic of interest is recommended [ 18 ]. However, in a systematic review of reviews, it may be possible to limit the searches to databases specific to systematic reviews, such as the Cochrane Database of Systematic Reviews and the Database of Abstracts of Reviews of Effects. Likewise, although the search for a review of individual studies might need to cover many decades [ 19 ], limiting the search to the period from the early 1990s onwards is likely to identify all but the very small minority of systematic reviews conducted before then [ 20 , 21 ]. Furthermore, researchers might find that identifying and highlighting a recent high-quality systematic review will prove of most benefit to decision makers using their review of reviews. However, a summary of the earlier reviews can still prove helpful if these contain relevant information that is not included in the recent review. Applying language restrictions is not recommended, but unavoidable constraints, such as a lack of access to translation services or funds to pay for these, may make it necessary to restrict the systematic review of reviews to English-language publications. In such instances, this limitation should be acknowledged when reporting the review and it might be worthwhile reporting the difference between searches with and without language restrictions in order to estimate the amount of literature that might have been excluded.

The search terms used for the literature search should be clearly described, with information on their relevance to the research question. Furthermore, search terms should be focused so that they are broad enough in scope to capture all the relevant data yet narrow enough to minimize the capture of extraneous literature that may result in unnecessary time and effort being spent assessing irrelevant articles. In conducting a systematic review of reviews, systematic reviews rather than individual studies are of interest to the reviewer and several search strategies have been developed to identify this type of research [ 22 , 23 ] which could be combined with the terms for the relevant healthcare topic. In developing the search strategy for a systematic review of reviews, researchers might wish to consider the PRESS initiative, developed as a means for peer reviewing literature searches [ 24 ] to check that the various elements of the electronic search strategy have been considered. To minimize the risk of missing relevant reviews, a manual search of key journals and of the reference lists of reviews captured by the initial searches is also recommended. The literature search can also be complemented by contacting experts in the topic under review and by checking articles which cite individual studies that are known to be relevant to the topic. This may prove relevant in learning of published systematic reviews that are not indexed in the bibliographic databases searched, and of ongoing systematic reviews near completion. The development of a prospective register of systematic reviews should help further with this [ 25 ].

Review Selection

A major challenge to review selection is identifying all reviews relevant to the topic of interest, and of potential importance to answering the research question. During the planning phase, before commencing the systematic review of reviews, a review team should be established. The review team should include at least one person with methodological expertise in conducting systematic reviews and at least one person with expertise on the topic under review. The review team is responsible for developing a review selection strategy. Inclusion and exclusion criteria should be agreed before the review selection process starts. Aspects of this process might include decisions regarding the type of reviews that may be included in the systematic review. For example, in our review on interventions for preventing preterm birth [ 6 ], we restricted the inclusion criteria to reviews of randomized controlled trials. Another example of inclusion criteria might be to limit the systematic review of reviews to reviews of a particular type of participant (such as women having their first baby) or which assess a particular type of pain relief.

When a selection strategy has been developed, the selection process is carried out in a similar way to a review of individual studies (a small deduplication sketch follows the steps below):

Assess retrieved titles and abstracts for relevance and duplication.

Select those you wish to retrieve and appraise further.

Obtain full text copies of these potentially eligible reviews.

Assess these reviews for relevance and quality; ideally, using independent assessment by at least two members of the review team. This reduces bias in review selection and allows for appropriate discussion should uncertainty arise.
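The first of these steps, checking retrieved records for duplication, is mechanical enough to automate. The sketch below uses a deliberately crude title normalisation; real reference managers also compare authors, year, and DOI.

```python
# Flag duplicate titles before relevance screening (titles are invented).
def normalise(title: str) -> str:
    """Lowercase and strip everything but letters and digits."""
    return "".join(ch for ch in title.lower() if ch.isalnum())

retrieved = [
    "Probiotics for eczema: a systematic review",
    "Probiotics for Eczema - A Systematic Review.",
    "Antibiotics for preventing preterm birth",
]
seen, unique = set(), []
for title in retrieved:
    key = normalise(title)
    if key not in seen:
        seen.add(key)
        unique.append(title)
print(unique)  # the second record is dropped as a duplicate of the first
```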

Quality Assessment of Reviews

The quality and strength of evidence presented in the individual, included reviews should influence the conclusions drawn in the systematic review of these. The quality and scope of published reviews vary widely. The strength of the conclusions and the ability to provide decision-makers with reliable information depend on the inclusion of reviews that meet a minimum standard of quality. When assessing the quality of the reviews, one should try to avoid being influenced by extraneous variables, such as authors, institutional affiliations and journal names, and should focus on the quality of the conduct of the review. In practice, researchers will usually have to do this via an assessment of the quality of the report, with the hope that initiatives such as PRISMA (formerly QUOROM) assist by facilitating adequate standards of reporting [ 26 ].

The AMSTAR tool [ 27 ], which became available after we started work on our review of reviews, is the only tool that we are aware of that has been validated as a means to assess the methodological quality of systematic reviews and could be used in the review of reviews to determine if the potentially eligible reviews meet minimum requirements based on quality. While the authors of the AMSTAR paper [ 27 ] recognise the need for further testing of the AMSTAR tool, important domains identified within the tool are: establishing the research question and inclusion criteria before the conduct of the review, data extraction by at least two independent data extractors, comprehensive literature review with searching of at least two databases, key word identification, expert consultation and limits applied, detailed list of included/excluded studies and study characteristics, quality assessment of included studies and consideration of quality assessments in analysis and conclusions, appropriate assessment of homogeneity, assessment of publication bias and a statement of any conflict of interest.

Although our review of reviews began before the publication of the AMSTAR tool, we used similar domains to assess review quality. Our assessment criteria are shown below and provide a structure that can be used to report the quality and comparability of the included reviews to help readers assess the strength of the evidence in the review of reviews:

▪ The extent of searching undertaken: Are the databases searched, years searched and restrictions applied in the original review clearly described? Information on the extent of searching should be clearly provided, to allow for a comprehensive assessment of the scope of the review.

▪ Description of review selection and inclusion criteria: Do the authors of the original review provide details of study selection and eligibility criteria and what are these details? This information should be clearly reported in the systematic review of reviews.

▪ Assessment of publication bias: Did the authors of the original review seek additional information from authors of the studies they included? Are there any details of statistical tests (such as funnel plot analysis) to assess for publication bias?

▪ Assessment of heterogeneity: Did the authors of the original review discuss or provide details of any tests of heterogeneity? In the presence of significant heterogeneity, were statistical tests used to address this?

▪ Comparability of included reviews: Are the reviews comparable in terms of eligibility criteria, study characteristics and primary outcome of interest? For example, in our review of reviews on fetal fibronectin and transvaginal cervical ultrasound for predicting preterm birth, [ 8 ] we included reviews that had incorporated studies among women who were both symptomatic and asymptomatic for preterm birth. As a means of addressing comparability of the included reviews, we provided details of the number of women in each group separately and reported the results for each group separately, where applicable.
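These five criteria lend themselves to a simple per-review checklist. The encoding below is an illustrative simplification only, not a validated instrument such as AMSTAR; the judgements shown are hypothetical.

```python
# A toy per-review quality checklist based on the five criteria above.
criteria = {
    "extent_of_searching_described": True,
    "selection_and_inclusion_criteria_reported": True,
    "publication_bias_assessed": False,
    "heterogeneity_assessed": True,
    "comparable_with_other_included_reviews": True,
}
met = sum(criteria.values())
print(f"{met}/{len(criteria)} criteria met")  # 4/5 for this hypothetical review
```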

Presentation of Results

The presentation of the results of a systematic review of reviews should give the reader the major conclusions of the review, through the provision of answers to the research question, as well as the evidence on which these conclusions are based and an assessment of the quality of the evidence supporting each conclusion; for example, using the GRADE approach as adopted for the 'Summary of Findings' table in Cochrane reviews [ 28 ]. It is important to be specific in reporting the primary outcome of interest for the review, and this can reduce workload by limiting data extraction to only those results relevant to the topic of interest from reviews that report on several outcome measures. For example, some systematic reviews on antibiotic therapy for the prevention of preterm birth [ 29 , 30 ] report a variety of outcome measures other than preterm birth (e.g. neonatal outcomes). However, in our systematic reviews of reviews [ 6 , 8 ], our research focus on preterm birth meant that only results for the effects on preterm birth were extracted.

The use of summary tables and figures is helpful in presenting results in a structured and clear format that will enhance textual commentary. Table 1 is an example of how details of the scope of the reviews included in a systematic review of reviews can be provided. Sources of evidence and some quality assessment criteria are included. The quality assessment is enhanced by a narrative discussion of heterogeneity and publication bias.

Table 2 provides an example of how summary results from each original review might be presented in the systematic review of reviews.

The use of a checklist or reporting tool may also guide the reviewer when reporting on a systematic review of reviews. Although we did not identify a tool specific to reporting of systematic reviews of reviews, the PRISMA statement provides a useful framework to follow [ 26 ]. This guidance, developed for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions, can be used to assess item inclusion in a systematic review of systematic reviews.

Implications for practice and research

One of the problems faced by decision makers who encounter multiple reviews of the same topic is inconsistency in the results or conclusions of these reviews. Jadad et al (1997) provide guidance on how to address discordant results from individual reviews [ 31 ] and conducting systematic reviews of reviews will help to address this issue further. A systematic review of reviews can provide reassurances that the conclusions of individual reviews are consistent, or not. The quality of individual reviews may be assessed, so that evidence from the best quality reviews can be highlighted and brought together in a single document, providing definitive summaries that could be used to inform clinical practice.

Meta-analyses in systematic reviews of reviews

A major challenge in conducting a systematic review of reviews is the creation of a 'meta-analysis' of the included reviews, which are themselves meta-analyses. In doing this, it is important that data from individual studies are not used more than once. This would give too much statistical power, with the risk that a misleading estimate will be produced and that this will be overly precise. Overcoming this challenge would require the unpicking of each of the included reviews and the subsequent combination of the results of the individual, included studies. This may prove to be a complex and time-consuming task and careful consideration should be given to its value when planning the systematic review of reviews, highlighting the importance of having clear reasons for conducting the review.
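A practical first step in avoiding such double counting is simply to list which primary studies appear in more than one included review, as in this sketch (review names and study IDs are hypothetical).

```python
from collections import Counter

# Primary studies contained in each included review (all IDs invented).
reviews = {
    "ReviewA": {"Smith2001", "Jones2003", "Lee2005"},
    "ReviewB": {"Jones2003", "Lee2005", "Khan2007"},
}
counts = Counter(study for studies in reviews.values() for study in studies)
overlap = sorted(s for s, n in counts.items() if n > 1)
print(overlap)  # these studies must contribute only once to any pooled estimate
```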

A systematic review of reviews allows the creation of a summary of reviews in a single document. In this paper, we have discussed the methods for conducting such a review. The methods we have described and discussed draw on our experiences, and should be useful to healthcare practitioners who wish to conduct a systematic review of reviews to enhance their evidence-based knowledge and to support well-informed clinical decision making. They should also be useful to practitioners who will find that the ideal starting point for knowledge from research will be a systematic review of reviews of the topic of interest to them.

Ghersi D, Pang T: From Mexico to Mali: four years in the history of clinical trial registration. Journal of Evidence-Based Medicine. 2009, 2: 1-7. 10.1111/j.1756-5391.2009.01014.x.


Ganann R, Ciliska D, Thomas H: Expediting systematic reviews: methods and implications of rapid reviews. Implementation Science. 2010, 5 (56):

The Cochrane Collaboration. [ http://www.cochrane.org ]

Bastian H, Glasziou P, Chalmers I: Seventy-five trials and eleven systematic reviews a day: how will we ever keep up? PLoS Medicine. 2010, 7 (9): 10.1371/journal.pmed.1000326.

Moher D, et al: Epidemiology and reporting characteristics of systematic reviews. PLoS Medicine. 2007, 4 (3): e78-10.1371/journal.pmed.0040078.


Smith V, et al: A systematic review and quality assessment of systematic reviews of randomised trials of interventions for preventing and treating preterm birth. Eur J Obstet Gynecol Reprod Biol. 2009, 142: 3-11. 10.1016/j.ejogrb.2008.09.008.

Clarke M: Systematic reviews of reviews of risk factors for intracranial aneurysms. Neuroradiology. 2008, 50: 653-664. 10.1007/s00234-008-0411-9.

Smith V, et al: A systematic review and quality assessment of systematic reviews of fetal fibronectin and transvaginal sonographic cervical length for predicting preterm birth. Eur J Obstet Gynecol Reprod Biol. 2007, 133: 134-142. 10.1016/j.ejogrb.2007.03.005.


Williams C, et al: Cost-effectiveness of using prognostic information to select women with breast cancer for adjuvant systemic therapy. Health Technology Assessment. 2006, 10 (34): 1-204.

Becker L, Oxman AD: Overviews of reviews. Cochrane Handbook for Systematic Reviews of Interventions. Version 5.0.02 [updated September 2009]. Edited by: Higgins JP, Green S. 2009, Oxford: The Cochrane Collaboration


Singh J, et al: Biologics for rheumatoid arthritis: an overview of Cochrane reviews (Protocol). The Cochrane Database of Systematic Reviews. 2009, CD007848-4

Keus F, van Laarhoven CJH: Open, small-incision, or laparoscopic cholecystectomy for patients with symptomatic cholecystolithiasis. An overview of Cochrane Hepato-Biliary Group reviews. Cochrane Database of Systematic Reviews. 2010, CD008318-1

Singh JA, et al: Adverse effects of biologics: a network meta-analysis and Cochrane overview (Protocol). Cochrane Database of Systematic Reviews. 2010, CD008794-10

Ryan R, et al: Consumer-oriented interventions for evidence-based prescribing and medicine use: an overview of Cochrane reviews (Protocol). Cochrane Database of Systematic Reviews. 2009, CD007768-2

Yang M, et al: Interventions for preventing influenza: an overview of Cochrane systematic reviews (Protocol). Cochrane Database of Systematic Reviews. 2010, CD008501-5

Eccles MP, et al: An overview of reviews evaluating the effects of financial incentives in changing healthcare professional behaviours and patient outcomes (Protocol). Cochrane Database of Systematic Reviews. 2010, CD008608-7

Aaserud M, et al: Pharmaceutical policies: effects on rational drug use, an overview of 13 reviews (Protocol). Cochrane Database of Systematic Reviews. 2006, CD004397-2

Critical Reviews Advisory Group: Introduction to systematic reviews. School for Health and Related Research. 1996, [ http://www.shef.ac.uk/scharr ]

Lichtenstein A, Yetley E, Lau J: Application of systematic review methodology to the field of nutrition. Journal of Nutrition. 2008, 2297-2306. 10.3945/jn.108.097154.

Chalmers I, Hedges LV, Cooper H: A brief history of research synthesis. Evaluation and the Health Professions. 2002, 25: 12-37. 10.1177/0163278702025001003.

Starr M, et al: The origins, evolution and future of the Cochrane Database of Systematic Reviews. Cochrane Database of Systematic Reviews. 2009, 25 (suppl 1): 182-195.

Montori VM, et al: Optimal search strategies for retrieving systematic reviews from Medline: analytical survey. BMJ. 2005, 330 (7482): 10.1136/bmj.38336.804167.47.

Wilczynski NL, Haynes RB, Hedges Team: EMBASE search strategies achieved high sensitivity and specificity for retrieving methodologically sound systematic reviews. J Clin Epidemiol. 2005, 60 (1): 29-33. 10.1016/j.jclinepi.2006.04.001.


Sampson M, et al: An evidence-based practice guideline for the peer review of electronic search strategies. J Clin Epidemiol. 2009, 62 (9): 944-52. 10.1016/j.jclinepi.2008.10.012.

Booth A, et al: An international registry of systematic-review protocols. Lancet. 2011, 377 (9760): 108-109. 10.1016/S0140-6736(10)60903-8.

Liberati A, et al: The PRISMA Statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. PLoS Medicine. 2009, 6 (7): e1000100-10.1371/journal.pmed.1000100.

Shea BJ, et al: AMSTAR is a reliable and valid measurement tool to assess the methodological quality of systematic reviews. J Clin Epidemiol. 2009, 62 (10): 1013-20. 10.1016/j.jclinepi.2008.10.009.

Schunemann HJ, et al: Presenting results and 'Summary of findings' tables , in Cochrane Handbook for Systematic Reviews of Interventions. Version 5.0.02 [updated September 2009]. Edited by: Higgins JP, Green NS. 2009, The Cochrane Collaboration: Oxford

Simcox R, et al: Prophylactic antibiotics for the prevention of preterm birth in women at risk: a meta-analysis. Aus NZ J Obs & Gynaecol. 2007, 47: 368-377.

King J, Flenady V: Prophylactic antibiotics for inhibiting preterm labour with intact membranes. The Cochrane Database of Systematic Reviews. 2002, CD000246-

Jadad AR, Cook DJ, Browman GP: A guide to interpreting discordant systematic reviews. CMAJ. 1997, 156 (10): 1411-6.

CAS   PubMed   PubMed Central   Google Scholar  

Pre-publication history

The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/11/15/prepub

Download references

Author information

Authors and affiliations

School of Nursing and Midwifery, University of Dublin, Trinity College Dublin, 24 D'Olier Street, Dublin 2, Ireland

Valerie Smith & Cecily M Begley

School of Nursing and Midwifery, National University of Ireland, Galway, Galway, Ireland

Declan Devane

UK Cochrane Centre, National Institute for Health Research, Middle Way, Oxford, OX2 7LG, UK

Mike Clarke


Corresponding author

Correspondence to Valerie Smith.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

VS participated in the sequence content and drafted the manuscript. MC conceived and contributed to the rationale for the manuscript. VS, CB, DD and MC contributed to the design of the manuscript. CB, DD and MC read and critically revised the draft manuscript for important intellectual content. All authors read and approved the final manuscript.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Smith, V., Devane, D., Begley, C.M. et al. Methodology in conducting a systematic review of systematic reviews of healthcare interventions. BMC Med Res Methodol 11, 15 (2011). https://doi.org/10.1186/1471-2288-11-15


Received : 04 June 2010

Accepted : 03 February 2011

Published : 03 February 2011

DOI : https://doi.org/10.1186/1471-2288-11-15


Keywords

  • Systematic Review
  • Preterm Birth
  • Review Team
  • Rapid Review
  • Included Review


Systematic Reviews and Meta Analysis


Systematic review Q & A

What is a systematic review?

A systematic review is a guided filtering and synthesis of all available evidence addressing a specific, focused research question, generally about a specific intervention or exposure. The use of standardized, systematic methods and pre-selected eligibility criteria reduces the risk of bias in identifying, selecting, and analyzing relevant studies. A well-designed systematic review includes clear objectives, pre-selected criteria for identifying eligible studies, an explicit methodology, a thorough and reproducible search of the literature, an assessment of the validity or risk of bias of each included study, and a systematic synthesis, analysis, and presentation of the findings of the included studies. A systematic review may include a meta-analysis.

For details about carrying out systematic reviews, see the Guides and Standards section of this guide.

Is my research topic appropriate for systematic review methods?

A systematic review is best deployed to test a specific hypothesis about a healthcare or public health intervention or exposure. By focusing on a single intervention or a few specific interventions for a particular condition, the investigator can ensure a manageable results set. Moreover, examining a single intervention or a small set of related interventions, exposures, or outcomes will simplify the assessment of studies and the synthesis of the findings.

Systematic reviews are poor tools for hypothesis generation: for instance, for determining what interventions have been used to increase the awareness and acceptability of a vaccine, or for investigating the ways that predictive analytics have been used in health care management. In the first case, we don't know what interventions to search for, and so would have to screen every article about awareness and acceptability. In the second, there is no agreed-upon set of methods that make up predictive analytics, and health care management is far too broad. The search will necessarily be incomplete, vague, and very large all at the same time. In most cases, reviews without clearly and exactly specified populations, interventions, exposures, and outcomes will produce results sets that quickly outstrip the resources of a small team and offer no consistent way to assess and synthesize findings from the identified studies.
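To make the contrast concrete, here is a minimal sketch of the kind of explicit specification a reviewable question needs before searching begins; the PICO fields and the example question are purely illustrative.

```python
from dataclasses import dataclass

@dataclass
class ReviewQuestion:
    """PICO-style specification for a focused, answerable review question."""
    population: str    # who the evidence must be about
    intervention: str  # the single intervention (or small related set) under review
    comparator: str    # what the intervention is compared against
    outcome: str       # the pre-specified outcome of interest

    def is_searchable(self) -> bool:
        # A question is only reviewable once every element is pinned down.
        return all([self.population, self.intervention, self.comparator, self.outcome])

# Focused enough to search: every element is explicit.
q = ReviewQuestion(
    population="women with operable breast cancer",
    intervention="neoadjuvant chemotherapy",
    comparator="adjuvant (post-operative) chemotherapy",
    outcome="overall survival",
)
print(q.is_searchable())  # True
```

A question that leaves any of these fields empty or fuzzy is a sign that a scoping review (below) may be the better fit.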

If not a systematic review, then what?

You might consider performing a scoping review . This framework allows iterative searching over a reduced number of data sources and imposes no requirement to assess individual studies for risk of bias. The framework includes built-in mechanisms to adjust the analysis as the work progresses and more is learned about the topic. A scoping review won't help you limit the number of records you'll need to screen (broad questions lead to large results sets) but may give you a means of dealing with a large set of results.

This tool can help you decide what kind of review is right for your question.

Can my student complete a systematic review during her summer project?

Probably not. Systematic reviews are a lot of work. Between creating the protocol, building and running a quality search, collecting all the papers, evaluating the studies that meet the inclusion criteria, and extracting and analyzing the summary data, a well-done review can require dozens to hundreds of hours of work spanning several months. Moreover, a systematic review requires subject expertise, statistical support, and a librarian to help design and run the search. Be aware that librarians sometimes have queues for their search time; it may take several weeks to complete and run a search. All guidelines for carrying out systematic reviews also recommend that at least two subject experts screen the studies identified in the search. The first round of screening can consume 1 hour per screener for every 100-200 records. A systematic review is a labor-intensive team effort.
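Taking the guide's own figures at face value, a back-of-the-envelope sketch (assuming dual screening at the 100-200 records/hour rate quoted above) shows how quickly first-pass screening alone adds up:

```python
def screening_person_hours(n_records: int, records_per_hour: float = 150, screeners: int = 2) -> float:
    """Estimate first-pass title/abstract screening effort in person-hours.

    Assumes each record is screened by two reviewers, each working at
    roughly 100-200 records per hour (150 used here as a midpoint).
    """
    return screeners * n_records / records_per_hour

for n in (2_000, 5_000, 10_000):
    print(f"{n:>6} records -> ~{screening_person_hours(n):.0f} person-hours")
# ~27, ~67, and ~133 person-hours, before full-text review or extraction
```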

How can I know if my topic has been reviewed already?

Before starting out on a systematic review, check to see if someone has done it already. In PubMed you can use the systematic review subset to limit to a broad group of papers that is enriched for systematic reviews. You can invoke the subset by selecting it from the Article Types filters to the left of your PubMed results, or you can append AND systematic[sb] to your search. For example:

"neoadjuvant chemotherapy" AND systematic[sb]

The systematic review subset is very noisy, however. To quickly focus on systematic reviews (knowing that you may be missing some), simply search for the word systematic in the title:

"neoadjuvant chemotherapy" AND systematic[ti]

Any PRISMA-compliant systematic review will be captured by this method since including the words "systematic review" in the title is a requirement of the PRISMA checklist. Cochrane systematic reviews do not include 'systematic' in the title, however. It's worth checking the Cochrane Database of Systematic Reviews independently.
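If you prefer to run these checks programmatically, NCBI's public E-utilities API accepts the same query syntax. A minimal sketch using the standard esearch endpoint (for real use, add your email or API key and respect NCBI's rate limits):

```python
import requests

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_hit_count(term: str) -> int:
    """Return how many PubMed records match a query."""
    resp = requests.get(ESEARCH, params={"db": "pubmed", "term": term, "retmode": "json"})
    resp.raise_for_status()
    return int(resp.json()["esearchresult"]["count"])

topic = '"neoadjuvant chemotherapy"'
print(pubmed_hit_count(f"{topic} AND systematic[sb]"))  # broad, noisy subset
print(pubmed_hit_count(f"{topic} AND systematic[ti]"))  # stricter title filter
```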

You can also search for protocols that will indicate that another group has set out on a similar project. Many investigators will register their protocols in PROSPERO , a registry of review protocols. Other published protocols as well as Cochrane Review protocols appear in the Cochrane Methodology Register, a part of the Cochrane Library .


Systematic Reviews: Home

Created by health science librarians.



What is a Systematic Review?

This page covers: a simplified process map; how the library can help; publications by HSL librarians; systematic reviews in non-health disciplines; and resources for performing systematic reviews.

  • Step 1: Complete Pre-Review Tasks
  • Step 2: Develop a Protocol
  • Step 3: Conduct Literature Searches
  • Step 4: Manage Citations
  • Step 5: Screen Citations
  • Step 6: Assess Quality of Included Studies
  • Step 7: Extract Data from Included Studies
  • Step 8: Write the Review


A systematic review is a literature review that gathers all of the available evidence matching pre-specified eligibility criteria to answer a specific research question. It uses explicit, systematic methods, documented in a protocol, to minimize bias, provide reliable findings, and inform decision-making.¹

There are many types of literature reviews.

Before beginning a systematic review, consider whether it is the best type of review for your question, goals, and resources. The table below compares a few different types of reviews to help you decide which is best for you. 

  • Scoping Review Guide: For more information about scoping reviews, refer to the UNC HSL Scoping Review Guide.

Systematic Reviews: A Simplified, Step-by-Step Process Map

  • UNC HSL's Simplified, Step-by-Step Process Map A PDF file of the HSL's Systematic Review Process Map.
  • Text-Only: UNC HSL's Systematic Reviews - A Simplified, Step-by-Step Process A text-only PDF file of HSL's Systematic Review Process Map.

The Creative Commons license applied to the systematic review process map image requires that reusers give credit to the creator. It allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, for noncommercial purposes only.

The average systematic review takes 1,168 hours to complete.¹ A librarian can help you speed up the process.

Systematic reviews follow established guidelines and best practices to produce high-quality research. Librarian involvement in systematic reviews is offered at two levels. In Tier 1, your research team can consult with the librarian as needed; the librarian will answer questions and recommend tools to use. In Tier 2, the librarian will be an active member of your research team and a co-author on your review. Roles and expectations of librarians vary based on the level of involvement desired. Examples of these differences are outlined in the table below.



Researchers conduct systematic reviews in a variety of disciplines. If your focus is on a topic outside of the health sciences, you may also want to consult the resources below to learn how systematic reviews vary in your field. You can also contact a librarian for your discipline with questions.

  • EPPI-Centre methods for conducting systematic reviews The EPPI-Centre develops methods and tools for conducting systematic reviews, including reviews for education, public and social policy.


Environmental Topics

  • Collaboration for Environmental Evidence (CEE) CEE seeks to promote and deliver evidence syntheses on issues of greatest concern to environmental policy and practice as a public service

Social Sciences


  • Siddaway AP, Wood AM, Hedges LV. How to Do a Systematic Review: A Best Practice Guide for Conducting and Reporting Narrative Reviews, Meta-Analyses, and Meta-Syntheses. Annu Rev Psychol. 2019 Jan 4;70:747-770. doi: 10.1146/annurev-psych-010418-102803. A resource for psychology systematic reviews, which also covers qualitative meta-syntheses or meta-ethnographies
  • The Campbell Collaboration

Social Work


Software engineering

  • Guidelines for Performing Systematic Literature Reviews in Software Engineering The objective of this report is to propose comprehensive guidelines for systematic literature reviews appropriate for software engineering researchers, including PhD students.


Sport, Exercise, & Nutrition


  • Application of systematic review methodology to the field of nutrition by Tufts Evidence-based Practice Center Publication Date: 2009
  • Systematic Reviews and Meta-Analysis — Open & Free (Open Learning Initiative) The course follows guidelines and standards developed by the Campbell Collaboration, based on empirical evidence about how to produce the most comprehensive and accurate reviews of research


  • Systematic Reviews by David Gough, Sandy Oliver & James Thomas Publication Date: 2020


Updating reviews

  • Updating systematic reviews by University of Ottawa Evidence-based Practice Center Publication Date: 2007


  • Open access
  • Published: 26 March 2024

Barriers and enablers to the implementation of patient-reported outcome and experience measures (PROMs/PREMs): protocol for an umbrella review

  • Guillaume Fontaine ORCID: orcid.org/0000-0002-7806-814X 1, 2,
  • Marie-Eve Poitras 3, 4,
  • Maxime Sasseville 5, 6,
  • Marie-Pascale Pomey 7, 8,
  • Jérôme Ouellet 9,
  • Lydia Ould Brahim 1,
  • Sydney Wasserman 1,
  • Frédéric Bergeron 10 &
  • Sylvie D. Lambert 1, 11

Systematic Reviews volume 13, Article number: 96 (2024)


Patient-reported outcome and experience measures (PROMs and PREMs, respectively) are evidence-based, standardized questionnaires that can be used to capture patients’ perspectives of their health and health care. While substantial investments have been made in the implementation of PROMs and PREMs, their use remains fragmented and limited in many settings. Analysis of multi-level barriers and enablers to the implementation of PROMs and PREMs has been hampered by the lack of use of state-of-the-art implementation science frameworks. This umbrella review aims to consolidate available evidence from existing quantitative, qualitative, and mixed-methods systematic and scoping reviews covering factors that influence the implementation of PROMs and PREMs in healthcare settings.

An umbrella review of systematic and scoping reviews will be conducted following the guidelines of the Joanna Briggs Institute (JBI). Qualitative, quantitative, and mixed methods reviews of studies focusing on the implementation of PROMs and/or PREMs in all healthcare settings will be considered for inclusion. Eight bibliographical databases will be searched. All review steps will be conducted by two reviewers independently. Included reviews will be appraised and data will be extracted in four steps: (1) assessing the methodological quality of reviews using the JBI Critical Appraisal Checklist; (2) extracting data from included reviews; (3) theory-based coding of barriers and enablers using the Consolidated Framework for Implementation Research (CFIR) 2.0; and (4) identifying the barriers and enablers best supported by reviews using the Grading of Recommendations Assessment, Development and Evaluation-Confidence in the Evidence from Reviews of Qualitative research (GRADE-CERQual) approach. Findings will be presented in diagrammatic and tabular forms in a manner that aligns with the objective and scope of this umbrella review, along with a narrative summary.

This umbrella review of quantitative, qualitative, and mixed-methods systematic and scoping reviews will inform policymakers, researchers, managers, and clinicians regarding which factors hamper or enable the adoption and sustained use of PROMs and PREMs in healthcare settings, and the level of confidence in the evidence supporting these factors. Findings will orient the selection and adaptation of implementation strategies tailored to the factors identified.

Systematic review registration

PROSPERO CRD42023421845.


Capturing patients’ perspectives of their health and healthcare needs using standardized patient-reported outcome and experience measures (referred to herein as PROMs and PREMs, respectively) has been the focus of over 40 years of research [1, 2]. PROMs/PREMs are standardized, validated questionnaires (generic or disease-specific); PROMs are completed by patients about their health, functioning, and quality of life, whereas PREMs are focused on patients’ experiences whilst receiving care [1]. PROMs/PREMs are associated with a robust evidence base across multiple illnesses; they can increase charting of patients’ needs [3] and improve patient-clinician communication [3, 4, 5], which in turn can lead to improved symptom management [4, 5, 6], thereby improving patients’ quality of life, reducing health care utilization [5], and increasing survival rates [7].

Multipurpose applications of PROMs/PREMs have led to substantial investments in their implementation. In the USA, PROMs are part of payer mandates; in the United Kingdom, they are used for benchmarking and included in a national registry; and Denmark has embedded them across healthcare sectors [ 8 , 9 , 10 , 11 ]. In Canada, the Canadian Institute for Health Information (CIHI) has advocated for a standardized core set of PROMs [ 12 ], and the Canadian Partnership Against Cancer (CPAC) recently spearheaded PROM implementation in oncology in 10 provinces/territories. In 2017, the Organisation for Economic Co-operation and Development (OECD) launched the Patient-Reported Indicators Surveys (PaRIS) to build international capacity for PROMs/PREMs in primary care [ 13 ]. Yet, in many countries across the globe, their use remains fragmented, characterized by broad swaths of pre-implementation, pilots, and full implementation in narrow domains [ 12 , 14 , 15 ]. PROM/PREM implementation remains driven by silos of local healthcare networks [ 16 ].

Barriers and enablers to the implementation of PROMs/PREMs exist at the patient level (e.g., low health literacy), [ 17 ] clinician level (e.g., obtaining PROM/PREM results from external digital platforms) [ 17 , 18 , 19 ], service level (e.g., lack of integration in clinics’ workflow) [ 17 , 20 ] and organizational/system-level (e.g., organizational policies conflicting with PROM implementation goals) [ 21 ]. Foster and colleagues [ 22 ] conducted an umbrella review on the barriers and facilitators to implementing PROMs in healthcare settings. The umbrella review identified a number of bidirectional factors arising at different stages that can impact the implementation of PROMs; these factors were related to the implementation process, the organization, and healthcare providers [ 22 ]. However, the umbrella review focused solely on PROMs, excluding PREMs, and the theory-based analysis of implementation factors was limited. Another ongoing umbrella review is restricted to investigating barriers and enablers at the healthcare provider level, omitting the multilevel changes required for successful PROM/PREM implementation [ 23 ].

State-of-the-art approaches from implementation science can support the identification of multilevel factors influencing the implementation of PROMs and PREMs in different healthcare settings [24, 25, 26]. The second version of the Consolidated Framework for Implementation Research (CFIR 2.0) can guide the exploration of determinants influencing the implementation of PROMs and PREMs [27]. The CFIR is a meta-theoretical framework providing a repository of standardized implementation-related constructs at the individual, organizational, and external levels that can be applied across the spectrum of implementation research [27]. CFIR 2.0 includes five domains pertaining to the characteristics of the innovation targeted for implementation, the implementation process, the individuals involved in the implementation, the inner setting, and the outer setting [27]. Using an implementation framework to identify the multilevel factors influencing the implementation of PROMs/PREMs is critical to selecting and tailoring implementation strategies to address barriers [28, 29, 30, 31]. Implementation strategies are the “how”, the specific means or methods for promoting the adoption of evidence-based innovations (e.g., role revisions, audit and provide feedback) [32]. Selecting and adapting implementation strategies to facilitate the implementation of PROMs/PREMs can be time-consuming, as there are more than 73 implementation strategies to choose from [33]. Thus, a detailed understanding of the barriers to PROM/PREM implementation can inform and streamline the selection and adaptation of implementation strategies, saving financial, human, and material resources [24, 25, 26, 32, 34].

Review objective and questions

In this umbrella review, we aim to consolidate available evidence from existing quantitative, qualitative, and mixed-methods systematic and scoping reviews covering factors that influence the implementation of PROMs and PREMs in healthcare settings.

We will address the following questions:

What are the factors that hinder or enable the implementation of PROMs and PREMs in healthcare settings, and what is the level of confidence in the evidence supporting these factors?

What are the similarities and differences in barriers and enablers across settings and geographical regions?

What are the similarities and differences in the perceptions of barriers and enablers between patients, clinicians, managers, and decision-makers?

What are the implementation theories, models, and frameworks that have been used to guide research in this field?

Review design and registration

An umbrella review of systematic and scoping reviews will be conducted following the guidelines of the Joanna Briggs Institute (JBI) [ 35 , 36 ]. The umbrella review is a form of evidence synthesis that aims to address the challenge of collating, assessing, and synthesizing evidence from multiple reviews on a specific topic [ 35 ]. This protocol was registered on PROSPERO (CRD42023421845) and is presented according to the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) guidelines (see Supplementary material  1 ) [ 37 ]. We will use the Preferred Reporting Items for Overviews of Reviews (PRIOR) guidelines [ 38 ] and the PRISMA guidelines [ 39 ] to report results (e.g., flowchart, search process).

Eligibility criteria

The eligibility criteria were developed following discussions among the project team including researchers with experience in the implementation of PROMs and PREMs in different fields (e.g., cancer care, primary care) and implementation science. These criteria were refined after being piloted on a set of studies. The final eligibility criteria for the review are detailed in Table  1 . We will consider for inclusion all qualitative, quantitative, and mixed methods reviews of studies focusing on the implementation of PROMs or PREMs in any healthcare setting.

Information sources

Searches will be conducted in eight databases: CINAHL, via EBSCOhost (1980 to present); Cochrane Database of Systematic Reviews; Evidence-Based Medicine Reviews; EMBASE, via Ovid SP (1947 to present); ERIC, via Ovid SP (1966 to present); PsycINFO, via APA PsycNet (1967 to present); PubMed (including MEDLINE), via NCBI (1946 to present); Web of Science, via Clarivate Analytics (1900 to present). CINAHL is a leading database for nursing and allied health literature. The Cochrane Database of Systematic Reviews and Evidence-Based Medicine Reviews are essential for accessing high-quality systematic reviews and meta-analyses. EMBASE is a biomedical and pharmacological database offering extensive coverage of drug research, pharmacology, and medical devices, complementing PubMed. ERIC provides valuable insights from educational research that are relevant to our study given the intersection of healthcare and education in PROMs and PREMs. PsycINFO is crucial for accessing research on the psychological aspects of PROMs and PREMs. PubMed, encompassing MEDLINE, is a primary resource for biomedical literature. Web of Science offers a broad and diverse range of scientific literature providing interdisciplinary coverage. We will use additional strategies to complement our exploration including examining references cited in eligible articles, searching for authors who have published extensively in the field, and conducting backward/forward citation searches of related systematic reviews and influential articles.

Search strategy

A comprehensive search strategy was developed iteratively by the review team in collaboration with an experienced librarian holding a Master of Science in Information (FB). First, an initial limited search of MEDLINE and CINAHL will be undertaken to identify reviews on PROM/PREM implementation. The text words contained in the titles and abstracts, and the index terms used to describe these reviews, will be analyzed and applied to a modified search strategy (as needed). We adapted elements from the search strategies of two recent reviews in the field of PROM/PREM implementation [22, 23] to fit our objectives. The search strategy for PubMed is presented in Supplementary material 2. The search strategy will be tailored for each information source. The complete search strategy for each database will be made available for transparency and reproducibility in the final manuscript.
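For readers unfamiliar with how such strategies are assembled, the sketch below shows the general pattern of OR-ing synonyms within a concept block and AND-ing the blocks together. The concept blocks and truncation here are hypothetical; the authors' actual strategies appear in Supplementary material 2.

```python
# Hypothetical concept blocks; the authors' actual strategy is in Supplementary material 2.
measures = ['"patient reported outcome"', '"patient-reported outcome"', "PROM*",
            '"patient reported experience"', '"patient-reported experience"', "PREM*"]
implementation = ["implement*", "adopt*", "uptake", "barrier*", "enabler*", "facilitator*"]
syntheses = ['"systematic review"', '"scoping review"', '"umbrella review"']

def any_of(terms: list[str]) -> str:
    """Join synonyms for one concept with OR."""
    return "(" + " OR ".join(terms) + ")"

query = " AND ".join(any_of(block) for block in (measures, implementation, syntheses))
print(query)  # paste into PubMed, then translate field tags per database
```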

Selection process

All identified citations will be collated and uploaded into the Covidence systematic review software (Veritas Health Innovation, Melbourne, Australia), and duplicates removed. Following training on 50 titles, titles will be screened by two independent reviewers against the inclusion criteria for the review. Multiple rounds of calibration might be needed. Once titles have been screened, retained abstracts will be reviewed, preferably by the same two reviewers; however, inter-rater reliability will be re-established on 50 abstracts to re-calibrate (as needed). Lastly, the full texts of retained abstracts will be located and assessed in detail against the inclusion criteria by two independent reviewers. Reasons for excluding articles from full-text review onwards will be recorded in the PRIOR flow diagram (PRISMA-like flowchart) [38]. Any disagreements that arise between the reviewers at each stage of the selection process will be resolved through discussion or with an additional reviewer. More specifically, weekly team meetings held throughout the project will provide the opportunity for the team to discuss and resolve any disagreement that arises during the different stages, from study selection to data extraction.
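The protocol does not name an agreement statistic for these calibration rounds, but Cohen's kappa is a common choice for two raters' include/exclude decisions. A minimal sketch with made-up decisions for a 50-abstract calibration set:

```python
from collections import Counter

def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters' include/exclude decisions."""
    assert len(r1) == len(r2)
    n = len(r1)
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement, from each rater's marginal frequencies.
    expected = sum(c1[k] * c2[k] for k in set(r1) | set(r2)) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical 50-abstract calibration round (1 = include, 0 = exclude):
rater_a = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0] * 5
rater_b = [1, 0, 0, 1, 0, 0, 1, 0, 0, 1] * 5
print(round(cohens_kappa(rater_a, rater_b), 2))  # 0.58: moderate agreement
```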

Quality appraisal and data extraction

As presented in Fig.  1 , included reviews will be appraised and data will be extracted and analyzed in four steps using validated tools and methodologies [ 27 , 36 , 40 ]. All four steps will be conducted by two reviewers independently, and a third will be involved in case of disagreement. More reviewers may be needed depending on the number of reviews included.

Figure 1. Tools/methodology applied in each phase of the umbrella review. Figure adapted from Boudewijns and colleagues [41] with permission. CFIR 2.0 = Consolidated Framework for Implementation Research, version 2 [27]. GRADE-CERQual = Grading of Recommendations Assessment Development and Evaluation-Confidence in the Evidence from Reviews of Qualitative Research [42]. JBI = Joanna Briggs Institute [36]

Step 1—assessing the quality of included reviews

In the first step, two reviewers will independently assess the methodological quality of the reviews using the JBI Critical Appraisal Checklist for Systematic Reviews and Research Syntheses, presented in Supplementary material 3. We have selected this checklist for its comprehensiveness, applicability to different types of knowledge syntheses, and ease of use, requiring minimal training for reviewers to apply it. The checklist consists of 11 questions. It evaluates whether the review question is clearly and explicitly stated, whether the inclusion criteria were appropriate for that question, and whether the search strategy and sources used were suitable and adequate for capturing relevant studies. It also assesses the appropriateness of the criteria used for appraising studies, as well as whether the critical appraisal was conducted independently by two or more reviewers. The checklist further examines whether there were methods in place to minimize errors during data extraction, whether the methods used to combine studies were appropriate, and whether the likelihood of publication bias was assessed. Additionally, it verifies that the recommendations for policy and/or practice are supported by the reported data and that the directives for new research are appropriate. Each question is answered as "yes", "no", or "unclear"; "not applicable" (NA) is also provided as an option and may be appropriate in rare instances. The results of the quality appraisal will provide the basis for assessing confidence in the evidence in step four. Any disagreements that arise between the reviewers will be resolved through discussion, with a third reviewer, or at team meetings.
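For bookkeeping, the eleven responses can be tallied per review. The reviews and responses below are hypothetical; the JBI checklist prescribes no numeric cut-off, and the tallies simply feed the confidence assessment in step 4:

```python
# Hypothetical responses to the 11 JBI checklist questions, per included review.
appraisals = {
    "Review A": ["yes"] * 9 + ["unclear", "no"],
    "Review B": ["yes"] * 7 + ["unclear"] * 2 + ["na", "yes"],
}

def criteria_met(responses: list[str]) -> tuple[int, int]:
    """Return (number of 'yes' answers, number of applicable questions)."""
    applicable = [r for r in responses if r != "na"]
    return sum(r == "yes" for r in applicable), len(applicable)

for review, responses in appraisals.items():
    yes, total = criteria_met(responses)
    print(f"{review}: {yes}/{total} criteria met")
# Review A: 9/11 criteria met
# Review B: 8/10 criteria met
```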

Step 2—extracting data from included reviews

For the second step, we have developed a modified version of the JBI Data Extraction Form for Umbrella Reviews, presented in Supplementary material 3. We will pilot our data extraction form on two of the included reviews, and it will be revised for clarity, as needed. Subsequently, two independent reviewers will conduct all extraction for each review. We will collect the following data: (a) authors and date; (b) country; (c) review aims and objectives; (d) focus of the review; (e) context; (f) population; (g) eligibility criteria; (h) review type and methodology; (i) data sources; (j) dates of search; (k) number of included studies; (l) characteristics of included studies (including study type and critical appraisal score); (m) implementation framework guiding analysis; (n) implementation strategies discussed; (o) results and significance; and (p) conclusions. Barriers and enablers will be extracted separately in step 3. Any disagreements that arise between the reviewers will be resolved through discussion, with a third reviewer, or at team meetings.
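The extraction items (a)-(p) map naturally onto a structured record. A sketch with field names chosen here for illustration (the authors' actual form is in Supplementary material 3):

```python
from dataclasses import dataclass, field

@dataclass
class ExtractionRecord:
    """One completed extraction form for an included review (items a-p above)."""
    authors_and_date: str
    country: str
    aims_objectives: str
    review_focus: str
    context: str
    population: str
    eligibility_criteria: str
    review_type_methodology: str
    data_sources: str
    search_dates: str
    n_included_studies: int
    included_study_characteristics: str  # study types, critical appraisal scores
    implementation_framework: str
    implementation_strategies: str
    results_significance: str
    conclusions: str
    extracted_by: list[str] = field(default_factory=list)  # both independent reviewers
```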

Step 3—theory-based coding of barriers and enablers

In the third step, we will use the second version of the Consolidated Framework for Implementation Research (CFIR) [ 27 ] to guide our proposed exploration of determinants influencing the implementation of PROMs and PREMs (see Fig.  2 ). The CFIR is a meta-theoretical framework providing a repository of standardized implementation-related constructs at the individual, organizational, and external levels that can be applied across the spectrum of implementation research. CFIR contains 48 constructs and 19 subconstructs representing determinants of implementation across five domains: Innovation (i.e., PROMs and PREMs), Outer Setting (e.g., national policy context), Inner Setting (e.g., work infrastructure), Individuals (e.g., healthcare professional motivation) and Implementation Process (e.g., assessing context) [ 27 ]. To ensure that coding remains grounded in the chosen theoretical framework, we have developed a codebook based on the second version of the CFIR, presented in Supplementary material 3 . Furthermore, an initial training session and regular touchpoints will be held to discuss coding procedures among the team members involved.

Figure 2. The second version of the Consolidated Framework for Implementation Research and its five domains: innovation, outer setting, inner setting, individuals, and implementation process [27, 43]

To code factors influencing the implementation of PROMs and PREMs using the CFIR, we will upload all PDFs of the included reviews and their appendices in the NVivo qualitative data analysis software (QSR International, Burlington, USA). All reviews will be independently coded by two reviewers. Any disagreements that arise between the reviewers will be resolved through discussion, or with a third reviewer.
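Once coding is complete, an export of coded findings can be tallied per CFIR domain. The domain names below follow CFIR 2.0; the findings themselves are invented for illustration:

```python
from collections import Counter

# Hypothetical export: (CFIR 2.0 domain, direction) pairs for coded findings.
coded_findings = [
    ("Inner Setting", "barrier"),
    ("Inner Setting", "enabler"),
    ("Individuals", "barrier"),
    ("Innovation", "enabler"),
    ("Implementation Process", "barrier"),
    ("Outer Setting", "barrier"),
]

tally = Counter(coded_findings)
for (domain, direction), count in sorted(tally.items()):
    print(f"{domain:<24} {direction:<8} {count}")
```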

Step 4—identifying the barriers and enablers best supported by the reviews

In the fourth and final step, we will use the Grading of Recommendations Assessment, Development, and Evaluation-Confidence in the Evidence from Reviews of Qualitative research (GRADE-CERQual) approach to assess the level of confidence in the barriers and enablers to PROM/PREM implementation identified in step 3 (see Supplementary material 3). This process will identify which barriers and enablers are best supported by the evidence in the included reviews. GRADE-CERQual includes four domains: (a) methodological limitations, (b) coherence, (c) adequacy of data, and (d) relevance (see Table 2). For each review finding, we will assign a score per domain from one point (substantial concerns) to four points (no concerns to very minor concerns). The score for the methodological limitations of the review will be assigned based on the JBI Critical Appraisal (step 1). The score for coherence will be assigned based on the presence of contradictory findings as well as ambiguous/incomplete data for that finding in the umbrella review. The score for adequacy of data will be assigned based on the richness of the data supporting the umbrella review finding. Finally, the score for relevance will be assigned based on how well the included reviews supporting a specific barrier or enabler to the implementation of PROMs/PREMs are applicable to the umbrella review context. This will allow us to identify which factors are supported by evidence with the highest level of confidence, and their corresponding level of evidence. A calibration exercise will be conducted on three systematic reviews with team members involved in this stage of the umbrella review, and adjustments to procedures will be discussed in team meetings.
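As a sketch of how the four domain scores might be rolled up, the function below sums them and maps the total to a confidence level. The thresholds are an assumption for illustration only: GRADE-CERQual judgements are ultimately made qualitatively, finding by finding.

```python
CERQUAL_DOMAINS = ("methodological_limitations", "coherence", "adequacy", "relevance")

def overall_confidence(scores: dict[str, int]) -> str:
    """Aggregate per-domain scores (1 = substantial concerns ... 4 = no or
    very minor concerns) into an overall confidence level.

    The summation and thresholds are illustrative assumptions only;
    CERQual judgements are not made by summing points in practice.
    """
    assert set(scores) == set(CERQUAL_DOMAINS) and all(1 <= s <= 4 for s in scores.values())
    total = sum(scores.values())  # ranges from 4 to 16
    if total >= 14:
        return "high"
    if total >= 11:
        return "moderate"
    if total >= 8:
        return "low"
    return "very low"

print(overall_confidence({"methodological_limitations": 3, "coherence": 4,
                          "adequacy": 3, "relevance": 4}))  # high
```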

The data synthesis plan for the umbrella review has been meticulously designed to present extracted data in a format that is both informative and accessible, aiding in decision-making and providing a clear overview of the synthesized evidence.

Data extracted from the included systematic reviews will be organized into diagrams and tables, ensuring the presentation is closely aligned with our objectives and scope. These will categorize the distribution of reviews in several ways: by the year or period of publication, country of origin, target population, context, type of review, and various implementation factors. This stratification will allow for an at-a-glance understanding of the breadth and focus of the existing literature. To further assist in the application of the findings, a Summary of Qualitative Findings (SoQF) table will be constructed. This table will list each barrier and enabler identified within the systematic reviews and provide an overall confidence assessment for each finding. The confidence assessment will be based on the methodological soundness and relevance of the evidence supporting each identified barrier or enabler. Importantly, the SoQF table will include explanations for these assessments, making the basis for each judgement transparent [ 42 ]. Additionally, a CERQual Evidence Profile will be prepared, offering a detailed look at the reviewers’ judgements concerning each component of the CERQual approach. These components contribute to the overall confidence in the evidence for each identified barrier or enabler. The CERQual Evidence Profile will serve as a comprehensive record of the quality and applicability of the evidence [ 42 ].

Finally, we will conduct a narrative synthesis accompanying the tabular and diagrammatic presentations, summarizing the findings and discussing their implications concerning the review’s objectives and questions. This narrative will interpret the significance of the barriers and enablers identified, explaining how the synthesized evidence fits into the existing knowledge base and pointing out potential directions for future research or policy formulation.

This protocol outlines an umbrella review aiming to consolidate available evidence on the implementation of PROMs and PREMs in healthcare settings. Through our synthesis of quantitative, qualitative, and mixed-methods systematic and scoping reviews, we will answer two key questions: which factors hinder or enable the adoption and sustained use of PROMs and PREMs in healthcare settings, and what is the level of confidence in the evidence supporting these factors? Our findings will indicate which factors can influence the adoption of PROMs and PREMs, including clinician buy-in, patient engagement, and organizational support. Furthermore, our review will provide key insights regarding how barriers and enablers to PROM/PREM implementation differ across settings and how perceptions around their implementation differ between patients, clinicians, managers, and decision-makers. The consideration of different healthcare settings and the inclusion of studies from different geographical regions and healthcare systems will provide a global perspective, essential for understanding how context-specific factors might influence the generalizability of findings.

Strengths of this umbrella review include the use of a state-of-the-art implementation framework (CFIR 2.0) to identify, categorize, and synthesize multilevel factors influencing the implementation of PROMs/PREMs, and the use of the GRADE-CERQual approach to identify the level of confidence in the evidence supporting these factors. Using CFIR 2.0 will address a key limitation of current research in the field, since reviews and primary research are often focused on provider- and patient-level barriers and enablers, omitting organizational- and system-level factors affecting PROM/PREM implementation. This umbrella review will expose knowledge gaps to orient further research to improve our understanding of the complex factors at play in the adoption and sustained use of PROMs and PREMs in healthcare settings. Importantly, using CFIR 2.0 will allow the mapping of barriers and enablers identified to relevant implementation strategy taxonomies, such as the Expert Recommendations for Implementing Change (ERIC) Taxonomy [34]. This is crucial for designing tailored implementation strategies, as it can ensure that the chosen approaches to support implementation are directly aligned with the specific barriers and enablers to the uptake of PROMs and PREMs.

Umbrella reviews are also associated with some limitations, including being limited to the inclusion of systematic reviews and other knowledge syntheses, while additional primary studies are likely to have since been published. These additional empirical studies will not be captured, but we will minimize this risk by updating the search strategy at least once before the completion of the umbrella review. A second key challenge in umbrella reviews is the overlap between the primary studies, as many studies will have been included in different systematic reviews on the same topic. To address this issue, we will prepare a matrix of primary studies included in systematic reviews to gain insight into double counting of primary studies.
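The protocol commits only to building the citation matrix, but one established way to summarize the double counting it reveals is the corrected covered area (CCA) of Pieper and colleagues; the sketch below, with a hypothetical matrix, is an illustrative addition rather than part of the authors' stated methods.

```python
def corrected_covered_area(matrix):
    """Corrected covered area (CCA) for a primary-study x review matrix.

    `matrix` maps each primary study to the set of reviews that include it.
    CCA = (N - r) / (r * (c - 1)), where N is the total number of inclusions,
    r the number of unique primary studies, and c the number of reviews
    (c >= 2 required). A CCA above 15% is usually read as very high overlap.
    """
    reviews = set().union(*matrix.values())
    n = sum(len(v) for v in matrix.values())
    r, c = len(matrix), len(reviews)
    return (n - r) / (r * (c - 1))

# Hypothetical matrix: three reviews sharing some primary studies.
matrix = {
    "study1": {"reviewA", "reviewB"},
    "study2": {"reviewA"},
    "study3": {"reviewB", "reviewC"},
    "study4": {"reviewC"},
}
print(f"CCA = {corrected_covered_area(matrix):.0%}")  # CCA = 25%
```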

We will maintain an audit trail to document amendments to this umbrella review protocol and will report these in both the PROSPERO register and subsequent publications. Findings will be disseminated through publications in peer-reviewed journals in the fields of implementation science, medicine, and health services and policy research. We will also disseminate results through relevant conferences and social media using different strategies (e.g., a graphical abstract). Furthermore, we will leverage existing connections between SDL and decision-makers at the provincial and national level in Canada to disseminate the findings of the review to a wider audience (e.g., the Director of the Quebec Cancerology Program, the Canadian Association of Psychosocial Oncology).

Availability of data and materials

Data sharing is not applicable to this article as no datasets were generated or analyzed for the purposes of this publication.

Abbreviations

CERQual: Confidence in the evidence from reviews of qualitative research
CFIR: Consolidated framework for implementation research
CIHI: Canadian Institute for Health Information
CPAC: Canadian Partnership Against Cancer
ERIC: Expert recommendations for implementing change
GRADE: Grading of recommendations assessment, development and evaluation
JBI: Joanna Briggs Institute
OECD: Organisation for Economic Co-operation and Development
PaRIS: Patient-reported indicators surveys
PREM: Patient-reported experience measure
PRIOR: Preferred reporting items for overviews of reviews
PRISMA: Preferred reporting items for systematic review and meta-analysis
PRISMA-P: Preferred reporting items for systematic review and meta-analysis protocols
PROM: Patient-reported outcome measure

Kingsley C, Patel S. Patient-reported outcome measures and patient-reported experience measures. BJA Education. 2017;16:137–44.

Jamieson Gilmore K, Corazza I, Coletta L, Allin S. The uses of patient reported experience measures in health systems: a systematic narrative review. Health Policy. 2022.

Gibbons C, Porter I, Gonçalves-Bradley DC, et al. Routine provision of feedback from patient-reported outcome measurements to healthcare providers and patients in clinical practice. Cochrane Database Syst Rev. 2021;10:CD011589.

Howell D, Molloy S, Wilkinson K, et al. Patient-reported outcomes in routine cancer clinical practice: a scoping review of use, impact on health outcomes, and implementation factors. Ann Oncol. 2015;26:1846–58.

Kotronoulas G, Kearney N, Maguire R, et al. What is the value of the routine use of patient-reported outcome measures toward improvement of patient outcomes, processes of care, and health service outcomes in cancer care? A systematic review of controlled trials. J Clin Oncol. 2014;32:1480–501.

Chen J, Ou L, Hollis SJ. A systematic review of the impact of routine collection of patient reported outcome measures on patients, providers and health organisations in an oncologic setting. BMC Health Serv Res. 2013;13:211.

Basch E. Symptom monitoring with patient-reported outcomes during routine cancer treatment: a randomized controlled trial. J Clin Oncol. 2016;34(6):557–65.

Forcino RCMM, Engel JA, O’Malley AJ, Elwyn G. Routine patient-reported experience measurement of shared decision-making in the USA: a qualitative study of the current state according to frontrunners. BMJ Open. 2020;10:e037087.

Timmins N. NHS goes to the PROMs. BMJ. 2008;336:1464–5.

Mjåset C. Value-based health care in four different health care systems. NEJM Catalyst. 2020.

PRO-sekretariatet. PRO – patient reported outcome. https://pro-danmark.dk/da/proenglish .

Terner MLK, Chow C, Webster G. Advancing PROMs for health system use in Canada and beyond. J Patient Rep Outcomes. 2021;5:94.

Slawomirski L, van den Berg M, Karmakar-Hore S. Patient-Reported Indicator Survey (PaRIS): aligning practice and policy for better health outcomes. World Med J. 2018;64(3):8–14.

Ahmed S, Barbera L, Bartlett SJ, et al. A catalyst for transforming health systems and person-centred care: Canadian national position statement on patient-reported outcomes. Curr Oncol. 2020;27:90–9.

Pross C, Geissler A, Busse R. Measuring, reporting, and rewarding quality of care in 5 nations: 5 policy levers to enhance hospital quality accountability. Milbank Q. 2017;95(1):136–83.

Ernst SCK, Steinbeck V, Busse R, Pross C. Toward system-wide implementation of patient-reported outcome measures: a framework for countries, states, and regions. Value in Health. 2022;25(9):1539–47.

Nguyen H, Butow P, Dhillon H, Sundaresan P. A review of the barriers to using Patient-Reported Outcomes (PROs) and Patient-Reported Outcome Measures (PROMs) in routine cancer care. J Med Radiat Sci. 2021;68:186–95.

Davis SAM, Smith M, et al. Paving the way for electronic patient-centered measurement in team-based primary care: integrated knowledge translation approach. JMIR Form Res. 2022;6:e33584.

Bull C, Teede H, Watson D, Callander EJ. Selecting and implementing patient-reported outcome and experience measures to assess health system performance. JAMA Health Forum. 2022;3:e220326.

Schepers SA, Haverman L, Zadeh S, Grootenhuis MA, Wiener L. Healthcare professionals’ preferences and perceived barriers for routine assessment of patient-reported outcomes in pediatric oncology practice: moving toward international processes of change. Pediatr Blood Cancer. 2016;63:2181–8.

Glenwright BG, Simmich J, Cottrell M, et al. Facilitators and barriers to implementing electronic patient-reported outcome and experience measures in a health care setting: a systematic review. J Patient Rep Outcomes. 2023;7:13. https://doi.org/10.1186/s41687-023-00554-2

Foster A, Croot L, Brazier J, Harris J, O’Cathain A. The facilitators and barriers to implementing patient reported outcome measures in organisations delivering health related services: a systematic review of reviews. J Patient Rep Outcomes. 2018;2(1):1–16.

Wolff AC, Dresselhuis A, Hejazi S, et al. Healthcare provider characteristics that influence the implementation of individual-level patient-centered outcome measure (PROM) and patient-reported experience measure (PREM) data across practice settings: a protocol for a mixed methods systematic review with a narrative synthesis. Syst Rev. 2021;10:169. https://doi.org/10.1186/s13643-021-01725-2

Grimshaw JM, Eccles MP, Lavis JN, Hill SJ, Squires JE. Knowledge translation of research findings. Implement Sci. 2012;7(1):50. https://doi.org/10.1186/1748-5908-7-50

French SD, Green SE, O’Connor DA, et al. Developing theory-informed behaviour change interventions to implement evidence into practice: a systematic approach using the Theoretical Domains Framework. Implement Sci. 2012;7(1):38. https://doi.org/10.1186/1748-5908-7-38

Wolfenden L, Foy R, Presseau J, Grimshaw JM, Ivers NM, Powell BJ, et al. Designing and undertaking randomised implementation trials: guide for researchers. BMJ. 2021;372:m3721. https://doi.org/10.1136/bmj.m3721

Damschroder LJ, Reardon CM, Widerquist MAO, et al. The updated Consolidated Framework for Implementation Research based on user feedback. Implement Sci. 2022;17:75. https://doi.org/10.1186/s13012-022-01245-0

Bradshaw ASM, Mulderrig M, et al. Implementing person-centred outcome measures in palliative care: an exploratory qualitative study using Normalisation Process Theory to understand processes and context. Palliat Med. 2021;35:397–407.

Stover AM, Haverman L, van Oers HA, Greenhalgh J, Potter CM. Using an implementation science approach to implement and evaluate patient-reported outcome measures (PROM) initiatives in routine care settings. Qual Life Res. 2021;30:3015–33.

Manalili K, Santana MJ. Using implementation science to inform the integration of electronic patient-reported experience measures (ePREMs) into healthcare quality improvement: description of a theory-based application in primary care. Qual Life Res. 2021;30:3073–84.

Patey AM, Fontaine G, Francis JJ, McCleary N, Presseau J, Grimshaw JM. Healthcare professional behaviour: health impact, prevalence of evidence-based behaviours, correlates and interventions. Psychol Health. 2022:766–94. https://doi.org/10.1080/08870446.2022.2100887

Proctor EK, Powell BJ, McMillen JC. Implementation strategies: recommendations for specifying and reporting. Implement Sci. 2013;8(1):1–11. https://doi.org/10.1186/1748-5908-8-139

Powell BJ, Waltz TJ, Chinman MJ, et al. A refined compilation of implementation strategies: results from the Expert Recommendations for Implementing Change (ERIC) project. Implement Sci. 2015;10:21. https://doi.org/10.1186/s13012-015-0209-1

Waltz TJ, Powell BJ, Matthieu MM, et al. Use of concept mapping to characterize relationships among implementation strategies and assess their feasibility and importance: results from the Expert Recommendations for Implementing Change (ERIC) study. Implement Sci. 2015;10:109. https://doi.org/10.1186/s13012-015-0295-0

Aromataris E, Munn Z. Chapter 11: Umbrella Reviews. In: Aromataris E, Munn Z, editors. Joanna Briggs Institute Reviewer's Manual. The Joanna Briggs Institute; 2020.

Aromataris E, Fernandez R, Godfrey C, Holly C, Khalil H, Tungpunkom P. Summarizing systematic reviews: methodological development, conduct and reporting of an umbrella review approach. Int J Evid Based Healthc. 2015;13(3):132–40.

Moher D, Shamseer L, Clarke M, et al. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement. Syst Rev. 2015;4(1). https://doi.org/10.1186/2046-4053-4-1

Gates M, Gates A, Pieper D, Fernandes RM, Tricco AC, Moher D, et al. Reporting guideline for overviews of reviews of healthcare interventions: development of the PRIOR statement. BMJ. 2022;378:e070849. https://doi.org/10.1136/bmj-2022-070849

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. Int J Surg. 2021;88.

Dixon-Woods M, Agarwal S, Young B, Jones D, Sutton A. Integrative approaches to qualitative and quantitative evidence. Health Development Agency; 2004.

Boudewijns EA, Trucchi M, van der Kleij RM, Vermond D, Hoffman CM, Chavannes NH, et al. Facilitators and barriers to the implementation of improved solid fuel cookstoves and clean fuels in low-income and middle-income countries: an umbrella review. Lancet Planet Health. 2022.

Lewin S, Glenton C, Munthe-Kaas H, et al. Using qualitative evidence in decision making for health and social interventions: an approach to assess confidence in findings from qualitative evidence syntheses (GRADE-CERQual). PLoS Med. 2015;12(10):e1001895. https://doi.org/10.1371/journal.pmed.1001895

The Centre for Implementation. The Consolidated Framework for Implementation Research (CFIR) 2.0. Adapted from "The updated Consolidated Framework for Implementation Research based on user feedback," by Damschroder LJ, Reardon CM, Widerquist MAO, et al., 2022, Implement Sci 17, 75. Image copyright 2022 by The Center for Implementation. https://thecenterforimplementation.com/toolbox/cfir

Lewin S, Booth A, Glenton C, et al. Applying GRADE-CERQual to qualitative evidence synthesis findings: introduction to the series. Implement Sci. 2018;13(Suppl 1):2. https://doi.org/10.1186/s13012-017-0688-3


Acknowledgements

We wish to acknowledge the involvement of a patient-partner on the RRISIQ grant supporting this project (Lisa Marcovici). LM will provide feedback and guidance on the findings of the umbrella review, orienting the interpretation of findings and the next steps of this project.

We wish to acknowledge funding from the Quebec Network on Nursing Intervention Research/Réseau de recherche en intervention en sciences infirmières du Québec (RRISIQ), a research network funded by the Fonds de recherche du Québec en Santé (FRQ-S). Funders had no role in the development of this protocol.

Author information

Authors and affiliations

Ingram School of Nursing, Faculty of Medicine and Health Sciences, McGill University, 680 Rue Sherbrooke O #1800, Montréal, QC, H3A 2M7, Canada

Guillaume Fontaine, Lydia Ould Brahim, Sydney Wasserman & Sylvie D. Lambert

Centre for Clinical Epidemiology, Lady Davis Institute for Medical Research, Sir Mortimer B. Davis Jewish General Hospital, CIUSSS West-Central Montreal, 3755 Chem. de la Côte-Sainte-Catherine, Montréal, QC, H3T 1E2, Canada

Guillaume Fontaine

Department of Family Medicine and Emergency Medicine, Faculty of Medicine and Health Sciences, Université de Sherbrooke, 3001 12 Ave N Building X1, Sherbrooke, QC, J1H 5N4, Canada

Marie-Eve Poitras

Centre Intégré Universitaire de Santé Et de Services Sociaux (CIUSSS) du Saguenay-Lac-Saint-Jean du Québec, 930 Rue Jacques-Cartier E, Chicoutimi, QC, G7H 7K9, Canada

Faculty of Nursing, Université Laval, 1050 Av. de La Médecine, Québec, QC, G1V 0A6, Canada

Maxime Sasseville

Centre de Recherche en Santé Durable VITAM, CIUSSS de La Capitale-Nationale, 2480, Chemin de La Canardière, Quebec City, QC, G1J 2G1, Canada

Faculty of Medicine & School of Public Health, Université de Montréal, Pavillon Roger-Gaudry, 2900 Edouard Montpetit Blvd, Montreal, QC, H3T 1J4, Canada

Marie-Pascale Pomey

Centre de Recherche du Centre Hospitalier de l’Université de Montréal (CR-CHUM), 900 Saint Denis St., Montreal, QC, H2X 0A9, Canada

Direction of Nursing, CIUSSS de L’Ouest de L’Île-de-Montréal, 3830, Avenue Lacombe, Montreal, QC, H3T 1M5, Canada

Jérôme Ouellet

Université Laval Library, Pavillon Alexandre-Vachon, 1045 Avenue de la Médecine, Québec, QC, G1V 0A6, Canada

Frédéric Bergeron

St. Mary’s Research Centre, CIUSSS de L’Ouest de L’Île-de-Montréal, 3777 Jean Brillant St, Montreal, QC, H3T 0A2, Canada

Sylvie D. Lambert


Contributions

GF and SDL conceptualized the study. GF, MEP, MS, MP, and SDL developed the study methods. GF drafted the manuscript, with critical revisions and additions by SDL. All authors provided intellectual content and reviewed and edited the manuscript. GF is the guarantor of this review. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Guillaume Fontaine .

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Material 1.

Supplementary Material 2.

Supplementary Material 3.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article

Fontaine, G., Poitras, ME., Sasseville, M. et al. Barriers and enablers to the implementation of patient-reported outcome and experience measures (PROMs/PREMs): protocol for an umbrella review. Syst Rev 13, 96 (2024). https://doi.org/10.1186/s13643-024-02512-5


Received: 24 May 2023

Accepted: 13 March 2024

Published: 26 March 2024

DOI: https://doi.org/10.1186/s13643-024-02512-5


Keywords

  • Patient-reported outcome measures
  • Patient-reported experience measures
  • Implementation science
  • Umbrella review
  • Systematic review
  • Overview of reviews
  • Facilitators

Systematic Reviews

ISSN: 2046-4053



  • Open access
  • Published: 24 August 2022

Identifying and understanding benefits associated with return-on-investment from large-scale healthcare Quality Improvement programmes: an integrative systematic literature review

  • S’thembile Thusini 1 ,
  • Maria Milenova 1 ,
  • Noushig Nahabedian 2 ,
  • Barbara Grey 2 ,
  • Tayana Soukup 1 &
  • Claire Henderson 1  

BMC Health Services Research, volume 22, Article number: 1083 (2022)


Abstract

Background

We previously developed a Quality Improvement (QI) Return-on-Investment (ROI) conceptual framework for large-scale healthcare QI programmes. We defined ROI as any monetary or non-monetary value or benefit derived from QI. We called the framework the QI-ROI conceptual framework. The current study describes the different categories of benefits covered by this framework and explores the relationships between these benefits.

Methods

We searched Medline, Embase, Global Health, PsycInfo, EconLit, NHS EED, Web of Science, Google Scholar, organisational journals, and citations, using ROI or return-on-investment concepts (e.g., cost–benefit, cost-effectiveness, value) combined with healthcare and QI. Our analysis was informed by Complexity Theory, in view of the complexity of large QI programmes. We used Framework analysis to analyse the data, guided by a preliminary ROI conceptual framework based on organisational obligations towards stakeholders. Included articles discussed at least three organisational benefits towards these obligations, with at least one financial or patient benefit. We synthesized the different QI benefits discussed.

Results

We retrieved 10 428 articles. One hundred and two (102) articles were selected for full-text screening; of these, 34 were excluded and 68 included. Included articles comprised QI economic, effectiveness, process, and impact evaluations, as well as conceptual literature. Based on these literatures, we reviewed and updated our QI-ROI conceptual framework from our first study. The framework consists of four categories: 1) organisational performance, 2) organisational development, 3) external outcomes, and 4) unintended outcomes (positive and negative). We found that QI benefits are interlinked, and that ROI in large-scale QI is not merely an end-outcome; there are earlier benefits that matter to organisations and contribute to overall ROI. Organisations also found positive aspects of negative unintended consequences, such as learning from failed QI.

Discussion and conclusion

Our analysis indicated that the QI-ROI conceptual framework is made up of multi-faceted and interconnected benefits from large-scale QI programmes. One or more of these may be desirable depending on each organisation's goals, objectives, and stage of development. As such, organisations can deduce relevant and legitimate incremental benefits, or returns-on-investment, throughout a programme's lifecycle.


Introduction

Health services worldwide are faced with challenges to improve the safety and quality of care whilst managing rising healthcare costs [ 1 , 2 , 3 , 4 ]. One way to improve healthcare quality is through Quality Improvement (QI). QI is a systematic approach to improving healthcare quality as well as strengthening health systems and reducing costs [ 5 , 6 ]. QI uses sets of methods such as Lean and Plan-Do-Study-Act (PDSA) [ 7 ]. These methods often incorporate analysis, improvement or reconfiguring, and monitoring of systems. QI is guided by Implementation and Improvement Sciences in the targeted design of improvement strategies to maximise programmes’ success [ 8 ]. QI can be implemented as small projects or large programmes aimed at benefiting entire organisations or health systems [ 9 , 10 ]. Healthcare is a complex system as it involves connections, actions and interactions of multiple stakeholders and processes [ 11 ]. Therefore, QI in healthcare is a complex intervention. This complexity can be costly.

QI may require significant investment to implement and maintain [ 12 , 13 ]. QI implementers must therefore demonstrate its value to help leaders justify and account for their investment decisions [ 14 , 15 ]. QI outcomes are assessed through programme evaluations, comparative research, and economic evaluations such as Return on Investment (ROI). ROI is increasingly being recommended for evaluating or forecasting financial returns (making a business case) for healthcare programmes [ 16 , 17 ]. Originally from accounting and economics, ROI methods calculate a programme’s costs against its benefits [ 18 ]. All perceived programme benefits must be converted to money (monetised) and reported as a single ratio or percentage, e.g., ROI of 1:1 means a 100% profit was made [ 19 ]. A favourable ROI is where a positive estimation of a financial return from an investment can be made [ 19 , 20 ]. However, most healthcare benefits are not amenable to monetisation [ 20 ]. Additionally, healthcare QI programmes do not often make a profit or save costs [ 21 ]. This raises questions of ROI utility in QI programmes.
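To make the arithmetic concrete, here is a minimal sketch in Python (with hypothetical figures, not data from any study cited here) of the convention under which ROI is net benefit divided by cost, so that a 1:1 ratio corresponds to a 100% return:

```python
def roi(total_benefit: float, total_cost: float) -> float:
    """Return on investment as net benefit per unit of cost."""
    return (total_benefit - total_cost) / total_cost

# Hypothetical programme: 500,000 invested; 1,000,000 in monetised benefits.
r = roi(total_benefit=1_000_000, total_cost=500_000)
print(f"ROI = {r:.1f}:1 ({r:.0%} return)")  # ROI = 1.0:1 (100% return)
```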

ROI was introduced into healthcare as a simple, objective measure of a programme's success [ 16 ]. In practice, however, ROI methodology has been found to be complicated and technically demanding [ 22 ]. ROI has also been found to misrepresent reality, due to its inability to incorporate certain crucial programme outcomes that are valued in healthcare [ 23 ]. The concerns over ROI have resulted in various attempts to refashion it. Today, there are ROI methods that encourage some form of reporting of non-monetisable qualitative benefits alongside monetised benefits [ 24 , 25 ]. However, these methods still prioritise monetisable benefits [ 19 , 20 ]. As such, some have referred to ROI as insincere and synthetic [ 24 , 26 ].

The suitability of ROI as a method for evaluating the value of QI in healthcare and other service industries has been contested for decades [ 23 , 27 , 28 , 29 , 30 , 31 , 32 ]. Within and outside healthcare, some have called for a ‘return to value’ rather than a focus on financial outcomes [ 33 ], or renamed ROI as ROQ, ‘return on quality’, where quality and not profit is favoured [ 34 ]. This hints at ROI being a concept. As a concept, ROI encapsulates mental abstractions of how costs and benefits are perceived [ 35 , 36 ]. Thus, the apparent lack of acceptance of ROI in healthcare suggests a need to understand ROI as a concept of a return-on-investment. Understanding the meaning of concepts in research is deemed a crucial step in advancing scientific inquiry [ 36 ].

This report is the second part of a larger study on the concept and determinants of ROI from large-scale healthcare QI. The current and previous studies were designed to develop the ROI concept and a framework for understanding it in the healthcare context; the third study will focus on the determinants. In the first part (under submission), we developed the QI-ROI concept by differentiating ROI from similar concepts. In that study, we found that patient outcomes were seen as being of primary importance, while several other organisational benefits, including financial benefits, were also seen as important. We concluded that ROI in healthcare QI represents any valued benefit. We translated this conceptualisation as follows: attaining a return-on-investment, whatever that is, is valued and therefore of benefit, and any benefit is of value in and of itself. We then proposed a framework for the analysis of return-on-investment from QI programmes, which we called the QI-ROI conceptual framework.

In the current study, we sought to deepen our understanding of the QI-ROI concept. Gelman and Kalish stated that “concepts correspond to categories of things in the real world and are embedded in larger knowledge structures…the building blocks of ideas” [ 35 ] (p. 298). Therefore, in the current study, we aimed to search for these building blocks of the QI-ROI concept. The objective was to further develop the QI-ROI framework by exploring the categories of goals and benefits that reflect ROI from large-scale QI programmes; in other words, what QI authors and experts would deem, or have deemed, a return-on-investment from QI programmes. This knowledge was then used to compile the types of benefits that, if achieved, represent the QI-ROI. We also explored if and how QI benefits are linked to each other. The linkages were crucial in gaining insights into how the complexity of healthcare, as well as of QI as a complex intervention, may impact ROI evaluation.

Underpinning theory

Our wider research project on the ROI concept is informed and underpinned by Complexity Theory. We deemed this theory pertinent, given the multiple QI objectives of multiple healthcare stakeholders. Complexity Theory encompasses a group of theories from different disciplines that highlight the interdependent, interconnected, and interrelated nature of a system i.e., human and technological components of an organisation [ 11 , 37 , 38 ]. These components influence each other in unpredictable ways with emergent consequences [ 11 ]. Therefore, complexity may lead to uncertainties, benefits, and challenges that may impact ROI. Various tools can be used to study this complexity in QI programmes [ 8 , 39 , 40 ]. However, in this study, Complexity Theory was used only to highlight the complexity during our analysis rather than to study it. We will examine this complexity in detail in our next study on ROI determinants.

Review type

This paper is part of a larger Integrative Systematic Review on the ROI concept and its determinants from healthcare QI programmes. Our review is registered with PROSPERO, CRD42021236948. We amended the protocol, firstly, to add additional authors, as the complexity of the review called for more author perspectives, and secondly, to use Framework analysis instead of Thematic analysis. A link to our PRISMA reporting checklist [ 25 ] is included in the supplementary files. We followed review guidance on integrative reviews by Whittemore and Knafl [ 41 ] and on conceptual framework development by Jabareen [ 42 ]. This led to eight separate review stages. Stage 1, clarifying the research question, involved background reading, as discussed in our protocol on PROSPERO. The remaining stages are reported here: stages 2–3 involved searching and selecting literature; in stage 4 we assessed the quality of research studies; and stages 5–8 are reported in the synthesis, analysis, and results sections below.

Search strategy

We searched Medline, Embase, Global Health, PsycInfo, EconLit, NHS EED, Web of Science, Google, Google Scholar, and organisational journals, and hand-searched citations. Search terms came from three categories: (1) healthcare or health*; (2) ROI-related economic evaluation terms (SROI, CBA, CEA, CUA), as well as the terms value, benefit, and outcomes; and (3) QI and its specific methods. Table 1 contains definitions of search terms. No language or date limits were set, to enable us to note any changes in QI-ROI conceptualisation over time. The search ended on January 30th, 2021. The search strategy is provided as Supplementary Table 1.
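For illustration only, the sketch below shows how the three term categories might be combined into a Boolean query; the terms are examples chosen for brevity, not our full strategy, which is in Supplementary Table 1:

```python
# Illustrative term lists only; see Supplementary Table 1 for the full strategy.
health_terms = ["healthcare", "health*"]
roi_terms = ["ROI", "return on investment", "cost-benefit", "cost-effectiveness", "value"]
qi_terms = ["quality improvement", "Lean", "PDSA"]

def or_block(terms: list[str]) -> str:
    """OR the terms together, quoting multi-word phrases."""
    return "(" + " OR ".join(f'"{t}"' if " " in t else t for t in terms) + ")"

# The three categories are ANDed together, as in the strategy described above.
query = " AND ".join(or_block(block) for block in (health_terms, roi_terms, qi_terms))
print(query)
```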

Eligibility

During our initial search, many articles identified themselves as large-scale QI programmes. To focus our selection criteria, we developed a preliminary ROI conceptual framework (Fig.  1 ). This framework contained various needs and obligations of healthcare organisations [ 53 , 54 ], which we assumed to signal desired organisational outcomes. The framework had four criteria: 1) organisational performance (patient and financial outcomes), 2) organisational capacity and capability, 3) external relations (e.g., accreditation), and 4) unintended consequences (positive/negative). Organisational performance is a marker of how well an organisation delivers value for its stakeholders [ 55 ]. Thus, in a way, it includes external relations, e.g., population health. However, these have been isolated here to deduce some unique external outcomes and obligations towards external stakeholders. We then used this framework to decide on eligibility. We included literature on discussions and evaluations of large-scale QI programmes at all healthcare levels (primary, secondary, tertiary) globally.

figure 1

Preliminary QI ROI conceptual framework

We included literature that mentioned at least three QI organisational goals or benefits, two of which had to be patient or financial outcomes. By doing this, we sought to isolate articles that discussed a wide range of QI outcomes, with patient and financial outcomes as basic organisational QI performance goals. In addition, articles had to mention the use of at least one QI method and the involvement of various stakeholders in at least two organisational units. Altogether, this denoted three-dimensional criteria: depth, breadth, and complexity of programmes per organisation. Table 2 lists included and excluded article types.

Screening and selection of articles

Citations were downloaded into EndNote (Clarivate) [ 56 ] to compile a list of citations and remove duplicates. Rayyan software [ 57 ] was used to screen titles and abstracts as per our selection criteria. Screening and selection were performed by two independent reviewers, ST and MM. To refine our selection criteria, five articles were initially selected and discussed to clarify any uncertainties. The two reviewers then completed the screening and selection of the remaining articles independently: ST 100%, MM 5%. Overall agreement was over 90%. Disagreements were discussed and settled by ST and MM, as well as with co-author CH.
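As a minimal sketch (hypothetical decisions, not our data), percent agreement between two screeners can be computed as below; Cohen's kappa additionally corrects for chance agreement:

```python
# Hypothetical include/exclude decisions for two screeners on a shared sample.
def percent_agreement(a: list[str], b: list[str]) -> float:
    """Share of items on which both screeners made the same decision."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Agreement corrected for chance, using each rater's marginal proportions."""
    n = len(a)
    p_o = percent_agreement(a, b)
    p_e = sum((a.count(label) / n) * (b.count(label) / n) for label in set(a) | set(b))
    return (p_o - p_e) / (1 - p_e)

st = ["include", "exclude", "exclude", "include", "exclude"]
mm = ["include", "exclude", "include", "include", "exclude"]
print(percent_agreement(st, mm))       # 0.8 (i.e., 80% raw agreement)
print(round(cohens_kappa(st, mm), 2))  # 0.62
```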

Data extraction

Data extraction was performed using words and phrases in the preliminary conceptual framework, as well as the outcomes in the review's search terms. We searched for these in all parts of an article where QI benefits, outcomes, and goals may be discussed, including the introduction, aims, objectives, results, discussion, and conclusion. Articles were tabulated according to type of article, country, setting, programme type, and outcomes discussed. The data extraction file is included as Supplementary Table 2.

Quality assessment

For researchers of integrative reviews and conceptual development, quality assessment is optional, as the quality of studies has little or no bearing on concept development [ 41 , 42 ]. As such, there was no intention to exclude articles based on their quality. However, to understand the scientific context in which QI benefits are discussed, we assessed all empirical studies using specific quality assessment and reporting tools. For reviews, we used the Critical Appraisal Skills Programme (CASP) checklist [ 58 ]; for mixed-methods studies, the Mixed Methods Appraisal Tool (MMAT) [ 59 ]; for implementation studies, the Standards for Reporting Implementation Studies (StaRI) [ 60 ]; for economic evaluations, the Consolidated Health Economic Evaluation Reporting Standards (CHEERS) [ 61 ]; and for QI, the Standards for QUality Improvement Reporting Excellence (SQUIRE) [ 62 ]. As these are different tools, there was no single criterion for judging collective study quality. We therefore assessed the number of appropriate items reported or addressed per the respective study's tool, assigning good if 80–100% of items were addressed, moderate if 50–79%, and poor if less than 50%.
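As a small illustration of these thresholds (the item counts are hypothetical), the mapping can be expressed as:

```python
def quality_rating(items_addressed: int, items_total: int) -> str:
    """Map the share of tool items addressed to the ratings used in this review."""
    pct = 100 * items_addressed / items_total
    if pct >= 80:
        return "good"
    if pct >= 50:
        return "moderate"
    return "poor"

print(quality_rating(14, 18))  # ~78% of items addressed -> "moderate"
```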

Data integration, synthesis, and analysis

We followed Framework Analysis [ 63 ], drawing on guidance on thematic analysis by Braun & Clarke [ 64 ] and on hybrid deductive-inductive analysis by Fereday & Muir-Cochrane [ 65 ]. This allowed us to identify data fitting our preliminary ROI conceptual framework, as well as any emerging data related to ROI. During the synthesis, we summarised findings from the integrated literature and compiled a table of themes, sub-themes, and related outcomes. In the analysis, we noted the complexity of, and relationships between, these themes and outcomes.

The result was a developed QI-ROI framework that outlines the ROI concepts from our first study (e.g., efficiency, productivity, cost-management, cost-saving). Productivity is the quantity of outputs/returns (e.g., patients seen) per investment/input (e.g., staff). Efficiency is achieving those outputs from the same or fewer inputs, with least or no waste (e.g., in time, money, effort) [ 66 ]. Cost management refers to strategies used to control costs [ 67 ]. Cost saving can be an outcome of efficiency, productivity, and cost-management. This initial QI-ROI framework was combined with the categories of QI benefits from the current study to form an extended QI-ROI framework.
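To illustrate these definitions with purely hypothetical numbers (not data from this review): a unit that sees 120 patients with four staff has a productivity of 30 patients per staff member; seeing the same 120 patients with three staff is an efficiency gain.

```python
def productivity(outputs: float, inputs: float) -> float:
    """Outputs (e.g., patients seen) per unit of input (e.g., staff)."""
    return outputs / inputs

before = productivity(outputs=120, inputs=4)  # 30 patients per staff member
after = productivity(outputs=120, inputs=3)   # 40: same output, fewer inputs
print(f"Productivity gain: {after / before - 1:.0%}")  # 33%
```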

Results

A total of 10 428 articles were retrieved; 10 327 were excluded for various reasons, as shown in Fig.  2 . One hundred and two (102) articles were eligible for full-text screening; of these, 34 were excluded and 68 included. Included articles were: Conceptual n = 24, Quantitative studies n = 19, Qualitative studies n = 3, Mixed-Methods studies n = 8, Systematic Reviews n = 8, Literature reviews n = 2, Brief Reports n = 4. Together, the articles represent 18 years of QI evaluation (2002–2020). Articles were excluded where programmes engaged a single department and/or discussed two or fewer QI outcomes/goals. Thirteen of these were collaboratives. There was one pre-print. A link to the excluded studies document is available in the supplementary files.

figure 2

PRISMA flow chart

Article characteristics

Included articles covered different healthcare levels and disciplines. Primary care included public health, child and maternal health, and mental health. Secondary and tertiary healthcare included mental health, medical and surgical care, critical care, accident and emergency and acute care services, paediatrics and neonatal care, outpatients, pharmacy, and laboratories. One article covered both health and social care, and another was about QI in a healthcare-related charitable organisation. Articles were from these global regions: Africa, Asia, Europe, Australia, and Canada. The most represented countries were the US and the UK. Only 15 of 68 articles were economically focused. ROI was a specific subject of only four articles [ 68 , 69 , 70 , 71 ], and five authors discussed the development of QI business cases [ 33 , 72 , 73 , 74 , 75 ]. One article discussed cost–benefit analysis from a qualitative perspective [ 76 ]; there were two economic systematic reviews and three economic evaluations. de la Perrelle et al. [ 77 ] also noted this lack of economic evaluations in their systematic review. However, some authors reported their implementation costs [ 78 , 79 , 80 ]. The summary of included studies can be found as Supplementary Table 3.

Quality of studies

Thirty articles were not subject to quality assessment; these were conceptual articles, unsystematic literature reviews, and brief reports. Thirty-eight articles were assessed: 19 quantitative studies, three qualitative studies, eight mixed-methods studies, and eight systematic reviews. Of the 38 studies, 39% reported or addressed 80–100% of all required items, 43% reported 50–79%, and 18% reported below 50% of the items required by their respective reporting tool. The main areas of poor rigour were ethics (29%), statistical analysis methods (75%), and discussion and management of study limitations (42%). For some mixed-methods studies (29%), the integration of quantitative and qualitative data was unclear. In some cases, these issues may merely reflect poor reporting; however, in the absence of data, poor rigour was assumed. Overall, the quality of the studies was summed up as moderate. The quality assessment summary is provided as Supplementary Table 4.

Data synthesis and analysis

Authors either directly studied QI outcomes, reported additional QI outcomes and benefits, and/or discussed QI goals and missed opportunities. A number of papers reported financial savings or had savings as a goal [ 77 , 81 , 82 , 83 , 84 , 85 , 86 , 87 , 88 ]. Gandjour & Lauterbach [ 89 ] noted that cost-saving was more likely when improving an over-use or misuse problem. For example, one article reported cost-reduction from malpractice suits [ 74 ]. Financial benefits through QI were mostly internal to organisations; a small number involved societies and healthcare funders [ 73 , 75 ].

There was a shared view that quality and patient safety should be more central to QI and investment goals than financial outcomes [ 72 , 88 , 90 , 91 , 92 , 93 , 94 , 95 ]. This view had not changed over time. Thus, QI goals were primarily improving patient outcomes through systems, structural, process, and behavioural improvements. These improvements enabled greater efficiency and productivity, which in turn strengthened managers' ability to manage, minimise, or reduce costs, and eventually to save costs [ 73 , 94 , 96 , 97 , 98 ]. Systems efficiency helped improve staff efficiency, effectiveness, productivity, and experience, which benefited patients [ 84 , 99 , 100 ]. Improved systems enabled improved organisational capacity, capability, and resilience [ 93 , 101 , 102 , 103 , 104 , 105 , 106 ].

Most authors highlighted that good quality and patient safety relied upon good staff outcomes and leadership. A few studies focused on some of these specific areas. Examples include Mery et al. [ 71 ], who studied QI programmes as an organisational capability and capacity development tool. Hatcher [ 83 ] studied QI as a staff safety promotion tool, and Lavoie-Tremblay et al. [ 99 ] evaluated QI as a tool for team effectiveness. Furukawa et al. [ 107 ] and Heitmiller et al. [ 84 ] directed QI towards environmental sustainability. MacVane [ 96 ] saw QI as a governance tool. Williams et al. [ 100 ] focused on both staff and patient outcomes. QI was also used to operationalise organisations' strategies [ 93 , 108 ]. Staines et al. [ 108 ] found that a positive QI reputation allowed recruitment of a suitable CEO.

There was a general recognition that QI does not always achieve its intended goals. Additionally, some QI strategies were more successful than others [ 80 ]. Particularly, some literature reviews and empirical studies reported variable, mixed, or inconclusive results [ 86 , 109 , 110 , 111 , 112 , 113 , 114 , 115 ], even a decline in quality [ 99 ]. A few articles discussed negative unintended outcomes [ 81 , 100 , 104 , 110 , 112 , 114 , 116 , 117 , 118 , 119 ]. de la Perrelle et al. [ 77 ] noted this lack of reporting of negative findings in their review. They suspected this to be due to publication bias. Rationales for not achieving goals were given as implementation difficulties related to contextual and behavioural challenges [ 78 , 114 , 120 , 121 ].

Some authors noted that overall benefits accrued over time during phases of a programme’s implementation process [ 80 , 122 ]. Morganti et al. [ 123 ] noted different measures of QI success but suggested that spread of a programme was a measure of lasting success. Sustainability of outcomes was therefore also seen as an important achievement by most authors. This was supported by some of the literature which also indicated that successful QI built legacies mainly through spreading, embedding, and sustaining improvements [ 78 , 93 , 101 , 102 , 103 , 104 , 105 , 106 ]. This finding was confirmed by impact studies, extensive QI programme evaluations and discussions of overall QI impacts [ 69 , 85 , 87 , 93 , 103 , 104 , 105 , 106 , 108 , 115 , 116 , 119 , 121 , 124 , 125 , 126 ]. These literatures elaborated on QI goals, failures, and successes, as well as the lessons learnt. Authors suggested that lessons and cultural changes as a result of QI were essential to meeting patient safety needs [ 93 , 109 ]. Authors highlighted that ultimately, QI benefited a wide range of stakeholders at different levels in different ways.

Based on the findings, we compiled data into four overarching themes (Table 3 ). These themes aligned with our ROI preliminary framework; however, adjustments were made to reflect the findings. Organisational capacity and capability was renamed organisational development to acknowledge the broader organisational outcomes. This included all the outcomes that develop and improve organisations’ ability to fulfil their duties. Resilience and QI legacy were additional sub-themes under organisational development. External relations was renamed external outcomes to reflect the broad outcomes beyond relationships with regulators, communities, and other organisations. External outcomes were extended to include collaboration, societal and environmental outcomes, and incentives. Incentives included accreditation, awards, ranking, competitiveness, influence, power, and financial rewards.

Negative unintended outcomes include any negative impact resulting from a QI programme. These were external imposition, top-down distortions, duplication, high resource demands, loss of revenue, and loss of buy-in. Authors reported that at times external or managerial agendas were superimposed over other QI goals [ 108 , 116 , 127 , 128 ]. At times this caused duplication of processes (e.g., data collection) and/or increased demand on already stretched services. In addition, successful QI can cause loss of funding as services become obsolete [ 108 ]. Eventually, different negative outcomes may cause staff or leaders to disengage from current or future QI.

Positive unintended outcomes were difficult to delineate as often programmes were geared towards patient outcomes but impacted other parts of an organisation in the process. However, as improvement strategies involved changing systems and human behaviours, improvement of these aspects must be intended. We therefore had this sub-theme only include new innovations and opportunities. The final overarching themes were named 1) organisational performance (two sub-themes), 2) organisational development (12 sub-themes), 3) external outcomes (five sub-themes), 4) unintended outcomes (two sub-themes).

Based on the themes, we updated our ROI preliminary conceptual framework to map the four overarching themes that represent QI-ROI (Fig.  3 ). The beneficial outcomes are presented under the headings “gains, benefits, returns”, whilst negative outcomes are presented as “losses, costs, investments”. These concepts are technically defined differently; they are used together here to denote their co-existence within QI programmes. For example, loss of revenue is a potential investment lost, high resource demands may require investment or incur costs, duplication is inefficient and costly, and loss of buy-in is a costly setback. All will increase the money spent or lost if not well managed or avoided. They may also affect organisational performance and development, as well as stakeholder engagement in future programmes. Thus, impacts are both monetary and non-monetary.

figure 3

Updated preliminary ROI conceptual framework. Most QI goals and outcomes affect an organisation's culture. The four overarching themes are connected and influence one another, e.g., improved performance enabled attainment of external incentives. An overlap exists amongst these themes, e.g., collaboration was improved both internally (organisational development) and externally (as an external QI benefit)

Authors also saw investments as taking both monetary and non-monetary forms, and viewed both as equally essential for patient safety and quality. Some of these investments were part of ongoing organisational strategies. Investments included staff time, recruitment and retention costs, training costs, and patient engagement costs [ 68 , 69 , 77 , 95 , 108 , 113 , 114 , 116 ]. Some investments depended on the goodwill of staff and patients and were seen as priceless [ 119 ]. Staines et al. [ 108 ] referred to two types of investments: “hard” infrastructure (e.g., technology) and “soft” infrastructure (e.g., awareness, commitment, and culture).

The literature also noted that QI outcomes are interlinked and interrelated, and as such QI-ROI may not be readily observable. Deducing ROI may require studying “cause-and-effect chains” [ 92 ] (p. 2) or an ROI chain: the link between events from a given investment to a given outcome. Sibthorpe et al. [ 113 ] saw this as important for understanding QI impacts and attracting QI investment. This can be done by tracking inputs, processes, outputs, and outcomes as much as possible throughout a programme. By doing this, the integrity of the ROI chain may be assessed by identifying areas where QI-ROI is created, lost, or influenced. This may then help maximise QI-ROI. However, tracking this chain in complex contexts may be a challenge.

The QI-ROI chain

In complex systems, programme inputs, processes, and outputs are not once-only events occurring only at initial implementation. Outcomes of earlier inputs, outputs, and processes become inputs in the next phase, and so forth, until the final impact is achieved (end-ROI). It may therefore be helpful to recognise and celebrate earlier achievements [ 33 , 97 ]. Further, before a final impact is realised, a programme may act and interact with several variables. Due to this complexity, the linkages may resemble a web rather than a chain. The literature attested to the fact that QI impacts are unpredictable and difficult to measure [ 33 , 113 , 119 ]. QI inputs may or may not be converted into active QI ingredients that will effect organisational change and improvement [ 80 ]. For example, if one of the strategies is to train staff, do they actually learn what is needed? The answer would depend on several internal and external determining factors [ 78 , 79 , 114 , 120 , 121 ]. Such factors may force adaptations and influence fidelity to strategies, sustainability, and decisions to proceed, de-implement, or disinvest.
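As a minimal sketch of the tracking idea described above (the structure and field values are our own illustration, not a tool from the reviewed literature), one phase's outcomes can be recorded so that they become the next phase's inputs:

```python
from dataclasses import dataclass, field

@dataclass
class PhaseRecord:
    """One link in a hypothetical QI-ROI chain: what went in, what came out."""
    phase: str
    inputs: list[str]
    processes: list[str]
    outputs: list[str]
    outcomes: list[str] = field(default_factory=list)

chain = [
    PhaseRecord("setup", ["staff time", "training costs"], ["train teams"],
                ["trained staff"], ["increased proficiency"]),
    PhaseRecord("delivery", ["increased proficiency"], ["redesign pathway"],
                ["shorter waits"], ["improved productivity", "cost saving"]),
]

# Outcomes of an earlier phase should reappear as inputs to the next phase.
for earlier, later in zip(chain, chain[1:]):
    assert set(earlier.outcomes) & set(later.inputs)
```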

The ROI chain is illustrated in Figs.  4 and 5 . Figure  4 demonstrates that the overall ROI results from changes in processes, structures, and systems. These may be visible through behavioural (human and systems) and technological improvements before the final impact and ROI can be detected. Two tiers of mechanisms are alluded to here: first-order mechanisms operationalise QI strategies and provide non-monetary ROI, whilst second-order mechanisms convert QI efforts into financial returns. A first-order mechanism may be, for example, increased staff proficiency leading to staff development, whilst a second-order mechanism may be improved productivity due to that increased proficiency. Productivity may then help save costs.

figure 4

QI-ROI Chain

figure 5

Extended QI-ROI conceptual framework: phased format

In summary, different investments are made towards a QI programme and a change is propagated through changing and improving processes, behaviours, systems, and structures. Technical (e.g., skills) and social (e.g., culture) changes and improvements may be achieved. These changes and improvements can then lead to improved efficiency and productivity. Efficiency and productivity can then improve cost-management. Better cost-management and control can then lead to cost-reduction, cost-minimisation, cost-avoidance, cost-containment, and cost-saving. All these are outputs, immediate and intermediate outcomes that become mechanisms through which monetary ROI is achieved. Before then, the outputs present as non-monetary returns-on-investments either as enabled abilities (e.g., cost-management, cost-reduction, cost-minimisation, cost-avoidance, cost-containment), outputs or intermediate outcomes (e.g., improved behaviour, productivity, efficiency).

Non-monetary ROI can also be achieved through organisational development e.g., staff development and collaboration. Organisational development is the basis for safe healthcare systems and may lead to cost-saving, and hard cash ROI. Improvements in staff and process outcomes may improve culture, which may also improve patient and financial outcomes. Improvements in patient outcomes may lead to further benefits (e.g., incentives), and become an organisation’s legacy (culture, capacity and capabilities). This can help an organisation become more resilient and sustainable. QI culture and QI legacies are the basis from which future organisational development as well as patient and financial outcomes can be achieved.

Altogether, the QI outcomes contribute to higher goals such as organisational learning, transformation, financial stability, value-based healthcare, and high reliability [ 101 , 102 , 105 , 116 ]. Although intended goals and short-term outcomes may be achieved earlier, long-term sustainable impacts depend on successful implementation, embedding a QI safety culture, and developing legacies that support future improvement efforts. Whatever the end-outcome, lessons may be learnt, research, innovation and development may ensue, capacities and capabilities may improve. As Banke-Thomas et al. [ 68 ] stated, “ The application of (S)ROI … could be used to inform policy and practice such that the most cost-beneficial interventions are implemented to solve existing (public health) challenges” (p.10).

Figure  5 illustrates the updated QI-ROI conceptual framework in a phased format. This figure represents the current conceptualisation of QI-ROI based on our analysis of the healthcare QI evaluation literature. The processes described here are more complex but have been simplified for clarity. The figure contains the ROI-like concepts from our first study (e.g., efficiency, productivity, effectiveness, cost saving). These concepts are seen here as building blocks of financial ROI. However, some of these also form part of improvements in other organisational performance and developmental goals. Such improvements can be seen as non-monetary ROI which includes improved abilities, development, and overall improved outputs and outcomes. Together, these are the building blocks of the QI-ROI concept as indicated by the literature.

Discussion

The aim of this part of the review was to further develop a framework for understanding the benefits that reflect the concept of ROI from large-scale healthcare QI programmes (the QI-ROI). We achieved this by reviewing different QI literatures on the goals and/or benefits of QI programmes. The goals embody aspirations, or QI-ROI as imagined, whilst the reported outcomes and benefits represent QI-ROI as experienced. Together, these form a concept of QI-ROI. We considered negative outcomes to be part of this conceptualisation, as they may highlight perceptions of the absence of QI-ROI. We grounded our theoretical assumptions on organisational needs, duties, and obligations as defined by organisations themselves as well as by various internal and external stakeholders.

Our assumption was that, at a minimum, a QI programme that delivers on any organisational needs and obligations delivers a return-on-investment for healthcare organisations. The reviewed literature revealed numerous QI goals and outcomes. These included aspects of an organisation's performance and development, as well as external and unintended QI outcomes. Through the Complexity Theory lens, we noted the different connections between these outcomes. This deepened our understanding of QI-ROI as a collection of interlinked QI benefits that occur incrementally throughout a programme's lifecycle. These benefits include systems, processual, and structural improvements. Central to these are sustainable, improved patient outcomes.

Although QI effectiveness was not the focus of this review, it is related to QI-ROI. In fact, some view ROI as an overall measure of QI effectiveness [ 22 ]. Since the induction of QI into healthcare, a sizeable body of literature has questioned QI's value and effectiveness [ 136 , 137 , 138 , 139 , 140 , 141 , 142 ]. Several factors have been found to determine QI's success, including aspects of organisations' structures, systems, behaviours, cultures, and leadership [ 143 , 144 ]. The collection of benefits referred to in this review as QI-ROI largely contributes towards these QI effectiveness determinants [ 145 , 146 , 147 ]. Thus, improvement in these aspects must be of value to organisations. Further, achieving QI's pre-defined goals (effectiveness) is not the end, but part of the journey towards QI-ROI. This is important to note, as depending on the QI resources required, costs may increase, rendering QI value inversely related to its cost [ 21 , 148 , 149 ].

The insights into the building blocks of good-quality healthcare are not new, and interdisciplinary health services research attests to this [ 150 , 151 , 152 , 153 ]. Wider health and social science as well as organisational literature have repeatedly pointed to the importance of improving staff capacities, capabilities, and experience [ 154 ]. A systematic review by Hall et al. [ 155 ] found that poor staff wellbeing and burnout are frequently associated with poor patient outcomes. Latino [ 156 ] argued that the intellectual capital of human beings is one of the greatest benefits not captured through financial outcomes. Implementation and Improvement Sciences have highlighted the importance of understanding contexts, interventions, and human behaviour, and their influence on QI programme success and sustainability [ 39 , 40 ].

Effective leadership was a consistent patient safety prerequisite in the Francis Mid-Staffordshire review [ 157 ]. The Francis review also highlighted negative cultures and failure to learn as contributing factors to poor-quality care. Negative QI outcomes and failed attempts should be avoided, but they are also part of learning safety cultures [ 158 ]. Patient engagement has also been found to be crucial in learning and safety cultures [ 159 ]. A safety culture, one that prioritises safe care, is thus deemed foundational to efforts to improve quality and safety [ 158 , 160 , 161 , 162 , 163 , 164 ].

There are, of course, other ways to improve healthcare, and organisations do invest in various programmes that specifically target some of the themes within our QI-ROI conceptual framework, for example leadership programmes [ 165 ]. Determining whether QI or other types of investments and programmes led to any specific improvement is known to be challenging [ 166 , 167 ]. As a result, claims of causality are not possible. Through Complexity Theory, QI-ROI can be viewed in terms of contribution or correlation to organisational outcomes rather than direct attribution [ 11 , 37 , 166 ]. Understanding of QI's contribution to organisational outcomes may be achieved through methods such as contribution analysis and the action-effect method [ 166 , 167 ]. These methods can help detect the type and level of QI contribution.

QI's key contributions to healthcare improvement are evident in the reviewed literature, and external bodies such as the UK Care Quality Commission (CQC) attest to this. In 2018, 80% of Trusts rated “Outstanding” by the CQC had organisational improvement programmes [ 101 ]. As a result, QI was identified in the UK National Health Service (NHS) Long Term Plan as an approach for improving every aspect of how the NHS operates [ 168 ]. Further, organisations that have mature improvement cultures claim to have benefited in several of the QI-ROI conceptual framework's dimensions [ 169 , 170 , 171 ]. Mature organisations indicate that, in addition to organisational development and performance, environmental and social impacts [ 172 ], reputation [ 173 ], and resilience [ 174 ] are crucial organisational outcomes. QI programmes are now also used to engage with modern healthcare agendas such as value-based healthcare and environmental sustainability [ 175 , 176 ]. In achieving such goals, QI programmes can be cost-effective without saving actual costs [ 177 ].

However, QI-ROI is not a one-time event. ROI may be created or lost at different stages of a programme [ 25 ]. In a complex healthcare programme, QI-ROI is iterative and dynamic, with many determinants, some of them outside the control of QI implementers alone [ 13 , 39 , 167 ]. Additionally, QI may affect various levels of stakeholders, from the frontline to societies to policymakers, differently [ 13 , 39 , 167 ]. These levels interact with and influence each other [ 11 , 39 ]. As such, it is important to note the co-dependencies of QI outcomes when planning and evaluating QI. As Donabedian [ 178 ] stated, structures, processes, and outcomes are mutually dependent. This means that it is important to take small wins along with big wins by observing the QI-ROI chain [ 179 ]. Therefore, the traditional ROI approach is not only unreliable as a forecasting tool; as an evaluation tool, it is also a distal and incomplete marker of QI value.

Finally, large-scale programmes took many forms, some internal and some involving external collaborators. Collaborations have been recommended as a way to improve patient safety and experience and save costs [ 180 , 181 ]. However, unless formally integrated, organisations run internal budgets, have their performance assessed individually, and maintain their own governance structures [ 14 , 182 , 183 , 184 ]. Notably, collaboratives appear to be geared towards health-system-wide benefits and only indirectly address organisational-level impacts [ 138 ]. Therefore, collaboratives may bring unique challenges as well as benefits. This may mean that different organisations at different developmental levels deduce different outcomes from the same QI programmes [ 102 , 146 ]. Research developments here will be valuable for improving understanding of QI-ROI, for example how and why collaboratives work (or not) [ 51 , 185 ]. Nonetheless, this review reveals largely shared QI goals and outcomes regardless of the type of large-scale programme.

Strengths and limitations

A strength of our review is that our theoretical assumptions were grounded in organisational needs, duties, and obligations as defined by organisations and external stakeholders. This step preceded the first study, in which we analysed different return-on-investment concepts in healthcare QI. The current study sought to strengthen the first study's QI-ROI conceptual framework by connecting the QI-ROI concept with categories of QI benefits as seen by healthcare QI stakeholders. Additionally, reviewing through a Complexity Theory lens gave us a glimpse of the processes through which these QI-ROI building blocks, independently or in concert, may influence ROI. As such, our framework provides clues to its practical application.

A limitation of this review is that it was broad, encompassing various disciplines in various countries and reporting on different types of programmes. The review was meant to be an exploration of the QI field's view of QI returns-on-investment. Researchers may wish to explore these in specific contexts, for example by studying particular “building blocks” of QI-ROI in a specific context or programme. Additionally, some of the literature is quite dated; however, newer literature does suggest the continuation of some trends and issues in QI-ROI and QI business case matters. Lastly, subjectivity in the synthesis and analysis cannot be ruled out, as it is inherent in qualitative analyses [ 63 ].

Implications for research and practice

Economic evaluations of large-scale programmes are a new phenomenon, and research is needed to help identify the most suitable evaluation methods. This need is compounded by the fact that large-scale QI programmes come in many forms. It is important to assess QI's contribution to organisational performance and development through suitable and innovative research methods, such as realist reviews, rather than to seek a definitive causal link, which may be imperceptible in complex, large QI programmes. A study of collaboratives alone, or in comparison to internal organisation-wide QI programmes, may help identify the best ways to approach large-scale QI programmes to maximise ROI. In addition, a thorough study of the relationships between the QI-ROI determinants as well as QI benefits may help explain why and how QI benefits influence one another. Lastly, guidance on how to weigh different QI benefits, and how to develop standardisable yet flexible QI-ROI tools, will be crucial for future research and practical application.

ROI in healthcare is a highly debated topic. This review is but one contribution to this ongoing debate. Our review suggests that in healthcare, ROI must reflect value-based healthcare principles, with value defined as patient and organisational benefits. We hope that by defining the ROI concept in this manner, links between wider large-scale QI benefits and organisational strategic intents will be highlighted. In doing this, leaders may be able to frame QI value, benefits and thus ROI in a useful way. This broader view is crucial if healthcare organisations and health systems are to continue investing in essential healthcare quality improvements. ROI is not a one-time event and may be created or lost at different stages of a programme. Further, many factors determine whether it can be deduced, many of them outside the control of QI implementers. Such factors must be taken into consideration in valuing healthcare QI.

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request. Some data are included in this published article and its supplementary information files.

Abbreviations

  • QI: Quality Improvement
  • ROI: Return on Investment
  • SROI: Social Return on Investment
  • QI-ROI: Return on Investment from healthcare quality improvement
  • CEA: Cost Effectiveness Analysis
  • CUA: Cost Utility Analysis
  • CBA: Cost Benefit Analysis

References

Alderwick H, Charles A, Jones B, Warburton W. Making the case for quality improvement: lessons for NHS boards and leaders. London: King's Fund; 2017.

Hadad S, Hadad Y, Simon-Tuval T. Determinants of healthcare system’s efficiency in OECD countries. Eur J Health Econ. 2013;14(2):253–65.


Knapp M, Wong G. Economics and mental health: the current scenario. World Psychiatry. 2020;19(1):3–14.


Pollack J, Helm J, Adler D. What is the Iron Triangle, and how has it changed? 2018.


Batalden PB, Davidoff F. What is “quality improvement” and how can it transform healthcare? Qual Saf Health Care. 2007;16(1):2–3.

Øvretveit J, Gustafson D. Evaluation of quality improvement programmes. Qual Saf Health Care. 2002;11(3):270–5.

Boaden R. Quality improvement: theory and practice. Br J Healthc Manag. 2009;15(1):12–6.


Nilsen P. Making sense of implementation theories, models and frameworks. Implement Sci. 2015;10(1):53.

Ovretveit J, Gustafson D. Using research to inform quality programmes. BMJ. 2003;326(7392):759–61.

Benn J, Burnett S, Parand A, Pinto A, Iskander S, et al. Studying large-scale programmes to improve patient safety in whole care systems: challenges for research. Soc Sci Med. 2009;69(12):1767–76.

Braithwaite J, Churruca K, Long JC, Ellis LA, Herkes J. When complexity science meets implementation science: a theoretical and empirical analysis of systems change. BMC Med. 2018;16:1–4.

Roberts SLE, Healey A, Sevdalis N. Use of health economic evaluation in the implementation and improvement science fields-a systematic literature review. Implementation Sci. 2019;14(1):72.

Saldana L, Chamberlain P, Bradford WD, Campbell M, Landsverk J. The Cost of Implementing New Strategies (COINS): a method for mapping implementation resources using the stages of implementation completion. Child Youth Serv Rev. 2014;39:177–82.

Brinkerhoff DW. Accountability and health systems: toward conceptual clarity and policy relevance. Health Policy Plan. 2004;19(6):371–9.

Chua KC, Henderson C, Grey B, Holland M, Sevdalis N. Evaluating quality improvement at scale: routine reporting for executive board governance in a UK National Health Service organisation. medRxiv. 2021: p. 2020.02.13.20022475.

Pokhrel S. Return on investment (ROI) modelling in public health: strengths and limitations. Eur J Pub Health. 2015;25(6):908–9.

World Health Organisation (WHO). Making the investment case for mental health: a WHO/UNDP methodological guidance note. Geneva: World Health Organization; 2019.


Botchkarev A. Estimating the Accuracy of the Return on Investment (ROI) Performance Evaluations. 2015. arXiv:1404.1990.

Botchkarev A, Andru P. A Return on investment as a metric for evaluating information systems: taxonomy and application. Interdiscip J Inf Knowl Manag. 2011;6:245–69.

Solid CA. Return on investment for healthcare quality improvement. 2020: Springer.

Rauh SS, Wadsworth EB, Weeks WB, Weinstein JN. The savings illusion - Why clinical quality improvement fails to deliver bottom-line results. N Engl J Med. 2011;365(26):e48.

De Meuse KP, Dai G, Lee RJ. Evaluating the effectiveness of executive coaching: beyond ROI? Coaching Int J Theory Res Pract. 2009;2(2):117–34.

Masters R, Anwar E, Collins B, Cookson R, Capewell S. Return on investment of public health interventions: a systematic review. J Epidemiol Community Health. 2017;71(8):827.

Bukhari H, Andreatta P, Goldiez B, Rabelo L. A framework for determining the return on investment of simulation-based training in health care. Inquiry. 2017;54:0046958016687176.


Phillips PP, Phillips JJ, Edwards LA. Measuring the success of coaching: a step-by-step guide for measuring impact and calculating ROI: American Society for Training and Development. 2012.

Andru P, Botchkarev A. Return on investment: a placebo for the Chief Financial Officer… and other paradoxes. J MultiDiscip Eval. 2011;7(16):201–6.

Boyd J, Epanchin-Niell R, Siikamäki J. Conservation planning: a review of return on investment analysis. Rev Environ Econ Policy. 2015;9(1):23–42.

Brousselle A, Benmarhnia T, Benhadj L. What are the benefits and risks of using return on investment to defend public health programs? Prev Med Rep. 2016;3:135–8.

Dearden J. Case against ROI control. Harvard Business Review. 1969.

Ozminkowski RJ, Serxner S, Marlo K, Kichlu R, Ratelis E, Van de Meulebroecke J. Beyond ROI: using value of investment to measure employee health and wellness. Popul Health Manag. 2016;19(4):227–9.

Price CP, McGinley P, John AS. What is the return on investment for laboratory medicine? The antidote to silo budgeting in diagnostics. Br J Healthc Manag. 2020;26(6):1–8.

Lurie N, Somers SA, Fremont A, Angeles J, Murphy EK, Hamblin A. Challenges to using a business case for addressing health disparities. Health Aff. 2008;27(2):334–8.

Fischer HR, Duncan SD. The business case for quality improvement. J Perinatol. 2020;40(6):972–9.

Rust RT, Zahorik AJ, Keiningham TL. Return on quality (ROQ): Making service quality financially accountable. J Mark. 1995;59(2):58–70.

Gelman SA, Kalish CW. Conceptual development. Handbook of child psychology. 2007. p. 2.

Hupcey J, Penrod J. Concept analysis: examining the state of the science. Res Theory Nurs Pract. 2005;19:197–208.

Fenwick T. Response to Jeffrey McClellan. Complexity theory, leadership, and the traps of utopia. Complicity: Int J Complex Educ. 2010;7(2):90-96.

Manson SM. Simplifying complexity: a review of complexity theory. Geoforum. 2001;32(3):405–14.

Pfadenhauer LM, Gerhardus A, Mozygemba K, Lysdahl KB, Booth A, Hofmann B, et al. Making sense of complexity in context and implementation: the Context and Implementation of Complex Interventions (CICI) framework. Implement Sci. 2017;12(1):21.

Damschroder LJ, Reardon CM, Lowery JC. The consolidated framework for implementation research (CFIR). Handbook on implementation science: Edward Elgar Publishing; 2020.

Whittemore R, Knafl K. The integrative review: updated methodology. J Adv Nurs. 2005;52(5):546–53.

Jabareen Y. Building a conceptual framework: philosophy, definitions, and procedure. Int J Qual Methods. 2009;8(4):49–62.

Berdot S, Korb-Savoldelli V, Jaccoulet E, Zaugg V, Prognon P, Lê LMM, et al. A centralized automated-dispensing system in a French teaching hospital: return on investment and quality improvement. Int J Qual Health Care. 2019;31(3):219–24.

Drummond MF, Sculpher MJ, Claxton K, Stoddart GL, Torrance GW. Methods for the Economic Evaluation of Health Care Programmes. Oxford, United Kingdom: Oxford University Press; 2015.

Mason J, Freemantle N, Nazareth I, Eccles M, Haines A, Drummond M. When is it cost-effective to change the behavior of health professionals? JAMA. 2001;286(23):2988–92.


Viner J. The utility concept in value theory and its critics. J Polit Econ. 1925;33(6):638–59.

Chartier LB, Cheng AH, Stang AS, Vaillancourt S. Quality improvement primer part 1: preparing for a quality improvement project in the emergency department. Can J Emerg Med. 2018;20(1):104–11.

Jones B, Vaux E, Olsson-Brown A. How to get started in quality improvement. BMJ. 2019;364:k5408.

Healthcare Quality Improvement Partnership (HQIP). A guide to Quality Improvement methods. Healthcare Quality Improvement Partnership; 2015.

Øvretveit J, Klazinga N. Learning from large-scale quality improvement through comparisons. Int J Qual Health Care. 2012;24(5):463–9.

Schouten LM, Grol RP, Hulscher ME. Factors influencing success in quality-improvement collaboratives: development and psychometric testing of an instrument. Implement Sci. 2010;5(1):1–9.

Guidance: The health and care system explained. UK Department of Health and Social Care 2013.

Gartner JB, Lemaire C. Dimensions of performance and related key performance indicators addressed in healthcare organisations: A literature review. Int J Health Plann Manage. 2022: 1–12.

Kruk ME, Gage AD, Arsenault C, Jordan K, Leslie HH, Roder-DeWan S, Adeyi O, Barker P, Daelmans B, Doubova SV, English M. High-quality health systems in the Sustainable Development Goals era: time for a revolution. Lancet Glob Health. 2018;6(11):e1196–252.

Elg M, Broryd KP, Kollberg B. Performance measurement to drive improvements in healthcare practice. Int J Oper Prod Manag. 2013;79(3):13-24.

The EndNote Team, Clarivate. EndnoteTM. [EndNote X9]. 2013.

Ouzzani M, Hammady H, Fedorowicz Z, Elmagarmid A. Rayyan—a web and mobile app for systematic reviews. Syst Rev. 2016;5(1):1–10.

Critical Appraisal Skills Programme (CASP) 2019. Available from: https://caspuk.net/referencing/#:~:text=Referencing%20%E2%80%93%20We%20would%20recommend%20using,at%3A%20Accessed%3A%20Date%20Accessed . [Cited 2021 24/11].

Hong QN, Pluye P, Fàbregues S, Bartlett G, Boardman F, Cargo M, et al. Mixed methods appraisal tool (MMAT), version 2018. Registration of copyright. 2018;1148552:10.

Pinnock H, Barwick M, Carpenter CR, Eldridge S, Grandes G, Griffiths CJ, et al. Standards for Reporting Implementation Studies (StaRI) Statement. BMJ. 2017;356:i6795.

Husereau D, Drummond M, Petrou S, Carswell C, Moher D, Greenberg D, et al. Consolidated Health Economic Evaluation Reporting Standards (CHEERS) statement. BMJ. 2013;346:f1049.

Ogrinc G, Mooney SE, Estrada C, Foster T, Goldmann D, Hall LW, et al. The SQUIRE (Standards for QUality Improvement Reporting Excellence) guidelines for quality improvement reporting: explanation and elaboration. Qual Saf Health Care. 2008;17 Suppl 1(Suppl_1):i13-32.

Parkinson S, Eatough V, Holmes J, Stapley E, Midgley N. Framework analysis: a worked example of a study exploring young people’s experiences of depression. Qual Res Psychol. 2016;13(2):109–29.

Braun V, Clarke V. Using thematic analysis in psychology. Qual Res Psychol. 2006;3(2):77–101.

Fereday J, Muir-Cochrane E. demonstrating rigor using thematic analysis: a hybrid approach of inductive and deductive coding and theme development. Int J Qual Methods. 2006;5(1):80–92.

Sheiner L, Malinovskaya A. Measuring productivity in healthcare: an analysis of the literature. Hutchins center on fiscal and monetary policy at Brookings. 2016.

Hoffman JM, Koesterer LJ, Swendrzynski RG. ASHP guidelines on medication cost management strategies for hospitals and health systems. Am J Health Syst Pharm. 2008;65(14):1368–84.

Banke-Thomas AO, Madaj B, Charles A, van den Broek N. Social Return on Investment (SROI) methodology to account for value for money of public health interventions: a systematic review. BMC Public Health. 2015;15(1):582.

Crawley-Stout LA, Ward KA, See CH, Randolph G. Lessons learned from measuring return on investment in public health quality improvement initiatives. J Public Health Manag Pract. 2016;22(2):E28–37.

Moody M, Littlepage L, Paydar N. Measuring social return on investment. Nonprofit Manag Leadersh. 2015;26(1):19–37.

Mery G, Dobrow MJ, Baker GR, Im J, Brown A. Evaluating investment in quality improvement capacity building: a systematic review. BMJ Open. 2017;7(2):e012431.1.

Bailit M, Dyer MB. Beyond bankable dollars: establishing a business case for improving health care. Issue Brief (Commonw Fund). 2004;754:1–12.

Leatherman S, Berwick D, Iles D, Lewin LS, Davidoff F, Nolan T, et al. The business case for quality: case studies and an analysis. Health Aff (Project Hope). 2003;22(2):17.

Perencevich EN, Stone PW, Wright SB, Carmeli Y, Fisman DN, Cosgrove SE, et al. Raising standards while watching the bottom line: making a business case for infection control. Infect Control Hosp Epidemiol. 2007;28(10):1121–33.

Swensen SJ, Dilling JA, Mc Carty PM, Bolton JW, Harper CM Jr. The business case for health-care quality improvement. J Patient Saf. 2013;9(1):44–52.

Rogers PJ, Stevens K, Boymal J. Qualitative cost–benefit evaluation of complex, emergent programs. Eval Program Plann. 2009;32(1):83–90.

de la Perrelle L, Radisic G, Cations M, Kaambwa B, Barbery G, Laver K. Costs and economic evaluations of Quality Improvement Collaboratives in healthcare: a systematic review. BMC Health Serv Res. 2020;20(1):155.

Fortney J, Enderle M, McDougall S, Clothier J, Otero J, Altman L, et al. Implementation outcomes of evidence-based quality improvement for depression in VA community based outpatient clinics. Implement Sci. 2012;7(1):30.

Thursky K, Lingaratnam S, Jayarajan J, Haeusler GM, Teh B, Tew M, et al. Implementation of a whole of hospital sepsis clinical pathway in a cancer hospital: impact on sepsis management, outcomes and costs. BMJ Open Qual. 2018;7(3):e000355-e.

McGrath BA, Lynch J, Bonvento B, Wallace S, Poole V, Farrell A, et al. Evaluating the quality improvement impact of the Global Tracheostomy Collaborative in four diverse NHS hospitals. BMJ Qual Improv Rep. 2017;6(1):bmjqir.u220636.w7996.

Bielaszka-DuVernay C. Innovation profile: redesigning acute care processes in Wisconsin. Health Aff. 2011;30(3):422–5.

Comtois J, Paris Y, Poder TG, Chausse S. The organizational benefits of the Kaizen approach at the Centre Hospitalier Universitaire de Sherbrooke (CHUS). L'approche Kaizen au Centre Hospitalier Universitaire de Sherbrooke (CHUS) : un avantage organisationnel significatif. 2013;25(2):169–77.

Hatcher IB. Reducing sharps injuries among health care workers: a sharps container quality improvement project. Jt Comm J Qual Improv. 2002;28(7):410–4.

PubMed   Google Scholar  

Heitmiller ES, Hill RB, Marshall CE, Parsons BJ, Berkow LC, Barrasso CA, et al. Blood wastage reduction using Lean Sigma methodology. Transfusion. 2010;50(9):1887–96.

Niemeijer GC, Trip A, de Jong LJ, Wendt KW, Does RJ. Impact of 5 years of lean six sigma in a University Medical Center. Qual Manag Health Care. 2012;21(4):262–8.

Strauss R, Cressman A, Cheung M, Weinerman A, Waldman S, Etchells E, et al. Major reductions in unnecessary aspartate aminotransferase and blood urea nitrogen tests with a quality improvement initiative. BMJ Qual Saf. 2019;28(10):809–16.

van den Heuvel J, Does RJMM, Bogers AJJC, Berg M. Implementing six sigma in the Netherlands. Jt Comm J Qual Patient Saf. 2006;32(7):393–9.

Yamamoto J, Abraham D, Malatestinic B. Improving insulin distribution and administration safety using lean six sigma methodologies. Hosp Pharm. 2010;45(3):212–24.

Article   CAS   Google Scholar  

Gandjour A, Lauterbach KW. Cost-effectiveness of quality improvement programs in health care. Med Klin. 2002;97(8):499–502.

Bridges JFP. Lean systems approaches to health technology assessment: a patient-focused alternative to cost-effectiveness analysis. Pharmacoeconomics. 2006;24:101–9.

Chow-Chua C, Goh M. Framework for evaluating performance and quality improvement in hospitals. Manag Serv Qual: Int J. 2002;12(1):54–66.

Ciarniene R, Vienazindiene M, Vojtovic S. Process improvement for value creation: a case of health care organization. Inzinerine Ekonomika-Engineering Economics. 2017;28(1):79–88.

O'Sullivan Owen P, Chang Nynn H, Baker P, Shah A. Quality improvement at East London NHS Foundation Trust: the pathway to embedding lasting change. Governance IJoH, editor: International Journal of Health Governance; 2020.

Shah A, Course S. Building the business case for quality improvement: a framework for evaluating return on investment. Future Healthc J. 2018;5(2):132–7.

Wood J, Brown B, Bartley A, Margarida Batista Custodio Cavaco A, Roberts AP, Santon K, et al. Reducing pressure ulcers across multiple care settings using a collaborative approach. BMJ Open Qual. 2019;8(3):e000409.

MacVane PF. Chasing the golden fleece: Increasing healthcare quality, efficiency and patient satisfaction while reducing costs. Int J Health Gov. 2019;24(3):182–6.

McLees AW, Nawaz S, Thomas C, Young A. Defining and assessing quality improvement outcomes: a framework for public health. Am J Public Health. 2015;105:S167–73.

Neri RA, Mason CE, Demko LA. Application of Six Sigma/CAP methodology: controlling blood-product utilization and costs. J Healthcare Manag/ American College of Healthcare Executives. 2008;53(3):183–6.

Lavoie-Tremblay M, O’Connor P, Biron A, Lavigne GL, Frechette J, Briand A, et al. The effects of the transforming care at the bedside program on perceived team effectiveness and patient outcomes. Health Care Manag. 2017;36(1):10–20.

Williams B, Hibberd C, Baldie D, Duncan EAS, Elders A, Maxwell M, et al. Evaluation of the impact of an augmented model of The Productive Ward: Releasing Time to Care on staff and patient outcomes: a naturalistic stepped-wedge trial. BMJ Qual Saf. 2020;30:27–37.

Care Qualty Commission (CQC). Quality improvement in hospital trusts: sharing learning from trusts on a journey of QI, C.Q. Commission, Editor. 2018.

Bevan H, Plsek P, Winstanley L. Part 1: Leading large scale change: a practical guide What the NHS Academy for Large Scale Change learnt and how you can apply these principles within your own health and healthcare setting. In: Improvement NIfIa, editor. NHS Academy for Large Scale Change. 2011.

The Healthcare Foundation. Safer Patients Initiative: Lessons from the first major improvement programme addressing patient safety in the UK, T.H. Foundation, Editor. 2011.

Hunter DJ, Erskine J, Hicks C, McGovern T, Small A, Lugsden E, et al. Health Services and Delivery Research, in A mixed-methods evaluation of transformational change in NHS North East. 2014. NIHR Journals Library.

NHS Insitute. The Productive Ward: Releasing time to careTM Learning and Impact Review Final report, NHS Institute, Editor. 2011.

Jones B, Horton T, Warburton W. The improvement Journey. The Health Foundation. 2019.

Furukawa PdO, Cunha ICKO, Pedreira MdLG. Avaliação de ações ecologicamente sustentáveis no processo de medicação. Revista Brasileira de Enfermagem. 2016;69(1):23–9.

Staines A, Thor J, Robert G. Sustaining improvement? The 20-year Jonkoping quality improvement program revisited. Qual Manag Health Care. 2015;24(1):21–37.

Crema M, Verbano C. Lean Management to support Choosing Wisely in healthcare: the first evidence from a systematic literature review. Int J Qual Health Care. 2017;29(7):889-95.5.

DelliFraine JL, Langabeer JR 2nd, Nembhard IM. Assessing the evidence of Six Sigma and Lean in the health care industry. Qual Manag Health Care. 2010;19(3):211–25.

Moraros J, Lemstra M, Nwankwo C. Lean interventions in healthcare: do they actually work? A systematic literature review. Int J Qual Health Care. 2016;28(2):150–65.

Power M, Brewster L, Parry G, Brotherton A, Minion J, Ozieranski P, et al. Multimethod study of a large-scale programme to improve patient safety using a harm-free care approach. BMJ Open. 2016;6(9):e011886.

Sibthorpe B, Gardner K, Chan M, Dowden M, Sargent G, McAullay D. Impacts of continuous quality improvement in Aboriginal and Torres Strait islander primary health care in Australia: a scoping systematic review. J Health Organ Manag. 2018;32(4):545–71.

Stephens TJ, Peden CJ, Pearse RM, Shaw SE, Abbott TEF, Jones EL, et al. Improving care at scale: process evaluation of a multi-component quality improvement intervention to reduce mortality after emergency abdominal surgery (EPOCH trial). Implement Sci. 2018;13(1):142.

Article   CAS   PubMed   PubMed Central   Google Scholar  

Wells S, Tamir O, Gray J, Naidoo D, Bekhit M, Goldmann D. Are quality improvement collaboratives effective? A systematic review. BMJ Qual Saf. 2018;27(3):226–40.

Collins B, Fenney D. Improving patient safety through collaboration a rapid review of the academic health science networks’ patient safety collaboratives. In: Fund Ks, editor. 2019.

Goodridge D, Rana M, Harrison EL, Rotter T, Dobson R, Groot G, et al. Assessing the implementation processes of a large-scale, multi-year quality improvement initiative: survey of health care providers. BMC Health Serv Res. 2018;18:237.

White M, Wells JS, Butterworth T. The transition of a large-scale quality improvement initiative: a bibliometric analysis of the Productive Ward: Releasing Time to Care Programme. J Clin Nurs. 2014;23(17–18):2414–23.

Worrall A, Ramsay A, Gordon K, Maltby S, Beecham J, King S, et al. Evaluation of the Mental Health Improvement Partnerships programme. National Co-ordinating Centre for NHS Service Delivery and Organisation R&D (NCCSDO). 2008.

de Miranda Costa MM, Santana HT, Saturno Hernandez PJ, Carvalho AA, da Silva Gama ZA. Results of a national system-wide quality improvement initiative for the implementation of evidence-based infection prevention practices in Brazilian hospitals. J Hosp Infect. 2020;105(1):24–34.

Morrow E, Robert G, Maben J, Griffiths P. Implementing large-scale quality improvement: lessons from The Productive Ward: Releasing Time to Care. Int J Health Care Qual Assur. 2012;25(4):237–53.

Brink AJ, Messina AP, Feldman C, Richards GA, van den Bergh D, Netcare AS. From guidelines to practice: a pharmacist-driven prospective audit and feedback improvement model for peri-operative antibiotic prophylaxis in 34 South African hospitals. J Antimicrob Chemother. 2017;72(4):1227–34.

CAS   PubMed   Google Scholar  

Morganti KG, Lovejoy S, Haviland AM, Haas AC, Farley DO. Measuring success for health care quality improvement interventions. Med Care. 2012;50(12):1086–92.

Benning A, Ghaleb M, Suokas A, Dixon-Woods M, Dawson J, Barber N, et al. Large scale organisational intervention to improve patient safety in four UK hospitals: mixed method evaluation. BMJ. 2011;342(7793):369.

Honda AC, Bernardo VZ, Gerolamo MC, Davis MM. How lean six sigma principles improve hospital performance. Qual Manag J. 2018;25(2):70–82.

Pearson M, Hemsley A, Blackwell R, Pegg L, Custerson L. Improving Hospital at Home for frail older people: insights from a quality improvement project to achieve change across regional health and social care sectors. BMC Health Serv Res. 2017;17:387.

Masso M, Robert G, McCarthy G, Eagar K. The Clinical Services Redesign Program in New South Wales: perceptions of senior health managers. Aust Health Rev. 2010;34(3):352–9.

Robert G, Sarre S, Maben J, Griffiths P, Chable R. Exploring the sustainability of quality improvement interventions in healthcare organisations: a multiple methods study of the 10-year impact of the ‘Productive Ward: Releasing Time to Care’ programme in English acute hospitals. BMJ Qual Saf. 2020;29(1):31.

Beers LS, Godoy L, John T, Long M, Biel MG, Anthony B, et al. Mental Health Screening Quality Improvement Learning Collaborative in Pediatric Primary Care. Pediatrics. 2017;140(6):e20162966.

Bosse G, Abels W, Mtatifikolo F, Ngoli B, Neuner B, Wernecke K-D, et al. Perioperative care and the importance of continuous quality improvement-A controlled intervention study in Three Tanzanian Hospitals. Plos One. 2015;10(9):e0136156.

Article   PubMed   PubMed Central   CAS   Google Scholar  

Botros S, Dunn J. Implementation and spread of a simple and effective way to improve the accuracy of medicines reconciliation on discharge: a hospital-based quality improvement project and success story. BMJ Open Qual. 2019;8(3):e000363-e.

Kanamori S, Sow S, Castro MC, Matsuno R, Tsuru A, Jimba M. Implementation of 5S management method for lean healthcare at a health center in Senegal: a qualitative study of staff perception. Glob Health Action. 2015;8:27256.

Roney JK, Whitley BE, Long JD. Implementation of a MEWS-Sepsis screening tool: transformational outcomes of a nurse-led evidence-based practice project. Nurs Forum. 2016;55(2):144–8.

Schouten LMT, Niessen LW, van de Pas JWAM, Grol RPTM, Hulscher MEJL. Cost-effectiveness of a quality improvement collaborative focusing on patients with diabetes. Med Care. 2010;48(10):884–91.

Sermersheim ER, Moon MC, Streelman M, McCullum-Smith D, Fromm J, Yohannan S, et al. Improving patient throughput with an electronic nursing handoff process in an academic medical center a rapid improvement event approach. J Nurs Adm. 2020;50(3):174–81.

Appleby J. The quest for quality in the NHS: still searching? BMJ. 2005;331(7508):63–4.

Baines R, Langelaan M, de Bruijne M, Spreeuwenberg P, Wagner C. How effective are patient safety initiatives? A retrospective patient record review study of changes to patient safety over time. BMJ Qual Saf. 2015;24(9):561–71.

Clay-Williams R, Nosrati H, Cunningham FC, Hillman K, Braithwaite J. Do large-scale hospital- and system-wide interventions improve patient outcomes: a systematic review. BMC Health Serv Res. 2014;14(1):369.

Dixon-Woods M, Martin GP. Does quality improvement improve quality? Future Hosp J. 2016;3(3):191–4.

Knudsen SV, Laursen HVB, Johnsen SP, Bartels PD, Ehlers LH, Mainz J. Can quality improvement improve the quality of care? A systematic review of reported effects and methodological rigor in plan-do-study-act projects. BMC Health Serv Res. 2019;19(1):683.

Nembhard IM, Alexander JA, Hoff TJ, Ramanujam R. Why does the quality of health care continue to lag? Insights from Management Research. Acad Manag Perspect. 2009;23(1):24–42.

Shortell SM, Bennett CL, Byck GR. Assessing the impact of continuous quality improvement on clinical practice: what it will take to accelerate progress. Milbank Q. 1998;76(4):593–624.

Dixon-Woods M. The problem of context in quality improvement. Perspectives on context London: Health Foundation; 2014. p. 87–101.

McDonald KM, Schultz EM, Chang C. Evaluating the state of quality-improvement science through evidence synthesis: insights from the closing the quality gap series. Perm J. 2013;17(4):52–61.

Farokhzadian J, Nayeri ND, Borhani F. The long way ahead to achieve an effective patient safety culture: challenges perceived by nurses. BMC Health Serv Res. 2018;18:654.

Goodwin VA, Hill JJ, Fullam JA, Finning K, Pentecost C, Richards DA. Intervention development and treatment success in UK health technology assessment funded trials of physical rehabilitation: a mixed methods analysis. BMJ Open. 2019;9(8):e026289.

Irwin R, Stokes T, Marshall T. Practice-level quality improvement interventions in primary care: a review of systematic reviews. Prim Health Care Res Dev. 2015;16(6):556–77.

Canovas JJG, Hernandez PJS, Botella JJA. Effectiveness of internal quality assurance programmes in improving clinical practice and reducing costs. J Eval Clin Pract. 2009;15(5):813–9.

Lighter DE. How (and why) do quality improvement professionals measure performance? Int J Pediatr Adolesc Med. 2015;2(1):7–11.

Braithwaite J. Changing how we think about healthcare improvement. BMJ. 2018;361:k2014.

Haw JS, Narayan KMV, Ali MK. Quality improvement in diabetes–successful in achieving better care with hopes for prevention. Ann N Y Acad Sci. 2015;1353:138–51.

Palmer RH, Louis TA, Peterson HF, Rothrock JK, Strain R, Wright EA. What makes quality assurance effective? Results from a randomized, controlled trial in 16 primary care group practices. Med Care. 1996;34(9):SS29–39.

Rich N, Piercy N. Losing patients: a systems view on healthcare improvement. Prod Plan Control. 2013;24(10–11):962–75.

Brand SL, Thompson Coon J, Fleming LE, Carroll L, Bethel A, Wyatt K. Whole-system approaches to improving the health and wellbeing of healthcare workers: a systematic review. PLoS One. 2017;12(12):e0188418.

Hall LH, Johnson J, Watt I, Tsipa A, O’Connor DB. Healthcare staff wellbeing, burnout, and patient safety: a systematic review. PLoS One. 2016;11(7):e0159015.

Latino RJ. How is the effectiveness of root cause analysis measured in healthcare? J Healthc Risk Manag. 2015;35(2):21–30.

Francis R. Report of the Mid Staffordshire NHS Foundation Trust public inquiry: executive summary: The Stationery Office. 2013.

Jabbal J, Lewis M. Approaches to better value in the NHS Improving quality and cost. King’s Fund. 2018.

Hara JK, Lawton RJ. At a crossroads? Key challenges and future opportunities for patient involvement in patient safety. BMJ Qual Saf. 2016;25(8):565.

Illingworth J. Continuous improvement of patient safety. The case for change in the NHS. The Health Foundation. 2015.

Joly BM, Booth M, Mittal P, Shaler G. Measuring quality improvement in Public Health: the development and psychometric testing of a QI Maturity Tool. Eval Health Prof. 2012;35(2):119–47.

Parast L, Doyle B, Damberg CL, Shetty K, Ganz DA, Wenger NS, et al. Challenges in assessing the process-outcome link in practice. J Gen Intern Med. 2015;30(3):359–64.

Peden CJ, Campbell M, Aggarwal G. Quality, safety, and outcomes in anaesthesia: what’s to be done? An international perspective. Br J Anaesth. 2017;119:I5–14.

Zwijnenberg NC, Hendriks M, Delnoij DMJ, de Veer AJE, Spreeuwenberg P, Wagner C. Understanding and using quality information for quality improvement: the effect of information presentation. Int J Qual Health Care. 2016;28(6):689–97.

Parmelli E, Flodgren G, Beyer F, Baillie N, Schaafsma ME, Eccles MP. The effectiveness of strategies to change organisational culture to improve healthcare performance: a systematic review. Implement Sci. 2011;6(1):33.

Mayne J. Contribution analysis: Addressing cause and effect. Evaluating the complex. 2011. p. 53–96.

Reed JE, McNicholas C, Woodcock T, Issen L, Bell D. Designing quality improvement initiatives: the action effect method, a structured approach to identifying and articulating programme theory. BMJ Qual Saf. 2014;23(12):1040.

National Health Service (NHS). NHS Mental Health Implementation Plan 2019/20 – 2023/24, NHS, Editor. 2019.

Middleton LP, Phipps R, Routbort M, Prieto V, Medeiros LJ, Riben M, et al. Fifteen-year journey to high reliability in pathology and laboratory medicine. Am J Med Qual. 2018;33(5):530–9.

Taylor N, Clay-Williams R, Hogden E, Braithwaite J, Groene O. High performing hospitals: a qualitative systematic review of associated factors and practical strategies for improvement. BMC Health Serv Res. 2015;15(1):244.

Woodhouse KD, Volz E, Maity A, Gabriel PE, Solberg TD, Bergendahl HW, et al. Journey toward high reliability: a comprehensive safety program to improve quality of care and safety culture in a large, Multisite Radiation Oncology Department. J Oncol Pract. 2016;12(5):480.

Zhu Q, Johnson S, Sarkis J. Lean six sigma and environmental sustainability: a hospital perspective. Supply Chain Forum: Int J. 2018;19(1):25–41.

Greenfield D, Iqbal U, Li Y-C. Healthcare improvements from the unit to system levels: contributions to improving the safety and quality evidence base. Int J Qual Health Care. 2017;29(3):313.

McNab D, Bowie P, Morrison J, Ross A. Understanding patient safety performance and educational needs using the “Safety-II” approach for complex systems. Educ Prim Care. 2016;27(6):443–50.

D’Andreamatteo A, Ianni L, Lega F, Sargiacomo M. Lean in healthcare: a comprehensive review. Health Policy. 2015;119(9):1197–209.

Teisberg E, Wallace S, O’Hara S. Defining and implementing value-based health care: a strategic framework. Acad Med. 2020;95(5):682–5.

Wu AW, Johansen KS. Lessons from Europe on quality improvement: report on the Velen Castle WHO meeting. Jt Comm J Qual Improv. 1999;25(6):316–29.

Donabedian A. The quality of care. how can it be assessed? JAMA. 1988;260(12):1743-8.3-8.

Kotter JP. Leading change: why transformation efforts fail. 1995.

Firth-Cozens J. Cultures for improving patient safety through learning: the role of teamwork. BMJ Qual Saf. 2001;10(suppl 2):ii26–31.

Piper D, Lea J, Woods C, Parker V. The impact of patient safety culture on handover in rural health facilities. BMC Health Serv Res. 2018;18(1):1–13.

Auschra C. Barriers to the integration of care in inter-organisational settings: a literature review. Int J Integr Care. 2018;18(1):5.

Lan Y, Chandrasekaran A, Goradia D, Walker D. Collaboration structures in integrated healthcare delivery systems: an exploratory study of accountable care organizations. Manufacturing & Service Operations Management. 2022.

Niemsakul J, Islam SM, Singkarin D, Somboonwiwat T. Cost-benefit sharing in healthcare supply chain collaboration. Int J Logist Syst Manag. 2018;30(3):406–20.

Aunger JA, Millar R, Greenhalgh J, Mannion R, Rafferty A-M, McLeod H. Why do some inter-organisational collaborations in healthcare work when others do not? A realist review. Syst Rev. 2021;10(1):82.

Download references

Acknowledgements

The authors would like to thank Dr Kia-Chong Chua, King's College London, UK, for his very insightful contribution to the process and analysis of the review.

This work is supported by the Economic and Social Research Council, grant number ES/P000703/1. 

Author information

Authors and Affiliations

King’s College London, London, UK

S’thembile Thusini, Maria Milenova, Tayana Soukup & Claire Henderson

South London and Maudsley NHS Foundation Trust, London, UK

Noushig Nahabedian & Barbara Grey


Contributions

Two reviewers, ST and MM, worked independently under the guidance of senior co-author CH. MM reviewed 5% of articles from search to synthesis, and ST reviewed 100% at all stages. Agreement in the co-review stages was over 90%, and any disagreements were discussed with NN, BG, TS, and CH. ST completed the synthesis and analysis of the review, wrote the manuscript, and compiled all the tables and figures. All authors advised on, reviewed, and approved the development of this manuscript, its tables, and figures.

Authors’ information

S'thembile Thusini is a PhD student in the Health Service and Population Research Department, Institute of Psychiatry, Psychology and Neuroscience, King's College London.

Corresponding author

Correspondence to S'thembile Thusini.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

TS received funding from Cancer Alliance and Health Education England for training cancer multidisciplinary teams (MDTs) in assessment and quality improvement methods in the United Kingdom. TS received consultancy fees from Roche Diagnostics. The other authors declare that they have no competing interests. TS's research is supported by the Wellcome Trust (219425/Z/19/Z) and Diabetes UK (19/0006055).

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Supplementary Table 1. Example search strategy. Supplementary Table 2. Data extraction tool. Supplementary Table 3. Included studies. Supplementary Table 4. Summary of quality assessment. Links: current study PRISMA checklist; search strategies; data extraction tool; excluded studies.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Thusini, S., Milenova, M., Nahabedian, N. et al. Identifying and understanding benefits associated with return-on-investment from large-scale healthcare Quality Improvement programmes: an integrative systematic literature review. BMC Health Serv Res 22, 1083 (2022). https://doi.org/10.1186/s12913-022-08171-3


Received: 31 January 2022

Accepted: 08 June 2022

Published: 24 August 2022

DOI: https://doi.org/10.1186/s12913-022-08171-3


Keywords

  • QI programmes
  • Costs and benefits
  • QI business case


Autistic Camouflaging and its Relationship with Mental Health: Systematic Review

Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.


Saul Mcleod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.

Camouflaging (also known as masking) refers to strategies used by autistic individuals to mask or hide their autistic characteristics in social situations to fit in or avoid negative reactions from others. Examples include forced eye contact, mimicking others’ social behaviors, and suppressing repetitive movements. While camouflaging may help autistic people navigate social situations, it often comes at a cost to their mental health and sense of authenticity. Camouflaging is thought to be more common in autistic females and may contribute to missed or late diagnosis. Understanding camouflaging is important for improving recognition and support for autistic individuals’ needs.


This mixed methods systematic review synthesized qualitative and quantitative research on psychosocial factors associated with camouflaging and its relationship with mental well-being in autistic and non-autistic people.

Seven main themes were identified relating to psychosocial correlates and consequences of camouflaging:

  • Social norms and pressures drive camouflaging
  • Camouflaging is used to gain social acceptance and avoid rejection
  • Self-esteem and identity influence camouflaging
  • Camouflaging has some practical benefits for functioning
  • Camouflaging leads to difficulties being overlooked
  • Camouflaging negatively impacts relationships
  • Camouflaging causes identity confusion and low self-esteem

More research with diverse participants is needed to better understand psychosocial influences on camouflaging. The findings call for a whole society approach to increase acceptance of autistic people.

Camouflaging involves autistic individuals hiding their autistic characteristics in social situations, often to fit in and avoid stigma (Hull et al., 2017).

While a rapidly growing area of research, most studies have focused on the experiences of a narrow demographic – White, female, late-diagnosed autistic adults with average to above average abilities (Cook et al., 2021; Libsack et al., 2021).

Though camouflaging enables some to achieve social and functional goals (Hull et al., 2017; Livingston et al., 2019), it has been consistently associated with poorer mental health (Beck et al., 2020; Cassidy et al., 2018).

Qualitative accounts suggest various psychosocial factors may motivate camouflaging and explain its mental health impact, such as societal stigma and the desire for belonging (Cage & Troxell-Whitman, 2019; Cook et al., 2021). However, no review has systematically examined psychosocial influences on camouflaging and well-being.

This mixed-methods systematic review aimed to critically synthesize qualitative and quantitative research on psychosocial factors associated with camouflaging and its relationship with mental well-being in autistic and non-autistic people.

Understanding psychosocial motivations for camouflaging could inform support to promote more adaptive camouflaging and authentic self-expression, contributing to better mental health for autistic people.

This review followed PRISMA guidelines. Six databases were searched, and backward citation searching and expert consultations were conducted.

A thematic synthesis was conducted, where data were categorized and pooled together based on similar meanings.

Quantitative data were first transformed into themes and textual descriptions to allow integration with qualitative data. Then, line-by-line coding of findings was completed, codes were organized into descriptive themes, and analytical themes were developed.

Codes and themes were iteratively discussed among the research team, which included academics and two autistic advisors who provided input on the data synthesis and interpretation.

The first author conducted coding using NVivo software. A sample of 30 data extracts was independently coded by the last author using the finalized thematic map, demonstrating very good (80%) inter-rater agreement.
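To make the agreement check concrete, here is a minimal Python sketch of a percent-agreement calculation on a double-coded sample. The theme labels and the 24-of-30 split are hypothetical, chosen only to reproduce an 80% figure; the review itself worked from NVivo exports rather than hard-coded lists.

```python
# Percent agreement between two coders on a double-coded sample.
# The theme labels below are hypothetical stand-ins; real codes would
# come from the NVivo exports described in the review.
coder_a = ["stigma"] * 30
coder_b = ["stigma"] * 24 + ["identity"] * 6   # 24 of 30 codes match

agreements = sum(a == b for a, b in zip(coder_a, coder_b))
print(f"Agreement on {len(coder_a)} extracts: {agreements / len(coder_a):.0%}")
```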

58 studies (40 qualitative, 13 quantitative, 5 mixed methods) were included, encompassing 4808 autistic and 1780 non-autistic participants.

Participants were predominantly White (85.9%), female (61.8%), and late-diagnosed (mean age of diagnosis 30.47 years) autistic adults with likely average to above average intellectual/verbal abilities.

Seven themes relating to psychosocial correlates and consequences of camouflaging were identified:
  • Social norms and pressures drive camouflaging. Participants faced expectations to conform to neurotypical norms and experienced stigma for autistic differences.
  • Camouflaging is used to gain social acceptance and avoid rejection. Camouflaging was a protective response to bullying. The desire for belonging was a key motivation.
  • Self-esteem and identity influence camouflaging. Internalized stigma motivated camouflaging, while self-acceptance reduced the perceived need to camouflage.
  • Camouflaging has some practical benefits for functioning. It enabled everyday functioning and impression management.
  • Camouflaging leads to difficulties being overlooked. Camouflaging resulted in participants’ needs being unmet, delayed diagnosis, and autistic burnout.
  • Camouflaging negatively impacts relationships. While camouflaging facilitated social connections, relationships felt inauthentic.
  • Camouflaging causes identity confusion and low self-esteem. Extensive camouflaging eroded participants’ sense of self.

The themes highlight the bidirectional influences between the individual and environment in camouflaging.

Camouflaging emerged as a largely socially-motivated yet self-reinforcing response that comes with serious costs to authenticity and mental well-being.

This review provides a novel and comprehensive synthesis of psychosocial factors implicated in camouflaging.

The findings indicate that camouflaging arises from the dynamic interplay between the individual and their social environment, challenging purely individual-focused explanations.

Autistic individuals’ camouflaging efforts were driven by societal pressures, stigma, and the need for acceptance. Over time, repeated exposure to adverse social experiences led them to anticipate rejection and develop camouflaging as a learned response.

Furthermore, internalizing stigmatizing narratives motivated individuals to mask their differences.

However, camouflaging often had the unintended effect of leaving stigma unchallenged while increasing internalized stigma. It also resulted in autistic people’s needs being overlooked and unmet.

Additionally, while the desire for social connections drove camouflaging, participants felt that the relationships formed through it were inauthentic. These “double binds” made it difficult for individuals to break out of the camouflaging cycle.

The findings call for a shift from changing the individual to fostering more inclusive environments.

A whole-society approach is needed to increase understanding and acceptance of autism, thus reducing pressures on autistic people to camouflage.

Encouragingly, participants described reducing their camouflaging when experiencing self-acceptance, often facilitated by their autism diagnosis and connections with the autistic community.

This review extends previous research by providing an in-depth examination of psychosocial influences on camouflaging.

Future studies should empirically test the conceptual model presented and prioritize diverse participant representation.

Additionally, research examining the influence of everyday psychosocial experiences on camouflaging can provide insight into how autistic individuals navigate camouflaging in daily life.

This study had several methodological strengths, including:
  • Rigorous mixed methods approach enabled a rich, comprehensive understanding of psychosocial factors in camouflaging
  • Utilized participatory methods by involving two autistic advisors who provided input on data synthesis and interpretation
  • Developed a novel conceptual model of psychosocial correlates and consequences of camouflaging that can guide future research
  • Highlighted critical gaps in current research, such as the lack of diverse participant representation

Limitations

Despite its strengths, there are several limitations of this study, including:
  • Overrepresentation of White, female, late-diagnosed autistic adults with average to above average cognitive abilities limits the generalizability of findings
  • Restricted to English-language articles and participants mainly from Western societies introduces potential language and cultural bias
  • Inadequate reporting of key demographics (e.g. race/ethnicity, education) in most included studies further limits generalizability
  • As a systematic review, the study cannot determine causal relationships between variables

Implications

The findings have important implications for increasing awareness, acceptance, and support for autistic people:
  • Highlights the need for anti-stigma interventions and a shift towards accommodating rather than pathologizing autistic differences
  • Professionals and educators should have a greater understanding of camouflaging to improve recognition of autistic people’s needs
  • Diagnostic processes should account for camouflaging behaviors, especially for females and late-diagnosed individuals
  • Psychosocial supports focused on strengthening autistic identity and community connections may enhance authenticity and well-being

However, as most included studies involved a narrow participant demographic, more research is needed to understand the relevance of findings for underrepresented groups, including racial/ethnic minorities, males, gender diverse individuals, those with intellectual disability, and people from non-Western cultures.

Additionally, prospective studies are required to establish directional relationships.

Primary reference

Zhuang, S., Tan, D. W., Reddrop, S., Dean, L., Maybery, M., & Magiati, I. (2023). Psychosocial factors associated with camouflaging in autistic people and its relationship with mental health and well-being: A mixed methods systematic review.  Clinical Psychology Review, 105,  1–16.  https://doi.org/10.1016/j.cpr.2023.102335

Other references

Beck, J. S., Lundwall, R. A., Gabrielsen, T., Cox, J. C., & South, M. (2020). Looking good but feeling bad: “Camouflaging” behaviors and mental health in women with autistic traits. Autism, 24 (4), 809–821. https://doi.org/10.1177/1362361320912147

Cage, E., & Troxell-Whitman, Z. (2019). Understanding the reasons, contexts and costs of camouflaging for autistic adults. Journal of Autism and Developmental Disorders, 49(5), 1899–1911. https://doi.org/10.1007/s10803-018-03878-x

Cassidy, S., Bradley, L., Shaw, R., & Baron-Cohen, S. (2018). Risk markers for suicidality in autistic adults. Molecular Autism, 9 (1), 42. https://doi.org/10.1186/s13229-018-0226-4

Cook, J., Crane, L., Bourne, L., Hull, L., & Mandy, W. (2021). Camouflaging in an everyday social context: An interpersonal recall study. Autism, 25 (5), 1444–1456. https://doi.org/10.1177/1362361321992641

Hull, L., Petrides, K. V., Allison, C., Smith, P., Baron-Cohen, S., Lai, M. C., & Mandy, W. (2017). “Putting on my best normal”: Social camouflaging in adults with autism spectrum conditions. Journal of Autism and Developmental Disorders, 47(8), 2519–2534. https://doi.org/10.1007/s10803-017-3166-5

Libsack, E. J., Keenan, E. G., Freden, C. E., Mirmina, J., Iskhakov, N., Krishnathasan, D., & Lerner, M. D. (2021). A systematic review of passing as non-autistic in autism spectrum disorder. Clinical Child and Family Psychology Review, 24(4), 783–812. https://doi.org/10.1007/s10567-021-00365-1

Livingston, L. A., Shah, P., & Happé, F. (2019). Compensatory strategies below the behavioral surface in autism: A qualitative study. The Lancet Psychiatry, 6(9), 766–777. https://doi.org/10.1016/s2215-0366(19)30224-x

Keep Learning

Here are some reflective questions related to this study that could prompt further discussion:
  • How might the social environment be changed to reduce pressures on autistic people to camouflage?
  • What kinds of supports would be most helpful for autistic individuals in managing decisions around camouflaging versus authentic self-expression?
  • How can mental health professionals, educators, and workplaces create environments where autistic people feel psychologically safe to unmask?
  • What are respectful ways for non-autistic people to respond when an autistic person discloses their diagnosis?
  • How might camouflaging present differently across cultures? What unique challenges might autistic people with multiple marginalized identities face?


  • Open access
  • Published: 31 March 2024

MicroRNA expression as a prognostic biomarker of tongue squamous cell carcinoma (TSCC): a systematic review and meta-analysis

  • Yiwei Sun,
  • Yuxiao Li,
  • Wenjuan Zhou &
  • Zhonghao Liu

BMC Oral Health volume 24, Article number: 406 (2024)


Recent studies have indicated that microRNA (miRNA) expression in tumour tissues has prognostic significance in Tongue squamous cell carcinoma (TSCC) patients. This study explored the possible prognostic value of miRNAs for TSCC based on published research.

A comprehensive literature search of multiple databases was conducted according to predefined eligibility criteria. Data were extracted from the included studies by two researchers, and HR results were determined from Kaplan‒Meier curves according to the Tierney method. The Newcastle‒Ottawa Scale (NOS) and GRADEpro GDT (Grading of Recommendations Assessment, Development and Evaluation) were applied to assess the quality of all studies. Publication bias was estimated by funnel plot, Egger's rank correlation test and sensitivity analysis.

Eleven studies (891 patients) were included, of which 6 reported up-regulated miRNAs and 7 reported down-regulated miRNAs. The pooled hazard ratio (HR) for the prognostic indicator overall survival (OS) was 1.34 (1.25–1.44), p < 0.00001, indicating a significant difference in miRNA expression between TSCC patients with better and worse prognoses.

MiRNAs may have high prognostic value and could be used as prognostic biomarkers of TSCC.


Introduction

Oral squamous cell carcinoma (OSCC) accounts for approximately 90% of all oral malignancies, resulting in the highest morbidity rates in the head and neck region worldwide [ 1 , 2 , 3 ]. When compared with OSCC at other sites, tongue squamous cell carcinoma (TSCC), a principal subtype of oral cancer, represents 17.8% of occurrences and shares similarities in etiopathogenesis related to early regional lymph node metastasis [ 4 , 5 , 6 , 7 ] and malignant proliferation [ 8 , 9 ]. Notably, despite the application of advanced therapies, the 5-year survival rate and quality of life for TSCC patients are not promising [ 1 , 9 ]. Compared with the overt symptoms of advanced stage TSCC, assessments such as tissue biopsy are advantageous for identifying patients with early-stage disease and, as a result, increase the chances for better treatment and recovery [ 10 , 11 , 12 ]. Thus, in accordance with the World Health Organization (WHO) recommendations for the primacy of early diagnosis and prevention, measures should be taken to achieve early discovery and diagnosis of TSCC with the purpose of prolonging and improving the quality of patients’ lives [ 10 , 13 ].

According to current clinical practice, the detection methods for TSCC include, but are not limited to, brush biopsy, CT and MRI scanning, and tissue autofluorescence. Although some of these applications are widely used in routine clinical practice, they carry limitations, such as labour intensiveness, invasiveness and low sensitivity, that should not be ignored. Recent research indicates that the genetic alterations in TSCC include progressive changes in DNA methylation, overexpression of carcinoembryonic antigen, histone modification and altered expression of miRNAs. Consequently, markers such as DNA fragments in saliva [14, 15], immune-related gene transcripts [16], the neutrophil-to-lymphocyte ratio [17], the platelet-to-lymphocyte ratio [18] and miRNA expression [19] could serve as promising biomarkers for early detection and prognosis, compensating for the shortcomings of the traditional Tumor-Node-Metastasis (TNM) staging system [20]. In recent years, with the aim of finding more reliable diagnostic techniques, research on microRNA molecular mechanisms has become an important academic focus [10]. Studies have reported connections between miRNAs and the prognosis of TSCC, as changes in miRNA expression can reflect the stage of a tumour to some extent.

MicroRNAs (miRNAs) are sophisticated epigenetic molecular markers linked to TSCC patient prognosis. Intrinsically, miRNAs are endogenous, noncoding, single-stranded RNA molecules approximately 20–24 nucleotides long, renowned for their conserved gene sequences and distinct expression patterns [21, 22, 23, 24]. Generally, miRNAs are characterized by high gene sequence conservation, temporal expression specificity and tissue expression specificity [25]. By mediating mRNA degradation and translational repression, miRNAs regulate around 30% of gene expression at the posttranscriptional stage [24, 25]. During the progression of TSCC, dysregulated miRNAs act partly as oncogenes or suppressor genes in carcinogenesis, and also indirectly affect the expression of proto-oncogenes and tumour suppressor genes [26]. In turn, miRNAs drive tumour cell proliferation by allowing cells to escape normal suppressive signals, expediting malignant cell migration and stimulating angiogenesis in tumours [21, 22, 24, 25].

Research using miRNAs as biomarkers of malignancy prognosis has made great progress in recent years. However, due to the comparatively high cost and complexity of the process, the technique has not been put into large-scale clinical use. Distinct from existing meta-analyses of TSCC, this manuscript offers a fresh perspective, bridging the gap between the academic value of miRNAs and their usefulness in clinical applications. This meta-analysis aimed to consolidate and elucidate the prognostic significance of miRNAs in TSCC, paving the way for innovative clinical interventions to enhance patient longevity and quality of life.

Materials and methods

Protocol and registration

Following the Preferred Reporting Items for Systematic Review and Meta-analyses (PRISMA) guidelines [27], researchers executed a systematic review encompassing aspects such as protocol, inclusion criteria, search strategy, and outcomes. This systematic review has been catalogued on PROSPERO under the registration number CRD42023391953.

Eligibility criteria

Researchers focused on all retrospective cohort studies evaluating the association between variations in miRNA expression and prognostic survival metrics of TSCC. The criteria were based on the PICO elements: participants (patients diagnosed with TSCC), intervention (deregulated miRNA levels in TSCC patients), control (TSCC patients with normal miRNA expression levels), and outcome (prognosis differences in TSCC patients based on miRNA expression variance). Consequently, the PICO question was as follows: is there a difference in prognostic survival indexes between TSCC patients with dysregulation of miRNA expression and those with normal miRNA expression?

The inclusion criteria were studies that reported interrelated prognostic survival indexes, including hazard ratios (HRs), Kaplan–Meier curves and univariate or multivariate Cox regression between TSCC patients with low and high miRNA expression levels.

Regarding the exclusion criteria, non-English studies and those without prognostic survival indexes from which HRs could be extracted or inferred were excluded. Case reports, meta-analyses, systematic reviews, historical reviews and studies at high risk of bias were excluded. Studies that did not elaborate clinical outcomes or that reported content that was not relevant to clinical patients were likewise excluded.

Sources of information, research and selection

For reference inclusion, relevant articles were identified through literature retrieval from online databases. Specifically, non-English articles were excluded. Databases, including PubMed, Web of Science, Scopus, EMBASE, Cochrane Library and Google Scholar, were searched to retrieve relevant articles focusing on miRNA dysregulation in the prognosis of TSCC.

A search strategy combining the terms (Tongue Squamous Cell Cancer OR TSCC OR neoplasm OR cancer OR malignancy OR malignant neoplasm OR malignant neoplasms OR neoplasia) AND (prognosis OR prognostic factor OR prognostic factors) AND (MicroRNAs OR Micro RNA OR MicroRNA OR Primary MicroRNA OR Primary miRNA OR RNA, Small Temporal OR Small Temporal RNA OR miRNA OR miRNAs OR pre-miRNA OR pri-miRNA OR stRNA) was developed. Moreover, identified articles were scrutinized separately by reviewing the content of abstracts and full texts for eligible articles meeting the inclusion criteria for this meta-analysis.
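For illustration only, a boolean strategy of this kind can also be executed programmatically. Below is a minimal sketch using Biopython's Entrez wrapper for PubMed; the email address is a placeholder (NCBI requires a contact address), the query is an abbreviated form of the full strategy, and retmax is an arbitrary cap, so this does not reproduce the authors' exact pipeline.

```python
# Run an abbreviated form of the boolean strategy against PubMed
# via NCBI E-utilities (pip install biopython).
from Bio import Entrez

Entrez.email = "reviewer@example.org"  # placeholder; NCBI requires a contact

query = (
    "(Tongue Squamous Cell Cancer OR TSCC) "
    "AND (prognosis OR prognostic factor) "
    "AND (MicroRNAs OR miRNA OR miRNAs)"
)

handle = Entrez.esearch(db="pubmed", term=query, retmax=200)
record = Entrez.read(handle)
handle.close()

print(f"{record['Count']} records found; first PMIDs: {record['IdList'][:5]}")
```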

Data collection process and data characteristics

Investigators extracted data from studies that met the inclusion criteria and subsequently contrasted, collated and adopted the data mentioned above. Where the extracted data differed, a designated researcher's judgement was decisive in data selection. Data extracted from studies that met the inclusion criteria included Kaplan–Meier curves, first author and date, country, study design, number of patients, cut-off between low and high expression, miRNA types, miRNA expression levels and HRs of deregulated miRNA expression (OS, DSS, DFS, RFS, PFS). According to the Tierney method, prognostic index HR data, comprising overall survival (OS), disease-specific survival (DSS), disease-free survival (DFS), recurrence-free survival (RFS) and progression-free survival (PFS), were collected and extrapolated from Kaplan–Meier curves [28].
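To show what the Tierney back-calculation involves, here is a minimal Python sketch that recovers a log HR and its variance from a reported two-sided log-rank p-value, arm sizes, and total events, using the standard approximation V ≈ events × n1·n2/(n1 + n2)². All input numbers are illustrative, not values from an included study.

```python
# Back-calculate a hazard ratio from a reported two-sided log-rank
# p-value, arm sizes, and total events, following the indirect
# approach of Tierney et al. (2007). Inputs are illustrative.
from math import exp, sqrt
from scipy.stats import norm

p_two_sided = 0.02    # reported log-rank p-value
n1, n2 = 45, 50       # patients in the high- and low-expression arms
events = 60           # total deaths observed across both arms

V = events * (n1 * n2) / (n1 + n2) ** 2   # approximate variance of O - E
z = norm.ppf(1 - p_two_sided / 2)         # z-score from the p-value
o_minus_e = z * sqrt(V)                   # sign assumes HR > 1 was reported
log_hr = o_minus_e / V                    # ln(HR); var(ln HR) = 1/V

lo, hi = exp(log_hr - 1.96 / sqrt(V)), exp(log_hr + 1.96 / sqrt(V))
print(f"HR = {exp(log_hr):.2f} (95% CI {lo:.2f}-{hi:.2f})")
```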

Risk of bias in individual studies, summary measures, summary of results, risk of bias between studies, and additional measures

Investigators evaluated the risk of bias in each study independently while ensuring the accuracy of the assessments. Under the guidance of reference factors derived from the Newcastle‒Ottawa Scale (NOS), each study was graded from 0 to 9. Those scoring 7 or above were considered high-quality articles, while those scoring below 7 were eliminated from this meta-analysis. Forest plots were used to graphically illustrate the results, and inconsistency was evaluated with the Higgins I² index. Stata 15.1 and Review Manager 5.4 were used for statistical analysis. Collected data were aggregated to obtain pooled estimates for OS, DSS, DFS, RFS and PFS. The risk of bias between the studies was assessed graphically through analysis of overlaps of the confidence intervals via the I² inconsistency index (an I² value greater than 50% indicated statistically significant heterogeneity; a fixed-effect model was applied where appropriate). Publication bias was comprehensively estimated using a funnel plot and Egger's rank correlation test [29]. If the p value of Egger's test was > 0.05 and the funnel plot was symmetrical, there was no obvious publication bias in this meta-analysis. Subsequently, we carried out a sensitivity analysis to judge the heterogeneity of the selected studies.

Study characteristics

A total of 403 bibliographic citations were initially identified, 240 were excluded due to repetition, and 140 others were excluded according to the eligibility criteria. Fourteen studies reported prognostic data for OSCC only and thus were excluded. Moreover, another 3 articles from Google Scholar that met the inclusion criteria were additionally included in the meta-analysis. In the end, 12 articles were included in our study and 11 of these articles were included in this meta-analysis (Fig.  1 ).

figure 1

Flow diagram of studies selection process according to PRISMA guidelines

Data characteristics

According to the characteristics of the extracted data mentioned in the Data Collection Process and Data Characteristics section, all the extracted data are shown in Table 1. All 12 included retrospective clinical studies, with follow-up periods of 3 to 15 years and a total of 970 TSCC patients, were published from 2009 to 2020. The studies by Jia et al. (2013) [30], Supic et al. (2018) [31], Zheng et al. (2016) [32], Jia et al. (2014) [33], Li et al. (2009) [34], Maruyama et al. (2018) [35], Xie et al. (2016) [36], Berania et al. (2017) [37], Jia et al. (2015) [38], W. Chen et al. (2019) [39], and S. Chen et al. (2019) [40] reported OS as a prognostic parameter; Kawakita et al. (2014) [41] used DSS; PFS was selected as another prognostic indicator by Berania et al. (2017) [37]; and Maruyama et al. (2018) [35] regarded DFS as a prognostic index, while RFS was indicated by Supic et al. (2018) [31].

Risk of bias within studies

The risk of bias was assessed through factors derived from the NOS. According to the NOS guidelines, each study was graded from 0 to 9, and those scoring above or equal to 7 were considered high-quality studies (Table  2 ).
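As a small illustration of the scoring rule, NOS stars are awarded across three domains (selection, maximum 4; comparability, maximum 2; outcome, maximum 3) and summed; the sketch below applies the review's threshold of 7 or more to hypothetical star counts.

```python
# Tally a Newcastle-Ottawa Scale assessment and apply the review's
# inclusion threshold (>= 7 stars). Star counts are hypothetical.
nos = {"selection": 4, "comparability": 1, "outcome": 3}  # domain maxima: 4/2/3
total = sum(nos.values())
verdict = "high quality (retain)" if total >= 7 else "exclude"
print(f"NOS score: {total}/9 -> {verdict}")
```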

Additionally, GRADEpro GDT was used to assess the quality of the OS prognostic outcome (Table 3), for which the evidence was rated as critical.

Correlation of miRNA expression with survival outcomes

The HRs extracted from the included studies were the primary index for comparing TSCC patients with up-regulated and down-regulated miRNA expression. A fixed-effect model was chosen to calculate the pooled HR (95% CI).

The prognostic outcome OS across different miRNA expression levels is represented by a forest plot (Fig. 2), showing a pooled HR of 1.34 (1.25–1.44); heterogeneity Chi² = 13.90, df = 12 (p = 0.31), I² = 14%; and a test for the overall effect of Z = 8.01 (p < 0.00001).

Figure 2. Forest plot of the OS prognostic outcome of the meta-analysis.
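The reported heterogeneity can be checked directly from these numbers: I² = (Q - df)/Q = (13.90 - 12)/13.90 ≈ 14%. For readers who want to reproduce this kind of pooling, the following is a minimal inverse-variance fixed-effect sketch in Python; the per-study HRs and confidence intervals are placeholders, not the values extracted in this review.

```python
# Inverse-variance fixed-effect pooling of log hazard ratios, with
# Cochran's Q and Higgins' I^2. Study inputs are placeholders.
from math import log, exp, sqrt

studies = [(1.5, 1.1, 2.0), (1.2, 0.9, 1.6), (1.4, 1.2, 1.7), (1.1, 0.8, 1.5)]

y = [log(hr) for hr, lo, hi in studies]                          # log HRs
se = [(log(hi) - log(lo)) / (2 * 1.96) for _, lo, hi in studies]  # from 95% CI
w = [1 / s ** 2 for s in se]                                     # weights

pooled = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
se_pooled = sqrt(1 / sum(w))
Q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, y))
df = len(studies) - 1
I2 = max(0.0, (Q - df) / Q) * 100 if Q > 0 else 0.0

lo, hi = exp(pooled - 1.96 * se_pooled), exp(pooled + 1.96 * se_pooled)
print(f"Pooled HR = {exp(pooled):.2f} (95% CI {lo:.2f}-{hi:.2f})")
print(f"Q = {Q:.2f}, df = {df}, I^2 = {I2:.0f}%")
```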

Moreover, although the prognostic outcomes DFS, PFS, RFS and DSS were excluded from the pooled meta-analysis because each was reported by only one study, these single-study outcomes are summarized below, as they still bear on the plausibility of miRNAs as prognostic biomarkers for TSCC patients.

When DFS was regarded as the prognostic indicator, the HR was 3.06 (1.32–7.12), with a test for the overall effect of Z = 2.60 (p = 0.009) (Fig. 3).

Figure 3. Forest plot of the DFS outcome.

When PFS was considered as the prognostic index, the HR was 2.31 (1.28–4.17), and the test for the overall effect was Z = 2.78 (p = 0.005) (Fig. 4).

Figure 4. Forest plot of the PFS outcome.

The HR of another prognostic indicator, RFS, was 1.31 (0.97–1.78), and the test for the overall effect was Z = 1.74 (p = 0.08) (Fig. 5).

Figure 5. Forest plot of the RFS outcome.

The HR of the DSS prognostic indicator was 1.08 (0.87–1.34), and the test for the overall effect was Z = 0.69 (p = 0.49) (Fig. 6).

Figure 6. Forest plot of the DSS outcome.

Heterogeneity and sensitivity analysis

There was no significant heterogeneity in this meta-analysis. To ascertain the effect of each included study on the pooled HR, we conducted a sensitivity analysis. With each study excluded in turn, the remaining studies were meta-analysed and the resulting pooled HRs were observed. The results suggested that no single study materially influenced the pooled HR, meaning that the results of the meta-analysis were statistically stable (Fig. 7).

figure 7

Outcome of sensitivity analysis to assess the heterogeneity in this meta-analysis
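
The leave-one-out procedure described above can be sketched as follows; the per-study log-HRs and standard errors are hypothetical placeholders, not the extracted data.

```python
import math

def pool_fixed(log_hrs, ses):
    """Fixed-effects (inverse-variance) pooled log-HR and its SE."""
    weights = [1 / se**2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, log_hrs)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))

# Hypothetical per-study log-HRs and standard errors (placeholders only)
log_hrs = [0.74, 0.25, 0.50, 0.31, 0.42]
ses = [0.30, 0.09, 0.28, 0.15, 0.22]

full, _ = pool_fixed(log_hrs, ses)
print(f"Pooled HR with all studies = {math.exp(full):.2f}")

# Exclude each study in turn, re-pool the remainder, and check how far
# the pooled HR drifts from the full-data estimate.
for i in range(len(log_hrs)):
    loo, _ = pool_fixed(log_hrs[:i] + log_hrs[i + 1:], ses[:i] + ses[i + 1:])
    print(f"Omitting study {i + 1}: pooled HR = {math.exp(loo):.2f}")
```

If no omission moves the pooled HR far from the full-data estimate, the result can be regarded as statistically stable.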

Publication bias

A funnel plot and Egger’s test were used to assess publication bias. The funnel plot showed no visible asymmetry (Fig. 8). For a more precise assessment, Egger’s test was performed, yielding p = 0.063 (≥ 0.05). Thus, there was no statistically significant indication of publication bias, suggesting that our results were relatively stable (Fig. 9).

figure 8

Funnel plot for the main outcome in the fixed-effects model

figure 9

Egger’s test conducted to estimate publication bias; p  = 0.063 (≥ 0.05)
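
As a rough illustration of how Egger’s test works, the sketch below regresses the standardized effect (log-HR divided by its SE) on precision (1/SE) and inspects the intercept’s p-value; the inputs are hypothetical placeholders, not the study’s extracted data.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical per-study log-HRs and standard errors (placeholders only)
log_hrs = np.array([0.74, 0.25, 0.50, 0.31, 0.42, 0.18, 0.60])
ses = np.array([0.30, 0.09, 0.28, 0.15, 0.22, 0.11, 0.33])

# Egger's regression: standardized effect vs. precision; a non-zero
# intercept signals funnel-plot asymmetry (possible publication bias).
snd = log_hrs / ses        # standard normal deviate
precision = 1.0 / ses

fit = sm.OLS(snd, sm.add_constant(precision)).fit()
print(f"Egger intercept = {fit.params[0]:.3f}, p = {fit.pvalues[0]:.3f}")
# A p-value >= 0.05 (as with the p = 0.063 reported here) gives no
# statistically significant indication of publication bias.
```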

Discussion

Of the different OSCC subtypes, TSCC has the highest morbidity rate [ 1 ], and its 5-year survival rate remains low despite recent advancements in treatment [ 2 , 4 ]. To extend and improve the quality of patients’ lives, studies have investigated novel indicators targeting tumorigenesis and tumour metastasis, including but not limited to centromeric probes, gene expression profiling, and standard karyotyping [ 5 ]. Among these approaches, miRNAs, whose regulatory effects depend on their expression levels, have become a hotspot in academic research. Previous studies confirmed down-regulated miRNAs (miR-26a, miR-137, miR-139-5p, miR-143-3p, miR-184, miR-375, miR-32, miR-20a, miR-16 and miR-125b) and up-regulated miRNAs (miR-21, miR-31, miR-134, miR-155, miR-196a, miR-196b, miR-218, miR-455-5p, miR-372 and miR-373) in TSCC prognosis [ 42 , 43 , 44 , 45 ]. In general, miRNAs are expected to serve as advanced epigenetic biomarkers in the prognosis and optimized therapy of TSCC [ 46 , 47 , 48 ].

In this meta-analysis, the researchers included 11 articles and 891 patients and, on that basis, proposed miRNA expression as a prognostic biomarker for TSCC survival. The pooled HR of the prognostic outcome overall survival (OS) was 1.34 (1.25–1.44), elucidating the potential prognostic value of miRNA expression in TSCC patients. This meta-analysis demonstrated the high feasibility of applying miRNA as a prognostic biomarker in TSCC; nevertheless, further investigation and improvement are needed.

Based on the included studies and forest plots mentioned above, the expression levels of miRNA-18a, miRNA-611, miRNA-21, miRNA-183, and miRNA-196a-5p were significantly increased, whereas those of miRNA-548b, miRNA-5787, miRNA-195, miRNA-26a, miRNA-375, and miRNA-320a were markedly decreased, revealing that abnormal expression of these miRNAs could be considered a signal of poor prognosis. In addition, miRNA-21, miRNA-183 and miRNA-18a have already been suggested in multiple studies as potential prognostic indicators of head and neck squamous cell carcinoma or oral squamous cell carcinoma [ 44 , 45 ]. Furthermore, recent studies have made new progress in establishing miRNA-375 and miRNA-5787 as TSCC prognostic biomarkers [ 38 , 39 ]. Altogether, this review offers a new perspective on addressing refractory TSCC prognosis.

With respect to the statistical process, this meta-analysis applied the NOS to assess the quality of the selected retrospective cohort studies. According to the criteria of this protocol, 11 of the included studies scored 7 points or more, indicating a low risk of bias and high quality. The pooled HR (95% CI) for the OS prognostic outcome extracted from the forest plot was 1.34 (1.25–1.44), with low heterogeneity. Additionally, the sensitivity analysis and Egger’s test indicated that the collected data were credible and accurate enough to demonstrate miRNA as a plausible prognostic biomarker for TSCC patients.

Nevertheless, the limitations of this meta-analysis should not be ignored. First, the number of patients included is relatively small, and additional cases would enhance credibility. Second, some HRs were extracted from Kaplan‒Meier curves by image capture, an inherently imprecise process, so part of the accuracy of the initial results could not be verified. Third, owing to the limited resources in online databases, the included cases were drawn entirely from East Asian and American countries, introducing regional limitations; patients from other nations may therefore not fully benefit from the current results. These problems should be scientifically addressed in future studies.

During the information retrieval process, the investigators noted that exosome research, especially involving miRNAs, has drawn much attention from academic communities and has even been used experimentally for diagnosis and prognosis in several types of malignant tumours [ 49 , 50 ]. In comparison with other advanced techniques for TSCC prognosis, miRNA expression levels provide an original method for preventing recurrence and evaluating therapeutic effects. Not only could miRNA expression prospectively be used as a biomarker of residual or recurrent TSCC malignant tissue, but it could also be utilized to make more accurate prognoses prior to clinical manifestations, which is significant for the timely initiation of adequate treatment. In summary, this meta-analysis aimed to contribute to the development of novel treatments and to provide persuasive references for practical clinical applications and therapeutic guidance.

In conclusion, the data extracted for the OS prognostic outcome in this meta-analysis suggest that alterations in miRNA expression levels, including up-regulation and down-regulation, could serve as promising prognostic factors for TSCC. According to the figures and tables presented above, the down-regulated miRNAs correlated with poor prognosis are miR-195, miR-26a, miR-320a, miR-548b, miR-375 and miR-5787, while the up-regulated miRNAs are miR-183, miR-21, miR-196a-5p, miR-18a, and miR-611. Given the limitations of the existing studies, replication studies with more rigorous statistical testing and large-scale experiments that include patients from various regions, nations, and age groups are necessary. Consequently, this meta-analysis could provide references and prerequisites for further clinical trials and therapeutic applications.

Data availability

All data generated or analysed during this study are included in this published article.

Abbreviations

HR: Hazard ratio

OSCC: Oral squamous cell carcinoma

NOS: Newcastle‒Ottawa Scale

WHO: World Health Organization

OS: Overall survival

DSS: Disease-specific survival

DFS: Disease-free survival

RFS: Recurrence-free survival

PFS: Progression-free survival

Ding L, Fu Y, Zhu N, Zhao M, Ding Z, Zhang X, Song Y, Jing Y, Zhang Q, Chen S, et al. OXTR(High) stroma fibroblasts control the invasion pattern of oral squamous cell carcinoma via ERK5 signaling. Nat Commun. 2022;13(1):5124.


Goldberg M, Manzi A, Birdi A, Laporte B, Conway P, Cantin S, Mishra V, Singh A, Pearson AT, Goldberg ER, et al. A nanoengineered topical transmucosal cisplatin delivery system induces anti-tumor response in animal models and patients with oral cancer. Nat Commun. 2022;13(1):4829.

Li Z, Fu R, Wen X, Zhang L. Network analysis reveals miRNA crosstalk between periodontitis and oral squamous cell carcinoma. BMC Oral Health. 2023;23(1):19.

Huang L, Luo EL, Xie J, Gan RH, Ding LC, Su BH, Zhao Y, Lin LS, Zheng DL, Lu YG. FZD2 regulates cell proliferation and invasion in tongue squamous cell carcinoma. Int J Biol Sci. 2019;15(11):2330–9.

Weatherspoon DJ, Chattopadhyay A, Boroumand S, Garcia I. Oral cavity and oropharyngeal cancer incidence trends and disparities in the United States: 2000–2010. Cancer Epidemiol. 2015;39(4):497–504.


Xie N, Wang C, Liu X, Li R, Hou J, Chen X, Huang H. Tumor budding correlates with occult cervical lymph node metastasis and poor prognosis in clinical early-stage tongue squamous cell carcinoma. J Oral Pathol Med. 2015;44(4):266–72.


Zhao W, Cui Y, Liu L, Qi X, Liu J, Ma S, Hu X, Zhang Z, Wang Y, Li H, et al. Splicing factor derived circular RNA circUHRF1 accelerates oral squamous cell carcinoma tumorigenesis via feedback loop. Cell Death Differ. 2020;27(3):919–33.


Li HF, Liu YQ, Shen ZJ, Gan XF, Han JJ, Liu YY, Li HG, Huang ZQ. Downregulation of MACC1 inhibits invasion, migration and proliferation, attenuates cisplatin resistance and induces apoptosis in tongue squamous cell carcinoma. Oncol Rep. 2015;33(2):651–60.

Xi L, Yang Y, Xu Y, Zhang F, Li J, Liu X, Zhang Z, Du Q. The enhanced genomic 6 mA metabolism contributes to the proliferation and migration of TSCC cells. Int J Oral Sci. 2022;14(1):11.

Abu-Ghanem S, Yehuda M, Carmel NN, Leshno M, Abergel A, Gutfeld O, Fliss DM. Elective neck dissection vs observation in early-stage squamous cell carcinoma of the oral tongue with no clinically apparent Lymph node metastasis in the neck: a systematic review and Meta-analysis. JAMA Otolaryngol Head Neck Surg. 2016;142(9):857–65.

Pich O, Bailey C, Watkins TBK, Zaccaria S, Jamal-Hanjani M, Swanton C. The translational challenges of precision oncology. Cancer Cell. 2022;40(5):458–78.

Sun L, Kang X, Wang C, Wang R, Yang G, Jiang W, Wu Q, Wang Y, Wu Y, Gao J, et al. Single-cell and spatial dissection of precancerous lesions underlying the initiation process of oral squamous cell carcinoma. Cell Discov. 2023;9(1):28.

Hussein AA, Forouzanfar T, Bloemena E, de Visscher J, Brakenhoff RH, Leemans CR, Helder MN. A review of the most promising biomarkers for early diagnosis and prognosis prediction of tongue squamous cell carcinoma. Br J Cancer. 2018;119(6):724–36.

Rapado-González Ó, López-Cedrún JL, Lago-Lestón RM, Abalo A, Rubin-Roger G, Salgado-Barreira Á, López-López R, Muinelo-Romay L, Suárez-Cunqueiro MM. Integrity and quantity of salivary cell-free DNA as a potential molecular biomarker in oral cancer: a preliminary study. J Oral Pathol Med. 2022;51(5):429–35.

Zhou Y, Liu Z. Saliva biomarkers in oral disease. Clin Chim Acta. 2023;548:117503.

Chen Y, Feng Y, Yan F, Zhao Y, Zhao H, Guo Y. A Novel Immune-related gene signature to identify the Tumor Microenvironment and Prognose Disease among patients with oral squamous cell carcinoma patients using ssGSEA: a Bioinformatics and Biological Validation Study. Front Immunol. 2022;13:922195.

Mariani P, Russo D, Maisto M, Troiano G, Caponio VCA, Annunziata M, Laino L. Pre-treatment neutrophil-to-lymphocyte ratio is an independent prognostic factor in head and neck squamous cell carcinoma: Meta-analysis and trial sequential analysis. J Oral Pathol Med. 2022;51(1):39–51.

Diana R, Pierluigi M, Dardo M, Claudia A, Rosario R, Luigi L. The prognostic role of pre-treatment platelet-to-lymphocyte ratio in head and neck squamous cell carcinoma. Meta-analysis and trial sequential analysis. J Evid Based Dent Pract. 2023;23(4):101898.

Bigagli E, Locatello LG, Di Stadio A, Maggiore G, Valdarnini F, Bambi F, Gallo O, Luceri C. Extracellular vesicles miR-210 as a potential biomarker for diagnosis and survival prediction of oral squamous cell carcinoma patients. J Oral Pathol Med. 2022;51(4):350–7.

Russo D, Mariani P, Caponio VCA, Lo Russo L, Fiorillo L, Zhurakivska K, Lo Muzio L, Laino L, Troiano G. Development and validation of prognostic models for oral squamous cell carcinoma: a systematic review and appraisal of the literature. Cancers (Basel). 2021;13(22).

Jadhav KB, Nagraj SK, Arora S. miRNA for the assessment of lymph node metastasis in patients with oral squamous cell carcinoma: systematic review and metanalysis. J Oral Pathol Med. 2021;50(4):345–52.

Lee YS, Dutta A. MicroRNAs in cancer. Annu Rev Pathol. 2009;4:199–227.

Vishnoi A, Rani S. miRNA biogenesis and regulation of diseases: an updated overview. Methods Mol Biol. 2023;2595:1–12.

Yete S, Saranath D. MicroRNAs in oral cancer: biomarkers with clinical potential. Oral Oncol. 2020;110:105002.

Budakoti M, Panwar AS, Molpa D, Singh RK, Büsselberg D, Mishra AP, Coutinho HDM, Nigam M. Micro-RNA: the darkhorse of cancer. Cell Signal. 2021;83:109995.

Di Stasio D, Romano A, Boschetti CE, Montella M, Mosca L, Lucchese A. Salivary miRNAs expression in potentially malignant disorders of the oral mucosa and oral squamous cell carcinoma: a pilot study on miR-21, miR-27b, and miR-181b. Cancers (Basel) 2022, 15(1).

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, Shamseer L, Tetzlaff JM, Akl EA, Brennan SE, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71.

Tierney JF, Stewart LA, Ghersi D, Burdett S, Sydes MR. Practical methods for incorporating summary time-to-event data into meta-analysis. Trials. 2007;8:16.

Bowden J, Davey Smith G, Burgess S. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int J Epidemiol. 2015;44(2):512–25.

Jia LF, Wei SB, Gong K, Gan YH, Yu GY. Prognostic implications of micoRNA miR-195 expression in human tongue squamous cell carcinoma. PLoS ONE. 2013;8(2):e56634.

Supic G, Zeljic K, Rankov AD, Kozomara R, Nikolic A, Radojkovic D, Magic Z. miR-183 and miR-21 expression as biomarkers of progression and survival in tongue carcinoma patients. Clin Oral Investig. 2018;22(1):401–9.

Zheng G, Li N, Jia X, Peng C, Luo L, Deng Y, Yin J, Song Y, Liu H, Lu M, et al. MYCN-mediated miR-21 overexpression enhances chemo-resistance via targeting CADM1 in tongue cancer. J Mol Med (Berl). 2016;94(10):1129–41.

Jia LF, Wei SB, Gan YH, Guo Y, Gong K, Mitchelson K, Cheng J, Yu GY. Expression, regulation and roles of miR-26a and MEG3 in tongue squamous cell carcinoma. Int J Cancer. 2014;135(10):2282–93.

Li J, Huang H, Sun L, Yang M, Pan C, Chen W, Wu D, Lin Z, Zeng C, Yao Y, et al. MiR-21 indicates poor prognosis in tongue squamous cell carcinomas as an apoptosis inhibitor. Clin Cancer Res. 2009;15(12):3998–4008.

Maruyama T, Nishihara K, Umikawa M, Arasaki A, Nakasone T, Nimura F, Matayoshi A, Takei K, Nakachi S, Kariya KI, et al. MicroRNA-196a-5p is a potential prognostic marker of delayed lymph node metastasis in early-stage tongue squamous cell carcinoma. Oncol Lett. 2018;15(2):2349–63.


Xie N, Wang C, Zhuang Z, Hou J, Liu X, Wu Y, Liu H, Huang H. Decreased miR-320a promotes invasion and metastasis of tumor budding cells in tongue squamous cell carcinoma. Oncotarget. 2016;7(40):65744–57.

Berania I, Cardin GB, Clément I, Guertin L, Ayad T, Bissada E, Nguyen-Tan PF, Filion E, Guilmette J, Gologan O, et al. Four PTEN-targeting co-expressed miRNAs and ACTN4- targeting miR-548b are independent prognostic biomarkers in human squamous cell carcinoma of the oral tongue. Int J Cancer. 2017;141(11):2318–28.

Jia L, Huang Y, Zheng Y, Lyu M, Zhang C, Meng Z, Gan Y, Yu G. miR-375 inhibits cell growth and correlates with clinical outcomes in tongue squamous cell carcinoma. Oncol Rep. 2015;33(4):2061–71.

Chen W, Wang P, Lu Y, Jin T, Lei X, Liu M, Zhuang P, Liao J, Lin Z, Li B, et al. Decreased expression of mitochondrial miR-5787 contributes to chemoresistance by reprogramming glucose metabolism and inhibiting MT-CO3 translation. Theranostics. 2019;9(20):5739–54.

Chen S, Zhang J, Sun L, Li X, Bai J, Zhang H, Li T. miR-611 promotes the proliferation, migration and invasion of tongue squamous cell carcinoma cells by targeting FOXN3. Oral Dis. 2019;25(8):1906–18.

Kawakita A, Yanamoto S, Yamada S, Naruse T, Takahashi H, Kawasaki G, Umeda M. MicroRNA-21 promotes oral cancer invasion via the Wnt/β-catenin pathway by targeting DKK2. Pathol Oncol Res. 2014;20(2):253–61.

Al Rawi N, Elmabrouk N, Abu Kou R, Mkadmi S, Rizvi Z, Hamdoon Z. The role of differentially expressed salivary microRNA in oral squamous cell carcinoma. A systematic review. Arch Oral Biol. 2021;125:105108.

Dioguardi M, Caloro GA, Laino L, Alovisi M, Sovereto D, Crincoli V, Aiuto R, Coccia E, Troiano G, Lo Muzio L. Circulating miR-21 as a potential biomarker for the diagnosis of oral Cancer: a systematic review with Meta-analysis. Cancers (Basel) 2020, 12(4).

Jamali Z, Asl Aminabadi N, Attaran R, Pournagiazar F, Ghertasi Oskouei S, Ahmadpour F. MicroRNAs as prognostic molecular signatures in human head and neck squamous cell carcinoma: a systematic review and meta-analysis. Oral Oncol. 2015;51(4):321–31.

Troiano G, Mastrangelo F, Caponio VCA, Laino L, Cirillo N, Lo Muzio L. Predictive prognostic value of tissue-based MicroRNA expression in oral squamous cell carcinoma: a systematic review and Meta-analysis. J Dent Res. 2018;97(7):759–66.

He Q, Chen Z, Cabay RJ, Zhang L, Luan X, Chen D, Yu T, Wang A, Zhou X. microRNA-21 and microRNA-375 from oral cytology as biomarkers for oral tongue cancer detection. Oral Oncol. 2016;57:15–20.

Moratin J, Hartmann S, Brands R, Brisam M, Mutzbauer G, Scholz C, Seher A, Müller-Richter U, Kübler AC, Linz C. Evaluation of miRNA-expression and clinical tumour parameters in oral squamous cell carcinoma (OSCC). J Craniomaxillofac Surg. 2016;44(7):876–81.

Yu X, Li Z. MicroRNA expression and its implications for diagnosis and therapy of tongue squamous cell carcinoma. J Cell Mol Med. 2016;20(1):10–6.

Sano D, Myers JN. Metastasis of squamous cell carcinoma of the oral tongue. Cancer Metastasis Rev. 2007;26(3–4):645–62.

Silverman S Jr. Demographics and occurrence of oral and pharyngeal cancers. The outcomes, the trends, the challenge. J Am Dent Assoc. 2001;132(Suppl):s7–11.



Acknowledgements

This work was funded by the National Natural Science Foundation of China (Grant No. 82201110) and the Yantai School-City Integration Development Project (2023XDRHXMPT11). The authors are grateful to Prof. Chunlei Han for advice on data analysis, Dr. Zhengrui Li for thesis writing instructions, and AJE for language editing support.

Funding

The National Natural Science Foundation of China (Grant No. 82201110) and the Yantai School-City Integration Development Project (2023XDRHXMPT11).

Author information

Yiwei Sun and Yuxiao Li contributed equally to this work.

Authors and Affiliations

School of Stomatology, Binzhou Medical University, No. 346 Guanhai Road, Yantai, Shandong Province, 264003, China

The Second School of Clinical Medicine, Binzhou Medical University, No. 346 Guanhai Road, Yantai, Shandong Province, 264003, China

The affiliated Yantai Stomatological Hospital, Binzhou Medical University, Yantai, 264000, China

Wenjuan Zhou & Zhonghao Liu

Yantai Engineering Research Center for Digital Technology of Stomatology, Yantai, 264000, China

Characteristic Laboratories of Colleges and Universities in Shandong Province for Digital Stomatology, Yantai, 264003, China


Contributions

Yuxiao Li and Yiwei Sun designed the study and completed the registration. Yuxiao Li conducted data extraction and performed the statistical analysis. Yuxiao Li and Yiwei Sun evaluated the risk of bias assessment and validated the result. Yuxiao Li and Yiwei Sun wrote the manuscript. Wenjuan Zhou and Zhonghao Liu supervised and offered support in language editing. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Wenjuan Zhou or Zhonghao Liu .

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no conflicts of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Sun, Y., Li, Y., Zhou, W. et al. MicroRNA expression as a prognostic biomarker of tongue squamous cell carcinoma (TSCC): a systematic review and meta-analysis. BMC Oral Health 24 , 406 (2024). https://doi.org/10.1186/s12903-024-04182-0


Received : 03 November 2023

Accepted : 26 March 2024

Published : 31 March 2024

DOI : https://doi.org/10.1186/s12903-024-04182-0


Keywords: Tongue squamous cell carcinoma



Guidance to best tools and practices for systematic reviews

Kat Kolaski

1 Departments of Orthopaedic Surgery, Pediatrics, and Neurology, Wake Forest School of Medicine, Winston-Salem, NC USA

Lynne Romeiser Logan

2 Department of Physical Medicine and Rehabilitation, SUNY Upstate Medical University, Syracuse, NY USA

John P. A. Ioannidis

3 Departments of Medicine, of Epidemiology and Population Health, of Biomedical Data Science, and of Statistics, and Meta-Research Innovation Center at Stanford (METRICS), Stanford University School of Medicine, Stanford, CA USA

Abstract

Data continue to accumulate indicating that many systematic reviews are methodologically flawed, biased, redundant, or uninformative. Some improvements have occurred in recent years based on empirical methods research and standardization of appraisal tools; however, many authors do not routinely or consistently apply these updated methods. In addition, guideline developers, peer reviewers, and journal editors often disregard current methodological standards. Although extensively acknowledged and explored in the methodological literature, most clinicians seem unaware of these issues and may automatically accept evidence syntheses (and clinical practice guidelines based on their conclusions) as trustworthy.

A plethora of methods and tools are recommended for the development and evaluation of evidence syntheses. It is important to understand what these are intended to do (and cannot do) and how they can be utilized. Our objective is to distill this sprawling information into a format that is understandable and readily accessible to authors, peer reviewers, and editors. In doing so, we aim to promote appreciation and understanding of the demanding science of evidence synthesis among stakeholders. We focus on well-documented deficiencies in key components of evidence syntheses to elucidate the rationale for current standards. The constructs underlying the tools developed to assess reporting, risk of bias, and methodological quality of evidence syntheses are distinguished from those involved in determining overall certainty of a body of evidence. Another important distinction is made between those tools used by authors to develop their syntheses as opposed to those used to ultimately judge their work.

Exemplar methods and research practices are described, complemented by novel pragmatic strategies to improve evidence syntheses. The latter include preferred terminology and a scheme to characterize types of research evidence. We organize best practice resources in a Concise Guide that can be widely adopted and adapted for routine implementation by authors and journals. Appropriate, informed use of these is encouraged, but we caution against their superficial application and emphasize their endorsement does not substitute for in-depth methodological training. By highlighting best practices with their rationale, we hope this guidance will inspire further evolution of methods and tools that can advance the field.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13643-023-02255-9.

Part 1. The state of evidence synthesis

Evidence syntheses are commonly regarded as the foundation of evidence-based medicine (EBM). They are widely accredited for providing reliable evidence and, as such, they have significantly influenced medical research and clinical practice. Despite their uptake throughout health care and ubiquity in contemporary medical literature, some important aspects of evidence syntheses are generally overlooked or not well recognized. Evidence syntheses are mostly retrospective exercises, they often depend on weak or irreparably flawed data, and they may use tools that have acknowledged or yet unrecognized limitations. They are complicated and time-consuming undertakings prone to bias and errors. Production of a good evidence synthesis requires careful preparation and high levels of organization in order to limit potential pitfalls [ 1 ]. Many authors do not recognize the complexity of such an endeavor and the many methodological challenges they may encounter. Failure to do so is likely to result in research and resource waste.

Given their potential impact on people’s lives, it is crucial for evidence syntheses to correctly report on the current knowledge base. In order to be perceived as trustworthy, reliable demonstration of the accuracy of evidence syntheses is equally imperative [ 2 ]. Concerns about the trustworthiness of evidence syntheses are not recent developments. From the early years when EBM first began to gain traction until recent times, when thousands of systematic reviews are published monthly [ 3 ], the rigor of evidence syntheses has always varied. Many systematic reviews and meta-analyses had obvious deficiencies because original methods and processes had gaps, lacked precision, and/or were not widely known. The situation has improved with empirical research concerning which methods to use and standardization of appraisal tools. However, given the geometrical increase in the number of evidence syntheses being published, a relatively larger pool of unreliable evidence syntheses is being published today.

Publication of methodological studies that critically appraise the methods used in evidence syntheses is increasing at a fast pace. This reflects the availability of tools specifically developed for this purpose [ 4 – 6 ]. Yet many clinical specialties report that alarming numbers of evidence syntheses fail on these assessments. The syntheses identified report on a broad range of common conditions including, but not limited to, cancer, [ 7 ] chronic obstructive pulmonary disease, [ 8 ] osteoporosis, [ 9 ] stroke, [ 10 ] cerebral palsy, [ 11 ] chronic low back pain, [ 12 ] refractive error, [ 13 ] major depression, [ 14 ] pain, [ 15 ] and obesity [ 16 , 17 ]. The situation is even more concerning with regard to evidence syntheses included in clinical practice guidelines (CPGs) [ 18 – 20 ]. Astonishingly, in a sample of CPGs published in 2017–18, more than half did not apply even basic systematic methods in the evidence syntheses used to inform their recommendations [ 21 ].

These reports, while not widely acknowledged, suggest there are pervasive problems not limited to evidence syntheses that evaluate specific kinds of interventions or include primary research of a particular study design (eg, randomized versus non-randomized) [ 22 ]. Similar concerns about the reliability of evidence syntheses have been expressed by proponents of EBM in highly circulated medical journals [ 23 – 26 ]. These publications have also raised awareness about redundancy, inadequate input of statistical expertise, and deficient reporting. These issues plague primary research as well; however, there is heightened concern for the impact of these deficiencies given the critical role of evidence syntheses in policy and clinical decision-making.

Methods and guidance to produce a reliable evidence synthesis

Several international consortiums of EBM experts and national health care organizations currently provide detailed guidance (Table 1). They draw criteria from the reporting and methodological standards of currently recommended appraisal tools, and regularly review and update their methods to reflect new information and changing needs. In addition, they endorse the Grading of Recommendations Assessment, Development and Evaluation (GRADE) system for rating the overall quality of a body of evidence [ 27 ]. These groups typically certify or commission systematic reviews that are published in exclusive databases (eg, Cochrane, JBI) or are used to develop government or agency sponsored guidelines or health technology assessments (eg, National Institute for Health and Care Excellence [NICE], Scottish Intercollegiate Guidelines Network [SIGN], Agency for Healthcare Research and Quality [AHRQ]). They offer developers of evidence syntheses various levels of methodological advice, technical and administrative support, and editorial assistance. Use of specific protocols and checklists is required for development teams within these groups, but their online methodological resources are accessible to any potential author.

Guidance for development of evidence syntheses

Notably, Cochrane is the largest single producer of evidence syntheses in biomedical research; however, its reviews account for only 15% of the total [ 28 ]. The World Health Organization requires that Cochrane standards be used to develop evidence syntheses that inform its CPGs [ 29 ]. Authors investigating questions of intervention effectiveness in syntheses developed for Cochrane follow the Methodological Expectations of Cochrane Intervention Reviews [ 30 ] and undergo multi-tiered peer review [ 31 , 32 ]. Several empirical evaluations have shown that Cochrane systematic reviews are of higher methodological quality compared with non-Cochrane reviews [ 4 , 7 , 9 , 11 , 14 , 32 – 35 ]. However, some of these assessments have biases: they may be conducted by Cochrane-affiliated authors, and they sometimes use scales and tools developed and used in the Cochrane environment and by its partners. In addition, evidence syntheses published in the Cochrane database are not subject to space or word restrictions, while non-Cochrane syntheses are often limited. As a result, information that may be relevant to the critical appraisal of non-Cochrane reviews is often removed or is relegated to online-only supplements that may not be readily or fully accessible [ 28 ].

Influences on the state of evidence synthesis

Many authors are familiar with the evidence syntheses produced by the leading EBM organizations but can be intimidated by the time and effort necessary to apply their standards. Instead of following their guidance, authors may employ methods that are discouraged or outdated [ 28 ]. Suboptimal methods described in the literature may then be taken up by others. For example, the Newcastle–Ottawa Scale (NOS) is a commonly used tool for appraising non-randomized studies [ 36 ]. Many authors justify their selection of this tool with reference to a publication that describes the unreliability of the NOS and recommends against its use [ 37 ]. Obviously, the authors who cite this report for that purpose have not read it. Authors and peer reviewers have a responsibility to use reliable and accurate methods and not to simply copy previous citations or substandard work [ 38 , 39 ]. Similar cautions may potentially extend to automation tools. These have concentrated on evidence searching [ 40 ] and selection given how demanding it is for humans to maintain truly up-to-date evidence [ 2 , 41 ]. Cochrane has deployed machine learning to identify randomized controlled trials (RCTs) and studies related to COVID-19 [ 2 , 42 ], but such tools are not yet commonly used [ 43 ]. The routine integration of automation tools in the development of future evidence syntheses should not displace the interpretive part of the process.

Editorials about unreliable or misleading systematic reviews highlight several of the intertwining factors that may contribute to continued publication of unreliable evidence syntheses: shortcomings and inconsistencies of the peer review process, lack of endorsement of current standards on the part of journal editors, the incentive structure of academia, industry influences, publication bias, and the lure of “predatory” journals [ 44 – 48 ]. At this juncture, clarification of the extent to which each of these factors contribute remains speculative, but their impact is likely to be synergistic.

Over time, the generalized acceptance of the conclusions of systematic reviews as incontrovertible has affected trends in the dissemination and uptake of evidence. Reporting of the results of evidence syntheses and recommendations of CPGs has shifted beyond medical journals to press releases and news headlines and, more recently, to the realm of social media and influencers. The lay public and policy makers may depend on these outlets for interpreting evidence syntheses and CPGs. Unfortunately, communication to the general public often reflects intentional or non-intentional misrepresentation or “spin” of the research findings [ 49 – 52 ]. News and social media outlets also tend to reduce conclusions on a body of evidence and recommendations for treatment to binary choices (eg, “do it” versus “don’t do it”) that may be assigned an actionable symbol (eg, red/green traffic lights, smiley/frowning face emoji).

Strategies for improvement

Many authors and peer reviewers are volunteer health care professionals or trainees who lack formal training in evidence synthesis [ 46 , 53 ]. Informing them about research methodology could increase the likelihood they will apply rigorous methods [ 25 , 33 , 45 ]. We tackle this challenge, from both a theoretical and a practical perspective, by offering guidance applicable to any specialty. It is based on recent methodological research that is extensively referenced to promote self-study. However, the information presented is not intended to be a substitute for committed training in evidence synthesis methodology; instead, we hope to inspire our target audience to seek such training. We also hope to inform a broader audience of clinicians and guideline developers influenced by evidence syntheses. Notably, these communities often include the same members who serve in different capacities.

In the following sections, we highlight methodological concepts and practices that may be unfamiliar, problematic, confusing, or controversial. In Part 2, we consider various types of evidence syntheses and the types of research evidence summarized by them. In Part 3, we examine some widely used (and misused) tools for the critical appraisal of systematic reviews and reporting guidelines for evidence syntheses. In Part 4, we discuss how to meet methodological conduct standards applicable to key components of systematic reviews. In Part 5, we describe the merits and caveats of rating the overall certainty of a body of evidence. Finally, in Part 6, we summarize suggested terminology, methods, and tools for development and evaluation of evidence syntheses that reflect current best practices.

Part 2. Types of syntheses and research evidence

A good foundation for the development of evidence syntheses requires an appreciation of their various methodologies and the ability to correctly identify the types of research potentially available for inclusion in the synthesis.

Types of evidence syntheses

Systematic reviews have historically focused on the benefits and harms of interventions; over time, various types of systematic reviews have emerged to address the diverse information needs of clinicians, patients, and policy makers [ 54 ]. Systematic reviews with traditional components have become defined by the different topics they assess (Table 2.1 ). In addition, other distinctive types of evidence syntheses have evolved, including overviews or umbrella reviews, scoping reviews, rapid reviews, and living reviews. The popularity of these has been increasing in recent years [ 55 – 58 ]. A summary of the development, methods, available guidance, and indications for these unique types of evidence syntheses is available in Additional File 2 A.

Types of traditional systematic reviews

Both Cochrane [ 30 , 59 ] and JBI [ 60 ] provide methodologies for many types of evidence syntheses; they describe these with different terminology, but there is obvious overlap (Table 2.2 ). The majority of evidence syntheses published by Cochrane (96%) and JBI (62%) are categorized as intervention reviews. This reflects the earlier development and dissemination of their intervention review methodologies; these remain well-established [ 30 , 59 , 61 ] as both organizations continue to focus on topics related to treatment efficacy and harms. In contrast, intervention reviews represent only about half of the total published in the general medical literature, and several non-intervention review types contribute to a significant proportion of the other half.

Evidence syntheses published by Cochrane and JBI

a Data from https://www.cochranelibrary.com/cdsr/reviews . Accessed 17 Sep 2022

b Data obtained via personal email communication on 18 Sep 2022 with Emilie Francis, editorial assistant, JBI Evidence Synthesis

c Includes the following categories: prevalence, scoping, mixed methods, and realist reviews

d This methodology is not supported in the current version of the JBI Manual for Evidence Synthesis

Types of research evidence

There is consensus on the importance of using multiple study designs in evidence syntheses; at the same time, there is a lack of agreement on methods to identify included study designs. Authors of evidence syntheses may use various taxonomies and associated algorithms to guide selection and/or classification of study designs. These tools differentiate categories of research and apply labels to individual study designs (eg, RCT, cross-sectional). A familiar example is the Design Tree endorsed by the Centre for Evidence-Based Medicine [ 70 ]. Such tools may not be helpful to authors of evidence syntheses for multiple reasons.

Suboptimal levels of agreement and accuracy even among trained methodologists reflect challenges with the application of such tools [ 71 , 72 ]. Problematic distinctions or decision points (eg, experimental or observational, controlled or uncontrolled, prospective or retrospective) and design labels (eg, cohort, case control, uncontrolled trial) have been reported [ 71 ]. The variable application of ambiguous study design labels to non-randomized studies is common, making them especially prone to misclassification [ 73 ]. In addition, study labels do not denote the unique design features that make different types of non-randomized studies susceptible to different biases, including those related to how the data are obtained (eg, clinical trials, disease registries, wearable devices). Given this limitation, it is important to be aware that design labels preclude the accurate assignment of non-randomized studies to a “level of evidence” in traditional hierarchies [ 74 ].

These concerns suggest that available tools and nomenclature used to distinguish types of research evidence may not uniformly apply to biomedical research and non-health fields that utilize evidence syntheses (eg, education, economics) [ 75 , 76 ]. Moreover, primary research reports often do not describe study design or do so incompletely or inaccurately; thus, indexing in PubMed and other databases does not address the potential for misclassification [ 77 ]. Yet proper identification of research evidence has implications for several key components of evidence syntheses. For example, search strategies limited by index terms using design labels or study selection based on labels applied by the authors of primary studies may cause inconsistent or unjustified study inclusions and/or exclusions [ 77 ]. In addition, because risk of bias (RoB) tools consider attributes specific to certain types of studies and study design features, results of these assessments may be invalidated if an inappropriate tool is used. Appropriate classification of studies is also relevant for the selection of a suitable method of synthesis and interpretation of those results.

An alternative to these tools and nomenclature involves application of a few fundamental distinctions that encompass a wide range of research designs and contexts. While these distinctions are not novel, we integrate them into a practical scheme (see Fig. 1 ) designed to guide authors of evidence syntheses in the basic identification of research evidence. The initial distinction is between primary and secondary studies. Primary studies are then further distinguished by: 1) the type of data reported (qualitative or quantitative); and 2) two defining design features (group or single-case and randomized or non-randomized). The different types of studies and study designs represented in the scheme are described in detail in Additional File 2 B. It is important to conceptualize their methods as complementary as opposed to contrasting or hierarchical [ 78 ]; each offers advantages and disadvantages that determine their appropriateness for answering different kinds of research questions in an evidence synthesis.

Fig. 1

Distinguishing types of research evidence
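
As a rough illustration only, these distinctions can be encoded as a small classifier; the labels below are hypothetical simplifications of the scheme in Fig. 1, not an official taxonomy.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Study:
    is_primary: bool
    data_type: Optional[str] = None   # "qualitative" or "quantitative"
    unit: Optional[str] = None        # "group" or "single-case"
    randomized: Optional[bool] = None

def classify(study: Study) -> str:
    """Return a coarse evidence-type label from the scheme's distinctions."""
    if not study.is_primary:
        return "secondary study (e.g., a systematic review)"
    parts = [
        study.data_type or "unspecified data type",
        study.unit or "unspecified unit",
        "randomized" if study.randomized else "non-randomized",
    ]
    return "primary study: " + ", ".join(parts)

# A randomized controlled trial under this scheme:
print(classify(Study(is_primary=True, data_type="quantitative",
                     unit="group", randomized=True)))
# -> primary study: quantitative, group, randomized
```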

Application of these basic distinctions may avoid some of the potential difficulties associated with study design labels and taxonomies. Nevertheless, debatable methodological issues are raised when certain types of research identified in this scheme are included in an evidence synthesis. We briefly highlight those associated with inclusion of non-randomized studies, case reports and series, and a combination of primary and secondary studies.

Non-randomized studies

When investigating an intervention’s effectiveness, it is important for authors to recognize the uncertainty of observed effects reported by studies with high RoB. Results of statistical analyses that include such studies need to be interpreted with caution in order to avoid misleading conclusions [ 74 ]. Review authors may consider excluding randomized studies with high RoB from meta-analyses. Non-randomized studies of intervention (NRSI) are affected by a greater potential range of biases and thus vary more than RCTs in their ability to estimate a causal effect [ 79 ]. If data from NRSI are synthesized in meta-analyses, it is helpful to separately report their summary estimates [ 6 , 74 ].

Nonetheless, certain design features of NRSI (eg, which parts of the study were prospectively designed) may help to distinguish stronger from weaker ones. Cochrane recommends that authors of a review including NRSI focus on relevant study design features when determining eligibility criteria instead of relying on non-informative study design labels [ 79 , 80 ]. This process is facilitated by a study design feature checklist; guidance on using the checklist is included with the developers’ description of the tool [ 73 , 74 ]. Authors collect information about these design features during data extraction and then consider it when making final study selection decisions and when performing RoB assessments of the included NRSI.

Case reports and case series

Correctly identified case reports and case series can contribute evidence not well captured by other designs [ 81 ]; in addition, some topics may be limited to a body of evidence that consists primarily of uncontrolled clinical observations. Murad and colleagues offer a framework for how to include case reports and series in an evidence synthesis [ 82 ]. Distinguishing between cohort studies and case series in these syntheses is important, especially for those that rely on evidence from NRSI. Additional data obtained from studies misclassified as case series can potentially increase the confidence in effect estimates. Mathes and Pieper provide authors of evidence syntheses with specific guidance on distinguishing between cohort studies and case series, but emphasize the increased workload involved [ 77 ].

Primary and secondary studies

Synthesis of combined evidence from primary and secondary studies may provide a broad perspective on the entirety of available literature on a topic. This is, in fact, the recommended strategy for scoping reviews that may include a variety of sources of evidence (eg, CPGs, popular media). However, except for scoping reviews, the synthesis of data from primary and secondary studies is discouraged unless there are strong reasons to justify doing so.

Combining primary and secondary sources of evidence is challenging for authors of other types of evidence syntheses for several reasons [ 83 ]. Assessments of RoB for primary and secondary studies are derived from conceptually different tools, thus obfuscating the ability to make an overall RoB assessment of a combination of these study types. In addition, authors who include primary and secondary studies must devise non-standardized methods for synthesis. Note this contrasts with well-established methods available for updating existing evidence syntheses with additional data from new primary studies [ 84 – 86 ]. However, a new review that synthesizes data from primary and secondary studies raises questions of validity and may unintentionally support a biased conclusion because no existing methodological guidance is currently available [ 87 ].

Recommendations

We suggest that journal editors require authors to identify which type of evidence synthesis they are submitting and reference the specific methodology used for its development. This will clarify the research question and methods for peer reviewers and potentially simplify the editorial process. Editors should announce this practice and include it in the instructions to authors. To decrease bias and apply correct methods, authors must also accurately identify the types of research evidence included in their syntheses.

Part 3. Conduct and reporting

The need to develop criteria to assess the rigor of systematic reviews was recognized soon after the EBM movement began to gain international traction [ 88 , 89 ]. Systematic reviews rapidly became popular, but many were very poorly conceived, conducted, and reported. These problems remain highly prevalent [ 23 ] despite development of guidelines and tools to standardize and improve the performance and reporting of evidence syntheses [ 22 , 28 ]. Table 3.1  provides some historical perspective on the evolution of tools developed specifically for the evaluation of systematic reviews, with or without meta-analysis.

Tools specifying standards for systematic reviews with and without meta-analysis

a Currently recommended

b Validated tool for systematic reviews of interventions developed for use by authors of overviews or umbrella reviews

These tools are often interchangeably invoked when referring to the “quality” of an evidence synthesis. However, quality is a vague term that is frequently misused and misunderstood; more precisely, these tools specify different standards for evidence syntheses. Methodological standards address how well a systematic review was designed and performed [ 5 ]. RoB assessments refer to systematic flaws or limitations in the design, conduct, or analysis of research that distort the findings of the review [ 4 ]. Reporting standards help systematic review authors describe the methodology they used and the results of their synthesis in sufficient detail [ 92 ]. It is essential to distinguish between these evaluations: a systematic review may be biased, it may fail to report sufficient information on essential features, or it may exhibit both problems; a thoroughly reported evidence synthesis may still be biased and flawed, while an otherwise unbiased one may suffer from deficient documentation.

We direct attention to the currently recommended tools listed in Table 3.1 but concentrate on AMSTAR-2 (update of AMSTAR [A Measurement Tool to Assess Systematic Reviews]) and ROBIS (Risk of Bias in Systematic Reviews), which evaluate methodological quality and RoB, respectively. For comparison and completeness, we include PRISMA 2020 (update of the 2009 Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement), which offers guidance on reporting standards. The exclusive focus on these three tools is by design; it addresses concerns related to the considerable variability in tools used for the evaluation of systematic reviews [ 28 , 88 , 96 , 97 ]. We highlight the underlying constructs these tools were designed to assess, then describe their components and applications. Their known (or potential) uptake, impact, and limitations are also discussed.

Evaluation of conduct

Development

AMSTAR [ 5 ] was in use for a decade prior to the 2017 publication of AMSTAR-2; both provide a broad evaluation of methodological quality of intervention systematic reviews, including flaws arising through poor conduct of the review [ 6 ]. ROBIS, published in 2016, was developed to specifically assess RoB introduced by the conduct of the review; it is applicable to systematic reviews of interventions and several other types of reviews [ 4 ]. Both tools reflect a shift to a domain-based approach as opposed to generic quality checklists. There are a few items unique to each tool; however, similarities between items have been demonstrated [ 98 , 99 ]. AMSTAR-2 and ROBIS are recommended for use by: 1) authors of overviews or umbrella reviews and CPGs to evaluate systematic reviews considered as evidence; 2) authors of methodological research studies to appraise included systematic reviews; and 3) peer reviewers for appraisal of submitted systematic review manuscripts. For authors, these tools may function as teaching aids and inform conduct of their review during its development.

Description

Systematic reviews that include randomized and/or non-randomized studies as evidence can be appraised with AMSTAR-2 and ROBIS. Other characteristics of AMSTAR-2 and ROBIS are summarized in Table 3.2 . Both tools define categories for an overall rating; however, neither tool is intended to generate a total score by simply calculating the number of responses satisfying criteria for individual items [ 4 , 6 ]. AMSTAR-2 focuses on the rigor of a review’s methods irrespective of the specific subject matter. ROBIS places emphasis on a review’s results section; this suggests it may be optimally applied by appraisers with some knowledge of the review’s topic, as they may be better equipped to determine if certain procedures (or lack thereof) would impact the validity of a review’s findings [ 98 , 100 ]. Reliability studies show AMSTAR-2 overall confidence ratings strongly correlate with the overall RoB ratings in ROBIS [ 100 , 101 ].

Comparison of AMSTAR-2 and ROBIS

a ROBIS includes an optional first phase to assess the applicability of the review to the research question of interest. The tool may be applicable to other review types in addition to the four specified, although modification of this initial phase will be needed (Personal Communication via email, Penny Whiting, 28 Jan 2022)

b AMSTAR-2 item #9 and #11 require separate responses for RCTs and NRSI

Interrater reliability has been shown to be acceptable for AMSTAR-2 [ 6 , 11 , 102 ] and ROBIS [ 4 , 98 , 103 ] but neither tool has been shown to be superior in this regard [ 100 , 101 , 104 , 105 ]. Overall, variability in reliability for both tools has been reported across items, between pairs of raters, and between centers [ 6 , 100 , 101 , 104 ]. The effects of appraiser experience on the results of AMSTAR-2 and ROBIS require further evaluation [ 101 , 105 ]. Updates to both tools should address items shown to be prone to individual appraisers’ subjective biases and opinions [ 11 , 100 ]; this may involve modifications of the current domains and signaling questions as well as incorporation of methods to make an appraiser’s judgments more explicit. Future revisions of these tools may also consider the addition of standards for aspects of systematic review development currently lacking (eg, rating overall certainty of evidence, [ 99 ] methods for synthesis without meta-analysis [ 105 ]) and removal of items that assess aspects of reporting that are thoroughly evaluated by PRISMA 2020.

Application

A good understanding of what is required to satisfy the standards of AMSTAR-2 and ROBIS involves study of the accompanying guidance documents written by the tools’ developers; these contain detailed descriptions of each item’s standards. In addition, accurate appraisal of a systematic review with either tool requires training. Most experts recommend independent assessment by at least two appraisers with a process for resolving discrepancies as well as procedures to establish interrater reliability, such as pilot testing, a calibration phase or exercise, and development of predefined decision rules [ 35 , 99 – 101 , 103 , 104 , 106 ]. These methods may, to some extent, address the challenges associated with the diversity in methodological training, subject matter expertise, and experience using the tools that are likely to exist among appraisers.

The standards of AMSTAR, AMSTAR-2, and ROBIS have been used in many methodological studies and epidemiological investigations. However, the increased publication of overviews or umbrella reviews and CPGs has likely been a greater influence on the widening acceptance of these tools. Critical appraisal of the secondary studies considered as evidence is essential to the trustworthiness of both the recommendations of CPGs and the conclusions of overviews. Currently both Cochrane [ 55 ] and JBI [ 107 ] recommend AMSTAR-2 and ROBIS in their guidance for authors of overviews or umbrella reviews. However, ROBIS and AMSTAR-2 were released in 2016 and 2017, respectively; thus, to date, limited data have been reported about the uptake of these tools or which of the two may be preferred [ 21 , 106 ]. Currently, in relation to CPGs, AMSTAR-2 appears to be overwhelmingly popular compared to ROBIS. A Google Scholar search of this topic (search terms “AMSTAR 2 AND clinical practice guidelines” and “ROBIS AND clinical practice guidelines”; 13 May 2022) found 12,700 hits for AMSTAR-2 and 1,280 for ROBIS. The apparent greater appeal of AMSTAR-2 may relate to its longer track record, given the original version of the tool was in use for 10 years prior to its update in 2017.

Barriers to the uptake of AMSTAR-2 and ROBIS include the real or perceived time and resources necessary to complete the items they include and appraisers’ confidence in their own ratings [ 104 ]. Reports from comparative studies available to date indicate that appraisers find AMSTAR-2 questions, responses, and guidance to be clearer and simpler compared with ROBIS [ 11 , 101 , 104 , 105 ]. This suggests that for appraisal of intervention systematic reviews, AMSTAR-2 may be a more practical tool than ROBIS, especially for novice appraisers [ 101 , 103 – 105 ]. The unique characteristics of each tool, as well as their potential advantages and disadvantages, should be taken into consideration when deciding which tool should be used for an appraisal of a systematic review. In addition, the choice of one or the other may depend on how the results of an appraisal will be used; for example, a peer reviewer’s appraisal of a single manuscript versus an appraisal of multiple systematic reviews in an overview or umbrella review, CPG, or systematic methodological study.

Authors of overviews and CPGs report results of AMSTAR-2 and ROBIS appraisals for each of the systematic reviews they include as evidence. Ideally, an independent judgment of their appraisals can be made by the end users of overviews and CPGs; however, most stakeholders, including clinicians, are unlikely to have a sophisticated understanding of these tools. Nevertheless, they should at least be aware that AMSTAR-2 and ROBIS ratings reported in overviews and CPGs may be inaccurate when the tools are not applied as intended by their developers. This can result from inadequate training of the overview or CPG authors who perform the appraisals, or from modifications of the appraisal tools imposed by them. The potential variability in overall confidence and RoB ratings highlights why appraisers applying these tools need to support their judgments with explicit documentation; this allows readers to judge for themselves whether they agree with the criteria used by appraisers [ 4 , 108 ]. When these judgments are explicit, the underlying rationale used when applying these tools can be assessed [ 109 ].

Theoretically, we would expect an association of AMSTAR-2 with improved methodological rigor and an association of ROBIS with lower RoB in recent systematic reviews compared to those published before 2017. To our knowledge, this has not yet been demonstrated; however, as with reports about the actual uptake of these tools, time will tell. Additional data on user experience are also needed to further elucidate the practical challenges and methodological nuances encountered with the application of these tools. This information could potentially inform the creation of unifying criteria to guide and standardize the appraisal of evidence syntheses [ 109 ].

Evaluation of reporting

Complete reporting is essential for users to establish the trustworthiness and applicability of a systematic review’s findings. Efforts to standardize and improve the reporting of systematic reviews resulted in the 2009 publication of the PRISMA statement [ 92 ] with its accompanying explanation and elaboration document [ 110 ]. This guideline was designed to help authors prepare a complete and transparent report of their systematic review. In addition, adherence to PRISMA is often used to evaluate the thoroughness of reporting of published systematic reviews [ 111 ]. The updated version, PRISMA 2020 [ 93 ], and its guidance document [ 112 ] were published in 2021. Items on the original and updated versions of PRISMA are organized by the six basic review components they address (title, abstract, introduction, methods, results, discussion). The PRISMA 2020 update is a considerably expanded version of the original; it includes standards and examples for the 27 original and 13 additional reporting items that capture methodological advances and may enhance the replicability of reviews [ 113 ].

The original PRISMA statement fostered the development of various PRISMA extensions (Table 3.3 ). These include reporting guidance for scoping reviews and reviews of diagnostic test accuracy and for intervention reviews that report on the following: harms outcomes, equity issues, the effects of acupuncture, the results of network meta-analyses, and analyses of individual participant data. Detailed reporting guidance for specific systematic review components (abstracts, protocols, literature searches) is also available.

PRISMA extensions

PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses

a Note the abstract reporting checklist is now incorporated into PRISMA 2020 [ 93 ]

Uptake and impact

The 2009 PRISMA standards [ 92 ] for reporting have been widely endorsed by authors, journals, and EBM-related organizations. We anticipate the same for PRISMA 2020 [ 93 ] given its co-publication in multiple high-impact journals. However, to date, there is a lack of strong evidence for an association between improved systematic review reporting and endorsement of PRISMA 2009 standards [ 43 , 111 ]. Most journals require that a PRISMA checklist accompany submissions of systematic review manuscripts. However, the accuracy of information presented on these self-reported checklists is not necessarily verified. It remains unclear which strategies (eg, authors' self-report of checklists, peer reviewer checks) might improve adherence to the PRISMA reporting standards; in addition, the feasibility of any potentially effective strategies must be taken into consideration given the structure and limitations of current research and publication practices [ 124 ].

Pitfalls and limitations of PRISMA, AMSTAR-2, and ROBIS

Misunderstanding of the roles of these tools and their misapplication may be widespread problems. PRISMA 2020 is a reporting guideline that is most beneficial if consulted while a review is being developed, as opposed to merely completing a checklist upon submission to a journal; at that point, the review is finished, with good or bad methodological choices. Moreover, PRISMA checklists evaluate how completely an element of review conduct was reported; they do not evaluate the caliber of conduct or performance of a review. Thus, review authors and readers should not think that a rigorous systematic review can be produced by simply following the PRISMA 2020 guidelines. Similarly, it is important to recognize that AMSTAR-2 and ROBIS are tools to evaluate the conduct of a review but do not substitute for conceptual methodological guidance. In addition, they are not intended to be simple checklists. In fact, they have the potential for misuse or abuse if applied as such; for example, by calculating a total score to make a judgment about a review's overall confidence or RoB. Proper selection of a response for the individual items on AMSTAR-2 and ROBIS requires training or at least reference to their accompanying guidance documents.

Not surprisingly, it has been shown that compliance with the PRISMA checklist is not necessarily associated with satisfying the standards of ROBIS [ 125 ]. AMSTAR-2 and ROBIS were not available when PRISMA 2009 was developed; however, they were considered in the development of PRISMA 2020 [ 113 ]. Therefore, future studies may show a positive relationship between fulfillment of PRISMA 2020 standards for reporting and meeting the standards of tools evaluating methodological quality and RoB.

Choice of an appropriate tool for the evaluation of a systematic review first involves identification of the underlying construct to be assessed. For systematic reviews of interventions, recommended tools include AMSTAR-2 and ROBIS for appraisal of conduct and PRISMA 2020 for completeness of reporting. All three tools were developed rigorously and provide easily accessible and detailed user guidance, which is necessary for their proper application and interpretation. When considering a manuscript for publication, training in these tools can sensitize peer reviewers and editors to major issues that may affect the review’s trustworthiness and completeness of reporting. Judgment of the overall certainty of a body of evidence and formulation of recommendations rely, in part, on AMSTAR-2 or ROBIS appraisals of systematic reviews. Therefore, training on the application of these tools is essential for authors of overviews and developers of CPGs. Peer reviewers and editors considering an overview or CPG for publication must hold their authors to a high standard of transparency regarding both the conduct and reporting of these appraisals.

Part 4. Meeting conduct standards

Many authors, peer reviewers, and editors erroneously equate fulfillment of the items on the PRISMA checklist with superior methodological rigor. For direction on methodology, we refer them to available resources that provide comprehensive conceptual guidance [ 59 , 60 ] as well as primers with basic step-by-step instructions [ 1 , 126 , 127 ]. This section is intended to complement study of such resources by facilitating use of AMSTAR-2 and ROBIS, tools specifically developed to evaluate methodological rigor of systematic reviews. These tools are widely accepted by methodologists; however, in the general medical literature, they are not uniformly selected for the critical appraisal of systematic reviews [ 88 , 96 ].

To enable their uptake, Table 4.1  links review components to the corresponding appraisal tool items. Expectations of AMSTAR-2 and ROBIS are concisely stated, and reasoning provided.

Systematic review components linked to appraisal with AMSTAR-2 and ROBIS a

CoI conflict of interest, MA meta-analysis, NA not addressed, PICO participant, intervention, comparison, outcome, PRISMA-P Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols, RoB risk of bias

a Components shown in bold are chosen for elaboration in Part 4 for one (or both) of two reasons: 1) the component has been identified as potentially problematic for systematic review authors; and/or 2) the component is evaluated by standards of an AMSTAR-2 “critical” domain

b Critical domains of AMSTAR-2 are indicated by *

Issues involved in meeting the standards for seven review components (identified in bold in Table 4.1 ) are addressed in detail. These were chosen for elaboration for one (or both) of two reasons: 1) the component has been identified as potentially problematic for systematic review authors based on consistent reports of their frequent AMSTAR-2 or ROBIS deficiencies [ 9 , 11 , 15 , 88 , 128 , 129 ]; and/or 2) the review component is judged by standards of an AMSTAR-2 “critical” domain. These have the greatest implications for how a systematic review will be appraised: if standards for any one of these critical domains are not met, the review is rated as having “critically low confidence.”

Research question

Specific and unambiguous research questions may have more value for reviews that deal with hypothesis testing. Mnemonics for the various elements of research questions are suggested by JBI and Cochrane (Table 2.1 ). These prompt authors to consider the specialized methods involved in developing different types of systematic reviews; however, while inclusion of the suggested elements makes a research question compatible with a given review type's methods, it does not necessarily make the question itself appropriate. Table 4.2  lists acronyms that may aid in developing the research question. They include overlapping concepts of importance in this time of proliferating reviews of uncertain value [ 130 ]. If these issues are not prospectively contemplated, systematic review authors may establish an overly broad scope, or develop a "runaway" scope that allows them to stray from predefined choices relating to key comparisons and outcomes.

Research question development

a Cummings SR, Browner WS, Hulley SB. Conceiving the research question and developing the study plan. In: Hulley SB, Cummings SR, Browner WS, editors. Designing clinical research: an epidemiological approach; 4th edn. Lippincott Williams & Wilkins; 2007. p. 14–22

b Doran GT. There's a S.M.A.R.T. way to write management's goals and objectives. Manage Rev. 1981;70:35–6

c Johnson BT, Hennessy EA. Systematic reviews and meta-analyses in the health sciences: best practice methods for research syntheses. Soc Sci Med. 2019;233:237–51

Once a research question is established, searching on registry sites and databases for existing systematic reviews addressing the same or a similar topic is necessary in order to avoid contributing to research waste [ 131 ]. Repeating an existing systematic review must be justified, for example, if previous reviews are out of date or methodologically flawed. A full discussion on replication of intervention systematic reviews, including a consensus checklist, can be found in the work of Tugwell and colleagues [ 84 ].

Protocol development is considered a core component of systematic reviews [ 125 , 126 , 132 ]. Review protocols may allow researchers to plan and anticipate potential issues, assess validity of methods, prevent arbitrary decision-making, and minimize bias that can be introduced by the conduct of the review. Registration of a protocol that allows public access promotes transparency of the systematic review’s methods and processes and reduces the potential for duplication [ 132 ]. Thinking early and carefully about all the steps of a systematic review is pragmatic and logical and may mitigate the influence of the authors’ prior knowledge of the evidence [ 133 ]. In addition, the protocol stage is when the scope of the review can be carefully considered by authors, reviewers, and editors; this may help to avoid production of overly ambitious reviews that include excessive numbers of comparisons and outcomes or are undisciplined in their study selection.

Systematic reviews with published prospective protocols have been reported to show greater attainment of AMSTAR standards [ 134 ]. However, completeness of reporting does not seem to differ between reviews with a protocol and those without one [ 135 ]. PRISMA-P [ 116 ] and its accompanying elaboration and explanation document [ 136 ] can be used to guide and assess the reporting of protocols. A final version of the review should fully describe any protocol deviations. Peer reviewers may compare the submitted manuscript with any available pre-registered protocol; this is required if AMSTAR-2 or ROBIS are used for critical appraisal.

There are multiple options for the recording of protocols (Table 4.3 ). Some journals will peer review and publish protocols. In addition, many online sites offer date-stamped and publicly accessible protocol registration. Some of these are exclusively for protocols of evidence syntheses; others are less restrictive and offer researchers the capacity for data storage, sharing, and other workflow features. These sites document protocol details to varying extents and have different requirements [ 137 ]. The most popular site for systematic reviews, the International Prospective Register of Systematic Reviews (PROSPERO), for example, only registers reviews that report on an outcome with direct relevance to human health. The PROSPERO record documents protocols for all types of reviews except literature and scoping reviews. Of note, PROSPERO requires that authors register their review protocols prior to any data extraction [ 133 , 138 ]. The electronic records of most of these registry sites allow authors to update their protocols and facilitate transparent tracking of protocol changes, which are not unexpected during the progress of the review [ 139 ].

Options for protocol registration of evidence syntheses

a Authors are advised to contact their target journal regarding submission of systematic review protocols

b Registration is restricted to approved review projects

c The JBI registry lists review projects currently underway by JBI-affiliated entities. These records include a review’s title, primary author, research question, and PICO elements. JBI recommends that authors register eligible protocols with PROSPERO

d See Pieper and Rombey [ 137 ] for detailed characteristics of these five registries

e See Pieper and Rombey [ 137 ] for other systematic review data repository options

Study design inclusion

For most systematic reviews, broad inclusion of study designs is recommended [ 126 ]. This may allow comparison of results between contrasting study design types [ 126 ]. Certain study designs may be considered preferable depending on the type of review and nature of the research question. However, prevailing stereotypes about what each study design does best may not be accurate. For example, in systematic reviews of interventions, randomized designs are typically thought to answer highly specific questions while non-randomized designs are often expected to reveal greater information about harms or real-world evidence [ 126 , 140 , 141 ]. This may be a false distinction; randomized trials may be pragmatic [ 142 ], they may offer important (and less biased) information on harms [ 143 ], and data from non-randomized trials may not necessarily be more real-world-oriented [ 144 ].

Moreover, there may not be any available evidence reported by RCTs for certain research questions; in some cases, there may not be any RCTs or NRSI. When the available evidence is limited to case reports and case series, it is not possible to test hypotheses or to provide descriptive estimates or associations; however, a systematic review of these studies can still offer important insights [ 81 , 145 ]. When authors anticipate that limited evidence of any kind may be available to inform their research questions, a scoping review can be considered. Alternatively, decisions regarding inclusion of indirect as opposed to direct evidence can be addressed during protocol development [ 146 ]. Including indirect evidence at an early stage of intervention systematic review development allows authors to decide if such studies offer any additional and/or different understanding of treatment effects for their population or comparison of interest. Issues of indirectness of included studies are accounted for later in the process, during determination of the overall certainty of evidence (see Part 5 for details).

Evidence search

Both AMSTAR-2 and ROBIS require systematic and comprehensive searches for evidence. This is essential for any systematic review. Both tools discourage search restrictions based on language and publication source. Given increasing globalism in health care, the practice of including English-only literature should be avoided [ 126 ]. There are many examples in which language bias (different results in studies published in different languages) has been documented [ 147 , 148 ]. This does not mean that all literature, in all languages, is equally trustworthy [ 148 ]; however, the only way to formally probe for the potential of such biases is to consider all languages in the initial search. A search of the gray literature and trial registries may also reveal important details about topics that would otherwise be missed [ 149 – 151 ]. Again, inclusiveness will allow review authors to investigate whether results differ in the gray literature and trial registries [ 41 , 151 – 153 ].

Authors should make every attempt to complete their review within one year, as that is the likely viable life of a search. If that is not possible, the search should be updated close to the time of completion [ 154 ]. Some topics may warrant even less of a delay; in rapidly changing fields (as in the case of the COVID-19 pandemic), even one month may radically change the available evidence.

Excluded studies

AMSTAR-2 requires authors to provide references for any studies excluded at the full text phase of study selection along with reasons for exclusion; this allows readers to feel confident that all relevant literature has been considered for inclusion and that exclusions are defensible.

Risk of bias assessment of included studies

The design of the studies included in a systematic review (eg, RCT, cohort, case series) should not be equated with appraisal of their RoB. To meet AMSTAR-2 and ROBIS standards, systematic review authors must examine RoB issues specific to the design of each primary study they include as evidence. It is unlikely that a single RoB appraisal tool will be suitable for all research designs. In addition to tools for randomized and non-randomized studies, specific tools are available for evaluation of RoB in case reports and case series [ 82 ] and single-case experimental designs [ 155 , 156 ]. Note the RoB tools selected must meet the standards of the appraisal tool used to judge the conduct of the review. For example, AMSTAR-2 identifies four sources of bias specific to RCTs and NRSI that must be addressed by the RoB tool(s) chosen by the review authors. The Cochrane RoB 2 tool [ 157 ] for RCTs and ROBINS-I [ 158 ] for NRSI meet the AMSTAR-2 standards for RoB assessment. Appraisers on the review team should not modify any RoB tool without complete transparency and acknowledgment that they have invalidated the interpretation of the tool as intended by its developers [ 159 ]. Conduct of RoB assessments is not addressed by AMSTAR-2; to meet ROBIS standards, two independent reviewers should complete RoB assessments of included primary studies.

Implications of the RoB assessments must be explicitly discussed and considered in the conclusions of the review. Discussion of the overall RoB of included studies may consider the weight of the studies at high RoB, the importance of the sources of bias in the studies being summarized, and if their importance differs in relationship to the outcomes reported. If a meta-analysis is performed, serious concerns for RoB of individual studies should be accounted for in these results as well. If the results of the meta-analysis for a specific outcome change when studies at high RoB are excluded, readers will have a more accurate understanding of this body of evidence. However, while investigating the potential impact of specific biases is a useful exercise, it is important to avoid over-interpretation, especially when there are sparse data.

Synthesis methods for quantitative data

Syntheses of quantitative data reported by primary studies are broadly categorized as one of two types: meta-analysis and synthesis without meta-analysis (Table 4.4 ). Before deciding on one of these methods, authors should seek methodological advice about whether reported data can be transformed or used in other ways to provide a consistent effect measure across studies [ 160 , 161 ].

Common methods for quantitative synthesis

CI confidence interval (or credible interval, if analysis is done in Bayesian framework)

a See text for descriptions of the types of data combined in each of these approaches

b See Additional File 4  for guidance on the structure and presentation of forest plots

c General approach is similar to aggregate data meta-analysis but there are substantial differences relating to data collection and checking and analysis [ 162 ]. This approach to syntheses is applicable to intervention, diagnostic, and prognostic systematic reviews [ 163 ]

d Examples include meta-regression, hierarchical and multivariate approaches [ 164 ]

e In-depth guidance and illustrations of these methods are provided in Chapter 12 of the Cochrane Handbook [ 160 ]

Meta-analysis

Systematic reviews that employ meta-analysis should not be referred to simply as "meta-analyses." The term meta-analysis strictly refers to a specific statistical technique used when study effect estimates and their variances are available, yielding a quantitative summary of results. In general, methods for meta-analysis involve use of a weighted average of effect estimates from two or more studies. When appropriately applied, meta-analysis increases the precision of the estimated magnitude of effect and can offer useful insights about heterogeneity and estimates of effects. We refer to standard references for a thorough introduction and formal training [ 165 – 167 ].
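To make the weighted-average principle concrete, the following is a minimal sketch of fixed-effect inverse-variance pooling, with Cochran's Q and I² as simple heterogeneity descriptors. All study values are hypothetical, and this is an illustration of the basic arithmetic only, not the implementation used by any meta-analysis package; real analyses involve many additional considerations (eg, random-effects models).

```python
import math

def fixed_effect_meta(effects, variances):
    """Inverse-variance (fixed-effect) pooling of study effect estimates.

    effects:   per-study effect estimates (e.g., log odds ratios)
    variances: per-study variances of those estimates
    Returns the pooled estimate, its 95% CI, and I^2 (%) as a simple
    descriptor of between-study heterogeneity.
    """
    weights = [1.0 / v for v in variances]          # w_i = 1 / var_i
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))              # SE of the pooled estimate
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)   # 95% confidence interval

    # Cochran's Q and I^2 quantify inconsistency across studies
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, ci, i2

# Hypothetical log odds ratios and variances from three trials
log_or = [-0.35, -0.10, -0.28]
var = [0.04, 0.09, 0.02]
est, (lo, hi), i2 = fixed_effect_meta(log_or, var)
print(f"pooled log OR = {est:.3f} (95% CI {lo:.3f} to {hi:.3f}), I2 = {i2:.0f}%")
```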

There are three common approaches to meta-analysis in current health care–related systematic reviews (Table 4.4 ). Aggregate data meta-analysis is the most familiar to authors of evidence syntheses and their end users. This standard meta-analysis combines data on effect estimates reported by studies that investigate similar research questions involving direct comparisons of an intervention and comparator. Results of these analyses provide a single summary intervention effect estimate. If the included studies in a systematic review measure an outcome differently, their reported results may be transformed to make them comparable [ 161 ]. Forest plots visually present essential information about the individual studies and the overall pooled analysis (see Additional File 4  for details).
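For readers unfamiliar with the forest plot itself, a minimal plotting sketch follows, reusing the hypothetical estimates from the pooling sketch above. The study labels and intervals are invented for illustration; published forest plots contain additional elements (study weights, heterogeneity statistics) described in Additional File 4.

```python
import matplotlib.pyplot as plt

# Hypothetical log odds ratios and 95% CIs for three trials plus the pooled estimate
labels = ["Trial A", "Trial B", "Trial C", "Pooled"]
est = [-0.35, -0.10, -0.28, -0.28]
lo = [-0.74, -0.69, -0.56, -0.49]
hi = [0.04, 0.49, 0.00, -0.07]

y = list(range(len(labels), 0, -1))            # plot studies top to bottom
xerr = [[e - l for e, l in zip(est, lo)],      # distance to lower CI bound
        [h - e for h, e in zip(hi, est)]]      # distance to upper CI bound
plt.errorbar(est, y, xerr=xerr, fmt="s", color="black", capsize=3)
plt.axvline(0, linestyle="--", color="gray")   # line of no effect
plt.yticks(y, labels)
plt.xlabel("log odds ratio (95% CI)")
plt.tight_layout()
plt.show()
```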

Less familiar and more challenging meta-analytical approaches used in secondary research include individual participant data (IPD) and network meta-analyses (NMA); PRISMA extensions provide reporting guidelines for both [ 117 , 118 ]. In IPD, the raw data on each participant from each eligible study are re-analyzed as opposed to the study-level data analyzed in aggregate data meta-analyses [ 168 ]. This may offer advantages, including the potential for limiting concerns about bias and allowing more robust analyses [ 163 ]. As suggested by the description in Table 4.4 , NMA is a complex statistical approach. It combines aggregate data [ 169 ] or IPD [ 170 ] for effect estimates from direct and indirect comparisons reported in two or more studies of three or more interventions. This makes it a potentially powerful statistical tool; while multiple interventions are typically available to treat a condition, few have been evaluated in head-to-head trials [ 171 ]. Both IPD and NMA facilitate a broader scope, and potentially provide more reliable and/or detailed results; however, compared with standard aggregate data meta-analyses, their methods are more complicated, time-consuming, and resource-intensive, and they have their own biases, so one needs sufficient funding, technical expertise, and preparation to employ them successfully [ 41 , 172 , 173 ].

Several items in AMSTAR-2 and ROBIS address meta-analysis; thus, understanding the strengths, weaknesses, assumptions, and limitations of methods for meta-analyses is important. According to the standards of both tools, plans for a meta-analysis must be addressed in the review protocol, including reasoning, description of the type of quantitative data to be synthesized, and the methods planned for combining the data. This should not consist of stock statements describing conventional meta-analysis techniques; rather, authors are expected to anticipate issues specific to their research questions. Concern for the lack of training in meta-analysis methods among systematic review authors cannot be overstated. For those with training, the use of popular software (eg, RevMan [ 174 ], MetaXL [ 175 ], JBI SUMARI [ 176 ]) may facilitate exploration of these methods; however, such programs cannot substitute for the accurate interpretation of the results of meta-analyses, especially for more complex meta-analytical approaches.

Synthesis without meta-analysis

There are varied reasons a meta-analysis may not be appropriate or desirable [ 160 , 161 ]. Syntheses that informally use statistical methods other than meta-analysis are variably referred to as descriptive, narrative, or qualitative syntheses or summaries; these terms are also applied to syntheses that make no attempt to statistically combine data from individual studies. However, use of such imprecise terminology is discouraged; in order to fully explore the results of any type of synthesis, some narration or description is needed to supplement the data visually presented in tabular or graphic forms [ 63 , 177 ]. In addition, the term “qualitative synthesis” is easily confused with a synthesis of qualitative data in a qualitative or mixed methods review. “Synthesis without meta-analysis” is currently the preferred description of other ways to combine quantitative data from two or more studies. Use of this specific terminology when referring to these types of syntheses also implies the application of formal methods (Table 4.4 ).

Methods for syntheses without meta-analysis involve structured presentations of the data in any tables and plots. In comparison to narrative descriptions of each study, these are designed to more effectively and transparently show patterns and convey detailed information about the data; they also allow informal exploration of heterogeneity [ 178 ]. In addition, acceptable quantitative statistical methods (Table 4.4 ) are formally applied; however, it is important to recognize these methods have significant limitations for the interpretation of the effectiveness of an intervention [ 160 ]. Nevertheless, when meta-analysis is not possible, the application of these methods is less prone to bias compared with an unstructured narrative description of included studies [ 178 , 179 ].

Vote counting is commonly used in systematic reviews and involves a tally of studies reporting results that meet some threshold of importance applied by review authors. Until recently, it has not typically been identified as a method for synthesis without meta-analysis. Guidance on an acceptable vote counting method based on direction of effect is currently available [ 160 ] and should be used instead of narrative descriptions of such results (eg, “more than half the studies showed improvement”; “only a few studies reported adverse effects”; “7 out of 10 studies favored the intervention”). Unacceptable methods include vote counting by statistical significance or magnitude of effect or some subjective rule applied by the authors.
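As an illustration of the acceptable direction-of-effect approach, the sketch below tallies hypothetical directions of effect across included studies and applies an exact sign test, which asks whether the observed split could plausibly arise by chance. This is a simplified sketch in the spirit of the guidance cited above [ 160 ], not the Cochrane Handbook's worked procedure; the study directions are invented.

```python
from math import comb

def sign_test_p(favorable, total):
    """Two-sided exact sign test: probability of a split at least this
    extreme if direction of effect were due to chance (p = 0.5)."""
    k = max(favorable, total - favorable)
    tail = sum(comb(total, i) for i in range(k, total + 1)) / 2 ** total
    return min(1.0, 2 * tail)

# Hypothetical direction of effect in 10 included studies:
# +1 favors the intervention, -1 favors the comparator
directions = [+1, +1, +1, -1, +1, +1, -1, +1, +1, +1]
favorable = sum(1 for d in directions if d > 0)
print(f"{favorable}/{len(directions)} studies favor the intervention")
print(f"sign test p = {sign_test_p(favorable, len(directions)):.3f}")
```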

AMSTAR-2 and ROBIS standards do not explicitly address conduct of syntheses without meta-analysis, although AMSTAR-2 items 13 and 14 might be considered relevant. Guidance for the complete reporting of syntheses without meta-analysis for systematic reviews of interventions is available in the Synthesis without Meta-analysis (SWiM) guideline [ 180 ] and methodological guidance is available in the Cochrane Handbook [ 160 , 181 ].

Familiarity with AMSTAR-2 and ROBIS makes sense for authors of systematic reviews as these appraisal tools will be used to judge their work; however, training is necessary for authors to truly appreciate and apply methodological rigor. Moreover, judgment of the potential contribution of a systematic review to the current knowledge base goes beyond meeting the standards of AMSTAR-2 and ROBIS. These tools do not explicitly address some crucial concepts involved in the development of a systematic review; this further emphasizes the need for author training.

We recommend that systematic review authors incorporate specific practices or exercises when formulating a research question at the protocol stage. These should be designed to raise the review team's awareness of how to prevent research and resource waste [ 84 , 130 ] and to stimulate careful contemplation of the scope of the review [ 30 ]. Authors' training should also focus on justifiably choosing a formal method for the synthesis of quantitative and/or qualitative data from primary research; both types of data require specific expertise. For typical reviews that involve syntheses of quantitative data, statistical expertise is necessary, initially for decisions about appropriate methods [ 160 , 161 ] and then to inform any meta-analyses [ 167 ] or other statistical methods applied [ 160 ].

Part 5. Rating overall certainty of evidence

Reporting an assessment of the overall certainty of evidence in a systematic review is an important new standard of the updated PRISMA 2020 guidelines [ 93 ]. Systematic review authors are well acquainted with assessing RoB in individual primary studies, but much less familiar with assessment of overall certainty across an entire body of evidence. Yet a reliable way to evaluate this broader concept is now recognized as a vital part of interpreting the evidence.

Historical systems for rating evidence are based on study design and usually involve hierarchical levels or classes of evidence that use numbers and/or letters to designate the level/class. These systems were endorsed by various EBM-related organizations. Professional societies and regulatory groups then widely adopted them, often with modifications for application to the available primary research base in specific clinical areas. In 2002, a report issued by the AHRQ identified 40 systems to rate quality of a body of evidence [ 182 ]. A critical appraisal of systems used by prominent health care organizations published in 2004 revealed limitations in sensibility, reproducibility, applicability to different questions, and usability to different end users [ 183 ]. Persistent use of hierarchical rating schemes to describe overall quality continues to complicate the interpretation of evidence. This is indicated by recent reports of poor interpretability of systematic review results by readers [ 184 – 186 ] and misleading interpretations of the evidence related to the “spin” systematic review authors may put on their conclusions [ 50 , 187 ].

Recognition of the shortcomings of hierarchical rating systems raised concerns that misleading clinical recommendations could result even if based on a rigorous systematic review. In addition, the number and variability of these systems were considered obstacles to quick and accurate interpretations of the evidence by clinicians, patients, and policymakers [ 183 ]. These issues contributed to the development of the GRADE approach. An international working group, which continues to actively evaluate and refine it, first introduced GRADE in 2004 [ 188 ]. Currently, more than 110 organizations from 19 countries have endorsed or are using GRADE [ 189 ].

GRADE approach to rating overall certainty

GRADE offers a consistent and sensible approach for two separate processes: rating the overall certainty of a body of evidence and the strength of recommendations. The former is the expected conclusion of a systematic review, while the latter is pertinent to the development of CPGs. As such, GRADE provides a mechanism to bridge the gap from evidence synthesis to application of the evidence for informed clinical decision-making [ 27 , 190 ]. We briefly examine the GRADE approach but only as it applies to rating overall certainty of evidence in systematic reviews.

In GRADE, use of "certainty" of a body of evidence is preferred over the term "quality" [ 191 ]. Certainty refers to the level of confidence systematic review authors have that, for each outcome, an effect estimate represents the true effect. The GRADE approach to rating confidence in estimates begins with identifying the study type (RCT or NRSI) and then systematically considers criteria to rate the certainty of evidence up or down (Table 5.1 ).

GRADE criteria for rating certainty of evidence

a Applies to randomized studies

b Applies to non-randomized studies

This process results in assignment of one of the four GRADE certainty ratings to each outcome; these are clearly conveyed with the use of basic interpretation symbols (Table 5.2 ) [ 192 ]. Notably, when multiple outcomes are reported in a systematic review, each outcome is assigned a unique certainty rating; thus different levels of certainty may exist in the body of evidence being examined.

GRADE certainty ratings and their interpretation symbols a

a From the GRADE Handbook [ 192 ]

GRADE’s developers acknowledge some subjectivity is involved in this process [ 193 ]. In addition, they emphasize that both the criteria for rating evidence up and down (Table 5.1 ) as well as the four overall certainty ratings (Table 5.2 ) reflect a continuum as opposed to discrete categories [ 194 ]. Consequently, deciding whether a study falls above or below the threshold for rating up or down may not be straightforward, and preliminary overall certainty ratings may be intermediate (eg, between low and moderate). Thus, the proper application of GRADE requires systematic review authors to take an overall view of the body of evidence and explicitly describe the rationale for their final ratings.
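The basic up/down logic just described can be sketched as simple arithmetic, as below. This deliberately mechanical illustration uses hypothetical inputs; as the preceding paragraph emphasizes, actual GRADE ratings reflect a continuum and judgment, not simple addition and subtraction. The domain names in the comments are the standard GRADE criteria; the function and its parameters are invented for illustration.

```python
LEVELS = ["very low", "low", "moderate", "high"]

def grade_certainty(study_type, down, up):
    """Illustrative arithmetic only: GRADE certainty for one outcome.

    study_type: "RCT" starts at high certainty; "NRSI" starts at low.
    down: total levels subtracted across the downgrade domains
          (risk of bias, inconsistency, indirectness, imprecision,
          publication bias).
    up:   total levels added across the upgrade domains
          (large effect, dose-response, plausible confounding).
    """
    start = 3 if study_type == "RCT" else 1
    return LEVELS[min(max(start - down + up, 0), 3)]

# A body of RCT evidence downgraded one level each for risk of bias
# and imprecision ends up rated "low" certainty for that outcome
print(grade_certainty("RCT", down=2, up=0))  # -> low
```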

Advantages of GRADE

Outcomes important to the individuals who experience the problem of interest maintain a prominent role throughout the GRADE process [ 191 ]. These outcomes must inform the research questions (eg, PICO [population, intervention, comparator, outcome]) that are specified a priori in a systematic review protocol. Evidence for these outcomes is then investigated and each critical or important outcome is ultimately assigned a certainty of evidence as the end point of the review. Notably, limitations of the included studies have an impact at the outcome level. Ultimately, the certainty ratings for each outcome reported in a systematic review are considered by guideline panels. They use a different process to formulate recommendations that involves assessment of the evidence across outcomes [ 201 ]. It is beyond our scope to describe the GRADE process for formulating recommendations; however, it is critical to understand how these two outcome-centric concepts of certainty of evidence in the GRADE framework are related and distinguished. An in-depth illustration using examples from recently published evidence syntheses and CPGs is provided in Additional File 5 A (Table AF5A-1).

The GRADE approach is applicable irrespective of whether the certainty of the primary research evidence is high or very low; in some circumstances, indirect evidence of higher certainty may be considered if direct evidence is unavailable or of low certainty [ 27 ]. In fact, most interventions and outcomes in medicine have low or very low certainty of evidence based on GRADE and there seems to be no major improvement over time [ 202 , 203 ]. This is still a very important (even if sobering) realization for calibrating our understanding of medical evidence. A major appeal of the GRADE approach is that it offers a common framework that enables authors of evidence syntheses to make complex judgments about evidence certainty and to convey these with unambiguous terminology. This prevents some common mistakes made by review authors, including overstating results (or under-reporting harms) [ 187 ] and making recommendations for treatment. This is illustrated in Table AF5A-2 (Additional File 5 A), which compares the concluding statements made about overall certainty in a systematic review with and without application of the GRADE approach.

Theoretically, application of GRADE should improve consistency of judgments about certainty of evidence, both between authors and across systematic reviews. In one empirical evaluation conducted by the GRADE Working Group, interrater reliability of two individual raters assessing certainty of the evidence for a specific outcome increased from ~0.3 without using GRADE to ~0.7 by using GRADE [ 204 ]. However, others report variable agreement among those experienced in GRADE assessments of evidence certainty [ 190 ]. Like any other tool, GRADE requires training in order to be properly applied. The intricacies of the GRADE approach and the necessary subjectivity involved suggest that improving agreement may require strict rules for its application; alternatively, use of general guidance and consensus among review authors may result in less consistency but provide important information for the end user [ 190 ].
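The interrater reliability figures cited above are chance-corrected agreement statistics. For readers unfamiliar with how such values are derived, a minimal sketch of Cohen's kappa follows, computed on hypothetical certainty ratings from two raters; it is an illustration of the statistic, not the analysis performed in the cited evaluation.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same items."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance from each rater's marginal frequencies
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical certainty ratings assigned to 8 outcomes by two raters
a = ["high", "moderate", "low", "low", "moderate", "high", "low", "very low"]
b = ["high", "moderate", "moderate", "low", "moderate", "high", "low", "low"]
print(f"kappa = {cohens_kappa(a, b):.2f}")
```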

GRADE caveats

Simply invoking “the GRADE approach” does not automatically ensure GRADE methods were employed by authors of a systematic review (or developers of a CPG). Table 5.3 lists the criteria the GRADE working group has established for this purpose. These criteria highlight the specific terminology and methods that apply to rating the certainty of evidence for outcomes reported in a systematic review [ 191 ], which is different from rating overall certainty across outcomes considered in the formulation of recommendations [ 205 ]. Modifications of standard GRADE methods and terminology are discouraged as these may detract from GRADE’s objectives to minimize conceptual confusion and maximize clear communication [ 206 ].

Criteria for using GRADE in a systematic review a

a Adapted from the GRADE working group [ 206 ]; this list does not contain the additional criteria that apply to the development of a clinical practice guideline

Nevertheless, GRADE is prone to misapplications [ 207 , 208 ], which can distort a systematic review’s conclusions about the certainty of evidence. Systematic review authors without proper GRADE training are likely to misinterpret the terms “quality” and “grade” and to misunderstand the constructs assessed by GRADE versus other appraisal tools. For example, review authors may reference the standard GRADE certainty ratings (Table 5.2 ) to describe evidence for their outcome(s) of interest. However, these ratings are invalidated if authors omit or inadequately perform RoB evaluations of each included primary study. Such deficiencies in RoB assessments are unacceptable but not uncommon, as reported in methodological studies of systematic reviews and overviews [ 104 , 186 , 209 , 210 ]. GRADE ratings are also invalidated if review authors do not formally address and report on the other criteria (Table 5.1 ) necessary for a GRADE certainty rating.

Other caveats pertain to application of a GRADE certainty of evidence rating in various types of evidence syntheses. Current adaptations of GRADE are described in Additional File 5 B and included on Table 6.3 , which is introduced in the next section.

Concise Guide to best practices for evidence syntheses, version 1.0 a

AMSTAR A MeaSurement Tool to Assess Systematic Reviews, CASP Critical Appraisal Skills Programme, CERQual Confidence in the Evidence from Reviews of Qualitative research, ConQual Establishing Confidence in the output of Qualitative research synthesis, COSMIN COnsensus-based Standards for the selection of health Measurement Instruments, DTA diagnostic test accuracy, eMERGe meta-ethnography reporting guidance, ENTREQ enhancing transparency in reporting the synthesis of qualitative research, GRADE Grading of Recommendations Assessment, Development and Evaluation, MA meta-analysis, NRSI non-randomized studies of interventions, P protocol, PRIOR Preferred Reporting Items for Overviews of Reviews, PRISMA Preferred Reporting Items for Systematic Reviews and Meta-Analyses, PROBAST Prediction model Risk Of Bias ASsessment Tool, QUADAS quality assessment of studies of diagnostic accuracy included in systematic reviews, QUIPS Quality In Prognosis Studies, RCT randomized controlled trial, RoB risk of bias, ROBINS-I Risk Of Bias In Non-randomised Studies of Interventions, ROBIS Risk of Bias in Systematic Reviews, ScR scoping review, SWiM systematic review without meta-analysis

a Superscript numbers represent citations provided in the main reference list. Additional File 6 lists links to available online resources for the methods and tools included in the Concise Guide

b The MECIR manual [ 30 ] provides Cochrane’s specific standards for both reporting and conduct of intervention systematic reviews and protocols

c Editorial and peer reviewers can evaluate completeness of reporting in submitted manuscripts using these tools. Authors may be required to submit a self-reported checklist for the applicable tools

d The decision flowchart described by Flemming and colleagues [ 223 ] is recommended for guidance on how to choose the best approach to reporting for qualitative reviews

e SWiM was developed for intervention studies reporting quantitative data. However, if there is not a more directly relevant reporting guideline, SWiM may prompt reviewers to consider the important details to report. (Personal Communication via email, Mhairi Campbell, 14 Dec 2022)

f JBI recommends their own tools for the critical appraisal of various quantitative primary study designs included in systematic reviews of intervention effectiveness, prevalence and incidence, and etiology and risk as well as for the critical appraisal of systematic reviews included in umbrella reviews. However, except for the JBI Checklists for studies reporting prevalence data and qualitative research, the development, validity, and reliability of these tools are not well documented

g Studies that are not RCTs or NRSI require tools developed specifically to evaluate their design features. Examples include single case experimental design [ 155 , 156 ] and case reports and series [ 82 ]

h The evaluation of methodological quality of studies included in a synthesis of qualitative research is debatable [ 224 ]. Authors may select a tool appropriate for the type of qualitative synthesis methodology employed. The CASP Qualitative Checklist [ 218 ] is an example of a published, commonly used tool that focuses on assessment of the methodological strengths and limitations of qualitative studies. The JBI Critical Appraisal Checklist for Qualitative Research [ 219 ] is recommended for reviews using a meta-aggregative approach

i Consider including risk of bias assessment of included studies if this information is relevant to the research question; however, scoping reviews do not include an assessment of the overall certainty of a body of evidence

j Guidance available from the GRADE working group [ 225 , 226 ]; also recommend consultation with the Cochrane diagnostic methods group

k Guidance available from the GRADE working group [ 227 ]; also recommend consultation with Cochrane prognostic methods group

l Used for syntheses in reviews with a meta-aggregative approach [ 224 ]

m Chapter 5 in the JBI Manual offers guidance on how to adapt GRADE to prevalence and incidence reviews [ 69 ]

n Janiaud and colleagues suggest criteria for evaluating evidence certainty for meta-analyses of non-randomized studies evaluating risk factors [ 228 ]

o The COSMIN user manual provides details on how to apply GRADE in systematic reviews of measurement properties [ 229 ]

The expected culmination of a systematic review should be a rating of overall certainty of a body of evidence for each outcome reported. The GRADE approach is recommended for making these judgments for outcomes reported in systematic reviews of interventions and can be adapted for other types of reviews. This represents the initial step in the process of making recommendations based on evidence syntheses. When reviewing any evidence synthesis that reports certainty ratings derived using GRADE, peer reviewers should ensure that authors meet the minimal criteria for use of the GRADE approach. Authors and peer reviewers of evidence syntheses unfamiliar with GRADE are encouraged to seek formal training and take advantage of the resources available on the GRADE website [ 211 , 212 ].

Part 6. Concise Guide to best practices

Accumulating data in recent years suggest that many evidence syntheses (with or without meta-analysis) are not reliable. This relates in part to the fact that their authors, who are often clinicians, can be overwhelmed by the plethora of ways to evaluate evidence. They tend to resort to familiar but often inadequate, inappropriate, or obsolete methods and tools and, as a result, produce unreliable reviews. These manuscripts may not be recognized as such by peer reviewers and journal editors who may disregard current standards. When such a systematic review is published or included in a CPG, clinicians and stakeholders tend to believe that it is trustworthy. This supports a vicious cycle in which inadequate methodology is rewarded and potentially misleading conclusions are accepted. There is no quick or easy way to break this cycle; however, increasing awareness of best practices among all these stakeholder groups, who often have minimal (if any) training in methodology, may begin to mitigate it. This is the rationale for inclusion of Parts 2 through 5 in this guidance document. These sections present core concepts and important methodological developments that inform current standards and recommendations. We conclude by taking a direct and practical approach.

Inconsistent and imprecise terminology used in the context of development and evaluation of evidence syntheses is problematic for authors, peer reviewers and editors, and may lead to the application of inappropriate methods and tools. In response, we endorse use of the basic terms (Table 6.1 ) defined in the PRISMA 2020 statement [ 93 ]. In addition, we have identified several problematic expressions and nomenclature. In Table 6.2 , we compile suggestions for preferred terms less likely to be misinterpreted.

Terms relevant to the reporting of health care–related evidence syntheses a

a Reproduced from Page and colleagues [ 93 ]

Terminology suggestions for health care–related evidence syntheses

a For example, meta-aggregation, meta-ethnography, critical interpretative synthesis, realist synthesis

b This term may best apply to the synthesis in a mixed methods systematic review in which data from different types of evidence (eg, qualitative, quantitative, economic) are summarized [ 64 ]

We also propose a Concise Guide (Table 6.3 ) that summarizes the methods and tools recommended for the development and evaluation of nine types of evidence syntheses. Suggestions for specific tools are based on the rigor of their development as well as the availability of detailed guidance from their developers to ensure their proper application. The formatting of the Concise Guide addresses a well-known source of confusion by clearly distinguishing the underlying methodological constructs that these tools were designed to assess. Important clarifications and explanations follow in the guide’s footnotes; associated websites, if available, are listed in Additional File 6 .

To encourage uptake of best practices, journal editors may consider adopting or adapting the Concise Guide in their instructions to authors and peer reviewers of evidence syntheses. Given the evolving nature of evidence synthesis methodology, the suggested methods and tools are likely to require regular updates. Authors of evidence syntheses should monitor the literature to ensure they are employing current methods and tools. Some types of evidence syntheses (eg, rapid, economic, methodological) are not included in the Concise Guide; for these, authors are advised to obtain recommendations for acceptable methods by consulting with their target journal.

We encourage the appropriate and informed use of the methods and tools discussed throughout this commentary and summarized in the Concise Guide (Table 6.3 ). However, we caution against their application in a perfunctory or superficial fashion. This is a common pitfall among authors of evidence syntheses, especially as the standards of such tools become associated with acceptance of a manuscript by a journal. Consequently, published evidence syntheses may show improved adherence to the requirements of these tools without necessarily making genuine improvements in their performance.

In line with our main objective, the suggested tools in the Concise Guide address the reliability of evidence syntheses; however, we recognize that the utility of systematic reviews is an equally important concern. An unbiased and thoroughly reported evidence synthesis may still not be highly informative if the evidence being summarized is sparse, weak, and/or biased [ 24 ]. Many intervention systematic reviews, including those developed by Cochrane [ 203 ] and those applying GRADE [ 202 ], ultimately find no evidence, or find the evidence to be inconclusive (eg, "weak," "mixed," or of "low certainty"). This often reflects the primary research base; however, it is important to know what is known (or not known) about a topic when considering an intervention for patients and discussing treatment options with them.

Alternatively, the frequency of "empty" and inconclusive reviews published in the medical literature may relate to limitations of conventional methods that focus on hypothesis testing; these have emphasized the importance of statistical significance in primary research and effect sizes from aggregate meta-analyses [ 183 ]. It is becoming increasingly apparent that this approach may not be appropriate for all topics [ 130 ]. Development of the GRADE approach has facilitated a better understanding of significant factors (beyond effect size) that contribute to the overall certainty of evidence. Other notable responses include the development of integrative synthesis methods for the evaluation of complex interventions [ 230 , 231 ], the incorporation of crowdsourcing and machine learning into systematic review workflows (eg, the Cochrane Evidence Pipeline) [ 2 ], the shift in paradigm to living systematic review and NMA platforms [ 232 , 233 ], and the proposal of a new evidence ecosystem that fosters bidirectional collaborations and interactions among a global network of evidence synthesis stakeholders [ 234 ]. These evolutions in data sources and methods may ultimately make evidence syntheses more streamlined, less duplicative, and, more importantly, more useful for timely policy and clinical decision-making; however, that will only be the case if they are rigorously conducted and reported.

We look forward to others’ ideas and proposals for the advancement of methods for evidence syntheses. For now, we encourage dissemination and uptake of the currently accepted best tools and practices for their development and evaluation; at the same time, we stress that uptake of appraisal tools, checklists, and software programs cannot substitute for proper education in the methodology of evidence syntheses and meta-analysis. Authors, peer reviewers, and editors must strive to make accurate and reliable contributions to the present evidence knowledge base; online alerts, upcoming technology, and accessible education may make this more feasible than ever before. Our intention is to improve the trustworthiness of evidence syntheses across disciplines, topics, and types of evidence syntheses. All of us must continue to study, teach, and act cooperatively for that to happen.

Acknowledgements

Michelle Oakman Hayes for her assistance with the graphics, Mike Clarke for his willingness to answer our seemingly arbitrary questions, and Bernard Dan for his encouragement of this project.

Authors’ contributions

All authors participated in the development of the ideas, writing, and review of this manuscript. The author(s) read and approved the final manuscript.

The work of John Ioannidis has been supported by an unrestricted gift from Sue and Bob O’Donnell to Stanford University.

Declarations

The authors declare no competing interests.

This article has been published simultaneously in BMC Systematic Reviews, Acta Anaesthesiologica Scandinavica, BMC Infectious Diseases, British Journal of Pharmacology, JBI Evidence Synthesis, the Journal of Bone and Joint Surgery Reviews, and the Journal of Pediatric Rehabilitation Medicine.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


    It is easy to confuse systematic reviews and meta-analyses. A systematic review is an objective, reproducible method to find answers to a certain research question, by collecting all available studies related to that question and reviewing and analyzing their results. A meta-analysis differs from a systematic review in that it uses statistical ...

  14. Introduction

    "A systematic review attempts to identify, appraise and synthesize all the empirical evidence that meets pre-specified eligibility criteria to answer a given research question. Researchers conducting systematic reviews use explicit methods aimed at minimizing bias, in order to produce more reliable findings that can be used to inform decision ...

  15. Methodology of a systematic review

    A systematic review involves a critical and reproducible summary of the results of the available publications on a particular topic or clinical question. ... the methodology is shown in a structured manner to implement a systematic review. Methodology of a systematic review Actas Urol Esp (Engl Ed). 2018 Oct;42(8):499-506. doi: 10.1016/j.acuro ...

  16. Methodology in conducting a systematic review of systematic reviews of

    Following this, the methods in conducting a systematic review of reviews require consideration of the following aspects, akin to the planning for a systematic review of individual studies: sources, review selection, quality assessment of reviews, presentation of results and implications for practice and research.

  17. PDF Conducting a Systematic Review: Methodology and Steps

    Systematic reviews "Systematic reviews seek to collate evidence that fits pre-specified eligibility criteria in order to answer a specific research question. They aim to minimize bias by using explicit, systematic methods documented in advance with a protocol.1" It is important to highlight that a systematic

  18. Getting Started

    A systematic review is guided filtering and synthesis of all available evidence addressing a specific, focused research question, generally about a specific intervention or exposure. The use of standardized, systematic methods and pre-selected eligibility criteria reduce the risk of bias in identifying, selecting and analyzing relevant studies.

  19. Guidance on Conducting a Systematic Literature Review

    Literature reviews establish the foundation of academic inquires. However, in the planning field, we lack rigorous systematic reviews. In this article, through a systematic search on the methodology of literature review, we categorize a typology of literature reviews, discuss steps in conducting a systematic literature review, and provide suggestions on how to enhance rigor in literature ...

  20. Literature review as a research methodology: An overview and guidelines

    A systematic review can be explained as a research method and process for identifying and critically appraising relevant research, as well as for collecting and analyzing data from said research (Liberati et al., 2009). The aim of a systematic review is to identify all empirical evidence that fits the pre-specified inclusion criteria to answer ...

  21. Home

    A systematic review is a literature review that gathers all of the available evidence matching pre-specified eligibility criteria to answer a specific research question. It uses explicit, systematic methods, documented in a protocol, to minimize bias, provide reliable findings, and inform decision-making. ¹.

  22. Guidance for systematic reviews in journal author instructions

    1 INTRODUCTION. Systematic reviews play a vital role in evidence-based medical practice and decision-making [].Given their crucial role in healthcare [] and the increasing number of published systematic reviews [], ensuring their quality is of utmost importance.Nonetheless, systematic reviews with poor-quality search methods are still being published.

  23. Barriers and enablers to the implementation of patient-reported outcome

    Step 1—assessing the quality of included reviews. In the first step, two reviewers will independently assess the methodological quality of the reviews using the JBI Critical Appraisal Checklist for Systematic Reviews and Research Syntheses, presented in Supplementary material 3.We have selected this checklist for its comprehensiveness, applicability to different types of knowledge syntheses ...

  24. Identifying and understanding benefits associated with return-on

    This led to 8 separate review stages. Stage 1; clarifying research question, involved background reading as is discussed in our protocol on PROSPERO. ... 19 quantitative studies, three qualitative studies, eight mixed-methods studies, and eight systematic reviews. Of the 38 studies, 39% reported or addressed 80%-100% of all items required, 43% ...

  25. Autistic Camouflaging and its Relationship with Mental Health

    However, no review has systematically examined psychosocial influences on camouflaging and well-being. This mixed-methods systematic review aimed to critically synthesize qualitative and quantitative research on psychosocial factors associated with camouflaging and its relationship with mental well-being in autistic and non-autistic people.

  26. Systematic Reviews and Meta-analysis: Understanding the Best Evidence

    With the view to address this challenge, the systematic review method was developed. Systematic reviews aim to inform and facilitate this process through research synthesis of multiple studies, enabling increased and efficient access to evidence.[1,3,4] Systematic reviews and meta-analyses have become increasingly important in healthcare settings.

  27. Research in marine accidents: A bibliometric analysis, systematic

    This study identified the evolution and changes of researchers, journals, the major issues, the research methods and the data sources. Sepehri et al. (2022) 110 papers: Systematic literature review; The analysis of the applications and technologies of shipping 4.0; The introduction of the conceptual framework;

  28. MicroRNA expression as a prognostic biomarker of tongue squamous cell

    Protocol and registration. Following the Preferred Reporting Items for Systematic Review and Meta-analyses (PRISMA) guidelines [], researchers executed a systematic review encompassing aspects such as protocol, inclusion criteria, search strategy, and outcomes.This systematic review has been catalogued on PROSPERO under the registration number CRD 42,023,391,953.

  29. Guidance to best tools and practices for systematic reviews

    Data continue to accumulate indicating that many systematic reviews are methodologically flawed, biased, redundant, or uninformative. Some improvements have occurred in recent years based on empirical methods research and standardization of appraisal tools; however, many authors do not routinely or consistently apply these updated methods.

  30. Research Contents, Methods and Prospects of Emotional ...

    There are various types of literature reviews, and this section first analyzes how people react to their surroundings and then provides an in-depth analysis of the research on the evaluation of the built environment using quantitative perception methods through a systematic review and meta-analysis (PRISMA) approach . It systematically reviews ...