• Open access
  • Published: 01 August 2019

A step by step guide for conducting a systematic review and meta-analysis with simulation data

  • Gehad Mohamed Tawfik (1, 2)
  • Kadek Agus Surya Dila (2, 3)
  • Muawia Yousif Fadlelmola Mohamed (2, 4)
  • Dao Ngoc Hien Tam (2, 5)
  • Nguyen Dang Kien (2, 6)
  • Ali Mahmoud Ahmed (2, 7)
  • Nguyen Tien Huy (8, 9, 10)

Tropical Medicine and Health, volume 47, Article number: 46 (2019)


The number of studies relating to tropical medicine and health has increased strikingly over the last few decades. In the field of tropical medicine and health, a well-conducted systematic review and meta-analysis (SR/MA) is considered a feasible solution for keeping clinicians abreast of current evidence-based medicine. Understanding the steps of an SR/MA is essential for conducting one, and the process is not easy, as researchers may face many obstacles. To address these obstacles, this methodology study aims to provide a step-by-step approach, mainly for beginners and junior researchers in tropical medicine and other health-care fields, on how to conduct an SR/MA properly. All the steps described here reflect our experience and expertise combined with well-known and accepted international guidance.

We suggest that all steps of an SR/MA be performed independently by 2–3 reviewers, with disagreements resolved by discussion, to ensure data quality and accuracy.

SR/MA steps include developing the research question, forming eligibility criteria, designing the search strategy, searching databases, registering the protocol, title and abstract screening, full-text screening, manual searching, data extraction, quality assessment, data checking, statistical analysis, double data checking, and manuscript writing.

Introduction

The number of studies published in the biomedical literature, especially in tropical medicine and health, has increased strikingly over the last few decades. This abundance of literature makes clinical medicine increasingly complex, and knowledge from various studies is often needed to inform a particular clinical decision. However, available studies are often heterogeneous in design, operational quality, and subjects studied, and may address the research question in different ways, which adds to the complexity of synthesizing evidence and conclusions [ 1 ].

Systematic reviews and meta-analyses (SR/MAs) provide a high level of evidence, as represented by the evidence-based pyramid. Therefore, a well-conducted SR/MA is considered a feasible way of keeping clinicians up to date with contemporary evidence-based medicine.

Unlike a systematic review, an unsystematic narrative review tends to be descriptive: the authors often select articles based on their own point of view, which lowers its quality. A systematic review, on the other hand, uses a systematic method, following a detailed and comprehensive study plan, to summarize the evidence on a question. Despite the growing number of guidelines on conducting a systematic review effectively, we found that the basic steps usually are: framing the question; identifying relevant work, which consists of developing criteria and searching for articles; appraising the quality of included studies; summarizing the evidence; and interpreting the results [ 2 , 3 ]. However, these simple steps are not easy to carry out in practice, and researchers may struggle with many problems for which no detailed guidance exists.

Conducting an SR/MA in tropical medicine and health may be difficult, especially for young researchers; therefore, understanding its essential steps is crucial. To help with the obstacles researchers face, we provide a flow diagram (Fig. 1 ) that illustrates the stages of an SR/MA step by step. This methodology study aims to provide a step-by-step approach, mainly for beginners and junior researchers in tropical medicine and other health-care fields, on how to conduct an SR/MA properly and succinctly; all the steps described here reflect our experience and expertise combined with well-known and accepted international guidance.

Figure 1. Detailed flow diagram of the steps of a systematic review and meta-analysis. Note: the star icon means “2–3 reviewers screen independently”.

Methods and results

Detailed steps for conducting any systematic review and meta-analysis.

We reviewed the methods reported in published SR/MAs in tropical medicine and other health-care fields, as well as published guidelines such as the Cochrane guidelines [ 4 ], to identify the least biased method for each step of conducting an SR/MA. We also drew on the guidelines that we apply in our own studies for all SR/MA steps. We combined these methods to construct a detailed flow diagram showing how the SR/MA steps are conducted.

Any SR/MA must follow the widely accepted Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (PRISMA 2009 checklist; Additional file 5 : Table S1) [ 5 ].

We illustrate our methods with an explanatory simulation example on the topic “evaluating the safety of Ebola vaccine,” as Ebola is a rare but highly fatal tropical disease. All the methods described follow internationally accepted standards, complemented by our compiled experience in conducting SRs, which we believe lends them some validity. This SR is being conducted by a couple of researchers in our research group. We chose the topic because the 2013–2016 Ebola outbreak in Africa caused substantial mortality and morbidity, and because many published and ongoing trials assess the safety of Ebola vaccines, which makes this hotly debated issue a good opportunity. Moreover, Ebola re-emerged in a new fatal outbreak in the Democratic Republic of Congo in August 2018, which, according to the World Health Organization, had infected more than 1000 people and killed 629 at the time of writing. It is therefore considered the second worst Ebola outbreak, after the 2013–2016 West African outbreak, which infected more than 26,000 people and killed about 11,300 over its course.

Research question and objectives

Like other study designs, an SR/MA needs a research question that is feasible, interesting, novel, ethical, and relevant; therefore, a clear, logical, and well-defined question should be formulated. Two tools are commonly used: PICO and SPIDER. PICO (Population, Intervention, Comparison, Outcome) is used mostly in quantitative evidence synthesis, whereas SPIDER (Sample, Phenomenon of Interest, Design, Evaluation, Research type) was proposed for qualitative and mixed-methods searches. Methley et al. showed that PICO is more sensitive, while SPIDER is more specific [ 6 ].

We recommend using either or both of the SPIDER and PICO tools to obtain a comprehensive search, depending on time and resource limitations. Applied to our assumed research topic, which is of a qualitative nature, the SPIDER approach is more appropriate.

PICO is usually used for SR/MAs of clinical trials. For observational studies (without an intervention or comparator), which address many tropical and epidemiological questions, P (population) and O (outcome) alone are usually enough to formulate a research question. For an interventional question, we must clearly specify the population (P), then the intervention (I) or exposure, then the comparator (C) against which the intervention is compared (e.g., placebo or another intervention), and finally the relevant outcomes (O).

To aid comprehension, we use Ebola virus disease (EVD) as an example. At the time of writing, EVD vaccines were under development in phase I, II, and III clinical trials; we want to know whether these vaccines are safe and induce sufficient immunogenicity in the subjects.

An example of an SR/MA research question based on PICO for this issue is: What are the safety and immunogenicity of Ebola vaccines in humans? (P: healthy human subjects; I: vaccination; C: placebo; O: safety or adverse effects, and immunogenicity)

Preliminary research and idea validation

We recommend a preliminary search to identify relevant articles, confirm the validity of the proposed idea, avoid duplicating questions that have already been addressed, and ensure that enough articles exist for the analysis. Themes should focus on relevant and important health-care issues, consider global needs and values, reflect current science, and be consistent with the adopted review methods. Gaining a deep understanding of the field through relevant videos and discussions is of paramount importance for retrieving better results. If this step is skipped, the study may have to be abandoned when a similar, previously published review is discovered, wasting effort on a question that has already been answered.

To do this, we can start with a simple search in PubMed or Google Scholar using the terms Ebola AND vaccine. In doing so, we identified a systematic review and meta-analysis of the determinants of antibody response after Ebola vaccination in non-human primates and humans [ 7 ], which is worth reading to gain deeper insight and to identify gaps that help refine our research question. We can still conduct an SR/MA of Ebola vaccines because we evaluate a different outcome (safety) and a different population (humans only).

Inclusion and exclusion criteria

Eligibility criteria are based on the PICO approach, study design, and date. Common reasons for exclusion are irrelevance, duplication, unavailable full text, or abstract-only publication. These exclusions should be stated in advance to protect the researcher from bias. Inclusion criteria cover articles on the target patients, the investigated interventions, or the comparison between the studied interventions; in short, articles that contain information answering our research question. Most importantly, that information, whether positive or negative, should be clear and sufficient to answer the question.

For the topic we have chosen, the inclusion criteria can be: (1) any clinical trial evaluating the safety of an Ebola vaccine and (2) no restriction on country, patient age, race, gender, publication language, or date. The exclusion criteria are: (1) studies of Ebola vaccine in non-human subjects or in vitro; (2) studies whose data cannot be reliably extracted, or with duplicate or overlapping data; (3) abstract-only papers such as preceding papers, conference abstracts, editorials, author responses, theses, and books; (4) articles without an available full text; and (5) case reports, case series, and systematic reviews. The PRISMA flow diagram template used in SR/MA studies is shown in Fig. 2 .
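The example criteria can be written as an ordered checklist that always returns an exclusion reason, which the screening sheet needs anyway. The sketch below is a hypothetical Python illustration: the `Record` fields and the reason strings are assumptions, and deciding whether a trial actually evaluates Ebola vaccine safety remains a reviewer judgment outside the code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical record layout; a real screening sheet will have its own columns.
@dataclass
class Record:
    design: str                # e.g. "clinical trial", "case report"
    human_subjects: bool       # False for animal or in vitro studies
    full_text_available: bool
    abstract_only: bool        # conference abstract, editorial, author response, thesis, book
    duplicate: bool = False    # duplicate or overlapping data

EXCLUDED_DESIGNS = {"case report", "case series", "systematic review"}

def screen(record: Record) -> Tuple[bool, Optional[str]]:
    """Apply the example exclusion criteria in order; return (include, reason)."""
    if not record.human_subjects:
        return False, "non-human or in vitro study"
    if record.duplicate:
        return False, "duplicate or overlapping data"
    if record.abstract_only:
        return False, "abstract-only or non-original publication"
    if not record.full_text_available:
        return False, "full text not available"
    if record.design in EXCLUDED_DESIGNS:
        return False, record.design
    if record.design != "clinical trial":
        return False, "not a clinical trial"
    # Country, age, race, gender, language and date are deliberately not restricted.
    return True, None

print(screen(Record("clinical trial", True, True, False)))  # (True, None)
print(screen(Record("case series", True, True, False)))     # (False, 'case series')
```

Checking the criteria in a fixed order means each excluded record gets exactly one, reproducible reason for the PRISMA flow diagram.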

Figure 2. PRISMA flow diagram of study screening and selection

Search strategy

A standard search strategy is first built in PubMed and then modified for each specific database to obtain the most relevant results. The basic strategy is derived from the research question formulation (i.e., PICO or PICOS). Search strategies combine free-text terms (e.g., in the title and abstract) with any appropriate subject indexing (e.g., MeSH) expected to retrieve eligible studies, ideally with help from an expert in the review topic or an information specialist. We advise against including terms for the outcomes: outcomes are often not mentioned explicitly in titles or abstracts, so outcome terms can prevent the database from retrieving eligible studies.

The search terms are refined through trial searches, adding other relevant terms for each concept found in the retrieved papers. To restrict the search to clinical trials, we can use these descriptors in PubMed: "clinical trial"[Publication Type] OR "clinical trials as topic"[MeSH Terms] OR "clinical trial"[All Fields]. After several rounds of trial and refinement, the final PubMed search is: (ebola OR ebola virus OR ebola virus disease OR EVD) AND (vaccine OR vaccination OR vaccinated OR immunization) AND ("clinical trial"[Publication Type] OR "clinical trials as topic"[MeSH Terms] OR "clinical trial"[All Fields]). Because few studies exist on this topic, we do not include outcome terms (safety and immunogenicity), so as to capture more studies.
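The final PubMed query follows a simple rule: synonyms within one concept are joined with OR, and concepts are joined with AND. The minimal Python sketch below rebuilds the query from concept lists so that terms can be added during refinement without hand-editing brackets. The `esearch_url` helper is an assumption beyond the article, which uses the PubMed web interface: it only builds (and does not fetch) a URL for NCBI's E-utilities ESearch endpoint, where `retmax=0` asks for the hit count alone.

```python
from urllib.parse import urlencode

def or_block(terms):
    """Join the synonyms of one concept with OR inside parentheses."""
    return "(" + " OR ".join(terms) + ")"

def build_query(*concepts):
    """AND together one OR-block per concept (population, intervention, design)."""
    return " AND ".join(or_block(c) for c in concepts)

population = ["ebola", "ebola virus", "ebola virus disease", "EVD"]
intervention = ["vaccine", "vaccination", "vaccinated", "immunization"]
design = ['"clinical trial"[Publication Type]',
          '"clinical trials as topic"[MeSH Terms]',
          '"clinical trial"[All Fields]']
# No outcome block: outcome terms are left out on purpose to keep sensitivity high.
query = build_query(population, intervention, design)

def esearch_url(term, retmax=0):
    """Build an NCBI E-utilities ESearch URL; retmax=0 requests only the hit count."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?" + urlencode(params)

print(query)
print(esearch_url(query))
```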

Searching databases, importing all results to a library, and exporting to an Excel sheet

According to the AMSTAR guidelines, at least two databases should be searched in an SR/MA [ 8 ], but searching more databases increases the yield and makes the results more accurate and comprehensive. The choice and order of databases depend mostly on the review question; for a review of clinical trials, you will rely mostly on Cochrane, mRCTs, or the International Clinical Trials Registry Platform (ICTRP). Here, we propose 12 databases (PubMed, Scopus, Web of Science, EMBASE, GHL, VHL, Cochrane, Google Scholar, ClinicalTrials.gov, mRCTs, POPLINE, and SIGLE), which together cover almost all published articles in tropical medicine and other health-related fields. Among these, POPLINE focuses on reproductive health, so researchers should choose the databases relevant to their topic. Some databases do not support Boolean operators or quotation marks, and others have their own search syntax. Therefore, the initial search terms must be adapted to each database to obtain appropriate results; guidance for searching each online database is presented in Additional file 5 : Table S2, and the detailed search strategy for each database in Additional file 5 : Table S3. For example, the search term created in PubMed needs customization to the specific features of Google Scholar; an advanced search there for our topic is as follows:

With all of the words: ebola virus

With at least one of the words: vaccine vaccination vaccinated immunization

Where my words occur: in the title of the article

With all of the words: EVD

Finally, all records are collected into one EndNote library to delete duplicates and are then exported to an Excel sheet. The duplicate-removal function must be run with two matching options: references that have (1) the same title and author and the same publication year, or (2) the same title and author and the same journal, are deleted. The remaining references are exported to an Excel file with the essential information for screening, such as the authors' names, publication year, journal, DOI, URL link, and abstract.
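The two matching rules can also be applied to records already exported to a spreadsheet. The Python sketch below is a hypothetical equivalent of EndNote's duplicate finder, not a description of how EndNote works: it assumes each record is a dict with `title`, `author`, `year`, and `journal` keys, and it matches exactly after lower-casing and stripping punctuation, so near-duplicates with typos still need the manual check described in the next step.

```python
import re

def _norm(text):
    """Lower-case and strip punctuation so formatting differences do not hide duplicates."""
    return re.sub(r"[^a-z0-9]+", " ", str(text or "").lower()).strip()

def remove_duplicates(records):
    """Keep the first record of each duplicate group.

    Rule 1: same title, author and year.  Rule 2: same title, author and journal.
    A rule is skipped for a record whose year (or journal) is missing.
    """
    seen_by_year, seen_by_journal = set(), set()
    kept = []
    for rec in records:
        base = (_norm(rec.get("title")), _norm(rec.get("author")))
        year, journal = _norm(rec.get("year")), _norm(rec.get("journal"))
        key_year, key_journal = base + (year,), base + (journal,)
        if (year and key_year in seen_by_year) or (journal and key_journal in seen_by_journal):
            continue  # duplicate under rule 1 or rule 2
        if year:
            seen_by_year.add(key_year)
        if journal:
            seen_by_journal.add(key_journal)
        kept.append(rec)
    return kept

# Hypothetical records, not real citations.
records = [
    {"title": "Safety of vaccine X", "author": "Smith", "year": 2017, "journal": "Lancet"},
    {"title": "Safety of Vaccine X.", "author": "smith", "year": 2017, "journal": "The Lancet"},
    {"title": "Safety of vaccine X", "author": "Smith", "year": 2018, "journal": "Lancet"},
    {"title": "Another trial", "author": "Lee", "year": 2017, "journal": "Lancet"},
]
print(len(remove_duplicates(records)))  # 2
```

The second record is removed by rule 1 (same title, author, and year despite different journal spelling) and the third by rule 2 (same title, author, and journal, different year).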

Protocol writing and registration

Registering the protocol at an early stage ensures transparency in the research process and protects against duplication. It also documents the team's plan of action, research question, eligibility criteria, intervention/exposure, quality assessment, and pre-analysis plan. Researchers should send the protocol to the principal investigator (PI) for revision before uploading it to a registry. Several registries are available for SR/MAs, such as those of the Cochrane and Campbell collaborations; however, we recommend PROSPERO because registration there is easier. The layout of a protocol template according to PROSPERO can be found in Additional file 5 : File S1.

Title and abstract screening

Retrieved articles are selected for further assessment based on the eligibility criteria, to minimize the chance of including irrelevant articles. According to Cochrane guidance, at least two reviewers must perform this step; because the work can be tiring for beginners and junior researchers, we propose, based on our experience, that at least three reviewers work independently to reduce the chance of error, particularly in teams with many authors, to add scrutiny and ensure proper conduct. Three reviewers generally give better quality than two: when two reviewers disagree, they cannot reach a decision on their own, and a third opinion resolves the deadlock. Examples of systematic reviews on topics relevant to tropical medicine and disease that our research group conducted with this strategy (each by a different group of researchers) and published successfully are [ 9 , 10 , 11 ].

At this step, any remaining duplicates are removed manually as reviewers find them. When a decision on an article is in doubt, the team should be inclusive rather than exclusive until the main leader or PI decides after discussion and consensus. Every excluded record should be given an exclusion reason.
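The "inclusive rather than exclusive" rule can be made explicit when the independent votes are merged. In this minimal Python sketch (vote labels and record IDs are hypothetical), unanimous votes are final and any disagreement keeps the record in and flags it for discussion with the PI; it does not use a majority vote, because a 2-to-1 exclusion would contradict the inclusive rule.

```python
def combine_decisions(decisions):
    """Combine independent title/abstract votes for one record.

    Unanimous votes are final; any disagreement keeps the record in
    ("inclusive rather than exclusive") and flags it for discussion with the PI.
    """
    allowed = {"include", "exclude"}
    if not decisions or not set(decisions) <= allowed:
        raise ValueError("expected a non-empty list of 'include'/'exclude' votes")
    if set(decisions) == {"include"}:
        return "include"
    if set(decisions) == {"exclude"}:
        return "exclude"
    return "discuss"

votes = {"rec1": ["include", "include", "include"],
         "rec2": ["include", "exclude", "exclude"],
         "rec3": ["exclude", "exclude", "exclude"]}
print({rid: combine_decisions(v) for rid, v in votes.items()})
# {'rec1': 'include', 'rec2': 'discuss', 'rec3': 'exclude'}
```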

Full text downloading and screening

Many search engines provide free links to full-text articles. If a full text is not found, we can search research websites such as ResearchGate, which offer an option to request the full text directly from the authors. Other options are to explore the archives of the relevant journals or to ask the PI to purchase the article if possible. As in the previous step, 2–3 reviewers independently decide which full texts to include according to the eligibility criteria, reporting the reasons for excluding articles. Any disagreement is resolved by discussion.

Manual search

To reduce bias, all possibilities should be exhausted by explicit hand-searching to retrieve reports that the initial search may have missed [ 12 ]. We apply five methods of manual searching: searching the references of included studies and reviews, contacting authors and experts, and looking at related or cited articles in PubMed and Google Scholar.

We describe three consecutive methods to increase and refine the yield of manual searching: first, searching the reference lists of included articles; second, citation tracking, in which reviewers track all articles that cite each included article (this may involve searching electronic databases); and third, similarly to citation tracking, following all “related” or “similar” articles. Each method can be performed by 2–3 independent reviewers, and every potentially relevant article must be checked against the inclusion criteria through the same process as the records from the electronic databases, i.e., title/abstract and full-text screening.

We propose independent reviewing in which each team member is assigned a “tag” and a distinct method, so that all results can be compiled at the end to compare and discuss differences, maximize retrieval, and minimize bias. The number of articles included through manual searching should be reported before they are added to the overall included records.
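Compiling the tagged results is a matter of set operations. The hypothetical Python sketch below assumes each tag maps to a set of record IDs. It returns the candidates that the electronic search has not already retrieved, which are the ones to screen and report separately, and, for each tag, the records found only by that method, which is useful for discussing the differences between methods.

```python
def compile_manual_search(results_by_tag, already_screened=frozenset()):
    """Merge manual-search hits that each reviewer recorded under a distinct tag.

    results_by_tag: tag -> set of record IDs found with that tag's method.
    already_screened: IDs already retrieved by the electronic search.
    Returns (new candidates to screen, records found by one tag only).
    """
    all_found = set().union(*results_by_tag.values())
    only_by = {}
    for tag, found in results_by_tag.items():
        others = set().union(*(ids for t, ids in results_by_tag.items() if t != tag))
        only_by[tag] = set(found) - others
    return all_found - set(already_screened), only_by

# Hypothetical tags and record IDs.
hits = {"R1-references": {"a", "b"}, "R2-citing": {"b", "c"}, "R3-related": {"c", "d"}}
new, only_by = compile_manual_search(hits, already_screened={"a"})
print(sorted(new))  # ['b', 'c', 'd']
print({t: sorted(s) for t, s in only_by.items()})
# {'R1-references': ['a'], 'R2-citing': [], 'R3-related': ['d']}
```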

Data extraction and quality assessment

This step involves collecting data from the included full texts into a structured Excel extraction sheet that has been pilot-tested beforehand on a few randomly chosen studies. We recommend extracting both adjusted and unadjusted data, so that the analysis can later pool the estimates adjusted for the largest set of confounding factors [ 13 ]. Extraction should be performed by 2–3 independent reviewers. The sheet is usually divided into study and patient characteristics, outcomes, and the quality assessment (QA) tool.

Data presented only in graphs can be extracted with software tools such as WebPlotDigitizer [ 14 ]. Most of the equations that can be used before analysis, for example to estimate the standard deviation (SD) from other reported statistics, are given in Additional file 5 : File S2, with their sources: Hozo et al. [ 15 ], Wan et al. [ 16 ], and Van Rijkom et al. [ 17 ]. A variety of QA tools are available, depending on study design: the Cochrane risk-of-bias tool for randomized controlled trials [ 18 ] (illustrated in Additional file 1 : Figure S1 and Additional file 2 : Figure S2, using data from a previously published article [ 19 ]), the NIH tool for observational and cross-sectional studies [ 20 ], the ROBINS-I tool for non-randomized studies of interventions [ 21 ], QUADAS-2 for diagnostic studies, QUIPS for prognostic studies, CARE for case reports, and ToxRtool for in vivo and in vitro studies. We recommend that 2–3 reviewers independently assess study quality and add it to the data extraction form before the studies are included in the analysis, to reduce the risk of bias. With the NIH tool for observational (cohort and cross-sectional) studies, as in this Ebola case, reviewers rate each of the 14 items as yes, no, or not applicable. An overall score is calculated by summing the items, with yes scoring one and no or not applicable scoring zero. Each paper is then classified as poor (score 0–5), fair (6–9), or good (10–14).
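Two calculations in this step are mechanical enough to script. We do not know exactly which formulas Additional file 5: File S2 contains, so the Python sketch below implements the widely used estimators of Wan et al. [16] (mean and SD from minimum/median/maximum, or from the quartiles, assuming roughly normal data) together with the NIH scoring rule stated above. Function names are hypothetical.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # standard normal quantile function

def mean_sd_from_range(minimum, median, maximum, n):
    """Wan et al. (2014): mean and SD from minimum, median, maximum and sample size."""
    if n < 2:
        raise ValueError("n must be at least 2")
    mean = (minimum + 2 * median + maximum) / 4
    sd = (maximum - minimum) / (2 * _z((n - 0.375) / (n + 0.25)))
    return mean, sd

def mean_sd_from_iqr(q1, median, q3, n):
    """Wan et al. (2014): mean and SD from first quartile, median, third quartile and n."""
    if n < 2:
        raise ValueError("n must be at least 2")
    mean = (q1 + median + q3) / 3
    sd = (q3 - q1) / (2 * _z((0.75 * n - 0.125) / (n + 0.25)))
    return mean, sd

def nih_quality(ratings):
    """Score the 14-item NIH tool: "yes" = 1, any other answer = 0; then grade it."""
    if len(ratings) != 14:
        raise ValueError("expected 14 item ratings")
    score = sum(1 for r in ratings if r.strip().lower() == "yes")
    grade = "poor" if score <= 5 else "fair" if score <= 9 else "good"
    return score, grade

print(mean_sd_from_range(0, 5, 10, 100))
print(nih_quality(["yes"] * 10 + ["no"] * 3 + ["na"]))  # (10, 'good')
```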

In the Ebola example above, authors can extract the following information: authors' names, patients' country, year of publication, study design (case report, cohort study, clinical trial, or RCT), sample size, time point of infection, follow-up interval after vaccination, efficacy, safety, adverse effects after vaccination, and the QA sheet (Additional file 6 : Data S1).

Data checking

Because human error and bias are to be expected, we recommend a data-checking step in which every included article is compared with its entry in the extraction sheet using evidence photos, to detect mistakes in the data. We advise assigning articles to 2–3 independent reviewers, ideally not those who extracted them. When resources are limited, each reviewer checks articles other than those they extracted in the previous stage.
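The limited-resources rule (nobody checks their own extraction) can be satisfied with a simple rotation. In this hypothetical Python sketch, reviewers are put in a fixed order and each reviewer's articles go to the next reviewer in the cycle. This guarantees a different checker for every article, although it does not balance workload when reviewers extracted unequal numbers of articles.

```python
def assign_checkers(extracted_by):
    """Map each article to a checker who did not extract it.

    extracted_by: article ID -> reviewer who extracted it.  Reviewers are put in a
    fixed order and each one's articles go to the next reviewer in the cycle.
    """
    reviewers = sorted(set(extracted_by.values()))
    if len(reviewers) < 2:
        raise ValueError("cross-checking needs at least two reviewers")
    next_reviewer = {r: reviewers[(i + 1) % len(reviewers)] for i, r in enumerate(reviewers)}
    return {article: next_reviewer[who] for article, who in extracted_by.items()}

# Hypothetical study IDs and reviewer names.
plan = assign_checkers({"study1": "Ana", "study2": "Ben", "study3": "Cai", "study4": "Ana"})
print(plan)  # {'study1': 'Ben', 'study2': 'Cai', 'study3': 'Ana', 'study4': 'Ben'}
```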

Statistical analysis

Investigators use different methods to combine and summarize the findings of included studies. An important step before analysis is data cleaning, in which the analyst organizes the extraction sheet into a form that analytical software can read. There are two types of analysis: qualitative and quantitative. Qualitative analysis mostly describes the data in SR studies, while quantitative analysis has two main types: MA and network meta-analysis (NMA). Subgroup, sensitivity, and cumulative analyses, and meta-regression, are used to test whether the results are consistent, to investigate the effect of certain confounders on the outcome, and to find the best predictors. Publication bias should be assessed to investigate whether missing studies could affect the summary estimate.

To illustrate a basic meta-analysis, we provide imaginary data for the research question on Ebola vaccine safety (adverse events 14 days after injection) and immunogenicity (rise in the geometric mean titer of Ebola virus antibodies 6 months after injection). Suppose that, after searching and data extraction, we decided to analyze the safety and immunogenicity of Ebola vaccine “A”. Other Ebola vaccines were not meta-analyzed because of the limited number of studies; they would be described in a narrative review instead. The imaginary data for the vaccine safety meta-analysis are available in Additional file 7 : Data S2. Free software such as RevMan [ 22 ] or the R package meta [ 23 ] can be used for the meta-analysis; in this example, we use the R package meta, whose tutorial is available in the “General Package for Meta-Analysis” PDF [ 23 ]. The R code and guidance for this meta-analysis can be found in Additional file 5 : File S3.

For the analysis, we assume that the studies are heterogeneous and therefore choose a random-effects model. We analyzed the safety of Ebola vaccine A: the data table shows several adverse events occurring after intramuscular injection of vaccine A. Suppose that six studies fulfill our inclusion criteria. We can then run a random-effects meta-analysis with the R meta package for each adverse event extracted from the studies, for example, arthralgia.
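To show what a random-effects model computes, the Python sketch below implements the DerSimonian-Laird estimator from scratch for odds ratios from 2×2 tables. It is not the R meta package, whose defaults (for example, the between-study variance estimator) may differ and give slightly different numbers, and the counts are hypothetical, not the article's simulated data. The 0.5 continuity correction for zero cells is one common choice among several.

```python
import math
from statistics import NormalDist

def log_or(a, b, c, d):
    """Log odds ratio and its variance from one study's 2x2 table.

    a, b: events and non-events in the vaccine arm; c, d: in the placebo arm.
    Adds 0.5 to every cell when any cell is zero (a common continuity correction).
    """
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

def random_effects(y, v):
    """DerSimonian-Laird random-effects pooling of log ORs y with variances v."""
    k = len(y)
    if k < 2:
        raise ValueError("need at least two studies")
    w = [1 / vi for vi in v]                                   # inverse-variance weights
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                         # between-study variance
    w_re = [1 / (vi + tau2) for vi in v]                       # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    z95 = NormalDist().inv_cdf(0.975)
    return {
        "OR": math.exp(pooled),
        "CI": (math.exp(pooled - z95 * se), math.exp(pooled + z95 * se)),
        "p": 2 * (1 - NormalDist().cdf(abs(pooled / se))),
        "tau2": tau2,
        "I2": max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0,
    }

# Hypothetical arthralgia counts: (events, non-events) with vaccine, then with placebo.
tables = [(12, 88, 10, 90), (5, 45, 6, 44), (20, 180, 18, 182)]
effects = [log_or(*t) for t in tables]
result = random_effects([e[0] for e in effects], [e[1] for e in effects])
print(result)
```

Pooling on the log scale and exponentiating at the end is why the confidence interval of an OR is asymmetric around the point estimate.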

The results in Additional file 3 : Figure S3 show that the odds ratio (OR) of arthralgia is 1.06 (95% CI 0.79–1.42; p = 0.71). This indicates no significant association between intramuscular injection of Ebola vaccine A and arthralgia: the OR is close to one, its confidence interval includes one, and the p value is above 0.05.

The meta-analysis results can also be visualized in a forest plot; Fig. 3 shows an example from the simulated analysis.

Figure 3. Random-effects model forest plot comparing vaccine A with placebo

The forest plot shows the six studies (A to F) with their ORs and 95% CIs. Each green box represents the effect size (here, the OR) of one study; a bigger box means the study carries more weight (i.e., a more precise estimate, usually from a larger sample). The blue diamond represents the pooled OR of the six studies. The diamond crosses the vertical line OR = 1 and extends almost equally on both sides of it, indicating that the association is not significant. The 95% confidence interval, which includes one, and the p value > 0.05 confirm this.

For heterogeneity, I² = 0%, meaning that no heterogeneity was detected and the studies are relatively homogeneous (which is rare in real studies). To evaluate publication bias in the meta-analysis of arthralgia, we can use the metabias function from the R meta package (Additional file 4 : Figure S4) and visualize the result with a funnel plot, shown in Fig. 4 . The p value of this test is 0.74, indicating that the funnel plot is symmetrical, which we can confirm by inspecting the plot.
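metabias offers several tests; Egger's regression test is one common choice for funnel-plot asymmetry. The hypothetical Python sketch below computes Egger's intercept by ordinary least squares of the standardized effect (y/SE) on precision (1/SE). The Python standard library has no t distribution, so the sketch returns the t statistic and its k − 2 degrees of freedom and leaves the p value to a table or statistics library. Note that with only six studies such tests have little power; the Cochrane Handbook advises using them only when at least 10 studies are available.

```python
import math

def egger_test(y, se):
    """Egger's regression test: regress y/se on 1/se by ordinary least squares.

    Returns (intercept, standard error of intercept, t statistic, degrees of freedom).
    An intercept far from zero suggests funnel-plot asymmetry.
    """
    k = len(y)
    if k < 3:
        raise ValueError("need at least three studies")
    x = [1 / s for s in se]                    # precision
    z = [yi / s for yi, s in zip(y, se)]       # standardized effect
    mx, mz = sum(x) / k, sum(z) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all studies have the same precision")
    slope = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z)) / sxx
    intercept = mz - slope * mx
    rss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    se_int = math.sqrt(rss / (k - 2) * (1 / k + mx ** 2 / sxx))
    t = intercept / se_int if se_int > 0 else float("nan")
    return intercept, se_int, t, k - 2

# Hypothetical log ORs and standard errors for six studies.
print(egger_test([0.10, -0.05, 0.20, 0.02, 0.15, -0.10],
                 [0.30, 0.25, 0.40, 0.20, 0.35, 0.28]))
```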

Figure 4. Publication bias funnel plot comparing vaccine A with placebo

In the funnel plot, the numbers of studies on the left and right sides are equal; the plot is therefore symmetrical, indicating that no publication bias was detected.

Sensitivity analysis tests how robust the pooled result is by repeating the MA after removing one study at a time. If the pooled association remains significant (p < 0.05) whichever study is removed, the result is robust; if removing a particular study makes the pooled p value exceed 0.05, the significance depends on that study. In our approach, sensitivity analysis is performed only when there is a significant association; because the p value of our example MA is 0.71 (greater than 0.05), it is not needed for this case study.
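A leave-one-out sensitivity analysis can be sketched in Python as follows. So that the block stands alone, DerSimonian-Laird pooling is re-implemented compactly (it is not the R meta package), and the study labels and effect sizes are hypothetical. For each omitted study, the sketch reports the pooled OR, its p value, and whether omitting that study changes significance at the 0.05 level.

```python
import math
from statistics import NormalDist

def pool(y, v):
    """DerSimonian-Laird pooled log OR and its two-sided p value."""
    k = len(y)
    w = [1 / vi for vi in v]
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 else 0.0
    w_re = [1 / (vi + tau2) for vi in v]
    est = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return est, 2 * (1 - NormalDist().cdf(abs(est / se)))

def leave_one_out(labels, y, v, alpha=0.05):
    """Re-pool after omitting each study; flag omissions that flip significance."""
    _, p_all = pool(y, v)
    rows = []
    for i, label in enumerate(labels):
        est, p = pool(y[:i] + y[i + 1:], v[:i] + v[i + 1:])
        rows.append((label, math.exp(est), p, (p < alpha) != (p_all < alpha)))
    return p_all, rows

# Hypothetical log ORs and variances for studies A to F.
labels = ["A", "B", "C", "D", "E", "F"]
y = [0.35, 0.20, 0.50, 0.10, 0.45, 0.30]
v = [0.02, 0.03, 0.05, 0.02, 0.04, 0.03]
p_all, rows = leave_one_out(labels, y, v)
for label, odds_ratio, p, flips in rows:
    print(f"without {label}: OR={odds_ratio:.2f}, p={p:.3f}, changes significance: {flips}")
```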

Double data checking

For further assurance of quality, the analyzed data should be rechecked against the full-text data using evidence photos, so that the PI of the study can easily verify them.

Manuscript writing, revision, and submission to a journal

The manuscript is written in four scientific sections (introduction, methods, results, and discussion), usually followed by a conclusion. Preparing a table of study and patient characteristics is mandatory; a template can be found in Additional file 5 : Table S3.

After the manuscript, characteristics table, and PRISMA flow diagram are finished, the team should send them to the PI for thorough revision and respond to the PI's comments. Finally, the team chooses a suitable journal whose scope fits the manuscript and whose impact factor is reasonable, reading the journal's author guidelines carefully before submitting.

The role of evidence-based medicine in biomedical research is growing rapidly, and SR/MAs are increasingly common in the medical literature. This paper has sought to provide a comprehensive approach that enables reviewers to produce high-quality SR/MAs. We hope that readers gain general knowledge of how to conduct an SR/MA and the confidence to perform one, even though this kind of study involves more complex steps than a narrative review.

Beyond the basic steps of an MA, many advanced methods serve specific purposes. One of them is meta-regression, which investigates the association between a confounder and the results of the MA. There are also types other than standard MA, such as NMA and individual patient data MA. NMA compares several interventions when there are not enough head-to-head data for a standard meta-analysis; it uses both direct and indirect comparisons to determine which of the competing interventions is best. Mega MA, or MA of individual patient data, summarizes the results of independent studies using their individual subject data. Because this allows more detailed analysis, it is useful for repeated-measures and time-to-event analyses, and it also permits analysis of variance and multiple regression analysis; however, it requires a homogeneous dataset and is time-consuming to conduct [ 24 ].
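The simplest building block of the indirect comparisons that NMA relies on is the adjusted indirect comparison (often attributed to Bucher). Two interventions, A and C, that were each compared with the same comparator B are contrasted through that comparator: the log ORs are subtracted and their variances add. The Python sketch below is a hypothetical illustration of that one step, not a full NMA. It assumes that the two sets of trials are similar enough for B to be a fair common comparator.

```python
import math
from statistics import NormalDist

def indirect_comparison(log_or_ab, se_ab, log_or_cb, se_cb):
    """Adjusted indirect comparison of A vs C through a common comparator B.

    log OR(A vs C) = log OR(A vs B) - log OR(C vs B); the variances add.
    Returns the indirect OR with its 95% confidence interval.
    """
    est = log_or_ab - log_or_cb
    se = math.sqrt(se_ab ** 2 + se_cb ** 2)
    z95 = NormalDist().inv_cdf(0.975)
    return math.exp(est), (math.exp(est - z95 * se), math.exp(est + z95 * se))

# Hypothetical inputs: vaccine A vs placebo and vaccine C vs placebo.
print(indirect_comparison(math.log(1.2), 0.15, math.log(0.9), 0.20))
```

Because the variances add, an indirect estimate is always less precise than either of the direct comparisons it is built from.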

Conclusions

Systematic review/meta-analysis steps include developing and validating the research question, forming eligibility criteria, designing the search strategy, searching databases, importing all results to a library and exporting them to an Excel sheet, writing and registering the protocol, title and abstract screening, full-text screening, manual searching, extracting data and assessing study quality, data checking, statistical analysis, double data checking, and manuscript writing, revision, and submission to a journal.

Availability of data and materials

Not applicable.

Abbreviations

NMA: Network meta-analysis

PI: Principal investigator

PICO: Population, Intervention, Comparison, Outcome

PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses

QA: Quality assessment

SPIDER: Sample, Phenomenon of Interest, Design, Evaluation, Research type

SR/MAs: Systematic reviews and meta-analyses

References

Bello A, Wiebe N, Garg A, Tonelli M. Evidence-based decision-making 2: systematic reviews and meta-analysis. Methods Mol Biol (Clifton, NJ). 2015;1281:397–416.


Khan KS, Kunz R, Kleijnen J, Antes G. Five steps to conducting a systematic review. J R Soc Med. 2003;96(3):118–21.

Rys P, Wladysiuk M, Skrzekowska-Baran I, Malecki MT. Review articles, systematic reviews and meta-analyses: which can be trusted? Pol Arch Med Wewn. 2009;119(3):148–56.


Higgins JPT, Green S. Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0 [updated March 2011]. 2011.

Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. BMJ. 2009;339:b2535.

Methley AM, Campbell S, Chew-Graham C, McNally R, Cheraghi-Sohi S. PICO, PICOS and SPIDER: a comparison study of specificity and sensitivity in three search tools for qualitative systematic reviews. BMC Health Serv Res. 2014;14:579.

Gross L, Lhomme E, Pasin C, Richert L, Thiebaut R. Ebola vaccine development: systematic review of pre-clinical and clinical studies, and meta-analysis of determinants of antibody response variability after vaccination. Int J Infect Dis. 2018;74:83–96.


Shea BJ, Reeves BC, Wells G, Thuku M, Hamel C, Moran J, ... Henry DA. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ. 2017;358:j4008.

Giang HTN, Banno K, Minh LHN, Trinh LT, Loc LT, Eltobgy A, et al. Dengue hemophagocytic syndrome: a systematic review and meta-analysis on epidemiology, clinical signs, outcomes, and risk factors. Rev Med Virol. 2018;28(6):e2005.

Morra ME, Altibi AMA, Iqtadar S, Minh LHN, Elawady SS, Hallab A, et al. Definitions for warning signs and signs of severe dengue according to the WHO 2009 classification: systematic review of literature. Rev Med Virol. 2018;28(4):e1979.

Morra ME, Van Thanh L, Kamel MG, Ghazy AA, Altibi AMA, Dat LM, et al. Clinical outcomes of current medical approaches for Middle East respiratory syndrome: a systematic review and meta-analysis. Rev Med Virol. 2018;28(3):e1977.

Vassar M, Atakpo P, Kash MJ. Manual search approaches used by systematic reviewers in dermatology. Journal of the Medical Library Association: JMLA. 2016;104(4):302.

Naunheim MR, Remenschneider AK, Scangas GA, Bunting GW, Deschler DG. The effect of initial tracheoesophageal voice prosthesis size on postoperative complications and voice outcomes. Ann Otol Rhinol Laryngol. 2016;125(6):478–84.

Rohatgi A. Web Plot Digitizer. 2014;2.

Hozo SP, Djulbegovic B, Hozo I. Estimating the mean and variance from the median, range, and the size of a sample. BMC Med Res Methodol. 2005;5(1):13.

Wan X, Wang W, Liu J, Tong T. Estimating the sample mean and standard deviation from the sample size, median, range and/or interquartile range. BMC Med Res Methodol. 2014;14(1):135.

Van Rijkom HM, Truin GJ, Van’t Hof MA. A meta-analysis of clinical studies on the caries-inhibiting effect of fluoride gel treatment. Caries Res. 1998;32(2):83–92.

Higgins JP, Altman DG, Gotzsche PC, Juni P, Moher D, Oxman AD, et al. The Cochrane Collaboration's tool for assessing risk of bias in randomised trials. BMJ. 2011;343:d5928.

Tawfik GM, Tieu TM, Ghozy S, Makram OM, Samuel P, Abdelaal A, et al. Speech efficacy, safety and factors affecting lifetime of voice prostheses in patients with laryngeal cancer: a systematic review and network meta-analysis of randomized controlled trials. J Clin Oncol. 2018;36(15_suppl):e18031-e.

Wannemuehler TJ, Lobo BC, Johnson JD, Deig CR, Ting JY, Gregory RL. Vibratory stimulus reduces in vitro biofilm formation on tracheoesophageal voice prostheses. Laryngoscope. 2016;126(12):2752–7.

Sterne JAC, Hernán MA, Reeves BC, Savović J, Berkman ND, Viswanathan M, et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ. 2016;355.

Review Manager (RevMan). Version 5.0. Copenhagen: The Nordic Cochrane Centre, The Cochrane Collaboration; 2008.

Schwarzer G. meta: an R package for meta-analysis. R News. 2007;7(3):40–5.


Simms LLH. Meta-analysis versus mega-analysis: is there a difference? Oral budesonide for the maintenance of remission in Crohn’s disease: Faculty of Graduate Studies, University of Western Ontario; 1998.


Acknowledgements

This study was conducted (in part) at the Joint Usage/Research Center on Tropical Disease, Institute of Tropical Medicine, Nagasaki University, Japan.

Author information

Authors and affiliations.

Faculty of Medicine, Ain Shams University, Cairo, Egypt

Gehad Mohamed Tawfik

Online Research Club http://www.onlineresearchclub.org/

Gehad Mohamed Tawfik, Kadek Agus Surya Dila, Muawia Yousif Fadlelmola Mohamed, Dao Ngoc Hien Tam, Nguyen Dang Kien & Ali Mahmoud Ahmed

Pratama Giri Emas Hospital, Singaraja-Amlapura street, Giri Emas village, Sawan subdistrict, Singaraja City, Buleleng, Bali, 81171, Indonesia

Kadek Agus Surya Dila

Faculty of Medicine, University of Khartoum, Khartoum, Sudan

Muawia Yousif Fadlelmola Mohamed

Nanogen Pharmaceutical Biotechnology Joint Stock Company, Ho Chi Minh City, Vietnam

Dao Ngoc Hien Tam

Department of Obstetrics and Gynecology, Thai Binh University of Medicine and Pharmacy, Thai Binh, Vietnam

Nguyen Dang Kien

Faculty of Medicine, Al-Azhar University, Cairo, Egypt

Ali Mahmoud Ahmed

Evidence Based Medicine Research Group & Faculty of Applied Sciences, Ton Duc Thang University, Ho Chi Minh City, 70000, Vietnam

Nguyen Tien Huy

Faculty of Applied Sciences, Ton Duc Thang University, Ho Chi Minh City, 70000, Vietnam

Department of Clinical Product Development, Institute of Tropical Medicine (NEKKEN), Leading Graduate School Program, and Graduate School of Biomedical Sciences, Nagasaki University, 1-12-4 Sakamoto, Nagasaki, 852-8523, Japan


Contributions

NTH and GMT were responsible for the idea and its design. The figure was done by GMT. All authors contributed to the manuscript writing and approval of the final version.

Corresponding author

Correspondence to Nguyen Tien Huy.

Ethics declarations

Ethics approval and consent to participate, consent for publication, competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 1:

Figure S1. Risk of bias assessment graph of included randomized controlled trials. (TIF 20 kb)

Additional file 2:

Figure S2. Risk of bias assessment summary. (TIF 69 kb)

Additional file 3:

Figure S3. Arthralgia results of random effect meta-analysis using R meta package. (TIF 20 kb)

Additional file 4:

Figure S4. Arthralgia linear regression test of funnel plot asymmetry using R meta package. (TIF 13 kb)

Additional file 5:

Table S1. PRISMA 2009 Checklist. Table S2. Manipulation guides for online database searches. Table S3. Detailed search strategy for twelve database searches. Table S4. Baseline characteristics of the patients in the included studies. File S1. PROSPERO protocol template file. File S2. Extraction equations that can be used prior to analysis to get missed variables. File S3. R codes and its guidance for meta-analysis done for comparison between EBOLA vaccine A and placebo. (DOCX 49 kb)

Additional file 6:

Data S1. Extraction and quality assessment data sheets for EBOLA case example. (XLSX 1368 kb)

Additional file 7:

Data S2. Imaginary data for EBOLA case example. (XLSX 10 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated.


About this article

Cite this article.

Tawfik, G.M., Dila, K.A.S., Mohamed, M.Y.F. et al. A step by step guide for conducting a systematic review and meta-analysis with simulation data. Trop Med Health 47 , 46 (2019). https://doi.org/10.1186/s41182-019-0165-6


Received : 30 January 2019

Accepted : 24 May 2019

Published : 01 August 2019

DOI : https://doi.org/10.1186/s41182-019-0165-6


Tropical Medicine and Health

ISSN: 1349-4147



Systematic Reviews and Meta Analysis


Cochrane Handbook

The Cochrane Handbook isn't intended to be a standard, but it has become the de facto standard for planning and carrying out a systematic review. Chapter 6, Searching for Studies, is most helpful in planning your review.

Scoping Reviews, JBI Manual for Evidence Synthesis

The Joanna Briggs Institute provides extensive guidance for their authors in producing both systematic and scoping reviews. Their chapter on scoping reviews provides a succinct overview of the scoping review process. JBI maintains a page with other materials for scoping reviewers.

Methods Guide for Effectiveness and Comparative Effectiveness Reviews

Very good chapters on conducting a review, most of which were published as articles in the Journal of Clinical Epidemiology.

Institute of Medicine Standards for Systematic Reviews

The IOM standards promote objective, transparent, and scientifically valid systematic reviews. They address the entire systematic review process, from locating, screening, and selecting studies for the review, to synthesizing the findings (including meta-analysis) and assessing the overall quality of the body of evidence, to producing the final review report.

Systematic Reviews: CRD's Guidance for Undertaking Reviews in Health Care

Provides a succinct outline for carrying out systematic reviews as well as details about constructing a protocol, testing for bias, and other aspects of the review process. Includes examples.

Systematic reviews to support evidence-based medicine: how to review and apply findings of healthcare research

Khan, K., & Royal Society of Medicine. 2nd ed., 2013. London [England]: Hodder Arnold. [Harvard ID required]

Systematic reviews to answer health care questions

Nelson, H. (2014). Philadelphia: Wolters Kluwer Health/Lippincott Williams & Wilkins. [Harvard ID required]

Systematic Review Toolbox

Not a guide or standard but a clearinghouse for all things systematic review. Check here for templates, reporting standards, screening tools, risk of bias assessment, etc.

Reporting Standards: PRISMA and MOOSE

You will improve the quality of your review by adhering to the standards below. Using the appropriate standard can reassure editors and reviewers that you have conscientiously carried out your review.

http://www.prisma-statement.org/ The Preferred Reporting Items for Systematic Reviews and Meta-Analyses is an evidence-based minimum set of items for reporting in systematic reviews and meta-analyses. A 27-item checklist, PRISMA focuses on randomized trials but can also be used as a basis for reporting systematic reviews of other types of research, particularly evaluations of interventions. PRISMA may also be useful for critical appraisal of published systematic reviews, although it is not a quality assessment instrument to gauge the quality of a systematic review.
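The counts in a PRISMA flow diagram have to add up from one stage to the next. The Python sketch below derives each stage from the reported exclusions and flags impossible totals. It is a simplification: the stages are fewer than in the full PRISMA template, and the field names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PrismaFlow:
    """Simplified PRISMA-style record flow; all counts are numbers of records."""
    identified: int               # records from database searches
    duplicates_removed: int
    excluded_title_abstract: int  # excluded at title/abstract screening
    excluded_full_text: int       # excluded after full-text assessment
    other_sources: int = 0        # e.g. records found by manual searching

    @property
    def screened(self) -> int:
        return self.identified + self.other_sources - self.duplicates_removed

    @property
    def full_text_assessed(self) -> int:
        return self.screened - self.excluded_title_abstract

    @property
    def included(self) -> int:
        return self.full_text_assessed - self.excluded_full_text

    def check(self) -> None:
        """Raise ValueError if any stage ends up with a negative count."""
        for stage in ("screened", "full_text_assessed", "included"):
            if getattr(self, stage) < 0:
                raise ValueError(f"{stage} is negative; recheck the counts")


flow = PrismaFlow(identified=1200, duplicates_removed=300,
                  excluded_title_abstract=850, excluded_full_text=40, other_sources=5)
flow.check()
print(flow.screened, flow.full_text_assessed, flow.included)
```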

Consider using PRISMA-P when completing your protocol. PRISMA-P is a 17-item checklist of elements considered essential in a protocol for a systematic review or meta-analysis. The documentation contains an excellent rationale for completing a protocol, too.

Use PRISMA-ScR, a 20-item checklist, for reporting scoping reviews. The documentation provides a clear overview of scoping reviews.

Further Reading:

Moher D, Liberati A, Tetzlaff J, Altman DG; PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med. 2009 Jul 21;6(7):e1000097. Epub 2009 Jul 21. PubMed PMID: 19621072 .  

Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JP, Clarke M, Devereaux PJ, Kleijnen J, Moher D. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. PLoS Med. 2009 Jul 21;6(7):e1000100. Epub 2009 Jul 21. PubMed PMID: 19621070 . 

Shamseer L, Moher D, Clarke M, Ghersi D, Liberati A, Petticrew M, Shekelle P, Stewart LA; PRISMA-P Group. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015: elaboration and explanation. BMJ. 2015 Jan 2;349:g7647. doi: 10.1136/bmj.g7647. PubMed PMID: 25555855 .

Moher D, Shamseer L, Clarke M, Ghersi D, Liberati A, Petticrew M, Shekelle P, Stewart LA; PRISMA-P Group. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement. Syst Rev. 2015 Jan 1;4:1. doi: 10.1186/2046-4053-4-1. PubMed PMID: 25554246 .

Tricco AC, Lillie E, Zarin W, O'Brien KK, Colquhoun H, Levac D, Moher D, Peters MDJ, Horsley T, Weeks L, Hempel S, Akl EA, Chang C, McGowan J, Stewart L, Hartling L, Aldcroft A, Wilson MG, Garritty C, Lewin S, Godfrey CM, Macdonald MT, Langlois EV, Soares-Weiser K, Moriarty J, Clifford T, Tunçalp Ö, Straus SE. PRISMA Extension for Scoping Reviews (PRISMA-ScR): Checklist and Explanation. Ann Intern Med. 2018 Oct 2;169(7):467-473. doi: 10.7326/M18-0850. Epub 2018 Sep 4. PMID: 30178033 .

Also published in the Annals of Internal Medicine, BMJ, and the Journal of Clinical Epidemiology.

MOOSE Guidelines

http://www.consort-statement.org/Media/Default/Downloads/Other%20Instruments/MOOSE%20Statement%202000.pdf Meta-analysis of Observational Studies in Epidemiology checklist contains specifications for reporting of meta-analyses of observational studies in epidemiology. Editors will expect you to follow and cite this checklist.  It refers to the  Newcastle-Ottawa Scale for assessing the quality of non-randomized studies, a method of rating each observational study in your meta-analysis.

Stroup DF, Berlin JA, Morton SC, Olkin I, Williamson GD, Rennie D, Moher D, Becker BJ, Sipe TA, Thacker SB. Meta-analysis of observational studies in epidemiology: a proposal for reporting. Meta-analysis Of Observational Studies in Epidemiology (MOOSE) group. JAMA. 2000 Apr 19;283(15):2008-12. PubMed PMID:  10789670 .

  • Last Updated: Feb 26, 2024 3:17 PM
  • URL: https://guides.library.harvard.edu/meta-analysis

Systematic reviews vs meta-analysis: what’s the difference?

Posted on 24th July 2023 by Verónica Tanco Tellechea

You may hear the terms ‘systematic review’ and ‘meta-analysis’ being used interchangeably. Although they are related, they are distinctly different. Learn more in this blog for beginners.

What is a systematic review?

According to Cochrane (1), a systematic review attempts to identify, appraise and synthesize all the empirical evidence to answer a specific research question. Thus, a systematic review is where you might find the most relevant, adequate, and current information regarding a specific topic. In the levels of evidence pyramid, systematic reviews are surpassed only by meta-analyses.

To conduct a systematic review, you will need, among other things: 

  • A specific research question, usually in the form of a PICO question.
  • Pre-specified eligibility criteria, to decide which articles will be included or discarded from the review. 
  • A systematic method that minimizes bias.

You can find protocols to guide you from both Cochrane and the Equator Network, among other places. If you are new to the topic, have a read of an overview of systematic reviews.

What is a meta-analysis?

A meta-analysis is a quantitative, epidemiological study design used to systematically assess the results of previous research (2). In practice, it is a statistical tool that allows researchers to mathematically combine outcomes from multiple studies. Meta-analyses are usually, though not always, based on randomized controlled trials.
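One common way (not the only one) to combine outcomes mathematically is the inverse-variance weighted average. Each study's effect estimate is weighted by 1/SE², so more precise studies count for more. The sketch below is a fixed-effect version in plain Python. Effects must be on an additive scale, such as mean differences or log odds ratios. The data are invented. In practice, packages such as R's meta also offer random-effects models, which are often preferred when studies differ.

```python
import math


def fixed_effect_pool(effects, std_errors, z=1.96):
    """Inverse-variance fixed-effect meta-analysis.

    effects: per-study estimates on an additive scale (e.g. mean differences,
             or log odds ratios for ratio measures)
    std_errors: their standard errors
    Returns (pooled estimate, pooled standard error, (lower, upper) ~95% CI).
    """
    if not effects or len(effects) != len(std_errors):
        raise ValueError("need at least one study and one standard error per effect")
    weights = [1.0 / se ** 2 for se in std_errors]  # more precise studies weigh more
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se, (pooled - z * pooled_se, pooled + z * pooled_se)


# Invented data for three studies
pooled, pooled_se, ci = fixed_effect_pool([-0.5, -0.3, -0.4], [0.2, 0.1, 0.2])
print(round(pooled, 3), round(pooled_se, 3), tuple(round(x, 3) for x in ci))
```

Here the pooled estimate is −0.35. It is pulled toward the most precise study (−0.3, SE 0.1), which carries four times the weight of each of the other two.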

When can a meta-analysis be implemented?

There is always the possibility of conducting a meta-analysis. However, for it to yield the best possible results, it should be performed when the studies included in the systematic review are of good quality, have similar designs, and use similar outcome measures.

Why are meta-analyses important?

Outcomes from a meta-analysis may provide more precise information about the estimated effect of what is being studied, because they merge outcomes from multiple studies. In a meta-analysis, data from various trials are combined to generate a weighted average result (1), which is portrayed in a forest plot. Meta-analyses also often include a funnel plot to help visually detect publication bias.
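Funnel plot asymmetry is often also tested formally. One common choice is Egger's linear regression test. Each study's standardized effect (effect/SE) is regressed on its precision (1/SE), and an intercept far from zero suggests asymmetry. The plain-Python sketch below returns the intercept and its t statistic. Converting t into a p-value needs a t distribution with k − 2 degrees of freedom, for example from SciPy. Such tests have low power with few studies; Cochrane guidance suggests using them only with at least 10.

```python
import math


def egger_test(effects, std_errors):
    """Egger's regression test for funnel plot asymmetry.

    Regresses effect/SE (standardized effect) on 1/SE (precision) by ordinary
    least squares. Returns (intercept, slope, t statistic of the intercept).
    """
    k = len(effects)
    if k < 3 or len(std_errors) != k:
        raise ValueError("need at least 3 studies, each with a standard error")
    x = [1.0 / se for se in std_errors]
    y = [eff / se for eff, se in zip(effects, std_errors)]
    x_mean, y_mean = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all standard errors are equal; the regression is undefined")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in residuals) / (k - 2)  # residual variance
    se_intercept = math.sqrt(s2 * (1.0 / k + x_mean ** 2 / sxx))
    if se_intercept == 0:
        raise ValueError("perfect fit; the intercept's standard error is zero")
    return intercept, slope, intercept / se_intercept


# Invented data: four studies are far too few for a real test; this only
# exercises the arithmetic.
intercept, slope, t_stat = egger_test([0.3, 0.4, 0.5, 0.5], [0.1, 0.2, 0.25, 0.5])
print(round(intercept, 3), round(slope, 3), round(t_stat, 2))
```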

Conclusions

A systematic review is an article that synthesizes the available evidence on a certain topic using a specific research question, pre-specified eligibility criteria for including articles, and a systematic method for its production. A meta-analysis, by contrast, is a quantitative, epidemiological study design used to assess the results of the articles included in a systematic review.

Remember: All meta-analyses involve a systematic review, but not all systematic reviews involve a meta-analysis.

If you would like some further reading on this topic, we suggest the following:

The systematic review – a S4BE blog article

Meta-analysis: what, why, and how – a S4BE blog article

The difference between a systematic review and a meta-analysis – a blog article via Covidence

Systematic review vs meta-analysis: what’s the difference? A 5-minute video from Research Masterminds:

  • About Cochrane reviews [Internet]. Cochranelibrary.com. [cited 2023 Apr 30]. Available from: https://www.cochranelibrary.com/about/about-cochrane-reviews
  • Haidich AB. Meta-analysis in medical research. Hippokratia. 2010;14(Suppl 1):29–37.


Verónica Tanco Tellechea


Systematic reviews.


What is a Systematic Review and Meta-Analysis

Differences between systematic and literature reviews.


A systematic review collects and analyzes all evidence that answers a specific research question. In a systematic review, the question needs to be clearly defined and to have inclusion and exclusion criteria. In general, the specific, systematic methods selected are intended to minimize bias. This is followed by an extensive search of the literature and a critical analysis of the search results. A systematic review is conducted to provide a current, evidence-based answer to a specific question, which in turn helps to inform decision making. Check out the Centers for Disease Control and Prevention and Cochrane Reviews links to learn more about systematic reviews.

A systematic review can be combined with a meta-analysis. A meta-analysis is the use of statistical methods to summarize the results of a systematic review. Not every systematic review contains a meta-analysis. A meta-analysis may not be appropriate if the designs of the studies are too different, if there are concerns about the quality of the studies, or if the outcomes measured are not sufficiently similar for a result pooled across the studies to be meaningful.
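Whether studies are similar enough to pool is usually judged partly through statistical heterogeneity. The sketch below computes Cochran's Q, the I² statistic and the DerSimonian–Laird estimate of the between-study variance τ². It takes per-study effects and standard errors as input. The data are invented. These numbers support, rather than replace, judgement about clinical and methodological similarity.

```python
def heterogeneity(effects, std_errors):
    """Return Cochran's Q, I^2 (in percent) and the DerSimonian-Laird tau^2."""
    k = len(effects)
    if k < 2 or len(std_errors) != k:
        raise ValueError("need at least 2 studies, each with a standard error")
    w = [1.0 / se ** 2 for se in std_errors]  # inverse-variance weights
    sum_w = sum(w)
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    # I^2: share of the total variation attributed to between-study differences
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # DerSimonian-Laird: method-of-moments estimate of between-study variance
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    return q, i2, tau2


q, i2, tau2 = heterogeneity([0.0, 2.0, 4.0], [1.0, 1.0, 1.0])
print(q, i2, tau2)
```

With these three equally precise but widely spread studies, Q is 8 on 2 degrees of freedom. That gives I² = 75% and τ² = 3, a level of inconsistency that would argue for caution before pooling.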

Centers for Disease Control and Prevention. (n.d.).  Systematic Reviews . Retrieved from  https://www.cdc.gov/library/researchguides/sytemsaticreviews.html

Cochrane Library. (n.d.).  About Cochrane Reviews . Retrieved from  https://www.cochranelibrary.com/about/about-cochrane-reviews


Source: Kysh, Lynn (2013): Difference between a systematic review and a literature review. [figshare]. Available at:  https://figshare.com/articles/Difference_between_a_systematic_review_and_a_literature_review/766364
  • Last Updated: Feb 15, 2024 2:53 PM
  • URL: https://guides.library.ucmo.edu/systematicreviews


Literature Review, Systematic Review and Meta-analysis

Literature reviews can be a good way to narrow down theoretical interests; refine a research question; understand contemporary debates; and orientate a particular research project. It is very common for PhD theses to contain some element of reviewing the literature around a particular topic. It’s typical to have an entire chapter devoted to reporting the result of this task, identifying gaps in the literature and framing the collection of additional data.

A systematic review is a type of literature review that uses systematic methods to collect secondary data, critically appraise research studies, and synthesise findings. Systematic reviews are designed to provide a comprehensive, exhaustive summary of current theories and/or evidence and published research (Siddaway, Wood & Hedges, 2019) and may be qualitative or quantitative. Relevant studies and literature are identified through a research question, then summarised and synthesised into a discrete set of findings or a description of the state of the art. This might result in a ‘literature review’ chapter in a doctoral thesis, but can also be the basis of an entire research project.

Meta-analysis is a specialised type of systematic review which is quantitative and rigorous, often comparing data and results across multiple similar studies. This is a common approach in medical research, where several papers might report the results of trials of a particular treatment, for instance. The meta-analysis then uses statistical techniques to synthesise these into one summary. This can give high statistical power, but care must be taken not to introduce bias in the selection and filtering of evidence.

Whichever type of review is employed, the process is broadly similar and linear. The first step is to frame a question which can guide the review. This is used to identify relevant literature, often through searching subject-specific scientific databases. From these results, the most relevant will be identified. Filtering is important here, as time constraints will prevent the researcher from considering every possible piece of evidence or theoretical viewpoint. Once a concrete evidence base has been identified, the researcher extracts relevant data before reporting the synthesised results in an extended piece of writing.

Literature Review: GO-GN Insights

Sarah Lambert used a systematic review of literature with both qualitative and quantitative phases to investigate the question “How can open education programs be reconceptualised as acts of social justice to improve the access, participation and success of those who are traditionally excluded from higher education knowledge and skills?”

“My PhD research used systematic review, qualitative synthesis, case study and discourse analysis techniques, each was underpinned and made coherent by a consistent critical inquiry methodology and an overarching research question.

“Systematic reviews are becoming increasingly popular as a way to collect evidence of what works across multiple contexts and can be said to address some of the weaknesses of case study designs which provide detail about a particular context – but which is often not replicable in other socio-cultural contexts (such as other countries or states.) Publication of systematic reviews that are done according to well defined methods are quite likely to be published in high-ranking journals – my PhD supervisors were keen on this from the outset and I was encouraged along this path.

“Previously I had explored social realist authors and a social realist approach to systematic reviews (Pawson on realist reviews) but they did not sufficiently embrace social relations, issues of power, inclusion/exclusion. My supervisors had pushed me to explain what kind of realist review I intended to undertake, and I found out there was a branch of critical realism which was briefly of interest. By getting deeply into theory and trying out ways of combining theory I also feel that I have developed a deeper understanding of conceptual working and the different ways theories can be used at all stages of research and even how to come up with novel conceptual frameworks.”

Useful references for Systematic Review & Meta-Analysis: Finfgeld-Connett (2014); Lambert (2020); Siddaway, Wood & Hedges (2019)

Research Methods Handbook Copyright © 2020 by Rob Farrow; Francisco Iniesto; Martin Weller; and Rebecca Pitt is licensed under a Creative Commons Attribution 4.0 International License , except where otherwise noted.


  • Open access
  • Published: 25 April 2024

Surgery is associated with better long-term outcomes than pharmacological treatment for obesity: a systematic review and meta-analysis

  • Leonardo Zumerkorn Pipek 1 ,
  • Walter Augusto Fabio Moraes 2 ,
  • Rodrigo Massato Nobetani 2 ,
  • Vitor Santos Cortez 2 ,
  • Alberto Santos Condi 2 ,
  • João Victor Taba 2 ,
  • Rafaela Farias Vidigal Nascimento 3 ,
  • Milena Oliveira Suzuki 2 ,
  • Fernanda Sayuri do Nascimento 2 ,
  • Vitoria Carneiro de Mattos 2 ,
  • Leandro Ryuchi Iuamoto 4 ,
  • Wu Tu Hsing 4 ,
  • Luiz Augusto Carneiro-D’Albuquerque 5 ,
  • Alberto Meyer 5 &
  • Wellington Andraus 5  

Scientific Reports volume  14 , Article number:  9521 ( 2024 ) Cite this article


  • Endocrine system and metabolic diseases
  • Gastrointestinal diseases

Obesity is a highly prevalent disease with numerous complications. Current options include both intensive medical treatment with pharmacological drugs and bariatric surgery. The objective of this meta-analysis was to compare the long-term effects of intensive medical treatment and surgery on twelve parameters related to weight loss and to cardiovascular and endocrine changes. The review was conducted in accordance with the PRISMA guidelines (PROSPERO: CRD42021265637). The literature was searched from inception to October 2023 in the PubMed, EMBASE and Web of Science databases. We included randomized clinical trials that had separate groups for medical treatment and bariatric surgery as interventions for obesity. Risk of bias was assessed with RoB 2. A meta-analysis was performed with measures of heterogeneity and publication bias, and subgroup analyses were performed for each surgery type. Data are presented as forest plots. Reviewers independently identified 6719 articles, of which 6 papers with a total of 427 patients were included. All studies were randomized controlled trials; three had a follow-up of 5 years and two had a follow-up of 10 years. Both groups showed statistically significant changes in most of the parameters studied. Surgery was superior for weight loss (−22.05 kg [−28.86; −15.23]), total cholesterol (−0.88 [−1.59; −0.17]), triglycerides (−0.70 [−0.82; −0.59]), HDL (0.12 [0.02; 0.23]), systolic pressure (−4.49 [−7.65; −1.33]), diastolic pressure (−2.28 [−4.25; −0.31]), glycated hemoglobin (−0.97 [−1.31; −0.62]), HOMA-IR (−2.94 [−3.52; −2.35]) and cardiovascular risk (−0.08 [−0.10; −0.05]). Patients in the surgical treatment group had better long-term outcomes than those in the non-surgical group for most clinical parameters.
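Pooled results like these are reported as an estimate with a 95% confidence interval. When such figures are reused, for example when updating a meta-analysis, the standard error is commonly back-calculated as SE ≈ (upper − lower)/(2 × 1.96). This assumes a symmetric, normal-approximation interval. The abstract does not say how its intervals were computed, so that is an assumption here. The sketch below applies the formula to the reported weight-loss result.

```python
def se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Approximate the standard error from a symmetric (normal-based) 95% CI.

    Assumes the interval is estimate +/- z * SE. For ratio measures (odds
    ratios, risk ratios), pass log(lower) and log(upper) instead.
    """
    if upper <= lower:
        raise ValueError("upper limit must be greater than lower limit")
    return (upper - lower) / (2 * z)


def ci_midpoint(lower: float, upper: float) -> float:
    """Midpoint of a symmetric CI, which should match the reported estimate."""
    return (lower + upper) / 2


# Weight loss, surgery vs medical treatment: -22.05 kg [-28.86; -15.23]
lower, upper = -28.86, -15.23
print(round(se_from_ci(lower, upper), 2), round(ci_midpoint(lower, upper), 3))
```

The midpoint (−22.045) agrees with the reported −22.05 to rounding. This gives a quick consistency check on numbers extracted from a paper.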

Introduction

Obesity has been a recognized condition for over 2000 years 1 , but it has become much more prevalent in recent decades. Despite great efforts to prevent the disease, its prevalence among adults in the United States has continued to rise, reaching 42.4% in 2018. The GBD 2015 Obesity Collaborators 2 showed that this increasing trend occurred in more than 70 countries and is particularly marked among adolescents.

Obesity is defined as a body mass index (BMI) of 30 kg/m² or greater. The psychological damage that many of these patients suffer in a society governed by aesthetic standards is one of the most visible and immediate consequences of obesity. Mortality from cardiovascular causes and its relationship with BMI have been widely studied 3 , showing that the risk increases progressively as BMI rises. Similarly, obesity has been associated with a higher incidence of cancer 4 , respiratory 5 and metabolic 6 diseases.
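BMI is weight in kilograms divided by the square of height in metres. The minimal sketch below computes it. The 30 kg/m² cut-off follows the WHO adult definition of obesity and is exposed as a parameter. The example values are invented.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def is_obese(bmi_value: float, threshold: float = 30.0) -> bool:
    """True when BMI is at or above the (WHO adult) obesity threshold."""
    return bmi_value >= threshold


value = bmi(95.0, 1.70)
print(round(value, 1), is_obese(value))
```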

In this context, the importance of effective treatment for this condition is clear: it reduces mortality and improves these patients’ quality of life. While some benefits are evident with a loss of just 5% of body weight 6 , many patients require a greater loss to reduce the risks associated with obesity.

There are several treatments available for weight loss. Lifestyle changes, a low-calorie diet and increased physical activity are the mainstay of treatment for all patients 7 , 8 . Specific weight-loss diets and exercise programs have also been developed for this purpose, with varying results. Finally, pharmacological and surgical treatments have gained more attention in recent years for selected patients in whom other measures were insufficient.

Several studies have demonstrated the effectiveness of bariatric surgery for the treatment of obesity in the short and medium term. More recent studies have also shown that new drugs developed for weight loss may be a viable option for treating this disease 8 , 9 . Comparisons of these new drugs with surgical treatment are scarce in the literature and have evaluated only weight-related changes over short periods.

This systematic review evaluated the hypothesis that surgical treatment is superior to non-surgical treatment for patients with obesity. We evaluated the long-term effect of these treatments on anthropometric measures (weight, waist circumference, BMI) and on obesity-related parameters (triglycerides, LDL, HDL, total cholesterol, cardiovascular risk, systolic and diastolic blood pressure, HOMA and glycated hemoglobin).

Materials and methods

This systematic review was carried out in accordance with the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) 10 and the AMSTAR 2 guidelines for assessing the methodological quality of systematic reviews 11. The study was registered in the International Prospective Register of Systematic Reviews (PROSPERO, 258667) before the research was carried out.

The research question was drafted using the PICO strategy 12: P (patients with obesity and an indication for bariatric surgery based on BMI); I (bariatric surgery); C (pharmacological treatment); O (long-term morbidity and mortality, with at least 5 years of follow-up).

Eligibility criteria

Inclusion criteria

Types of studies: Randomized clinical trials.

Types of participants: Patients eligible for bariatric surgery, according to the American Society for Metabolic and Bariatric Surgery (ASMBS).

Types of intervention: Bariatric surgery or medical treatment.

Exclusion criteria

Studies were excluded if they: (1) did not have one group for each type of intervention (surgery or pharmacological treatment); (2) had a heterogeneous population; (3) did not use a standard assessment method for the entire duration of the study, or had no baseline assessment; (4) were not related to the review question; (5) were in a language other than English, Portuguese or Spanish; or (6) were incomplete, unpublished or inaccessible to the authors.

Types of variables/parameters analyzed

Data were collected and arranged in tables, including the authors' names, date and country of publication, number of participants included in the final analysis, sex, age and body mass index.

Literature search

The search covered the Medline (via PubMed), EMBASE and Web of Science databases from inception to October 10, 2023, without language restrictions.

Using the search tool, we selected MeSH terms from the most relevant publications and used them in a new search to identify articles that could be included in this systematic review. In addition, theses, conference proceedings, reference lists and study registries were searched manually, and experts in the field were contacted.

Search strategy

The same keywords were used in all databases, adapted to each database's input format.

The search strategy was:

Medline (via PubMed):

(Bariatric Surgery) AND ((nonsurgical) OR (Orlistat) OR (phentermine) OR (topiramate) OR (lorcaserin) OR (naltrexone) OR (bupropion) OR (liraglutide) OR (conservative) OR (conventional) OR (Anti-Obesity Agents) OR (Intensive medical)) AND (obesity) → 3024.

EMBASE:

(Bariatric Surgery) AND ((nonsurgical) OR (conservative) OR (Anti-Obesity Agents) OR (Intensive medical)) AND (obesity) → 4732.

Web of Science:

(Bariatric Surgery) AND ((nonsurgical) OR (conservative) OR (Anti-Obesity Agents) OR (Intensive medical)) AND (obesity) → 1772.

Data extraction

Data from each study were extracted independently by two authors. Disagreements were resolved by consensus; if consensus was not reached, a third author was consulted. Data extraction was carried out using the Rayyan tool (https://rayyan.qcri.org/) 13.

All studies were screened by title and abstract according to the inclusion and exclusion criteria. If the eligibility criteria were met, the full text was retrieved. All studies eligible for qualitative analysis are described in the "Results" section.

Missing data was clarified by contacting the authors directly.

Data validation

The risk of bias for intervention-type studies was analyzed using the guidelines of the Cochrane Back Review Group (CBRG) 14 .

Statistical analysis

Because several studies of sufficient quality were available, a meta-analysis was carried out, including measures of heterogeneity and publication bias. The data were presented in forest plots.
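The sketch below illustrates the pooling step. It computes an inverse-variance pooled mean difference together with Cochran's Q, Higgins' I² and the between-study variance τ². The methods do not say whether a fixed-effect or a random-effects model was used. The DerSimonian-Laird random-effects model here is therefore an assumption, chosen because it is a common choice for heterogeneous trials. The `Study` class, the function name and the example numbers are hypothetical. In practice, each trial's standard error would have to be derived from its reported data.

```python
import math
from dataclasses import dataclass


@dataclass
class Study:
    name: str
    md: float  # mean difference, surgery minus IMT (hypothetical input)
    se: float  # standard error of that mean difference


def dersimonian_laird(studies: list[Study]) -> dict:
    """Random-effects (DerSimonian-Laird) pooling of mean differences."""
    k = len(studies)
    # Fixed-effect inverse-variance weights and pooled estimate
    w = [1.0 / s.se ** 2 for s in studies]
    sum_w = sum(w)
    md_fixed = sum(wi * s.md for wi, s in zip(w, studies)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (s.md - md_fixed) ** 2 for wi, s in zip(w, studies))
    df = k - 1
    # Method-of-moments between-study variance, truncated at zero
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Higgins' I^2 in percent, truncated at zero
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance
    w_re = [1.0 / (s.se ** 2 + tau2) for s in studies]
    md_random = sum(wi * s.md for wi, s in zip(w_re, studies)) / sum(w_re)
    se_random = math.sqrt(1.0 / sum(w_re))
    z = 1.959964  # two-sided 95% standard normal quantile
    return {
        "md": md_random,
        "ci": (md_random - z * se_random, md_random + z * se_random),
        "Q": q,
        "df": df,
        "I2": i2,
        "tau2": tau2,
    }


if __name__ == "__main__":
    # Hypothetical numbers for illustration only, not data from the included trials
    example = [Study("A", -25.0, 3.0), Study("B", -15.0, 4.0), Study("C", -28.0, 2.5)]
    print(dersimonian_laird(example))
```

When τ² > 0, it is added to every within-study variance. This makes the weights more equal across trials and widens the pooled confidence interval relative to a fixed-effect analysis.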

Characteristics of study participants are presented as means with minimum and maximum values for quantitative variables, and as frequencies and percentages for qualitative variables. Prevalence values and their 95% confidence intervals were calculated using the Wilson method. To assess global heterogeneity between studies, Cochran's Q test and the I² statistic (percentage of variation due to heterogeneity) were calculated. The association measures of the studies and their 95% confidence intervals are presented in forest plots.
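The Wilson score interval for a prevalence can be sketched in a few lines. It is centred on a slightly shrunk estimate rather than on the raw proportion. Unlike the simple Wald interval, it stays within [0, 1] even when the observed proportion is 0 or 1. The function name and the example counts are hypothetical. Setting z = 1.959964 gives a 95% interval.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score confidence interval for the binomial proportion successes / n."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    denom = 1.0 + z ** 2 / n
    # The interval is centred on a shrunk estimate, not on p_hat itself
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2))
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Hypothetical example: 12 of 40 participants with a given characteristic
    low, high = wilson_interval(12, 40)
    print(f"prevalence {12 / 40:.1%}, 95% CI {low:.1%} to {high:.1%}")
```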

Statistical analyses were performed using Stata/MP 14.0 for Windows.

Results

Study selection

The electronic search found 9528 records. After removing 2809 duplicates and screening titles and abstracts, we considered 55 potentially eligible studies for full-text analysis. Of these, 49 were excluded according to the exclusion criteria. The remaining 6 studies were eligible for both the qualitative analysis and the meta-analysis (Fig. 1).
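The counts in the flow diagram can be cross-checked with simple arithmetic. The sketch below uses a hypothetical helper, `prisma_flow`, and shows the following:

* The three database searches (3024 + 4732 + 1772) add up to the 9528 records reported.
* Removing the duplicates leaves 6719 records for title and abstract screening.
* Of those, 6664 were excluded before full-text review.

The labels `search_1` and `search_2` are placeholders for the first two queries listed in the search strategy.

```python
def prisma_flow(records_per_search: dict[str, int], duplicates: int,
                full_text_assessed: int, full_text_excluded: int) -> dict[str, int]:
    """Derive each stage of a PRISMA flow diagram from the raw counts."""
    identified = sum(records_per_search.values())
    screened = identified - duplicates  # titles/abstracts screened
    excluded_on_screening = screened - full_text_assessed
    included = full_text_assessed - full_text_excluded
    # Every stage must be non-negative for the flow to be internally consistent
    if min(screened, excluded_on_screening, included) < 0:
        raise ValueError("inconsistent PRISMA counts")
    return {"identified": identified, "screened": screened,
            "excluded_on_screening": excluded_on_screening,
            "full_text_assessed": full_text_assessed, "included": included}


# Counts taken from the search strategy and the study-selection paragraph
flow = prisma_flow({"search_1": 3024, "search_2": 4732, "web_of_science": 1772},
                   duplicates=2809, full_text_assessed=55, full_text_excluded=49)
print(flow)
# {'identified': 9528, 'screened': 6719, 'excluded_on_screening': 6664, 'full_text_assessed': 55, 'included': 6}
```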

Figure 1. PRISMA 2020 flow diagram for new systematic reviews.

Many studies were excluded due to lack of description for the intervention in the non-surgical group.

Study characteristics

The following articles were included in the systematic review and meta-analysis 15, 16, 17, 18, 19, 20. In total, there were 427 participants. All studies were RCTs. Four had a follow-up of 5 years 15, 16, 19, 20, and two had a follow-up of 10 years 17, 18. Of the six eligible studies, two were conducted in the United States of America 15, 16, two in Italy 17, 19, one in Australia 18 and one in Singapore 20. Study characteristics and detailed demographics are given in Tables 1 and 2. All studies included a group treated exclusively with intensive medical treatment (IMT). The definition of IMT differed between studies. A control arm was considered IMT if patients had frequent follow-up visits and were instructed on healthy habits, including exercise and diet, with or without pharmacological treatment.

Four surgical modalities were used for weight loss: Roux-en-Y Gastric Bypass (RYGB) 15, 16, 17, 19, 20; Biliopancreatic diversion (BPD) 17, 19; Laparoscopic Sleeve Gastrectomy (LSG) 15, 16; and Laparoscopic Adjustable Gastric Band (LAGB) 18. For the subgroup analyses of outcomes, studies were separated into RYGB, LSG and other types of surgery. Non-surgical treatment included one or a combination of the following medications: Orlistat, Phentermine, Naltrexone, Bupropion, Liraglutide, Lorcaserin and Sibutramine.

Risk of bias

After reading the included articles, we analyzed the following domains to determine the level of evidence: study design and selection bias, detection bias, attrition (loss to follow-up) bias, reporting bias and information bias. A summary of the risk-of-bias analysis for each included article is presented in Fig. 2.

Figure 2. Risk of bias analysis.

All studies had a low risk of bias for most criteria. In three studies, outcome assessors were aware of the intervention received by participants, or this information was not available 16, 17, 20. Three other studies 15, 18, 19 had bias due to deviations from intended interventions. In one, an appropriate analysis to estimate the effect of assignment to intervention was not performed 15. In another, patients assigned to the control group crossed over to the intervention group, and no measures to balance this deviation were reported 19. In the third, there was substantial loss to follow-up in all groups 20.

Weight

All six studies had data on weight loss after treatment. Mean differences (MD) and their 95% confidence intervals (95% CI) were calculated; the forest plot is shown in Fig. 3A. All publications found that surgical procedures were more effective for long-term weight loss. The pooled MD was −22.1 kg (95% CI [−28.9; −15.2]). Higgins' I² was 77.8%, a value considered high, and Cochran's Q test rejected the null hypothesis of homogeneity (p = 0.01). The subgroup analysis showed no significant difference between types of surgery (p = 0.30).
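Several outcomes report a p-value for differences between types of surgery, but the methods do not describe how it was computed. The sketch below assumes the common test for subgroup differences. It works as follows:

1. Take each subgroup's pooled estimate and standard error.
2. Compute Cochran's Q between subgroups from these values.
3. Refer Q to a chi-square distribution with (number of subgroups − 1) degrees of freedom.

The function names and example numbers are hypothetical. `chi2_sf` uses the closed-form chi-square tail for integer degrees of freedom, so the block needs only the standard library.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: two-sided normal tail plus a finite correction series
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def subgroup_difference_test(subgroups: list[tuple[float, float]]) -> tuple[float, int, float]:
    """Cochran's Q between subgroups.

    subgroups: (pooled estimate, standard error) for each subgroup.
    Returns (Q_between, degrees of freedom, p-value).
    """
    w = [1.0 / se ** 2 for _, se in subgroups]
    overall = sum(wi * est for wi, (est, _) in zip(w, subgroups)) / sum(w)
    q_between = sum(wi * (est - overall) ** 2 for wi, (est, _) in zip(w, subgroups))
    df = len(subgroups) - 1
    return q_between, df, chi2_sf(q_between, df)


if __name__ == "__main__":
    # Hypothetical subgroup estimates (mean difference, SE): RYGB, LSG, other surgery
    print(subgroup_difference_test([(-20.0, 3.0), (-18.0, 4.0), (-27.0, 5.0)]))
```

With three subgroups (RYGB, LSG and other surgery), the test has 2 degrees of freedom, so a small p-value only indicates that at least one subgroup differs from the others.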

Figure 3. (A) Weight outcomes; (B) Waist circumference outcomes; (C) BMI outcomes.

Waist circumference

Four studies had data on waist circumference 16, 17, 19, 20; the forest plot is shown in Fig. 3B. Patients treated with surgery had an MD of −12.3 cm (95% CI [−15.0; −9.6]) compared with IMT. Higgins' I² was 0%, a value considered low, and Cochran's Q test did not reject the null hypothesis of homogeneity (p = 0.99).

The subgroup analysis showed that there was not a significant difference between the types of surgery ( p  = 0.99).

BMI

Five studies had data on BMI 16, 17, 18, 19, 20; the forest plot is shown in Fig. 3C. Patients treated with surgery had an MD of −8.0 kg/m² (95% CI [−10.5; −5.5]) compared with IMT. Higgins' I² was 84%, a value considered high, and Cochran's Q test rejected the null hypothesis of homogeneity (p = 0.01).

The subgroup analysis showed a significant difference between types of surgery (p = 0.01). The subgroup with LAGB and BPD surgery had the largest decrease in BMI, with a mean of −10.0.
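This derivation assumes that the reported I² values were derived from Cochran's Q in the standard way (the Higgins and Thompson definition). Under that assumption, I² and Q are linked through the number of studies k. The I² reported for BMI therefore implies the approximate size of Q:

```latex
I^2 = \max\!\left(0,\ \frac{Q - \mathrm{df}}{Q}\right) \times 100\%, \qquad \mathrm{df} = k - 1

\text{BMI: } k = 5,\ \mathrm{df} = 4,\ I^2 = 0.84
\;\Rightarrow\; Q = \frac{\mathrm{df}}{1 - I^2} = \frac{4}{0.16} = 25
```

By the same relation, an I² of 0%, as reported for waist circumference, means that Q did not exceed its degrees of freedom (Q ≤ 3 for four studies). This is consistent with the Q-test p-value of 0.99 reported there.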

Triglycerides

Three studies had data on triglycerides 17, 19, 20; the forest plot is shown in Fig. 4A. Patients treated with surgery had an MD of −0.7 (95% CI [−0.8; −0.6]) compared with IMT. Higgins' I² was 50.4%, a value considered high, but Cochran's Q test did not reject the null hypothesis of homogeneity (p = 0.08).

Figure 4. (A) Triglycerides outcomes; (B) LDL outcomes; (C) HDL outcomes; (D) Cholesterol outcomes.

The subgroup analysis showed that there was a significant difference between the types of surgery ( p  = 0.01), with a worse outcome for RYGB.

LDL

Four studies had data on LDL 16, 17, 19, 20; the forest plot is shown in Fig. 4B. Patients treated with surgery had an MD of −0.5 (95% CI [−1.0; 0.0]) compared with IMT. Higgins' I² was 92.7%, a value considered high, and Cochran's Q test rejected the null hypothesis of homogeneity (p = 0.01).

The subgroup analysis showed a significant difference between types of surgery (p = 0.01). LDL increased by 0.5 in the LSG subgroup, whereas the subgroup with LAGB and BPD surgery had the largest decrease, with a mean of −1.3.

HDL

Four studies had data on HDL 16, 17, 19, 20; the forest plot is shown in Fig. 4C. Patients treated with surgery had an MD of 0.1 (95% CI [0.0; 0.2]) compared with IMT. Higgins' I² was 90.5%, a value considered high, and Cochran's Q test rejected the null hypothesis of homogeneity (p = 0.01).

The subgroup analysis showed a significant difference between types of surgery (p = 0.01). The RYGB subgroup had the largest significant increase in HDL, with a mean of 0.2.

Cholesterol

Three studies had data on cholesterol 17, 19, 20; the forest plot is shown in Fig. 4D. Patients treated with surgery had an MD of −0.9 (95% CI [−1.6; −0.2]) compared with IMT. Higgins' I² was 94.8%, a value considered high, and Cochran's Q test rejected the null hypothesis of homogeneity (p = 0.01).

The subgroup analysis showed a significant difference between types of surgery (p = 0.01). The subgroup with LAGB and BPD surgery had the largest decrease in cholesterol, with a mean of −1.7.

Cardiovascular risk

Two studies had data on cardiovascular risk 17, 19; the forest plot is shown in Fig. 5A. Patients treated with surgery had an MD of −0.08 (95% CI [−0.10; −0.05]) compared with IMT. Higgins' I² was 0%, a value considered low, and Cochran's Q test did not reject the null hypothesis of homogeneity (p = 0.44).

Figure 5. (A) Cardiovascular risk outcomes; (B) Systolic blood pressure outcomes; (C) Diastolic blood pressure outcomes; (D) HOMA outcomes; (E) Glycated hemoglobin outcomes.

The subgroup analysis showed that there was no significant difference between the types of surgery ( p  = 0.36).

Systolic blood pressure

Four studies had data on systolic blood pressure 16, 17, 19, 20; the forest plot is shown in Fig. 5B. Patients treated with surgery had an MD of −4.49 mmHg (95% CI [−7.65; −1.33]) compared with IMT. Higgins' I² was 71%, a value considered high, and Cochran's Q test rejected the null hypothesis of homogeneity (p = 0.01).

The subgroup analysis showed that there was not a significant difference between the types of surgery ( p  = 0.79).

Diastolic blood pressure

Four studies had data on diastolic blood pressure 16, 17, 19, 20; the forest plot is shown in Fig. 5C. Patients treated with surgery had an MD of −2.28 mmHg (95% CI [−4.25; −0.31]) compared with IMT. Higgins' I² was 60.5%, a value considered high, and Cochran's Q test rejected the null hypothesis of homogeneity (p = 0.01).

The subgroup analysis showed that there was not a significant difference between the types of surgery ( p  = 0.66).

HOMA

Three studies had data on HOMA 15, 17, 19; the forest plot is shown in Fig. 5D. Patients treated with surgery had an MD of −2.94 (95% CI [−3.52; −2.35]) compared with IMT. Higgins' I² was 14%, a value considered low, and Cochran's Q test did not reject the null hypothesis of homogeneity (p = 0.32).

The subgroup analysis showed that there was no significant difference between the types of surgery ( p  = 0.33).

Glycated Hemoglobin

Five studies had data on glycated hemoglobin 15, 16, 17, 19, 20; the forest plot is shown in Fig. 5E. Patients treated with surgery had an MD of −1.0 (95% CI [−1.3; −0.6]) compared with IMT. Higgins' I² was 79.8%, a value considered high, and Cochran's Q test rejected the null hypothesis of homogeneity (p = 0.01).

The subgroup analysis showed that there was no significant difference between the types of surgery ( p  = 0.98).

Discussion

The CDC defines obesity as a BMI of 30 or greater. Obesity is currently among the most prevalent diseases in the world, as well as an important risk factor for many other diseases. It carries high morbidity and mortality 21, 22, and weight loss can therefore bring many benefits to the individual. Current treatments for obesity can be divided into non-surgical and surgical.

Non-surgical treatments include non-drug and drug treatments. Non-drug measures include changes in eating habits, regular physical exercise and cognitive behavioral therapy 8. Ideally, these measures should be implemented for all patients living with obesity, even for those who will undergo drug or surgical treatment. Recently, in addition to lifestyle change, neuromodulation with deep transcranial magnetic stimulation has also been studied and has been shown to be effective in reducing weight 23.

A systematic review carried out in 2021, which analyzed 64 articles, concluded that the most effective non-surgical interventions include low-carbohydrate or low-fat diets and combined therapies. It also showed that non-drug interventions such as physical exercise, when used alone, are not very effective in reducing the weight of these patients. Therefore, a combination of two or more therapies should be chosen 24.

Pharmacological treatment must be chosen together with the patient. One or more drugs can be used, the main ones being Liraglutide, Semaglutide, Tirzepatide, Orlistat, Phentermine and Sibutramine 25.

Liraglutide was recently approved for the treatment of obesity and is now one of the most widely used drugs. It is a GLP-1 receptor agonist 26, 27, 28 that mimics the effects of endogenous GLP-1. This class of drugs is already established in the treatment of type 2 diabetes mellitus. That condition is often associated with obesity 29, 30, since its pathophysiology involves increased insulin resistance. The main actions of this drug are increased satiety due to slower gastric emptying, increased insulin release and decreased glucagon release. Semaglutide has a similar mechanism of action and produced substantial weight loss 31. It was also associated with a lower 10-year risk of type 2 diabetes in people with overweight or obesity after 2 years of follow-up 32. More recently, Tirzepatide, a drug that combines GIP and GLP-1 receptor agonism, has shown even better short-term results 33.

Orlistat, in turn, reversibly inhibits lipase 34, the enzyme that breaks down dietary fat so that it can be absorbed, and thereby prevents the absorption of ingested triglycerides. The unabsorbed fat is eliminated in the feces 35. The main adverse effects are gastrointestinal symptoms. These can be beneficial, however, because they encourage a change in behavior, such as lower consumption of fat-rich foods 36.

Phentermine, an amphetamine analogue, can be used in combination with topiramate for the treatment of obesity. The mechanism of action of this combination is not yet fully understood. However, significant weight loss has been observed with its use, along with reduced consumption of high-calorie foods and slower gastric emptying 37, 38.

Sibutramine, widely used in the 1990s, acts to inhibit the reuptake of serotonin, norepinephrine, and dopamine 34 . Serotonin, in turn, activates POMC system neurons and inhibits NPY neurons, thereby promoting reduced appetite and increased satiety. Despite generating weight reduction 39 , some data show increased cardiovascular risk 40 , and therefore, it is no longer used as a first-line drug.

Among the possible surgeries, the most commonly performed today are Roux-en-Y Gastric Bypass (RYGB), Biliopancreatic diversion (BPD), Laparoscopic Sleeve Gastrectomy (LSG) and Laparoscopic Adjustable Gastric Band (LAGB). According to the NIH and the American Bariatric Society 41, 42, indications for bariatric surgery include adults with a BMI of 40 or greater. They also include adults with a BMI of 35 or greater accompanied by a comorbidity such as type 2 diabetes mellitus, obstructive sleep apnea or hypertension.
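BMI is weight in kilograms divided by the square of height in metres. The sketch below encodes the thresholds quoted in this discussion, as an illustration only and not as a clinical decision tool. The function names are hypothetical. The sketch assumes the comorbidity rule uses BMI ≥ 35, as in the 1991 NIH consensus criteria, and it reduces the comorbidity requirement to a single yes/no input.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def has_obesity(bmi_value: float) -> bool:
    """CDC threshold cited in the discussion: BMI of 30 kg/m^2 or more."""
    return bmi_value >= 30.0


def meets_classic_surgery_bmi_criteria(bmi_value: float, has_comorbidity: bool) -> bool:
    """Hypothetical helper: BMI >= 40, or BMI >= 35 with a qualifying comorbidity
    (e.g. type 2 diabetes, obstructive sleep apnea or hypertension)."""
    return bmi_value >= 40.0 or (bmi_value >= 35.0 and has_comorbidity)


if __name__ == "__main__":
    # Hypothetical patient: 118 kg, 1.75 m, with type 2 diabetes
    value = bmi(118, 1.75)
    print(round(value, 1), has_obesity(value), meets_classic_surgery_bmi_criteria(value, True))
    # prints: 38.5 True True
```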

RYGB is one of the best-known procedures, and its complications vary according to the surgical technique used. They include gastric distention, ulcers, cholelithiasis, hernias, dumping syndrome and hyperammonemic encephalopathy.

BPD carries long-term nutritional complications, such as anemia, bone disease and fat-soluble vitamin deficiency. It also has high mortality rates, mainly because of its technical complexity.

Among the procedures described, LSG has the fewest complications; bleeding and stenosis of the stoma are the ones described in the literature. Endoscopic sleeve gastroplasty, an alternative technique, has been shown to be safe and effective for weight loss after 104 weeks, with important improvements in metabolic comorbidities 43.

The procedure with the lowest mortality rate is the LAGB 44 . Despite this, it can present complications such as obstruction, band erosion, band slippage and gastric prolapse, esophagitis, hernia, in addition to having a high rate of reoperation, reaching 50% of patients who underwent this surgery 45 .

In this article, we compared data on weight loss achieved through intensive medical treatment (changes in eating habits, physical exercise and medications) with data on weight loss achieved through surgical treatment. With both treatments, weight loss was accompanied by several improvements:

* a better lipid panel, with reduced total cholesterol, triglycerides and LDL and increased HDL;
* lower systolic and diastolic blood pressure;
* decreased glycated hemoglobin and insulin resistance (assessed through HOMA);
* a reduced risk of cardiovascular disease.

Our systematic review confirmed that bariatric surgery has greater potential than non-surgical treatment for reducing weight, BMI and waist circumference, as already widely described in the literature. Notably, this difference persisted in the long term. Similarly, a 2014 Cochrane systematic review 46 of RCTs with more than 1 year of follow-up found that all 7 included articles demonstrated an advantage for the surgical group. An article 47 on pharmacological treatment for obesity showed that even recently approved drugs, including GLP-1 agonists, have so far not reduced weight to levels similar to those achieved with bariatric surgery, despite the emergence of new drugs still in early phases of development 48. In these studies, however, the comparison period is relatively short (12 months), and data on long-term impact are lacking. Thus, for long-term weight loss, bariatric surgery is still the best option.

Most articles could not individually demonstrate that surgical treatment is superior to non-surgical treatment in reducing blood pressure. However, the meta-analysis showed superiority of the surgical group for both systolic and diastolic pressure, more pronounced in the BPD group. Wang 49 performed a systematic review focused on blood pressure and found reductions in systolic and diastolic values. However, subgroup analysis showed that the systolic reduction occurred only in the RYGB groups. Similarly, Schiavon demonstrated a significant reduction in the need for blood pressure medication after 3 years in the RYGB group compared with intensive medical treatment for obesity 50. A difference found in only one surgical subtype may simply reflect sample size, which suggests that surgical treatment in general tends to reduce blood pressure more than non-surgical treatment. In our meta-analysis, different types of surgery reached significance, which may reflect the studies we selected, since they had longer follow-up.

For both HOMA-IR and glycated hemoglobin, the improvement was more significant in the group that underwent surgery. The way data on diabetes remission were reported in the articles did not allow a meta-analysis, so these data were not included. However, individual data from the Mingrone 2015, Mingrone 2021 and Schauer articles showed better results in the surgery group. A 2021 network meta-analysis 51 compared different types of metabolic surgery for the treatment of obesity and diabetes. It showed that RYGB was 20% more likely than sleeve gastrectomy (SG) to result in remission of type 2 diabetes, with no significant difference between the other groups. Moreover, the effects of bariatric surgery on diabetes are not exclusive to patients with obesity. In a study of patients with a BMI of 27–32 kg/m², those treated with RYGB had better glycemic control 20.

Regarding the lipid profile, Schauer's study could not demonstrate superiority for LDL and HDL. However, combining the data from Mingrone's articles shows that surgical treatment is superior. Regarding cholesterol reduction, Mingrone's studies showed that both RYGB and BPD were better than non-surgical treatment, and that BPD produced a statistically greater reduction than RYGB. This can be explained by the greater intestinal exclusion in BPD and its consequently greater impact on lipid absorption. The study by Sayeed et al. 52 was not included in this meta-analysis because its groups were inadequately separated for analysis. Nevertheless, its lipid-profile results showed that the group receiving both interventions was superior to exclusive non-surgical treatment. It is important to point out that, despite statistically significant differences between groups, the size of these differences is probably not clinically significant.

The choice of treatment for obesity can also affect several other comorbidities. Hossain et al. 53 performed a systematic review of 26 studies showing that bariatric surgery appears to be more effective in the treatment of asthma. Similarly, a study by Crawford et al. 15 showed a greater increase in bone turnover in groups undergoing bariatric surgery than in those receiving pharmacological treatment. In addition, bariatric surgery has been shown to be superior in the treatment of other obesity-related conditions, such as non-alcoholic steatohepatitis (NASH), and in the treatment of obesity in adolescents 54, 55.

Results for major adverse cardiovascular events (MACE) and mortality 56 have also been promising for bariatric surgery. A recent cohort study compared bariatric surgery in patients with obesity with the use of GLP-1 agonists in patients with diabetes and found a lower risk of MACE in the surgical group 57. Surgical treatment has also been superior to medical treatment in preventing diabetic kidney disease over 5 years in patients with diabetes and obesity 58. Boyers et al. evaluated the cost-effectiveness of surgical and pharmacological treatment of obesity. They found that RYGB should be the treatment of choice when only the optimization of health system costs is considered 59.

Another important consideration is that pharmacological and surgical treatments for obesity are not mutually exclusive, and most clinicians combine both modalities in practice to improve results. Weight regain after bariatric surgery is a known possibility. In these patients, two-thirds of the regained weight can be safely lost with GLP-1 agonists, giving clinicians a therapeutic option for this clinical challenge.

Methodologies and limitations of the studies

Despite the large number of articles in the literature on the treatment of obesity, there are few RCTs comparing non-surgical and surgical treatment, and most of them only follow up in the short term. In addition, many articles do not adequately describe the strategy used in non-surgical treatment. This lack of data and standardization in this type of treatment can lead to bias and possibly the formation of extremely heterogeneous groups for analysis.

Most of the studies included in our systematic review used diabetes as an inclusion criterion. Our findings may therefore not generalize to patients with obesity but without diabetes.

Another important limitation of our systematic review concerns pharmacological treatment in the non-surgical group. GLP-1 agonists have great potential in the treatment of obesity, but they have only recently begun to be used for this purpose. Because our aim was to assess long-term impact, few eligible articles used these drugs. The most recent medications, such as Tirzepatide, could not be evaluated in our study, since no RCTs in the literature report their long-term effects. These drugs have proved highly effective and might have similar effects in the long term. Future systematic reviews that include the new generation of weight-loss medications may reveal different results.

Finally, choosing the most appropriate treatment often involves individual characteristics of each patient, and the impact on quality of life can be extremely subjective and difficult to assess.

Conclusions

Obesity is a disease that increases patients' morbidity and mortality and contributes to several secondary diseases. This systematic review evaluated the long-term impact of treatment on the main obesity-related variables. The findings indicate that both treatment modalities are effective in managing obesity; however, the surgical group showed superior outcomes compared with the non-surgical group for most variables. Nonetheless, novel pharmacological treatments have shown promising potential. Further studies of the long-term impact of these new drug treatments are needed to allow a comprehensive comparison with surgical treatment.

Data availability

Data is provided within the manuscript or supplementary information files.

References

Bray, G. The Battle of the Bulge: A History of Obesity Research (Dorrance Pub., 2007).

GBD 2015 Obesity Collaborators, Afshin, A., Forouzanfar, M. H., Reitsma, M. B., Sur, P., Estep, K. et al. Health effects of overweight and obesity in 195 countries over 25 years. N. Engl. J. Med. 377, 13–27 (2017).

Whitlock, G. et al. Body-mass index and cause-specific mortality in 900 000 adults: Collaborative analyses of 57 prospective studies. Lancet 373 , 1083–1096 (2009).

Steele, C. B. et al. Vital Signs: Trends in incidence of cancers associated with overweight and obesity—United States, 2005–2014. MMWR Morb. Mortal. Wkly. Rep. 66 , 1052–1058 (2017).

Goldhaber, S. Z. et al. Risk factors for pulmonary embolism. The Framingham Study. Am. J. Med. 74 , 1023–1028 (1983).

Knowler, W. C. et al. Reduction in the incidence of type 2 diabetes with lifestyle intervention or metformin. N. Engl. J. Med. 346 , 393–403 (2002).

Bray, G. A., Frühbeck, G., Ryan, D. H. & Wilding, J. P. H. Management of obesity. The Lancet 387 , 1947–1956 (2016).

Perdomo, C. M., Cohen, R. V., Sumithran, P., Clément, K. & Frühbeck, G. Contemporary medical, device, and surgical therapies for obesity in adults. The Lancet 401 , 1116–1130 (2023).

Updike, W. H. et al. Is it time to expand glucagon-like peptide-1 receptor agonist use for weight loss in patients without diabetes?. Drugs 81 , 881–893 (2021).

Moher, D. et al. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement. Syst. Rev. 4 , 1 (2015).

Shea, B. J. et al. AMSTAR 2: A critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ 358 , j4008 (2017).

Brown, D. A review of the PubMed PICO Tool: using evidence-based practice in health education. Health Promot. Pract. 21 , 496–498 (2020).

Ouzzani, M., Hammady, H., Fedorowicz, Z. & Elmagarmid, A. Rayyan—A web and mobile app for systematic reviews. Syst. Rev. 5 , 210 (2016).

Leeflang, M. M. G., Deeks, J. J., Takwoingi, Y. & Macaskill, P. Cochrane diagnostic test accuracy reviews. Syst. Rev. 2 , 82 (2013).

Crawford, M. R. et al. Increased bone turnover in type 2 diabetes patients randomized to bariatric surgery versus medical therapy at 5 years. Endocr. Pract. 24 , 256–264 (2018).

Schauer, P. R. et al. Bariatric surgery versus intensive medical therapy for diabetes—5-Year outcomes. N. Engl. J. Med. 376 , 641–651 (2017).

Mingrone, G. et al. Metabolic surgery versus conventional medical therapy in patients with type 2 diabetes: 10-Year follow-up of an open-label, single-centre, randomised controlled trial. The Lancet 397 , 293–304 (2021).

O’Brien, P. E., Brennan, L., Laurie, C. & Brown, W. Intensive medical weight loss or laparoscopic adjustable gastric banding in the treatment of mild to moderate obesity: Long-term follow-up of a prospective randomised trial. Obes. Surg. 23 , 1345–1353 (2013).

Mingrone, G. et al. Bariatric-metabolic surgery versus conventional medical treatment in obese patients with type 2 diabetes: 5 Year follow-up of an open-label, single-centre, randomised controlled trial. The Lancet 386 , 964–973 (2015).

Cheng, A. et al. Roux-en-Y gastric bypass versus best medical treatment for type 2 diabetes mellitus in adults with body mass index between 27 and 32 kg/m²: A 5-year randomized controlled trial. Diabetes Res. Clin. Pract. 188 , 109900 (2022).

Chooi, Y. C., Ding, C. & Magkos, F. The epidemiology of obesity. Metabolism 92 , 6–10 (2019).

Christensen, S. Recognizing obesity as a disease. J. Am. Assoc. Nurse Pract. 32 , 497–503 (2020).

Ferrulli, A. et al. Weight loss induced by deep transcranial magnetic stimulation in obesity: A randomized, double-blind, sham-controlled study. Diabetes Obes. Metab. 21 , 1849–1860 (2019).

Twells, L. K. et al. Nonsurgical weight loss interventions: A systematic review of systematic reviews and meta-analyses. Obes. Rev. 22 , e13320 (2021).

Rosa-Gonçalves, P. & Majerowicz, D. Pharmacotherapy of obesity: Limits and perspectives. Am. J. Cardiovasc. Drugs 19 , 349–364 (2019).

Pi-Sunyer, X. et al. A randomized, controlled trial of 3.0 mg of liraglutide in weight management. N. Engl. J. Med. 373 , 11–22 (2015).

Knudsen, L. B. & Lau, J. The discovery and development of liraglutide and semaglutide. Front. Endocrinol. (Lausanne) 10 , 155 (2019).

de Oca Alejandra PZMTS, Pellitero S, Puig-Domingo M. Obesity and GLP-1. Minerva Endocrinol. 46 , 168–176 (2021).

Kahn, S. E., Hull, R. L. & Utzschneider, K. M. Mechanisms linking obesity to insulin resistance and type 2 diabetes. Nature 444 , 840–846 (2006).

Rubio-Almanza, M., Cámara-Gómez, R. & Merino-Torres, J. F. Obesity and type 2 diabetes: Also linked in therapeutic options. Endocrinol. Diabetes Nutr. 66 , 140–149 (2019).

Wharton, S. et al. Two-year effect of semaglutide 2.4 mg on control of eating in adults with overweight/obesity: STEP 5. Obesity 31 , 703–715 (2023).

Wilkinson, L. et al. Effect of semaglutide 2.4 mg once weekly on 10-year type 2 diabetes risk in adults with overweight or obesity. Obesity 31 , 2249–2259 (2023).

Frías, J. P. et al. Tirzepatide versus semaglutide once weekly in patients with type 2 diabetes. N. Engl. J. Med. 385 , 503–515 (2021).

Son, J. W. & Kim, S. Comprehensive review of current and upcoming anti-obesity drugs. Diabetes Metab. J. 44 , 802–818 (2020).

Ballinger, A. & Peikin, S. R. Orlistat: Its current status as an anti-obesity drug. Eur. J. Pharmacol. 440 , 109–117 (2002).

Zhou, Y. H. et al. Effect of anti-obesity drug on cardiovascular risk factors: A systematic review and meta-analysis of randomized controlled trials. PLoS One 7 , e39062 (2012).

Cosentino, G., Conrad, A. O. & Uwaifo, G. I. Phentermine and topiramate for the management of obesity: A review. Drug Des. Dev. Ther. 7 , 267–278 (2013).

Smith, S. M., Meyer, M. & Trinkley, K. E. Phentermine/topiramate (Qsymia) for the treatment of obesity. Ann. Pharmacother. 47 , 340–349 (2013).

Sharma, B. & Henderson, D. C. Sibutramine: Current status as an anti-obesity drug and its future perspectives. Expert Opin. Pharmacother. 9 , 2161–2173 (2008).

Tziomalos, K., Krassas, G. E. & Tzotzas, T. The use of sibutramine in the management of obesity and related disorders: An update. Vasc. Health Risk Manag. 5 , 441–452 (2009).

Burguera, B. et al. Critical assessment of the current guidelines for the management and treatment of morbidly obese patients. J. Endocrinol. Invest. 30 , 844–852 (2007).

Grundy, S. M. et al. Gastrointestinal surgery for severe obesity. Ann. Intern. Med. 115 , 956–961 (1991).

Abu Dayyeh, B. K. et al. Endoscopic sleeve gastroplasty for treatment of class 1 and 2 obesity (MERIT): A prospective, multicentre, randomised trial. The Lancet 400 , 441–451 (2022).

Chapman, A. E. et al. Laparoscopic adjustable gastric banding in the treatment of obesity: A systematic literature review. Surgery 135 , 326–351 (2004).

Himpens, J. et al. Long-term outcomes of laparoscopic adjustable gastric banding. Arch. Surg. 146 , 802–807 (2011).

Colquitt, J. L., Pickett, K., Loveman, E. & Frampton, G. K. Surgery for weight loss in adults. Cochrane Database Syst. Rev. 2014 , CD003641 (2014).

Cotugno, M. et al. Clinical efficacy of bariatric surgery versus liraglutide in patients with type 2 diabetes and severe obesity: A 12-month retrospective evaluation. Acta Diabetol. 52 , 331–336 (2014).

Tan, Q. et al. Recent advances in incretin-based pharmacotherapies for the treatment of obesity and diabetes. Front. Endocrinol. (Lausanne) https://doi.org/10.3389/fendo.2022.838410 (2022).

Wang, L. et al. The impact of bariatric surgery versus non-surgical treatment on blood pressure: Systematic review and meta-analysis. Obes. Surg. 31 , 4970–4984 (2021).

Schiavon, C. A. et al. Three-year outcomes of bariatric surgery in patients with obesity and hypertension. Ann. Intern. Med. 173 , 685–693 (2020).

Currie, A. C., Askari, A., Fangueiro, A. & Mahawar, K. Network meta-analysis of metabolic surgery procedures for the treatment of obesity and diabetes. Obes. Surg. 31 , 4528–4541 (2021).

Ikramuddin, S. et al. Lifestyle intervention and medical management with vs without Roux-en-Y gastric bypass and control of hemoglobin A1c, LDL cholesterol, and systolic blood pressure at 5 years in the Diabetes Surgery Study. JAMA J. Am. Med. Assoc. 319 , 266–278 (2018).

Hossain, N., Arhi, C. & Borg, C. M. Is bariatric surgery better than nonsurgical weight loss for improving asthma control? A systematic review. Obes. Surg. 31 , 1810–1832 (2021).

Järvholm, K. et al. Metabolic and bariatric surgery versus intensive non-surgical treatment for adolescents with severe obesity (AMOS2): A multicentre, randomised, controlled trial in Sweden. Lancet Child Adolesc. Health 7 , 249–260 (2023).

Verrastro, O. et al. Bariatric–metabolic surgery versus lifestyle intervention plus best medical care in non-alcoholic steatohepatitis (BRAVES): A multicentre, open-label, randomised trial. The Lancet 401 , 1786–1797 (2023).

Courcoulas, A. P. et al. Reduction in long-term mortality after sleeve gastrectomy and gastric bypass compared to nonsurgical patients with severe obesity. Ann. Surg. 277 , 442–448 (2023).

Stenberg, E. & Näslund, E. Major adverse cardiovascular events among patients with type-2 diabetes, a nationwide cohort study comparing primary metabolic and bariatric surgery to GLP-1 receptor agonist treatment. Int. J. Obes. 47 , 251–256 (2023).

Bjornstad, P. et al. Effect of surgical versus medical therapy on diabetic kidney disease over 5 years in severely obese adolescents with type 2 diabetes. Diabetes Care 43 , 187–195 (2020).

Boyers, D. et al. Cost-effectiveness of bariatric surgery and non-surgical weight management programmes for adults with severe obesity: A decision analysis model. Int. J. Obes. 45 , 2179–2190 (2021).

Acknowledgements

The authors are thankful to Justin Axel-Berg for the English corrections and Rossana V. Mendoza López for the statistical analysis.

Author information

Authors and affiliations.

Department of Neurology, Hospital das Clínicas HCFMUSP, Faculdade de Medicina, Universidade de Sao Paulo, São Paulo, SP, Brazil

Leonardo Zumerkorn Pipek

Faculty of Medicine FMUSP, University of São Paulo, São Paulo, Brazil

Walter Augusto Fabio Moraes, Rodrigo Massato Nobetani, Vitor Santos Cortez, Alberto Santos Condi, João Victor Taba, Milena Oliveira Suzuki, Fernanda Sayuri do Nascimento & Vitoria Carneiro de Mattos

Centro Universitário FMABC, Santo André, São Paulo, Brazil

Rafaela Farias Vidigal Nascimento

Center of Acupuncture, Department of Orthopaedics and Traumatology, University of São Paulo, São Paulo, Brazil

Leandro Ryuchi Iuamoto & Wu Tu Hsing

Department of Gastroenterology, Hospital das Clínicas, HCFMUSP, Avenida Doutor Arnaldo, 455, São Paulo, Brazil

Luiz Augusto Carneiro-D’Albuquerque, Alberto Meyer & Wellington Andraus

Contributions

Conceptualization: L.Z.P., A.M. Methodology: L.Z.P., L.R.I., A.M. Formal analysis: L.Z.P., R.F.V.N., A.M. Data Curation: L.Z.P., W.A.F.B., R.M.N., V.S.C., A.S.C., J.V.T., R.F.V.N., M.O.S., F.S.N., V.C.M. Writing—Original Draft: L.Z.P., W.A.F.B., R.M.N., V.S.C., A.S.C., J.V.T., R.F.V.N., M.O.S., F.S.N., V.C.M. Writing—Review and Editing: L.Z.P., R.F.V.N., L.R.I., W.T.H., L.A.C., A.M., W.A. Visualization: L.Z.P., R.F.V.N., A.M. Supervision: L.R.I., W.T.H., L.A.C., A.M., W.A. Project administration: L.R.I., W.T.H., L.A.C., A.M., W.A.

Corresponding author

Correspondence to Alberto Meyer .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Pipek, L.Z., Moraes, W.A.F., Nobetani, R.M. et al. Surgery is associated with better long-term outcomes than pharmacological treatment for obesity: a systematic review and meta-analysis. Sci Rep 14 , 9521 (2024). https://doi.org/10.1038/s41598-024-57724-5

Received : 19 November 2023

Accepted : 21 March 2024

Published : 25 April 2024

DOI : https://doi.org/10.1038/s41598-024-57724-5

  • Long term outcome
  • Pharmacological treatment

Open Access

Study Protocol

Frequency, complications, and mortality of inhalation injury in burn patients: A systematic review and meta-analysis protocol

Roles Conceptualization, Project administration, Writing – original draft, Writing – review & editing

Affiliation Faculdade de Ciências da Saúde - Universidade de Brasília-UnB, Programa de Pós-Graduação em Ciências da Saúde, Brasilia (DF), Brazil

Roles Writing – review & editing

Affiliation Programa de Pós-Graduação em Ciências da Saúde, Escola Superior de Ciências da Saúde (ESCS), Brasilia (DF), Brazil

Roles Investigation, Writing – review & editing

Affiliation Programa de Pós-Graduação em Ciências da Saúde, Coordenação de Cursos Pós-Graduação Stricto Sensu, Escola Superior de Ciências da Saúde (ESCS), Brasilia (DF), Brazil

Affiliation Universidade de Brasília, Brasilia (DF), Brazil and Programa de Pós Graduação em Ciências do Movimento Humano e Reabilitação, Universidade Evangélica de Goiás, Goiás, Brazil

Roles Conceptualization, Data curation, Writing – review & editing

Affiliation Radiology Professor of Universidade de Ribeirão Preto, Campus Guarujá, Guarujá-SP, Brazil

Roles Data curation, Writing – review & editing

Roles Conceptualization, Methodology, Project administration, Writing – review & editing

Affiliation Programa de Pós-Graduação em Ciências da Saúde, Coordenação de Pesquisa e Comunicação Científica, Escola Superior de Ciências da Saúde (ESCS), Brasilia (DF), Brazil

  • Juliana Elvira Herdy Guerra Avila, 
  • Levy Aniceto Santana, 
  • Denise Rabelo Suzuki, 
  • Vinícius Zacarias Maldaner da Silva, 
  • Marcio Luís Duarte, 
  • Aline Mizusaki Imoto, 
  • Fábio Ferreira Amorim

PLOS

  • Published: April 23, 2024
  • https://doi.org/10.1371/journal.pone.0295318

Introduction

Burns are tissue traumas caused by energy transfer and are accompanied by a variable inflammatory response. The consequences of burns represent a public health problem worldwide. When associated with burns, inhalation injury (II) is a marker of severity and leads to a worse prognosis. Its treatment is complex and often involves invasive mechanical ventilation (IMV). The primary purpose of this study will be to assess the evidence regarding the frequency and mortality of II in burn patients. The secondary purposes will be to assess the evidence on the association between II and respiratory complications (pneumonia, airway obstruction, acute respiratory failure, and acute respiratory distress syndrome), the need for IMV, and complications in other organ systems. A further aim is to identify factors associated with II in burn patients, as well as prognostic factors for acute respiratory failure, need for IMV, and mortality of II in burn patients.

This systematic literature review and meta-analysis will follow the Preferred Reporting Items for Systematic Reviews and Meta-analysis (PRISMA) guidelines. The PubMed/MEDLINE, Embase, LILACS/VHL, Scopus, Web of Science, and CINAHL databases will be searched without restrictions on language or publication date. Studies presenting incomplete data, and studies of patients under 19 years of age, will be excluded. Data will be synthesized as continuous variables (mean and standard deviation) and dichotomous variables (relative risk), together with the total number of participants. The means, sample sizes, standard deviations, and relative risks will be entered into the Review Manager web analysis software (The Cochrane Collaboration).

Despite extensive experience in managing IIs in burn patients, they remain an important cause of morbidity and mortality. Diagnosing IIs and accurately measuring the damage they cause are complex, and therapies are essentially supportive. Given this challenge and the impact and potential severity of IIs, they represent a promising area for research. Further studies are needed to understand them and to improve their clinical course.

The protocol of this review is registered in the International Prospective Register of Systematic Reviews (PROSPERO) of the Centre for Reviews and Dissemination, University of York, United Kingdom ( https://www.crd.york.ac.uk/prospero ), under number RD42022343944.

Citation: Herdy Guerra Avila JE, Aniceto Santana L, Rabelo Suzuki D, Maldaner da Silva VZ, Duarte ML, Mizusaki Imoto A, et al. (2024) Frequency, complications, and mortality of inhalation injury in burn patients: A systematic review and meta-analysis protocol. PLoS ONE 19(4): e0295318. https://doi.org/10.1371/journal.pone.0295318

Editor: Mohamed Boussarsar, Centre Hospitalier Universitaire Farhat Hached de Sousse, TUNISIA

Received: July 19, 2023; Accepted: November 19, 2023; Published: April 23, 2024

Copyright: © 2024 Herdy Guerra Avila et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The identified research data will be made publicly available when the study is completed and published.

Funding: The authors received funding from Fundação de Ensino e Pesquisa em Ciências da Saúde - FEPECS, Address: SMHN 03 - conjunto A - bloco 1 - Edifício FEPECS CEP: 70701-907.

Competing interests: The authors have declared that no competing interests exist.

Burns are tissue traumas caused by energy transfer (thermal, chemical, electrical, radiation) [ 1 , 2 ]. They are accompanied by local and systemic inflammatory responses that vary with the intensity, location, and depth of the affected area [ 3 ]. Because of the severity of their condition, most patients require treatment in specialized units with intensive support and monitoring [ 4 ]. The consequences of burns range from physical disability and psychological and social harm to death, and they represent a public health problem [ 4 ]. According to a World Health Organization fact sheet dated October 2023, there are over 11,000,000 cases worldwide annually, resulting in 180,000 deaths [ 5 ]. In Brazil alone, there were 19,772 deaths from burns between 2015 and 2020, according to data from the Brazilian Department of Health [ 6 ]. US statistics from the National Inpatient Sample and the National Burn Repository estimate 40,000 hospitalizations for burns yearly in the United States, about 5% of them with inhalation injuries (IIs) [ 6 , 7 ]. Approximately 33% of all burn patients will require invasive mechanical ventilation (IMV), a proportion that increases significantly in the presence of II [ 8 ].

The diagnosis of respiratory system involvement is essentially clinical and can be complemented by bronchoscopy and other radiological and laboratory tests [ 9 ]. Bronchoscopy is considered the gold standard for this evaluation, and ideally it should be performed within the first 24 hours in all patients with a history of smoke inhalation [ 10 , 11 ]. When present, IIs significantly worsen patient outcomes by increasing fluid requirements during resuscitation, pulmonary complications, and mortality [ 12 – 14 ]. They serve as a marker of severity and an independent risk factor for death [ 13 , 14 ], especially in patients with burns covering more than 20% of body surface area [ 15 , 16 ]. Moreover, in contrast to recent advances in the treatment of cutaneous burns, the treatment of IIs remains a challenging frontier. The pathophysiology is not fully understood, the diagnostic criteria remain unclear, interventions are often ineffective, and mortality remains high [ 17 – 20 ].

The treatment of IIs traditionally consists of respiratory support with 100% oxygen, hyperbaric oxygen therapy, and/or protective IMV [ 9 , 21 , 22 ]. However, several essential questions still need to be clarified [ 21 ]:

  • how best to identify and classify respiratory tract involvement;
  • whether all patients should be intubated and receive IMV;
  • which IMV mode is most appropriate;
  • how to address systemic toxicity.

In this context, the primary purpose of this study will be to assess the evidence regarding the frequency and mortality of II in burn patients. The secondary purposes will be to assess the evidence on the association between IIs and respiratory complications (pneumonia, airway obstruction, acute respiratory failure, and acute respiratory distress syndrome [ARDS]), the need for IMV, and complications in other organ systems. A further aim is to identify factors associated with IIs in burn patients, as well as prognostic factors for acute respiratory failure, need for IMV, and mortality of II in burn patients.

Materials and methods

Study design.

This systematic literature review will be conducted and reported according to the Preferred Reporting Items for Systematic Reviews and Meta-analysis (PRISMA) guidelines [ 23 ] ( S1 Checklist ). The protocol of this review is registered in the International Prospective Register of Systematic Reviews (PROSPERO) of the Centre for Reviews and Dissemination, University of York, United Kingdom ( https://www.crd.york.ac.uk/prospero ), under number RD42022343944.

Research question

The question guiding this study will be: what is the frequency and mortality of inhalation injuries in burn patients?

The PICOS framework was applied as follows:

  • P (population) = burn patients;
  • I (exposure) = smoke inhalation;
  • C (comparison/control) = no smoke inhalation;
  • O (outcomes) = frequency, mortality, need for IMV, complications;
  • S (study design) = observational studies, clinical trials.

Inclusion criteria

Population of interest.

Adult patients of both sexes who are victims of acute burns, regardless of burn extent or cause.

Exposure type.

Inhalation injury associated with the burn event. Inhalation injury will be defined as damage to the respiratory tract or lung tissue from smoke, heat, and/or chemical irritants introduced into the airway during a burn event [ 24 ]. Bronchoscopy may be performed to confirm the diagnosis of II and is considered the gold standard for this evaluation. Nevertheless, studies that diagnosed II using only clinical criteria, or using imaging or laboratory findings, will also be included in the review [ 10 , 11 ].

Control group.

Adult burn patients who have not been exposed to smoke inhalation.

Outcomes evaluated.

The primary outcomes will be:

  • Frequency of II in burn patients;
  • Mortality of burn patients with II.

Secondary outcomes will be:

  • Respiratory complications: pneumonia, airway obstruction, acute respiratory failure, and ARDS;
  • Need for IMV;
  • Complications in other organ systems;
  • Factors associated with IIs, need for IMV, complications in other organ systems, and mortality.

The ARDS Berlin definition will be used to diagnose the ARDS [ 25 ].

Type of study included.

Observational studies and clinical trials that evaluated the frequency of inhalation injury and the mortality of burn patients exposed to smoke inhalation.

Exclusion criteria

Studies in patients under 19 years of age will be excluded.

Studies presenting incomplete data, reviews, case series, case reports, and editorials will be excluded. Letters to the editor that do not report results from original data will also be excluded.

The inclusion and exclusion criteria are summarized in Table 1.

https://doi.org/10.1371/journal.pone.0295318.t001

Methods for identification of studies

The search for studies will be performed without language and publication date restrictions in the following databases:

  • PubMed/MEDLINE;
  • Embase;
  • LILACS/VHL;
  • Scopus;
  • Web of Science;
  • CINAHL.

Search strategy.

The search will use descriptors previously identified in DeCS (Descriptors in Health Sciences, http://decs.bvs.br/ ), MeSH (Medical Subject Headings, https://www.nlm.nih.gov/mesh/meshhome.html ), and Emtree ( https://www.embase.com ). These descriptors will be combined with their synonyms to capture as many relevant studies as possible.

In this context, the search terms used will be:

  • (1) Burns, inhalation; inhalation burns; smoke inhalation injury; burn, inhalation; inhalation burn; smoke inhalation injury; inhalation injury, smoke; injury, smoke inhalation; inhalation injuries, smoke; injuries, smoke inhalation; smoke inhalation injuries; lung burn; queimaduras por inalação; quemaduras por inhalación; brûlures par inhalation; lesão por inalação de fumaça; smoke; lesión por inhalación de humo; lésion par inhalation de fumée;
  • (2) Epidemiology; epidemiology or incidence or prevalence or occurrence; social epidemiology; epidemiologies, social; epidemiology, social; social epidemiologies; epidemiologia; epidemiology; epidemiología; épidémiologie; epidemiologia social.
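The two term blocks above are usually combined into a single Boolean query. The sketch below is a minimal illustration of that step. It assumes the common convention of joining synonyms within a block with OR and joining blocks with AND. The helper names (`quote_term`, `or_block`, `build_query`) are hypothetical. Database-specific field tags, such as MeSH or Emtree explosion syntax, are deliberately left out, so the output is only a starting point and not the strategy reported in Table 2.

```python
def quote_term(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    term = term.strip()
    return f'"{term}"' if " " in term else term


def or_block(terms: list[str]) -> str:
    """OR-join the synonyms of one concept, dropping case-insensitive duplicates."""
    kept: list[str] = []
    for term in terms:
        quoted = quote_term(term)
        if quoted.lower() not in {k.lower() for k in kept}:
            kept.append(quoted)
    return "(" + " OR ".join(kept) + ")"


def build_query(blocks: list[list[str]]) -> str:
    """AND-join the concept blocks (assumption: one block per concept)."""
    return " AND ".join(or_block(block) for block in blocks)


# Illustrative subsets of the protocol's term lists, not the full strategy.
inhalation_terms = ["inhalation burns", "smoke inhalation injury", "lung burn",
                    "smoke inhalation injury"]  # duplicate kept on purpose
epidemiology_terms = ["epidemiology", "incidence", "prevalence", "occurrence"]

print(build_query([inhalation_terms, epidemiology_terms]))
# ("inhalation burns" OR "smoke inhalation injury" OR "lung burn") AND (epidemiology OR incidence OR prevalence OR occurrence)
```

Deduplicating inside each block keeps repeated synonyms, such as the doubled "smoke inhalation injury" in the list above, from cluttering the final query string.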

The complete search strategy for all databases is shown in Table 2 .

https://doi.org/10.1371/journal.pone.0295318.t002

In addition, grey literature reports will be sourced through simplified searches on Google Scholar and worldwidescience.org.

Finally, forward and backward reference searches will be performed to identify any other potential studies that might have been missed in the search process (backward and forward snowballing).

Selection and data analysis

Selection of studies and evaluation of methodological quality.

All references retrieved by the searches will be organized on the Rayyan platform for systematic reviews ( https://rayyan.qcri.org/ ), which will be used to remove duplicates and to screen and select studies. Data from the selected studies, such as participant information and analyzed outcomes, will be extracted manually into Microsoft Word.

Two reviewers (JA and DS in the authors’ list) will select the studies independently. The Rayyan platform provides a separate interface for each reviewer and then flags the studies on which they disagree, so that a third reviewer (AI in the authors’ list) can resolve the disagreements.

Titles and abstracts will be screened first, and the third reviewer (AI in the authors’ list) will resolve any disagreements about including or excluding a particular study. The full texts will then be evaluated, and the studies to be included in the review will be defined.

Again, for studies with a disagreement between the two main reviewers, the third reviewer (AI in the authors’ list) will resolve the conflicts. Studies not meeting the inclusion criteria will be excluded, and the reasons for this decision will be recorded.
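The dual-review workflow described above has two independent reviewers and a third reviewer who resolves conflicts. It can be expressed as a small reconciliation step. This is a hypothetical sketch for illustration only. Rayyan performs this comparison in its own interface, and the function names, record IDs, and "include"/"exclude" labels below are assumptions, not Rayyan's API or export format.

```python
def find_conflicts(rev_a: dict[str, str], rev_b: dict[str, str]) -> set[str]:
    """Record IDs where the reviewers disagree or only one of them has voted."""
    ids = rev_a.keys() | rev_b.keys()
    return {rid for rid in ids if rev_a.get(rid) != rev_b.get(rid)}


def reconcile(rev_a: dict[str, str], rev_b: dict[str, str],
              arbiter: dict[str, str]) -> dict[str, str]:
    """Final decision per record: agreements stand, conflicts go to the arbiter.

    Conflicts that the third reviewer has not yet decided are marked "pending".
    """
    conflicts = find_conflicts(rev_a, rev_b)
    final: dict[str, str] = {}
    for rid in rev_a.keys() | rev_b.keys():
        if rid not in conflicts:
            final[rid] = rev_a[rid]  # both reviewers gave the same decision
        else:
            final[rid] = arbiter.get(rid, "pending")
    return final


# Hypothetical screening decisions for four records.
reviewer_a = {"s1": "include", "s2": "exclude", "s3": "include"}
reviewer_b = {"s1": "include", "s2": "include", "s3": "exclude", "s4": "exclude"}
third_reviewer = {"s2": "exclude"}

print(sorted(reconcile(reviewer_a, reviewer_b, third_reviewer).items()))
```

A record that only one reviewer has assessed is treated as a conflict, so nothing enters the final set without either agreement or arbitration.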

Eligible nonrandomized studies will be assessed for risk of bias with the Newcastle-Ottawa Scale [ 26 ], and randomized controlled trials with the Cochrane Risk of Bias 2 (RoB 2) tool, 2019 version [ 27 ].

The study selection process will be reported using the PRISMA flow diagram [ 23 ].

Data extraction process.

Data will be extracted on the following items:

  • general characteristics of the studies (author, year, title, journal, country and language of publication, study design);
  • information on participants (age, sex, ethnicity, diagnosis or specific characteristics, sample size);
  • exposure data (description of inhaled material, duration of exposure);
  • data on control;
  • characteristics of inhalation injuries;
  • data related to outcomes (frequency, mortality, need for IMV, development of complications in other organ systems).

Summary of results and statistical analysis.

Post-intervention outcome scores will be extracted from the included studies. They will be collected as continuous data (mean and standard deviation) and dichotomous data (relative risk), together with the total number of participants. When numerical data are missing, the study authors will be contacted by e-mail to request additional data for analysis.
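For dichotomous outcomes such as mortality, the relative risk and its 95% CI can be computed from a 2×2 table of event counts. The sketch below is a minimal example using the standard large-sample log-scale interval. The function name and the counts in the example are hypothetical. Zero-event cells are simply rejected here; they are not handled with a continuity correction.

```python
import math


def relative_risk(events_exp: int, n_exp: int, events_ctl: int, n_ctl: int,
                  z: float = 1.96) -> tuple[float, float, float]:
    """Relative risk (exposed vs. control) with a log-scale confidence interval.

    Uses SE(ln RR) = sqrt(1/a - 1/n1 + 1/c - 1/n2), the usual large-sample formula.
    """
    if min(events_exp, events_ctl) <= 0:
        raise ValueError("zero events: a continuity correction would be needed")
    if events_exp > n_exp or events_ctl > n_ctl:
        raise ValueError("events cannot exceed group size")
    rr = (events_exp / n_exp) / (events_ctl / n_ctl)
    se_log = math.sqrt(1 / events_exp - 1 / n_exp + 1 / events_ctl - 1 / n_ctl)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Hypothetical counts: 30/100 deaths with II vs. 10/100 deaths without II.
rr, lower_ci, upper_ci = relative_risk(30, 100, 10, 100)
print(f"RR = {rr:.2f} (95% CI {lower_ci:.2f} to {upper_ci:.2f})")
```

The interval is built on the log scale and then exponentiated, which is why it is asymmetric around the RR itself.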

Means, sample sizes, standard deviations, and relative risks will be entered into Review Manager, version 5.3 (The Cochrane Collaboration), which will be used to quantify the results. Statistical significance will be set at p < 0.05. Because the outcomes of interest will be measured with different scales and units, effect sizes will be calculated as standardized mean differences (SMD) with 95% confidence intervals (95% CI).
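Because the protocol plans to combine outcomes measured on different scales, a standardized mean difference is the relevant effect size. The sketch below shows one common version, Hedges' g (Cohen's d with a small-sample correction), with an approximate 95% CI. To our understanding, Review Manager's SMD is a Hedges-adjusted g, but its exact variance formula may differ slightly from the textbook approximation used here. The function name and example values are hypothetical.

```python
import math


def hedges_g(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int,
             z: float = 1.96) -> tuple[float, float, float]:
    """Standardized mean difference (Hedges' g) with an approximate CI."""
    df = n1 + n2 - 2
    # Pooled SD weights each group's variance by its degrees of freedom.
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (mean1 - mean2) / sd_pooled          # Cohen's d
    j = 1 - 3 / (4 * df - 1)                 # small-sample correction factor J
    g = j * d
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    se_g = j * math.sqrt(var_d)              # Var(g) = J^2 * Var(d)
    return g, g - z * se_g, g + z * se_g


# Hypothetical groups: mean 12 (SD 4, n=40) vs. mean 10 (SD 4, n=40).
g, lower_ci, upper_ci = hedges_g(12, 4, 40, 10, 4, 40)
print(f"g = {g:.3f} (95% CI {lower_ci:.3f} to {upper_ci:.3f})")
```

Dividing by the pooled SD is what makes results from different scales comparable. The correction factor J shrinks d slightly, which matters most in small studies.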

Where feasible, subgroup analyses will be performed for further comparisons by burn size, shock, or presence of infection.

Assessment of risk of bias

For this evaluation, the Newcastle-Ottawa Scale will be used for nonrandomized studies [ 26 ] and RoB 2 (2019 version) for randomized controlled trials [ 27 ]. Two reviewers (JA and DS in the authors’ list) will independently assess the risk of bias of the included studies, and the third reviewer (AI in the authors’ list) will resolve disagreements.

Quality of the evidence

For this evaluation, the criteria of the Grading of Recommendations Assessment, Development and Evaluation (GRADE) working group will be used [ 28 ]. GRADE assesses the quality of the evidence based on the assessment of five domains: risk of bias, imprecision, inconsistency, indirectness, and publication bias [ 28 ]. Two reviewers will independently evaluate the quality of evidence of the included studies (JA and DS in the authors’ list). The third reviewer (AI in the authors’ list) will resolve the disagreements.
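GRADE ratings are structured judgments rather than a formula, but the basic bookkeeping can be sketched. Under the usual GRADE convention, randomized evidence starts at high certainty and observational evidence starts at low. Each of the five domains named above can lower the rating by one or two levels. The function below is a hypothetical illustration. It does not model the upgrading criteria that GRADE also allows for observational evidence, such as a large effect or a dose-response gradient.

```python
LEVELS = ["very low", "low", "moderate", "high"]
DOMAINS = {"risk of bias", "imprecision", "inconsistency",
           "indirectness", "publication bias"}


def grade_certainty(randomized: bool, downgrades: dict[str, int]) -> str:
    """Certainty after downgrading; upgrading is intentionally not modeled."""
    unknown = set(downgrades) - DOMAINS
    if unknown:
        raise ValueError(f"unknown GRADE domain(s): {sorted(unknown)}")
    if any(step not in (0, 1, 2) for step in downgrades.values()):
        raise ValueError("each domain downgrades by 0, 1, or 2 levels")
    start = LEVELS.index("high") if randomized else LEVELS.index("low")
    level = max(0, start - sum(downgrades.values()))  # floor at "very low"
    return LEVELS[level]


print(grade_certainty(True, {"imprecision": 1}))                        # moderate
print(grade_certainty(False, {"risk of bias": 1, "inconsistency": 1}))  # very low
```

In practice, each downgrade should be recorded together with its justification. The arithmetic only keeps track of where the rating ends up.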

Ethical considerations

The research will be performed with information from studies published in electronic databases, respecting ethical principles at all stages. When the collected data are processed, the principles of fidelity to the authors and respect for textual integrity will be upheld.

The reviewers will not have any connection with the authors of the articles; therefore, there will be no conflicts of interest.

Inhalation injury is a frequent complication of burns, and its frequency increases notably with burn size and patient age [ 29 , 30 ]. Although there is already extensive experience in managing IIs in burn patients, they still represent a great challenge. This is mainly because their complex pathophysiology is not yet fully understood, although it has been shown to involve several inflammatory cells, mediators, and cytokines [ 19 ]. Diagnosing IIs and accurately measuring the damage are also complex, and therapies are essentially supportive [ 17 ].

In IIs, the magnitude and location of the injury vary considerably with environmental and host factors [ 31 ]. Relevant factors include the ignition source, the concentration and solubility of inhaled substances, the diameter and size of smoke particles, exposure duration, temperature, and the patient's immune response [ 20 , 31 , 32 ]. Individuals aged 65 years and older have a burn mortality rate six times higher than the average [ 33 ]. Because of their diminished physiological reserves and comorbidities, managing this group poses a distinct and formidable challenge. Older adults often present multiple preexisting risk factors, including increased susceptibility to infections, pulmonary diseases, and other comorbidities [ 34 ].

Although most patients exposed to smoke inhalation have a favorable course, the development of respiratory injury significantly worsens their outcome. It markedly increases mortality and complications, including long-term sequelae [ 17 , 34 – 36 ]. Indeed, pulmonary complications following burns and II cause or directly contribute to 77% of deaths [ 37 , 38 ]. Among pulmonary complications, ARDS may develop early or several days after exposure [ 39 ]. ARDS may also occur in burns without II, but its clinical course tends to be worse after IIs. In II, ARDS usually starts earlier, progresses with greater severity, and requires IMV for longer [ 13 ]. Furthermore, sepsis and acute respiratory failure are frequent causes of morbidity and mortality in patients with thermal burns alone, and they may be even more prevalent in patients with IIs [ 13 ].

II is already known to be an independent risk factor for mortality in patients with small and moderate burns [ 13 ]. The management of II is therefore essential and can range from a conservative approach to more elaborate options involving drugs [ 23 ]. Specific treatments have been tested to prevent IMV, complications, and poor outcomes. Some studies observed that N-acetylcysteine and inhaled anticoagulants (such as heparin) may effectively treat inhalation injury, significantly improving lung compliance and airway obstruction, reducing reintubation rates, increasing the number of ventilator-free days, and decreasing hospital length of stay and mortality [ 40 – 43 ].

Respiratory impairment remains a major challenge in clinical practice and a promising area for research, and further studies are needed to understand and manage this potentially severe condition. In this systematic review, we aim to identify the principal gaps in the existing literature on fire-related II to guide future studies. Furthermore, the findings may inform diagnostic and management protocols for II in burn patients, which may improve their health care and prognosis. In particular, identifying factors associated with acute respiratory failure, need for IMV, and mortality may help define the phenotype of inhalation injury associated with poor prognosis, as well as the clinical approaches that may have led to better outcomes. This, in turn, could support stricter monitoring of these patients and earlier therapeutic interventions to improve their outcomes.

Supporting information

S1 Checklist. PRISMA-P (Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols) 2015 checklist: recommended items to address in a systematic review protocol*.

https://doi.org/10.1371/journal.pone.0295318.s001

  • 1. Campos EV. Cuidados intensivos ao paciente grande queimado [Intensive care for major burn patient]. In: Azevedo CPA, Taniguchi LU, Ladeira JP (editors). Medicina intensiva: abordagem prática [Intensive care medicine: practical approach]. 3rd ed. Barueri: Manole, 2017. p. 899–922.
  • 2. Piccolo NS, Serra MCVF, Leonardi DF, Lima EM Jr, Novaes FN, Correa MD et al. Queimaduras–parte II: tratamento da lesão [Burns–part II: treatment of the injury]. In: Brazilian Medical Association, Brazilian Federal Council of Medicine. Projetos diretrizes [Project Guidelines]. São Paulo: Brazilian Medical Association; 2008. p. 1–14.
  • 5. World Health Organization. Burns. WHO: Washington, 2023 [cited 2022 Oct 01]. www.who.int/mediacentre/factsheets/fs365/en/ .
  • 6. Ministério da Saúde. Brasil. Óbitos por queimaduras no Brasil: análise inicial dos dados do Sistema de Informações sobre Mortalidade, 2015 a 2020 [Burn deaths in Brazil: initial analysis of data from the Mortality Information System, 2015 to 2020]. In: Brazilian Ministry of Health. Boletim Epidemiológico Volume 47 [Epidemiological Bulletin Volume 47]. Brazilian Ministry of Health: Brasília, 2022 [cited 2022 Oct 01]. https://www.gov.br/saude/pt-br/centrais-de-conteudo/publicacoes/boletins/epidemiologicos/edicoes/2022/boletim-epidemiologico-vol-53-no47 . p. 40–48.
  • 24. Woodson CL. Diagnosis and treatment of inhalation injury. In: Herndon DN (editor). Total Burn Care, 5th ed. Elsevier: Amsterdam, 2017. p. 184–194.
  • 26. Wells GA, Shea B, O’Connell D, Peterson J, Welch V, Losos M, et al. The Newcastle-Ottawa Scale (NOS) for assessing the quality of nonrandomized studies in meta-analyses [Internet]. Oxford: The Ottawa Hospital Research Institute; 2000. [cited 2022 Oct 01]. http://www.ohri.ca/programs/clinical_epidemiology/oxford.as .
  • 31. Traber DL. The pathophysiology of inhalation injury. In: Herndon DN (editor). Total Burn Care, 5th ed. Elsevier: Amsterdam, 2017. p. 174–183.
  • 33. Porro LJ, Demling RH, Pierira CT, Herndon DN. Care of the geriatric patient. In: Herndon DN (editor). Total Burn Care, 5th ed. Elsevier: Amsterdam, 2017. p. 381–385.
  • Systematic Review
  • Open access
  • Published: 19 April 2024

Effects of Chronic Static Stretching on Maximal Strength and Muscle Hypertrophy: A Systematic Review and Meta-Analysis with Meta-Regression

  • Konstantin Warneke   ORCID: orcid.org/0000-0003-4964-2867 1 , 7 ,
  • Lars Hubertus Lohmann 2 ,
  • David G. Behm 3 ,
  • Klaus Wirth 4 ,
  • Michael Keiner 5 ,
  • Stephan Schiemann 6 &
  • Jan Wilke 7  

Sports Medicine - Open volume 10, Article number: 45 (2024)

Increases in maximal strength and muscle volume represent central aims of training interventions. Recent research suggested that the chronic application of stretch may be effective in inducing hypertrophy. The present systematic review therefore aimed to synthesize the evidence on changes in strength and muscle volume following chronic static stretching.

Three databases were screened to conduct a systematic review with meta-analysis. Randomized controlled trials with a longitudinal design (≥ 2 weeks) investigating strength and muscle volume following static stretching in humans were included. Study quality was rated by two examiners using the PEDro scale.

A total of 42 studies with 1318 cumulative participants were identified. Meta-analyses using robust variance estimation showed small stretch-mediated increases in maximal strength (d = 0.30, p < 0.001), with stretching duration and intervention time as significant moderators. Including all studies, stretching induced small but significant hypertrophy effects (d = 0.20). Longer stretching durations and intervention periods, as well as higher training frequencies, revealed small (d = 0.26–0.28) but significant effects (p < 0.001–0.005), while lower dosages did not reach the level of significance (p = 0.13–0.39).

Conclusions

While of minor effectiveness, chronic static stretching represents a possible alternative to resistance training when aiming to improve strength and increase muscle size. As a dose-response relationship may exist, the effects of longer stretching durations, higher frequencies and longer program durations should be investigated further.

• While animal research consistently showed chronic stretch-mediated hypertrophy and strength increases, the human literature paints an inconclusive picture, possibly because stretching parameters such as duration and frequency are not comparable across studies.

• Our systematic review is the first to include human studies using comparable stretching durations of up to two hours, and it found small-magnitude increases in maximal strength and muscle hypertrophy.

• Even though less effective, high-volume stretching might provide a viable alternative to strength training when aiming to induce muscle hypertrophy and strength increases. However, the comparatively high training effort yields comparatively small adaptations, suggesting that the more efficient strength training should be preferred where applicable.

Stretch training is commonly used to achieve improvements in flexibility [ 1 , 2 ], with widespread applications in sports conditioning and orthopedic physical therapy [ 3 , 4 ]. While it was widely accepted in the 1980s that static stretching should be included in warm-up routines [ 5 , 6 , 7 ], current evidence questions the implementation of (static) stretching during warm-up due to its detrimental impact on subsequent sports performance [ 8 , 9 , 10 ].

Despite adverse acute effects, static stretching may be beneficial for athletes if performed in the long term [ 11 , 12 ]. A recent systematic review with meta-analysis of animal studies found that chronic stretching of the anterior latissimus dorsi in chickens and quails (for up to 24 h per day, seven days per week) substantially increased muscle mass, by up to 319% (d = 8.5), due to increases in muscle cross-sectional area (up to 142%; d = 7.9). Besides these structural changes, gains in maximal strength (up to 95%; d = 12.4) [ 13 ] were observed. Interestingly, investigations aiming to translate animals’ muscle adaptations to humans were requested as early as 1983: “Thirty minutes of stretching per day is certainly within normal physiological limits, and as a result may be applied to human muscle with hopes that similar adaptations would occur” [ 14 ].

Stretching effects on hypertrophy [ 15 , 16 ] and strength [ 17 , 18 ] in humans have been reviewed previously; these reviews reported only small strength increases (under dynamic conditions [ 17 ]), while muscle hypertrophy was evident only with high-intensity stretching [ 16 ]. However, although the most recent reviews were published in 2023, they did not include new studies that, for the first time, applied static stretching with continuous durations of up to two hours [ 19 , 20 , 21 , 22 , 23 , 24 , 25 , 26 ], which might lead to an under- or overestimation of the current evidence.

Consequently, the aim of this systematic review with meta-analysis was to investigate changes in muscle size and maximal strength following chronic static stretching interventions in humans. We hypothesized that long-term stretching programs would increase both outcomes. Based on findings from animal research, we assumed that the stretching volumes applied in previous human studies were not sufficient. Therefore, we hypothesized that longer session durations and intervention periods, as well as higher training frequencies, would trigger improvements, while lower durations and frequencies would not elicit relevant changes.

A systematic review and meta-analysis using robust variance estimation was performed adhering to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. The study was registered in the PROSPERO database (CRD42023411225).

Literature Search

Two independent investigators (KoW & LHL) conducted a systematic literature search of MEDLINE/PubMed, Web of Science and SPORTDiscus in March 2023, which was updated in January 2024. The following inclusion criteria were applied: (1) randomized, controlled study design; (2) static stretching intervention with a duration of at least two weeks, performed in humans; (3) measurement of (a) maximal strength or related parameters such as active peak torque and/or (b) markers of muscle size (i.e., cross-sectional area, muscle thickness). Studies assessing acute effects, combining static stretch training with other (active) training protocols such as resistance training or proprioceptive neuromuscular facilitation, or including patients were excluded. The search terms (Online Supplemental Material) were adapted to the requirements of each database. As an example, the terms for PubMed were as follows:

((stretch*) AND (performance OR strength OR 1RM OR force OR MVC OR (maxim* AND “voluntary contraction”) OR hypertrophy OR “muscle cross-sectional area” OR CSA OR “muscle thickness” OR “muscle mass” OR “muscle volume”) NOT (acute OR postural OR pnf OR “proprioceptive neuromuscular facilitation” OR “stretch shortening”)).

In addition to database searches, the reference lists of all included studies were screened for further eligible articles [ 27 ].

Methodological Study Quality and Risk of Bias

Study quality was assessed by two independent investigators (KW1 & LHL) using the PEDro scale for randomized, controlled trials [ 28 , 29 ]. If consensus could not be reached, a third rater (MK) was consulted to cast the deciding vote. The PEDro scale (Table A in Supplemental Material) has been used in previous reviews with meta-analysis on exercise and exercise therapy [ 30 , 31 ].

Risk of publication bias was examined using visual inspection of funnel plots [ 32 ], which were created using the method of Fernandez-Castilla et al. [ 33 ]. Additionally, Egger’s regression tests incorporating robust variance estimation for funnel plot asymmetry were applied [ 34 ]. The certainty about the evidence was rated as very low, low, moderate or high using the criteria proposed by the GRADE working group [ 35 ]. Generally, the quality of evidence of randomized trials is considered high and thereafter adjusted within the GRADE framework. In case of limitations in study design or execution, inconsistency of results, indirectness of evidence, imprecision or publication bias, one point is subtracted for each weakness. Conversely, large-magnitude effects or a dose-response gradient each lead to addition of one point to the quality of evidence rating.
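Egger's test regresses each effect divided by its standard error on the effect's precision (1/SE); an intercept far from zero points to small-study asymmetry in the funnel plot. The review used a variant that incorporates robust variance estimation; the minimal sketch below is instead the classic form, which treats every effect as independent. The function name `egger_test` is hypothetical, and the sketch returns the intercept, its standard error and the t statistic with n − 2 degrees of freedom rather than a p-value.

```python
import math
from typing import Dict, Sequence


def egger_test(effects: Sequence[float], std_errors: Sequence[float]) -> Dict[str, float]:
    """Classic Egger regression (hypothetical helper, not the review's RVE-based test).

    Regresses the standard normal deviate z_i = d_i / SE_i on precision 1 / SE_i by
    ordinary least squares and returns the intercept, its standard error, the t
    statistic and the residual degrees of freedom (n - 2).
    """
    n = len(effects)
    if n != len(std_errors) or n < 3:
        raise ValueError("need at least three effects with matching standard errors")
    x = [1.0 / se for se in std_errors]                   # precision
    z = [d / se for d, se in zip(effects, std_errors)]    # standardized effect
    x_bar, z_bar = sum(x) / n, sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    # Residual variance uses n - 2 degrees of freedom (intercept and slope estimated).
    rss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    se_intercept = math.sqrt(rss / (n - 2) * (1.0 / n + x_bar ** 2 / sxx))
    t = intercept / se_intercept if se_intercept > 0 else float("nan")
    return {"intercept": intercept, "se": se_intercept, "t": t, "df": n - 2}
```

A p-value could then be read from a t distribution with `df` degrees of freedom (for example with `scipy.stats.t.sf`), a step left out here to keep the sketch free of dependencies.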

Data Processing and Statistics

The means (M) and standard deviations (SD) from pre- and post-intervention tests were extracted for all parameters and study arms (stretching and inactive control). In case of missing data, the authors of the primary studies were contacted. Changes from pre to post were computed as M (posttest) – M (pretest) and standard deviations were pooled as
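A minimal sketch of the change-score step for one study, assuming one common pooling rule: the root mean square of the pre- and post-test SDs. The review's exact pooling formula may differ. The between-arm standardized mean difference in `smd_change` (difference in mean changes divided by the root mean square of both arms' pooled SDs) is likewise a hypothetical illustration, without any small-sample correction; `ArmSummary` and all function names are invented for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class ArmSummary:
    """Pre/post summary statistics of one study arm (stretching or inactive control)."""
    m_pre: float
    m_post: float
    sd_pre: float
    sd_post: float


def change_score(arm: ArmSummary) -> float:
    """Mean change from pre to post: M(posttest) - M(pretest)."""
    return arm.m_post - arm.m_pre


def pooled_sd(arm: ArmSummary) -> float:
    """ASSUMPTION: root mean square of the pre- and post-test SDs."""
    return math.sqrt((arm.sd_pre ** 2 + arm.sd_post ** 2) / 2)


def smd_change(stretch: ArmSummary, control: ArmSummary) -> float:
    """HYPOTHETICAL between-arm SMD: difference in mean changes divided by the
    root mean square of both arms' pooled SDs."""
    sd = math.sqrt((pooled_sd(stretch) ** 2 + pooled_sd(control) ** 2) / 2)
    return (change_score(stretch) - change_score(control)) / sd


# Example with made-up numbers: +10 vs. +2 units, all SDs equal to 20.
stretch_arm = ArmSummary(m_pre=100.0, m_post=110.0, sd_pre=20.0, sd_post=20.0)
control_arm = ArmSummary(m_pre=100.0, m_post=102.0, sd_pre=20.0, sd_post=20.0)
print(round(smd_change(stretch_arm, control_arm), 2))  # 0.4
```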

To account for the dependency of multiple outcomes within studies, whose covariances are unknown, the meta-analysis was performed using robust variance estimation [ 36 ]. Standardized mean differences (SMD) and 95% confidence intervals (CI) for changes in maximal strength capacity and muscle size (including both muscle thickness and muscle cross-sectional area) were pooled across all included studies. We used R (R Foundation for Statistical Computing, Vienna, Austria) with the robumeta (version 2.0) [ 36 ] and meta packages. Effect sizes (ES) were interpreted as trivial (0 ≤ d < 0.2), small (0.2 ≤ d < 0.5), moderate (0.5 ≤ d < 0.8) or large (d ≥ 0.8) [ 37 ], while τ² was used to explore heterogeneity in study outcomes, classified with the same thresholds as the effect sizes.
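To illustrate how robust variance estimation handles several effects from the same study, the sketch below computes an intercept-only pooled effect with correlated-effects weights, where each effect in study j receives weight 1 / (k_j × (mean sampling variance_j + τ²)), and a sandwich standard error that sums residuals within each study before squaring. It is a simplified sketch, not robumeta itself: τ² is passed in rather than estimated, and the small-sample corrections robumeta applies by default are omitted, so its standard errors will not match the package output. The helper `classify_d` encodes the thresholds quoted above, applied to |d| (the use of the absolute value is an assumption). All names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def rve_pooled_effect(effects: Iterable[Tuple[str, float, float]],
                      tau2: float = 0.0) -> Tuple[float, float]:
    """Intercept-only RVE sketch with correlated-effects weights.

    effects: (study_id, d, sampling_variance) triples; rows may share a study_id.
    Returns (pooled d, robust standard error) without small-sample corrections.
    """
    by_study: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for study_id, d, var in effects:
        by_study[study_id].append((d, var))

    # Every effect in study j gets weight 1 / (k_j * (mean variance_j + tau2)).
    weights = {}
    for study_id, rows in by_study.items():
        k = len(rows)
        v_bar = sum(v for _, v in rows) / k
        weights[study_id] = 1.0 / (k * (v_bar + tau2))

    total_w = sum(weights[s] * len(rows) for s, rows in by_study.items())
    pooled = sum(weights[s] * sum(d for d, _ in rows)
                 for s, rows in by_study.items()) / total_w

    # Sandwich estimator: residuals are summed within a study before squaring,
    # so dependent effects from one study are not treated as independent.
    meat = sum((weights[s] * sum(d - pooled for d, _ in rows)) ** 2
               for s, rows in by_study.items())
    return pooled, math.sqrt(meat) / total_w


def classify_d(d: float) -> str:
    """Effect size labels from the thresholds in the text, applied to |d|."""
    size = abs(d)
    if size < 0.2:
        return "trivial"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "moderate"
    return "large"
```

Averaging within studies through the weights, rather than modelling the unknown within-study covariances, is what lets the approach tolerate outcome dependency of unknown structure.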

Meta-regression was performed using the robumeta package for dependent study outcomes, as described by Fisher & Tipton [ 36 ]. Furthermore, to quantify the influence of quantifiable moderators (stretching duration, intervention period and training frequency) on maximal strength and muscle size, sub-analyses were performed for three variables: intervention duration, session duration and training frequency. For these moderators, cut-offs were determined by median split (intervention duration: short: < 6 weeks vs. long: ≥ 6 weeks; frequency: low: < 5 sessions per week vs. high: ≥ 5 sessions per week; stretching duration: short: < 15 min vs. long: ≥ 15 min). To test for significant differences in mean effect size between subgroups, Welch's test was performed due to violation of the normal distribution. If a study reported several effects, a mean effect per study was calculated to account for within-study dependency in effect size comparisons.
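The subgroup comparison described above can be sketched in plain Python: effects are first averaged within each study, studies are split at the median of the moderator (or at a given cut-off such as 15 min), and Welch's t statistic with Welch-Satterthwaite degrees of freedom is computed on the study-level means. This is a minimal sketch under stated assumptions: the moderator is assumed constant within a study, each subgroup needs at least two studies with some spread, the function names are hypothetical, and the p-value step (a t distribution lookup) is left out.

```python
import math
import statistics
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Row = Tuple[str, float, float]  # (study_id, effect size d, moderator value)


def study_means(rows: Sequence[Row]) -> Dict[str, Tuple[float, float]]:
    """Collapse several effects per study into one mean effect.

    The moderator (e.g. minutes per session) is assumed constant within a study.
    """
    effects: Dict[str, List[float]] = defaultdict(list)
    moderator: Dict[str, float] = {}
    for study_id, d, mod in rows:
        effects[study_id].append(d)
        moderator[study_id] = mod
    return {s: (statistics.fmean(ds), moderator[s]) for s, ds in effects.items()}


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def compare_subgroups(rows: Sequence[Row],
                      cutoff: Optional[float] = None) -> Tuple[float, float]:
    """Median split (unless a cut-off is given), then Welch's test on study means."""
    means = study_means(rows)
    if cutoff is None:
        cutoff = statistics.median([mod for _, mod in means.values()])
    high = [d for d, mod in means.values() if mod >= cutoff]
    low = [d for d, mod in means.values() if mod < cutoff]
    return welch_t(high, low)
```

Collapsing to one mean per study before testing mirrors the text's remedy for within-study dependency, at the cost of ignoring differences in study precision.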

Search Results

Figure 1 displays the flow of the literature search.

Figure 1. Flow chart of literature search

Collectively, the queries in the three databases returned 10,427 hits. After application of the inclusion and exclusion criteria, a total of 42 eligible studies with 1318 participants were identified. Of these, 36 studies with 85 ES [ 19 , 20 , 21 , 22 , 23 , 24 , 25 , 26 , 38 , 39 , 40 , 41 , 42 , 43 , 44 , 45 , 46 , 47 , 48 , 49 , 50 , 51 , 52 , 53 , 54 , 55 , 56 , 57 , 58 , 59 , 60 , 61 , 62 , 63 , 64 , 65 ] investigated strength parameters, and 19 studies [ 21 , 22 , 23 , 24 , 26 , 39 , 66 , 67 , 68 , 51 , 52 , 55 , 57 , 58 , 69 , 70 , 71 , 63 , 65 ] with 45 ES examined markers of muscle size.

Methodological Quality, Risk of Bias and Quality of Evidence

On average, the methodological quality of the included studies was rated as fair [ 72 ] (mean 4.17 ± 1.4 out of 10 points; range 2 to 8 points; see Table A in Supplemental Material). For both outcomes (muscle volume and maximal strength), the quality of evidence was downgraded by two levels (high to low) due to high risk of bias (limitations in study quality: fair PEDro scores and heterogeneity in study designs). For the sub-analyses of session and intervention duration (maximal strength outcomes), the quality of evidence was upgraded by one level due to moderate to strong associations (low to moderate effect sizes, mostly in the same direction).
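The GRADE bookkeeping described in the Methods and applied here (start randomized trials at "high", move down one level per serious concern, move up one level per upgrading factor) can be written as a few lines of code. This is a simplified sketch: the function name is hypothetical, it counts levels rather than domains (a "very serious" concern would be entered as two downgrades), and it clamps the result to the four-level scale.

```python
LEVELS = ["very low", "low", "moderate", "high"]


def grade_certainty(downgrades: int, upgrades: int, start: str = "high") -> str:
    """Shift the starting GRADE level down/up by the given number of levels."""
    if downgrades < 0 or upgrades < 0:
        raise ValueError("level counts must be non-negative")
    index = LEVELS.index(start) - downgrades + upgrades
    # Clamp so the rating never leaves the four-level scale.
    return LEVELS[max(0, min(index, len(LEVELS) - 1))]


# Overall outcomes: two levels down for risk of bias.
print(grade_certainty(downgrades=2, upgrades=0))  # low
# Duration sub-analyses for strength: additionally one level up.
print(grade_certainty(downgrades=2, upgrades=1))  # moderate
```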

Quantitative Synthesis

Table 1 provides the study characteristics of included articles, while Table 2 summarizes the quantitative analysis results for overall and different subgroups.

Maximal Strength Capacity

Static stretching showed a small positive effect on maximal strength (d = 0.30, p < 0.001, 95% CI 0.14 to 0.46, τ² = 0.01, 36 studies with 85 ES, Table 1). The certainty about the evidence is low. Meta-regression showed that stretching duration positively influenced maximal strength (p = 0.04, estimate: 0.005), while intervention period showed a trend (p = 0.06, estimate: 0.06). No significant result was found for training frequency (p = 0.64).

Accordingly, longer stretching durations (≥ 15 min) induced small strength increases (d = 0.45, p < 0.001, 95% CI 0.29 to 0.62, τ² = 0.0, 14 studies, 30 ES, Fig. 2), whereas shorter durations (< 15 min) revealed a small, non-significant effect (d = 0.21, p = 0.06, 95% CI -0.06 to 0.44, 22 studies, 55 ES, Fig. 3), with a significant difference in mean ES (p = 0.01). The certainty about the evidence is moderate.

Figure 2. Meta-analytical results for long stretching durations (≥ 15 min). Legend: 1RM = one repetition maximum, EL = extended leg, FL = flexed leg

Figure 3. Meta-analytical results for short stretching durations (< 15 min). Legend: HI = high intensity group, LI = low intensity group, 1RM = one repetition maximum

Similar to stretching duration, longer program durations (> 6 weeks) achieved small strength increases (d = 0.36, p = 0.003, 95% CI 0.13 to 0.59, τ² = 0.04, 24 studies with 51 ES), while shorter durations yielded only trivial improvements (d = 0.16, p = 0.006, 95% CI 0.05 to 0.26, τ² = 0.0, 12 studies, 34 ES), with a significantly higher mean effect for longer intervention periods (p = 0.03). The certainty about the evidence is moderate. High training frequencies (more than five stretching sessions per week) led to small-magnitude strength increases (d = 0.32, p = 0.025, 95% CI 0.05 to 0.6, τ² = 0.04, 16 studies, 40 ES). Fewer than five sessions per week yielded a small effect (d = 0.26, p < 0.001, 95% CI 0.14 to 0.38, τ² = 0, 20 studies with 45 ES), without a significant difference in group mean effects (p = 0.39). The certainty about the evidence is low.

Hypertrophy

For hypertrophy, a small positive effect of stretching was found (d = 0.20, p = 0.003, 95% CI 0.08 to 0.32, τ² = 0.0, 19 studies, 45 ES) (see Fig. 4). The certainty about the evidence is low. While meta-regression (p = 0.23–0.88) revealed no significant influence of any included moderator, long-duration stretching (≥ 15 min) had a small effect (d = 0.28, p = 0.005, 95% CI 0.12 to 0.44, τ² = 0.0, 7 studies, 17 ES), without a significant difference from shorter durations (p = 0.29), which in turn failed to reach significance (d = 0.13, p = 0.14, 95% CI -0.05 to 0.30, τ² = 0.0, 12 studies with 28 ES). Similarly, studies that performed stretching for more than 6 weeks showed an effect of d = 0.26 (p < 0.001; 16 studies with 35 ES), while shorter training periods failed to reach significance (d = -0.05, p = 0.13; 3 studies, 10 ES), with higher effects for longer periods (p = 0.006). When stretching was performed more than 5 times per week, there were significant small-magnitude increases in muscle size (d = 0.27, p = 0.002; 11 studies with 28 ES), whereas lower training frequencies had no significant effect (d = 0.09, p = 0.39); however, the mean effect size for higher frequencies was not significantly larger (p = 0.31). The certainty about the evidence is low for all effects.

Figure 4. Forest plot for all included studies on stretch-mediated hypertrophy

Publication Bias

Visual inspection of the funnel plots (Fig. 5) revealed no indication of publication bias for either maximal strength or muscle volume. Consistently, Egger’s regression tests showed no publication bias for either outcome (p = 0.23–0.31).

Figure 5. Funnel plots for visual inspection of publication bias: (a) maximal strength studies; (b) hypertrophy studies. Point size indicates the number of outcomes from each study that were pooled and weighted in the meta-analytical calculation.

In accordance with previous research, the present systematic review found that chronic static stretching increases (a) maximal strength [ 11 , 12 , 17 , 18 ] and (b) muscle size [ 16 ]. With stretching duration as a moderator of maximal strength, and a trend for intervention time, our results indicate that longer stretching durations are more effective. While overall stretch-induced hypertrophy showed a small effect (d = 0.2), this effect seems attributable to stretching durations of ≥ 15 min, intervention periods of > 6 weeks and training frequencies of ≥ 5 sessions per week, as lower dosages did not reach significance in the subgroup analyses (p = 0.14–0.39). The possible need for high stretching volumes to improve strength and muscle volume is in line with results from animal studies [ 73 , 74 ].

As pointed out, early evidence had mostly suggested that stretching does not modify morphological and functional muscle parameters in humans [ 11 , 12 , 15 ]. However, this assumption was based on a lack of studies using high to very high stretching durations. Even the most recent review, by Arntz et al. [ 18 ], did not include long-duration studies [ 19 , 20 , 21 , 25 , 26 , 75 , 76 ], while Panidi et al. [ 16 ] included only one long-duration study [ 26 ]. Since animal research indicated a potential dose-response relationship [ 14 , 77 ], a meta-regression was performed, which confirmed that stretching duration significantly moderates strength adaptations. In contrast, although the regression did not reveal such a relationship for muscle hypertrophy, significant increases in muscle size were obtained only with higher dosages in the subgroup analyses (≥ 15 min stretching, ≥ 6 weeks intervention period, ≥ 5x stretching per week). Compared with animals, in which muscle mass increases of up to 300% have been reported [ 78 ], hypertrophy effects in humans must be considered small. These differences could be attributed to diverse factors. Human muscle protein synthesis is slower than that of animals [ 79 , 80 , 81 ]. This may be one explanation for the lack of hypertrophy in response to 30 min of stretching reported by Yahata [ 65 ]. Nevertheless, with stretching durations of an accumulated 15 min per session, Wohlann et al. [ 20 ] obtained significant muscle hypertrophy. The two studies also differed in the muscle groups and frequencies used: Wohlann et al. stretched the pectoralis four times per week, while Yahata and colleagues [ 65 ] applied calf muscle stretching only twice per week. The potential role of training frequency is supported by the consistent hypertrophy effects in all studies by Warneke et al. [ 23 , 24 , 26 ], which used daily stretching. The results of the meta-analysis partly confirm this assumption, although the meta-regression did not reach significance for either maximal strength or hypertrophy. However, in the subgroup analysis for hypertrophy, only the more frequent training produced significant effects, while no significant influence of frequency was observed for strength increases.

Several mechanisms could explain the stretch-induced increases in muscle size or strength. First and foremost, it may be speculated that time under tension is paramount for gains in muscle volume not only following resistance training [ 82 ] but also following stretching [ 83 ]. This would agree with our results, which showed stretching duration to be important for strength (meta-regression: p = 0.038) and also for hypertrophy, as muscle size increased only with ≥ 15 min of stretching. Accordingly, the literature shows that high mechanical tension imposed on the sarcomere could trigger protein synthesis [ 84 , 85 ]. In quails and chickens, progressive stretching induced rapid hypertrophy alongside serial sarcomerogenesis during the first days of the intervention [ 78 ]. However, when the stretching stimulus remained unmodified during such a program, the initial increases in muscle cross-sectional area started to disappear [ 86 ]. Ashmore [ 87 ] suggested that the mechanical tension caused by stretching would lead to high stresses and compensatory adaptations in the sarcomere. It has furthermore been hypothesized that an increased total number of sarcomeres reduces tension, and thereby stress, on the individual sarcomere [ 86 ]. Thus, to increase training intensity and to ensure continuously strong tensioning of the sarcomere, the stretching stimulus needs to be re-adjusted. Indeed, Antonio & Gonyea [ 78 ] achieved the highest gains in muscle mass and hypertrophy by increasing the stretch intensity from 10% up to 35% of body weight over 5 weeks of chronic stretch in quails.

Another theory postulates that chronic stretching creates hypoxic conditions similar to those during blood flow restriction. Reducing arterial perfusion has been shown to increase lactate levels, growth hormone concentrations, and inflammatory cytokines such as interleukin-6 [ 88 , 89 ]. Such a metabolic milieu may represent a potent stimulus for mTOR signaling [ 90 , 91 , 92 ]. Interestingly, Jessee et al. [ 93 ] showed that blood flow restriction induces hypertrophy; however, it seems of minor relevance for increases in maximal strength. Hotta et al. [ 94 ] observed acute decreases in blood flow during 30 min of stretching in animals. Studies measuring the metabolic muscle response to stretching are thus warranted to further delineate the potential relevance of the abovementioned factors.

In sum, irrespective of the initial processes, muscle hypertrophy requires an increase in muscle protein synthesis. Suzuki & Takeda [ 95 ] and Kremer [ 96 ] described the activation of stretch-activated channels and, thus, the stimulation of the mTOR/p70S6K/PI3K pathway [ 97 , 98 , 99 ]. The literature emphasizes the importance of mechanical tension (e.g., through stretching) in triggering anabolic signaling pathways, with the stimulation of protein synthesis [ 100 , 101 , 102 , 103 ] as an underlying mechanism of hypertrophy (and maximal strength) [ 104 , 105 , 106 ]. Van der Pijl et al. [ 107 , 108 ] indicated the relevance of titin unfolding in hypertrophy (both parallel and longitudinal), supporting the hypothesized importance of high stretch intensities [ 109 ]. Conversely, Fowles et al. [ 110 ] were not able to show acute increases in protein synthesis after 33 minutes of stretching in humans, although significant increases in protein synthesis rates had been reported in animals [ 100 , 102 , 103 , 111 ]. The stronger response in animals could hence be explained by a higher protein synthesis rate [ 80 , 81 ].

With regard to the increases in maximal strength, it may be expected that increases in muscle volume drive the strength gains. This would require hypertrophy to precede enhanced strength; however, no study has investigated the temporal association of both factors. In addition, effect sizes were smaller for muscle volume than for strength (overall d = 0.20 vs. d = 0.30). Another theory attributes the improvements to neural adaptations [ 112 , 113 ]. On the one hand, the studies by Warneke et al. [ 19 , 26 ] and Nelson et al. [ 60 ] support this assumption, as they detected strength increases in the non-stretched contralateral leg. On the other hand, Holly et al. [ 114 ] and Barnett et al. [ 115 ] showed no significant increase in EMG activity during stretching in animals. Furthermore, Sola et al. [ 116 ] found stretch-mediated hypertrophy in denervated muscles, indicating a minor role of neural factors. Further research is therefore needed to clarify the role of neural factors in stretch-mediated adaptations.

Even though muscle hypertrophy appears to occur only with higher stretching dosages, our work has important practical implications. In general, stretching may represent an alternative to conventional resistance training for inducing increases in muscle size and strength. Nevertheless, several aspects must be considered. While Currier et al. [ 117 ] reported moderate to large increases in muscle size and maximal strength (ES = 0.51 and ES = 1.60, respectively) with resistance training, the small effect sizes in the present study (ES = 0.28 for muscle size and ES = 0.45 for maximal strength with long stretching durations) show that even long stretching durations were less effective. On the one hand, if about one hour of stretching of a single isolated muscle is needed to achieve meaningful muscle hypertrophy [ 83 ], the practical relevance seems limited [ 85 ]. On the other hand, passively induced mechanical tension via stretch training could be incorporated into daily life, for example by using splints/orthoses while sitting in the office or watching television [ 118 ]. A further benefit might be the potential applicability for people lacking the motivation or ability to perform resistance training (e.g., patients with unstable cardiovascular diseases), when heavy resistance training is contraindicated, or after muscle, ligament or bone injuries leading to prolonged immobilization. Thus, (probably only) for these populations, stretching could provide a viable alternative, especially since no training supervision is necessary to ensure safe exercise execution. Although stretching could be a valuable training intervention, it should only temporarily substitute or, even better, supplement classical training regimes. This is important because, although stretching has been shown to benefit cardiovascular health [ 119 ], it may not contribute as efficiently to the recommended levels of physical activity (e.g., 150 min of moderate or 75 min of vigorous activity per week, as recommended by the World Health Organization) as other activities such as walking, running, team sports, or resistance training.

Several aspects call for further research. Even though significant stretch-induced muscle hypertrophy was identified in response to stretching durations of ≥ 15 min, this was based on only 7 studies ranging from 3 × 5 min to one hour of stretching, with the highest effects originating from one research group [ 19 , 20 , 21 , 23 , 24 , 25 , 26 , 76 ]. Thus, further studies are needed to confirm or refute these results. Furthermore, because all long-lasting stretch interventions (more than one hour) were performed with high stretching frequencies and long intervention periods (≥ 6 weeks), the increases in maximal strength and muscle volume cannot be clearly ascribed to any one of these parameters. Further studies should hence examine long-lasting stretch interventions of < 6 weeks and/or ≤ 5 sessions per week. Moreover, the role of stretch intensity merits further investigation. Reporting stretch intensity using individual pain perception seems of questionable validity [ 120 ]. However, it is well known from strength training that training intensity is of crucial importance for adaptations, especially with regard to increases in maximal strength [ 121 ]. Considering the importance of titin unfolding, which is assumed to occur exclusively in maximally stretched sarcomeres, reaching high degrees of stretch could be hypothesized to be of paramount importance [ 109 , 122 ].

Despite some plausible theories [ 83 ], the underlying mechanisms remain speculative. While many physiological parameters were assessed in animals, no studies examined signaling pathways and possible alterations of protein synthesis in humans. Furthermore, research has almost exclusively focused on skeletal muscle. Interestingly, it has been shown that the connective tissue can exert significant force transmission effects [ 123 ]. Therefore, it may be prudent for future trials to consider multiple tissues.

Some increases in the examined parameters reported by the included studies were surprisingly high. Nelson et al. [ 60 ] reported an improvement in maximal strength of 29% (d = 1.48) in the stretched leg and of about 11% (d = 0.46) in the contralateral control leg following 4 × 30 s of stretching three times per week for ten weeks. Mizuno [ 55 ] found increases of 24% using static stretching three times per week for eight weeks, while Panidi et al. [ 69 ] detected hypertrophy effects of up to 23%. Compared with results from strength training [ 124 ], these adaptations to short-duration stretching seem unreasonably high, even though some participants were classified as untrained to recreationally active. Against this background, it will be of interest to identify moderator variables that distinguish strong from weak stretch responders.
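Percentage changes and standardized effect sizes such as d are not interchangeable, because d also depends on between-subject variability. The sketch below assumes one common pre-post convention (mean change divided by the baseline SD); the formulas used in the cited studies or in this review may differ, and all numbers are made up for illustration.

```python
def percent_change(pre_mean: float, post_mean: float) -> float:
    """Relative change from baseline, in percent."""
    return 100.0 * (post_mean - pre_mean) / pre_mean


def standardized_mean_change(pre_mean: float, post_mean: float, sd_pre: float) -> float:
    """Pre-post effect size: mean change divided by the baseline SD (assumed convention)."""
    return (post_mean - pre_mean) / sd_pre


def hedges_g(d: float, n: int) -> float:
    """Apply the approximate small-sample correction J = 1 - 3 / (4*df - 1), df = n - 1."""
    df = n - 1
    return d * (1 - 3 / (4 * df - 1))


# Hypothetical values: the same 29% gain yields very different d values
# depending on how variable the baseline scores are.
print(percent_change(100.0, 129.0))                   # 29.0
print(standardized_mean_change(100.0, 129.0, 20.0))   # 1.45
print(standardized_mean_change(100.0, 129.0, 60.0))   # about 0.48
print(hedges_g(1.45, 20))                             # corrected for a hypothetical n = 20
```

A large d with a modest percentage change, or the reverse, therefore says as much about sample homogeneity as about the size of the adaptation.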

Lastly, subgroup differences were tested by comparing mean effects with the Welch test. This procedure must be considered a supplement to the main analysis and interpreted with caution, as no specific pooling for dependent outcomes was possible. If one study provided multiple outcomes, the effect sizes were averaged so that each study contributed a single outcome, which reduced this limitation.
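The Welch test compares two means without assuming equal variances. The following is a minimal sketch of the procedure described above, assuming that multiple outcomes per study are first averaged to one value per study and that the two subgroups are then compared; the function names and effect sizes are hypothetical and not taken from the review's data or analysis code.

```python
import math
from collections import defaultdict
from statistics import mean, variance


def study_level_means(outcomes):
    """Average multiple effect sizes per study so that each study contributes one value.

    outcomes: iterable of (study_id, effect_size) pairs.
    """
    by_study = defaultdict(list)
    for study_id, es in outcomes:
        by_study[study_id].append(es)
    return [mean(values) for values in by_study.values()]


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom (unequal variances)."""
    se2_a = variance(a) / len(a)  # squared standard error of subgroup A's mean
    se2_b = variance(b) / len(b)  # squared standard error of subgroup B's mean
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


# Hypothetical effect sizes for two subgroups (e.g., longer vs shorter stretching).
long_dur = study_level_means([("A", 0.8), ("A", 0.6), ("B", 0.5), ("C", 0.9)])
short_dur = study_level_means([("D", 0.2), ("E", 0.1), ("E", 0.3), ("F", 0.25)])
t, df = welch_t(long_dur, short_dur)
print(f"t = {t:.2f}, df = {df:.2f}")
```

If SciPy is available, `scipy.stats.ttest_ind(long_dur, short_dur, equal_var=False)` computes the same Welch statistic together with a p-value. Averaging within studies keeps the observations independent, but it ignores the differing precision of the studies, which is one reason such a test can only supplement the main meta-analytic model.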

The present systematic review provides low- to moderate-certainty evidence that chronic static stretching increases maximum strength and muscle size. While the overall effects are small, if present at all, comparatively high effort seems necessary: longer stretching durations and intervention periods (≥ 15 min, ≥ 6 weeks) and higher frequencies (≥ 5×/week) seem particularly effective. The exact physiological mechanisms underlying potential effects remain a matter of debate. Nevertheless, although less effective than resistance training, high-volume stretching might provide a valuable alternative under special circumstances, e.g., if traditional resistance training is contraindicated.

Data Availability

Data can be provided on reasonable request. Supplemental material associated with this article can be found in the online version.

Abbreviations

CI: Confidence interval

ES: Effect size

SD: Standard deviation

SMD: Standardized mean differences

Konrad A, Alizadeh S, Daneshjoo A, Hadjizadeh AS, Graham S, Zahiri A et al. Chronic effects of stretching on range of motion with consideration of potential moderating variables: a systematic review with meta-analysis. J Sport Health Sci. 2023.

Medeiros DM, Martini TF. Chronic effect of different types of stretching on Ankle Dorsiflexion Range of Motion: systematic review and Meta-analysis. Foot(Edinb). 2018;34:28–35.

Malliaropoulos N, Papalexandris S, Papalada A, Papacostas E. The role of stretching in Rehabilitation of Hamstring injuries: 80 athletes Follow-Up. Med Sci Sports Exerc. 2004;36:756–9.

Tunwattanapong P, Kongkasuwan R, Kupniratsaikul V. The effectiveness of a Neck and Shoulder stretching Exercise Program among Office Workers with Neck Pain: a Randomized Controlled Trial. Clin Rehabil. 2016;30:64–72.

Shellock FG, Prentice WE. Warming-up and stretching for Improved Physical Performance and Prevention of sports-related injuries. Sports med. 1985;2:267–78.

Gremion G. Is stretching for sports Performance still useful? A review of the literature. Rev Med Suisse. 2005;1:1830–4.

Williford HN, East JB, Smith FH, Burry LA. Evaluation of warm-up for improvement in flexibility. Am J Sports Med. 1986;14:316–9.

Ebadi LA, Çetin E. Duration dependent effect of static stretching on quadriceps and hamstring muscle force. Sports. 2018;6.

Kay AD, Blazevich AJ. Effect of acute static stretch on maximal muscle performance: a systematic review. Med Sci Sports Exerc. 2012;44:154–64.

Simic L, Sarabon N, Markovic G. Does pre-exercise static stretching inhibit maximal muscular performance? A meta-analytical review. Scand J Med Sci Sports. 2013;23:131–48.

Medeiros DM, Lima CS. Influence of chronic stretching on muscle performance: systematic review. Hum Mov Sci. 2017;54:220–9.

Shrier I. Does stretching improve performance? A systematic and critical review of the literature. Clin J Sport Med. 2004;14:267–73.

Warneke K, Freund PA, Schiemann S. Long-lasting stretching induced muscle hypertrophy - a Meta-analysis of Animal studies. J Sci Sport Exerc. 2022.

Frankeny JR, Holly GR, Ashmore CR. Effects of graded duration of Stretch on normal and dystrophic skeletal muscle. Muscle Nerve. 1983;6:269–77.

Nunes JP, Schoenfeld BJ, Nakamura M, Ribeiro AS, Cunha PM, Cyrino ES. Does stretch training induce muscle hypertrophy in humans? A review of the literature. Clin Physiol Funct Imaging. 2020;40:148–56.

Panidi I, Donti O, Konrad A, Petros CD, Terzis G, Mouratidis A et al. Muscle architecture adaptations to static stretching training: a systematic review with meta-analysis. Sports Med Open. 2023;9.

Thomas E, Ficarra S, Nunes JP, Paoli A, Bellafiore M, Palma A, et al. Does stretching training influence muscular strength? A systematic review with Meta-analysis and Meta-regression. J Strength Cond Res. 2023;37:1145–56.

Arntz F, Markov A, Behm DG, Behrens M, Negra Y, Nakamura M, et al. Chronic effects of Static stretching exercises on muscle strength and power in Healthy Individuals across the Lifespan: a systematic review with multi-level Meta-analysis. Sports Med. 2023;53:723–45.

Warneke K, Keiner M, Hillebrecht M, Schiemann S. Influence of one hour versus two hours of Daily Static stretching for six weeks using a calf-muscle-stretching orthosis on maximal strength. Int J Environ Res Public Health. 2022;19.

Wohlann T, Warneke K, Kalder V, Behm DG, Schmidt T, Schiemann S. Influence of 8-weeks of supervised static stretching or resistance training of pectoral major muscles on maximal strength, muscle thickness and range of motion. Eur J Appl Physiol. 2024.

Wohlann T, Warneke K, Hillebrecht M, Petersmann A, Ferrauti A, Schiemann S. Effects of daily static stretch training over 6 weeks on maximal strength, muscle thickness, contraction properties and flexibility. Front Sports Act Living. 2023;5.

Warneke K, Hillebrecht M, Claassen-Helmers E, Wohlann T, Keiner M, Behm DG. Effects of a home-based stretching program on Bench Press Maximum Strength and Shoulder Flexibility. J Sports Sci Med. 2023;597–604.

Warneke K, Wirth K, Keiner M, Lohmann LH, Hillebrecht M, Brinkmann A, et al. Comparison of the effects of long-lasting static stretching and hypertrophy training on maximal strength, muscle thickness and flexibility in the plantar flexors. Eur J Appl Physiol. 2023;123:1773–87.

Warneke K, Keiner M, Wohlann T, Lohmann LH, Schmitt T, Hillebrecht M, et al. Influence of long-lasting static stretching intervention on functional and morphological parameters in the plantar flexors: a randomized controlled trial. J Strength Cond Res. 2023;37(10):1993-2001. https://doi.org/10.1519/JSC.0000000000004513 . Epub 2023 Jun 5. PMID: 37318350.

Warneke K, Konrad A, Keiner M, Zech A, Nakamura M, Hillebrecht M, et al. Using Daily stretching to Counteract Performance decreases as a result of reduced physical Activity—A controlled trial. Int J Environ Res Public Health. 2022;19:15571.

Warneke K, Brinkmann A, Hillebrecht M, Schiemann S. Influence of long-lasting Static stretching on maximal strength, muscle thickness and flexibility. Front Physiol. 2022;13.

Horsley T, Dingwall O, Sampson M. Checking reference lists to find additional studies for systematic reviews. Cochrane Database Syst Rev. 2011;2011.

de Morton NA. The PEDro Scale is a valid measure of the Methodological Quality of clinical trials: a demographic study. Aust J Physiother. 2009;55:129–33.

Maher CG, Sherrington C, Herbert RD, Moseley AM, Elkins M. Reliability of the PEDro scale for rating quality of randomized controlled trials. Phys Ther. 2003;83:713–21.

Van Duijnhoven HJR, Heeren A, Peters MAM, Veerbeek JM, Kwakkel G, Geurts ACH, et al. Effects of Exercise Therapy on Balance Capacity in Chronic Stroke: systematic review and Meta-analysis. Stroke. 2016;47:2603–10.

Stojanović E, Ristić V, McMaster DT, Milanović Z. Effect of Plyometric Training on Vertical Jump performance in female athletes: a systematic review and Meta-analysis. Sports Med. 2017;47:975–86.

Mavridis D, Salanti G. Exploring and accounting for publication bias in mental health: a brief overview of methods. Evid Based Ment Health. 2014;17:11–5.

Fernández-Castilla B, Declercq L, Jamshidi L, Beretvas SN, Onghena P, van den Noortgate W. Visual representations of meta-analyses of multiple outcomes: extensions to forest plots, funnel plots, and caterpillar plots. Methodology. 2020;16:299–315.

Pustejovsky J. [R-meta] Egger’s test for funnel plot symmetry of a ’robu()’ model. https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2019-November/001876.html . 2019.

Atkins D, Best D, Briss PA, Eccles M, Falck-Ytter Y, Flottorp S, et al. Grading quality of evidence and strength of recommendations. BMJ. 2004;328:1490.

Fisher Z, Tipton E. robumeta: An R-package for robust variance estimation in meta-analysis. arXiv:1503.02220. 2015.

Faraone SV. Interpreting estimates of treatment effects: implications for managed care. P T. 2008;33:700–3.

Abdel-aziem AA, Mohammad WS. Plantar-flexor static stretch training effect on eccentric and concentric peak torque – a comparative study of trained versus untrained subjects. J Hum Kinet. 2012;34:49–58.

Akagi R, Takahashi H. Effect of a 5-week static stretching program on hardness of the gastrocnemius muscle. Scand J Med Sci Sports. 2014;24:950–7.

Barbosa GM, Trajano GS, Dantas GAF, Silva BR, Vieira WHB. Chronic effects of Static and Dynamic stretching on Hamstrings eccentric strength and functional performance: a Randomized Controlled Trial. J Strength Cond Res. 2020;34:2031–9.

Brusco CM, Blazevich AJ, Pinto RS. The effects of 6 weeks of constant-angle muscle stretching training on flexibility and muscle function in men with limited hamstrings’ flexibility. Eur J Appl Physiol. 2019;119:1691–700. https://doi.org/10.1007/s00421-019-04159-w .

Caldwell SL, Bilodeau RLS, Cox MJ, Behm DG. Cross education training effects are evident with twice daily, self-administered band stretch training. J Sports Sci Med. 2019;18:544–51.

Chen CH, Nosaka K, Chen HL, Lin MJ, Tseng KW, Chen TC. Effects of flexibility training on eccentric exercise-induced muscle damage. Med Sci Sports Exerc. 2011;43:491–500.

Cini A, de Vasconcelos GS, Soligo MC, Felappi C, Rodrigues R, Aurélio Vaz M, et al. Comparison between 4 weeks passive static stretching and proprioceptive neuromuscular facilitation programmes on neuromuscular properties of hamstring muscles: a randomised clinical trial. Int J Ther Rehabil. 2020;27:1–11.

Ikeda N, Ryushi T. Effects of 6-Week Static stretching of knee extensors on flexibility, muscle strength, Jump Performance, and muscle endurance. J Strength Cond Res. 2021;35:715–23.

Kokkonen J, Nelson AG, Eldredge C, Winchester JB. Chronic static stretching improves exercise performance. Med Sci Sports Exerc. 2007;39:1825–31.

Konrad A, Tilp M. Increased range of motion after static stretching is not due to changes in muscle and tendon structures. Clin Biomech. 2014;29:636–42.

Kubo K, Kanehisa H, Fukunaga T. Effect of stretching training on the viscoelastic properties of human tendon structures in vivo. J Appl Physiol. 2002;92:595–601.

LaRoche DP, Lussier MV, Roy SJ. Chronic stretching and Voluntary muscle force. J Strength Cond Res. 2008;22:589–96.

Leslie AW, Lanovaz JL, Andrushko JW, Farthing JP. Flexibility training and the repeated-bout effect: priming interventions prior to eccentric training of the knee flexors. Appl Physiol Nutr Metab. 2017;42:1044–53.

e Lima KMM, Carneiro SP, de Alves S, Peixinho D, de Oliveira CC. Assessment of muscle Architecture of the biceps femoris and Vastus Lateralis by Ultrasound after a chronic stretching program. Clin J Sport Med. 2015;25:55–60.

Longo S, Cè E, Valentina Bisconti A, Rampichini S, Doria C, Borrelli M, et al. The effects of 12 weeks of static stretch training on the functional, mechanical, and architectural characteristics of the triceps surae muscle-tendon complex. Eur J Appl Physiol. 2021;121:1743–58.

Marshall PWM, Cashman A, Cheema BS. A Randomized Controlled Trial for the Effect of Passive stretching on measures of Hamstring Extensibility, Passive Stiffness, Strength and Stretch Tolerance. J Sci Med Sport. 2011;14:535–40.

Minshull C, Eston R, Bailey A, Rees D, Gleeson N. The differential effects of PNF versus passive stretch conditioning on neuromuscular performance. Eur J Sport Sci. 2014;14:233–41.

Mizuno T. Combined effects of Static stretching and Electrical Stimulation on Joint Range of Motion and muscle strength. J Strength Cond Res. 2019;33:2694–703.

Morton SK, Whitehead JR, Brinkert RH, Caine DJ. Resistance training vs. static stretching: effects on flexibility and strength. J Strength Cond Res. 2011;25:3391–8.

Moltubakk MM, Villars FO, Magulas MM, Magnusson SP, Seynnes OR, Bojsen-Møller J. Altered triceps Surae muscle-Tendon Unit properties after 6 months of Static stretching. Med Sci Sports Exerc. 2021;53:1975–86.

Nakamura M, Yoshida R, Sato S, Yahata K, Murakami Y, Kasahara K et al. Comparison between high- and Low-Intensity Static Stretching Training Program on active and Passive properties of Plantar Flexors. Front Physiol. 2021;12.

Nakao S, Ikezoe T, Nakamura M, Umegaki H, Fujita K, Umehara J, et al. Chronic effects of a Static stretching Program on Hamstring Strength. J Strength Cond Res. 2019;35:1924–9.

Nelson AG, Kokkonen J, Winchester JB, Kalani W, Peterson K, Kenly MS, et al. A 10-Week stretching program increases strength in the contralateral muscle. J Strength Cond Res. 2012;26:832–6.

Nóbrega ACL, Paula KC, Carvalho ACG. Interaction between Resistance Training and Flexibility Training in healthy young adults. J Strength Cond Res. 2005;19:842.

Reiner M, Gabriel A, Sommer D, Bernsteiner D, Tilp M, Konrad A. Effects of a high-volume 7-Week Pectoralis muscle stretching training on muscle function and muscle stiffness. Sports Med Open. 2023;9:40.

Simpson CL, Kim BDH, Bourcet MR, Jones GR, Jakobi JM. Stretch training induces unequal adaptation in muscle fascicles and thickness in medial and lateral gastrocnemii. Scand J Med Sci Sports. 2017;27:1597–604.

Wilson SJ, Christensen B, Gange K, Todden C, Hatterman-Valenti H, Albrecht JM. Chronic stretching during 2 weeks of immobilization decreases loss of girth, peak torque, and dorsiflexion range of motion. J Sport Rehabil. 2019;28:67–71.

Yahata K, Konrad A, Sato S, Kiyono R, Yoshida R, Fukaya T, et al. Effects of a high-volume static stretching programme on plantar-flexor muscle strength and architecture. Eur J Appl Physiol. 2021;121:1159–66.

Andrade RJ, Freitas SR, Hug F, Le Sant G, Lacourpaille L, Gross R, et al. Chronic effects of muscle and nerve-directed stretching on tissue mechanics. J Appl Physiol. 2020;129:1011–23.

Freitas SR, Mil-Homens P. Effect of 8-week high-intensity stretching training on biceps femoris architecture. J Strength Cond Res. 2015;29:1737–40.

Kay AD, Rubley B, Talbot C, Mina M, Baross AW, Blazevich AJ. Stretch imposed on active muscle elicits positive adaptations in strain risk factors and exercise-induced muscle damage. Scand J Med Sci Sports. 2018;28:2299–309.

Panidi I, Bogdanis GC, Terzis G, Donti A, Konrad A, Gaspari V et al. Muscle architectural and functional adaptations following 12-Weeks of stretching in adolescent female athletes. Front Physiol. 2021;12.

Peixinho CC, Silva GA, Brandão MCA, Menegaldo LL, de Oliveira LF. Effect of a 10-Week stretching program of the triceps Surae muscle Architecture and Tendon Mechanical properties. J Sci Sport Exerc. 2021;3:107–14.

Sekir U, Arslan G, Ilhan O, Akova B. Effects of Static and Dynamic stretching on muscle Architecture. Turkish J Sports Med. 2019;54:158–68.

Cashin AG, McAuley JH. Clinimetrics: Physiotherapy evidence database (PEDro) scale. J Physiother. 2020;66:59.

Warneke K, Freund PA, Schiemann S. Long-lasting stretching induces muscle hypertrophy: a Meta-analysis of Animal studies. J Sci Sport Exerc. 2022.

Kelley G. Mechanical overload and skeletal muscle Fiber hyperplasia: a Meta-analysis. J Appl Physiol. 1996;81:1584–8.

Warneke K, Zech A, Wagner CM, Konrad A, Nakamura M, Keiner M, et al. Sex differences in stretch-induced hypertrophy, maximal strength and flexibility gains. Front Physiol. 2022;13:1078301.

Warneke K, Lohmann LH, Keiner M, Wagner C, Schmidt T, Wirth K, et al. Using Long-Duration Static Stretch Training to counteract strength and flexibility deficits in moderately trained participants. Int J Environ Res Public Health. 2022;1:15.

Bates GP. The relationship between duration of stimulus per day and the extent of hypertrophy of slow-tonic skeletal muscle in the fowl, Gallus gallus. Comp Biochem Physiol. 1993;106A:755–8.

Antonio J, Gonyea WJ. Progressive stretch overload of skeletal muscle results in hypertrophy before hyperplasia. J Appl Physiol. 1993;75:1263–71.

Sayegh JF, Lajtha A. In vivo rates of protein synthesis in brain, muscle, and liver of five vertebrate species. Neurochem Res. 1989;11:1165–8.

Garibotto G, Tessari P, Robaudo C, Zanetti M, Saffioti S, Vettore M, et al. Protein turnover in the kidney and the whole body in humans. Miner Electrolyte Metab. 1997;23:185–8.

Tessari P, Garibotto G, Inchiostro S, Robaudo C, Saffioti S, Vettore M, et al. Kidney, splanchnic, and leg protein turnover in humans. Insights from leucine and phenylalanine kinetics. J Clin Invest. 1996;98:1481–92.

Martins-Costa HC, Lacerda LT, Diniz RCR, Lima FV, Andrade AGP, Peixoto GH, et al. Equalization of training protocols by Time under Tension determines the magnitude of changes in strength and muscular hypertrophy. J Strength Cond Res. 2022;36:1770–80.

Warneke K, Lohmann LH, Lima CD, Hollander K, Konrad A, Zech A et al. Physiology of stretch-mediated hypertrophy and strength increases: a narrative review. Sports Med. 2023.

Wackerhage H, Schoenfeld BJ, Hamilton DL, Lehti M, Hulmi JJ. Stimuli and sensors that initiate muscle hypertrophy following resistance exercise. J Appl Physiol. 2019;126:30–43.

Schoenfeld BJ, Wackerhage H, De Souza E. Inter-set stretch: a potential time-efficient strategy for enhancing skeletal muscle adaptations. Front Sports Act Living. 2022;4.

Devol DL, Novakofski J, Fernando R, Bechtel PJ. Varying amounts of Stretch stimulus regulate Stretch-Induced muscle hypertrophy in the chicken. Comp Biochem Physiol. 1991;100A:55–61.

Ashmore CR. Stretch-induced growth in chicken wing muscles: effects on hereditary muscular dystrophy. Am J Physiol. 1982;242:C178–83.

Hughes L, Rosenblatt B, Haddad F, Gissane C, McCarthy D, Clarke T, et al. Comparing the effectiveness of blood Flow Restriction and Traditional Heavy load resistance training in the Post-surgery Rehabilitation of Anterior Cruciate Ligament Reconstruction patients: a UK National Health Service Randomised Controlled Trial. Sports Med. 2019;49:1787–805.

Krzysztofik M, Wilk M, Wojdała G, Gołaś A. Maximizing muscle hypertrophy: A systematic review of advanced resistance training techniques and methods. Int J Environ Res Public Health. MDPI AG; 2019.

Loenneke JP, Wilson JM, Marín PJ, Zourdos MC, Bemben MG. Low intensity blood flow restriction training: a meta-analysis. Eur J Appl Physiol. 2012;112:1849–59.

Horiuchi M, Okita K. Blood Flow Restricted Exercise and vascular function. Int J Vasc Med. 2012;2012:1–17.

Fry CS, Glynn EL, Drummond MJ, Timmerman KL, Fujita S, Abe T, et al. Blood flow restriction exercise stimulates mTORC1 signaling and muscle protein synthesis in older men. J Appl Physiol. 2010;108:1199–209.

Jessee MB, Buckner SL, Mouser JG, Mattocks KT, Dankel SJ, Abe T, et al. Muscle adaptations to High-Load Training and very low-load training with and without blood Flow Restriction. Front Physiol. 2018;9:1–10.

Hotta K, Behnke BJ, Arjmandi B, Ghosh P, Chen B, Brooks R, et al. Daily muscle stretching enhances blood flow, endothelial function, capillarity, vascular volume and connectivity in aged skeletal muscle. J Physiol. 2018;596:1903–17.

Suzuki YM, Takeda S. Mechanobiology in skeletal muscle. Mech Biology. 2011;51–62.

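Kremer B. Dehnungsinterventionen im Spannungsfeld historischer Entwicklung, ritualisierter Anwendung, Meisterlehre und Wissenschaft; Eine Bestandsanalyse [Stretching interventions between historical development, ritualized application, master-apprentice teaching and science: a status analysis]. Karlsruher Sportwissenschaftliche Beiträge. 2017. pp. 188–92.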

Mousavizadeh R, Hojabrpour P, Eltit F, McDonald PC, Dedhar S, McCormack RG, et al. β1 integrin, ILK and mTOR regulate collagen synthesis in mechanically loaded tendon cells. Sci Rep. 2020;10:1–12.

Bradley JMB, Kelley MJ, Rose A, Acott TS. Signaling pathways used in trabecular matrix metalloproteinase response to Mechanical Stretch. Investig Ophthalmol Vis Sci. 2003;44:5174–81.

Aoki MS, Miyabara EH, Soares AG, Saito ET, Moriscot AS. mTOR pathway inhibition attenuates skeletal muscle growth induced by stretching. Cell Tissue Res. 2006;324:149–56.

Czerwinski SM, Martin JM, Bechtel PJ. Modulation of IGF mRNA abundance during stretch-induced skeletal muscle hypertrophy and regression. J Appl Physiol. 1994;76:2026–30.

Goldspink G. Changes in muscle mass and phenotype and the expression of autocrine and systemic growth factors by muscle in response to stretch and overload. J Anat. 1999;194:323–34.

Sparrow MP. Regression of Skeletal Muscle of Chicken Wing after Stretch-Induced Hypertrophy. Am J Physiol. 1982;242:C333–8.

Laurent GJ, Sparrow MP, Millward DJ. Turnover of muscle protein in Fowl: changes in Rates of protein synthesis and breakdown during hypertrophy of the anterior and posterior latissimus dorsi muscles. Biochem J. 1978;176:407–14.

Aguilar-Agon KW, Capel AJ, Martin NRW, Player DJ, Lewis MP. Mechanical loading stimulates hypertrophy in tissue-engineered skeletal muscle: Molecular and phenotypic responses. J Cell Physiol. 2019;234:23547–58.

Boppart MD, Mahmassani ZS. Integrin signaling: linking mechanical stimulation to skeletal muscle hypertrophy. Am J Physiol Cell Physiol. 2019;317:C629–41.

Sasai N, Agata N, Inoue-Miyazu M, Kawakami K, Kobayashi K, Sokabe M, et al. Involvement of PI3K/Akt/TOR pathway in stretch-induced hypertrophy of myotubes. Muscle Nerve. 2010;41:100–6.

van der Pijl R, Strom J, Conijn S, Lindqvist J, Labeit S, Granzier H, et al. Titin-based mechanosensing modulates muscle hypertrophy. J Cachexia Sarcopenia Muscle. 2018;9:947–61.

van der Pijl RJ, Hudson B, Granzier-Nakajima T, Li F, Knottnerus AM, Smith J et al. Deleting Titin’s C-Terminal PEVK exons increases Passive Stiffness, alters splicing, and induces cross-sectional and longitudinal hypertrophy in skeletal muscle. Front Physiol. 2020;11.

Apostolopoulos N, Metsios GS, Flouris AD, Koutedakis Y, Wyon MA. The relevance of stretch intensity and position: a systematic review. Front Psychol. 2015;6.

Fowles JR, MacDougall JD, Tarnopolsky MA, Sale DG, Roy BD, Yarasheski KE. The effects of acute passive stretch on muscle protein synthesis in humans. Can J Appl Physiol. 2000;25:165–80.

Laurent GJ, Sparrow MP. Changes in RNA, DNA and protein content and the Rates of protein synthesis and degradation during hypertrophy of the Anterior Latissimus Dorsi muscle of the adult fowl (Gallus Domesticus). Growth. 1977;41:249–62.

Del Vecchio A, Casolo A, Negro F, Scorcelletti M, Bazzucchi I, Enoka R, et al. The increase in muscle force after 4 weeks of strength training is mediated by adaptations in motor unit recruitment and rate coding. J Physiol. 2019;597:1873–87.

Kim EH, Hassan AS, Heckman CJ. Changes in motor unit discharge patterns following strength training. J Physiol. 2019;597:3509–10.

Holly RG, Barnett JG, Ashmore CR, Taylor RG, Molti PA. Stretch-induced growth in chicken wing muscles: a new model of stretch hypertrophy. Am J Physiol. 1980;238:C62–71.

Barnett JG, Holly RG, Ashmore CR. Stretch-induced growth in chicken wing muscles: biochemical and morphological characterization. Am J Physiol. 1980;239:C39–46.

Sola M, Christensen DL, Martin AW. Hypertrophy and Hyperplasia of Adult Chicken Anterior Latissimus Dorsi muscles following Stretch with and without Denervation. Exp Neurol. 1973;41:76–100.

Currier BS, Mcleod JC, Banfield L, Beyene J, Welton NJ, D’Souza AC, et al. Resistance training prescription for muscle strength and hypertrophy in healthy adults: a systematic review and bayesian network meta-analysis. Br J Sports Med. 2023;57:1211–20.

Behm DG, Granacher U, Warneke K, Aragão-Santos JC, Da Silva-Grigoletto ME, Konrad A. Minimalist training: is lower dosage or intensity resistance training effective to improve physical fitness? A narrative review. Sports Med. 2023.

Thomas E, Bellafiore M, Gentile A, Paoli A, Palma A, Bianco A. Cardiovascular responses to muscle stretching: a systematic review and Meta-analysis. Int J Sports Med. 2021.

Lim W, Park H. No significant correlation between the intensity of static stretching and subject’s perception of pain. J Phys Ther Sci. 2017;29:1856–9.

Schoenfeld BJ, Peterson MD, Ogborn D, Contreras B, Sonmez GT. Effects of Low- vs. high-load resistance training on muscle strength and hypertrophy in Well-trained men. J Strength Cond Res. 2015;29:2954–63.

Freundt JK, Linke WA. Titin as a force-generating muscle protein under regulatory control. J Appl Physiol. 2019;126:1474–82.

Wilke J, Debelle H, Tenberg S, Dilley A, Maganaris C. Ankle motion is Associated with Soft tissue displacement in the dorsal thigh: an in vivo investigation suggesting Myofascial Force Transmission across the knee Joint. Front Physiol. 2020;11.

Schoenfeld BJ, Grgic J, Ogborn D, Krieger JW. Strength and hypertrophy adaptations between Low- vs. high-load resistance training: a systematic review and Meta analysis. J Strength Cond Res. 2017;31:3508–23.

Acknowledgements

Not applicable.

Registration of the Study

The study was registered in the PROSPERO database under the number CRD42023411225 and the title “Effects of Chronic Static Stretching on Maximal Strength and Muscle Hypertrophy: A Systematic Review with Meta-Analysis”.

The authors acknowledge the financial support by the University of Graz.

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and affiliations.

Institute of Human Movement Science, Sport and Health, University of Graz, Graz, Austria

Konstantin Warneke

Department of Human Motion Science and Exercise Physiology, Friedrich Schiller University, 07743, Jena, Germany

Lars Hubertus Lohmann

School of Human Kinetics and Recreation, Newfoundland and Labrador, Memorial University of Newfoundland, St. John’s, Canada

David G. Behm

University of Applied Sciences Wiener Neustadt, Wiener Neustadt, Austria

Klaus Wirth

Department of Sport Science, German University of Health & Sport, Ismaning, Germany

Michael Keiner

Institute of Exercise, Sport and Health, Leuphana University, Lüneburg, Germany

Stephan Schiemann

Department of Movement Sciences, University of Klagenfurt, Klagenfurt am Wörthersee, Austria

Konstantin Warneke & Jan Wilke

Contributions

KoW wrote the first draft, contributed to the screening of studies, performed the meta-analytic procedure with the help of JW, and created the graphical illustrations with the help of LHL. LHL contributed to study screening, assisted with the writing and helped with the graphical illustrations. JW supervised the project, provided critical feedback and advised on statistical procedures. MK, KlW, SS and AK contributed critical feedback and expertise in their fields to the manuscript. All authors contributed to the manuscript and discussed the final version.

Corresponding author

Correspondence to Lars Hubertus Lohmann.

Ethics declarations

Ethics approval and consent to participate, consent for publication, competing interests.

The authors declare that they have no competing interests. There were no sponsors included.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic Supplementary Material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Warneke, K., Lohmann, L.H., Behm, D.G. et al. Effects of Chronic Static Stretching on Maximal Strength and Muscle Hypertrophy: A Systematic Review and Meta-Analysis with Meta-Regression. Sports Med - Open 10, 45 (2024). https://doi.org/10.1186/s40798-024-00706-8

Received : 30 August 2023

Accepted : 26 March 2024

Published : 19 April 2024

DOI : https://doi.org/10.1186/s40798-024-00706-8

  • Maximum strength
  • Long-lasting

Mortality burden of pre-treatment weight loss in patients with non-small-cell lung cancer: A systematic literature review and meta-analysis

Affiliations.

  • 1 Department of Internal Medicine, Division of Hematology, Oncology and Cell Therapy, Rush University Medical Center, Chicago, IL, USA.
  • 2 Duke Cancer Institute, Duke University Medical Center, Durham, NC, USA.
  • 3 Department of Medicine and Wilmot Cancer Institute, Division of Hematology/Oncology, University of Rochester Medical Center, Rochester, NY, USA.
  • 4 Knight Cancer Institute, Oregon Health and Science University, Portland, OR, USA.
  • 5 Curo, Envision Pharma Group, Philadelphia, PA, USA.
  • 6 EBM Health Consultants, New Delhi, Delhi, India.
  • 7 Internal Medicine Business Unit, Global Product Development, Pfizer Inc, New York, NY, USA.
  • 8 Internal Medicine Research Unit, Worldwide Research, Development and Medical, Pfizer Inc, Cambridge, MA, USA.
  • 9 Internal Medicine Research Unit, Clinical Development, Pfizer Inc, Cambridge, MA, USA.
  • 10 Global Medical Affairs, Pfizer Inc, New York, NY, USA.
  • PMID: 38650388
  • DOI: 10.1002/jcsm.13477

Cachexia, with weight loss (WL) as a major component, is highly prevalent in patients with cancer and indicates a poor prognosis. The primary objective of this study was to conduct a meta-analysis to estimate the risk of mortality associated with cachexia (using established WL criteria prior to treatment initiation) in patients with non-small-cell lung cancer (NSCLC) in studies identified through a systematic literature review. The review was conducted according to PRISMA guidelines. Embase® and PubMed were searched to identify articles on survival outcomes in adult patients with NSCLC (any stage) and cachexia published in English between 1 January 2016 and 10 October 2021. Two independent reviewers screened titles, abstracts and full texts of identified records against predefined inclusion/exclusion criteria. Following a feasibility assessment, a meta-analysis evaluating the impact of cachexia, defined per the international consensus criteria (ICC), or of pre-treatment WL ≥ 5% without a specified time interval, on overall survival in patients with NSCLC was conducted using a random-effects model that included the identified studies as the base case. The impact of heterogeneity was evaluated through sensitivity and subgroup analyses. The standard measures of statistical heterogeneity were calculated. Of the 40 NSCLC publications identified in the review, 20 studies that used the ICC for cachexia or reported WL ≥ 5% and that performed multivariate analyses with hazard ratios (HRs) or Kaplan-Meier curves were included in the feasibility assessment. Of these, 16 studies (80%; n = 6225 patients; published 2016-2021) met the criteria for inclusion in the meta-analysis: 11 studies (69%) used the ICC and 5 studies (31%) used WL ≥ 5%. Combined criteria (ICC plus WL ≥ 5%) were associated with an 82% higher mortality risk versus no cachexia or WL < 5% (pooled HR [95% confidence interval, CI]: 1.82 [1.47, 2.25]). 
Although statistical heterogeneity was high (I² = 88%), individual study HRs were directionally aligned with the pooled estimate, and there was considerable overlap in CIs across included studies. A subgroup analysis of studies using the ICC (HR [95% CI]: 2.26 [1.80, 2.83]) or WL ≥ 5% (HR [95% CI]: 1.28 [1.12, 1.46]) showed consistent findings. Assessments of methodological, clinical and statistical heterogeneity indicated that the meta-analysis was robust. Overall, this analysis found that ICC-defined cachexia or WL ≥ 5% was associated with inferior survival in patients with NSCLC. Routine assessment of both weight and weight changes in the oncology clinic may help identify patients with NSCLC at risk for worse survival, better inform clinical decision-making and assess eligibility for cachexia clinical trials.
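The abstract pools hazard ratios with a random-effects model but does not say which between-study variance estimator was used. The sketch below assumes the DerSimonian-Laird estimator and Wald-type 95% CIs that are symmetric on the log scale, and shows the usual route from reported HRs and CIs to a pooled HR and I². The function names and the input studies are illustrative, not taken from the study.

```python
import math

Z95 = 1.959964  # two-sided 95% standard normal quantile


def log_hr_and_se(hr, lower, upper):
    """Back-calculate log(HR) and its SE from an HR and its 95% CI.

    Assumes a Wald-type interval that is symmetric on the log scale.
    """
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * Z95)


def dersimonian_laird(estimates, ses):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimator."""
    w = [1 / s**2 for s in ses]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    df = len(estimates) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1 / (s**2 + tau2) for s in ses]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0  # Higgins-Thompson I^2 (0-1)
    return pooled, se_pooled, tau2, i2


def pool_hazard_ratios(studies):
    """studies: list of (hr, lower, upper). Returns (HR, CI lower, CI upper, I^2)."""
    logs, ses = zip(*(log_hr_and_se(*s) for s in studies))
    pooled, se_p, _, i2 = dersimonian_laird(list(logs), list(ses))
    return (math.exp(pooled),
            math.exp(pooled - Z95 * se_p),
            math.exp(pooled + Z95 * se_p),
            i2)


# Hypothetical input: three studies reporting HR (95% CI)
print(pool_hazard_ratios([(1.6, 1.2, 2.1), (2.3, 1.7, 3.1), (1.3, 1.1, 1.5)]))
```

Applied to the pooled estimate reported above, `log_hr_and_se(1.82, 1.47, 2.25)` gives a log-scale SE of about 0.109, and the midpoint of the log-scale interval lies close to log(1.82), which is what a log-symmetric interval implies.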

Keywords: cachexia; meta‐analysis; muscle wasting; non‐small‐cell lung cancer; systematic literature review; weight loss.

© 2024 Pfizer Inc and The Authors. Journal of Cachexia, Sarcopenia and Muscle published by Wiley Periodicals LLC.

Understanding the influence of different proxy perspectives in explaining the difference between self-rated and proxy-rated quality of life in people living with dementia: a systematic literature review and meta-analysis

  • Open access
  • Published: 24 April 2024

  • Lidia Engel   ORCID: orcid.org/0000-0002-7959-3149 1 ,
  • Valeriia Sokolova 1 ,
  • Ekaterina Bogatyreva 2 &
  • Anna Leuenberger 2  

Proxy assessment can be elicited via the proxy-patient perspective (i.e., asking proxies to assess the patient’s quality of life (QoL) as they think the patient would respond) or proxy-proxy perspective (i.e., asking proxies to provide their own perspective on the patient’s QoL). This review aimed to identify the role of the proxy perspective in explaining the differences between self-rated and proxy-rated QoL in people living with dementia.

A systematic literature review was conducted by sourcing articles from a previously published review, supplemented by an updated search of four bibliographic databases. Peer-reviewed studies were included if they reported both self-reported and proxy-reported mean QoL estimates using the same standardized QoL instrument, were published in English, and focused on the QoL of people with dementia. A meta-analysis was conducted to synthesize the mean differences between self- and proxy-report across different proxy perspectives.

The review included 96 articles, from which 635 observations were extracted. Most observations used the proxy-proxy perspective (79%), compared with 10% for the proxy-patient perspective; the perspective was not stated for the remaining 11%. The QOL-AD was the most commonly used measure, followed by the EQ-5D and DEMQOL. The standardized mean difference (SMD) between self- and proxy-report was lower for the proxy-patient perspective (SMD: 0.250; 95% CI 0.116; 0.384) than for the proxy-proxy perspective (SMD: 0.532; 95% CI 0.456; 0.609).

Different proxy perspectives affect the ratings of QoL, whereby adopting a proxy-proxy QoL perspective has a higher inter-rater gap in comparison with the proxy-patient perspective.


Quality of life (QoL) has become an important outcome for research and practice, but obtaining reliable and valid estimates remains a challenge in people living with dementia [ 1 ]. According to the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) criteria [ 2 ], dementia, termed Major Neurocognitive Disorder (MND), involves a significant decline in at least one cognitive domain (executive function, complex attention, language, learning, memory, perceptual-motor, or social cognition), where the decline represents a change from a patient's prior level of cognitive ability, is persistent and progressive over time, is not associated exclusively with an episode of delirium, and reduces a person’s ability to perform everyday activities. Since dementia is one of the most pressing challenges facing healthcare systems today [ 3 ], it is critical to study its impact on QoL. The World Health Organization defines QoL as “individuals' perceptions of their position in life in the context of the culture and value systems in which they live and in relation to their goals, expectations, standards, and concerns” [ 4 ]. It is a broad-ranging concept that incorporates, in a complex way, a person's physical health, psychological state, level of independence, social relationships, personal beliefs, and relationship to salient features of the environment.

Although there is evidence that people with mild to moderate dementia can reliably rate their own QoL [ 5 ], as the disease progresses, there is typically a decline in memory, attention, judgment, insight, and communication that may compromise self-reporting of QoL [ 6 ]. Additionally, behavioral symptoms, such as agitation, and affective symptoms, such as depression, may present another challenge in obtaining self-reported QoL ratings due to emotional shifts and unwillingness to complete the assessment [ 7 ]. Although QoL is subjective and should ideally be assessed from an individual’s own perspective [ 8 ], the decline in cognitive function emphasizes the need for proxy-reporting by family members, health professionals, or care staff who are asked to report on behalf of the person with dementia. However, proxy-reports are not substitutable for self-reports from people with dementia, as they offer supplementary insights, reflecting the perceptions and viewpoints of people surrounding the person with dementia [ 9 ].

Previous research has consistently highlighted disagreement between self-rated and proxy-rated QoL in people living with dementia, with proxies generally providing lower ratings (indicating poorer QoL) than the person’s own ratings [ 8 , 10 , 11 , 12 ]. Cognitive impairment associated with greater dementia severity has been found to be associated with a larger difference between self-ratings and proxy-ratings obtained from family caregivers, as it becomes increasingly difficult for severely cognitively impaired individuals to respond to questions that require contemplation, introspection, and sustained attention [ 13 , 14 ]. Moreover, non-cognitive factors, such as awareness of disease and depressive symptoms, play an important role when comparing QoL ratings between individuals with dementia and their proxies [ 15 ]. Qualitative evidence has also shown that people with dementia tend to compare themselves with their peers, whereas carers make comparisons with how the person used to be in the past [ 9 ]. The disagreement between self-reported QoL and carer proxy-rated QoL could be modulated by personal, cognitive, or relational factors, for example, the type of relationship or the frequency of contact maintained, the person’s cognitive status, the carer’s own feelings about dementia, the carer’s mood, and the perceived burden of caregiving [ 14 , 16 ]. Disagreement may also arise from the person with dementia’s difficulty communicating symptoms and from proxies’ inability to recognize certain symptoms, such as pain [ 17 ], or be affected by the amount of time spent with the person with dementia [ 18 ]. These factors may also prevent proxies from accurately rating certain domains of QoL, with previous evidence showing higher levels of agreement for observable domains, such as mobility, than for less observable domains like emotional wellbeing [ 8 ].
Finally, agreement also depends on the type of proxy (i.e., informal/family carers or professional staff) and the nature of their relationship. For instance, proxy QoL scores provided by formal carers tend to be higher (reflecting better QoL) than the scores supplied by family members [ 19 , 20 ]. Staff members might associate residents’ QoL with the quality of care delivered or the stage of their cognitive impairment, whereas relatives often focus on comparison with the person’s QoL when they were younger, lived in their own home and did not have dementia [ 20 ].

What has not been fully examined to date is the role of the different proxy perspectives employed in QoL questionnaires in explaining disagreement between self-rated and proxy-rated scores in people with dementia. Pickard et al. (2005) proposed a conceptual framework for proxy assessments that distinguishes between the proxy-patient perspective (i.e., asking proxies to assess the patient’s QoL as they think the patient would respond) and the proxy-proxy perspective (i.e., asking proxies to provide their own perspective on the patient’s QoL) [ 21 ]. In this framework, the intra-proxy gap describes the difference between the proxy-patient and proxy-proxy perspectives, whereas the inter-rater gap is the difference between self-report and proxy-report [ 21 ].

Existing generic and dementia-specific QoL instruments specify the perspective explicitly in their instructions or imply it indirectly through their wording. For example, the instructions of the Dementia Quality of Life Measure (DEMQOL) ask proxies to give the answer they think their relative would give (i.e., proxy-patient perspective) [ 22 ], whereas the family version of the Quality of Life in Alzheimer’s Disease (QOL-AD) instructs proxies to rate their relative’s current situation as they (the proxy) see it (i.e., proxy-proxy perspective) [ 7 ]. Some instruments, like the EQ-5D measures, have two proxy versions, one for each perspective [ 23 , 24 ]. The Adult Social Care Outcome Toolkit (ASCOT) proxy version, on the other hand, asks proxies to complete the questions from both perspectives: their own opinion and how they think the person would answer [ 25 ].

QoL scores generated using different perspectives are expected to differ, with qualitative evidence showing that carers rate the person with dementia’s QoL lower (worse) when instructed to comment from their own perspective than from the perspective of the person with dementia [ 26 ]. However, to our knowledge, no previous review has fully synthesized existing evidence in this area. Therefore, we aimed to undertake a systematic literature review to examine the role of different proxy-assessment perspectives in explaining differences between self-rated and proxy-rated QoL in people living with dementia. The review was conducted under the hypothesis that the difference in QoL estimates will be larger when adopting the proxy-proxy perspective compared with proxy-patient perspective.

The review was registered with the International Prospective Register of Systematic Reviews (CRD42022333542) and followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (see Appendix 1 ) [ 27 ].

Search strategy

This review used two approaches to obtain literature. First, primary articles from an existing review by Roydhouse et al. were retrieved [ 28 ]. That review included studies published from inception to February 2018 that compared self- and proxy-reports; studies that focused explicitly on Alzheimer’s disease or dementia were retrieved for the current review. Two reviewers conducted a full-text review to assess whether each study met the eligibility criteria listed below. Second, an update of the Roydhouse et al. review was undertaken to capture more recent studies. The search strategy by Roydhouse et al. was amended to cover studies published after January 1, 2018, and was limited to studies within the context of dementia. The original search was undertaken over a three-week period (17/11/2021–9/12/2021) and then updated on July 3, 2023. Peer-reviewed literature was sourced from the MEDLINE, CINAHL, and PsycINFO databases via EBSCOhost, as well as EMBASE. Four main search term categories were used: (1) proxy terms (e.g., care*-report*), (2) QoL/outcome terms (e.g., ‘quality of life’), (3) disease terms (e.g., ‘dementia’), and (4) pediatric terms (e.g., ‘pediatric*’) (for exclusion). Keywords were restricted to titles and abstracts, and MeSH terms were included for all databases. The full search strategy can be found in Appendix 2 . The first three search term categories were combined with AND, and the NOT operator was used to exclude pediatric terms. A limiter was applied in all database searches to include only studies with human participants and articles published in English.
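The Boolean structure described above (terms OR-ed within a category, the first three categories AND-ed, pediatric terms excluded with NOT) can be sketched as a small query builder. Only the example terms quoted in the text are used; the full term lists are in Appendix 2 and are not reproduced here, and field tags, MeSH headings, and limiters are omitted because their syntax differs between EBSCOhost and EMBASE. The helper name `build_query` is hypothetical.

```python
def build_query(include_groups, exclude_terms):
    """Combine search terms: OR within a group, AND between groups, NOT for exclusions.

    Hypothetical helper; real database syntax (field codes, subject headings) is not handled.
    """
    blocks = ["(" + " OR ".join(group) + ")" for group in include_groups]
    query = " AND ".join(blocks)
    if exclude_terms:
        # Wrap the AND-ed part so NOT applies to all of it, whatever the platform's precedence
        query = "(" + query + ") NOT (" + " OR ".join(exclude_terms) + ")"
    return query


# Example terms taken from the text; the real lists are much longer
proxy_terms = ["care*-report*"]
qol_terms = ['"quality of life"']
disease_terms = ["dementia"]
pediatric_terms = ["pediatric*"]

print(build_query([proxy_terms, qol_terms, disease_terms], pediatric_terms))
# ((care*-report*) AND ("quality of life") AND (dementia)) NOT (pediatric*)
```

The explicit outer parentheses are a defensive choice: operator precedence between AND and NOT is not the same in every search interface.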

Selection criteria

Studies from all geographical locations were included in the review if they (1) were published in English in a peer-reviewed journal (conference abstracts, dissertations, and gray literature were excluded); (2) were primary studies (reviews were excluded); (3) clearly defined the disease of participants, limited to Alzheimer’s disease or dementia; (4) reported separate QoL scores for people with dementia (studies that included mixed populations had to report a separate QoL score for people with dementia to be considered); (5) used a standardized, existing QoL instrument for assessment; and (6) provided mean self-reported and proxy-reported QoL scores for the same dyad sample using the same QoL instrument (studies that reported means for non-matched samples were excluded).

Four reviewers (LE, VS, KB, AL) were divided into two groups that independently screened the 179 full texts from the Roydhouse et al. (2022) study that included people with Alzheimer’s disease or dementia. Where inclusion decisions differed, articles were discussed among all reviewers until consensus was reached. Studies identified from the database search were imported into EndNote [ 29 ]. Duplicates were removed in EndNote, and the remaining records were uploaded to Rayyan [ 30 ]. Each abstract was reviewed by two independent reviewers (any two of the four reviewers). Disagreements regarding study inclusion were discussed among all reviewers until consensus was reached. Full-text screening of each eligible article was completed by two independent reviewers (any two of the four reviewers), and disagreements were again resolved through discussion among all reviewers.
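The dual-screening step, in which each record is judged independently by two reviewers and any disagreement goes to group discussion, amounts to comparing two sets of decisions. A minimal sketch, assuming each reviewer's decisions are stored as a mapping from a record ID to include/exclude; this is a hypothetical format, not Rayyan's export schema, and the function name is illustrative.

```python
def screening_conflicts(decisions_a, decisions_b):
    """Return (record IDs needing consensus discussion, raw proportion agreement).

    decisions_a, decisions_b: dict mapping record ID -> True (include) / False (exclude).
    Only records screened by both reviewers are compared.
    """
    shared = sorted(set(decisions_a) & set(decisions_b))
    conflicts = [rid for rid in shared if decisions_a[rid] != decisions_b[rid]]
    agreement = 1 - len(conflicts) / len(shared) if shared else float("nan")
    return conflicts, agreement


# Hypothetical decisions for four records
reviewer_1 = {"r1": True, "r2": False, "r3": True, "r4": False}
reviewer_2 = {"r1": True, "r2": True, "r3": True, "r4": False}
print(screening_conflicts(reviewer_1, reviewer_2))  # (['r2'], 0.75)
```

Raw agreement ignores agreement expected by chance, so it is a convenience figure rather than a formal agreement index.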

Data extraction

A data extraction template was created in Microsoft Excel. The following information was extracted if available: country, study design, study sample, study setting, dementia type, disease severity, Mini-Mental State Examination (MMSE) score details, proxy type, perspective, living arrangements, QoL assessment measure/instrument, self-reported scores (mean, SD), proxy-reported scores (mean, SD), and agreement statistics. If a study reported the mean (SD) for the total score as well as for specific QoL domains of the measure, we extracted both. If studies reported multiple scores across different time points or subgroups, we extracted all scores. For interventional studies, scores from both the intervention group and the control group were recorded. To determine the proxy perspective, we relied on the authors’ description in the article. If the perspective was not explicitly stated, we adopted the perspective of the instrument developers; where more than one perspective was possible (e.g., for the EQ-5D measures) and the perspective was not explicitly stated, it was categorized as ‘undefined.’ For agreement, we extracted the Intraclass Correlation Coefficient (ICC), a reliability index that reflects both the degree of correlation and the agreement between measurements of continuous variables. While there are different forms of ICC based on the model (1-way random effects, 2-way random effects, or 2-way fixed effects), the type (single rater/measurement or the mean of k raters/measurements), and the definition of relationship (consistency or absolute agreement) [ 31 ], this level of information was not extracted because the original studies provided insufficient detail. ICC values range between 0 and 1 and are interpreted as poor (less than 0.5), moderate (0.5–0.75), good (0.75–0.9), or excellent (greater than 0.9) reliability between raters [ 31 ].
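The ICC interpretation bands quoted above translate directly into a lookup. The text gives shared endpoints (0.75 appears in two bands), so the boundary handling below is an assumption: a value of exactly 0.5 counts as moderate, 0.75 as good, and 0.9 as good because "excellent" is defined as greater than 0.9. The function name is illustrative.

```python
def interpret_icc(icc):
    """Map an ICC value to the qualitative reliability bands quoted in the text.

    poor: < 0.5; moderate: 0.5 to < 0.75; good: 0.75 to 0.9; excellent: > 0.9.
    Sample estimates below 0 (possible in practice) fall into 'poor'.
    """
    if icc > 1.0:
        raise ValueError("ICC cannot exceed 1")
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


# Pooled ICCs reported later in this review: overall, proxy-patient, proxy-proxy
print([interpret_icc(v) for v in (0.30, 0.36, 0.26)])  # ['poor', 'poor', 'poor']
```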

Data synthesis and analysis

Characteristics of studies were summarized descriptively. Self-reported and proxy-reported means and SDs were extracted from the full texts, and the mean difference was calculated (or extracted if available) for each pair. Where studies reported median values instead of means, these were converted using the approach outlined by Wan et al. (2014) [ 32 ]. Missing SDs (5 studies, 20 observations) were derived from the reported standard errors or confidence intervals following the Cochrane guidelines [ 33 ]. Missing SDs (6 studies, 29 observations) in studies that presented only the mean value without any additional summary statistics were imputed using the prognostic method [ 34 ]: each missing SD was predicted as the average SD of the observed studies with full information for the same measure and source (self-report versus proxy-report).
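The three recovery steps in the paragraph above rest on standard formulas. The sketch below uses the Wan et al. (2014) estimators as they are commonly stated (which assume roughly normal data), the Cochrane Handbook conversions from an SE or a 95% CI (the 1.96 multiplier is a large-sample assumption; small samples would use a t quantile), and the average-SD imputation described in the text. Function names and the row format are hypothetical.

```python
import math
from statistics import NormalDist, mean

_PHI_INV = NormalDist().inv_cdf  # standard normal quantile function


def wan_from_range(minimum, median, maximum, n):
    """Wan et al. (2014), min/median/max scenario: estimated (mean, SD)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    est_mean = (minimum + 2 * median + maximum) / 4
    est_sd = (maximum - minimum) / (2 * _PHI_INV((n - 0.375) / (n + 0.25)))
    return est_mean, est_sd


def wan_from_iqr(q1, median, q3, n):
    """Wan et al. (2014), quartiles/median scenario: estimated (mean, SD)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    est_mean = (q1 + median + q3) / 3
    est_sd = (q3 - q1) / (2 * _PHI_INV((0.75 * n - 0.125) / (n + 0.25)))
    return est_mean, est_sd


def sd_from_se(se, n):
    """Cochrane Handbook: SD = SE * sqrt(N)."""
    return se * math.sqrt(n)


def sd_from_ci(lower, upper, n, z=1.96):
    """Cochrane Handbook, large-sample 95% CI: SD = sqrt(N) * (upper - lower) / 3.92."""
    return math.sqrt(n) * (upper - lower) / (2 * z)


def impute_missing_sds(rows):
    """Fill missing SDs with the mean observed SD for the same (measure, source).

    rows: list of dicts with keys 'measure', 'source' ('self'/'proxy') and 'sd' (None if missing).
    Groups with no observed SD stay None.
    """
    observed = {}
    for r in rows:
        if r["sd"] is not None:
            observed.setdefault((r["measure"], r["source"]), []).append(r["sd"])
    averages = {key: mean(sds) for key, sds in observed.items()}
    return [dict(r, sd=r["sd"] if r["sd"] is not None
                 else averages.get((r["measure"], r["source"])))
            for r in rows]


print(wan_from_iqr(10, 12, 14, 100))  # mean 12.0, SD roughly 3.0
```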

A meta-analysis was performed in Stata 17.1 (StataCorp LLC, College Station, TX) to synthesize mean differences between self- and proxy-reported scores across different proxy perspectives. First, the pooled raw mean differences were calculated separately for each QoL measure, given differences in scales between measures. Second, we calculated the pooled standardized mean difference (SMD) for all studies, stratified by proxy type (family carers, formal carers, mixed), dementia severity (mild, moderate, severe), and living arrangement (residential/institutional care, mixed). The SMD accounts for the use of different measurement scales; effect sizes were estimated using Cohen’s d. Random-effects models based on the restricted maximum-likelihood (REML) estimator were used to allow for unexplained between-study variability. The percentage of variability attributable to between-study heterogeneity was assessed using the I² statistic: an I² of 0–40% represents possibly unimportant heterogeneity, 30–60% moderate heterogeneity, 50–90% substantial heterogeneity, and 75–100% considerable heterogeneity [ 35 ]. The chi-squared statistic (χ²) was used to test for heterogeneity, with p < 0.1 taken as the significance level. For studies that reported agreement statistics based on the ICC, we also produced a forest plot stratified by study perspective. We also calculated the Q statistic (Cochran’s test of homogeneity), which assesses whether observed differences in results are compatible with chance alone.
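The text names Cohen's d as the effect size but does not state the exact variance formula used by the software. As an illustration only, assuming self- and proxy-reports are treated as two independent groups (the pairing within dyads is ignored), Cohen's d with a pooled SD and its usual large-sample sampling variance can be computed as below. The function name and the input values are hypothetical.

```python
import math


def cohens_d(mean_self, sd_self, n_self, mean_proxy, sd_proxy, n_proxy):
    """Cohen's d for self-report minus proxy-report, using the pooled SD.

    Treats the two sets of ratings as independent groups (pairing ignored).
    Returns (d, sampling variance of d).
    """
    pooled_sd = math.sqrt(((n_self - 1) * sd_self**2 + (n_proxy - 1) * sd_proxy**2)
                          / (n_self + n_proxy - 2))
    d = (mean_self - mean_proxy) / pooled_sd  # positive: self-report higher (better QoL)
    var_d = (n_self + n_proxy) / (n_self * n_proxy) + d**2 / (2 * (n_self + n_proxy))
    return d, var_d


# Hypothetical dyad study: 50 self-reports and 50 proxy-reports
d, var_d = cohens_d(40.0, 5.0, 50, 37.5, 5.0, 50)
print(round(d, 3), round(var_d, 5))  # 0.5 0.04125
```

The inverse of `var_d` is what an inverse-variance random-effects model would use as each observation's weight before adding the between-study variance.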

Risk of bias and quality assessment

The quality of studies was assessed using a checklist for assessing the quality of quantitative studies developed by Kmet et al. (2004) [ 36 ]. The checklist consists of 14 items, each scored ‘2’ (yes, item sufficiently addressed), ‘1’ (item partially addressed), ‘0’ (no, not addressed), or ‘not applicable.’ A summary score was calculated for each study by summing the scores across applicable items and dividing by the total possible score; items rated ‘not applicable’ were excluded from the total possible score. Quality assessment was undertaken by one reviewer, with 25% of the papers assessed independently by a second reviewer.
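The summary score described above is a ratio of points earned to points possible over the applicable items. A minimal sketch, assuming 'not applicable' is recorded as `None`; the function name is illustrative.

```python
def kmet_summary_score(item_scores):
    """Kmet et al. (2004) summary score for quantitative studies.

    item_scores: 14 entries, each 2 (yes), 1 (partial), 0 (no) or None (not applicable).
    Returns the total score divided by the maximum possible score over applicable items.
    """
    if len(item_scores) != 14:
        raise ValueError("the checklist has 14 items")
    applicable = [s for s in item_scores if s is not None]
    if not applicable:
        raise ValueError("at least one item must be applicable")
    if any(s not in (0, 1, 2) for s in applicable):
        raise ValueError("item scores must be 0, 1, 2 or None")
    return sum(applicable) / (2 * len(applicable))


# Hypothetical study: 12 items fully met, 1 partially met, 1 not applicable
print(round(kmet_summary_score([2] * 12 + [1, None]), 3))  # 0.962, i.e. 25 of 26 points
```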

The PRISMA diagram in Fig.  1 shows that after the abstract and full-text screening, 38 studies from the database search and 58 studies from the Roydhouse et al. (2022) review were included in this review—a total of 96 studies. A list of all studies included and their characteristics can be found in Appendix 3.

Figure 1. PRISMA 2020 flow diagram

General study characteristics

The 96 articles included in the review were published between 1999 and 2023 across the globe; the largest share of studies (36%) was conducted in Europe. People with dementia in these studies were living in the community (67%), in residential/institutional care (15%), or in mixed dwelling settings (18%). Most proxy-reports were provided by family carers (85%), and only 8 studies (8%) included formal carers. The mean MMSE score for participants with dementia or Alzheimer’s disease was 18.77 (SD = 4.34; N = 85 studies), which corresponds to moderate cognitive impairment [ 37 ]. Further characteristics of the included studies are provided in Table 1 . The quality of the included studies (see Appendix 4) was generally very good, with an average score of 91% (SD: 9.1) and scores ranging from 50 to 100%.

Quality of life measure and proxy perspective used

A total of 635 observations were recorded from the 96 studies. The majority of studies and observations extracted assumed the proxy-proxy perspective (77 studies, 501 observations), followed by the proxy-patient perspective (18 studies, 62 observations), with 18 studies (72 observations) not clearly defining the perspective. Table 2 provides a detailed overview of the number of studies and observations across the respective QoL measures and proxy perspectives. Two studies (14 observations) adopted both perspectives within the same study design: one using the QOL-AD measure [ 5 ] and the other exploring the EQ-5D-3L and EQ VAS [ 38 ]. Overall, the QOL-AD was the most frequently used QoL measure, followed by the EQ-5D and DEMQOL. Mean scores for specific QoL domains were available for the DEMQOL and QOL-AD; however, only the QOL-AD provided domain-specific mean scores from both proxy perspectives.

Mean scores and mean differences by proxy perspective and QoL measure

The raw mean scores for self-reported and proxy-reported QoL are provided in Supplementary file 2. The pooled raw mean difference by proxy perspective and measure is shown in Table 3 . Regardless of the perspective adopted and the QoL instrument used, self-reported scores were higher (indicating better QoL) than proxy-reported scores, except for the DEMQOL, where proxies reported better QoL than people with dementia themselves. Most instruments were explored from one perspective only, except for the EQ-5D-3L, EQ VAS, and QOL-AD, for which mean differences were available for both perspectives. For the EQ-5D-3L and EQ VAS, mean differences were smaller when adopting the proxy-patient perspective than the proxy-proxy perspective, whereas for the QOL-AD the mean difference was slightly smaller from the proxy-proxy perspective. I² statistics indicate considerable heterogeneity (I² > 75%) between studies. Mean differences by specific QoL domains are provided in Appendix 5, but only for the QOL-AD, the only measure with domain scores from both perspectives. Generally, mean differences appeared to be smaller for the proxy-proxy perspective than for the proxy-patient perspective across all domains, except for ‘physical health’ and ‘doing chores around the house.’ However, these results need to be interpreted carefully, as proxy-patient perspective scores were derived from only one study.

Standardized mean differences by proxy perspective, stratified by proxy type, dementia severity, and living arrangement

Table 4 provides the SMD by proxy perspective, which adjusts for the different QoL measurement scales. Findings suggest that adopting the proxy-patient perspective results in lower SMDs (SMD: 0.250; 95% CI 0.116; 0.384) than the proxy-proxy perspective (SMD: 0.532; 95% CI 0.456; 0.609). The largest SMD was recorded for studies that did not define the study perspective (SMD: 0.594; 95% CI 0.469; 0.718). A comparison by proxy type (formal carers, family carers, and mixed proxies) revealed mixed results. When adopting the proxy-proxy perspective, the largest SMD was found for family carers (SMD: 0.556; 95% CI 0.465; 0.646), compared with formal carers (SMD: 0.446; 95% CI 0.305; 0.586) or mixed proxies (SMD: 0.335; 95% CI 0.211; 0.459). However, the opposite relationship was found when the proxy-patient perspective was used, where the smallest SMD was found for family carers compared with formal carers and mixed proxies. The SMD increased with greater dementia severity, suggesting greater disagreement. However, whereas self-reported scores were greater (i.e., better QoL) than proxy-reported scores across all dementia severity levels under the proxy-proxy perspective, the opposite was found under the proxy-patient perspective, where proxies reported better QoL than people with dementia themselves, except in the severe subgroup. No clear trend was observed for different living settings, although the SMD appeared to be smaller for people with dementia living in residential care than for those living in the community.

Direct proxy perspectives comparison studies

Two studies assessed both proxy perspectives within the same study design. Bosboom et al. (2012) found that, using the QOL-AD, proxy scores from the proxy-patient perspective (mean: 32.1; SD: 6.1) were closer to the self-reported scores (mean: 34.7; SD: 5.3) than proxy scores from the proxy-proxy perspective (mean: 29.5; SD: 5.4) [ 5 ]. Similar findings were reported by Leontjevas et al. (2016) using the EQ-5D-3L and the EQ VAS, showing that the inter-rater gap between self-report (EQ-5D-3L: 0.609; EQ VAS: 65.37) and proxy-report was smaller when adopting the proxy-patient perspective (EQ-5D-3L: 0.555; EQ VAS: 65.15) than the proxy-proxy perspective (EQ-5D-3L: 0.492; EQ VAS: 64.42) [ 38 ].
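As a worked illustration using the means quoted above, take the inter-rater gap as self-report minus proxy-report and the intra-proxy gap as proxy-patient minus proxy-proxy, following the Pickard et al. framework:

$$
\begin{aligned}
\text{QOL-AD (Bosboom et al.):}\quad & \Delta_{\text{inter}}^{\text{proxy-patient}} = 34.7 - 32.1 = 2.6,\quad \Delta_{\text{inter}}^{\text{proxy-proxy}} = 34.7 - 29.5 = 5.2,\quad \Delta_{\text{intra}} = 32.1 - 29.5 = 2.6\\
\text{EQ-5D-3L (Leontjevas et al.):}\quad & \Delta_{\text{inter}}^{\text{proxy-patient}} = 0.609 - 0.555 = 0.054,\quad \Delta_{\text{inter}}^{\text{proxy-proxy}} = 0.609 - 0.492 = 0.117\\
\text{EQ VAS (Leontjevas et al.):}\quad & \Delta_{\text{inter}}^{\text{proxy-patient}} = 65.37 - 65.15 = 0.22,\quad \Delta_{\text{inter}}^{\text{proxy-proxy}} = 65.37 - 64.42 = 0.95
\end{aligned}
$$

Assuming, for illustration, that the self- and proxy-reports in Bosboom et al. come from the same number of respondents, the pooled SD reduces to the root mean square of the two SDs, giving approximate standardized differences of

$$
d_{\text{proxy-patient}} = \frac{2.6}{\sqrt{(5.3^2 + 6.1^2)/2}} \approx \frac{2.6}{5.71} \approx 0.46,
\qquad
d_{\text{proxy-proxy}} = \frac{5.2}{\sqrt{(5.3^2 + 5.4^2)/2}} \approx \frac{5.2}{5.35} \approx 0.97.
$$

These are back-of-the-envelope values for a single study, not the pooled SMDs reported in Table 4.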

Inter-rater agreement (ICC) statistics

Six studies reported agreement statistics based on the ICC, from which we extracted 17 observations that were included in the meta-analysis. Figure 2 shows the study-specific and overall estimates of ICC by the respective study perspective. The heterogeneity between studies was high (I² = 88.20%), with a Q test score of 135.49 (p < 0.001). While the overall ICC for the 17 observations was 0.3 (95% CI 0.22; 0.38), indicating low agreement, the level of agreement was slightly better when adopting a proxy-patient perspective (ICC: 0.36, 95% CI 0.23; 0.49) than a proxy-proxy perspective (ICC: 0.26, 95% CI 0.17; 0.35).
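With k = 17 observations (16 degrees of freedom), the Higgins-Thompson formula relates I² directly to Cochran's Q:

$$
I^2 = \frac{Q - (k - 1)}{Q} = \frac{135.49 - 16}{135.49} = \frac{119.49}{135.49} \approx 0.8819,
$$

that is, about 88.2%, close to the reported 88.20%. A small difference in the last digit is plausible because, under a REML model, software may derive I² from the estimated between-study variance rather than from Q.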

Figure 2. Forest plot depicting study-specific and overall ICC estimates by study perspective

While previous studies highlighted a disagreement between self-rated and proxy-rated QoL in people living with dementia, this review, for the first time, assessed the role of different proxy perspectives in explaining the inter-rater gap. Our findings align with the baseline hypothesis and indicate that QoL scores reported from the proxy-patient perspective are closer to self-reported QoL scores than the proxy-proxy perspective, suggesting that the proxy perspective does impact the inter-rater gap and should not be ignored. This finding was observed across different analyses conducted in this review (i.e., pooled raw mean difference, SMD, ICC analysis), which also confirms the results of two previous primary studies that adopted both proxy perspectives within the same study design [ 5 , 38 ]. Our findings emphasize the need for transparency in reporting the proxy perspective used in future studies, as it can impact results and interpretation. This was also noted by the recent ISPOR Proxy Task Force that developed a checklist of considerations when using proxy-reporting [ 39 ]. While consistency in proxy-reports is desirable, it is crucial to acknowledge that each proxy perspective holds significance in future research, depending on study objectives. It is evident that both proxy perspectives offer distinct insights—one encapsulating the perspectives of people with dementia, and the other reflecting the viewpoints of proxies. Therefore, in situations where self-report is unattainable due to advanced disease severity and the person’s perspective on their own QoL assessment is sought, it is recommended to use the proxy-patient perspective. Conversely, if the objective of future research is to encompass the viewpoints of proxies, opting for the proxy-proxy perspective is advisable. However, it is important to note that proxies may deviate from instructed perspectives, requiring future qualitative research to examine the adherence to proxy perspectives. 
Additionally, others have argued that proxy-reports should not substitute self-reports, and only serve as supplementary sources alongside patient self-reports whenever possible [ 9 ].

This review considered various QoL instruments, but most instruments adopted one specific proxy perspective, limiting detailed analyses. QoL instruments differ in their scope (generic versus disease-specific) as well as coverage of QoL domains. The QOL-AD, an Alzheimer's Disease-specific measure, was commonly used. Surprisingly, for this measure, the mean differences between self-reported and proxy-reported scores were smaller using the proxy-proxy perspective, contrary to the patterns observed with all other instruments. This may be due to the lack of studies reporting QOL-AD proxy scores from the proxy-patient perspective, as the study by Bosboom et al. (2012) found the opposite [ 5 ]. Previous research has also suggested that the inter-rater gap is dependent on the QoL domains and that the risk of bias is greater for more ‘subjective’ (less observable) domains such as emotions, feelings, and moods in comparison with observable, objective areas such as physical domains [ 8 , 40 ]. However, this review lacks sufficient observations for definitive results on QoL dimensions and their impact on self-proxy differences, emphasizing the need for future research in this area.

With regard to proxy type, there is an observable trend suggesting a wider inter-rater gap when family proxies are employed using the proxy-proxy perspective, in contrast to formal proxies. This difference might be attributed to the use of distinct anchoring points: family proxies tend to assess the individual's QoL in relation to their past self before dementia, while formal caregivers may draw comparisons with other individuals with dementia under their care [ 41 ]. However, the opposite was found when the proxy-patient perspective was used: family proxies’ scores seemed to align more closely with self-reported scores, resulting in lower SMDs. This suggests that family proxies might be better able than formal proxies to empathize with the perspective of the person with dementia. Nonetheless, these findings should be interpreted cautiously, given the relatively small number of observations for formal caregiver reports. Additionally, other factors not explored in this review, such as emotional connection, caregiver burden, and caregiver QoL, may also affect proxy-reports by family proxies [ 14 , 16 ].

Our review found that the SMD between proxy-report and self-report increased with greater dementia severity, in contrast to a previous study showing that cognitive impairment was not the primary factor accounting for differences in QoL assessments between family proxies and the person with dementia [ 15 ]. However, different interpretations and classifications were used across studies to define mild, moderate, and severe dementia, which needs to be considered. Most studies used the MMSE to define dementia severity levels. Given the MMSE’s role as a standard measure of cognitive function, the study findings are considered generalizable and clinically relevant for people with dementia across different severity levels. When examining the role of the proxy perspective by level of severity, we found that, whereas self-reported scores were greater than proxy-reported scores across all severity levels under the proxy-proxy perspective, the proxy-patient perspective yielded the opposite result, with proxies reporting better QoL than people with dementia themselves, except in the severe subgroup. It is possible that in the early stages of dementia, people have a greater awareness of increasing deficits, coupled with denial and lack of acceptance, leading to a more critical view of their own QoL than proxies expect them to have. However, further studies are warranted, given the small number of observations adopting the proxy-patient perspective in our review.

Heterogeneity among the included studies was high, supporting the use of random-effects meta-analysis. This is unsurprising given the diverse study designs (e.g., RCTs, cross-sectional studies), differences in populations (e.g., people living in residential care versus community-dwelling people), mixed levels of dementia severity, and differences between instruments. Similar heterogeneity was observed in another review on a similar topic [ 42 ]. By stratifying our findings by proxy type, dementia severity, and living arrangement, we attempted to account for such differences across studies.
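Random-effects pooling is often done with the DerSimonian–Laird estimator. The sketch below shows how that estimator derives the pooled effect, the between-study variance τ², and I² from per-study effects and their variances. The review does not name its estimator in this passage, so DerSimonian–Laird is an assumption here. `dersimonian_laird` and the example inputs are hypothetical.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool per-study effects with a DerSimonian-Laird random-effects model.

    Returns (pooled effect, SE of pooled effect, tau^2, I^2 in percent).
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    w = [1 / v for v in variances]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # method-of-moments between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return pooled, math.sqrt(1 / sum(w_star)), tau2, i2


effects = [0.1, 0.5, 0.9, 1.3]  # hypothetical study SMDs
variances = [0.01] * 4
pooled, se, tau2, i2 = dersimonian_laird(effects, variances)
print(f"pooled = {pooled:.2f} (SE {se:.3f}), tau^2 = {tau2:.3f}, I^2 = {i2:.1f}%")
```

With these inputs, I² is about 96%. Most of the observed spread therefore reflects between-study differences rather than sampling error, which is the situation in which a random-effects model is preferred.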

Limitations and recommendations for future studies

Our review has some limitations. First, proxy perspectives were categorized based on the authors' descriptions, but many papers did not explicitly state the perspective. In these cases we relied on assumptions based on the instrument developers' specifications. Some studies may have modified the wording of the perspective without reporting it. Owing to limited resources, we did not contact the authors of the original studies to clarify which proxy perspective they adopted. Some studies using the EQ-5D, which has two proxy perspectives, did not specify which proxy version was used, suggesting that proxies may have completed the self-report version. In such cases, the proxy perspective was categorized as undefined. Although we accounted for factors such as QoL measure, proxy type, setting, and dementia severity, the studies provided too little information for us to assess the impact of proxy characteristics (e.g., carer burden) or dementia type. For the same reason, we could not explore the proxy perspective by QoL domain. Further, not all studies described the data collection process in full. For example, the proxy may have assisted the person with dementia with their self-report, which could have biased the estimates; future studies should consider blinding so that the two reports are collected independently. Although we assessed the risk of bias of the included studies, the checklist did not directly reflect the purpose of our study, which examined inter-rater agreement. No checklist for this purpose currently exists. Finally, owing to resource constraints and a low rate of disagreement between the two assessors, a second reviewer appraised quality for only the first 25% of studies. In addition, we did not calculate an agreement index between reviewers for either full-text selection or risk of bias assessment.
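The missing agreement index mentioned above can be made concrete. Cohen's kappa is one common index for two reviewers' include/exclude decisions, and the sketch below computes it. The review did not compute this statistic. The function name and the example decisions are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two reviewers' categorical decisions on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    # chance agreement from each reviewer's marginal category frequencies
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    if expected == 1:
        return 1.0  # both reviewers used one identical category throughout
    return (observed - expected) / (1 - expected)


reviewer_1 = ["in", "in", "out", "out", "in", "out", "out", "out", "in", "out"]
reviewer_2 = ["in", "out", "out", "out", "in", "out", "out", "in", "in", "out"]
print(round(cohens_kappa(reviewer_1, reviewer_2), 3))  # → 0.583
```

The reviewers agree on 8 of 10 records. Once chance agreement is removed, however, kappa is only about 0.58.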

This review demonstrates that the choice of proxy perspective affects the inter-rater gap: QoL scores from the proxy-patient perspective align more closely with self-reported scores than those from the proxy-proxy perspective. These findings add to the broader literature on factors influencing differences in QoL scores between proxies and people with dementia. While self-reported QoL is the gold standard, proxy-reports should be viewed as complements rather than substitutes. Both proxy perspectives offer unique insights, yet QoL assessment in people with dementia is complex. Differences between self- and proxy-reports can be influenced by various factors, and further research is needed before definitive results can inform care provision and policy.

Data availability

All data associated with the systematic literature review are available in the supplementary file.

Moyle, W., & Murfield, J. E. (2013). Health-related quality of life in older people with severe dementia: Challenges for measurement and management. Expert Review of Pharmacoeconomics & Outcomes Research, 13 (1), 109–122. https://doi.org/10.1586/erp.12.84

Sachdev, P. S., Blacker, D., Blazer, D. G., Ganguli, M., Jeste, D. V., Paulsen, J. S., & Petersen, R. C. (2014). Classifying neurocognitive disorders: The DSM-5 approach. Nature Reviews Neurology, 10 (11), 634–642. https://doi.org/10.1038/nrneurol.2014.181

The Lancet Regional Health – Europe. (2022). Challenges for addressing dementia. The Lancet Regional Health – Europe. https://doi.org/10.1016/j.lanepe.2022.100504

The WHOQOL Group. (1995). The World Health Organization quality of life assessment (WHOQOL): Position paper from the World Health Organization. Social Science & Medicine, 41 (10), 1403–1409. https://doi.org/10.1016/0277-9536(95)00112-K

Bosboom, P. R., Alfonso, H., Eaton, J., & Almeida, O. P. (2012). Quality of life in Alzheimer’s disease: Different factors associated with complementary ratings by patients and family carers. International Psychogeriatrics, 24 (5), 708–721. https://doi.org/10.1017/S1041610211002493

Scholzel-Dorenbos, C. J., Rikkert, M. G., Adang, E. M., & Krabbe, P. F. (2009). The challenges of accurate measurement of health-related quality of life in frail elderly people and dementia. Journal of the American Geriatrics Society, 57 (12), 2356–2357. https://doi.org/10.1111/j.1532-5415.2009.02586.x

Logsdon, R. G., Gibbons, L. E., McCurry, S. M., & Teri, L. (2002). Assessing quality of life in older adults with cognitive impairment. Psychosomatic Medicine, 64 (3), 510–519. https://doi.org/10.1097/00006842-200205000-00016

Hutchinson, C., Worley, A., Khadka, J., Milte, R., Cleland, J., & Ratcliffe, J. (2022). Do we agree or disagree? A systematic review of the application of preference-based instruments in self and proxy reporting of quality of life in older people. Social Science & Medicine, 305, 115046. https://doi.org/10.1016/j.socscimed.2022.115046

Smith, S. C., Hendriks, A. A. J., Cano, S. J., & Black, N. (2020). Proxy reporting of health-related quality of life for people with dementia: A psychometric solution. Health and Quality of Life Outcomes, 18 (1), 148. https://doi.org/10.1186/s12955-020-01396-y

Andrieu, S., Coley, N., Rolland, Y., Cantet, C., Arnaud, C., Guyonnet, S., Nourhashemi, F., Grand, A., Vellas, B., & group, P. (2016). Assessing Alzheimer’s disease patients’ quality of life: Discrepancies between patient and caregiver perspectives. Alzheimer’s & Dementia, 12 (4), 427–437. https://doi.org/10.1016/j.jalz.2015.09.003

Jönsson, L., Andreasen, N., Kilander, L., Soininen, H., Waldemar, G., Nygaard, H., Winblad, B., Jonhagen, M. E., Hallikainen, M., & Wimo, A. (2006). Patient- and proxy-reported utility in Alzheimer disease using the EuroQoL. Alzheimer Disease & Associated Disorders, 20 (1), 49–55. https://doi.org/10.1097/01.wad.0000201851.52707.c9

Zucchella, C., Bartolo, M., Bernini, S., Picascia, M., & Sinforiani, E. (2015). Quality of life in Alzheimer disease: A comparison of patients’ and caregivers’ points of view. Alzheimer Disease & Associated Disorders, 29 (1), 50–54. https://doi.org/10.1097/WAD.0000000000000050

Buckley, T., Fauth, E. B., Morrison, A., Tschanz, J., Rabins, P. V., Piercy, K. W., Norton, M., & Lyketsos, C. G. (2012). Predictors of quality of life ratings for persons with dementia simultaneously reported by patients and their caregivers: The Cache County (Utah) study. International Psychogeriatrics, 24 (7), 1094–1102. https://doi.org/10.1017/S1041610212000063

Schiffczyk, C., Romero, B., Jonas, C., Lahmeyer, C., Muller, F., & Riepe, M. W. (2010). Generic quality of life assessment in dementia patients: A prospective cohort study. BMC Neurology, 10, 48. https://doi.org/10.1186/1471-2377-10-48

Sousa, M. F., Santos, R. L., Arcoverde, C., Simoes, P., Belfort, T., Adler, I., Leal, C., & Dourado, M. C. (2013). Quality of life in dementia: The role of non-cognitive factors in the ratings of people with dementia and family caregivers. International Psychogeriatrics, 25 (7), 1097–1105. https://doi.org/10.1017/S1041610213000410

Arons, A. M., Krabbe, P. F., Scholzel-Dorenbos, C. J., van der Wilt, G. J., & Rikkert, M. G. (2013). Quality of life in dementia: A study on proxy bias. BMC Medical Research Methodology, 13, 110. https://doi.org/10.1186/1471-2288-13-110

Gomez-Gallego, M., Gomez-Garcia, J., & Ato-Lozano, E. (2015). Addressing the bias problem in the assessment of the quality of life of patients with dementia: Determinants of the accuracy and precision of the proxy ratings. The Journal of Nutrition, Health & Aging, 19 (3), 365–372. https://doi.org/10.1007/s12603-014-0564-7

Moon, H., Townsend, A. L., Dilworth-Anderson, P., & Whitlatch, C. J. (2016). Predictors of discrepancy between care recipients with mild-to-moderate dementia and their caregivers on perceptions of the care recipients’ quality of life. American Journal of Alzheimer’s Disease & Other Dementias, 31 (6), 508–515. https://doi.org/10.1177/1533317516653819

Crespo, M., Bernaldo de Quiros, M., Gomez, M. M., & Hornillos, C. (2012). Quality of life of nursing home residents with dementia: A comparison of perspectives of residents, family, and staff. The Gerontologist, 52 (1), 56–65. https://doi.org/10.1093/geront/gnr080

Griffiths, A. W., Smith, S. J., Martin, A., Meads, D., Kelley, R., & Surr, C. A. (2020). Exploring self-report and proxy-report quality-of-life measures for people living with dementia in care homes. Quality of Life Research, 29 (2), 463–472. https://doi.org/10.1007/s11136-019-02333-3

Pickard, A. S., & Knight, S. J. (2005). Proxy evaluation of health-related quality of life: A conceptual framework for understanding multiple proxy perspectives. Medical Care, 43 (5), 493–499. https://doi.org/10.1097/01.mlr.0000160419.27642.a8

Smith, S. C., Lamping, D. L., Banerjee, S., Harwood, R. H., Foley, B., Smith, P., Cook, J. C., Murray, J., Prince, M., Levin, E., Mann, A., & Knapp, M. (2007). Development of a new measure of health-related quality of life for people with dementia: DEMQOL. Psychological Medicine, 37 (5), 737–746. https://doi.org/10.1017/S0033291706009469

Brooks, R. (1996). EuroQol: The current state of play. Health Policy, 37 (1), 53–72. https://doi.org/10.1016/0168-8510(96)00822-6

Herdman, M., Gudex, C., Lloyd, A., Janssen, M., Kind, P., Parkin, D., Bonsel, G., & Badia, X. (2011). Development and preliminary testing of the new five-level version of EQ-5D (EQ-5D-5L). Quality of Life Research, 20 (10), 1727–1736. https://doi.org/10.1007/s11136-011-9903-x

Rand, S., Caiels, J., Collins, G., & Forder, J. (2017). Developing a proxy version of the adult social care outcome toolkit (ASCOT). Health and Quality of Life Outcomes, 15 (1), 108. https://doi.org/10.1186/s12955-017-0682-0

Engel, L., Bucholc, J., Mihalopoulos, C., Mulhern, B., Ratcliffe, J., Yates, M., & Hanna, L. (2020). A qualitative exploration of the content and face validity of preference-based measures within the context of dementia. Health and Quality of Life Outcomes, 18 (1), 178. https://doi.org/10.1186/s12955-020-01425-w

Page, M. J., McKenzie, J. E., Bossuyt, P. M., Boutron, I., Hoffmann, T. C., Mulrow, C. D., Shamseer, L., Tetzlaff, J. M., Akl, E. A., Brennan, S. E., Chou, R., Glanville, J., Grimshaw, J. M., Hrobjartsson, A., Lalu, M. M., Li, T., Loder, E. W., Mayo-Wilson, E., McDonald, S., … Moher, D. (2021). The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. British Medical Journal, 372, n71. https://doi.org/10.1136/bmj.n71

Roydhouse, J. K., Cohen, M. L., Eshoj, H. R., Corsini, N., Yucel, E., Rutherford, C., Wac, K., Berrocal, A., Lanzi, A., Nowinski, C., Roberts, N., Kassianos, A. P., Sebille, V., King, M. T., Mercieca-Bebber, R., ISOQOL Proxy Task Force, & the ISOQOL Board of Directors. (2022). The use of proxies and proxy-reported measures: A report of the international society for quality of life research (ISOQOL) proxy task force. Quality of Life Research, 31 (2), 317–327. https://doi.org/10.1007/s11136-021-02937-8

The EndNote Team. (2013). EndNote (Version EndNote X9) [64 bit]. Philadelphia, PA: Clarivate.

Ouzzani, M., Hammady, H., Fedorowicz, Z., & Elmagarmid, A. (2016). Rayyan—A web and mobile app for systematic reviews. Systematic Reviews, 5 (1), 210. https://doi.org/10.1186/s13643-016-0384-4

Koo, T. K., & Li, M. Y. (2016). A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine, 15 (2), 155–163. https://doi.org/10.1016/j.jcm.2016.02.012

Wan, X., Wang, W., Liu, J., & Tong, T. (2014). Estimating the sample mean and standard deviation from the sample size, median, range and/or interquartile range. BMC Medical Research Methodology, 14, 135. https://doi.org/10.1186/1471-2288-14-135

Higgins, J. P. T., & Green, S. (Eds.). (2011). Cochrane handbook for systematic reviews of interventions (Version 5.1.0, updated March 2011). Retrieved 20 Jan 2023, from https://handbook-5-1.cochrane.org/chapter_7/7_7_3_2_obtaining_standard_deviations_from_standard_errors_and.htm

Ma, J., Liu, W., Hunter, A., & Zhang, W. (2008). Performing meta-analysis with incomplete statistical information in clinical trials. BMC Medical Research Methodology, 8, 56. https://doi.org/10.1186/1471-2288-8-56

Deeks, J. J., Higgins, J. P. T., Altman, D. G., & on behalf of the Cochrane Statistical Methods Group. (2023). Chapter 10: Analysing data and undertaking meta-analyses. In J. Higgins & J. Thomas (Eds.), Cochrane Handbook for Systematic Reviews of Interventions. Version 6.4.

Kmet, L. M., Cook, L. S., & Lee, R. C. (2004). Standard quality assessment criteria for evaluating primary research papers from a variety of fields: Health and technology assessment unit. Alberta Heritage Foundation for Medical Research.

Lewis, T. J., & Trempe, C. L. (2017). Diagnosis of Alzheimer’s: Standard-of-care. USA: Elsevier Science & Technology.

Leontjevas, R., Teerenstra, S., Smalbrugge, M., Koopmans, R. T., & Gerritsen, D. L. (2016). Quality of life assessments in nursing homes revealed a tendency of proxies to moderate patients’ self-reports. Journal of Clinical Epidemiology, 80, 123–133. https://doi.org/10.1016/j.jclinepi.2016.07.009

Lapin, B., Cohen, M. L., Corsini, N., Lanzi, A., Smith, S. C., Bennett, A. V., Mayo, N., Mercieca-Bebber, R., Mitchell, S. A., Rutherford, C., & Roydhouse, J. (2023). Development of consensus-based considerations for use of adult proxy reporting: An ISOQOL task force initiative. Journal of Patient-Reported Outcomes, 7 (1), 52. https://doi.org/10.1186/s41687-023-00588-6

Li, M., Harris, I., & Lu, Z. K. (2015). Differences in proxy-reported and patient-reported outcomes: Assessing health and functional status among Medicare beneficiaries. BMC Medical Research Methodology. https://doi.org/10.1186/s12874-015-0053-7

Robertson, S., Cooper, C., Hoe, J., Lord, K., Rapaport, P., Marston, L., Cousins, S., Lyketsos, C. G., & Livingston, G. (2020). Comparing proxy rated quality of life of people living with dementia in care homes. Psychological Medicine, 50 (1), 86–95. https://doi.org/10.1017/S0033291718003987

Khanna, D., Khadka, J., Mpundu-Kaambwa, C., Lay, K., Russo, R., Ratcliffe, J., & the Quality of Life in Kids: Key Evidence to Strengthen Decisions in Australia Project Team. (2022). Are we agreed? Self- versus proxy-reporting of paediatric health-related quality of life (HRQoL) using generic preference-based measures: A systematic review and meta-analysis. PharmacoEconomics, 40 (11), 1043–1067. https://doi.org/10.1007/s40273-022-01177-z

Open Access funding enabled and organized by CAUL and its Member Institutions. This study was conducted without financial support.

Author information

Authors and affiliations

Monash University Health Economics Group, School of Public Health and Preventive Medicine, Monash University, Level 4, 553 St. Kilda Road, Melbourne, VIC, 3004, Australia

Lidia Engel & Valeriia Sokolova

School of Health and Social Development, Deakin University, Burwood, VIC, Australia

Ekaterina Bogatyreva & Anna Leuenberger

Contributions

LE contributed to the study conception and design. The original database search was performed by AL and later updated by VS. All authors were involved in the screening process, data extraction, and data analyses. Quality assessment was conducted by VS and LE. The first draft of the manuscript was written by LE and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Lidia Engel .

Ethics declarations

Competing interests

Lidia Engel is a member of the EuroQol Group.

Ethical approval

Not applicable.

Consent to participate

Consent to publish

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (XLSX 67 KB)

Supplementary file2 (DOCX 234 KB)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Engel, L., Sokolova, V., Bogatyreva, E. et al. Understanding the influence of different proxy perspectives in explaining the difference between self-rated and proxy-rated quality of life in people living with dementia: a systematic literature review and meta-analysis. Qual Life Res (2024). https://doi.org/10.1007/s11136-024-03660-w

Accepted : 27 March 2024

Published : 24 April 2024

DOI : https://doi.org/10.1007/s11136-024-03660-w

  • Quality of Life
  • Outcome measurement

SYSTEMATIC REVIEW article

Predicting histologic grades for pancreatic neuroendocrine tumors by radiologic image-based artificial intelligence: a systematic review and meta-analysis.

Qian Yan

  • 1 Department of General Surgery, Guangdong Provincial People’s Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, China
  • 2 School of Medicine, South China University of Technology, Guangzhou, China
  • 3 Department of General Surgery, Heyuan People’s Hospital, Heyuan, China

Background: Accurate detection of the histological grade of pancreatic neuroendocrine tumors (PNETs) is important for patients’ prognoses and treatment. Here, we investigated the performance of radiological image-based artificial intelligence (AI) models in predicting histological grades using meta-analysis.

Method: A systematic literature search was performed for studies published before September 2023. Study characteristics and diagnostic measures were extracted. Estimates were pooled using random-effects meta-analysis. Risk of bias was evaluated with the QUADAS-2 tool.

Results: A total of 26 studies were included, 20 of which met the meta-analysis criteria. The AI-based models achieved high area under the curve (AUC) values and showed moderate predictive value. The pooled AUC for distinguishing between grades of PNETs was 0.89 [0.84–0.90]. In subgroup analysis, radiomics feature-only models had a pooled AUC of 0.90 [0.87–0.92] with I² = 89.91%, whereas the combined (radiomics and clinical) model group had a pooled AUC of 0.81 [0.77–0.84] with I² = 41.54%. The validation group had a pooled AUC of 0.84 [0.81–0.87] without heterogeneity, whereas the validation-free group had high heterogeneity (I² = 91.65%, P<0.001). The machine learning group had a pooled AUC of 0.83 [0.80–0.86] with I² = 82.28%.

Conclusion: AI could serve as a tool for determining the histological grade of PNETs. Sources of heterogeneity included:

  • sample diversity
  • lack of external validation
  • imaging modality
  • inconsistent radiomics feature extraction across platforms
  • differing modeling algorithms and software

Standardized imaging and transparent statistical methods for feature selection and model development are still needed to translate radiomics results into clinical applications.

Systematic Review Registration: https://www.crd.york.ac.uk/prospero/ , identifier CRD42022341852.

Introduction

Pancreatic neuroendocrine tumors (PNETs), which account for 3–5% of all pancreatic tumors, are a heterogeneous group of tumors derived from pluripotent stem cells of the neuroendocrine system ( 1 – 3 ). Over the past 10 years, the incidence and prevalence of PNETs have steadily increased ( 4 – 6 ). Unlike many other malignant tumors, PNETs vary widely in behavior, ranging from indolent to aggressive ( 7 , 8 ). The World Health Organization (WHO) histological grading system is used to evaluate the features of PNETs, and treatment strategies are planned accordingly ( 9 , 10 ). Accurate evaluation of the histological grade is therefore crucial for patients with PNETs. Non-invasive methods are helpful, especially for tumors that are difficult to biopsy.

Artificial intelligence (AI) is increasingly applied in medicine, including radiology, pathology, genomics, and proteomics ( 11 – 14 ), and it has broad applications in disease diagnosis and treatment ( 15 – 18 ). Owing to developments in AI technology, radiomic analysis can now be used to predict PNET grade, with promising results ( 19 , 20 ). A study by Guo et al. ( 21 ) of 37 patients with PNETs showed that the portal enhancement ratio, arterial enhancement ratio, mean grey-level intensity, kurtosis, entropy, and uniformity were significant predictors of histological grade. Luo et al. ( 22 ) found that, using specific computed tomography (CT) images, a deep learning (DL) algorithm achieved higher accuracy than radiologists in distinguishing G3 from G1/G2 tumors (73.12% vs. 58.1%). Despite these promising results, other studies using different methodologies have produced different findings. A quantitative synthesis is therefore needed to compare performance across studies and to assess the overall predictive power of AI in determining the histological grade of PNETs.

In this review, we aimed to systematically summarize the latest literature on AI histological grade prediction for PNETs. By performing a meta-analysis, we aimed to evaluate AI accuracy and provide evidence for its clinical application and role in decision making.

Materials and methods

This systematic review and meta-analysis followed the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines. The study was registered in the International Prospective Register of Systematic Reviews (PROSPERO ID: CRD42022341852).

Search strategy

Primary publications were retrieved from multiple electronic databases (PubMed, MEDLINE, Cochrane, and Web of Science) in September 2023. The search targeted studies applying radiomics, DL, machine learning (ML), or AI to CT or magnetic resonance imaging (MRI) examinations for PNET grading. The search terms combined ML, AI, radiomics, or DL with PNETs grade. The full search string was: (radiomics OR machine learning OR deep learning OR artificial intelligence) AND (PNETs OR pancreatic neuroendocrine tumors). The reference lists of the retrieved studies were then screened for eligibility.

Study selection

Two researchers assessed the eligibility of each article by title and abstract and removed duplicates. Case reports, non-original investigations (e.g., editorials, letters, and reviews), and studies that did not focus on the topic of interest were excluded. Inclusion criteria were designed according to the “PICOS” principle:

1) studies of PNET grading whose models were trained using only histology (not biopsy) as the gold standard;
2) PNET grading prediction models built with AI;
3) comparison with physicians or with models based on clinical and traditional imaging characteristics;
4) a main research purpose of differentiating PNET grades;
5) study design: case-control, cohort, nested case-control, or case-cohort;
6) English language.

Exclusion criteria were:

1) studies that analyzed only influencing factors without building a complete risk model;
2) guidelines, case reports, and non-original investigations (e.g., editorials, letters, meta-analyses, and reviews);
3) non-English and animal studies.

Any disagreements were resolved by consensus, arbitrated by a third author.

Data extraction

Data extraction was performed independently by two reviewers, and discrepancies were resolved by a third reviewer. Extracted data included first author, country, year of publication, study aim, study type, number of patients, sample size, validation, treatment, reference standard, imaging modality and features, methodology, model features and algorithm, segmentation software, and use of clinical information (e.g., age, tumor stage, and biomarker expression). The numbers of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) were recorded, together with sensitivity and specificity. Where reported, the AUC of the validation group was also collected, along with its 95% confidence interval (CI) or standard error (SE).
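The per-study quantities listed above are simply related. The sketch below derives sensitivity and specificity from extracted TP/FP/FN/TN counts and back-calculates an AUC standard error from a reported 95% CI. It uses a hypothetical `TwoByTwo` container and a hypothetical helper `se_from_ci`. The back-calculation assumes a symmetric, normal-based interval, an assumption that often fails for AUCs close to 1. The review does not state how it handled such cases.

```python
from dataclasses import dataclass

Z_975 = 1.959964  # two-sided 95% standard normal quantile


@dataclass
class TwoByTwo:
    """Extracted 2x2 counts for one model (hypothetical container)."""
    tp: int
    fp: int
    fn: int
    tn: int

    def sensitivity(self) -> float:
        # proportion of reference-positive tumors the model labels positive
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # proportion of reference-negative tumors the model labels negative
        return self.tn / (self.tn + self.fp)


def se_from_ci(lower: float, upper: float, z: float = Z_975) -> float:
    """Approximate SE from a symmetric 95% CI: (upper - lower) / (2 * z)."""
    return (upper - lower) / (2 * z)


table = TwoByTwo(tp=40, fp=10, fn=8, tn=42)  # made-up counts
print(f"sensitivity = {table.sensitivity():.3f}, specificity = {table.specificity():.3f}")
print(f"SE(AUC) for a 0.80-0.90 CI = {se_from_ci(0.80, 0.90):.4f}")
```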

Quality assessment

All included studies were independently assessed using the radiomics quality score (RQS), which covers image acquisition, radiomics feature extraction, data modeling, model validation, and data sharing. The sixteen items yield a total score ranging from -8 to 36. The total was then converted to a percentage, with totals from -8 to 0 defined as 0% and 36 as 100% ( 23 ).
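The percentage conversion described above reduces to a one-line rule: floor negative totals at zero, then divide by the maximum of 36. The minimal sketch below uses a hypothetical function name. With it, totals of 2 and 20 map to 5.56% and 55.56%, the range of relative scores reported in the quality assessment results.

```python
RQS_MIN, RQS_MAX = -8, 36  # range of the RQS total over its sixteen items


def rqs_percentage(total: float) -> float:
    """Convert an RQS total to a percentage; totals of 0 or below count as 0%."""
    if not RQS_MIN <= total <= RQS_MAX:
        raise ValueError(f"RQS total must lie between {RQS_MIN} and {RQS_MAX}")
    return max(total, 0) / RQS_MAX * 100


for total in (-3, 2, 20, 36):
    print(total, f"{rqs_percentage(total):.2f}%")
```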

The methodological quality of the included studies was assessed with the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) criteria ( 24 ). Two reviewers independently performed data extraction and quality assessment. Disagreements between them were discussed at a research meeting until consensus was reached.

Statistical analysis

Three software packages were used for statistical analysis: Stata, version 12.0; MedCalc for Windows, version 16.4.3 (MedCalc Software, Ostend, Belgium); and RevMan, version 5.3.21. A bivariate meta-analysis model was used to calculate the pooled sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), and diagnostic odds ratio (DOR). A summary receiver operating characteristic (SROC) curve was generated. Statistical heterogeneity was assessed with the I² statistic, which estimates the percentage of variability across the included studies that is due to heterogeneity rather than chance. An I² value >50% indicated substantial heterogeneity, and a random-effects model was then used to account for variation within and between studies. An I² value <50% indicated lower heterogeneity, and a fixed-effects model was used ( 25 ). Meta-regression and subgroup analysis were conducted to explore sources of heterogeneity, and a sensitivity analysis was performed to evaluate the robustness of the results. Deeks’ funnel plot was used to examine publication bias. A p value <0.05 was considered significant. Fagan’s nomogram was used to examine post-test probability.

Literature selection

We retrieved 260 articles from PubMed and 156 from Web of Science; 137 duplicates were eliminated, resulting in 343 studies. After screening titles and abstracts, 85 potentially eligible articles were identified. After full-text review, six articles were excluded because of insufficient information; thus, 26 articles were included in this systematic review ( 21 , 22 , 26 – 49 ). Of these, six studies lacked TP/FP/FN/TN counts, so only 20 articles were eligible for the meta-analysis. The results of the literature search are shown in Figure 1 .

Figure 1 Flowchart of the article selection.

Quality and risk bias assessment

As shown in Table 1 , the selected articles were published between 2018 and 2023. The mean total and relative RQS scores were 9.58 (range 2–20) and 26.60% (5.56–55.56%), respectively. Thirteen studies had no validation group, and five were based on datasets from two or more distinct institutes. All the 11 included studies scored zero on the following items: prospective design, phantom studies on all scanners, imaging at multiple time points, cost-effectiveness analysis, and open science and data. A detailed report of the RQS assigned by the expert reader is presented in Supplementary Table S1 .

Table 1 Characteristics of all included studies.

Study quality and risk of bias were assessed using the QUADAS-2 criteria; details are presented in Supplementary Figure S1 . Most studies showed a low or unclear risk of bias in each domain. In the Patient Selection domain, one study was at high risk and 25 were at moderate risk, mainly because patients were not enrolled consecutively. In the Index Test domain, nine studies were at moderate risk because they provided insufficient information to make a judgment, and the others were at low risk. In the Reference Standard domain, only one study was at high risk, because some of its patients could not be accurately assigned to a specific grade. In the Flow and Timing domain, all studies were at low risk.

Publication bias

Deeks’ funnel plot asymmetry test was adopted to detect publication bias: no bias was detected within the meta-analysis (p=0.347, Figure 2 ).
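Deeks’ test regresses each study's log diagnostic odds ratio on 1/√ESS, weighting by ESS. Here ESS is the effective sample size 4·n₁·n₂/(n₁+n₂), with n₁ diseased and n₂ non-diseased participants. A slope that differs significantly from zero suggests funnel-plot asymmetry. The stdlib-only sketch below returns the slope and its t statistic. Obtaining the p value requires a t distribution with k − 2 degrees of freedom (for example `scipy.stats.t.sf`), which is omitted to keep the block dependency-free. The function name, the 0.5 continuity correction for zero cells, and the example tables are assumptions, not details taken from the review.

```python
import math


def deeks_test(tables):
    """Deeks' funnel plot asymmetry test (minimal sketch).

    tables: iterable of (tp, fp, fn, tn) counts, one tuple per study.
    Regresses ln(DOR) on 1/sqrt(ESS) by weighted least squares with weights ESS
    and returns (slope, SE of slope, t statistic, degrees of freedom).
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        if 0 in (tp, fp, fn, tn):  # assumed 0.5 continuity correction for zero cells
            tp, fp, fn, tn = tp + 0.5, fp + 0.5, fn + 0.5, tn + 0.5
        n_dis, n_non = tp + fn, fp + tn
        ess = 4 * n_dis * n_non / (n_dis + n_non)  # effective sample size
        ys.append(math.log((tp * tn) / (fp * fn)))  # log diagnostic odds ratio
        xs.append(1 / math.sqrt(ess))
        ws.append(ess)
    k = len(xs)
    if k < 3:
        raise ValueError("need at least three studies")
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum(w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys))
    se_slope = math.sqrt(rss / (k - 2) / sxx)
    return slope, se_slope, slope / se_slope, k - 2


# made-up counts for five studies
studies = [(40, 10, 8, 42), (25, 12, 10, 50), (60, 15, 9, 70), (18, 6, 7, 30), (33, 9, 5, 45)]
slope, se_slope, t_stat, df = deeks_test(studies)
print(f"slope = {slope:.2f}, t = {t_stat:.2f} on {df} df")
```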

Figure 2 Deeks’ funnel plot evaluating the potential publication bias (p=0.034).

Clinical diagnostic value of grading PNETs

As shown in Figure 3 , Fagan’s nomogram was used to evaluate the clinical diagnostic value of the models for PNET grading. At a pre-test probability of 50%, a positive result increased the post-test probability to 81%, and a negative result decreased it to 4%.
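Fagan’s nomogram is a graphical form of Bayes’ theorem in odds form: post-test odds equal pre-test odds multiplied by the likelihood ratio. The minimal sketch below shows the calculation; the function name is hypothetical. With a 50% pre-test probability and the pooled PLR of 4.382 from the meta-analysis results, it gives a post-test probability of about 81%.

```python
def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via a likelihood ratio."""
    if not 0 < pre_test_prob < 1:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio  # Bayes' theorem in odds form
    return post_odds / (1 + post_odds)


print(f"{post_test_probability(0.5, 4.382):.0%}")  # → 81%
```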

Figure 3 Fagan’s nomogram assessing the clinical diagnostic value of PNETs.

Study characteristics

Study characteristics are summarized in Table 1 . All studies were retrospective and published between 2018 and 2023. The number of included patients per study ranged from 32 to 270. Of the 26 studies, most were from China (15 studies), followed by Italy (5), the USA (3), Korea (2), and Japan (1). Nineteen studies were based on CT and eight on MRI images; two combined CT and MRI images, and one used PET-CT. Thirteen of the 26 studies had validation sets, and five of these were externally validated using data from another institute. The majority (20/26) used various ML classifiers, such as random forest (RF), support vector machine (SVM), and least absolute shrinkage and selection operator (LASSO) logistic regression. Two studies adopted a convolutional neural network (CNN). About half of the included studies (11/21) used models that combined radiomics with clinical features (such as tumor size, tumor margin, and TNM stage); the others used radiomics features only. Thirteen studies applied cross-validation to select features that were stable between observers.

The true-positive, false-positive, false-negative, and true-negative counts (TP/FP/FN/TN) and the models’ sensitivity and specificity are shown in Table 2 . The highest area under the curve (AUC) of the AI-based validation models was 0.99 (95% CI: 0.97–1.00). Six studies did not report TP/FP/FN/TN, and four studies reported incomplete AUC values; all six studies were therefore excluded from the meta-analysis.
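The per-study measures that feed the meta-analysis follow directly from each 2×2 table. The sketch below is illustrative (the names `DiagnosticMetrics` and `metrics_from_2x2` are hypothetical) and assumes the "positive" class is the higher tumor grade; it applies no continuity correction, so a table with a zero cell would need one. It also shows that DOR = PLR / NLR, which is why the pooled values reported in the meta-analysis results (4.382 / 0.215 ≈ 20.4) sit close to the pooled DOR of 20.387.

```python
from dataclasses import dataclass


@dataclass
class DiagnosticMetrics:
    sensitivity: float
    specificity: float
    plr: float  # positive likelihood ratio
    nlr: float  # negative likelihood ratio
    dor: float  # diagnostic odds ratio


def metrics_from_2x2(tp: int, fp: int, fn: int, tn: int) -> DiagnosticMetrics:
    """Per-study accuracy measures from a 2x2 table (no continuity correction)."""
    sens = tp / (tp + fn)  # share of higher-grade tumors called positive (assumed positive class)
    spec = tn / (tn + fp)  # share of lower-grade tumors called negative
    plr = sens / (1 - spec)
    nlr = (1 - sens) / spec
    return DiagnosticMetrics(sens, spec, plr, nlr, plr / nlr)


# Hypothetical study, not one of the included studies
m = metrics_from_2x2(tp=40, fp=10, fn=8, tn=42)
```

For this hypothetical table the DOR equals (40 × 42) / (10 × 8) = 21, the same value obtained as PLR / NLR.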


Table 2 Results for accuracy to predict grade of PNETs.

Meta-analysis

Overall performance of the AI models

Twenty studies with 2639 patients that reported TP/FP/FN/TN and model sensitivity and specificity were included in the meta-analysis, and 19 of them reported the AUC with 95% CI. The results are reported in Tables 2 and 3 and Figure 4 . For grading PNETs, the AI models showed an overall pooled sensitivity of 0.826 [0.759, 0.877] and a pooled specificity of 0.812 [0.765, 0.851]; the pooled positive likelihood ratio (PLR) and negative likelihood ratio (NLR) were 4.382 [3.509, 5.472] and 0.215 [0.155, 0.298], respectively. Moreover, the pooled diagnostic odds ratio (DOR) was 20.387 [13.108, 31.706], and the AUC of the summary receiver operating characteristic (SROC) curve was 0.89 [0.84-0.90], with I² = 90.42% [81.10-99.73], P < 0.001.
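This section does not state which model produced the pooled estimates; diagnostic-accuracy meta-analyses commonly fit a bivariate random-effects model that pools sensitivity and specificity jointly. The simpler sketch below is a univariate DerSimonian–Laird random-effects pooling of one proportion (for example, sensitivity) on the logit scale. It is shown only because it makes Cochran's Q and I² = (Q − df)/Q × 100% concrete (Higgins et al., ref. 25). The function name and the 0.5 continuity correction are assumptions.

```python
import math
from typing import List, Tuple


def dersimonian_laird_logit(events: List[int], totals: List[int]) -> Tuple[float, float, float]:
    """Univariate random-effects pooling of proportions on the logit scale.

    Returns (pooled proportion, tau^2, I^2 in percent).
    Assumption: 0.5 is added to events and non-events in every study.
    """
    if len(events) < 2:
        raise ValueError("at least two studies are needed")
    y, v = [], []
    for e, n in zip(events, totals):
        a, b = e + 0.5, n - e + 0.5
        y.append(math.log(a / b))   # logit of the proportion
        v.append(1 / a + 1 / b)     # approximate within-study variance
    w = [1 / vi for vi in v]
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(y) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                              # between-study variance
    w_re = [1 / (vi + tau2) for vi in v]
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return 1 / (1 + math.exp(-y_re)), tau2, i2
```

For example, `dersimonian_laird_logit([40, 38, 45], [50, 50, 50])` would pool three hypothetical sensitivities. An I² above roughly 75%, as reported here, is usually read as high heterogeneity.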


Table 3 Subgroup analysis and pooled estimates for PNETs.


Figure 4 Pooled diagnostic accuracy for grading PNETs. (A, B) Forest plots of sensitivity and specificity; (C) summary receiver operating characteristic curve.

Subgroup analysis based on the image source and AI methodology

Meta-regression found no significant differences between groups ( Supplementary Table S2 ). Subgroup analysis was then performed to compare the performance of the two main image sources, CT and MRI. Two models used both CT and MRI images; thus, 16 models extracted radiomic features from CT images and six from MRI images. The pooled sensitivity (SE), specificity (SP), PLR, and NLR were 0.849 [0.786, 0.895], 0.803 [0.748, 0.847], 4.297 [3.386, 5.451], and 0.189 [0.134, 0.266], respectively, for CT models, and 0.791 [0.643, 0.888], 0.820 [0.764, 0.866], 4.407 [3.206, 6.058], and 0.255 [0.141, 0.459], respectively, for MRI models. The pooled DOR was 22.769 [14.707, 35.250] for CT models and 17.304 [7.713, 38.822] for MRI models. The AUC of the SROC curve was 0.88 [0.85-0.91] for CT images (I² = 79.25% [55.20-100.00], P=0.004), compared with 0.83 [0.79-0.86] for MRI (I² = 71.55% [36.80-100.00], P=0.015).
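The authors compared subgroups with meta-regression. As a rough cross-check using a different and simpler method, a Wald z-test can compare two pooled DORs using only their reported 95% CIs. The sketch below assumes the CIs are symmetric on the log scale and that the CT and MRI subgroups are independent; the latter holds only approximately, since two models used both modalities. The function names are illustrative.

```python
import math
from typing import Tuple


def log_se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error on the log scale, recovered from a symmetric 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def compare_ratios(est_a: float, ci_a: Tuple[float, float],
                   est_b: float, ci_b: Tuple[float, float]) -> Tuple[float, float]:
    """Wald z statistic and two-sided p-value for ln(est_a) - ln(est_b),
    treating the two subgroup estimates as independent."""
    se = math.hypot(log_se_from_ci(*ci_a), log_se_from_ci(*ci_b))
    z = (math.log(est_a) - math.log(est_b)) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under a standard normal
    return z, p


# Pooled DORs for CT and MRI models as reported above
z, p = compare_ratios(22.769, (14.707, 35.250), 17.304, (7.713, 38.822))
```

With these inputs z is well below 1.96, in line with the meta-regression finding of no significant difference between CT and MRI models.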

A subgroup analysis by AI methodology compared algorithm architectures; most models applied not just one ML classifier but several. In total, 15 models used ML for PNETs. Their pooled SE, SP, PLR, and NLR were 0.806 [0.727, 0.867], 0.789 [0.742, 0.829], 3.813 [3.156, 4.606], and 0.246 [0.175, 0.346], respectively. The pooled DOR was 15.508 [10.196, 23.589], and the AUC of the SROC curve was 0.84 [0.81-0.87], with I² = 89.88% [79.90-99.86]. For the remaining three non-ML models, the pooled AUC was 0.89 [0.86-0.92], with I² = 28.80% [0.00-100.00] ( Table 3 ).

Ten models used cross-validation to select the best features and models. The group with cross-validation had a pooled AUC of 0.87 [0.83-0.91] with I² = 78.98%, while the group without it had a pooled AUC of 0.88 [0.84-0.90] with I² = 75.30%. The pooled SE, SP, PLR, and NLR were 0.831 [0.784, 0.871], 0.785 [0.737, 0.828], 3.523 [2.812, 4.414], and 0.196 [0.127, 0.302], respectively, for the cross-validation group, and 0.799 [0.670, 0.866], 0.835 [0.772, 0.884], 4.849 [3.365, 6.698], and 0.241 [0.141, 0.413], respectively, for the group without ICC. The pooled DORs were 20.262 [12.084, 33.973] and 20.120 [9.171, 44.140] for the groups with and without ICC, respectively.
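Selecting features that are stable between observers is usually done with the intraclass correlation coefficient (ICC), computed for each radiomic feature across repeated segmentations. The included studies' exact ICC form and threshold are not given here. The minimal sketch below assumes ICC(2,1) (two-way random effects, absolute agreement, single rater, after Shrout and Fleiss); the function name is hypothetical. A common, but not universal, convention keeps features with an ICC above about 0.75.

```python
from typing import List


def icc_2_1(ratings: List[List[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is one radiomic feature's value for lesion i as
    segmented by observer j (n lesions x k observers, n >= 2, k >= 2).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between lesions
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between observers
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

Perfect agreement between observers gives an ICC of 1; any systematic offset or random disagreement between observers lowers it.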

Subgroup analysis based on dataset characteristics

We also compared models that included clinical data with models using radiomics features only, and found that including clinical features reduced heterogeneity. The pooled SE, SP, PLR, and NLR were 0.801 [0.707, 0.870], 0.795 [0.739, 0.842], 3.906 [2.983, 5.115], and 0.251 [0.166, 0.379], respectively, for the group including clinical data, and 0.847 [0.747, 0.913], 0.829 [0.749, 0.888], 4.970 [3.349, 7.377], and 0.184 [0.109, 0.310], respectively, for the radiomics-only group. The pooled DOR for the radiomics group was 27.034 [13.412, 54.492], and the AUC of the SROC curve was 0.81 [0.77-0.84] with I² = 41.54%, which was a little higher than that of the group including clinical data (DOR: 16.581 [9.466, 29.044]; AUC: 0.90 [0.87-0.92]) ( Table 3 ).

Moreover, 11 models were validated, while nine were not. The pooled SE, SP, PLR, and NLR were 0.823 [0.754, 0.876], 0.799 [0.744, 0.846], 4.106 [3.128, 5.389], and 0.221 [0.155, 0.315], respectively, for the validated group, and 0.836 [0.708, 0.914], 0.824 [0.741, 0.884], 4.741 [3.248, 6.920], and 0.199 [0.110, 0.361], respectively, for the non-validated group. The pooled DOR was 15.574 [8.579, 28.273] for the validated group and 23.766 [11.504, 49.095] for the non-validated group. The AUC of the SROC curve was 0.84 [0.81-0.87] without heterogeneity for the validated group and 0.89 [0.86-0.91] with I² = 91.65% for the non-validated group.

In a subgroup analysis based on the number of patients, the pooled results of 12 models, which included >100 patients, were 0.815 [0.737, 0.874], 0.784 [0.735, 0.826], 3.769 [3.086, 4.603], and 0.236 [0.165, 0.337] for the pooled SE, SP, PLR, and NLR, respectively. For the remaining eight models, the pooled SE, SP, PLR, and NLR were 0.847 [0.715, 0.925], 0.871 [0.799, 0.920], 6.560 [4.224, 10.187], and 0.175 [0.091, 0.338], respectively. The pooled DOR and the AUC values for the two groups were 15.974 [10.228, 24.948] and 0.84 [0.81-0.87] vs. 37.404 [16.542, 84.577] and 0.91 [0.88-0.93] ( Table 3 ).

PNETs are a heterogeneous group of malignancies that can be classified as grade G1, G2, or G3 according to mitotic count and Ki-67 index ( 1 – 3 ). Accurate grading of PNETs is important for treatment selection, prognosis, and follow-up. However, because of this heterogeneity, tumor grade may not be uniform within a single lesion or between different lesions in the same patient ( 7 , 8 ). Moreover, histology is currently the only validated tool for grading tumors and describing their characteristics, and surgery or endoscopic biopsy is used clinically to determine the histological grade of PNETs. However, a satisfactory biopsy is difficult to obtain for small tumors or for PNETs located around major vessels, especially with fine-needle aspiration biopsy ( 50 – 53 ). Predicting histological grade from radiological images is therefore an important diagnostic goal. With the increasing application of AI in medicine, we believe that AI-based models can improve the prediction of tumor grade. To the best of our knowledge, only a few systematic reviews on this topic have evaluated the diagnostic accuracy of radiomics, and they are not up to date.

In our study, we investigated the ability of imaging-based AI to predict the histologic grade of PNETs. AI-based grading of PNETs showed good performance, with an AUC of 0.89 [0.84 - 0.90], but high heterogeneity (I² = 90.42% [81.10-99.73], P < 0.001). We found considerable heterogeneity in pooled sensitivity and specificity among the included studies. Moreover, according to our sensitivity analysis, three articles ( 29 , 40 , 46 ) showed poor robustness and may be one of the sources of heterogeneity ( Figure S2 ). There was no significant publication bias between studies.
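The method behind the sensitivity analysis in Figure S2 is not described here. A common approach is leave-one-out analysis: the pooled estimate is recomputed with each study omitted in turn, and studies whose removal shifts the estimate markedly are flagged as influential. For brevity, the sketch below pools log DORs with fixed-effect inverse-variance weights (Woolf variance). This is a simplification, not the bivariate model typically used for diagnostic accuracy. The function names and the 0.5 continuity correction are assumptions.

```python
import math
from typing import List, Tuple

Study = Tuple[int, int, int, int]  # (TP, FP, FN, TN)


def log_dor_and_var(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    """ln(DOR) and its Woolf variance, with 0.5 added to every cell (assumed)."""
    a, b, c, d = tp + 0.5, fp + 0.5, fn + 0.5, tn + 0.5
    return math.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def pooled_dor(studies: List[Study]) -> float:
    """Fixed-effect inverse-variance pooled DOR."""
    pairs = [log_dor_and_var(*s) for s in studies]
    w = [1 / var for _, var in pairs]
    return math.exp(sum(wi * y for wi, (y, _) in zip(w, pairs)) / sum(w))


def leave_one_out(studies: List[Study]) -> List[float]:
    """Pooled DOR recomputed with each study omitted in turn."""
    return [pooled_dor(studies[:i] + studies[i + 1:]) for i in range(len(studies))]
```

A study whose omission moves the pooled DOR far from the full-data estimate is a candidate source of heterogeneity, analogous to the three articles identified above.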

The diagnostic performance of the radiomics models varied with the strategies employed. CT and MRI are the main imaging sources for analyzing PNETs, and because of its high availability and low cost, CT is more widely used than MRI. In this study, we found that imaging modality may influence predictive power, but not independently. CT was more commonly used (16 studies) and performed better than MRI (6 studies) in grading PNETs, with an AUC of 0.88 [0.85-0.91] vs. 0.83 [0.79-0.86]. Although this is unconfirmed, we speculate that CT may be better at capturing vessel enhancement characteristics and neovascular distribution, which is useful for vascular-rich PNETs ( 54 ); future studies are needed to validate this finding. Only one included study applied PET-CT to grade PNETs, and its tumor-grade prediction model achieved an AUROC of 0.864, indicating good predictive ability ( 47 ). Further investigation of PET-CT may therefore be useful for developing AI models, since PET-based radiomics has shown good predictive performance (AUC = 0.992) and can detect cell-surface expression of somatostatin receptors ( 55 , 56 ).

Clinical data such as age, gender, tumor size, tumor shape, tumor margin, and CT stage are closely related to the pathogenesis of PNETs and therefore should not be ignored in diagnostic models ( 27 – 29 , 47 , 49 ). Liang et al. ( 37 ) built a combined model that improved performance (0.856 [0.730–0.939] vs. 0.885 [0.765–0.957]), and Wang et al. ( 42 ) found that adding clinical features improved their radiomics models (from 0.837 [0.827–0.847] to 0.879 [0.869–0.889]). However, we found that including clinical factors did not always result in better performance, although it did decrease heterogeneity (AUC of 0.81 [0.77-0.84] with I² = 41.94% vs. 0.90 [0.87-0.92] with I² = 89.91%). This may be because clinical data are processed differently: in radiomic modeling, age and other numerical clinical data can easily be quantified (e.g., age entered as a variable in an algorithm or function), whereas in clinical models, age treated as a risk factor is handled differently from one setting to another. Future radiomics analyses should therefore incorporate clinical features to create more reliable models, or add radiomics features to existing diagnostic models to validate their true diagnostic power.

The lack of standardized quality control and reporting throughout the workflow limits the application of radiomics ( 17 , 57 ). For example, to create generalizable predictive models, validation and test data must remain completely independent, or hidden, at every step of a radiomics study until validation or testing is performed. In our study, 11 of the 20 studies had a validation set, and only 3 were externally validated. This lack of proper external validation may limit the transportability and generalizability of the models and also weakens the conclusions of this review. Moreover, according to our findings, the absence of a validation set was one of the main causes of heterogeneity. Results obtained from the primary cohort alone should not be compared directly with those obtained from both the primary and validation cohorts. Validated models should be considered more reliable and promising, even if their reported performance is lower.
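The requirement that validation data stay hidden can be illustrated with a minimal sketch: the hold-out rows are set aside first, and feature selection is then fitted on training rows only, so the hold-out set cannot influence which features are kept. Everything here is hypothetical. The function names, the ranking of features by absolute Pearson correlation with the grade label, and the split fraction are illustrative choices, not a method used by the included studies.

```python
import math
import random
from typing import List, Sequence, Tuple


def split_indices(n: int, test_fraction: float, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle row indices once and set aside a hold-out set before any modelling."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]  # (train, test)


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def select_features(X: Sequence[Sequence[float]], y: Sequence[float],
                    train: List[int], top_k: int) -> List[int]:
    """Rank features by |Pearson r| with the label using training rows only,
    so hold-out rows cannot influence which features are kept."""
    lab = [y[i] for i in train]
    scores = [(abs(pearson([X[i][j] for i in train], lab)), j) for j in range(len(X[0]))]
    return [j for _, j in sorted(scores, reverse=True)[:top_k]]
```

Running the same selection on all rows before splitting would leak information from the hold-out set into the model and inflate its apparent validation performance.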

As shown in Table 1 , feature extraction and model selection methods also varied widely. Although AI classifiers did not show outstanding diagnostic performance in our evaluation, they are undeniably an important direction for future research. Most of the included studies used more than one machine learning or deep learning method for feature selection or classification, but the best-performing classifier varied from study to study. To date, there is no universal, well-recognized classifier, and sample characteristics are a key factor affecting classifier performance ( 58 , 59 ). Finding uniform and robust classifiers for specific medical problems has always been a challenge.

Despite the encouraging results of this meta-analysis, the overall methodological quality of the included literature was poor, reducing the reliability and reproducibility of radiomics models for clinical application. The low Radiomics Quality Score (RQS) values reflected a lack of prospective studies and of phantom studies on scanners, a lack of imaging at multiple time points, insufficient validation and calibration of the models, short time frames and insufficient cost-effectiveness analyses, and a lack of open science and publicly available data. In addition, only about half of the studies were internally validated, and fewer still were independently externally validated. To further standardize the process and improve the quality of radiomics, the RQS should be used not only to assess the methodological quality of radiomics studies but also to guide their design ( 17 ).

Sample diversity, inconsistent radiomics feature extraction across platforms, different modeling algorithms, and the variable incorporation of clinical features may all account for the high heterogeneity of the pooled models. According to our subgroup analyses, heterogeneity arose mainly from the imaging modality (CT vs. MRI), the algorithm architecture (ML vs. non-ML), whether models were validated, and whether clinical features were included. Standardized imaging, a standardized, independent, and robust feature set, and validation, ideally external validation, are therefore all ways to reduce heterogeneity and deserve attention in future research. In summary, AI methods were effective for the preoperative prediction of PNET grade; this may improve understanding of tumor behavior and facilitate decision-making in clinical practice.

Our study has several limitations. First, most included studies were single-center and retrospective, which inevitably introduces patient selection bias. Second, the studies used different methods, including the type of imaging scan, the type and number of radiological features studied, the choice of software, and the type of analysis implemented, which led to high heterogeneity among studies. Therefore, some pooled estimates of the quantitative results must be interpreted with caution. Further prospective studies could validate these results, and a stable method of data extraction and analysis is important for developing a reproducible AI model.

Conclusions

Overall, this meta-analysis demonstrated the value of AI models in predicting the histologic grade of PNETs. According to our results, sample diversity, lack of external validation, imaging modality, inconsistencies in radiomics feature extraction across platforms, different modeling algorithms, and the choice of software are all sources of heterogeneity. Standardized imaging and a standardized, independent, and robust feature set will therefore be important for future applications. Multi-center, large-sample, randomized clinical trials could be used to confirm the predictive power of image-based AI systems in clinical practice. In summary, AI can be considered a potential tool for detecting the histological grade of PNETs.

Data availability statement

The original contributions presented in the study are included in the article/ Supplementary Material . Further inquiries can be directed to the corresponding authors.

Author contributions

QY: Conceptualization, Methodology, Visualization, Writing – original draft. YC: Data curation, Methodology, Software, Writing – original draft. CL: Data curation, Investigation, Software, Writing – original draft. HS: Data curation, Investigation, Validation, Writing – original draft. MH: Formal analysis, Software, Writing – original draft. ZW: Investigation, Validation, Writing – original draft. SH: Data curation, Formal analysis, Funding acquisition, Writing – review & editing. CZ: Funding acquisition, Project administration, Supervision, Writing – review & editing. BH: Funding acquisition, Project administration, Supervision, Writing – review & editing.

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This study was supported by the Natural Science Foundation of China (82102961, 82072635 and 82072637), High-level Hospital Construction Research Project of Heyuan People's Hospital (YNKT202202), the Science and Technology Program of Heyuan (23051017147335), the Science and Technology Program of Guangzhou (2024A04J10016 and 202201011642).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fonc.2024.1332387/full#supplementary-material

Supplementary Figure 1 | Quality assessment of the 26 included studies using the QUADAS-2 tool.

Supplementary Figure 2 | Sensitivity analysis of the 26 included studies.

1. Chang A, Sherman SK, Howe JR, Sahai V. Progress in the management of pancreatic neuroendocrine tumors. Annu Rev Med . (2022) 73:213–29. doi: 10.1146/annurev-med-042320-011248


2. Ma ZY, Gong YF, Zhuang HK, Zhou ZX, Huang SZ, Zou YP, et al. Pancreatic neuroendocrine tumors: A review of serum biomarkers, staging, and management. World J Gastroenterol . (2020) 26:2305–22. doi: 10.3748/wjg.v26.i19.2305

3. Cives M, Strosberg JR. Gastroenteropancreatic neuroendocrine tumors. CA Cancer J Clin . (2018) 68:471–87. doi: 10.3322/caac.21493

4. Pulvirenti A, Marchegiani G, Pea A, Allegrini V, Esposito A, Casetti L, et al. Clinical implications of the 2016 international study group on pancreatic surgery definition and grading of postoperative pancreatic fistula on 775 consecutive pancreatic resections. Ann Surg . (2018) 268:1069–75. doi: 10.1097/SLA.0000000000002362

5. Fan JH, Zhang YQ, Shi SS, Chen YJ, Yuan XH, Jiang LM, et al. A nation-wide retrospective epidemiological study of gastroenteropancreatic neuroendocrine neoplasms in China. Oncotarget . (2017) 8:71699–708. doi: 10.18632/oncotarget.17599

6. Yao JC, Hassan M, Phan A, Dagohoy C, Leary C, Mares JE, et al. One hundred years after "carcinoid": epidemiology of and prognostic factors for neuroendocrine tumors in 35,825 cases in the United States. J Clin Oncol . (2008) 26:3063–72. doi: 10.1200/JCO.2007.15.4377

7. Yang Z, Tang LH, Klimstra DS. Effect of tumor heterogeneity on the assessment of Ki67 labeling index in well-differentiated neuroendocrine tumors metastatic to the liver: implications for prognostic stratification. Am J Surg Pathol . (2011) 35:853–60. doi: 10.1097/PAS.0b013e31821a0696

8. Partelli S, Gaujoux S, Boninsegna L, Cherif R, Crippa S, Couvelard A, et al. Pattern and clinical predictors of lymph node involvement in nonfunctioning pancreatic neuroendocrine tumors (NF-PanNETs). JAMA Surg . (2013) 148:932–9. doi: 10.1001/jamasurg.2013.3376

9. Marchegiani G, Landoni L, Andrianello S, Masini G, Cingarlini S, D'Onofrio M, et al. Patterns of recurrence after resection for pancreatic neuroendocrine tumors: who, when, and where? Neuroendocrinology . (2019) 108:161–71. doi: 10.1159/000495774

10. Nagtegaal ID, Odze RD, Klimstra D, Paradis V, Rugge M, Schirmacher P, et al. The 2019 WHO classification of tumours of the digestive system. Histopathology . (2020) 76:182–8. doi: 10.1111/his.13975

11. Hosny A, Parmar C, Quackenbush J, Schwartz LH, Aerts HJWL. Artificial intelligence in radiology. Nat Rev Cancer . (2018) 18:500–10. doi: 10.1038/s41568-018-0016-5

12. Jin P, Ji X, Kang W, Li Y, Liu H, Ma F, et al. Artificial intelligence in gastric cancer: a systematic review. J Cancer Res Clin Oncol . (2020) 146:2339–50. doi: 10.1007/s00432-020-03304-9

13. Yu KH, Beam AL, Kohane IS. Artificial intelligence in healthcare. Nat BioMed Eng . (2018) 2:719–31. doi: 10.1038/s41551-018-0305-z

14. Beam AL, Kohane IS. Big data and machine learning in health care. JAMA . (2018) 319:1317–8. doi: 10.1001/jama.2017.18391

15. Greener JG, Kandathil SM, Moffat L, Jones DT. A guide to machine learning for biologists. Nat Rev Mol Cell Biol . (2022) 23:40–55. doi: 10.1038/s41580-021-00407-0

16. Issa NT, Stathias V, Schürer S, Dakshanamurthy S. Machine and deep learning approaches for cancer drug repurposing. Semin Cancer Biol . (2021) 68:132–42. doi: 10.1016/j.semcancer.2019.12.011

17. Bezzi C, Mapelli P, Presotto L, Neri I, Scifo P, Savi A, et al. Radiomics in pancreatic neuroendocrine tumors: methodological issues and clinical significance. Eur J Nucl Med Mol Imaging . (2021) 48:4002–15. doi: 10.1007/s00259-021-05338-8

18. Rauschecker AM, Rudie JD, Xie L, Wang J, Duong MT, Botzolakis EJ, et al. Artificial intelligence system approaching neuroradiologist-level differential diagnosis accuracy at brain MRI. Radiology . (2020) 295:626–37. doi: 10.1148/radiol.2020190283

19. Caruso D, Polici M, Rinzivillo M, Zerunian M, Nacci I, Marasco M, et al. CT-based radiomics for prediction of therapeutic response to Everolimus in metastatic neuroendocrine tumors. Radiol Med . (2022) 127:691–701. doi: 10.1007/s11547-022-01506-4

20. Yang J, Xu L, Yang P, Wan Y, Luo C, Yen EA, et al. Generalized methodology for radiomic feature selection and modeling in predicting clinical outcomes. Phys Med Biol . (2021) 66:10.1088/1361-6560/ac2ea5. doi: 10.1088/1361-6560/ac2ea5


21. Guo C, Zhuge X, Wang Z, Wang Q, Sun K, Feng Z, et al. Textural analysis on contrast-enhanced CT in pancreatic neuroendocrine neoplasms: association with WHO grade. Abdom Radiol (NY) . (2019) 44:576–85. doi: 10.1007/s00261-018-1763-1

22. Luo Y, Chen X, Chen J, Song C, Shen J, Xiao H, et al. Preoperative prediction of pancreatic neuroendocrine neoplasms grading based on enhanced computed tomography imaging: validation of deep learning with a convolutional neural network. Neuroendocrinology . (2020) 110:338–50. doi: 10.1159/000503291

23. Lambin P, Leijenaar RTH, Deist TM. Radiomics: the bridge between medical imaging and personalized medicine. Nat Rev Clin Oncol . (2017) 14:749–62. doi: 10.1038/nrclinonc.2017.141

24. Whiting PF, Rutjes AW, Westwood ME, Mallett S, Deeks JJ, Reitsma JB, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med . (2011) 155:529–36. doi: 10.7326/0003-4819-155-8-201110180-00009

25. Higgins JP, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ . (2003) 327:557–60. doi: 10.1136/bmj.327.7414.557

26. Benedetti G, Mori M, Panzeri MM, Barbera M, Palumbo D, Sini C, et al. CT-derived radiomic features to discriminate histologic characteristics of pancreatic neuroendocrine tumors. Radiol Med . (2021) 126:745–60. doi: 10.1007/s11547-021-01333-z

27. Bian Y, Jiang H, Ma C, Wang L, Zheng J, Jin G, et al. CT-based radiomics score for distinguishing between grade 1 and grade 2 nonfunctioning pancreatic neuroendocrine tumors. AJR Am J Roentgenol . (2020) 215:852–63. doi: 10.2214/AJR.19.22123

28. Bian Y, Zhao Z, Jiang H, Fang X, Li J, Cao K, et al. Noncontrast radiomics approach for predicting grades of nonfunctional pancreatic neuroendocrine tumors. J Magn Reson Imaging . (2020) 52:1124–36. doi: 10.1002/jmri.27176

29. Bian Y, Li J, Cao K, Fang X, Jiang H, Ma C, et al. Magnetic resonance imaging radiomic analysis can preoperatively predict G1 and G2/3 grades in patients with NF-pNETs. Abdom Radiol (NY) . (2021) 46:667–80. doi: 10.1007/s00261-020-02706-0

30. Canellas R, Burk KS, Parakh A, Sahani DV. Prediction of pancreatic neuroendocrine tumor grade based on CT features and texture analysis. AJR Am J Roentgenol . (2018) 210:341–6. doi: 10.2214/AJR.17.18417

31. Choi TW, Kim JH, Yu MH, Park SJ, Han JK. Pancreatic neuroendocrine tumor: prediction of the tumor grade using CT findings and computerized texture analysis. Acta Radiol . (2018) 59:383–92. doi: 10.1177/0284185117725367

32. Gao X, Wang X. Deep learning for World Health Organization grades of pancreatic neuroendocrine tumors on contrast-enhanced magnetic resonance images: a preliminary study. Int J Comput Assist Radiol Surg . (2019) 14:1981–91. doi: 10.1007/s11548-019-02070-5

33. Gu D, Hu Y, Ding H, Wei J, Chen K, Liu H, et al. CT radiomics may predict the grade of pancreatic neuroendocrine tumors: a multicenter study. Eur Radiol . (2019) 29:6880–90. doi: 10.1007/s00330-019-06176-x

34. Guo CG, Ren S, Chen X, Wang QD, Xiao WB, Zhang JF, et al. Pancreatic neuroendocrine tumor: prediction of the tumor grade using magnetic resonance imaging findings and texture analysis with 3-T magnetic resonance. Cancer Manag Res . (2019) 11:1933–44. doi: 10.2147/CMAR

35. Liu C, Bian Y, Meng Y, Liu F, Cao K, Zhang H, et al. Preoperative prediction of G1 and G2/3 grades in patients with nonfunctional pancreatic neuroendocrine tumors using multimodality imaging. Acad Radiol . (2022) 29:e49–60. doi: 10.1016/j.acra.2021.05.017

36. Li W, Xu C, Ye Z. Prediction of pancreatic neuroendocrine tumor grading risk based on quantitative radiomic analysis of MR. Front Oncol . (2021) 11:758062. doi: 10.3389/fonc.2021.758062

37. Liang W, Yang P, Huang R, Xu L, Wang J, Liu W, et al. A combined nomogram model to preoperatively predict histologic grade in pancreatic neuroendocrine tumors. Clin Cancer Res . (2019) 25:584–94. doi: 10.1158/1078-0432.CCR-18-1305

38. Ohki K, Igarashi T, Ashida H, Takenaga S, Shiraishi M, Nozawa Y, et al. Usefulness of texture analysis for grading pancreatic neuroendocrine tumors on contrast-enhanced computed tomography and apparent diffusion coefficient maps. Jpn J Radiol . (2021) 39:66–75. doi: 10.1007/s11604-020-01038-9

39. D'Onofrio M, Ciaravino V, Cardobi N, De Robertis R, Cingarlini S, Landoni L, et al. CT enhancement and 3D texture analysis of pancreatic neuroendocrine neoplasms. Sci Rep . (2019) 9:2176. doi: 10.1038/s41598-018-38459-6

40. Pulvirenti A, Yamashita R, Chakraborty J, Horvat N, Seier K, McIntyre CA, et al. Quantitative computed tomography image analysis to predict pancreatic neuroendocrine tumor grade. JCO Clin Cancer Inform . (2021) 5:679–94. doi: 10.1200/CCI.20.00121

41. Ricci C, Mosconi C, Ingaldi C, Vara G, Verna M, Pettinari I, et al. The 3-dimensional-computed tomography texture is useful to predict pancreatic neuroendocrine tumor grading. Pancreas . (2021) 50:1392–9. doi: 10.1097/MPA.0000000000001927

42. Wang X, Qiu JJ, Tan CL, Chen YH, Tan QQ, Ren SJ, et al. Development and validation of a novel radiomics-based nomogram with machine learning to preoperatively predict histologic grade in pancreatic neuroendocrine tumors. Front Oncol . (2022) 12:843376. doi: 10.3389/fonc.2022.843376

43. Zhao Z, Bian Y, Jiang H, Fang X, Li J, Cao K, et al. CT-radiomic approach to predict G1/2 nonfunctional pancreatic neuroendocrine tumor. Acad Radiol . (2020) 27:e272–81. doi: 10.1016/j.acra.2020.01.002

44. Zhou RQ, Ji HC, Liu Q, Zhu CY, Liu R. Leveraging machine learning techniques for predicting pancreatic neuroendocrine tumor grades using biochemical and tumor markers. World J Clin Cases . (2019) 7:1611–22. doi: 10.12998/wjcc.v7.i13.1611

45. Chiti G, Grazzini G, Flammia F, Matteuzzi B, Tortoli P, Bettarini S, et al. Gastroenteropancreatic neuroendocrine neoplasms (GEP-NENs): a radiomic model to predict tumor grade. Radiol Med . (2022) 127:928–38. doi: 10.1007/s11547-022-01529-x

46. Mori M, Palumbo D, Muffatti F, Partelli S, Mushtaq J, Andreasi V, et al. Prediction of the characteristics of aggressiveness of pancreatic neuroendocrine neoplasms (PanNENs) based on CT radiomic features. Eur Radiol . (2023) 33:4412–21. doi: 10.1007/s00330-022-09351-9

47. Park YJ, Park YS, Kim ST, Hyun SH. A machine learning approach using [18F]FDG PET-based radiomics for prediction of tumor grade and prognosis in pancreatic neuroendocrine tumor. Mol Imaging Biol . (2023) 25:897–910. doi: 10.1007/s11307-023-01832-7

48. Javed AA, Zhu Z, Kinny-Köster B, Habib JR, Kawamoto S, Hruban RH, et al. Accurate non-invasive grading of nonfunctional pancreatic neuroendocrine tumors with a CT derived radiomics signature. Diagn Interv Imaging . (2024) 105:33–39. doi: 10.1016/j.diii.2023.08.002

49. Zhu HB, Zhu HT, Jiang L, Nie P, Hu J, Tang W, et al. Radiomics analysis from magnetic resonance imaging in predicting the grade of nonfunctioning pancreatic neuroendocrine tumors: a multicenter study. Eur Radiol . (2023) 34:90–102. doi: 10.1007/s00330-023-09957-7

50. Sadula A, Li G, Xiu D, Ye C, Ren S, Guo X, et al. Clinicopathological characteristics of nonfunctional pancreatic neuroendocrine neoplasms and the effect of surgical treatment on the prognosis of patients with liver metastases: A study based on the SEER database. Comput Math Methods Med . (2022) 2022:3689895. doi: 10.1155/2022/3689895

51. Wallace MB, Kennedy T, Durkalski V, Eloubeidi MA, Etamad R, Matsuda K, et al. Randomized controlled trial of EUS-guided fine needle aspiration techniques for the detection of malignant lymphadenopathy. Gastrointest Endosc . (2001) 54:441–7. doi: 10.1067/mge.2001.117764

52. Canakis A, Lee LS. Current updates and future directions in diagnosis and management of gastroenteropancreatic neuroendocrine neoplasms. World J Gastrointest Endosc . (2022) 14:267–90. doi: 10.4253/wjge.v14.i5.267

53. Sallinen VJ, Le Large TYS, Tieftrunk E, Galeev S, Kovalenko Z, Haugvik SP, et al. Prognosis of sporadic resected small (≤2 cm) nonfunctional pancreatic neuroendocrine tumors—a multiinstitutional study. HPB . (2018) 20:251–9. doi: 10.1016/j.hpb.2017.08.034

54. Liu Y, Shi S, Hua J, Xu J, Zhang B, Liu J, et al. Differentiation of solid-pseudopapillary tumors of the pancreas from pancreatic neuroendocrine tumors by using endoscopic ultrasound. Clin Res Hepatol Gastroenterol . (2020) 44:947–53. doi: 10.1016/j.clinre.2020.02.002

55. Mapelli P, Bezzi C, Palumbo D, Canevari C, Ghezzo S, Samanes Gajate AM, et al. 68Ga-DOTATOC PET/MR imaging and radiomic parameters in predicting histopathological prognostic factors in patients with pancreatic neuroendocrine well-differentiated tumours. Eur J Nucl Med Mol Imaging . (2022) 49:2352–63. doi: 10.1007/s00259-022-05677-0

56. Atkinson C, Ganeshan B, Endozo R, Wan S, Aldridge MD, Groves AM, et al. Radiomics-based texture analysis of 68Ga-DOTATATE positron emission tomography and computed tomography images as a prognostic biomarker in adults with neuroendocrine cancers treated with 177Lu-DOTATATE. Front Oncol . (2021) 11:686235. doi: 10.3389/fonc.2021.686235

57. Jha AK, Mithun S, Sherkhane UB, Dwivedi P, Puts S, Osong B, et al. Emerging role of quantitative imaging (radiomics) and artificial intelligence in precision oncology. Explor Target Antitumor Ther . (2023) 4:569–82. doi: 10.37349/etat

58. Parmar C, Grossmann P, Bussink J, Lambin P, Aerts HJWL. Machine learning methods for quantitative radiomic biomarkers. Sci Rep . (2015) 5:13087. doi: 10.1038/srep13087

59. Avanzo M, Wei L, Stancanello J, Vallières M, Rao A, Morin O, et al. Machine and deep learning methods for radiomics. Med Phys . (2020) 47:e185–202. doi: 10.1002/mp.13678

Keywords: pancreatic neuroendocrine tumors, meta-analysis, radiomics, machine learning, deep learning

Citation: Yan Q, Chen Y, Liu C, Shi H, Han M, Wu Z, Huang S, Zhang C and Hou B (2024) Predicting histologic grades for pancreatic neuroendocrine tumors by radiologic image-based artificial intelligence: a systematic review and meta-analysis. Front. Oncol. 14:1332387. doi: 10.3389/fonc.2024.1332387

Received: 02 November 2023; Accepted: 02 April 2024; Published: 23 April 2024.


Copyright © 2024 Yan, Chen, Liu, Shi, Han, Wu, Huang, Zhang and Hou. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Shanzhou Huang, [email protected] ; Chuanzhao Zhang, [email protected] ; Baohua Hou, [email protected]

† These authors have contributed equally to this work

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

Dtsch Arztebl Int, Vol. 106(27), July 2009

Systematic Literature Reviews and Meta-Analyses

Meike Ressing

1 Institut für Medizinische Biometrie, Epidemiologie und Informatik, Universitätsmedizin der Johannes Gutenberg-Universität Mainz

Maria Blettner

Stefanie J. Klug

Because of the rising number of scientific publications, it is important to have a means of jointly summarizing and assessing different studies on a single topic. Systematic literature reviews, meta-analyses of published data, and meta-analyses of individual data (pooled reanalyses) are now being published with increasing frequency. Here we describe the essential features of these methods and discuss their strengths and weaknesses.

This article is based on a selective literature search. The different types of review and meta-analysis are described, the methods used in each are outlined so that they can be evaluated, and a checklist is given for the assessment of reviews and meta-analyses of scientific articles.

Systematic literature reviews provide an overview of the state of research on a given topic and enable an assessment of the quality of individual studies. They also allow the results of different studies to be evaluated together when these are inconsistent. Meta-analyses additionally allow calculation of pooled estimates of an effect. The different types of review and meta-analysis are discussed with examples from the literature on one particular topic.

Conclusions

Systematic literature reviews and meta-analyses enable the research findings and treatment effects obtained in different individual studies to be summed up and evaluated.

Every year, the number of scientific publications increases greatly. For example, the literature database PubMed registered 361,000 new publications in 1987, 448,000 in 1997, and 766,000 in 2007 (Medline search, last updated in January 2009). These figures show how difficult it has become for physicians in private practice, clinicians, and scientists to obtain comprehensive, current information on any given medical topic. This is why individual studies on the same topic need to be summarized and critically analyzed.

Summaries of individual studies are mostly prepared when the results of individual studies are unclear or inconsistent. They are also used to study relationships for which the individual studies do not have adequate statistical power, as the number of cases is too low ( 1 ).

The Cochrane Collaboration undertakes systematic processing and summary of the primary literature for many therapeutic topics, particularly randomized clinical studies ( www.cochrane.org ). They have published a handbook for the performance of systematic reviews and meta-analyses of randomized clinical studies ( 2 ). Cook et al. have published methodological guidelines for this process ( 3 ). Instructions of this sort help to lay down standards for the summary of individual studies. Guidelines have also been drawn up for the publication of meta-analyses on randomized clinical studies ( 4 ) and on observational studies ( 5 ).

Publications on individual studies may be summarized in various forms ( 1 , 6 – 10 ):

  • Narrative reviews
  • Systematic review articles
  • Meta-analyses of published data
  • Pooled reanalyses (meta-analyses with individual data).

These terms are often used inconsistently in the literature. The aim of the present article is to describe and distinguish these forms and to enable the reader to critically analyze both the results of individual studies and the quality of a systematic review or meta-analysis.

The various types of systematic reviews and meta-analyses of scientific articles will be defined and the procedure will be explained. A selective literature search was performed for this purpose.

A "review" is the qualitative summary of the results of individual studies ( 1 ). A distinction is made between narrative reviews and systematic reviews ( table 1 ). Narrative reviews (A) mostly provide a broad overview of a specific topic ( 1 , 11 ). They are therefore a good way of rapidly obtaining current information on research on a given topic. However, the articles to be included are selected subjectively and unsystematically ( 1 , 11 ). For some time, the Deutsches Ärzteblatt has been using the term "selective literature review" for this type of review. Narrative reviews will not be further discussed in this article.

In contrast, systematic review articles (B) aim to consider, as far as possible, all published studies on a specific topic, after applying previously defined inclusion and exclusion criteria ( 11 ). The aim is to extract relevant information from the publications systematically. It is important to analyze the methodological quality of the included publications and to investigate the reasons for any differences between the results of the different studies. The results of each study are presented and analyzed according to defined criteria, such as study design and mode of recruitment.

The same applies to the meta-analysis of published data (C). In addition, the results are quantitatively summarized using statistical methods and pooled effect estimates ( glossary ) are calculated ( 1 ).

Glossary

Aggregated data: The summary of individual data.

Bias: Distortion of study results by systematic errors.

Confidence interval: A range calculated from the data that is constructed so that, over repeated studies, it would contain the true value with a specified frequency, usually 95%.

Confounder: A factor that is associated with both the disease under study and the exposure under study. For this reason, it can either strengthen or weaken the observed association between the exposure and the disease.

Effect estimate: An effect estimate, such as the odds ratio or relative risk, quantifies the change in the frequency of a disease associated with a specific exposure.

Exposure: Contact with a specific risk factor.

Forest plot: A graphical representation of the individual studies together with the pooled estimate. The effect estimate of each individual study is shown with its confidence interval along the horizontal or vertical axis. The larger the marker representing a study's effect estimate, the greater the weight of that study, which depends on study size and other factors. The pooled effect estimate is usually shown as a diamond.

Funnel plot: A plot of the effect estimates of the individual studies against study size; often the variance or standard error of each effect estimate is plotted instead of study size. Smaller studies have larger variances and standard errors, so their effect estimates scatter more widely around the pooled effect estimate than those of large studies. This gives the plot the shape of a funnel. Funnel plots can help visualize publication bias.

Heterogeneity: Statistical heterogeneity describes differences between the studies with respect to their effect estimates. These may be caused by methodological differences between the studies, such as differences in study population or study size, or in methods of measurement.

Individual data: Data in which all variables (e.g., age, gender, diagnosis) are recorded at the level of the individual participant.

Odds ratio: In medicine and epidemiology, the odds is the ratio of the probability of being exposed to the probability of not being exposed. The odds ratio is the quotient of the odds among the cases and the odds among the controls. For rare diseases, the odds ratio approximates the relative risk.

Original data: See individual data.

Publication bias: Studies that failed to find any influence of the exposure on the disease under study ("negative studies") are published less often than studies that showed a positive or statistically significant association. Publication bias can be visualized with funnel plots.

Risk factor: A factor that modifies the probability of developing a specific disease, for example an external environmental influence or an individual predisposition.

Relative risk: The probability that an exposed individual falls ill divided by the probability that a non-exposed individual falls ill. The relative risk is calculated on the basis of incident cases.

Sensitivity analysis: An analysis examining whether excluding individual studies changes the pooled estimate. This tests the stability of the pooled effect estimate.

Subgroup analysis: An analysis in which separate groups within the study population, such as a homogeneous ethnic group, are analyzed separately.

A pooled reanalysis (D) is a quantitative compilation of original data ( glossary ) from individual studies for combined analysis ( 1 ). The authors of each study included in the analysis then provide individual data ( glossary ). These are then compiled in a combined database and analyzed according to standard criteria fixed in advance. This form of pooled reanalysis is also referred to as "meta-analysis of individual data".

In a prospectively planned meta-analysis (E), the summary of the individual studies and the combined analysis is included in the planning of the individual studies. For this reason, the individual studies are performed in a standard manner. Prospectively planned meta-analyses will not be further discussed in this article.

It is essential for all forms of summary—except the narrative review—that they should include a prospectively prepared study protocol, with descriptions of the questions to be answered, the hypotheses, the inclusion and exclusion criteria, the selection of studies, and, where applicable, the combination of the data and the recoding of the individual data (only for pooled reanalysis).

Types of study summaries

The procedure for the summary of the studies will now be presented (modified from [7, 10, 12, 13]). This is intended to enable the reader to assess whether a given summary fulfils specific criteria ( Box ).

Checklist for the analysis of a systematic summary

  • Was there an a priori study protocol?
  • Was there an a priori hypothesis?
  • Was there a detailed description of the literature search used?
  • Were prospectively specified inclusion and exclusion criteria clearly described and applied?
  • Was the possible heterogeneity between the studies considered?
  • Was there a clear description of the statistical methods used?
  • Were the limitations of the summary discussed?

1. Was the question to be answered specified in advance?

The question to be answered in the review or meta-analysis and the hypotheses must be clearly defined and laid down in writing prospectively in a study protocol.

2. Were the inclusion and exclusion criteria specified in advance?

On the basis of the inclusion and exclusion criteria, it is decided whether the studies found in the literature search (see point 3) are included in the review/meta-analysis.
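As a minimal sketch of applying predefined criteria consistently, the Python below encodes two hypothetical criteria (eligible study designs and availability of an effect estimate) and records a reason for every exclusion. The `Study` fields, the criteria, and the function names are illustrative assumptions, not criteria taken from any study discussed here.

```python
from dataclasses import dataclass

@dataclass
class Study:
    """Hypothetical record for one study found in the literature search."""
    study_id: str
    design: str           # e.g. "case-control", "cohort", "case report"
    reports_effect: bool  # reports an effect estimate or the counts needed to compute one

# Criteria fixed in the protocol *before* screening starts (illustrative values only).
ELIGIBLE_DESIGNS = {"case-control", "cohort"}

def apply_criteria(study: Study) -> tuple[bool, str]:
    """Return (included?, reason) so that every exclusion is documented."""
    if study.design not in ELIGIBLE_DESIGNS:
        return False, f"excluded: design '{study.design}' not eligible"
    if not study.reports_effect:
        return False, "excluded: no effect estimate or raw counts reported"
    return True, "included"

def screen(studies: list[Study]) -> tuple[list[Study], list[tuple[str, str]]]:
    """Apply the same criteria to every study and keep a decision log."""
    included, log = [], []
    for study in studies:
        ok, reason = apply_criteria(study)
        log.append((study.study_id, reason))
        if ok:
            included.append(study)
    return included, log

found = [
    Study("S1", "case-control", True),
    Study("S2", "case report", True),
    Study("S3", "cohort", False),
]
kept, log = screen(found)
for study_id, reason in log:
    print(study_id, reason)
```

Keeping the reason alongside each decision makes it easy to report how many studies were excluded and why.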

3. Were precautions taken to find all studies performed with reference to the specific question to be answered?

An extensive literature search for studies on the topic must be performed, if at all possible in several literature databases. To avoid bias, all relevant articles should be considered, whatever their language. In addition, the reference lists of the articles found should be searched, as should conference proceedings (for unpublished studies) and the Internet, using search engines.

4. Was the relevant information extracted from the published articles or were the original data combined?

For a systematic review article (B) and for a meta-analysis of published data (C), relevant information should be extracted from the publications.

For a pooled reanalysis (D), authors of all identified studies must be contacted and requested to provide individual data. The individual data must then be coded according to standard specifications, compiled in a combined database and analyzed.

5. Was a descriptive analysis of the data performed?

In all forms of summary, it is usual for the most important characteristics of the individual studies to be presented in overview tables. Table 2 shows an example of such a table, taken from a meta-analysis with published data (C) ( 14 ). This helps to make the differences between the studies clear with respect to the data examined.

NK, not known; FISH, fluorescence in situ hybridization; *1 squamous cell carcinoma only; *2 ever use → 2 years’ use;

*3 relative risks for injectable contraceptives adjusted for oral contraceptive use; *4 Costa Rica, Colombia, Mexico, Panama;

*5 Australia, Chile, Colombia, Israel, Kenya, Mexico, Nigeria, Philippines, Thailand; *6 adenocarcinoma of the cervix only;

*7 Brazil, Colombia, Morocco, Paraguay, Peru, Philippines, Spain, Thailand (Shortened from: Smith J, Green J, Berrington de Gonzalez A et al.: Cervical cancer and use of hormonal contraceptives: a systematic review. Lancet 2003; 361: 1159–67. With the kind permission of Elsevier)

6. Were the calculations of the effect estimates of the individual studies and of the pooled effect estimate presented?

How were the effect estimates of the individual studies calculated?—Systematic review articles (B) usually contain tables with the effect estimates of the individual studies. In a meta-analysis of published data (C), the effect estimates of individual studies (for example, odds ratio or relative risk, see Glossary ) are either directly extracted from the publications or recalculated in a standard manner from the data in each publication ( figure 1 ). Depending on the nature of the factors and target parameters (binary, categorical or continuous variables), a logistic or a linear regression model is used to calculate the effect estimates of the individual studies in the meta-analyses of published data (C) and pooled reanalyses (D).
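To illustrate recalculating effect estimates from the counts reported in a publication, the sketch below computes an odds ratio and a relative risk from a 2×2 table, each with a 95% confidence interval on the log scale (the Woolf method for the odds ratio). The counts and function names are hypothetical; once covariates must be adjusted for, the regression models mentioned above are needed instead.

```python
import math

def odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval.

    a = exposed cases,    b = unexposed cases,
    c = exposed controls, d = unexposed controls (all counts must be > 0).
    """
    est = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    return est, math.exp(math.log(est) - z * se), math.exp(math.log(est) + z * se)

def relative_risk(ill_exp: int, n_exp: int, ill_unexp: int, n_unexp: int, z: float = 1.96):
    """Relative risk (risk in exposed / risk in unexposed) with a log-scale CI."""
    est = (ill_exp / n_exp) / (ill_unexp / n_unexp)
    se = math.sqrt(1 / ill_exp - 1 / n_exp + 1 / ill_unexp - 1 / n_unexp)  # SE of log(RR)
    return est, math.exp(math.log(est) - z * se), math.exp(math.log(est) + z * se)

# Hypothetical cohort: 30 of 1000 exposed and 10 of 1000 unexposed fall ill.
rr = relative_risk(30, 1000, 10, 1000)
# The same data as a 2x2 table: ill = "cases", not ill = "controls".
or_ = odds_ratio(30, 10, 970, 990)
print(f"RR = {rr[0]:.2f} (95% CI {rr[1]:.2f}-{rr[2]:.2f})")
print(f"OR = {or_[0]:.2f} (95% CI {or_[1]:.2f}-{or_[2]:.2f})")
```

Because the disease is rare in this example, the odds ratio (about 3.06) is close to the relative risk (3.00), as expected for rare diseases.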


Figure 1: The results of the individual studies and the pooled estimate, presented as forest plots on the association between oral contraceptives and cervical carcinoma, as an example of the meta-analysis of published data ( 14 ); N.A. = not available; * never use means <2 years use. CI = confidence interval

(Shortened from: Smith J, Green J, Berrington de Gonzalez A et al.: Cervical cancer and use of hormonal contraceptives: a systematic review. Lancet 2003; 361: 1159–67. With the kind permission of Elsevier).

How was the pooled effect estimate calculated?— The effect estimates of the individual studies are combined by statistical procedures into a common pooled effect estimate ( 9 ) ( figure 1 ). In meta-analyses of published data (C), two methods are mostly used to calculate a pooled effect estimate: the fixed-effect model or the random-effects model ( 15 , 16 ). They differ in their assumptions about the heterogeneity of the estimates between individual studies (see point 7): the fixed-effect model assumes that all studies estimate one common true effect, whereas the random-effects model allows the true effect to vary between studies. The method used should be stated in the publication and justified. The effect estimates of the individual studies and the pooled effect estimate can be presented graphically as so-called forest plots ( glossary ; figure 1 ) ( 14 ).
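A minimal sketch of both models, assuming log-scale effect estimates and their standard errors (for example, recovered from published 95% CIs): the fixed-effect model weights each study by the inverse of its variance, and the random-effects version shown here uses the DerSimonian-Laird estimate of the between-study variance τ², which is one common choice among several. The data and function names are hypothetical.

```python
import math

def pool(log_effects: list[float], ses: list[float], z: float = 1.96) -> dict:
    """Inverse-variance pooling of log effect estimates (e.g. log odds ratios).

    Returns fixed-effect and DerSimonian-Laird random-effects estimates,
    each as (estimate, lower, upper) back-transformed to the ratio scale.
    """
    w = [1 / se ** 2 for se in ses]  # fixed-effect weights
    sw = sum(w)
    fe = sum(wi * y for wi, y in zip(w, log_effects)) / sw
    se_fe = math.sqrt(1 / sw)

    # Between-study variance tau^2 (DerSimonian-Laird moment estimator)
    q = sum(wi * (y - fe) ** 2 for wi, y in zip(w, log_effects))
    df = len(log_effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    w_re = [1 / (se ** 2 + tau2) for se in ses]  # random-effects weights
    re = sum(wi * y for wi, y in zip(w_re, log_effects)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))

    def back(est, se):
        return math.exp(est), math.exp(est - z * se), math.exp(est + z * se)

    return {"fixed": back(fe, se_fe), "random": back(re, se_re), "tau2": tau2}

def log_se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Recover the SE of a log ratio from a published 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

# Hypothetical published ORs with 95% CIs (not data from the studies cited here)
published = [(1.5, 1.1, 2.0), (2.2, 1.4, 3.5), (0.9, 0.6, 1.4), (1.8, 1.2, 2.7)]
ys = [math.log(est) for est, _, _ in published]
ses = [log_se_from_ci(lo, hi) for _, lo, hi in published]
result = pool(ys, ses)
print(result)
```

When τ² is 0, both models give the same answer; the more the studies disagree, the wider the random-effects interval becomes relative to the fixed-effect one.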

In pooled reanalyses (D), the pooled effect estimates are mostly calculated by logistic or linear regression. However, the statistical analysis must adequately allow for the origin of the data sets from different studies. The results of the pooled reanalyses can be presented like the results of a single combined study ( table 3 ).
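One simple way to respect each record's study of origin when pooling individual data is to stratify by study. The sketch below uses a Mantel-Haenszel odds ratio stratified by study as a stand-in for the study-adjusted logistic regression described above (it is not the method of the cited reanalysis) and contrasts it with a naive crude odds ratio that ignores study membership. The records and function names are hypothetical.

```python
from collections import defaultdict

def tables_by_study(records):
    """Build one 2x2 table per study from individual records.

    records: iterable of (study_id, is_case, is_exposed), one per participant.
    Each table is [a, b, c, d] = [exposed cases, unexposed cases,
                                   exposed controls, unexposed controls].
    """
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for study, is_case, is_exposed in records:
        idx = (0 if is_exposed else 1) if is_case else (2 if is_exposed else 3)
        tables[study][idx] += 1
    return dict(tables)

def mantel_haenszel_or(tables):
    """Odds ratio stratified by study: cases and controls are compared only within studies."""
    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den

def crude_or(tables):
    """Naive odds ratio that ignores which study each participant came from."""
    a, b, c, d = (sum(col) for col in zip(*tables.values()))
    return a * d / (b * c)

def expand(study, a, b, c, d):
    """Turn hypothetical cell counts into one record per participant."""
    return ([(study, True, True)] * a + [(study, True, False)] * b
            + [(study, False, True)] * c + [(study, False, False)] * d)

# Two hypothetical studies, each with OR = 1 but different exposure prevalence
records = expand("S1", 80, 20, 40, 10) + expand("S2", 10, 40, 30, 120)
tables = tables_by_study(records)
print("crude OR:", round(crude_or(tables), 2))
print("study-stratified MH OR:", round(mantel_haenszel_or(tables), 2))
```

In this constructed example, neither study shows any association (OR = 1 in each), yet the crude table collapsed across studies gives an OR of about 2.8, because exposure prevalence and the proportion of cases differ between the studies.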

Trend test: χ² = 66.2; p < 0.0001

RR, relative risk, adjusted for age, study or study center, age at first sexual intercourse, number of sex partners, number of full-term pregnancies, smoking and screening status;

* Information taken from the publication; CI, confidence interval; N.A., not available;

s., significance at the level α = 5%; n.s., not significant at the level α = 5%

(Shortened and modified from: International Collaboration of Epidemiological Studies of Cervical Cancer: Cervical cancer and hormonal contraceptives: collaborative reanalysis of individual data for 16,573 women with cervical cancer and 35,509 women without cervical cancer from 24 epidemiological studies. Lancet 2007; 370: 1609–21. With the kind permission of Elsevier)

7. Were problems considered in the interpretation of pooled estimates?

Was the heterogeneity between the estimates considered?—There may be marked differences between the estimates of the individual studies. This statistical heterogeneity ( glossary ) may be caused by differences in study design, study populations (age, gender, ethnic group), methods of recruitment, diagnosis, or methods of measurement ( 17 , 18 ). The methodological heterogeneity between the studies can be visualized in an overview table presenting the most important characteristics of the individual studies ( table 2 ). Heterogeneity can be formally investigated with statistical tests. If there is statistical heterogeneity between the studies, the random-effects model rather than the fixed-effect model should be used to calculate the pooled estimate ( 7 , 15 , 16 ). There is, however, no clear definition of when the statistical heterogeneity between the studies is so large that a pooled effect estimate should not be calculated ( 1 , 19 ). In addition, heterogeneity between the studies should be examined by subgroup analysis ( glossary ), for example by combining only studies whose study populations share the same characteristics, such as homogeneous age groups, the same ethnic group, or the same histological findings. Studies with the same design characteristics, such as study quality or study size, may also be considered separately in subgroup analyses. This may indicate whether the effect of the corresponding risk factors ( glossary ) differs between subgroups.
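Cochran's Q is one commonly used statistical test of heterogeneity, and Higgins' I² expresses the share of total variation attributable to heterogeneity rather than chance; a fixed-effect partition of Q also gives a test for differences between subgroups. The sketch below assumes log-scale estimates with standard errors; the subgroup labels, values, and function names are hypothetical.

```python
def cochran_q(ys, ses):
    """Cochran's Q: weighted squared deviations from the fixed-effect pooled estimate."""
    w = [1 / s ** 2 for s in ses]
    pooled = sum(wi * y for wi, y in zip(w, ys)) / sum(w)
    return sum(wi * (y - pooled) ** 2 for wi, y in zip(w, ys))

def i_squared(q, k):
    """Higgins' I^2 in percent for Q computed from k studies."""
    df = k - 1
    return 0.0 if q <= 0 else max(0.0, (q - df) / q) * 100

def q_between(groups):
    """Subgroup test: Q_between = Q_total - sum of the Q values within each subgroup.

    groups: dict mapping subgroup name -> (effect estimates, standard errors).
    Returns (Q_between, degrees of freedom = number of subgroups - 1).
    """
    all_y = [y for ys, _ in groups.values() for y in ys]
    all_se = [s for _, ss in groups.values() for s in ss]
    q_within = sum(cochran_q(ys, ss) for ys, ss in groups.values())
    return cochran_q(all_y, all_se) - q_within, len(groups) - 1

# Hypothetical log odds ratios grouped by, e.g., histological type
groups = {
    "squamous": ([0.40, 0.50, 0.45], [0.10, 0.15, 0.12]),
    "adeno": ([0.90, 1.00], [0.20, 0.25]),
}
ys_all = [y for ys, _ in groups.values() for y in ys]
se_all = [s for _, ss in groups.values() for s in ss]
q = cochran_q(ys_all, se_all)
print("Q =", round(q, 2), "I^2 =", round(i_squared(q, len(ys_all)), 1), "%")
print("Q_between, df =", q_between(groups))
```

Q (and Q_between) is compared against a χ² distribution with the stated degrees of freedom; as the text notes, no threshold settles when heterogeneity is too large to pool.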

Were sensitivity analyses performed?— Like subgroup analyses, sensitivity analyses ( glossary ) serve to test the stability of the pooled estimate. It is, for example, possible that the pooled effect estimate is mainly determined by one large study. If this study is excluded from the analysis, the pooled effect estimate may change. This must be borne in mind in the discussion and interpretation of the results.
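A leave-one-out analysis is one simple form of sensitivity analysis: the pooled estimate is recomputed with each study omitted in turn. This sketch uses fixed-effect inverse-variance pooling on the log scale and hypothetical data in which one large study dominates; the function names are illustrative.

```python
import math

def fixed_effect(ys, ses):
    """Inverse-variance (fixed-effect) pooled estimate on the log scale."""
    w = [1 / s ** 2 for s in ses]
    return sum(wi * y for wi, y in zip(w, ys)) / sum(w)

def leave_one_out(ys, ses):
    """Pooled ratio estimate recomputed with each study omitted in turn."""
    return [
        math.exp(fixed_effect(ys[:i] + ys[i + 1:], ses[:i] + ses[i + 1:]))
        for i in range(len(ys))
    ]

# Hypothetical: two small studies and one very large study (small SE)
ys = [math.log(1.1), math.log(1.2), math.log(2.5)]
ses = [0.30, 0.30, 0.05]
print("all studies:", round(math.exp(fixed_effect(ys, ses)), 2))
for i, est in enumerate(leave_one_out(ys, ses), start=1):
    print(f"without study {i}:", round(est, 2))
```

Here the pooled odds ratio falls from about 2.4 to about 1.15 when the large third study is omitted, the kind of dependence that must be borne in mind when interpreting the results.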

Was a possible publication bias considered?— A publication bias ( glossary ) can be visualized with a so-called funnel plot ( glossary ) ( 7 , 20 – 22 ). Figure 2 shows an example with simulated data. In the upper funnel plot ( Figure 2a ), the effect estimates of the individual studies are distributed in a roughly funnel-shaped pattern around the pooled effect estimate (middle broken line); there is no publication bias here. In the lower funnel plot ( Figure 2b ), the small studies that show no increased risk are missing. This suggests a publication bias: these studies were probably not published.
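Funnel-plot asymmetry can also be tested formally. The sketch below implements Egger's regression test, one widely used option that the text does not itself describe: the standardized effect (estimate / SE) is regressed on precision (1 / SE) by ordinary least squares, and an intercept clearly different from zero suggests asymmetry of the kind shown in Figure 2b. The data and function name are hypothetical, and with few studies the test has little power.

```python
import math

def egger_test(ys, ses):
    """Egger's regression test for funnel-plot asymmetry.

    Regresses y / SE on 1 / SE by ordinary least squares.
    Returns (intercept, SE of intercept); t = intercept / SE with n - 2 df.
    """
    n = len(ys)
    if n < 3:
        raise ValueError("need at least 3 studies")
    x = [1 / s for s in ses]                    # precision
    y = [yi / s for yi, s in zip(ys, ses)]      # standardized effect
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_int = math.sqrt(ssr / (n - 2) * (1 / n + mx ** 2 / sxx))
    return intercept, se_int

# Hypothetical log odds ratios; in the second set, small studies show larger effects
ses = [0.05, 0.10, 0.20, 0.30, 0.40]
symmetric = [0.30, 0.28, 0.35, 0.22, 0.38]
asymmetric = [0.30, 0.42, 0.58, 0.83, 0.98]
for label, effects in [("symmetric", symmetric), ("asymmetric", asymmetric)]:
    b0, se0 = egger_test(effects, ses)
    print(f"{label}: intercept = {b0:.2f}, t = {b0 / se0:.2f}")
```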


Figure 2: Visualization of publication bias with funnel plots of simulated data: a) no publication bias; b) publication bias. SE = standard error; OR = odds ratio

8. How were the results interpreted?

In the interpretation of the results, possible limitations should be discussed and taken into account. For example, the reliability of the results can be limited by the inadequate quality of the individual studies, for instance through the selection of the study population, or by the use of aggregated data ( glossary ).

The Methods section describes the individual points that must be considered in the systematic summary of scientific articles ( Box ). This checklist can also be used to assess the quality of systematic review articles and meta-analyses.

Publications on the association between the administration of oral contraceptives and the development of cervical carcinoma were used as examples of the performance of a systematic literature review (B), a meta-analysis of published data (C), and a pooled reanalysis (D). This association has been scientifically studied for a long period.

In 1996, La Vecchia et al. published a systematic review article (B) on this topic, including six studies ( 23 ). Their overview table contained a variety of information on the individual studies. No pooled effect estimate was calculated.

In 2003, Smith et al. ( 14 ) presented a meta-analysis of published data (C) from 28 studies on the same topic. The included studies were first summarized in a descriptive overview, as is common in systematic review articles ( table 2 ). This table shows that the study methods were heterogeneous ( glossary ); for example, HPV was detected in different ways ( table 2 ). The heterogeneity was also formally investigated with statistical tests, and various subgroup analyses were performed. In contrast to the systematic review article (B) of La Vecchia et al., pooled effect estimates were calculated from the published data. The effect estimates of the individual studies and the pooled effect estimates, with their confidence intervals ( glossary ), were presented as a forest plot ( figure 1 ).

In 2007, a pooled reanalysis (D) was published for 24 studies on the same topic for which the original data were available ( 24 ). In contrast to the meta-analysis of published data, the pooled effect estimates were calculated from the original data and only the combined results were presented ( table 3 ). This kind of analysis is only possible in a pooled reanalysis, as the original data with precise information on all parameters for each participant are then available. Nevertheless, here too it is necessary to consider that the individual data ( glossary ) are derived from different studies.

Systematic review articles (B) can provide a comprehensive overview of the current state of research ( 1 ). They are also necessary for the development of S2 and S3 guidelines for formal evidence-based research ( 25 ). Meta-analyses of published data (C) are performed to calculate additional pooled effect estimates from the individual studies ( 1 ). Like systematic review articles, they are feasible whether the authors of the original articles are prepared to cooperate or not.

The calculated pooled effect estimates may be of limited validity for various reasons. Firstly, there is no clear definition of how much heterogeneity between studies is small enough to be negligible and still allow a meaningful pooled effect estimate ( 1 , 19 ). If the individual studies are too heterogeneous, a pooled effect estimate should not be calculated. Secondly, the pooled effect estimate is mostly calculated from aggregated data, so subgroup analyses and adjustment for potential confounders ( glossary ) are often impossible, or possible only to a limited extent ( 1 , 19 ). Thirdly, publication bias is also a problem for the meta-analysis of published data.

In a pooled reanalysis (D), potential confounders and risk factors can be taken into account more easily ( 7 ) than in a meta-analysis of published data, where they are usually available only in aggregated form. With individual data, the outcome parameters, risk factors, and confounders used in the analysis can be categorized in a standard manner and properly incorporated into the analysis. Data from individual participants can be excluded in accordance with the prospective specifications in the study protocol, without the need to exclude the whole study. The disadvantages of pooled reanalysis are that it demands a great deal of time and money and that it depends on the willingness of the authors of the individual studies to cooperate. If not all authors provide their individual data, the results may be biased.

The level of evidence of the type of summary increases from the systematic review to the meta-analysis of published data to the pooled reanalysis. It is important that all three forms of summary should be performed with high quality.

Key messages

  • The various forms of summary can be categorized as systematic review articles, meta-analyses of published data, and pooled reanalyses.
  • Systematic review articles can provide a rapid overview of the status of research on a specific topic.
  • Meta-analyses of published data and pooled reanalyses additionally permit the calculation of pooled effect estimates.
  • Pooled reanalyses allow a detailed evaluation on the basis of individual data.
  • Like any original study, all these types of summary must have an a priori study protocol, laying down in detail the research questions, the hypothesis, the literature search, the inclusion and exclusion criteria, and the analysis strategies.

Acknowledgments

Translated from the original German by Rodney A. Yeates, M.A., Ph.D.

Conflict of interest statement

The authors declare that there is no conflict of interest in the sense of the guidelines of the International Committee of Medical Journal Editors.


Published on 23.4.2024 in Vol 26 (2024)

Electronic Media Use and Sleep Quality: Updated Systematic Review and Meta-Analysis

Authors of this article:


  • Xiaoning Han*, PhD;
  • Enze Zhou*, MA;
  • Dong Liu*, PhD

School of Journalism and Communication, Renmin University of China, Beijing, China

*all authors contributed equally

Corresponding Author:

Dong Liu, PhD

School of Journalism and Communication

Renmin University of China

No. 59 Zhongguancun Street, Haidian District

Beijing, 100872

Phone: 86 13693388506

Email: [email protected]

Background: This paper explores the widely discussed relationship between electronic media use and sleep quality, for which previous studies have indicated negative effects arising from various factors. However, existing meta-analyses on the topic have some limitations.

Objective: The study aims to analyze and compare the impacts of different digital media types, such as smartphones, online games, and social media, on sleep quality.

Methods: Adhering to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, the study performed a systematic meta-analysis of literature published from January 2018 to October 2023 across multiple databases, including Web of Science, MEDLINE, PsycINFO, PubMed, Science Direct, Scopus, and Google Scholar. Two trained coders independently coded the study characteristics. Effect sizes were calculated using the correlation coefficient as a standardized measure of the relationship between electronic media use and sleep quality across studies. The meta-analysis was performed with the Comprehensive Meta-Analysis software (version 3.0). Funnel plots were used to assess asymmetry, and a p-curve test was used to check for p-hacking; both can indicate publication bias.
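Correlation coefficients are usually pooled on Fisher's z scale, where the sampling variance is approximately 1/(n − 3). The sketch below shows a fixed-effect version with hypothetical inputs; it is not a reproduction of the Comprehensive Meta-Analysis computations used in the study, and a random-effects version would add a between-study variance τ² to each 1/(n − 3).

```python
import math

def pool_correlations(rs, ns, z_crit=1.96):
    """Fixed-effect pooling of correlation coefficients via Fisher's z.

    Each r is transformed to z = atanh(r), whose variance is about 1 / (n - 3);
    the weighted mean z and its CI limits are transformed back with tanh.
    """
    zs = [math.atanh(r) for r in rs]
    ws = [n - 3 for n in ns]  # inverse of var(z) = 1 / (n - 3)
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    se = 1 / math.sqrt(sum(ws))
    return tuple(math.tanh(v) for v in (z_bar, z_bar - z_crit * se, z_bar + z_crit * se))

# Hypothetical studies (correlations and sample sizes are made up)
r, lo, hi = pool_correlations([0.25, 0.35, 0.30], [500, 300, 1200])
print(f"pooled r = {r:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

The back-transformed interval is slightly asymmetric around r, which is expected because tanh is nonlinear.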

Results: After a thorough screening process, the study included 55 papers (56 items) with 41,716 participants from over 20 countries, classifying electronic media use into “general use” and “problematic use.” The meta-analysis revealed that electronic media use was significantly linked with decreased sleep quality and increased sleep problems, with effect sizes varying across subgroups. A significant cultural difference was also observed in these effects. General use was associated with a significant decrease in sleep quality ( P <.001); the pooled effect size was 0.28 (95% CI 0.21-0.35; k =20). Problematic use was associated with a significant increase in sleep problems ( P ≤.001); the pooled effect size was 0.33 (95% CI 0.28-0.38; k =36). The subgroup analysis indicated that the correlation between general smartphone use and sleep problems was r =0.33 (95% CI 0.27-0.40), the highest among the general-use subgroups. The correlation between problematic internet use and sleep problems was r =0.51 (95% CI 0.43-0.59), the highest among the problematic-use subgroups. There were significant differences among these subgroups (general: Q between =14.46, P =.001; problematic: Q between =27.37, P <.001). In the meta-regression analysis with age, gender, and culture as moderators, only culture (Eastern vs Western) was significant ( Q between =6.69; P =.01). All funnel plots and p-curve analyses showed no evidence of publication or selection bias.
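The reported P value for the culture moderator can be checked by hand, assuming the Eastern versus Western comparison has two levels and therefore 1 degree of freedom. A χ² statistic with 1 df is the square of a standard normal variable, so its upper-tail probability has a closed form. The degrees of freedom for the other Q between values are not stated in the abstract, so they are not checked here; the function name is illustrative.

```python
import math

def chi2_p_df1(q: float) -> float:
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom.

    A chi-square(1) variable is Z**2 for standard normal Z, so
    P(Q > q) = P(|Z| > sqrt(q)) = erfc(sqrt(q / 2)).
    """
    return math.erfc(math.sqrt(q / 2))

p = chi2_p_df1(6.69)  # Q between reported for Eastern vs Western culture
print(round(p, 4))
```

For Q between = 6.69 this gives a value just under 0.01, consistent with the reported P=.01.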

Conclusions: Despite some variability, the study overall confirms the correlation between increased electronic media use and poorer sleep outcomes, which is notably stronger in Eastern cultures.

Introduction

Sleep is vital to our health. Research has shown that high sleep quality can lead to improvements in a series of health outcomes, such as an improved immune system, better mood and mental health, enhanced physical performance, lower risk of chronic diseases, and a longer life span [ 1 - 5 ].

Electronic media refers to forms of media or communication that use electronic devices or technology to create, distribute, and display content. This can include various forms of digital media, such as smartphones, tablets, instant messaging, phone calls, social media, online games, and short video platforms. Electronic media has permeated every aspect of our lives [ 6 ]. Many people prefer to use smartphones or tablets before sleep, which can negatively affect sleep in many respects, including delayed sleep onset, disrupted sleep patterns, shortened sleep duration, and poor sleep quality [ 7 - 10 ]. Furthermore, problematic use occurs when the behavior surpasses a certain limit. In this study, problematic use of electronic media is not determined solely by the amount of time spent on these platforms, but rather by behavioral indicators that suggest an unhealthy or harmful relationship with them.

Smartphone or tablet use can affect sleep quality in many ways. First, the use of these devices may directly displace, delay, or interrupt sleep time, resulting in inadequate sleep quantity [ 11 ]; the sounds and vibrations of notifications may also interrupt sleep. Second, the screens of smartphones and tablets emit blue light, which can suppress the production of melatonin, the hormone responsible for regulating sleep-wake cycles [ 12 ]. Third, consuming emotionally charged content, such as news or suspenseful movies, or engaging in online arguments can increase emotional arousal, making it harder to relax and fall asleep. This emotional arousal can also lead to disrupted sleep and nightmares [ 13 ]. Finally, the use of electronic devices before bedtime can delay bedtime and shorten sleep duration, as individuals may lose track of time while engaging with their devices. This can result in a disrupted sleep routine and decreased sleep quality [ 14 ].

Previous meta-analyses of screen media use and sleep outcomes were published in 2016, 2019, and 2021 [ 15 - 17 ]. However, these studies had their own limitations. First, the number of studies included in each meta-analysis was small (around 10). Second, each focused on only 1 aspect of the effect of digital media on sleep quality. For example, Carter et al [ 16 ] focused only on adolescents, while Alimoradi et al [ 15 ] and Kristensen et al [ 17 ] reviewed only the relationship between problematic use of digital media or devices and sleep quality. Despite the high heterogeneity found in these meta-analyses, none compared the effects of different digital media or devices. This study aims to clarify and compare the effects of these different channels.

Literature Search

The research adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines ( Multimedia Appendix 1 ) and followed a predetermined protocol [ 18 , 19 ]. Because the idea and scope of this study evolved over time, the meta-analysis was not preregistered. However, the methodology was defined a priori and strictly followed to reduce bias, and the possible influence of post hoc decisions was minimized. We searched for all relevant English-language studies published from January 1, 2018, to October 9, 2023, in the following databases: Web of Science, MEDLINE, PsycINFO, PubMed, Science Direct, Scopus, and Google Scholar. The abstracts were examined manually. The search combined the following keywords: (“sleep” OR “sleep duration” OR “sleep quality” OR “sleep problems”) AND (“electronic media” OR “smartphone” OR “tablet” OR “social media” OR “Facebook” OR “Twitter” OR “online gaming” OR “internet” OR “addiction” OR “problematic”) ( Multimedia Appendix 2 ). Additionally, the reference lists of relevant studies were examined.
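The keyword list above can be assembled into an explicit Boolean string, as sketched below. This is a minimal sketch, assuming the intended grouping is (any sleep term) AND (any electronic media term); the paper does not give the exact syntax submitted to each database, and the helpers `or_group` and `build_query` are hypothetical. Databases differ in how they resolve mixed AND/OR expressions, so keeping each group in parentheses avoids ambiguity.

```python
# Hypothetical helper: assumes the grouping (sleep terms) AND (media terms).
SLEEP_TERMS = ["sleep", "sleep duration", "sleep quality", "sleep problems"]
MEDIA_TERMS = [
    "electronic media", "smartphone", "tablet", "social media", "Facebook",
    "Twitter", "online gaming", "internet", "addiction", "problematic",
]


def or_group(terms: list[str]) -> str:
    """Quote each term and join the group with OR inside parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(*groups: list[str]) -> str:
    """AND together several OR-groups, keeping each group parenthesized."""
    return " AND ".join(or_group(g) for g in groups)


if __name__ == "__main__":
    print(build_query(SLEEP_TERMS, MEDIA_TERMS))
```

The same string can then be adapted to each database's field tags, which this sketch does not attempt.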

Two trained coders independently screened the titles and abstracts of the identified papers for eligibility, followed by a full-text review of the selected studies. Discrepancies between the coders were resolved through discussion until a consensus was reached. The reference lists of the included studies were also manually screened to identify any additional relevant studies. Through this rigorous process, we ensured a comprehensive and replicable literature search that could contribute to the robustness of our meta-analysis findings.
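The double-screening step can be supported by a small check that flags every record on which the two coders disagree, so those records go to consensus discussion. The data layout (record ID mapped to True for include, False for exclude) and the function names are hypothetical; percent agreement is an optional summary that the paper does not report.

```python
# Hypothetical layout: record ID -> True (include) / False (exclude).


def find_discrepancies(coder_a: dict[str, bool], coder_b: dict[str, bool]) -> list[str]:
    """Return sorted IDs that both coders screened but decided differently."""
    shared = coder_a.keys() & coder_b.keys()
    return sorted(rid for rid in shared if coder_a[rid] != coder_b[rid])


def percent_agreement(coder_a: dict[str, bool], coder_b: dict[str, bool]) -> float:
    """Share of jointly screened records with identical decisions."""
    shared = coder_a.keys() & coder_b.keys()
    if not shared:
        raise ValueError("coders share no screened records")
    agree = sum(coder_a[rid] == coder_b[rid] for rid in shared)
    return agree / len(shared)
```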

Inclusion or Exclusion Criteria

Titles and abstracts from the search results were scrutinized for relevance, and duplicates were removed. Full texts of pertinent papers were obtained and assessed for eligibility. We mainly included correlational studies that used continuous measures of both electronic media use and sleep quality. Four criteria were used to screen studies: (1) only peer-reviewed empirical studies published in English were considered; (2) studies had to report quantitative statistics on electronic media use and sleep quality, including the sample size and the information needed to calculate an effect size, whereas review papers, qualitative studies, case studies, and conference abstracts were excluded; (3) studies on both general use and problematic use of electronic media or devices were eligible; and (4) to ensure consistency, only studies reporting correlation, regression, or odds ratio statistics were included.
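The four screening criteria can be expressed as a single eligibility check. This is an illustrative sketch only: the `Candidate` fields and category labels are hypothetical stand-ins for what a coder might record, not the authors' coding form.

```python
from dataclasses import dataclass

# Hypothetical labels; the paper states its criteria in prose only.
ALLOWED_STATISTICS = {"correlation", "regression", "odds_ratio"}
EXCLUDED_DESIGNS = {"review", "qualitative", "case_study", "conference_abstract"}


@dataclass
class Candidate:
    peer_reviewed: bool
    language: str
    design: str                  # e.g. "empirical", "review", "qualitative"
    reports_sample_size: bool
    reports_effect_inputs: bool  # enough information to compute an effect size
    statistic: str               # e.g. "correlation", "regression", "odds_ratio"


def is_eligible(c: Candidate) -> bool:
    # Criterion 1: peer-reviewed, English-language work.
    if not c.peer_reviewed or c.language.lower() != "english":
        return False
    # Criterion 2: empirical quantitative results with effect size inputs;
    # reviews, qualitative studies, case studies, and abstracts are excluded.
    if c.design in EXCLUDED_DESIGNS:
        return False
    if not (c.reports_sample_size and c.reports_effect_inputs):
        return False
    # Criterion 3 (general and problematic use both in scope) widens the
    # pool rather than filtering it, so no check is needed here.
    # Criterion 4: only correlation, regression, or odds ratio results.
    return c.statistic in ALLOWED_STATISTICS
```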

Study Coding

Two trained coders independently coded the characteristics of the studies. Discrepancies were resolved through discussion with the first author of the paper. The sample size and the following characteristics were coded: country, female ratio, average age, publication year, and type of electronic media. Effect sizes were either extracted directly from the original publications or calculated manually. If a study reported multiple dependent effects, the effects were merged into one. If a study reported multiple independent effects from different samples, the effects were included separately. Additionally, to evaluate study quality, the papers were classified into 3 tiers (high, middle, and low) according to Journal Citation Reports 2022 , which ranks journals by their impact factor as reported in the Web of Science. The few papers in unindexed journals were rated based on their citation counts in Google Scholar.
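The paper does not state how multiple dependent effects from one sample were merged. One simple option, assumed here for illustration, is to average the correlations in the Fisher z metric and back-transform the mean to r. `merge_dependent` is a hypothetical helper, and this approach ignores the correlation among the merged outcomes.

```python
import math


def merge_dependent(rs: list[float]) -> float:
    """Collapse several correlations from the same sample into one value.

    Assumption: a simple mean in the Fisher z metric, back-transformed to r.
    The merged effect's variance should still use the sample's single n,
    not a sum of sample sizes.
    """
    if not rs:
        raise ValueError("need at least one correlation")
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))
```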

Meta-Analysis and Moderator Analyses

We used the correlation coefficient ( r ) as a standardized measure of the relationship between electronic media or device use and sleep quality across studies. When studies reported multiple effect sizes, we selected the one that best represented the overall association between electronic media use and sleep quality. If studies did not provide correlation coefficients, we converted other reported statistics (eg, standardized regression coefficients) into correlation coefficients using established formulas. The correlation coefficients were then transformed into Fisher z scores to stabilize the variance and normalize the distribution.
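The paper refers to "established formulas" without naming them. The sketch below assumes two commonly cited conversions: Peterson and Brown's approximation for standardized regression coefficients, and the conversion of an odds ratio to Cohen's d and then to r described by Borenstein and colleagues. It then applies the Fisher transformation z = atanh(r), whose sampling variance is approximately 1/(n - 3). The function names are hypothetical.

```python
import math


def beta_to_r(beta: float) -> float:
    """Peterson & Brown approximation: r = beta + 0.05 * lam.

    lam is 1 when beta >= 0 and 0 otherwise; recommended for |beta| < 0.5.
    """
    lam = 1.0 if beta >= 0 else 0.0
    return beta + 0.05 * lam


def odds_ratio_to_r(odds_ratio: float, n1: int, n2: int) -> float:
    """Odds ratio -> d via the logistic approximation, then d -> r."""
    d = math.log(odds_ratio) * math.sqrt(3) / math.pi
    a = (n1 + n2) ** 2 / (n1 * n2)  # equals 4 when the groups are equal
    return d / math.sqrt(d * d + a)


def fisher_z(r: float, n: int) -> tuple[float, float]:
    """Return the Fisher z score and its sampling variance 1 / (n - 3)."""
    return math.atanh(r), 1.0 / (n - 3)
```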

Previous meta-analyses have shown high levels of heterogeneity; hence, the random effects model was adopted for all analyses. To explore potential sources of heterogeneity and to further understand the relationship between electronic media use and sleep quality, we conducted moderator analyses. The following categorical and continuous moderators were examined: media type (online gaming, social media, smartphone, or internet), participants’ average age, culture, female ratio, and sleep quality assessment method. Subgroup analyses were performed for categorical moderators, and meta-regression analyses for continuous moderators. All analyses were completed in the Comprehensive Meta-Analysis software (version 3.0; Biostat, Inc).
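A random effects model of this kind can be sketched as follows. The paper does not name the between-study variance estimator; this sketch assumes the DerSimonian-Laird method-of-moments estimator, a common default, and pools Fisher z values before back-transforming the estimate and its 95% CI to r. `random_effects_r` is a hypothetical name, and this is not the Comprehensive Meta-Analysis implementation.

```python
import math


def random_effects_r(rs, ns, z_crit=1.96):
    """DerSimonian-Laird random effects pooling of correlations.

    Returns (pooled_r, (ci_low, ci_high), tau2, i2).
    """
    if len(rs) != len(ns) or len(rs) < 2:
        raise ValueError("need matching rs and ns for at least two studies")
    zs = [math.atanh(r) for r in rs]
    vs = [1.0 / (n - 3) for n in ns]   # within-study variances
    ws = [1.0 / v for v in vs]         # fixed-effect weights

    # Fixed-effect estimate and Cochran's Q.
    sw = sum(ws)
    z_fe = sum(w * z for w, z in zip(ws, zs)) / sw
    q = sum(w * (z - z_fe) ** 2 for w, z in zip(ws, zs))
    df = len(zs) - 1

    # Method-of-moments between-study variance, truncated at zero.
    c = sw - sum(w * w for w in ws) / sw
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    # Re-weight with tau2 added to each within-study variance.
    ws_re = [1.0 / (v + tau2) for v in vs]
    z_re = sum(w * z for w, z in zip(ws_re, zs)) / sum(ws_re)
    se_re = math.sqrt(1.0 / sum(ws_re))
    lo, hi = z_re - z_crit * se_re, z_re + z_crit * se_re
    return math.tanh(z_re), (math.tanh(lo), math.tanh(hi)), tau2, i2
```

Adding tau2 to every study's variance makes the weights more equal than under a fixed-effect model, which is why large studies dominate less when heterogeneity is high.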

Publication Bias

We used funnel plots to assess asymmetry and a p -curve test to check for p -hacking, either of which may indicate publication bias. If asymmetry was detected, the trim-and-fill method was applied to adjust the effect size estimates.
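Funnel plot inspection is visual. A common numerical complement, shown here as an assumption because the paper does not say it was used, is Egger's regression test: the standardized effect (effect/SE) is regressed on precision (1/SE), and an intercept far from zero suggests small-study asymmetry. The function name is hypothetical.

```python
import math


def egger_intercept(effects, ses):
    """Egger's regression test for funnel plot asymmetry (simple OLS form).

    Returns (intercept, se_of_intercept).
    """
    k = len(effects)
    if k != len(ses) or k < 3:
        raise ValueError("need matching effects and SEs for at least 3 studies")
    x = [1.0 / s for s in ses]                    # precision
    y = [e / s for e, s in zip(effects, ses)]     # standardized effect
    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must vary across studies")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = resid_ss / (k - 2)
    se_int = math.sqrt(s2 * (1.0 / k + x_bar ** 2 / sxx))
    return intercept, se_int
```

Dividing the intercept by its standard error gives a t statistic with k - 2 degrees of freedom.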

By addressing publication bias, we aimed to provide a more accurate and reliable synthesis of the available evidence, enhancing the validity and generalizability of our meta-analytic findings. Nevertheless, it is essential for readers to interpret the results cautiously, considering the potential limitations imposed by publication bias and other methodological concerns.

Search Findings

A total of 98,806 records were identified from the databases, most of them from Scopus (n=49,643), Google Scholar (n=18,600), Science Direct (n=15,084), and Web of Science (n=11,689). After duplicate records and studies that did not meet the inclusion criteria were removed, 754 studies remained for screening. After screening titles, abstracts, and full texts, 703 studies were excluded. Four additional studies were identified from the reference lists of relevant reviews. Finally, 55 studies [ 20 - 74 ] were included in the meta-analysis. The flow diagram of the selection process is shown in Figure 1 .
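The selection flow reported above can be checked with simple arithmetic. The four named databases account for 95,016 records; the remainder presumably comes from MEDLINE, PsycINFO, and PubMed, which the text does not break down. The variable names are illustrative.

```python
# Counts as reported in the text; "other_databases" is derived, not reported.
named = {"Scopus": 49_643, "Google Scholar": 18_600,
         "Science Direct": 15_084, "Web of Science": 11_689}
identified = 98_806
other_databases = identified - sum(named.values())

screened = 754
excluded_at_screening = 703
from_reference_lists = 4
included = screened - excluded_at_screening + from_reference_lists

print(other_databases, included)  # 3790 55
```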


Characteristics of Included Studies

The analysis of general electronic media use and sleep quality included 20 studies with 21,594 participants. Mean sample ages ranged from 9.9 to 44 years. The general online gaming category included 4 studies with 14,837 participants, the general smartphone use category included 10 studies with 5011 participants, and the general social media use category included 6 studies with 1746 participants.

These studies came from the following countries or areas: Germany, Serbia, Indonesia, India, China, Italy, Saudi Arabia, New Zealand, the United Kingdom, the United States, Spain, Qatar, Egypt, Argentina, and Portugal. The most frequently used measure of electronic media use was the time spent on it. The most frequently used measure of sleep was the Pittsburgh Sleep Quality Index.

The analysis of problematic electronic media use and sleep quality included 35 studies with 20,122 participants. Mean sample ages ranged from 14.76 to 65.62 years. The problematic online gaming category included 5 studies with 1874 participants, the problematic internet use category included 2 studies with 774 participants, the problematic smartphone use category included 18 studies with 12,204 participants, and the problematic social media use category included 11 studies with 5270 participants. One study examined both social media and online gaming and was therefore included in both categories. These studies came from 14 countries or areas: Turkey, the United States, Indonesia, China, France, Taiwan, India, South Korea, Hong Kong, Iran, Poland, Israel, Hungary, and Saudi Arabia. The most frequently used measures of problematic electronic media use were the Internet Gaming Disorder Scale-Short Form, Smartphone Addiction Scale-Short Form, and Bergen Social Media Addiction Scale.
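The per-category counts for general and problematic use can be cross-checked against the reported totals. The problematic-use category entries sum to 36 rather than 35, consistent with k=36 in the pooled problematic-use analysis, because one study appears in both the social media and online gaming categories. The dictionary layout and `totals` helper are illustrative.

```python
# Reported (studies, participants) per category.
general = {"online gaming": (4, 14_837), "smartphone": (10, 5_011),
           "social media": (6, 1_746)}
problematic = {"online gaming": (5, 1_874), "internet": (2, 774),
               "smartphone": (18, 12_204), "social media": (11, 5_270)}


def totals(categories):
    """Sum study counts and participant counts across categories."""
    studies = sum(k for k, _ in categories.values())
    participants = sum(n for _, n in categories.values())
    return studies, participants


print(totals(general))      # (20, 21594)
print(totals(problematic))  # (36, 20122)
```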

With respect to study quality, the 56 papers were published in 50 journals, 41 of which were indexed in Journal Citation Reports 2022 ; the remaining 9 journals were rated based on their citation counts in Google Scholar. Of the 56 papers, 22 were rated high, 18 middle, and 16 low. More information about the included studies is listed in Multimedia Appendix 3 [ 20 - 74 ].

Meta-Analysis

The meta-analysis of general electronic media use and sleep quality showed that electronic media use was associated with a significant decrease in sleep quality ( P <.001). The pooled effect size was r =0.28 (95% CI 0.21-0.35; k =20), indicating that more frequent electronic media use was associated with more sleep problems.

The second meta-analysis showed that problematic electronic media use was associated with a significant increase in sleep problems ( P ≤.001). The pooled effect size was r =0.33 (95% CI 0.28-0.38; k =36), indicating that participants with higher levels of problematic electronic media use were more likely to have sleep problems.

Moderator Analyses

First, we conducted subgroup analyses by type of media or device. The results are shown in Tables 1 and 2 . The association between general online gaming and sleep problems was r =0.14 (95% CI 0.06-0.22); between general smartphone use and sleep problems, r =0.33 (95% CI 0.27-0.40); and between general social media use and sleep problems, r =0.28 (95% CI 0.21-0.34). There were significant differences among these groups ( Q between =14.46; P =.001).

The association between problematic gaming and sleep problems was r =0.49 (95% CI 0.23-0.69); between problematic internet use and sleep problems, r =0.51 (95% CI 0.43-0.59); between problematic smartphone use and sleep problems, r =0.25 (95% CI 0.20-0.30); and between problematic social media use and sleep problems, r =0.35 (95% CI 0.29-0.40). There were significant differences among these groups ( Q between =27.37; P <.001).
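For categorical moderators such as media type, heterogeneity between subgroups is summarized by Q between, compared against a chi-square distribution with (number of groups - 1) degrees of freedom. The sketch below computes it from subgroup estimates in the Fisher z metric, approximating each subgroup's standard error from its reported 95% CI. The function names are hypothetical. Because the published CIs are rounded and software may handle within-subgroup tau2 differently, running it on the reported general-use values will not necessarily reproduce Q between=14.46 exactly.

```python
import math


def se_from_ci(lower_r: float, upper_r: float, z_crit: float = 1.96) -> float:
    """Approximate a subgroup's SE in the Fisher z metric from its 95% CI on r."""
    return (math.atanh(upper_r) - math.atanh(lower_r)) / (2 * z_crit)


def q_between(estimates_r, ses_z):
    """Weighted squared deviations of subgroup means around their weighted mean."""
    zs = [math.atanh(r) for r in estimates_r]
    ws = [1.0 / se ** 2 for se in ses_z]
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    return sum(w * (z - z_bar) ** 2 for w, z in zip(ws, zs))


if __name__ == "__main__":
    # General-use subgroups as reported: r, CI lower, CI upper.
    reported = {"online gaming": (0.14, 0.06, 0.22),
                "smartphone": (0.33, 0.27, 0.40),
                "social media": (0.28, 0.21, 0.34)}
    rs = [r for r, _, _ in reported.values()]
    ses = [se_from_ci(lo, hi) for _, lo, hi in reported.values()]
    print(round(q_between(rs, ses), 2))
```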

We also used age, gender, and culture as moderators in meta-regression analyses. The results are shown in Tables 3 and 4 . Only culture (Eastern vs Western) was a significant moderator ( Q between =6.694; P =.01); none of the other analyses were significant.
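For continuous moderators such as mean age or female ratio, a meta-regression fits the Fisher z effect sizes against the moderator with inverse-variance weights. This is a simplified sketch with one moderator: tau2 is passed in as given (an assumption; meta-analysis software usually estimates residual heterogeneity itself), and `meta_regression` is a hypothetical name.

```python
import math


def meta_regression(zs, vs, xs, tau2=0.0):
    """Weighted least squares regression of Fisher z on one moderator.

    Weights are 1 / (v_i + tau2). Returns (intercept, slope, se_slope).
    """
    if not (len(zs) == len(vs) == len(xs)) or len(zs) < 3:
        raise ValueError("need matching inputs for at least three studies")
    ws = [1.0 / (v + tau2) for v in vs]
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw  # weighted mean moderator
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sw  # weighted mean effect
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    if sxx == 0:
        raise ValueError("moderator does not vary across studies")
    sxz = sum(w * (x - x_bar) * (z - z_bar) for w, x, z in zip(ws, xs, zs))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    se_slope = math.sqrt(1.0 / sxx)  # treats the weights as known
    return intercept, slope, se_slope
```

The ratio slope / se_slope can be compared with a standard normal distribution to test whether the moderator matters.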


All funnel plots were symmetrical, showing no evidence of publication bias ( Figures 2 - 5 ). We also conducted p -curve analyses to check for selection bias; these likewise showed no evidence of bias.
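The p-curve check mentioned above rests on a simple idea: if there is no true effect, significant p-values are uniformly distributed below .05, whereas a true effect produces many very small p-values (right skew). The sketch below is a simplified version of the full p-curve right-skew test that combines rescaled p-values with Stouffer's method; it takes p-values directly rather than recomputing them from test statistics, and the function name is hypothetical.

```python
from statistics import NormalDist


def pcurve_right_skew(p_values, alpha=0.05):
    """Simplified Stouffer-style right-skew test from p-curve.

    Only significant results (p < alpha) enter. Each is rescaled to a
    pp-value p / alpha, uniform on (0, 1) under no true effect. Strong right
    skew gives a large negative Z. Returns (Z, one-sided p for right skew).
    """
    sig = [p for p in p_values if 0 < p < alpha]
    if not sig:
        raise ValueError("p-curve needs at least one significant result")
    nd = NormalDist()
    z_sum = sum(nd.inv_cdf(p / alpha) for p in sig)
    z = z_sum / len(sig) ** 0.5
    return z, nd.cdf(z)
```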


Principal Findings

This study indicated that electronic media use was significantly associated with decreased sleep quality and increased sleep problems, with effect sizes varying across subgroups. General use was associated with a significant decrease in sleep quality, and problematic use with a significant increase in sleep problems. The moderator analyses also revealed a significant cultural difference.

First, problematic use and general use differ in their association with sleep quality, with problematic use showing the stronger correlation. However, both correlate positively with sleep problems, suggesting that the deeper the level of use, the more sleep-related issues are observed. How electronic media use is conceptualized and operationalized may affect the results: problematic use is measured with addiction scales, whereas general use is predominantly assessed by duration of use, and these distinct approaches can yield divergent results. Each measurement has its own strengths and weaknesses and may capture different pathways to sleep quality, so the measurement approach should be tailored to the specific research question. The duration of general use reflects an individual’s overall involvement with electronic media, and its impact on sleep quality may be reflected in outcomes such as longer time to fall asleep and shorter sleep duration. The addiction scales used for problematic use capture an individual’s preferences for, dependence on, and other associations with electronic media; their link with sleep quality may operate through physiological and psychological responses, including anxiety, stress, and emotional reactions.

Second, the association with sleep quality varies notably across types of electronic media. For general use, the association with sleep problems was strongest for smartphone use (r =0.33), followed by social media use (r =0.28) and online gaming (r =0.14). For problematic use, problematic internet use (r =0.51) and problematic online gaming (r =0.49) showed the strongest associations with sleep problems, followed by problematic social media use (r =0.35), with problematic smartphone use showing the weakest (r =0.25). On the one hand, comparisons within the same context show that the content and format of electronic media are associated with sleep quality to different degrees, whether use is general or problematic. On the other hand, comparisons across contexts suggest that whether use is general or problematic moderates the association between media type and sleep quality. For example, problematic use strengthens the associations of online gaming and social media with sleep problems but weakens that of smartphones. For smartphones, longer general use is associated with lower sleep quality. In problematic use, however, the smartphone serves as a platform for other electronic media such as games and social media, which may weaken its own predictive effect on sleep quality. Put differently, in the context of problematic use, the specific type of electronic media an individual consumes on a smartphone becomes increasingly important in shaping sleep quality.

Third, cultural differences were found to be significant moderators of the relationship between electronic media use and sleep problems in both our study and Carter et al [ 16 ]. Kristensen et al [ 17 ] did not specifically address the role of cultural differences but found a strong and consistent association between bedtime media device use and sleep outcomes across the included studies. Our findings showed that the association between problematic social media use and sleep problems was significantly stronger in Eastern cultures. We speculate that this difference may reflect cultural differences in social media use patterns, perceptions of social norms and expectations, bedtime routines and habits, and coping mechanisms for stress. These speculations warrant further investigation to better understand the factors underlying the observed cultural differences in the relationship between social media use and sleep quality.

Fourth, neither gender nor age significantly moderated the association between electronic media use and sleep quality. Nevertheless, the negative effects of electronic media use are not confined to adults’ sleep, and the role of gender remains unclear. Recent studies suggest that electronic media use among preschoolers may cause a “time-shifting” process that disrupts their sleep patterns [ 75 ]. Similarly, electronic media use has been reported to adversely affect the sleep patterns of children and adolescents [ 76 - 78 ]. These findings underscore the need to consider age group differences in future research, as electronic media use may affect sleep quality differently across age groups.

In conclusion, our study, Carter et al [ 16 ], and Kristensen et al [ 17 ] collectively emphasize the importance of understanding and addressing the negative impact of electronic media use, particularly problematic online gaming and smartphone use, on sleep quality and related issues. Further research is warranted to explore the underlying mechanisms and specific factors contributing to the relationship between electronic media use and sleep problems.

Strengths and Limitations

Our study, together with research by Carter et al [ 16 ] and Kristensen et al [ 17 ], contributes to the growing evidence of a connection between electronic media use and sleep quality. We found that both general and problematic use of electronic media correlate with sleep problems, with the strength of the correlation varying by type of electronic media and cultural context, whereas age and gender did not significantly moderate these associations.

Despite the vast amount of research on the relationship between electronic media use and sleep, several gaps and limitations still exist.

First, the inclusion criteria were restricted to English-language, peer-reviewed empirical studies published between January 2018 and October 2023. This may have led to the exclusion of relevant studies published in other languages or before 2018, potentially limiting the generalizability of our findings. Furthermore, the exclusion of non–peer-reviewed studies and conference abstracts may have introduced publication bias, as significant results are more likely to be published in peer-reviewed journals.

Second, although we used a comprehensive search strategy, some relevant studies may have been missed. Additionally, the search strategies did not use Medical Subject Headings (MeSH) terms and may not have captured all types of electronic media, resulting in an incomplete representation of the effects of electronic media use on sleep quality.

Third, the studies included in our meta-analysis exhibited considerable heterogeneity in sample characteristics, electronic media types, and measures of sleep quality. This heterogeneity might have contributed to the variability in effect sizes observed across studies. Although we conducted moderator analyses to explore potential sources of heterogeneity, other unexamined factors may still have influenced the relationship between electronic media use and sleep quality.

Fourth, our meta-analysis relied on the correlation coefficient ( r ) as the primary effect size measure, which may not fully capture the complex relationships between electronic media use and sleep quality. Moreover, the conversion of other reported statistics into correlation coefficients could introduce additional sources of error. The correlational nature of the included studies limited our ability to draw causal inferences between electronic media use and sleep quality. Experimental and longitudinal research designs would provide stronger evidence for the directionality of this relationship.

Given these limitations, future research should aim to include a more diverse range of studies, examine additional potential moderators, and use more robust research designs to better understand the complex relationship between electronic media use and sleep quality.

Conclusions

In conclusion, our updated meta-analysis affirms the consistent negative impact of electronic media use on sleep outcomes, with problematic online gaming and smartphone use being particularly impactful. Notably, the negative effect of problematic social media use on sleep quality appears more pronounced in Eastern cultures. This research emphasizes the need for public health initiatives to increase awareness of these impacts, particularly for adolescents. Further research, including experimental and longitudinal studies, is necessary to delve deeper into the complex relationship between electronic media use and sleep quality, considering potential moderators like cultural differences.

Acknowledgments

This research was supported by the Journalism and Marxism Research Center, Renmin University of China (MXG202215), and by funds for building world-class universities (disciplines) of Renmin University of China (23RXW195).

A statement on the use of ChatGPT in the process of writing this paper can be found in Multimedia Appendix 4.

Data Availability

The data sets analyzed during this study are available from the corresponding author on reasonable request.

Conflicts of Interest

None declared.

Multimedia Appendix 1: PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 checklist.

Multimedia Appendix 2: Search strategies.

Multimedia Appendix 3: Characteristics of included studies.

Multimedia Appendix 4: Large language model statement.

  • Brink-Kjaer A, Leary EB, Sun H, Westover MB, Stone KL, Peppard PE, et al. Age estimation from sleep studies using deep learning predicts life expectancy. NPJ Digit Med. 2022;5(1):103.
  • Killgore WDS. Effects of sleep deprivation on cognition. Prog Brain Res. 2010;185:105-129.
  • Lee S, Mu CX, Wallace ML, Andel R, Almeida DM, Buxton OM, et al. Sleep health composites are associated with the risk of heart disease across sex and race. Sci Rep. 2022;12(1):2023.
  • Prather AA. Sleep, stress, and immunity. In: Grandner MA, editor. Sleep and Health, 1st Edition. Cambridge. Academic Press; 2019;319-330.
  • Scott AJ, Webb TL, Martyn-St James M, Rowse G, Weich S. Improving sleep quality leads to better mental health: a meta-analysis of randomised controlled trials. Sleep Med Rev. Dec 2021;60:101556.
  • Guttmann A. Statista. 2023. URL: https://www.statista.com/topics/1536/media-use/#topicOverview [accessed 2023-06-10]
  • Hysing M, Pallesen S, Stormark KM, Jakobsen R, Lundervold AJ, Sivertsen B. Sleep and use of electronic devices in adolescence: results from a large population-based study. BMJ Open. Feb 02, 2015;5(1):e006748.
  • Lavender RM. Electronic media use and sleep quality. Undergrad J Psychol. 2015;28(1):55-62.
  • Exelmans L, Van den Bulck J. Bedtime mobile phone use and sleep in adults. Soc Sci Med. 2016;148:93-101.
  • Twenge JM, Krizan Z, Hisler G. Decreases in self-reported sleep duration among U.S. adolescents 2009-2015 and association with new media screen time. Sleep Med. 2017;39:47-53.
  • Exelmans L. Electronic media use and sleep: a self-control perspective. Curr Sleep Med Rep. 2019;5:135-140.
  • Jniene A, Errguig L, El Hangouche AJ, Rkain H, Aboudrar S, El Ftouh M, et al. Perception of sleep disturbances due to bedtime use of blue light-emitting devices and its impact on habits and sleep quality among young medical students. Biomed Res Int. 2019;2019:7012350.
  • Munezawa T, Kaneita Y, Osaki Y, Kanda H, Minowa M, Suzuki K, et al. The association between use of mobile phones after lights out and sleep disturbances among Japanese adolescents: a nationwide cross-sectional survey. Sleep. 2011;34(8):1013-1020.
  • Smith LJ, Gradisar M, King DL, Short M. Intrinsic and extrinsic predictors of video-gaming behaviour and adolescent bedtimes: the relationship between flow states, self-perceived risk-taking, device accessibility, parental regulation of media and bedtime. Sleep Med. 2017;30:64-70.
  • Alimoradi Z, Lin CY, Broström A, Bülow PH, Bajalan Z, Griffiths MD, et al. Internet addiction and sleep problems: a systematic review and meta-analysis. Sleep Med Rev. 2019;47:51-61.
  • Carter B, Rees P, Hale L, Bhattacharjee D, Paradkar MS. Association between portable screen-based media device access or use and sleep outcomes: a systematic review and meta-analysis. JAMA Pediatr. 2016;170(12):1202-1208.
  • Kristensen JH, Pallesen S, King DL, Hysing M, Erevik EK. Problematic gaming and sleep: a systematic review and meta-analysis. Front Psychiatry. 2021;12:675237.
  • Moher D, Liberati A, Tetzlaff J, Altman DG, PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med. 2009;6(7):e1000097.
  • Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71.
  • Akçay D, Akçay BD. The effect of computer game playing habits of university students on their sleep states. Perspect Psychiatr Care. 2020;56(4):820-826.
  • Alahdal WM, Alsaedi AA, Garrni AS, Alharbi FS. The impact of smartphone addiction on sleep quality among high school students in Makkah, Saudi Arabia. Cureus. 2023;15(6):e40759.
  • Alam A, Alshakhsi S, Al-Thani D, Ali R. The role of objectively recorded smartphone usage and personality traits in sleep quality. PeerJ Comput Sci. 2023;9:e1261.
  • Almeida F, Marques DR, Gomes AA. A preliminary study on the association between social media at night and sleep quality: the relevance of FOMO, cognitive pre-sleep arousal, and maladaptive cognitive emotion regulation. Scand J Psychol. 2023;64(2):123-132.
  • Alshobaili FA, AlYousefi NA. The effect of smartphone usage at bedtime on sleep quality among Saudi non-medical staff at King Saud University Medical City. J Family Med Prim Care. 2019;8(6):1953-1957.
  • Alsulami A, Bakhsh D, Baik M, Merdad M, Aboalfaraj N. Assessment of sleep quality and its relationship to social media use among medical students. Med Sci Educ. 2019;29(1):157-161.
  • Altintas E, Karaca Y, Hullaert T, Tassi P. Sleep quality and video game playing: effect of intensity of video game playing and mental health. Psychiatry Res. 2019;273:487-492.
  • Asbee J, Slavish D, Taylor DJ, Dietch JR. Using a frequentist and Bayesian approach to examine video game usage, substance use, and sleep among college students. J Sleep Res. 2023;32(4):e13844.
  • Bae ES, Kang HS, Lee HN. The mediating effect of sleep quality in the relationship between academic stress and social network service addiction tendency among adolescents. J Korean Acad Community Health Nurs. 2020;31(3):290-299.
  • Chatterjee S, Kar SK. Smartphone addiction and quality of sleep among Indian medical students. Psychiatry. 2021;84(2):182-191.
  • Chung JE, Choi SA, Kim KT, Yee J, Kim JH, Seong JW, et al. Smartphone addiction risk and daytime sleepiness in Korean adolescents. J Paediatr Child Health. 2018;54(7):800-806.
  • Demir YP, Sumer MM. Effects of smartphone overuse on headache, sleep and quality of life in migraine patients. Neurosciences (Riyadh). 2019;24(2):115-121.
  • Dewi RK, Efendi F, Has EMM, Gunawan J. Adolescents' smartphone use at night, sleep disturbance and depressive symptoms. Int J Adolesc Med Health. 2018;33(2):20180095.
  • Eden A, Ellithorpe ME, Meshi D, Ulusoy E, Grady SM. All night long: problematic media use is differentially associated with sleep quality and depression by medium. Commun Res Rep. 2021;38(3):143-149.
  • Ellithorpe ME, Meshi D, Tham SM. Problematic video gaming is associated with poor sleep quality, diet quality, and personal hygiene. Psychol Pop Media. 2023;12(2):248-253.
  • Elsheikh AA, Elsharkawy SA, Ahmed DS. Impact of smartphone use at bedtime on sleep quality and academic activities among medical students at Al -Azhar University at Cairo. J Public Health (Berl.). Jun 15, 2023:1-10.
  • Gaya AR, Brum R, Brites K, Gaya A, de Borba Schneiders L, Duarte Junior MA, et al. Electronic device and social network use and sleep outcomes among adolescents: the EHDLA study. BMC Public Health. 2023;23(1):919.
  • Gezgin DM. Understanding patterns for smartphone addiction: age, sleep duration, social network use and fear of missing out. Cypriot J Educ Sci. 2018;13(2):166-177.
  • Graham S, Mason A, Riordan B, Winter T, Scarf D. Taking a break from social media improves wellbeing through sleep quality. Cyberpsychol Behav Soc Netw. 2021;24(6):421-425.
  • Guerrero MD, Barnes JD, Chaput JP, Tremblay MS. Screen time and problem behaviors in children: exploring the mediating role of sleep duration. Int J Behav Nutr Phys Act. 2019;16(1):105.
  • Hamvai C, Kiss H, Vörös H, Fitzpatrick KM, Vargha A, Pikó BF. Association between impulsivity and cognitive capacity decrease is mediated by smartphone addiction, academic procrastination, bedtime procrastination, sleep insufficiency and daytime fatigue among medical students: a path analysis. BMC Med Educ. 2023;23(1):537.
  • Herlache AD, Lang KM, Krizan Z. Withdrawn and wired: problematic internet use accounts for the link of neurotic withdrawal to sleep disturbances. Sleep Sci. 2018;11(2):69-73.
  • Huang Q, Li Y, Huang S, Qi J, Shao T, Chen X, et al. Smartphone use and sleep quality in chinese college students: a preliminary study. Front Psychiatry. 2020;11:352.
  • Hussain Z, Griffiths MD. The associations between problematic social networking site use and sleep quality, attention-deficit hyperactivity disorder, depression, anxiety and stress. Int J Ment Health Addict. 2021;19:686-700.
  • Imani V, Ahorsu DK, Taghizadeh N, Parsapour Z, Nejati B, Chen HP, et al. The mediating roles of anxiety, depression, sleepiness, insomnia, and sleep quality in the association between problematic social media use and quality of life among patients with cancer. Healthcare (Basel). 2022;10(9):1745.
  • Jeong CY, Seo YS, Cho EH. The effect of SNS addiction tendency on trait-anxiety and quality of sleep in university students'. J Korean Clin Health Sci. 2018;6(2):1147-1155.
  • Karaş H, Küçükparlak İ, Özbek MG, Yılmaz T. Addictive smartphone use in the elderly: relationship with depression, anxiety and sleep quality. Psychogeriatrics. 2023;23(1):116-125.
  • Kater MJ, Schlarb AA. Smartphone usage in adolescents: motives and link to sleep disturbances, stress and sleep reactivity. Somnologie. 2020;24(4):245-252.
  • Kharisma AC, Fitryasari R, Rahmawati PD. Online games addiction and the decline in sleep quality of college student gamers in the online game communities in Surabaya, Indonesia. Int J Psychosoc Rehabil. 2020;24(7):8987-8993.
  • Kumar VA, Chandrasekaran V, Brahadeeswari H. Prevalence of smartphone addiction and its effects on sleep quality: a cross-sectional study among medical students. Ind Psychiatry J. 2019;28(1):82-85.
  • Lee Y, Blebea J, Janssen F, Domoff SE. The impact of smartphone and social media use on adolescent sleep quality and mental health during the COVID-19 pandemic. Hum Behav Emerg Technol. 2023;2023:3277040.
  • Li L, Griffiths MD, Mei S, Niu Z. Fear of missing out and smartphone addiction mediates the relationship between positive and negative affect and sleep quality among Chinese university students. Front Psychiatry. 2020;11:877.
  • Li Y, Mu W, Sun C, Kwok SYCL. Surrounded by smartphones: relationship between peer phubbing, psychological distress, problematic smartphone use, daytime sleepiness, and subjective sleep quality. Appl Res Qual Life. 2023;18:1099-1114.
  • Luo X, Hu C. Loneliness and sleep disturbance among first-year college students: the sequential mediating effect of attachment anxiety and mobile social media dependence. Psychol Sch. 2022;59(9):1776-1789.
  • Luqman A, Masood A, Shahzad F, Shahbaz M, Feng Y. Untangling the adverse effects of late-night usage of smartphone-based SNS among university students. Behav Inf Technol. 2021;40(15):1671-1687.
  • Makhfudli, Aulia A, Pratiwi A. Relationship intensity of social media use with quality of sleep, social interaction, and self-esteem in urban adolescents in Surabaya. Sys Rev Pharm. 2020;11(5):783-788. [ CrossRef ]
  • Ozcan B, Acimis NM. Sleep quality in Pamukkale university students and its relationship with smartphone addiction. Pak J Med Sci. 2021;37(1):206-211. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Peltz JS, Bodenlos JS, Kingery JN, Abar C. Psychological processes linking problematic smartphone use to sleep disturbance in young adults. Sleep Health. 2023;9(4):524-531. [ CrossRef ] [ Medline ]
  • Pérez-Chada D, Bioch SA, Schönfeld D, Gozal D, Perez-Lloret S, Sleep in Adolescents Collaborative Study Group. Screen use, sleep duration, daytime somnolence, and academic failure in school-aged adolescents. PLoS One. 2023;18(2):e0281379. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Przepiorka A, Blachnio A. The role of Facebook intrusion, depression, and future time perspective in sleep problems among adolescents. J Res Adolesc. 2020;30(2):559-569. [ CrossRef ] [ Medline ]
  • Rudolf K, Bickmann P, Froböse I, Tholl C, Wechsler K, Grieben C. Demographics and health behavior of video game and eSports players in Germany: the eSports study 2019. Int J Environ Res Public Health. 2020;17(6):1870. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Sami H, Danielle L, Lihi D, Elena S. The effect of sleep disturbances and internet addiction on suicidal ideation among adolescents in the presence of depressive symptoms. Psychiatry Res. 2018;267:327-332. [ CrossRef ] [ Medline ]
  • Scott H, Woods HC. Fear of missing out and sleep: cognitive behavioural factors in adolescents' nighttime social media use. J Adolesc. 2018;68:61-65. [ CrossRef ] [ Medline ]
  • Spagnoli P, Balducci C, Fabbri M, Molinaro D, Barbato G. Workaholism, intensive smartphone use, and the sleep-wake cycle: a multiple mediation analysis. Int J Environ Res Public Health. 2019;16(19):3517. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Stanković M, Nešić M, Čičević S, Shi Z. Association of smartphone use with depression, anxiety, stress, sleep quality, and internet addiction. empirical evidence from a smartphone application. Pers Individ Differ. Jan 2021;168:110342. [ CrossRef ]
  • Tandon A, Kaur P, Dhir A, Mäntymäki M. Sleepless due to social media? investigating problematic sleep due to social media and social media sleep hygiene. Comput Hum Behav. Dec 2020;113:106487. [ FREE Full text ] [ CrossRef ]
  • Wang PY, Chen KL, Yang SY, Lin PH. Relationship of sleep quality, smartphone dependence, and health-related behaviors in female junior college students. PLoS One. 2019;14(4):e0214769. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Wang Q, Zhong Y, Zhao G, Song R, Zeng C. Relationship among content type of smartphone use, technostress, and sleep difficulty: a study of university students in China. Educ Inf Technol. Aug 02, 2022;28(2):1697-1714. [ CrossRef ]
  • Wong HY, Mo HY, Potenza MN, Chan MNM, Lau WM, Chui TK, et al. Relationships between severity of internet gaming disorder, severity of problematic social media use, sleep quality and psychological distress. Int J Environ Res Public Health. 2020;17(6):1879. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Xie X, Dong Y, Wang J. Sleep quality as a mediator of problematic smartphone use and clinical health symptoms. J Behav Addict. 2018;7(2):466-472. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Yang SY, Chen KL, Lin PH, Wang PY. Relationships among health-related behaviors, smartphone dependence, and sleep duration in female junior college students. Soc Health Behav. 2019;2(1):26-31. [ FREE Full text ] [ CrossRef ]
  • Yıldırım M, Öztürk A, Solmaz F. Fear of COVID-19 and sleep problems in Turkish young adults: mediating roles of happiness and problematic social networking sites use. Psihologija. 2023;56(4):497-515. [ FREE Full text ] [ CrossRef ]
  • Zhai X, Ye M, Wang C, Gu Q, Huang T, Wang K, et al. Associations among physical activity and smartphone use with perceived stress and sleep quality of Chinese college students. Mental Health and Physical Activity. Mar 2020;18:100323. [ CrossRef ]
  • Zhang MX, Wu AMS. Effects of smartphone addiction on sleep quality among Chinese university students: the mediating role of self-regulation and bedtime procrastination. Addict Behav. 2020;111:106552. [ CrossRef ] [ Medline ]
  • Zhang MX, Zhou H, Yang HM, Wu AMS. The prospective effect of problematic smartphone use and fear of missing out on sleep among Chinese adolescents. Curr Psychol. May 24, 2021;42(7):5297-5305. [ CrossRef ]
  • Beyens I, Nathanson AI. Electronic media use and sleep among preschoolers: evidence for time-shifted and less consolidated sleep. Health Commun. 2019;34(5):537-544. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Mazurek MO, Engelhardt CR, Hilgard J, Sohl K. Bedtime electronic media use and sleep in children with autism spectrum disorder. J Dev Behav Pediatr. 2016;37(7):525-531. [ CrossRef ] [ Medline ]
  • King DL, Delfabbro PH, Zwaans T, Kaptsis D. Sleep interference effects of pathological electronic media use during adolescence. Int J Ment Health Addict. 2014;12:21-35. [ CrossRef ]
  • Kubiszewski V, Fontaine R, Rusch E, Hazouard E. Association between electronic media use and sleep habits: an eight-day follow-up study. Int J Adolesc Youth. 2013;19(3):395-407. [ FREE Full text ] [ CrossRef ]

Edited by G Eysenbach, T Leung; submitted 20.04.23; peer-reviewed by M Behzadifar, F Estévez-López, R Prieto-Moreno; comments to author 18.05.23; revised version received 15.06.23; accepted 26.03.24; published 23.04.24.

©Xiaoning Han, Enze Zhou, Dong Liu. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 23.04.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.

COMMENTS

  1. How to conduct a meta-analysis in eight steps: a practical guide

    Similar to conducting a literature review, the search process of a meta-analysis should be systematic ... Cunha PV (2009) A review and evaluation of meta-analysis practices in management research. J Manag 35(2):393-419. Glass GV (2015) Meta-analysis at middle age: a personal history. Res Synth Methods 6(3):221-231 ...

  2. Introduction to systematic review and meta-analysis

    A systematic review collects all possible studies related to a given topic and design, and reviews and analyzes their results [ 1 ]. During the systematic review process, the quality of studies is evaluated, and a statistical meta-analysis of the study results is conducted on the basis of their quality. A meta-analysis is a valid, objective ...

  3. Meta‐analysis and traditional systematic literature reviews—What, why

    Review Manager (RevMan) is web-based software that manages the entire literature review and meta-analysis process. The meta-analyst uploads all studies to the RevMan library, where they can be managed and examined for inclusion. Like CMA, RevMan enables authors to conduct overall analysis and moderator analysis.

  4. PDF How to conduct a meta-analysis in eight steps: a practical guide

    ... meta-analyses. 2.2 Step 2: literature search. 2.2.1 Search strategies. Similar to conducting a literature review, the search process of a meta-analysis should be systematic, reproducible, and transparent, resulting in a sample that includes all relevant studies (Fisch and Block 2018; Gusenbauer and Haddaway 2020).

  5. Systematic Reviews and Meta-Analysis: A Guide for Beginners

    Meta-analysis is a statistical tool that provides pooled estimates of effect from the data extracted from individual studies in the systematic review. The graphical output of meta-analysis is a forest plot which provides information on individual studies and the pooled effect. Systematic reviews of literature can be undertaken for all types of ...

  6. How to Review a Meta-analysis

    Meta-analysis is a systematic review of a focused topic in the literature that provides a quantitative estimate for the effect of a treatment intervention or exposure. The key to designing a high quality meta-analysis is to identify an area where the effect of the treatment or exposure is uncertain and where a relatively homogeneous body of ...

  7. Systematic Reviews and Meta Analysis

    It may take several weeks to complete and run a search. Moreover, all guidelines for carrying out systematic reviews recommend that at least two subject experts screen the studies identified in the search. The first round of screening can consume 1 hour per screener for every 100-200 records. A systematic review is a labor-intensive team effort.

  8. A Simple Guide to Systematic Reviews and Meta-Analyses

    Systematic reviews and meta-analyses lie at the top of the evidence hierarchy because they utilize explicit methods for literature search and retrieval of studies relevant to the review question ... Mani R. A systematic review and meta-analysis of nutritional supplementation in chronic lower extremity wounds. Int J Low Extrem Wounds. 2016;15(4 ...

  9. Meta-analysis of social science research: A practitioner's guide

    That is, you should have conducted primary research on the topic, written a detailed narrative literature review, or taught extensively on the subject. ... 4 Note that the term "fixed effects" has a different meaning in econometrics and much of the meta-analysis literature. In econometrics, a fixed-effects panel data model denotes a ...

  10. Methodological Guidance Paper: High-Quality Meta-Analysis in a

    This methodological guidance article is focused on the use of meta-analysis in a systematic review. A prior article in this series, Alexander (in press) discusses the art and science of all systematic reviews with an emphasis on the importance of the literature search, coding, and results interpretation. Systematic reviews analyze and synthesize a body of literature in a logical, transparent ...

  11. A step by step guide for conducting a systematic review and meta

    Detailed steps for conducting any systematic review and meta-analysis. We searched the methods reported in published SR/MAs in tropical medicine and other healthcare fields, in addition to published guidelines such as the Cochrane guidelines (Higgins, 2011), to identify the lowest-bias method for each step of conducting an SR/MA. Furthermore, we used guidelines that we apply in studies for all SR ...

  12. Systematic Reviews and Meta Analysis

    PRISMA-P is a 17-item checklist for elements considered essential in a protocol for a systematic review or meta-analysis. The documentation contains an excellent rationale for completing a protocol, too. Use PRISMA-ScR, a 20-item checklist, for reporting scoping reviews. The documentation provides a clear overview of scoping reviews.

  13. Systematic Reviews and Meta-Analysis

    The course provides a general overview of all aspects of a scientific literature review, including formulating a problem, finding the relevant literature, coding studies, and meta-analysis. It follows guidelines and standards developed by the Campbell Collaboration, based on empirical evidence about how to produce the most comprehensive and ...

  14. (PDF) Literature Reviews and Meta Analysis

    Literature Reviews and Meta Analysis. January 2010. DOI: 10.1007/978-0-387-09757-2_18. In book: Handbook of Clinical Psychology Competencies. Publisher: Springer New York. Editors: Jay C. Thomas ...

  15. Systematic reviews vs meta-analysis: what's the difference?

    A systematic review is an article that synthesizes available evidence on a certain topic utilizing a specific research question, pre-specified eligibility criteria for including articles, and a systematic method for its production. A meta-analysis, by contrast, is a quantitative, epidemiological study design used to assess the results of articles ...

  16. A Guide to Conducting a Meta-Analysis

    Abstract. Meta-analysis is widely accepted as the preferred method to synthesize research findings in various disciplines. This paper provides an introduction to when and how to conduct a meta-analysis. Several practical questions, such as advantages of meta-analysis over conventional narrative review and the number of studies required for a ...

  17. Literature Reviews and Meta Analysis

    The overall goals of a meta-analysis are the same as any review which were noted earlier (i.e., critically evaluate and summarize a body of research, reach some conclusions about that research, and offer suggestions for future work). The unique feature of a meta-analysis is its ability to quantify the magnitude of the findings via the effect size.

  18. Ten simple rules for carrying out and writing meta-analyses

    Rule 1: Specify the topic and type of the meta-analysis. Considering that a systematic review [ 10] is fundamental for a meta-analysis, you can use the Population, Intervention, Comparison, Outcome (PICO) model to formulate the research question. It is important to verify that there are no published meta-analyses on the specific topic in order ...

  19. What is a Systematic Review and Meta-Analysis

    A meta-analysis is the use of statistical methods to summarize the results of a systematic review. Not every systematic review contains a meta-analysis. A meta-analysis may not be appropriate if the designs of the studies are too different, if there are concerns about the quality of studies, if the outcomes measured are not sufficiently similar ...

  20. Literature Review, Systematic Review and Meta-analysis

    Meta-analysis is a specialised type of systematic review which is quantitative and rigorous, often comparing data and results across multiple similar studies. This is a common approach in medical research where several papers might report the results of trials of a particular treatment, for instance. The meta-analysis then statistical ...

  21. Surgery is associated with better long-term outcomes than

    The following articles were included in the systematic review and meta-analysis [15-20]. In total, there were 427 participants. All studies were RCTs.

  22. Frequency, complications, and mortality of inhalation injury in burn

    This is a systematic literature review and meta-analysis conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. PubMed/MEDLINE, Embase, LILACS/VHL, Scopus, Web of Science, and CINAHL databases will be consulted without restrictions on language or publication date. Studies presenting incomplete data and ...

  23. Effects of Chronic Static Stretching on Maximal Strength and Muscle

    A review of the literature. Clin Physiol Funct Imaging. 2020;40:148-56. Article PubMed Google Scholar Panidi I, Donti O, Konrad A, Petros CD, Terzis G, Mouratidis A et al. Muscle architecture adaptations to static stretching training: a systematic review with meta-analysis. Sports Med Open. 2023;9.

  24. Understanding and Evaluating Systematic Reviews and Meta-analyses

    Keywords: Bias, meta-analysis, number needed to treat, publication bias, randomized controlled trials, systematic review. Introduction: A systematic review is a summary of existing evidence that answers a specific clinical question, contains a thorough, unbiased search of the relevant literature, explicit criteria for assessing studies and ...

  25. Mortality burden of pre-treatment weight loss in patients with non

    Cachexia, with weight loss (WL) as a major component, is highly prevalent in patients with cancer and indicates a poor prognosis. The primary objective of this study was to conduct a meta-analysis to estimate the risk of mortality associated with cachexia (using established WL criteria prior to treatment initiation) in patients with non-small-cell lung cancer (NSCLC) in studies identified ...

  26. Understanding the influence of different proxy perspectives in

    Understanding the influence of different proxy perspectives in explaining the difference between self-rated and proxy-rated quality of life in people living with dementia: a systematic literature review and meta-analysis. Review; Open access; Published: 24 April 2024 (2024)

  27. Frontiers

    By performing a meta-analysis, we aimed to evaluate AI accuracy and provide evidence for its clinical application and role in decision making. Materials and methods: This combined systematic review and meta-analysis was based on the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines.

  28. Systematic Literature Reviews and Meta-Analyses

    Systematic literature reviews and meta-analyses enable the research findings and treatment effects obtained in different individual studies to be summed up and evaluated. Keywords: literature search, systematic review, meta-analysis, clinical research, epidemiology. Every year, there is a great increase in the number of scientific publications.

  29. Electronic Media Use and Sleep Quality: Updated Systematic Review and

    Background: This paper explores the widely discussed relationship between electronic media use and sleep quality, indicating negative effects due to various factors. However, existing meta-analyses on the topic have some limitations. Objective: The study aims to analyze and compare the impacts of different digital media types, such as smartphones, online games, and social media, on sleep quality.
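Several snippets above (for example, "Literature Reviews and Meta Analysis") note that what sets a meta-analysis apart is its ability to quantify findings via an effect size. As a minimal sketch, assuming a continuous outcome compared between two groups, the Python below computes Hedges' g, a bias-corrected standardized mean difference, together with the large-sample variance approximation commonly given in meta-analysis textbooks. The function name `hedges_g` and the example numbers are hypothetical illustrations, not code or data from any study listed here.

```python
import math


def hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Return Hedges' g (bias-corrected standardized mean difference) and its variance.

    Group 1 is treated as the "exposed" or "intervention" group, so a positive g
    means group 1 scored higher than group 2.
    """
    df = n1 + n2 - 2
    # Pooled within-group standard deviation.
    sd_pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (mean1 - mean2) / sd_pooled          # Cohen's d
    j = 1 - 3 / (4 * df - 1)                 # small-sample correction factor J
    g = j * d
    # Large-sample approximation to the sampling variance of d, rescaled by J^2.
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    var_g = j ** 2 * var_d
    return g, var_g


# Hypothetical study: group means differ by half a pooled SD, 50 per group.
g, var_g = hedges_g(6.5, 2.0, 50, 5.5, 2.0, 50)
print(round(g, 3), round(var_g, 4))
```

In this hypothetical example Cohen's d is exactly 0.5, and the correction factor shrinks it slightly to about 0.496; the correction matters most for small studies, where d tends to overstate the true effect.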
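The snippets above describe meta-analysis as producing pooled estimates of effect ("Systematic Reviews and Meta-Analysis: A Guide for Beginners"), refer to fixed-effects terminology ("Meta-analysis of social science research: A practitioner's guide"), and warn that pooling may be inappropriate when studies differ too much ("What is a Systematic Review and Meta-Analysis"). The following is a minimal Python sketch, assuming each study already supplies an effect size and its sampling variance, of inverse-variance fixed-effect pooling, the DerSimonian-Laird random-effects estimator, and the Cochran's Q and I² heterogeneity statistics. The function name `pool_effects` and the input values are hypothetical; dedicated software such as RevMan, mentioned above, or the R package metafor offers more complete and validated implementations.

```python
import math


def pool_effects(effects, variances, z=1.96):
    """Pool per-study effect sizes with inverse-variance weights.

    effects   -- per-study effect sizes (e.g. Hedges' g or log odds ratios)
    variances -- matching within-study sampling variances
    z         -- normal quantile for the confidence interval (1.96 -> 95%)

    Returns fixed-effect and DerSimonian-Laird random-effects estimates with
    their confidence intervals, plus Cochran's Q, tau^2 and I^2 (in percent).
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")

    # Fixed-effect (common-effect) model: weight each study by 1 / variance.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_est = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    se_fixed = math.sqrt(1.0 / sum_w)

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (yi - fixed_est) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    # I^2: share of total variation attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # DerSimonian-Laird moment estimate of between-study variance, floored at 0.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random-effects model: add tau^2 to each within-study variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    sum_w_re = sum(w_re)
    random_est = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum_w_re
    se_random = math.sqrt(1.0 / sum_w_re)

    return {
        "fixed": fixed_est,
        "fixed_ci": (fixed_est - z * se_fixed, fixed_est + z * se_fixed),
        "random": random_est,
        "random_ci": (random_est - z * se_random, random_est + z * se_random),
        "Q": q,
        "tau2": tau2,
        "I2": i2,
    }


# Two hypothetical studies with equal precision but very different effects.
result = pool_effects([0.0, 1.0], [0.01, 0.01])
print(result["fixed"], result["Q"], result["I2"], result["tau2"])
```

With these two hypothetical studies both models give a pooled estimate of 0.5, but Q = 50 and I² = 98%, so the random-effects interval is several times wider than the fixed-effect one. That widening is the practical consequence of acknowledging between-study heterogeneity, and a very high I² is also a signal to ask whether pooling is appropriate at all.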
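The workload figure quoted in "Systematic Reviews and Meta Analysis" (about 1 hour per screener for every 100-200 records, with at least two screeners) is simple arithmetic that a review team can adapt to its own search yield. The short Python sketch below applies it; the function name `screening_hours` and the 3,000-record figure are hypothetical, and the result covers only the first screening round.

```python
def screening_hours(n_records, n_screeners=2, records_per_hour=(100, 200)):
    """Estimate person-hours for a first-round title/abstract screen.

    Every record is screened independently by each screener. The rate range
    defaults to 100-200 records per hour per screener.
    Returns (low, high) person-hours: the fast rate gives the low bound.
    """
    slow_rate, fast_rate = records_per_hour
    low = n_records / fast_rate * n_screeners
    high = n_records / slow_rate * n_screeners
    return low, high


# Hypothetical search yield of 3,000 records after de-duplication.
print(screening_hours(3000))  # (30.0, 60.0)
```

For 3,000 records and two independent screeners this gives 30 to 60 person-hours before full-text review even begins, which illustrates why the snippet calls a systematic review a labor-intensive team effort.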