Systematic Reviews and Meta Analysis

  • Getting Started
  • Guides and Standards
  • Review Protocols
  • Databases and Sources
  • Randomized Controlled Trials
  • Controlled Clinical Trials
  • Observational Designs
  • Tests of Diagnostic Accuracy
  • Software and Tools
  • Where do I get all those articles?
  • Collaborations
  • EPI 233/528
  • Countway Mediated Search
  • Risk of Bias (RoB)

Systematic review Q & A

What is a systematic review?

A systematic review is a guided filtering and synthesis of all available evidence addressing a specific, focused research question, generally about a specific intervention or exposure. The use of standardized, systematic methods and pre-selected eligibility criteria reduces the risk of bias in identifying, selecting, and analyzing relevant studies. A well-designed systematic review includes clear objectives, pre-selected criteria for identifying eligible studies, an explicit methodology, a thorough and reproducible search of the literature, an assessment of the validity or risk of bias of each included study, and a systematic synthesis, analysis and presentation of the findings of the included studies. A systematic review may include a meta-analysis.

For details about carrying out systematic reviews, see the Guides and Standards section of this guide.

Is my research topic appropriate for systematic review methods?

A systematic review is best deployed to test a specific hypothesis about a healthcare or public health intervention or exposure. By focusing on a single intervention or a few specific interventions for a particular condition, the investigator can ensure a manageable results set. Moreover, examining a single intervention or a small set of related interventions, exposures, or outcomes will simplify the assessment of studies and the synthesis of the findings.

Systematic reviews are poor tools for hypothesis generation: for instance, to determine what interventions have been used to increase the awareness and acceptability of a vaccine, or to investigate the ways that predictive analytics have been used in health care management. In the first case, we don't know what interventions to search for and so have to screen all the articles about awareness and acceptability. In the second, there is no agreed-upon set of methods that makes up predictive analytics, and health care management is far too broad. The search will necessarily be incomplete, vague and very large all at the same time. In most cases, reviews without clearly and exactly specified populations, interventions, exposures, and outcomes will produce results sets that quickly outstrip the resources of a small team and offer no consistent way to assess and synthesize findings from the studies that are identified.

If not a systematic review, then what?

You might consider performing a scoping review. This framework allows iterative searching over a reduced number of data sources and imposes no requirement to assess individual studies for risk of bias. The framework includes built-in mechanisms to adjust the analysis as the work progresses and more is learned about the topic. A scoping review won't help you limit the number of records you'll need to screen (broad questions lead to large results sets) but may give you a means of dealing with a large set of results.

This tool can help you decide what kind of review is right for your question.

Can my student complete a systematic review during her summer project?

Probably not. Systematic reviews are a lot of work. Between creating the protocol, building and running a quality search, collecting all the papers, evaluating the studies that meet the inclusion criteria, and extracting and analyzing the summary data, a well-done review can require dozens to hundreds of hours of work spanning several months. Moreover, a systematic review requires subject expertise, statistical support and a librarian to help design and run the search. Be aware that librarians sometimes have queues for their search time; it may take several weeks to complete and run a search. In addition, all guidelines for carrying out systematic reviews recommend that at least two subject experts screen the studies identified in the search. The first round of screening alone can consume one hour per screener for every 100-200 records. A systematic review is a labor-intensive team effort.

How can I know if my topic has been reviewed already?

Before starting out on a systematic review, check to see if someone has done it already. In PubMed you can use the systematic review subset to limit your results to a broad group of papers that is enriched for systematic reviews. You can invoke the subset by selecting it from the Article Types filters to the left of your PubMed results, or by appending AND systematic[sb] to your search. For example:

"neoadjuvant chemotherapy" AND systematic[sb]

The systematic review subset is very noisy, however. To quickly focus on systematic reviews (knowing that you may be missing some), simply search for the word systematic in the title:

"neoadjuvant chemotherapy" AND systematic[ti]

Any PRISMA-compliant systematic review will be captured by this method since including the words "systematic review" in the title is a requirement of the PRISMA checklist. Cochrane systematic reviews do not include 'systematic' in the title, however. It's worth checking the Cochrane Database of Systematic Reviews independently.

You can also search for protocols, which will indicate that another group has set out on a similar project. Many investigators register their protocols in PROSPERO, a registry of review protocols. Other published protocols, as well as Cochrane Review protocols, appear in the Cochrane Methodology Register, a part of the Cochrane Library.

  • Last Updated: Feb 26, 2024 3:17 PM
  • URL: https://guides.library.harvard.edu/meta-analysis


Cochrane Training

Chapter 10: Analysing data and undertaking meta-analyses

Jonathan J Deeks, Julian PT Higgins, Douglas G Altman; on behalf of the Cochrane Statistical Methods Group

Key Points:

  • Meta-analysis is the statistical combination of results from two or more separate studies.
  • Potential advantages of meta-analyses include an improvement in precision, the ability to answer questions not posed by individual studies, and the opportunity to settle controversies arising from conflicting claims. However, they also have the potential to mislead seriously, particularly if specific study designs, within-study biases, variation across studies, and reporting biases are not carefully considered.
  • It is important to be familiar with the type of data (e.g. dichotomous, continuous) that result from measurement of an outcome in an individual study, and to choose suitable effect measures for comparing intervention groups.
  • Most meta-analysis methods are variations on a weighted average of the effect estimates from the different studies.
  • Studies with no events contribute no information about the risk ratio or odds ratio. For rare events, the Peto method has been observed to be less biased and more powerful than other methods.
  • Variation across studies (heterogeneity) must be considered, although most Cochrane Reviews do not have enough studies to allow for the reliable investigation of its causes. Random-effects meta-analyses allow for heterogeneity by assuming that underlying effects follow a normal distribution, but they must be interpreted carefully. Prediction intervals from random-effects meta-analyses are a useful device for presenting the extent of between-study variation.
  • Many judgements are required in the process of preparing a meta-analysis. Sensitivity analyses should be used to examine whether overall findings are robust to potentially influential decisions.

Cite this chapter as: Deeks JJ, Higgins JPT, Altman DG (editors). Chapter 10: Analysing data and undertaking meta-analyses. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.4 (updated August  2023). Cochrane, 2023. Available from www.training.cochrane.org/handbook .

10.1 Do not start here!

It can be tempting to jump prematurely into a statistical analysis when undertaking a systematic review. The production of a diamond at the bottom of a plot is an exciting moment for many authors, but results of meta-analyses can be very misleading if suitable attention has not been given to formulating the review question; specifying eligibility criteria; identifying and selecting studies; collecting appropriate data; considering risk of bias; planning intervention comparisons; and deciding what data would be meaningful to analyse. Review authors should consult the chapters that precede this one before a meta-analysis is undertaken.

10.2 Introduction to meta-analysis

An important step in a systematic review is the thoughtful consideration of whether it is appropriate to combine the numerical results of all, or perhaps some, of the studies. Such a meta-analysis yields an overall statistic (together with its confidence interval) that summarizes the effectiveness of an experimental intervention compared with a comparator intervention. Potential advantages of meta-analyses include the following:

  • To improve precision. Many studies are too small to provide convincing evidence about intervention effects in isolation. Estimation is usually improved when it is based on more information.
  • To answer questions not posed by the individual studies . Primary studies often involve a specific type of participant and explicitly defined interventions. A selection of studies in which these characteristics differ can allow investigation of the consistency of effect across a wider range of populations and interventions. It may also, if relevant, allow reasons for differences in effect estimates to be investigated.
  • To settle controversies arising from apparently conflicting studies or to generate new hypotheses . Statistical synthesis of findings allows the degree of conflict to be formally assessed, and reasons for different results to be explored and quantified.

Of course, the use of statistical synthesis methods does not guarantee that the results of a review are valid, any more than it does for a primary study. Moreover, like any tool, statistical methods can be misused.

This chapter describes the principles and methods used to carry out a meta-analysis for a comparison of two interventions for the main types of data encountered. The use of network meta-analysis to compare more than two interventions is addressed in Chapter 11. Formulae for most of the methods described are provided in the RevMan Web Knowledge Base under Statistical Algorithms and calculations used in Review Manager (documentation.cochrane.org/revman-kb/statistical-methods-210600101.html), and a longer discussion of many of the issues is available (Deeks et al 2001).

10.2.1 Principles of meta-analysis

The commonly used methods for meta-analysis follow these basic principles:

  • Meta-analysis is typically a two-stage process. In the first stage, a summary statistic is calculated for each study, to describe the observed intervention effect in the same way for every study. For example, the summary statistic may be a risk ratio if the data are dichotomous, or a difference between means if the data are continuous (see Chapter 6 ).


  • The combination of intervention effect estimates across studies may optionally incorporate an assumption that the studies are not all estimating the same intervention effect, but estimate intervention effects that follow a distribution across studies. This is the basis of a random-effects meta-analysis (see Section 10.10.4 ). Alternatively, if it is assumed that each study is estimating exactly the same quantity, then a fixed-effect meta-analysis is performed.
  • The standard error of the summary intervention effect can be used to derive a confidence interval, which communicates the precision (or uncertainty) of the summary estimate; and to derive a P value, which communicates the strength of the evidence against the null hypothesis of no intervention effect.
  • As well as yielding a summary quantification of the intervention effect, all methods of meta-analysis can incorporate an assessment of whether the variation among the results of the separate studies is compatible with random variation, or whether it is large enough to indicate inconsistency of intervention effects across studies (see Section 10.10 ).
  • The problem of missing data is one of the numerous practical considerations that must be thought through when undertaking a meta-analysis. In particular, review authors should consider the implications of missing outcome data from individual participants (due to losses to follow-up or exclusions from analysis) (see Section 10.12 ).
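The confidence interval and P value described in the principles above can be sketched directly from a summary estimate and its standard error. The numbers here are invented for illustration (a summary log odds ratio of -0.35 with standard error 0.12):

```python
# Deriving a 95% CI and two-sided P value from a summary estimate and
# its standard error. log_or and se are invented illustration values.
import math

log_or = -0.35   # summary log odds ratio (assumed)
se = 0.12        # its standard error (assumed)

# 95% confidence interval on the log scale, back-transformed to the OR scale
z = 1.96
ci_low = math.exp(log_or - z * se)
ci_high = math.exp(log_or + z * se)

# Two-sided P value against the null hypothesis of no effect,
# using the standard normal distribution
z_stat = log_or / se
p = 2 * (1 - 0.5 * (1 + math.erf(abs(z_stat) / math.sqrt(2))))

print(round(math.exp(log_or), 3), round(ci_low, 3), round(ci_high, 3))
```

Here the summary odds ratio is about 0.70 with a 95% CI of roughly 0.56 to 0.89, and the P value is well below 0.05, illustrating how the standard error drives both the interval width and the strength of evidence.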

Meta-analyses are usually illustrated using a forest plot. An example appears in Figure 10.2.a. A forest plot displays effect estimates and confidence intervals for both individual studies and meta-analyses (Lewis and Clarke 2001). Each study is represented by a block at the point estimate of intervention effect, with a horizontal line extending on either side of the block. The area of the block indicates the weight assigned to that study in the meta-analysis, while the horizontal line depicts the confidence interval (usually at the 95% level of confidence). The area of the block and the confidence interval convey similar information, but each makes a different contribution to the graphic. The confidence interval depicts the range of intervention effects compatible with the study's result. The size of the block draws the eye towards the studies with larger weight (usually those with narrower confidence intervals), which dominate the calculation of the summary result, presented as a diamond at the bottom.

Figure 10.2.a Example of a forest plot from a review of interventions to promote ownership of smoke alarms (DiGuiseppi and Higgins 2001). Reproduced with permission of John Wiley & Sons


10.3 A generic inverse-variance approach to meta-analysis

A very common and simple version of the meta-analysis procedure is commonly referred to as the inverse-variance method . This approach is implemented in its most basic form in RevMan, and is used behind the scenes in many meta-analyses of both dichotomous and continuous data.

The inverse-variance method is so named because the weight given to each study is chosen to be the inverse of the variance of the effect estimate (i.e. 1 over the square of its standard error). Thus, larger studies, which have smaller standard errors, are given more weight than smaller studies, which have larger standard errors. This choice of weights minimizes the imprecision (uncertainty) of the pooled effect estimate.

10.3.1 Fixed-effect method for meta-analysis

A fixed-effect meta-analysis using the inverse-variance method calculates a weighted average as:

\[
\hat{\theta} \;=\; \frac{\sum_{i} Y_i / \mathrm{SE}_i^{2}}{\sum_{i} 1 / \mathrm{SE}_i^{2}},
\qquad
\mathrm{SE}\big(\hat{\theta}\big) \;=\; \frac{1}{\sqrt{\sum_{i} 1 / \mathrm{SE}_i^{2}}}
\]

where Y i is the intervention effect estimated in the i th study, SE i is the standard error of that estimate, and the summation is across all studies. The basic data required for the analysis are therefore an estimate of the intervention effect and its standard error from each study. A fixed-effect meta-analysis is valid under an assumption that all effect estimates are estimating the same underlying intervention effect, which is referred to variously as a ‘fixed-effect’ assumption, a ‘common-effect’ assumption or an ‘equal-effects’ assumption. However, the result of the meta-analysis can be interpreted without making such an assumption (Rice et al 2018).
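The weighted average just described can be computed in a few lines. This is a minimal sketch; the study estimates (e.g. log odds ratios) and standard errors below are invented for illustration:

```python
# Fixed-effect inverse-variance pooling: a minimal sketch.
# Y and SE are invented study-level effect estimates and standard errors.
import math

Y  = [-0.5, -0.2, -0.4, 0.1]   # effect estimate from each study
SE = [0.25, 0.15, 0.30, 0.20]  # its standard error

w = [1 / se**2 for se in SE]                          # inverse-variance weights
pooled = sum(wi * yi for wi, yi in zip(w, Y)) / sum(w)
pooled_se = 1 / math.sqrt(sum(w))                     # SE of the pooled estimate

print(round(pooled, 4), round(pooled_se, 4))
```

Note how the second study, with the smallest standard error, receives the largest weight and pulls the pooled estimate towards its own value.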

10.3.2 Random-effects methods for meta-analysis

A variation on the inverse-variance method is to incorporate an assumption that the different studies are estimating different, yet related, intervention effects (Higgins et al 2009). This produces a random-effects meta-analysis, and the simplest version is known as the DerSimonian and Laird method (DerSimonian and Laird 1986). Random-effects meta-analysis is discussed in detail in Section 10.10.4 .
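A minimal sketch of the DerSimonian and Laird method follows, using the same kind of invented study data: a method-of-moments estimate of the between-study variance (tau²) is computed from the heterogeneity statistic Q, then added to each study's within-study variance before re-weighting.

```python
# DerSimonian and Laird random-effects pooling: a minimal sketch on
# invented study data (effect estimates and standard errors).
import math

Y  = [-0.5, -0.2, -0.4, 0.1]
SE = [0.25, 0.15, 0.30, 0.20]

w = [1 / se**2 for se in SE]
fe = sum(wi * yi for wi, yi in zip(w, Y)) / sum(w)   # fixed-effect pool

# Method-of-moments estimate of the between-study variance tau^2
Q = sum(wi * (yi - fe)**2 for wi, yi in zip(w, Y))   # heterogeneity statistic
df = len(Y) - 1
C = sum(w) - sum(wi**2 for wi in w) / sum(w)
tau2 = max(0.0, (Q - df) / C)                        # truncated at zero

# Random-effects weights incorporate tau^2, flattening the weighting
w_re = [1 / (se**2 + tau2) for se in SE]
re = sum(wi * yi for wi, yi in zip(w_re, Y)) / sum(w_re)
re_se = 1 / math.sqrt(sum(w_re))

print(round(tau2, 4), round(re, 4), round(re_se, 4))
```

With tau² greater than zero, the random-effects standard error is larger than the fixed-effect one, reflecting the extra between-study uncertainty.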

10.3.3 Performing inverse-variance meta-analyses

Most meta-analysis programs perform inverse-variance meta-analyses. Usually the user provides summary data from each intervention arm of each study, such as a 2×2 table when the outcome is dichotomous (see Chapter 6, Section 6.4 ), or means, standard deviations and sample sizes for each group when the outcome is continuous (see Chapter 6, Section 6.5 ). This avoids the need for the author to calculate effect estimates, and allows the use of methods targeted specifically at different types of data (see Sections 10.4 and 10.5 ).

When the data are conveniently available as summary statistics from each intervention group, the inverse-variance method can be implemented directly. For example, estimates and their standard errors may be entered directly into RevMan under the ‘Generic inverse variance’ outcome type. For ratio measures of intervention effect, the data must be entered into RevMan as natural logarithms (for example, as a log odds ratio and the standard error of the log odds ratio). However, it is straightforward to instruct the software to display results on the original (e.g. odds ratio) scale. It is possible to supplement or replace this with a column providing the sample sizes in the two groups. Note that the ability to enter estimates and standard errors creates a high degree of flexibility in meta-analysis. It facilitates the analysis of properly analysed crossover trials, cluster-randomized trials and non-randomized trials (see Chapter 23 ), as well as outcome data that are ordinal, time-to-event or rates (see Chapter 6 ).
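As a sketch of the log transformation described above: given a published odds ratio with its 95% confidence interval (invented numbers here), the log odds ratio and its standard error can be recovered for generic inverse-variance entry, since the CI is symmetric on the log scale.

```python
# Converting a reported odds ratio and 95% CI into the log odds ratio
# and its standard error. The OR and CI values are invented.
import math

or_, low, high = 0.75, 0.55, 1.02   # reported OR with 95% CI (assumed)

log_or = math.log(or_)
# SE recovered from the CI width on the log scale: (ln U - ln L) / (2 * 1.96)
se = (math.log(high) - math.log(low)) / (2 * 1.96)

print(round(log_or, 4), round(se, 4))
```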

10.4 Meta-analysis of dichotomous outcomes

There are four widely used methods of meta-analysis for dichotomous outcomes, three fixed-effect methods (Mantel-Haenszel, Peto and inverse variance) and one random-effects method (DerSimonian and Laird inverse variance). All of these methods are available as analysis options in RevMan. The Peto method can only combine odds ratios, whilst the other three methods can combine odds ratios, risk ratios or risk differences. Formulae for all of the meta-analysis methods are available elsewhere (Deeks et al 2001).

Note that having no events in one group (sometimes referred to as ‘zero cells’) causes problems with computation of estimates and standard errors with some methods: see Section 10.4.4 .

10.4.1 Mantel-Haenszel methods

When data are sparse, either in terms of event risks being low or study size being small, the estimates of the standard errors of the effect estimates that are used in the inverse-variance methods may be poor. Mantel-Haenszel methods are fixed-effect meta-analysis methods using a different weighting scheme that depends on which effect measure (e.g. risk ratio, odds ratio, risk difference) is being used (Mantel and Haenszel 1959, Greenland and Robins 1985). They have been shown to have better statistical properties when there are few events. As this is a common situation in Cochrane Reviews, the Mantel-Haenszel method is generally preferable to the inverse variance method in fixed-effect meta-analyses. In other situations the two methods give similar estimates.
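A minimal sketch of the Mantel-Haenszel weighting for odds ratios: the pooled estimate is a ratio of summed cross-products rather than a weighted average of per-study log odds ratios, which is why it behaves better with sparse data. The 2×2 tables below are invented.

```python
# Mantel-Haenszel pooled odds ratio: a minimal sketch. Each tuple is an
# invented 2x2 table (events_exp, no_events_exp, events_ctl, no_events_ctl).
tables = [
    (4, 96, 8, 92),
    (2, 48, 5, 45),
    (6, 194, 10, 190),
]

num = den = 0.0
for a, b, c, d in tables:
    n = a + b + c + d
    num += a * d / n    # cross-product favouring the experimental arm
    den += b * c / n    # cross-product favouring the comparator arm
or_mh = num / den       # pooled odds ratio

print(round(or_mh, 4))
```

Because each study contributes counts rather than a pre-computed log odds ratio, no per-study standard error needs to be estimated, which is the key advantage when events are few.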

10.4.2 Peto odds ratio method

Peto’s method can only be used to combine odds ratios (Yusuf et al 1985). It uses an inverse-variance approach, but uses an approximate method of estimating the log odds ratio, and uses different weights. An alternative way of viewing the Peto method is as a sum of ‘O – E’ statistics. Here, O is the observed number of events and E is an expected number of events in the experimental intervention group of each study under the null hypothesis of no intervention effect.
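The 'O – E' formulation can be sketched as follows, using invented 2×2 tables. For each study, E is the expected number of events in the experimental arm under the null hypothesis, and V is the corresponding hypergeometric variance:

```python
# Peto odds ratio via the O - E formulation: a minimal sketch on invented
# 2x2 tables (events_exp, no_events_exp, events_ctl, no_events_ctl).
import math

tables = [
    (4, 96, 8, 92),
    (2, 48, 5, 45),
    (6, 194, 10, 190),
]

o_minus_e = v_total = 0.0
for a, b, c, d in tables:
    n = a + b + c + d
    O = a                                  # observed events, experimental arm
    E = (a + b) * (a + c) / n              # expected events under the null
    V = ((a + b) * (c + d) * (a + c) * (b + d)) / (n**2 * (n - 1))
    o_minus_e += O - E
    v_total += V

log_or_peto = o_minus_e / v_total          # approximate log odds ratio

print(round(math.exp(log_or_peto), 4))
```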

The approximation used in the computation of the log odds ratio works well when intervention effects are small (odds ratios are close to 1), events are not particularly common and the studies have similar numbers in experimental and comparator groups. In other situations it has been shown to give biased answers. As these criteria are not always fulfilled, Peto’s method is not recommended as a default approach for meta-analysis.

Corrections for zero cell counts are not necessary when using Peto’s method. Perhaps for this reason, this method performs well when events are very rare (Bradburn et al 2007); see Section 10.4.4.1 . Also, Peto’s method can be used to combine studies with dichotomous outcome data with studies using time-to-event analyses where log-rank tests have been used (see Section 10.9 ).

10.4.3 Which effect measure for dichotomous outcomes?

Effect measures for dichotomous data are described in Chapter 6, Section 6.4.1 . The effect of an intervention can be expressed as either a relative or an absolute effect. The risk ratio (relative risk) and odds ratio are relative measures, while the risk difference and number needed to treat for an additional beneficial outcome are absolute measures. A further complication is that there are, in fact, two risk ratios. We can calculate the risk ratio of an event occurring or the risk ratio of no event occurring. These give different summary results in a meta-analysis, sometimes dramatically so.
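A small worked example makes the point about the two risk ratios concrete. With invented risks of 0.2 in the experimental group and 0.1 in the comparator group:

```python
# The two risk ratios computed from the same data; risks are invented.
risk_exp, risk_ctl = 0.2, 0.1

rr_event = risk_exp / risk_ctl                   # RR of the event occurring
rr_no_event = (1 - risk_exp) / (1 - risk_ctl)    # RR of no event occurring

print(rr_event, round(rr_no_event, 4))
```

The risk ratio of the event is 2.0, but the risk ratio of the non-event is about 0.89: the same data give a two-fold relative effect one way and an 11% relative effect the other.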

The selection of a summary statistic for use in meta-analysis depends on balancing three criteria (Deeks 2002). First, we desire a summary statistic that gives values that are similar for all the studies in the meta-analysis and subdivisions of the population to which the interventions will be applied. The more consistent the summary statistic, the greater is the justification for expressing the intervention effect as a single summary number. Second, the summary statistic must have the mathematical properties required to perform a valid meta-analysis. Third, the summary statistic would ideally be easily understood and applied by those using the review. The summary intervention effect should be presented in a way that helps readers to interpret and apply the results appropriately. Among effect measures for dichotomous data, no single measure is uniformly best, so the choice inevitably involves a compromise.

Consistency: Empirical evidence suggests that relative effect measures are, on average, more consistent than absolute measures (Engels et al 2000, Deeks 2002, Rücker et al 2009). For this reason, it is wise to avoid performing meta-analyses of risk differences, unless there is a clear reason to suspect that risk differences will be consistent in a particular clinical situation. On average there is little difference between the odds ratio and risk ratio in terms of consistency (Deeks 2002). When the study aims to reduce the incidence of an adverse event, there is empirical evidence that risk ratios of the adverse event are more consistent than risk ratios of the non-event (Deeks 2002). Selecting an effect measure based on what is the most consistent in a particular situation is not a generally recommended strategy, since it may lead to a selection that spuriously maximizes the precision of a meta-analysis estimate.

Mathematical properties: The most important mathematical criterion is the availability of a reliable variance estimate. The number needed to treat for an additional beneficial outcome does not have a simple variance estimator and cannot easily be used directly in meta-analysis, although it can be computed from the meta-analysis result afterwards (see Chapter 15, Section 15.4.2). There is no consensus regarding the importance of two other often-cited mathematical properties: the fact that the behaviour of the odds ratio and the risk difference do not rely on which of the two outcome states is coded as the event, and the odds ratio being the only statistic which is unbounded (see Chapter 6, Section 6.4.1).

Ease of interpretation: The odds ratio is the hardest summary statistic to understand and to apply in practice, and many practising clinicians report difficulties in using it. There are many published examples where authors have misinterpreted odds ratios from meta-analyses as risk ratios. Although odds ratios can be re-expressed for interpretation (as discussed here), there must be some concern that routine presentation of the results of systematic reviews as odds ratios will lead to frequent over-estimation of the benefits and harms of interventions when the results are applied in clinical practice. Absolute measures of effect are thought to be more easily interpreted by clinicians than relative effects (Sinclair and Bracken 1994), and allow trade-offs to be made between likely benefits and likely harms of interventions. However, they are less likely to be generalizable.

It is generally recommended that meta-analyses are undertaken using risk ratios (taking care to make a sensible choice over which category of outcome is classified as the event) or odds ratios. This is because it seems important to avoid using summary statistics for which there is empirical evidence that they are unlikely to give consistent estimates of intervention effects (the risk difference), and it is impossible to use statistics for which meta-analysis cannot be performed (the number needed to treat for an additional beneficial outcome). It may be wise to plan to undertake a sensitivity analysis to investigate whether choice of summary statistic (and selection of the event category) is critical to the conclusions of the meta-analysis (see Section 10.14 ).

It is often sensible to use one statistic for meta-analysis and to re-express the results using a second, more easily interpretable statistic. For example, often meta-analysis may be best performed using relative effect measures (risk ratios or odds ratios) and the results re-expressed using absolute effect measures (risk differences or numbers needed to treat for an additional beneficial outcome – see Chapter 15, Section 15.4 . This is one of the key motivations for ‘Summary of findings’ tables in Cochrane Reviews: see Chapter 14 ). If odds ratios are used for meta-analysis they can also be re-expressed as risk ratios (see Chapter 15, Section 15.4 ). In all cases the same formulae can be used to convert upper and lower confidence limits. However, all of these transformations require specification of a value of baseline risk that indicates the likely risk of the outcome in the ‘control’ population to which the experimental intervention will be applied. Where the chosen value for this assumed comparator group risk is close to the typical observed comparator group risks across the studies, similar estimates of absolute effect will be obtained regardless of whether odds ratios or risk ratios are used for meta-analysis. Where the assumed comparator risk differs from the typical observed comparator group risk, the predictions of absolute benefit will differ according to which summary statistic was used for meta-analysis.
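A sketch of this re-expression, using an invented summary odds ratio and an assumed comparator group risk (ACR): the odds ratio and baseline risk together imply a risk in the experimental group, from which a risk ratio and risk difference follow.

```python
# Re-expressing a summary odds ratio as a risk ratio and risk difference,
# given an assumed comparator group risk. All numbers are invented.
or_summary = 0.7   # summary odds ratio from the meta-analysis (assumed)
acr = 0.15         # assumed comparator group risk

# Experimental-group risk implied by the OR at this baseline risk
risk_exp = or_summary * acr / (1 - acr + or_summary * acr)

rr = risk_exp / acr    # implied risk ratio
rd = risk_exp - acr    # implied risk difference (negative = risk reduced)

print(round(rr, 4), round(rd, 4))
```

Repeating the calculation with a different ACR shows why the choice of assumed baseline risk matters: the implied absolute effect changes even though the odds ratio does not.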

10.4.4 Meta-analysis of rare events

For rare outcomes, meta-analysis may be the only way to obtain reliable evidence of the effects of healthcare interventions. Individual studies are usually under-powered to detect differences in rare outcomes, but a meta-analysis of many studies may have adequate power to investigate whether interventions do have an impact on the incidence of the rare event. However, many methods of meta-analysis are based on large sample approximations, and are unsuitable when events are rare. Thus authors must take care when selecting a method of meta-analysis (Efthimiou 2018).

There is no single risk at which events are classified as ‘rare’. Certainly risks of 1 in 1000 constitute rare events, and many would classify risks of 1 in 100 the same way. However, the performance of methods when risks are as high as 1 in 10 may also be affected by the issues discussed in this section. What is typical is that a high proportion of the studies in the meta-analysis observe no events in one or more study arms.

10.4.4.1 Studies with no events in one or more arms

Computational problems can occur when no events are observed in one or both groups in an individual study. Inverse variance meta-analytical methods involve computing an intervention effect estimate and its standard error for each study. For studies where no events were observed in one or both arms, these computations often involve dividing by a zero count, which yields a computational error. Most meta-analytical software routines (including those in RevMan) automatically check for problematic zero counts, and add a fixed value (typically 0.5) to all cells of a 2×2 table where the problems occur. The Mantel-Haenszel methods require zero-cell corrections only if the same cell is zero in all the included studies, and hence need to use the correction less often. However, in many software applications the same correction rules are applied for Mantel-Haenszel methods as for the inverse-variance methods. Odds ratio and risk ratio methods require zero cell corrections more often than difference methods, except for the Peto odds ratio method, which encounters computation problems only in the extreme situation of no events occurring in all arms of all studies.

Whilst the fixed correction meets the objective of avoiding computational errors, it usually has the undesirable effect of biasing study estimates towards no difference and over-estimating variances of study estimates (consequently down-weighting inappropriately their contribution to the meta-analysis). Where the sizes of the study arms are unequal (which occurs more commonly in non-randomized studies than randomized trials), the fixed correction will introduce a directional bias in the treatment effect. Alternative non-fixed zero-cell corrections have been explored by Sweeting and colleagues, including a correction proportional to the reciprocal of the size of the contrasting study arm, which they found preferable to the fixed 0.5 correction when arm sizes were not balanced (Sweeting et al 2004).
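The fixed 0.5 correction can be sketched on a single invented 2×2 table with no events in the comparator arm; without the correction, the odds ratio and its standard error both involve division by zero.

```python
# The fixed 0.5 zero-cell correction: a sketch on one invented 2x2 table
# with zero events in the comparator arm.
import math

a, b, c, d = 3, 97, 0, 100   # events/non-events, experimental then comparator

# The raw odds ratio (a*d)/(b*c) divides by zero, so 0.5 is added to
# every cell of the table before computing the estimate and its SE
a, b, c, d = (x + 0.5 for x in (a, b, c, d))
or_corrected = (a * d) / (b * c)
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)   # Woolf SE of the log OR

print(round(or_corrected, 4), round(se_log_or, 4))
```

The very large standard error produced by the correction illustrates the down-weighting noted above: the corrected study contributes little to an inverse-variance pool.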

10.4.4.2 Studies with no events in either arm

The standard practice in meta-analysis of odds ratios and risk ratios is to exclude studies from the meta-analysis where there are no events in both arms. This is because such studies do not provide any indication of either the direction or magnitude of the relative treatment effect. Whilst it may be clear that events are very rare on both the experimental intervention and the comparator intervention, no information is provided as to which group is likely to have the higher risk, or on whether the risks are of the same or different orders of magnitude (when risks are very low, they are compatible with very large or very small ratios). Whilst one might be tempted to infer that the risk would be lowest in the group with the larger sample size (as the upper limit of the confidence interval would be lower), this is not justified as the sample size allocation was determined by the study investigators and is not a measure of the incidence of the event.

Risk difference methods superficially appear to have an advantage over odds ratio methods in that the risk difference is defined (as zero) when no events occur in either arm. Such studies are therefore included in the estimation process. Bradburn and colleagues undertook simulation studies which revealed that all risk difference methods yield confidence intervals that are too wide when events are rare, and have associated poor statistical power, which make them unsuitable for meta-analysis of rare events (Bradburn et al 2007). This is especially relevant when outcomes that focus on treatment safety are being studied, as the ability to identify correctly (or attempt to refute) serious adverse events is a key issue in drug development.

It is likely that outcomes for which no events occur in either arm may not be mentioned in reports of many randomized trials, precluding their inclusion in a meta-analysis. It is unclear, though, when working with published results, whether failure to mention a particular adverse event means there were no such events, or simply that such events were not included as a measured endpoint. Whilst the results of risk difference meta-analyses will be affected by non-reporting of outcomes with no events, odds and risk ratio based methods naturally exclude these data whether or not they are published, and are therefore unaffected.

10.4.4.3 Validity of methods of meta-analysis for rare events

Simulation studies have revealed that many meta-analytical methods can give misleading results for rare events, which is unsurprising given their reliance on asymptotic statistical theory. Their performance has been judged suboptimal either through results being biased, confidence intervals being inappropriately wide, or statistical power being too low to detect substantial differences.

In the following we consider the choice of statistical method for meta-analyses of odds ratios. Appropriate choices appear to depend on the comparator group risk, the likely size of the treatment effect and consideration of balance in the numbers of experimental and comparator participants in the constituent studies. We are not aware of research that has evaluated risk ratio measures directly, but their performance is likely to be very similar to that of the corresponding odds ratio methods. When events are rare, estimates of odds and risks are near identical, and results of both can be interpreted as ratios of probabilities.

Bradburn and colleagues found that many of the most commonly used meta-analytical methods were biased when events were rare (Bradburn et al 2007). The bias was greatest in inverse variance and DerSimonian and Laird odds ratio and risk difference methods, and the Mantel-Haenszel odds ratio method using a 0.5 zero-cell correction. As already noted, risk difference meta-analytical methods tended to show conservative confidence interval coverage and low statistical power when risks of events were low.

At event rates below 1% the Peto one-step odds ratio method was found to be the least biased and most powerful method, and provided the best confidence interval coverage, provided there was no substantial imbalance between treatment and comparator group sizes within studies, and treatment effects were not exceptionally large. This finding was consistently observed across three different meta-analytical scenarios, and was also observed by Sweeting and colleagues (Sweeting et al 2004).

This finding was noted despite the method producing only an approximation to the odds ratio. For very large effects (e.g. risk ratio=0.2) when the approximation is known to be poor, treatment effects were under-estimated, but the Peto method still had the best performance of all the methods considered for event risks of 1 in 1000, and the bias was never more than 6% of the comparator group risk.
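To make the one-step computation concrete, here is a minimal Python sketch (not part of the Handbook; the trial data are invented) that pools 2×2 tables using the Peto 'O − E' statistic and its hypergeometric variance:

```python
import math

def peto_odds_ratio(studies):
    """Pooled Peto one-step odds ratio from 2x2 tables.

    studies: list of (events_exp, n_exp, events_ctl, n_ctl) tuples.
    Returns the pooled OR with a 95% confidence interval.
    """
    sum_o_minus_e = 0.0
    sum_v = 0.0
    for a, n1, c, n2 in studies:
        n = n1 + n2                    # total participants
        m = a + c                      # total events
        e = m * n1 / n                 # expected events in experimental arm
        v = m * (n - m) * n1 * n2 / (n ** 2 * (n - 1))  # hypergeometric variance
        sum_o_minus_e += a - e
        sum_v += v
    log_or = sum_o_minus_e / sum_v     # one-step approximation to ln(OR)
    se = 1 / math.sqrt(sum_v)
    return (math.exp(log_or),
            math.exp(log_or - 1.96 * se),
            math.exp(log_or + 1.96 * se))

# Three hypothetical rare-event trials; the second has zero events in one
# arm yet still contributes, with no zero-cell correction needed.
trials = [(1, 1000, 4, 1000), (0, 500, 3, 500), (2, 1500, 5, 1500)]
or_est, ci_low, ci_high = peto_odds_ratio(trials)
```

Note that the study with a zero cell enters the pooled estimate directly, which is one reason the method behaves well with sparse data.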

In other circumstances (i.e. event risks above 1%, very large effects at event risks around 1%, and meta-analyses where many studies were substantially imbalanced) the best performing methods were the Mantel-Haenszel odds ratio without zero-cell corrections, logistic regression and an exact method. None of these methods is available in RevMan.

Methods that should be avoided with rare events are the inverse-variance methods (including the DerSimonian and Laird random-effects method) (Efthimiou 2018). These directly incorporate each study’s variance into the estimation of its contribution to the meta-analysis, but this variance is usually based on a large-sample approximation that was not intended for use with rare events. We would suggest that incorporation of heterogeneity into an estimate of a treatment effect should be a secondary consideration when attempting to produce estimates of effects from sparse data – the primary concern is to discern whether there is any signal of an effect in the data.

10.5 Meta-analysis of continuous outcomes

An important assumption underlying standard methods for meta-analysis of continuous data is that the outcomes have a normal distribution in each intervention arm in each study. This assumption may not always be met, although it is unimportant in very large studies. It is useful to consider the possibility of skewed data (see Section 10.5.3 ).

10.5.1 Which effect measure for continuous outcomes?

The two summary statistics commonly used for meta-analysis of continuous data are the mean difference (MD) and the standardized mean difference (SMD). Other options are available, such as the ratio of means (see Chapter 6, Section 6.5.1 ). Selection of summary statistics for continuous data is principally determined by whether studies all report the outcome using the same scale (when the mean difference can be used) or using different scales (when the standardized mean difference is usually used). The ratio of means can be used in either situation, but is appropriate only when outcome measurements are strictly greater than zero. Further considerations in deciding on an effect measure that will facilitate interpretation of the findings appears in Chapter 15, Section 15.5 .

The different roles played in MD and SMD approaches by the standard deviations (SDs) of outcomes observed in the two groups should be understood.

For the mean difference approach, the SDs are used together with the sample sizes to compute the weight given to each study. Studies with small SDs are given relatively higher weight whilst studies with larger SDs are given relatively smaller weights. This is appropriate if variation in SDs between studies reflects differences in the reliability of outcome measurements, but is probably not appropriate if the differences in SD reflect real differences in the variability of outcomes in the study populations.

For the standardized mean difference approach, the SDs are used to standardize the mean differences to a single scale, as well as in the computation of study weights. Thus, studies with small SDs lead to relatively higher estimates of SMD, whilst studies with larger SDs lead to relatively smaller estimates of SMD. For this to be appropriate, it must be assumed that between-study variation in SDs reflects only differences in measurement scales and not differences in the reliability of outcome measures or variability among study populations, as discussed in Chapter 6, Section 6.5.1.2 .

These assumptions of the methods should be borne in mind when unexpected variation of SDs is observed across studies.
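The dependence of the SMD on the study SDs can be illustrated with a small sketch (hypothetical data; Cohen's d with a pooled SD is used here, one of the SMD variants discussed in Chapter 6):

```python
import math

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference (Cohen's d) with its approximate SE."""
    s_pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                         / (n1 + n2 - 2))
    d = (m1 - m2) / s_pooled
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d, se

# Two hypothetical studies with the identical mean difference of 5 scale
# points; only the SDs differ.
d_small_sd, _ = cohens_d(25, 8, 50, 20, 8, 50)    # SD 8  -> SMD = 0.625
d_large_sd, _ = cohens_d(25, 16, 50, 20, 16, 50)  # SD 16 -> SMD = 0.3125
```

The same absolute effect yields half the SMD when the SDs double, which is exactly the behaviour that is appropriate only if the SD difference reflects a difference in measurement scale.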

10.5.2 Meta-analysis of change scores

In some circumstances an analysis based on changes from baseline will be more efficient and powerful than comparison of post-intervention values, as it removes a component of between-person variability from the analysis. However, calculation of a change score requires measurement of the outcome twice and in practice may be less efficient for outcomes that are unstable or difficult to measure precisely, where the measurement error may be larger than true between-person baseline variability. Change-from-baseline outcomes may also be preferred if they have a less skewed distribution than post-intervention measurement outcomes. Although sometimes used as a device to ‘correct’ for unlucky randomization, this practice is not recommended.

The preferred statistical approach to accounting for baseline measurements of the outcome variable is to include the baseline outcome measurements as a covariate in a regression model or analysis of covariance (ANCOVA). These analyses produce an ‘adjusted’ estimate of the intervention effect together with its standard error. These analyses are the least frequently encountered, but as they give the most precise and least biased estimates of intervention effects they should be included in the analysis when they are available. However, they can only be included in a meta-analysis using the generic inverse-variance method, since means and SDs are not available for each intervention group separately.

In practice an author is likely to discover that the studies included in a review include a mixture of change-from-baseline and post-intervention value scores. However, mixing of outcomes is not a problem when it comes to meta-analysis of MDs. There is no statistical reason why studies with change-from-baseline outcomes should not be combined in a meta-analysis with studies with post-intervention measurement outcomes when using the (unstandardized) MD method. In a randomized study, MD based on changes from baseline can usually be assumed to be addressing exactly the same underlying intervention effects as analyses based on post-intervention measurements. That is to say, the difference in mean post-intervention values will on average be the same as the difference in mean change scores. If the use of change scores does increase precision, the studies presenting change scores will appropriately be given higher weights in the analysis than they would have received if post-intervention values had been used, as they will have smaller SDs.

When combining the data on the MD scale, authors must be careful to use the appropriate means and SDs (either of post-intervention measurements or of changes from baseline) for each study. Since the mean values and SDs for the two types of outcome may differ substantially, it may be advisable to place them in separate subgroups to avoid confusion for the reader, but the results of the subgroups can legitimately be pooled together.

In contrast, post-intervention value and change scores should not in principle be combined using standard meta-analysis approaches when the effect measure is an SMD. This is because the SDs used in the standardization reflect different things. The SD when standardizing post-intervention values reflects between-person variability at a single point in time. The SD when standardizing change scores reflects variation in between-person changes over time, so will depend on both within-person and between-person variability; within-person variability in turn is likely to depend on the length of time between measurements. Nevertheless, an empirical study of 21 meta-analyses in osteoarthritis did not find a difference between combined SMDs based on post-intervention values and combined SMDs based on change scores (da Costa et al 2013). One option is to standardize SMDs using post-intervention SDs rather than change score SDs. This would lead to valid synthesis of the two approaches, but we are not aware that an appropriate standard error for this has been derived.

A common practical problem associated with including change-from-baseline measures is that the SD of changes is not reported. Imputation of SDs is discussed in Chapter 6, Section 6.5.2.8 .

10.5.3 Meta-analysis of skewed data

Analyses based on means are appropriate for data that are at least approximately normally distributed, and for data from very large trials. If the true distribution of outcomes is asymmetrical, then the data are said to be skewed. Review authors should consider the possibility and implications of skewed data when analysing continuous outcomes (see MECIR Box 10.5.a ). Skew can sometimes be diagnosed from the means and SDs of the outcomes. A rough check is available, but it is only valid if a lowest or highest possible value for an outcome is known to exist. Thus, the check may be used for outcomes such as weight, volume and blood concentrations, which have lowest possible values of 0, or for scale outcomes with minimum or maximum scores, but it may not be appropriate for change-from-baseline measures. The check involves calculating the observed mean minus the lowest possible value (or the highest possible value minus the observed mean), and dividing this by the SD. A ratio less than 2 suggests skew (Altman and Bland 1996). If the ratio is less than 1, there is strong evidence of a skewed distribution.
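The check described above can be sketched in a few lines (the function name and the example figures are illustrative only):

```python
def skew_check(mean, sd, lowest=0.0):
    """Altman-Bland rough check for skew when a lowest possible value exists.

    Returns (mean - lowest) / sd; a ratio below 2 suggests skew,
    below 1 strong evidence of skew (Altman and Bland 1996).
    """
    ratio = (mean - lowest) / sd
    if ratio < 1:
        verdict = "strong evidence of skew"
    elif ratio < 2:
        verdict = "suggests skew"
    else:
        verdict = "no indication of skew from this check"
    return ratio, verdict

# Hypothetical example: length of hospital stay in days (lowest possible
# value 0), reported mean 10 and SD 8.
ratio, verdict = skew_check(mean=10, sd=8)  # ratio = 1.25 -> "suggests skew"
```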

Transformation of the original outcome data may reduce skew substantially. Reports of trials may present results on a transformed scale, usually a log scale. Collection of appropriate data summaries from the trialists, or acquisition of individual patient data, is currently the approach of choice. Appropriate data summaries and analysis strategies for the individual patient data will depend on the situation. Consultation with a knowledgeable statistician is advised.

Where data have been analysed on a log scale, results are commonly presented as geometric means and ratios of geometric means. A meta-analysis may be then performed on the scale of the log-transformed data; an example of the calculation of the required means and SD is given in Chapter 6, Section 6.5.2.4 . This approach depends on being able to obtain transformed data for all studies; methods for transforming from one scale to the other are available (Higgins et al 2008b). Log-transformed and untransformed data should not be mixed in a meta-analysis.

MECIR Box 10.5.a Relevant expectations for conduct of intervention reviews

10.6 Combining dichotomous and continuous outcomes

Occasionally authors encounter a situation where data for the same outcome are presented in some studies as dichotomous data and in other studies as continuous data. For example, scores on depression scales can be reported as means, or as the percentage of patients who were depressed at some point after an intervention (i.e. with a score above a specified cut-point). This type of information is often easier to understand, and more helpful, when it is dichotomized. However, deciding on a cut-point may be arbitrary, and information is lost when continuous data are transformed to dichotomous data.

There are several options for handling combinations of dichotomous and continuous data. Generally, it is useful to summarize results from all the relevant, valid studies in a similar way, but this is not always possible. It may be possible to collect missing data from investigators so that this can be done. If not, it may be useful to summarize the data in three ways: by entering the means and SDs as continuous outcomes, by entering the counts as dichotomous outcomes and by entering all of the data in text form as ‘Other data’ outcomes.

There are statistical approaches available that will re-express odds ratios as SMDs (and vice versa), allowing dichotomous and continuous data to be combined (Anzures-Cabrera et al 2011). A simple approach is as follows. Based on an assumption that the underlying continuous measurements in each intervention group follow a logistic distribution (which is a symmetrical distribution similar in shape to the normal distribution, but with more data in the distributional tails), and that the variability of the outcomes is the same in both experimental and comparator participants, the odds ratios can be re-expressed as a SMD according to the following simple formula (Chinn 2000):

SMD = (√3 / π) × ln(OR) ≈ 0.5513 × ln(OR)

The standard error of the log odds ratio can be converted to the standard error of a SMD by multiplying by the same constant (√3/π=0.5513). Alternatively SMDs can be re-expressed as log odds ratios by multiplying by π/√3=1.814. Once SMDs (or log odds ratios) and their standard errors have been computed for all studies in the meta-analysis, they can be combined using the generic inverse-variance method. Standard errors can be computed for all studies by entering the data as dichotomous and continuous outcome type data, as appropriate, and converting the confidence intervals for the resulting log odds ratios and SMDs into standard errors (see Chapter 6, Section 6.3 ).
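A sketch of the conversion in both directions (the constant √3/π follows from the logistic distribution assumption described above; the function names are illustrative):

```python
import math

ROOT3_OVER_PI = math.sqrt(3) / math.pi  # ≈ 0.5513

def log_or_to_smd(log_or, se_log_or):
    """Re-express a log odds ratio and its SE as an SMD (Chinn 2000)."""
    return ROOT3_OVER_PI * log_or, ROOT3_OVER_PI * se_log_or

def smd_to_log_or(smd, se_smd):
    """Re-express an SMD and its SE as a log odds ratio."""
    return smd / ROOT3_OVER_PI, se_smd / ROOT3_OVER_PI

# A study reporting OR = 2.0 with SE(log OR) = 0.3, re-expressed as an SMD
# ready for generic inverse-variance pooling with continuous-outcome studies:
smd, se_smd = log_or_to_smd(math.log(2.0), 0.3)
```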

10.7 Meta-analysis of ordinal outcomes and measurement scales

Ordinal and measurement scale outcomes are most commonly meta-analysed as dichotomous data (if so, see Section 10.4 ) or continuous data (if so, see Section 10.5 ) depending on the way that the study authors performed the original analyses.

Occasionally it is possible to analyse the data using proportional odds models. This is the case when ordinal scales have a small number of categories, the numbers falling into each category for each intervention group can be obtained, and the same ordinal scale has been used in all studies. This approach may make more efficient use of all available data than dichotomization, but requires access to statistical software and results in a summary statistic for which it is challenging to find a clinical meaning.

The proportional odds model uses the proportional odds ratio as the measure of intervention effect (Agresti 1996) (see Chapter 6, Section 6.6 ), and can be used for conducting a meta-analysis in advanced statistical software packages (Whitehead and Jones 1994). Estimates of log odds ratios and their standard errors from a proportional odds model may be meta-analysed using the generic inverse-variance method (see Section 10.3.3 ). If the same ordinal scale has been used in all studies, but in some reports has been presented as a dichotomous outcome, it may still be possible to include all studies in the meta-analysis. In the context of the three-category model, this might mean that for some studies category 1 constitutes a success, while for others both categories 1 and 2 constitute a success. Methods are available for dealing with this, and for combining data from scales that are related but have different definitions for their categories (Whitehead and Jones 1994).

10.8 Meta-analysis of counts and rates

Results may be expressed as count data when each participant may experience an event, and may experience it more than once (see Chapter 6, Section 6.7 ). For example, ‘number of strokes’, or ‘number of hospital visits’ are counts. These events may not happen at all, but if they do happen there is no theoretical maximum number of occurrences for an individual. Count data may be analysed using methods for dichotomous data if the counts are dichotomized for each individual (see Section 10.4 ), using methods for continuous data (see Section 10.5 ) or for time-to-event data (see Section 10.9 ), or directly as rate data.

Rate data occur if counts are measured for each participant along with the time over which they are observed. This is particularly appropriate when the events being counted are rare. For example, a woman may experience two strokes during a follow-up period of two years. Her rate of strokes is one per year of follow-up (or, equivalently 0.083 per month of follow-up). Rates are conventionally summarized at the group level. For example, participants in the comparator group of a clinical trial may experience 85 strokes during a total of 2836 person-years of follow-up. An underlying assumption associated with the use of rates is that the risk of an event is constant across participants and over time. This assumption should be carefully considered for each situation. For example, in contraception studies, rates have been used (known as Pearl indices) to describe the number of pregnancies per 100 women-years of follow-up. This is now considered inappropriate since couples have different risks of conception, and the risk for each woman changes over time. Pregnancies are now analysed more often using life tables or time-to-event methods that investigate the time elapsing before the first pregnancy.

Analysing count data as rates is not always the most appropriate approach and is uncommon in practice. This is because:

  • the assumption of a constant underlying risk may not be suitable; and
  • the statistical methods are not as well developed as they are for other types of data.

The results of a study may be expressed as a rate ratio, that is, the ratio of the rate in the experimental intervention group to the rate in the comparator group. The (natural) logarithms of the rate ratios may be combined across studies using the generic inverse-variance method (see Section 10.3.3 ). Alternatively, Poisson regression approaches can be used (Spittal et al 2015).
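As an illustrative sketch (data invented; the SE formula √(1/e₁ + 1/e₂) is the standard large-sample approximation for a log rate ratio), combining log rate ratios with the generic inverse-variance method might look like:

```python
import math

def log_rate_ratio(events_exp, time_exp, events_ctl, time_ctl):
    """Log rate ratio with the usual large-sample SE, sqrt(1/e1 + 1/e2)."""
    log_rr = math.log((events_exp / time_exp) / (events_ctl / time_ctl))
    se = math.sqrt(1 / events_exp + 1 / events_ctl)
    return log_rr, se

def pool_fixed_effect(estimates):
    """Generic inverse-variance (fixed-effect) pooling of (estimate, SE) pairs."""
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * est for (est, _), w in zip(estimates, weights)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se

# Two hypothetical trials: (events, person-years) in each arm
studies = [log_rate_ratio(60, 2700, 85, 2836),
           log_rate_ratio(30, 1400, 40, 1350)]
pooled_log_rr, pooled_se = pool_fixed_effect(studies)
rate_ratio = math.exp(pooled_log_rr)
```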

In a randomized trial, rate ratios may often be very similar to risk ratios obtained after dichotomizing the participants, since the average period of follow-up should be similar in all intervention groups. Rate ratios and risk ratios will differ, however, if an intervention affects the likelihood of some participants experiencing multiple events.

It is possible also to focus attention on the rate difference (see Chapter 6, Section 6.7.1 ). The analysis again can be performed using the generic inverse-variance method (Hasselblad and McCrory 1995, Guevara et al 2004).

10.9 Meta-analysis of time-to-event outcomes

Two approaches to meta-analysis of time-to-event outcomes are readily available to Cochrane Review authors. The choice of which to use will depend on the type of data that have been extracted from the primary studies, or obtained from re-analysis of individual participant data.

If ‘O – E’ and ‘V’ statistics have been obtained (see Chapter 6, Section 6.8.2 ), either through re-analysis of individual participant data or from aggregate statistics presented in the study reports, then these statistics may be entered directly into RevMan using the ‘O – E and Variance’ outcome type. There are several ways to calculate these ‘O – E’ and ‘V’ statistics. Peto’s method applied to dichotomous data (Section 10.4.2 ) gives rise to an odds ratio; a log-rank approach gives rise to a hazard ratio; and a variation of the Peto method for analysing time-to-event data gives rise to something in between (Simmonds et al 2011). The appropriate effect measure should be specified. Only fixed-effect meta-analysis methods are available in RevMan for ‘O – E and Variance’ outcomes.

Alternatively, if estimates of log hazard ratios and standard errors have been obtained from results of Cox proportional hazards regression models, study results can be combined using generic inverse-variance methods (see Section 10.3.3 ).

If a mixture of log-rank and Cox model estimates are obtained from the studies, all results can be combined using the generic inverse-variance method, as the log-rank estimates can be converted into log hazard ratios and standard errors using the approaches discussed in Chapter 6, Section 6.8 .
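The conversion from ‘O – E’ and ‘V’ statistics to a log hazard ratio and its standard error (ln HR ≈ (O – E)/V, SE = 1/√V, following Chapter 6, Section 6.8) can be sketched as follows, with invented numbers:

```python
import math

def logrank_to_log_hr(o_minus_e, v):
    """Convert log-rank 'O - E' and 'V' statistics into ln(HR) and its SE."""
    return o_minus_e / v, 1 / math.sqrt(v)

# Hypothetical study: O - E = -4.2 with variance V = 12.5
log_hr, se_log_hr = logrank_to_log_hr(-4.2, 12.5)
hazard_ratio = math.exp(log_hr)  # about 0.71, favouring the experimental arm
```

The resulting (log hazard ratio, SE) pair can then enter a generic inverse-variance meta-analysis alongside estimates taken directly from Cox models.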

10.10 Heterogeneity

10.10.1 What is heterogeneity?

Inevitably, studies brought together in a systematic review will differ. Any kind of variability among studies in a systematic review may be termed heterogeneity. It can be helpful to distinguish between different types of heterogeneity. Variability in the participants, interventions and outcomes studied may be described as clinical diversity (sometimes called clinical heterogeneity), and variability in study design, outcome measurement tools and risk of bias may be described as methodological diversity (sometimes called methodological heterogeneity). Variability in the intervention effects being evaluated in the different studies is known as statistical heterogeneity, and is a consequence of clinical or methodological diversity, or both, among the studies. Statistical heterogeneity manifests itself in the observed intervention effects being more different from each other than one would expect due to random error (chance) alone. We will follow convention and refer to statistical heterogeneity simply as heterogeneity.

Clinical variation will lead to heterogeneity if the intervention effect is affected by the factors that vary across studies; most obviously, the specific interventions or patient characteristics. In other words, the true intervention effect will be different in different studies.

Differences between studies in terms of methodological factors, such as use of blinding and concealment of allocation sequence, or if there are differences between studies in the way the outcomes are defined and measured, may be expected to lead to differences in the observed intervention effects. Significant statistical heterogeneity arising from methodological diversity or differences in outcome assessments suggests that the studies are not all estimating the same quantity, but does not necessarily suggest that the true intervention effect varies. In particular, heterogeneity associated solely with methodological diversity would indicate that the studies suffer from different degrees of bias. Empirical evidence suggests that some aspects of design can affect the result of clinical trials, although this is not always the case. Further discussion appears in Chapter 7 and Chapter 8 .

The scope of a review will largely determine the extent to which studies included in a review are diverse. Sometimes a review will include studies addressing a variety of questions, for example when several different interventions for the same condition are of interest (see also Chapter 11 ) or when the differential effects of an intervention in different populations are of interest. Meta-analysis should only be considered when a group of studies is sufficiently homogeneous in terms of participants, interventions and outcomes to provide a meaningful summary (see MECIR Box 10.10.a. ). It is often appropriate to take a broader perspective in a meta-analysis than in a single clinical trial. A common analogy is that systematic reviews bring together apples and oranges, and that combining these can yield a meaningless result. This is true if apples and oranges are of intrinsic interest on their own, but may not be if they are used to contribute to a wider question about fruit. For example, a meta-analysis may reasonably evaluate the average effect of a class of drugs by combining results from trials where each evaluates the effect of a different drug from the class.

MECIR Box 10.10.a Relevant expectations for conduct of intervention reviews

There may be specific interest in a review in investigating how clinical and methodological aspects of studies relate to their results. Where possible these investigations should be specified a priori (i.e. in the protocol for the systematic review). It is legitimate for a systematic review to focus on examining the relationship between some clinical characteristic(s) of the studies and the size of intervention effect, rather than on obtaining a summary effect estimate across a series of studies (see Section 10.11 ). Meta-regression may best be used for this purpose, although it is not implemented in RevMan (see Section 10.11.4 ).

10.10.2 Identifying and measuring heterogeneity

It is essential to consider the extent to which the results of studies are consistent with each other (see MECIR Box 10.10.b ). If confidence intervals for the results of individual studies (generally depicted graphically using horizontal lines) have poor overlap, this generally indicates the presence of statistical heterogeneity. More formally, a statistical test for heterogeneity is available. This Chi² (χ², or chi-squared) test is included in the forest plots in Cochrane Reviews. It assesses whether observed differences in results are compatible with chance alone. A low P value (or a large Chi² statistic relative to its degrees of freedom) provides evidence of heterogeneity of intervention effects (variation in effect estimates beyond chance).

MECIR Box 10.10.b Relevant expectations for conduct of intervention reviews

Care must be taken in the interpretation of the Chi² test, since it has low power in the (common) situation of a meta-analysis when studies have small sample sizes or are few in number. This means that while a statistically significant result may indicate a problem with heterogeneity, a non-significant result must not be taken as evidence of no heterogeneity. This is also why a P value of 0.10, rather than the conventional level of 0.05, is sometimes used to determine statistical significance. A further problem with the test, which seldom occurs in Cochrane Reviews, is that when there are many studies in a meta-analysis, the test has high power to detect a small amount of heterogeneity that may be clinically unimportant.

Some argue that, since clinical and methodological diversity always occur in a meta-analysis, statistical heterogeneity is inevitable (Higgins et al 2003). Thus, the test for heterogeneity is irrelevant to the choice of analysis; heterogeneity will always exist whether or not we happen to be able to detect it using a statistical test. Methods have been developed for quantifying inconsistency across studies that move the focus away from testing whether heterogeneity is present to assessing its impact on the meta-analysis. A useful statistic for quantifying inconsistency is:

I² = ((Q − df) / Q) × 100%

In this equation, Q is the Chi² statistic and df is its degrees of freedom (Higgins and Thompson 2002, Higgins et al 2003). I² describes the percentage of the variability in effect estimates that is due to heterogeneity rather than sampling error (chance).
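Both quantities can be computed directly from study estimates and their standard errors; a sketch with invented data:

```python
def q_and_i_squared(estimates):
    """Cochran's Q and the I-squared statistic from (estimate, SE) pairs."""
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * est for (est, _), w in zip(estimates, weights)) / sum(weights)
    q = sum(w * (est - pooled) ** 2 for (est, _), w in zip(estimates, weights))
    df = len(estimates) - 1
    # I-squared is truncated at zero when Q falls below its degrees of freedom
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

# Four hypothetical log risk ratios with their standard errors
q, i2 = q_and_i_squared([(-0.4, 0.15), (-0.1, 0.20), (-0.5, 0.18), (0.1, 0.25)])
```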

Thresholds for the interpretation of the I² statistic can be misleading, since the importance of inconsistency depends on several factors. A rough guide to interpretation in the context of meta-analyses of randomized trials is as follows:

  • 0% to 40%: might not be important;
  • 30% to 60%: may represent moderate heterogeneity*;
  • 50% to 90%: may represent substantial heterogeneity*;
  • 75% to 100%: considerable heterogeneity*.

*The importance of the observed value of I² depends on (1) magnitude and direction of effects, and (2) strength of evidence for heterogeneity (e.g. P value from the Chi² test, or a confidence interval for I²: uncertainty in the value of I² is substantial when the number of studies is small).

10.10.3 Strategies for addressing heterogeneity

Review authors must take into account any statistical heterogeneity when interpreting results, particularly when there is variation in the direction of effect (see MECIR Box 10.10.c ). A number of options are available if heterogeneity is identified among a group of studies that would otherwise be considered suitable for a meta-analysis.

MECIR Box 10.10.c  Relevant expectations for conduct of intervention reviews

  • Check again that the data are correct. Severe apparent heterogeneity can indicate that data have been incorrectly extracted or entered into meta-analysis software. For example, if standard errors have mistakenly been entered as SDs for continuous outcomes, this could manifest itself in overly narrow confidence intervals with poor overlap and hence substantial heterogeneity. Unit-of-analysis errors may also be causes of heterogeneity (see Chapter 6, Section 6.2 ).  
  • Do not do a meta-analysis. A systematic review need not contain any meta-analyses. If there is considerable variation in results, and particularly if there is inconsistency in the direction of effect, it may be misleading to quote an average value for the intervention effect.  
  • Explore heterogeneity. It is clearly of interest to determine the causes of heterogeneity among results of studies. This process is problematic since there are often many characteristics that vary across studies from which one may choose. Heterogeneity may be explored by conducting subgroup analyses (see Section 10.11.3 ) or meta-regression (see Section 10.11.4 ). Reliable conclusions can only be drawn from analyses that are truly pre-specified before inspecting the studies’ results, and even these conclusions should be interpreted with caution. Explorations of heterogeneity that are devised after heterogeneity is identified can at best lead to the generation of hypotheses. They should be interpreted with even more caution and should generally not be listed among the conclusions of a review. Also, investigations of heterogeneity when there are very few studies are of questionable value.  
  • Ignore heterogeneity. Fixed-effect meta-analyses ignore heterogeneity. The summary effect estimate from a fixed-effect meta-analysis is normally interpreted as being the best estimate of the intervention effect. However, the existence of heterogeneity suggests that there may not be a single intervention effect but a variety of intervention effects. Thus, the summary fixed-effect estimate may be an intervention effect that does not actually exist in any population, and therefore have a confidence interval that is meaningless as well as being too narrow (see Section 10.10.4 ).  
  • Perform a random-effects meta-analysis. A random-effects meta-analysis may be used to incorporate heterogeneity among studies. This is not a substitute for a thorough investigation of heterogeneity. It is intended primarily for heterogeneity that cannot be explained. An extended discussion of this option appears in Section 10.10.4 .  
  • Reconsider the effect measure. Heterogeneity may be an artificial consequence of an inappropriate choice of effect measure. For example, when studies collect continuous outcome data using different scales or different units, extreme heterogeneity may be apparent when using the mean difference but not when the more appropriate standardized mean difference is used. Furthermore, choice of effect measure for dichotomous outcomes (odds ratio, risk ratio, or risk difference) may affect the degree of heterogeneity among results. In particular, when comparator group risks vary, homogeneous odds ratios or risk ratios will necessarily lead to heterogeneous risk differences, and vice versa. However, it remains unclear whether homogeneity of intervention effect in a particular meta-analysis is a suitable criterion for choosing between these measures (see also Section 10.4.3 ).  
  • Exclude studies. Heterogeneity may be due to the presence of one or two outlying studies with results that conflict with the rest of the studies. In general it is unwise to exclude studies from a meta-analysis on the basis of their results as this may introduce bias. However, if an obvious reason for the outlying result is apparent, the study might be removed with more confidence. Since usually at least one characteristic can be found for any study in any meta-analysis which makes it different from the others, this criterion is unreliable because it is all too easy to fulfil. It is advisable to perform analyses both with and without outlying studies as part of a sensitivity analysis (see Section 10.14 ). Whenever possible, potential sources of clinical diversity that might lead to such situations should be specified in the protocol.

10.10.4 Incorporating heterogeneity into random-effects models

The random-effects meta-analysis approach incorporates an assumption that the different studies are estimating different, yet related, intervention effects (DerSimonian and Laird 1986, Borenstein et al 2010). The approach allows us to address heterogeneity that cannot readily be explained by other factors. A random-effects meta-analysis model involves an assumption that the effects being estimated in the different studies follow some distribution. The model represents our lack of knowledge about why real, or apparent, intervention effects differ, by considering the differences as if they were random. The centre of the assumed distribution describes the average of the effects, while its width describes the degree of heterogeneity. The conventional choice of distribution is a normal distribution. It is difficult to establish the validity of any particular distributional assumption, and this is a common criticism of random-effects meta-analyses. The importance of the assumed shape for this distribution has not been widely studied.

To undertake a random-effects meta-analysis, the standard errors of the study-specific estimates (SE i in Section 10.3.1 ) are adjusted to incorporate a measure of the extent of variation, or heterogeneity, among the intervention effects observed in different studies (this variation is often referred to as Tau-squared, τ 2 , or Tau 2 ). The amount of variation, and hence the adjustment, can be estimated from the intervention effects and standard errors of the studies included in the meta-analysis.

In a heterogeneous set of studies, a random-effects meta-analysis will award relatively more weight to smaller studies than such studies would receive in a fixed-effect meta-analysis. This is because small studies are more informative for learning about the distribution of effects across studies than for learning about an assumed common intervention effect.
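To make the calculation concrete, the moment-based estimate of Tau² and the resulting re-weighting can be sketched as follows. This is a minimal illustration in Python with invented data, not the RevMan implementation; the function name and numbers are ours.

```python
import math

def dersimonian_laird(effects, ses):
    """Moment-based (DerSimonian and Laird 1986) random-effects meta-analysis.

    effects: study effect estimates on an additive scale
             (e.g. mean differences or log odds ratios)
    ses:     their standard errors
    """
    w = [1.0 / se**2 for se in ses]               # fixed-effect (inverse-variance) weights
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))  # heterogeneity statistic Q
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                 # between-study variance, truncated at zero
    w_re = [1.0 / (se**2 + tau2) for se in ses]   # random-effects weights
    mean = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se_mean = math.sqrt(1.0 / sum(w_re))
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"tau2": tau2, "mean": mean, "se": se_mean,
            "ci": (mean - 1.96 * se_mean, mean + 1.96 * se_mean), "I2": i2}

# Invented log risk ratios and standard errors, for illustration only
result = dersimonian_laird([-0.4, -0.1, -0.6, 0.05], [0.15, 0.20, 0.25, 0.18])
```

Note how the random-effects weights 1/(SEᵢ² + Tau²) are more similar to one another than the fixed-effect weights 1/SEᵢ², which is why smaller studies receive relatively more weight under this model.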

Note that a random-effects model does not ‘take account’ of the heterogeneity, in the sense that it is no longer an issue. It is always preferable to explore possible causes of heterogeneity, although there may be too few studies to do this adequately (see Section 10.11 ).

10.10.4.1 Fixed or random effects?

A fixed-effect meta-analysis provides a result that may be viewed as a ‘typical intervention effect’ from the studies included in the analysis. In order to calculate a confidence interval for a fixed-effect meta-analysis the assumption is usually made that the true effect of intervention (in both magnitude and direction) is the same value in every study (i.e. fixed across studies). This assumption implies that the observed differences among study results are due solely to the play of chance (i.e. that there is no statistical heterogeneity).

A random-effects model provides a result that may be viewed as an ‘average intervention effect’, where this average is explicitly defined according to an assumed distribution of effects across studies. Instead of assuming that the intervention effects are the same, we assume that they follow (usually) a normal distribution. The assumption implies that the observed differences among study results are due to a combination of the play of chance and some genuine variation in the intervention effects.

The random-effects method and the fixed-effect method will give identical results when there is no heterogeneity among the studies.

When heterogeneity is present, a confidence interval around the random-effects summary estimate is wider than a confidence interval around a fixed-effect summary estimate. This will happen whenever the I 2 statistic is greater than zero, even if the heterogeneity is not detected by the Chi 2 test for heterogeneity (see Section 10.10.2 ).

Sometimes the central estimate of the intervention effect is different between fixed-effect and random-effects analyses. In particular, if results of smaller studies are systematically different from results of larger ones, which can happen as a result of publication bias or within-study bias in smaller studies (Egger et al 1997, Poole and Greenland 1999, Kjaergard et al 2001), then a random-effects meta-analysis will exacerbate the effects of the bias (see also Chapter 13, Section 13.3.5.6 ). A fixed-effect analysis will be affected less, although strictly it will also be inappropriate.

The decision between fixed- and random-effects meta-analyses has been the subject of much debate, and we do not provide a universal recommendation. Some considerations in making this choice are as follows:

  • Many have argued that the decision should be based on an expectation of whether the intervention effects are truly identical, preferring the fixed-effect model if this is likely and a random-effects model if this is unlikely (Borenstein et al 2010). Since it is generally considered to be implausible that intervention effects across studies are identical (unless the intervention has no effect at all), this leads many to advocate use of the random-effects model.
  • Others have argued that a fixed-effect analysis can be interpreted in the presence of heterogeneity, and that it makes fewer assumptions than a random-effects meta-analysis. They then refer to it as a ‘fixed-effects’ meta-analysis (Peto et al 1995, Rice et al 2018).
  • Under any interpretation, a fixed-effect meta-analysis ignores heterogeneity. If the method is used, it is therefore important to supplement it with a statistical investigation of the extent of heterogeneity (see Section 10.10.2 ).
  • In the presence of heterogeneity, a random-effects analysis gives relatively more weight to smaller studies and relatively less weight to larger studies. If there is additionally some funnel plot asymmetry (i.e. a relationship between intervention effect magnitude and study size), then this will push the results of the random-effects analysis towards the findings in the smaller studies. In the context of randomized trials, this is generally regarded as an unfortunate consequence of the model.
  • A pragmatic approach is to plan to undertake both a fixed-effect and a random-effects meta-analysis, with an intention to present the random-effects result if there is no indication of funnel plot asymmetry. If there is an indication of funnel plot asymmetry, then both methods are problematic. It may be reasonable to present both analyses or neither, or to perform a sensitivity analysis in which small studies are excluded or addressed directly using meta-regression (see Chapter 13, Section 13.3.5.6 ).
  • The choice between a fixed-effect and a random-effects meta-analysis should never be made on the basis of a statistical test for heterogeneity.

10.10.4.2 Interpretation of random-effects meta-analyses

The summary estimate and confidence interval from a random-effects meta-analysis refer to the centre of the distribution of intervention effects, but do not describe the width of the distribution. Often the summary estimate and its confidence interval are quoted in isolation and portrayed as a sufficient summary of the meta-analysis. This is inappropriate. The confidence interval from a random-effects meta-analysis describes uncertainty in the location of the mean of systematically different effects in the different studies. It does not describe the degree of heterogeneity among studies, as may be commonly believed. For example, when there are many studies in a meta-analysis, we may obtain a very tight confidence interval around the random-effects estimate of the mean effect even when there is a large amount of heterogeneity. A solution to this problem is to consider a prediction interval (see Section 10.10.4.3 ).

Methodological diversity creates heterogeneity through biases variably affecting the results of different studies. The random-effects summary estimate will only correctly estimate the average intervention effect if the biases are symmetrically distributed, leading to a mixture of over-estimates and under-estimates of effect, which is unlikely to be the case. In practice it can be very difficult to distinguish whether heterogeneity results from clinical or methodological diversity, and in most cases it is likely to be due to both, so these distinctions are hard to draw in the interpretation.

When there is little information, either because there are few studies or if the studies are small with few events, a random-effects analysis will provide poor estimates of the amount of heterogeneity (i.e. of the width of the distribution of intervention effects). Fixed-effect methods such as the Mantel-Haenszel method will provide more robust estimates of the average intervention effect, but at the cost of ignoring any heterogeneity.

10.10.4.3 Prediction intervals from a random-effects meta-analysis

An estimate of the between-study variance in a random-effects meta-analysis is typically presented as part of its results. The square root of this number (i.e. Tau) is the estimated standard deviation of underlying effects across studies. Prediction intervals are a way of expressing this value in an interpretable way.

To motivate the idea of a prediction interval, note that for absolute measures of effect (e.g. risk difference, mean difference, standardized mean difference), an approximate 95% range of normally distributed underlying effects can be obtained by creating an interval from 1.96×Tau below the random-effects mean, to 1.96×Tau above it. (For relative measures such as the odds ratio and risk ratio, an equivalent interval needs to be based on the natural logarithm of the summary estimate.) In reality, both the summary estimate and the value of Tau are associated with uncertainty. A prediction interval seeks to present the range of effects in a way that acknowledges this uncertainty (Higgins et al 2009). A simple 95% prediction interval can be calculated as:

M ± t k−2 × √(Tau² + SE(M)²)

where M is the summary mean from the random-effects meta-analysis, t k−2 is the two-sided 95% critical value (i.e. the 97.5th percentile) of a t-distribution with k−2 degrees of freedom, k is the number of studies, Tau² is the estimated amount of heterogeneity and SE(M) is the standard error of the summary mean.
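The calculation is simple enough to sketch directly (a hypothetical Python illustration with invented values; the t quantile would normally be taken from a t-table or a statistical package):

```python
import math

def prediction_interval(mean, se_mean, tau2, t_crit):
    """Approximate 95% prediction interval for the underlying effect in a
    new, similar study: mean +/- t_crit * sqrt(tau2 + SE(mean)^2)
    (Higgins et al 2009).

    t_crit is the 97.5th percentile of a t-distribution with k-2 degrees
    of freedom (the two-sided 95% critical value).
    """
    half_width = t_crit * math.sqrt(tau2 + se_mean**2)
    return mean - half_width, mean + half_width

# Illustrative values: k = 7 studies, so df = 5 and t_crit ≈ 2.571
lo, hi = prediction_interval(mean=-0.30, se_mean=0.10, tau2=0.04, t_crit=2.571)
```

Because the interval combines Tau² with the uncertainty in the mean, it is always wider than the confidence interval for the summary estimate whenever Tau² is greater than zero.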

The term ‘prediction interval’ relates to the use of this interval to predict the possible underlying effect in a new study that is similar to the studies in the meta-analysis. A more useful interpretation of the interval is as a summary of the spread of underlying effects in the studies included in the random-effects meta-analysis.

Prediction intervals have proved a popular way of expressing the amount of heterogeneity in a meta-analysis (Riley et al 2011). They are, however, strongly based on the assumption of a normal distribution for the effects across studies, and can be very problematic when the number of studies is small, in which case they can appear spuriously wide or spuriously narrow. Nevertheless, we encourage their use when the number of studies is reasonable (e.g. more than ten) and there is no clear funnel plot asymmetry.

10.10.4.4 Implementing random-effects meta-analyses

As introduced in Section 10.3.2 , the random-effects model can be implemented using an inverse-variance approach, incorporating a measure of the extent of heterogeneity into the study weights. RevMan implements a version of random-effects meta-analysis that is described by DerSimonian and Laird, making use of a ‘moment-based’ estimate of the between-study variance (DerSimonian and Laird 1986). The attraction of this method is that the calculations are straightforward, but it has a theoretical disadvantage in that the confidence intervals are slightly too narrow to encompass full uncertainty resulting from having estimated the degree of heterogeneity.

For many years, RevMan has implemented two random-effects methods for dichotomous data: a Mantel-Haenszel method and an inverse-variance method. Both use the moment-based approach to estimating the amount of between-studies variation. The difference between the two is subtle: the former estimates the between-study variation by comparing each study’s result with a Mantel-Haenszel fixed-effect meta-analysis result, whereas the latter estimates it by comparing each study’s result with an inverse-variance fixed-effect meta-analysis result. In practice, the difference is likely to be trivial.

There are alternative methods for performing random-effects meta-analyses that have better technical properties than the DerSimonian and Laird approach with a moment-based estimate (Veroniki et al 2016). Most notable among these is an adjustment to the confidence interval proposed by Hartung and Knapp and by Sidik and Jonkman (Hartung and Knapp 2001, Sidik and Jonkman 2002). This adjustment widens the confidence interval to reflect uncertainty in the estimation of between-study heterogeneity, and it should be used if available to review authors. An alternative option to encompass full uncertainty in the degree of heterogeneity is to take a Bayesian approach (see Section 10.13 ).

An empirical comparison of different ways to estimate between-study variation in Cochrane meta-analyses has shown that they can lead to substantial differences in estimates of heterogeneity, but seldom have major implications for estimating summary effects (Langan et al 2015). Several simulation studies have concluded that an approach proposed by Paule and Mandel should be recommended (Langan et al 2017); whereas a comprehensive recent simulation study recommended a restricted maximum likelihood approach, although noted that no single approach is universally preferable (Langan et al 2019). Review authors are encouraged to select one of these options if it is available to them.

10.11 Investigating heterogeneity

10.11.1 Interaction and effect modification

Does the intervention effect vary with different populations or intervention characteristics (such as dose or duration)? Such variation is known as interaction by statisticians and as effect modification by epidemiologists. Methods to search for such interactions include subgroup analyses and meta-regression. All methods have considerable pitfalls.

10.11.2 What are subgroup analyses?

Subgroup analyses involve splitting all the participant data into subgroups, often in order to make comparisons between them. Subgroup analyses may be done for subsets of participants (such as males and females), or for subsets of studies (such as different geographical locations). Subgroup analyses may be done as a means of investigating heterogeneous results, or to answer specific questions about particular patient groups, types of intervention or types of study.

Subgroup analyses of subsets of participants within studies are uncommon in systematic reviews based on published literature because sufficient details to extract data about separate participant types are seldom published in reports. By contrast, such subsets of participants are easily analysed when individual participant data have been collected (see Chapter 26 ). The methods we describe in the remainder of this chapter are for subgroups of studies.

Findings from multiple subgroup analyses may be misleading. Subgroup analyses are observational by nature and are not based on randomized comparisons. False negative and false positive significance tests increase in likelihood rapidly as more subgroup analyses are performed. If their findings are presented as definitive conclusions there is clearly a risk of people being denied an effective intervention or treated with an ineffective (or even harmful) intervention. Subgroup analyses can also generate misleading recommendations about directions for future research that, if followed, would waste scarce resources.

It is useful to distinguish between the notions of ‘qualitative interaction’ and ‘quantitative interaction’ (Yusuf et al 1991). Qualitative interaction exists if the direction of effect is reversed, that is if an intervention is beneficial in one subgroup but is harmful in another. Qualitative interaction is rare. This may be used as an argument that the most appropriate result of a meta-analysis is the overall effect across all subgroups. Quantitative interaction exists when the size of the effect varies but not the direction, that is if an intervention is beneficial to different degrees in different subgroups.

10.11.3 Undertaking subgroup analyses

Meta-analyses can be undertaken in RevMan both within subgroups of studies and across all studies irrespective of their subgroup membership. It is tempting to compare effect estimates in different subgroups by considering the meta-analysis results from each subgroup separately. This should only be done informally by comparing the magnitudes of effect. Noting that either the effect or the test for heterogeneity in one subgroup is statistically significant whilst that in the other subgroup is not statistically significant does not indicate that the subgroup factor explains heterogeneity. Since different subgroups are likely to contain different amounts of information and thus have different abilities to detect effects, it is extremely misleading simply to compare the statistical significance of the results.

10.11.3.1 Is the effect different in different subgroups?

Valid investigations of whether an intervention works differently in different subgroups involve comparing the subgroups with each other. It is a mistake to compare within-subgroup inferences such as P values. If one subgroup analysis is statistically significant and another is not, then the latter may simply reflect a lack of information rather than a smaller (or absent) effect. When there are only two subgroups, non-overlap of the confidence intervals indicates statistical significance, but note that the confidence intervals can overlap to a small degree and the difference still be statistically significant.

A formal statistical approach should be used to examine differences among subgroups (see MECIR Box 10.11.a ). A simple significance test to investigate differences between two or more subgroups can be performed (Borenstein and Higgins 2013). This procedure consists of undertaking a standard test for heterogeneity across subgroup results rather than across individual study results. When the meta-analysis uses a fixed-effect inverse-variance weighted average approach, the method is exactly equivalent to the test described by Deeks and colleagues (Deeks et al 2001). An I 2 statistic is also computed for subgroup differences. This describes the percentage of the variability in effect estimates from the different subgroups that is due to genuine subgroup differences rather than sampling error (chance). Note that these methods for examining subgroup differences should be used only when the data in the subgroups are independent (i.e. they should not be used if the same study participants contribute to more than one of the subgroups in the forest plot).
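The test described above amounts to re-using the heterogeneity (Q) statistic with subgroup summary estimates in place of individual study results. A minimal sketch follows (Python, with an invented pair of subgroup summaries; this illustrates the general procedure, not any specific package's implementation):

```python
def subgroup_difference_test(sub_means, sub_ses):
    """Test for subgroup differences: a heterogeneity (Q) test applied to
    subgroup summary estimates rather than to individual study results
    (Borenstein and Higgins 2013).

    Returns Q, its degrees of freedom, and the I² for subgroup differences.
    Q should be referred to a chi-squared distribution with df degrees of
    freedom to obtain a P value.
    """
    w = [1.0 / se**2 for se in sub_ses]           # inverse-variance weights
    pooled = sum(wi * m for wi, m in zip(w, sub_means)) / sum(w)
    q = sum(wi * (m - pooled) ** 2 for wi, m in zip(w, sub_means))
    df = len(sub_means) - 1
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2

# Invented summary estimates for two subgroups (e.g. adults vs children)
q, df, i2 = subgroup_difference_test([-0.50, -0.10], [0.12, 0.15])
```

With one degree of freedom, a Q above 3.84 corresponds to a P value below 0.05; remember that this comparison is only valid when the subgroups contain independent data.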

If fixed-effect models are used for the analysis within each subgroup, then these statistics relate to differences in typical effects across different subgroups. If random-effects models are used for the analysis within each subgroup, then the statistics relate to variation in the mean effects in the different subgroups.

An alternative method for testing for differences between subgroups is to use meta-regression techniques, in which case a random-effects model is generally preferred (see Section 10.11.4 ). Tests for subgroup differences based on random-effects models may be regarded as preferable to those based on fixed-effect models, due to the high risk of false-positive results when a fixed-effect model is used to compare subgroups (Higgins and Thompson 2004).

MECIR Box 10.11.a Relevant expectations for conduct of intervention reviews

10.11.4 Meta-regression

If studies are divided into subgroups (see Section 10.11.2 ), this may be viewed as an investigation of how a categorical study characteristic is associated with the intervention effects in the meta-analysis. For example, studies in which allocation sequence concealment was adequate may yield different results from those in which it was inadequate. Here, allocation sequence concealment, being either adequate or inadequate, is a categorical characteristic at the study level. Meta-regression is an extension to subgroup analyses that allows the effect of continuous, as well as categorical, characteristics to be investigated, and in principle allows the effects of multiple factors to be investigated simultaneously (although this is rarely possible due to inadequate numbers of studies) (Thompson and Higgins 2002). Meta-regression should generally not be considered when there are fewer than ten studies in a meta-analysis.

Meta-regressions are similar in essence to simple regressions, in which an outcome variable is predicted according to the values of one or more explanatory variables. In meta-regression, the outcome variable is the effect estimate (for example, a mean difference, a risk difference, a log odds ratio or a log risk ratio). The explanatory variables are characteristics of studies that might influence the size of intervention effect. These are often called ‘potential effect modifiers’ or covariates. Meta-regressions usually differ from simple regressions in two ways. First, larger studies have more influence on the relationship than smaller studies, since studies are weighted by the precision of their respective effect estimate. Second, it is wise to allow for the residual heterogeneity among intervention effects not modelled by the explanatory variables. This gives rise to the term ‘random-effects meta-regression’, since the extra variability is incorporated in the same way as in a random-effects meta-analysis (Thompson and Sharp 1999).

The regression coefficient obtained from a meta-regression analysis will describe how the outcome variable (the intervention effect) changes with a unit increase in the explanatory variable (the potential effect modifier). The statistical significance of the regression coefficient is a test of whether there is a linear relationship between intervention effect and the explanatory variable. If the intervention effect is a ratio measure, the log-transformed value of the intervention effect should always be used in the regression model (see Chapter 6, Section 6.1.2.1 ), and the exponential of the regression coefficient will give an estimate of the relative change in intervention effect with a unit increase in the explanatory variable.
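A minimal sketch of such a weighted regression for one continuous covariate may help (Python, with invented data; the residual between-study variance Tau² is assumed to have been estimated separately, and real analyses should use dedicated software such as ‘metareg’ or ‘metafor’):

```python
import math

def meta_regression(effects, ses, covariate, tau2=0.0):
    """Weighted least-squares meta-regression with a single covariate.

    Each study is weighted by 1/(SE_i² + Tau²); Tau² is the residual
    between-study variance, assumed to have been estimated elsewhere
    (tau2=0 gives a fixed-effect meta-regression).
    """
    w = [1.0 / (se**2 + tau2) for se in ses]
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, covariate))
    swy = sum(wi * y for wi, y in zip(w, effects))
    swxx = sum(wi * x * x for wi, x in zip(w, covariate))
    swxy = sum(wi * x * y for wi, x, y in zip(w, covariate, effects))
    d = sw * swxx - swx**2
    slope = (sw * swxy - swx * swy) / d      # change in effect per unit of covariate
    intercept = (swy - slope * swx) / sw
    se_slope = math.sqrt(sw / d)             # standard error of the slope (known-variance weights)
    return intercept, slope, se_slope

# Invented example: log odds ratios regressed on dose in mg
b0, b1, se_b1 = meta_regression(
    effects=[-0.2, -0.5, -0.8, -1.1],
    ses=[0.20, 0.20, 0.25, 0.30],
    covariate=[10, 20, 30, 40],
    tau2=0.02,
)
```

Because the outcome here is a log odds ratio, exponentiating the slope gives the relative change in the odds ratio per 1 mg increase in dose, as described above.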

Meta-regression can also be used to investigate differences for categorical explanatory variables as done in subgroup analyses. If there are J subgroups, membership of particular subgroups is indicated by using J minus 1 dummy variables (which can only take values of zero or one) in the meta-regression model (as in standard linear regression modelling). The regression coefficients will estimate how the intervention effect in each subgroup differs from a nominated reference subgroup. The P value of each regression coefficient will indicate the strength of evidence against the null hypothesis that the characteristic is not associated with the intervention effect.

Meta-regression may be performed using the ‘metareg’ macro available for the Stata statistical package, or using the ‘metafor’ package for R, as well as other packages.

10.11.5 Selection of study characteristics for subgroup analyses and meta-regression

Authors need to be cautious about undertaking subgroup analyses, and interpreting any that they do. Some considerations are outlined here for selecting characteristics (also called explanatory variables, potential effect modifiers or covariates) that will be investigated for their possible influence on the size of the intervention effect. These considerations apply similarly to subgroup analyses and to meta-regressions. Further details may be obtained elsewhere (Oxman and Guyatt 1992, Berlin and Antman 1994).

10.11.5.1 Ensure that there are adequate studies to justify subgroup analyses and meta-regressions

It is very unlikely that an investigation of heterogeneity will produce useful findings unless there is a substantial number of studies. Typical advice for undertaking simple regression analyses is that at least ten observations (i.e. ten studies in a meta-analysis) should be available for each characteristic modelled. However, even this will be too few when the covariates are unevenly distributed across studies.

10.11.5.2 Specify characteristics in advance

Authors should, whenever possible, pre-specify characteristics in the protocol that later will be subject to subgroup analyses or meta-regression. The plan specified in the protocol should then be followed (data permitting), without undue emphasis on any particular findings (see MECIR Box 10.11.b ). Pre-specifying characteristics reduces the likelihood of spurious findings, first by limiting the number of subgroups investigated, and second by preventing knowledge of the studies’ results influencing which subgroups are analysed. True pre-specification is difficult in systematic reviews, because the results of some of the relevant studies are often known when the protocol is drafted. If a characteristic was overlooked in the protocol, but is clearly of major importance and justified by external evidence, then authors should not be reluctant to explore it. However, such post-hoc analyses should be identified as such.

MECIR Box 10.11.b Relevant expectations for conduct of intervention reviews

10.11.5.3 Select a small number of characteristics

The likelihood of a false-positive result among subgroup analyses and meta-regression increases with the number of characteristics investigated. It is difficult to suggest a maximum number of characteristics to look at, especially since the number of available studies is unknown in advance. If more than one or two characteristics are investigated it may be sensible to adjust the level of significance to account for making multiple comparisons.

10.11.5.4 Ensure there is scientific rationale for investigating each characteristic

Selection of characteristics should be motivated by biological and clinical hypotheses, ideally supported by evidence from sources other than the included studies. Subgroup analyses using characteristics that are implausible or clinically irrelevant are not likely to be useful and should be avoided. For example, a relationship between intervention effect and year of publication is seldom in itself clinically informative, and if identified runs the risk of initiating a post-hoc data dredge of factors that may have changed over time.

Prognostic factors are those that predict the outcome of a disease or condition, whereas effect modifiers are factors that influence how well an intervention works in affecting the outcome. Confusion between prognostic factors and effect modifiers is common in planning subgroup analyses, especially at the protocol stage. Prognostic factors are not good candidates for subgroup analyses unless they are also believed to modify the effect of intervention. For example, being a smoker may be a strong predictor of mortality within the next ten years, but there may not be reason for it to influence the effect of a drug therapy on mortality (Deeks 1998). Potential effect modifiers may include participant characteristics (age, setting), the precise interventions (dose of active intervention, choice of comparison intervention), how the study was done (length of follow-up) or methodology (design and quality).

10.11.5.5 Be aware that the effect of a characteristic may not always be identified

Many characteristics that might have important effects on how well an intervention works cannot be investigated using subgroup analysis or meta-regression. These are characteristics of participants that might vary substantially within studies, but that can only be summarized at the level of the study. An example is age. Consider a collection of clinical trials involving adults ranging from 18 to 60 years old. There may be a strong relationship between age and intervention effect that is apparent within each study. However, if the mean ages for the trials are similar, then no relationship will be apparent by looking at trial mean ages and trial-level effect estimates. The problem is one of aggregating individuals’ results and is variously known as aggregation bias, ecological bias or the ecological fallacy (Morgenstern 1982, Greenland 1987, Berlin et al 2002). It is even possible for the direction of the relationship across studies to be the opposite of the direction of the relationship observed within each study.

10.11.5.6 Think about whether the characteristic is closely related to another characteristic (confounded)

The problem of ‘confounding’ complicates interpretation of subgroup analyses and meta-regressions and can lead to incorrect conclusions. Two characteristics are confounded if their influences on the intervention effect cannot be disentangled. For example, if those studies implementing an intensive version of a therapy happened to be the studies that involved patients with more severe disease, then one cannot tell which aspect is the cause of any difference in effect estimates between these studies and others. In meta-regression, co-linearity between potential effect modifiers leads to similar difficulties (Berlin and Antman 1994). Computing correlations between study characteristics will give some information about which study characteristics may be confounded with each other.

10.11.6 Interpretation of subgroup analyses and meta-regressions

Appropriate interpretation of subgroup analyses and meta-regressions requires caution (Oxman and Guyatt 1992).

  • Subgroup comparisons are observational. It must be remembered that subgroup analyses and meta-regressions are entirely observational in their nature. These analyses investigate differences between studies. Even if individuals are randomized to one group or other within a clinical trial, they are not randomized to go in one trial or another. Hence, subgroup analyses suffer the limitations of any observational investigation, including possible bias through confounding by other study-level characteristics. Furthermore, even a genuine difference between subgroups is not necessarily due to the classification of the subgroups. As an example, a subgroup analysis of bone marrow transplantation for treating leukaemia might show a strong association between the age of a sibling donor and the success of the transplant. However, this probably does not mean that the age of donor is important. In fact, the age of the recipient is probably a key factor and the subgroup finding would simply be due to the strong association between the age of the recipient and the age of their sibling.  
  • Was the analysis pre-specified or post hoc? Authors should state whether subgroup analyses were pre-specified or undertaken after the results of the studies had been compiled (post hoc). More reliance may be placed on a subgroup analysis if it was one of a small number of pre-specified analyses. Performing numerous post-hoc subgroup analyses to explain heterogeneity is a form of data dredging. Data dredging is condemned because it is usually possible to find an apparent, but false, explanation for heterogeneity by considering lots of different characteristics.  
  • Is there indirect evidence in support of the findings? Differences between subgroups should be clinically plausible and supported by other external or indirect evidence, if they are to be convincing.  
  • Is the magnitude of the difference practically important? If the magnitude of a difference between subgroups will not result in different recommendations for different subgroups, then it may be better to present only the overall analysis results.  
  • Is there a statistically significant difference between subgroups? To establish whether there is a different effect of an intervention in different situations, the magnitudes of effects in different subgroups should be compared directly with each other. In particular, statistical significance of the results within separate subgroup analyses should not be compared (see Section 10.11.3.1 ).  
  • Are analyses looking at within-study or between-study relationships? For patient and intervention characteristics, differences in subgroups that are observed within studies are more reliable than analyses of subsets of studies. If such within-study relationships are replicated across studies then this adds confidence to the findings.
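The advice above, that subgroup effects should be compared directly rather than through their separate significance levels, can be sketched as a z-test for the difference between two independent subgroup estimates on the log scale. The numbers below are invented for illustration:

```python
import math

def subgroup_difference_test(est1, se1, est2, se2):
    """Z-test for the difference between two independent subgroup
    effect estimates (e.g. log odds ratios) with standard errors."""
    diff = est1 - est2
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    z = diff / se_diff
    # Two-sided p-value from the standard normal distribution
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

# One subgroup looks 'significant' on its own and the other does not,
# yet the direct comparison gives no clear evidence of a difference.
z, p = subgroup_difference_test(-0.50, 0.20, -0.25, 0.30)
```

Here the first subgroup alone would reach p < 0.05 (z = -2.5) while the second would not, but the test of their difference is far from significant, illustrating why within-subgroup significance levels should not be compared.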

10.11.7 Investigating the effect of underlying risk

One potentially important source of heterogeneity among a series of studies is when the underlying average risk of the outcome event varies between the studies. The underlying risk of a particular event may be viewed as an aggregate measure of case-mix factors such as age or disease severity. It is generally measured as the observed risk of the event in the comparator group of each study (the comparator group risk, or CGR). The notion is controversial in its relevance to clinical practice since underlying risk represents a summary of both known and unknown risk factors. Problems also arise because comparator group risk will depend on the length of follow-up, which often varies across studies. However, underlying risk has received particular attention in meta-analysis because the information is readily available once dichotomous data have been prepared for use in meta-analyses. Sharp provides a full discussion of the topic (Sharp 2001).

Intuition would suggest that participants are more or less likely to benefit from an effective intervention according to their risk status. However, the relationship between underlying risk and intervention effect is a complicated issue. For example, suppose an intervention is equally beneficial in the sense that for all patients it reduces the risk of an event, say a stroke, to 80% of the underlying risk. Then it is not equally beneficial in terms of absolute differences in risk in the sense that it reduces a 50% stroke rate by 10 percentage points to 40% (number needed to treat=10), but a 20% stroke rate by 4 percentage points to 16% (number needed to treat=25).
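The arithmetic of this example, a constant risk ratio producing different absolute benefits at different underlying risks, can be reproduced directly:

```python
def absolute_benefit(cgr, risk_ratio):
    """Absolute risk reduction and number needed to treat when an
    intervention multiplies the comparator group risk (CGR) by a
    constant risk ratio."""
    treated_risk = cgr * risk_ratio
    arr = cgr - treated_risk   # absolute risk reduction
    nnt = 1 / arr              # number needed to treat
    return treated_risk, arr, nnt

# Risk ratio 0.8 applied to the two underlying stroke risks in the text
high_risk = absolute_benefit(0.50, 0.8)  # ≈ (0.40, 0.10, 10)
low_risk = absolute_benefit(0.20, 0.8)   # ≈ (0.16, 0.04, 25)
```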

Use of different summary statistics (risk ratio, odds ratio and risk difference) will demonstrate different relationships with underlying risk. Summary statistics that show close to no relationship with underlying risk are generally preferred for use in meta-analysis (see Section 10.4.3 ).

Investigating any relationship between effect estimates and the comparator group risk is also complicated by a technical phenomenon known as regression to the mean. This arises because the comparator group risk forms an integral part of the effect estimate. A high risk in a comparator group, observed entirely by chance, will on average give rise to a higher than expected effect estimate, and vice versa. This phenomenon results in a false correlation between effect estimates and comparator group risks. There are methods, which require sophisticated software, that correct for regression to the mean (McIntosh 1996, Thompson et al 1997). These should be used for such analyses, and statistical expertise is recommended.

10.11.8 Dose-response analyses

The principles of meta-regression can be applied to the relationships between intervention effect and dose (commonly termed dose-response), treatment intensity or treatment duration (Greenland and Longnecker 1992, Berlin et al 1993). Conclusions about differences in effect due to differences in dose (or similar factors) are on stronger ground if participants are randomized to one dose or another within a study and a consistent relationship is found across similar studies. While authors should consider these effects, particularly as a possible explanation for heterogeneity, they should be cautious about drawing conclusions based on between-study differences. Authors should be particularly cautious about claiming that a dose-response relationship does not exist, given the low power of many meta-regression analyses to detect genuine relationships.
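In its fixed-effect form, a simple dose-response meta-regression of this kind reduces to weighted least squares with inverse-variance weights. The sketch below uses invented study data and ignores residual between-study heterogeneity, which a random-effects meta-regression would add as an extra variance component:

```python
def meta_regression_slope(doses, effects, ses):
    """Fixed-effect (inverse-variance weighted) least-squares slope of
    the effect estimate (e.g. log risk ratio) on dose."""
    w = [1 / se ** 2 for se in ses]
    xbar = sum(wi * x for wi, x in zip(w, doses)) / sum(w)
    ybar = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    num = sum(wi * (x - xbar) * (y - ybar)
              for wi, x, y in zip(w, doses, effects))
    den = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, doses))
    slope = num / den
    se_slope = den ** -0.5  # standard error of the slope
    return slope, se_slope

# Hypothetical studies: daily dose in mg, effects as log risk ratios
slope, se_slope = meta_regression_slope(
    [10, 20, 40], [-0.10, -0.25, -0.45], [0.15, 0.12, 0.20])
```

A negative slope here would suggest greater benefit at higher doses, but, as the text cautions, such a between-study relationship is observational and the power to detect it is often low.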

10.12 Missing data

10.12.1 Types of missing data

There are many potential sources of missing data in a systematic review or meta-analysis (see Table 10.12.a ). For example, a whole study may be missing from the review, an outcome may be missing from a study, summary data may be missing for an outcome, and individual participants may be missing from the summary data. Here we discuss a variety of potential sources of missing data, highlighting where more detailed discussions are available elsewhere in the Handbook .

Whole studies may be missing from a review because they are never published, are published in obscure places, are rarely cited, or are inappropriately indexed in databases. Thus, review authors should always be aware of the possibility that they have failed to identify relevant studies. There is a strong possibility that such studies are missing because of their ‘uninteresting’ or ‘unwelcome’ findings (that is, in the presence of publication bias). This problem is discussed at length in Chapter 13 . Details of comprehensive search methods are provided in Chapter 4 .

Some studies might not report any information on outcomes of interest to the review. For example, there may be no information on quality of life, or on serious adverse effects. It is often difficult to determine whether this is because the outcome was not measured or because the outcome was not reported. Furthermore, failure to report that outcomes were measured may be dependent on the unreported results (selective outcome reporting bias; see Chapter 7, Section 7.2.3.3). Similarly, summary data for an outcome, in a form that can be included in a meta-analysis, may be missing. A common example is missing standard deviations (SDs) for continuous outcomes. This is often a problem when change-from-baseline outcomes are sought. We discuss imputation of missing SDs in Chapter 6, Section 6.5.2.8. Other examples of missing summary data are missing sample sizes (particularly those for each intervention group separately), numbers of events, standard errors, follow-up times for calculating rates, and sufficient details of time-to-event outcomes. Inappropriate analyses of studies, for example of cluster-randomized and crossover trials, can lead to missing summary data. It is sometimes possible to approximate the correct analyses of such studies, for example by imputing correlation coefficients or SDs, as discussed in Chapter 23, Section 23.1, for cluster-randomized studies and Chapter 23, Section 23.2, for crossover trials. As a general rule, most methodologists believe that missing summary data (e.g. ‘no usable data’) should not be used as a reason to exclude a study from a systematic review. It is more appropriate to include the study in the review, and to discuss the potential implications of its absence from a meta-analysis.

It is likely that in some, if not all, included studies, there will be individuals missing from the reported results. Review authors are encouraged to consider this problem carefully (see MECIR Box 10.12.a ). We provide further discussion of this problem in Section 10.12.3 ; see also Chapter 8, Section 8.5 .

Missing data can also affect subgroup analyses. If subgroup analyses or meta-regressions are planned (see Section 10.11 ), they require details of the study-level characteristics that distinguish studies from one another. If these are not available for all studies, review authors should consider asking the study authors for more information.

Table 10.12.a Types of missing data in a meta-analysis

MECIR Box 10.12.a Relevant expectations for conduct of intervention reviews

10.12.2 General principles for dealing with missing data

There is a large literature of statistical methods for dealing with missing data. Here we briefly review some key concepts and make some general recommendations for Cochrane Review authors. It is important to think why data may be missing. Statisticians often use the terms ‘missing at random’ and ‘not missing at random’ to represent different scenarios.

Data are said to be ‘missing at random’ if the fact that they are missing is unrelated to actual values of the missing data. For instance, if some quality-of-life questionnaires were lost in the postal system, this would be unlikely to be related to the quality of life of the trial participants who completed the forms. In some circumstances, statisticians distinguish between data ‘missing at random’ and data ‘missing completely at random’, although in the context of a systematic review the distinction is unlikely to be important. Data that are missing at random may not be important. Analyses based on the available data will often be unbiased, although based on a smaller sample size than the original data set.

Data are said to be ‘not missing at random’ if the fact that they are missing is related to the actual missing data. For instance, in a depression trial, participants who had a relapse of depression might be less likely to attend the final follow-up interview, and more likely to have missing outcome data. Such data are ‘non-ignorable’ in the sense that an analysis of the available data alone will typically be biased. Publication bias and selective reporting bias lead by definition to data that are ‘not missing at random’, and attrition and exclusions of individuals within studies often do as well.

The principal options for dealing with missing data are:

  1. analysing only the available data (i.e. ignoring the missing data);
  2. imputing the missing data with replacement values, and treating these as if they were observed (e.g. last observation carried forward, imputing an assumed outcome such as assuming all were poor outcomes, imputing the mean, imputing based on predicted values from a regression analysis);
  3. imputing the missing data and accounting for the fact that these were imputed with uncertainty (e.g. multiple imputation, simple imputation methods (as point 2) with adjustment to the standard error); and
  4. using statistical models to allow for missing data, making assumptions about their relationships with the available data.

Option 2 is practical in most circumstances and very commonly used in systematic reviews. However, it fails to acknowledge uncertainty in the imputed values and typically results in confidence intervals that are too narrow. Options 3 and 4 would require involvement of a knowledgeable statistician.

Five general recommendations for dealing with missing data in Cochrane Reviews are as follows:

  • Whenever possible, contact the original investigators to request missing data.
  • Make explicit the assumptions of any methods used to address missing data: for example, that the data are assumed missing at random, or that missing values were assumed to have a particular value such as a poor outcome.
  • Follow the guidance in Chapter 8 to assess risk of bias due to missing outcome data in randomized trials.
  • Perform sensitivity analyses to assess how sensitive results are to reasonable changes in the assumptions that are made (see Section 10.14 ).
  • Address the potential impact of missing data on the findings of the review in the Discussion section.

10.12.3 Dealing with missing outcome data from individual participants

Review authors may undertake sensitivity analyses to assess the potential impact of missing outcome data, based on assumptions about the relationship between missingness in the outcome and its true value. Several methods are available (Akl et al 2015). For dichotomous outcomes, Higgins and colleagues propose a strategy involving different assumptions about how the risk of the event among the missing participants differs from the risk of the event among the observed participants, taking account of uncertainty introduced by the assumptions (Higgins et al 2008a). Akl and colleagues propose a suite of simple imputation methods, including a similar approach to that of Higgins and colleagues based on relative risks of the event in missing versus observed participants. Similar ideas can be applied to continuous outcome data (Ebrahim et al 2013, Ebrahim et al 2014). Particular care is required to avoid double counting events, since it can be unclear whether reported numbers of events in trial reports apply to the full randomized sample or only to those who did not drop out (Akl et al 2016).

Although there is a tradition of implementing ‘worst case’ and ‘best case’ analyses clarifying the extreme boundaries of what is theoretically possible, such analyses may not be informative for the most plausible scenarios (Higgins et al 2008a).
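For a single trial arm with missing dichotomous outcomes, the best-case and worst-case bounds amount to assuming that none, or all, of the missing participants had the event. A minimal sketch with illustrative numbers:

```python
def best_worst_case(events, observed, missing):
    """Bounds on the event risk in one trial arm when `missing`
    participants have unknown outcomes."""
    n = observed + missing
    best = events / n               # none of the missing had the event
    worst = (events + missing) / n  # all of the missing had the event
    return best, worst

# 20 events among 80 observed participants, with 20 participants missing
lo, hi = best_worst_case(20, 80, 20)  # risk could lie anywhere in [0.20, 0.40]
```

As the text notes, such bounds delimit what is theoretically possible rather than what is plausible, which is why intermediate assumptions (for example, a fixed ratio of the event risk in missing versus observed participants) are usually more informative.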

10.13 Bayesian approaches to meta-analysis

Bayesian statistics is an approach to statistics based on a different philosophy from that which underlies significance tests and confidence intervals. It is essentially about updating of evidence. In a Bayesian analysis, initial uncertainty is expressed through a prior distribution about the quantities of interest. Current data and assumptions concerning how they were generated are summarized in the likelihood . The posterior distribution for the quantities of interest can then be obtained by combining the prior distribution and the likelihood. The likelihood summarizes both the data from studies included in the meta-analysis (for example, 2×2 tables from randomized trials) and the meta-analysis model (for example, assuming a fixed effect or random effects). The result of the analysis is usually presented as a point estimate and 95% credible interval from the posterior distribution for each quantity of interest, which look much like classical estimates and confidence intervals. Potential advantages of Bayesian analyses are summarized in Box 10.13.a . Bayesian analysis may be performed using WinBUGS software (Smith et al 1995, Lunn et al 2000), within R (Röver 2017), or – for some applications – using standard meta-regression software with a simple trick (Rhodes et al 2016).

A difference between Bayesian analysis and classical meta-analysis is that the interpretation is directly in terms of belief: a 95% credible interval for an odds ratio is that region in which we believe the odds ratio to lie with probability 95%. This is how many practitioners actually interpret a classical confidence interval, but strictly in the classical framework the 95% refers to the long-term frequency with which 95% intervals contain the true value. The Bayesian framework also allows a review author to calculate the probability that the odds ratio has a particular range of values, which cannot be done in the classical framework. For example, we can determine the probability that the odds ratio is less than 1 (which might indicate a beneficial effect of an experimental intervention), or that it is no larger than 0.8 (which might indicate a clinically important effect). It should be noted that these probabilities are specific to the choice of the prior distribution. Different meta-analysts may analyse the same data using different prior distributions and obtain different results. It is therefore important to carry out sensitivity analyses to investigate how the results depend on any assumptions made.
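When both the prior and the likelihood for a log odds ratio are approximated by normal distributions, the prior-to-posterior update has a simple closed form, which makes the mechanics concrete. The numbers below are illustrative; real Bayesian meta-analyses would typically use WinBUGS, R or similar software:

```python
import math

def normal_posterior(prior_mean, prior_var, estimate, est_var):
    """Conjugate normal-normal update on the log odds ratio scale:
    combine a normal prior with a normal likelihood summarized by an
    estimate and its variance."""
    post_var = 1 / (1 / prior_var + 1 / est_var)
    post_mean = post_var * (prior_mean / prior_var + estimate / est_var)
    return post_mean, post_var

def prob_below(threshold, mean, var):
    """Posterior probability that the quantity lies below `threshold`."""
    z = (threshold - mean) / math.sqrt(var)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Vague prior on the log odds ratio; pooled estimate logOR = -0.30 (SE 0.15)
post_mean, post_var = normal_posterior(0.0, 100.0, -0.30, 0.15 ** 2)
p_benefit = prob_below(math.log(1.0), post_mean, post_var)    # P(OR < 1)
p_important = prob_below(math.log(0.8), post_mean, post_var)  # P(OR < 0.8)
```

With such a vague prior the posterior essentially reproduces the classical estimate; an informative prior would pull it towards the prior mean, which is why sensitivity to the choice of prior must be examined.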

In the context of a meta-analysis, prior distributions are needed for the particular intervention effect being analysed (such as the odds ratio or the mean difference) and – in the context of a random-effects meta-analysis – on the amount of heterogeneity among intervention effects across studies. Prior distributions may represent subjective belief about the size of the effect, or may be derived from sources of evidence not included in the meta-analysis, such as information from non-randomized studies of the same intervention or from randomized trials of other interventions. The width of the prior distribution reflects the degree of uncertainty about the quantity. When there is little or no information, a ‘non-informative’ prior can be used, in which all values across the possible range are equally likely.

Most Bayesian meta-analyses use non-informative (or very weakly informative) prior distributions to represent beliefs about intervention effects, since many regard it as controversial to combine objective trial data with subjective opinion. However, prior distributions are increasingly used for the extent of among-study variation in a random-effects analysis. This is particularly advantageous when the number of studies in the meta-analysis is small, say fewer than five or ten. Libraries of data-based prior distributions are available that have been derived from re-analyses of many thousands of meta-analyses in the Cochrane Database of Systematic Reviews (Turner et al 2012).

Box 10.13.a Some potential advantages of Bayesian meta-analysis

Statistical expertise is strongly recommended for review authors who wish to carry out Bayesian analyses. There are several good texts (Sutton et al 2000, Sutton and Abrams 2001, Spiegelhalter et al 2004).

10.14 Sensitivity analyses

The process of undertaking a systematic review involves a sequence of decisions. Whilst many of these decisions are clearly objective and non-contentious, some will be somewhat arbitrary or unclear. For instance, if eligibility criteria involve a numerical value, the choice of value is usually arbitrary: for example, defining groups of older people may reasonably have lower limits of 60, 65, 70 or 75 years, or any value in between. Other decisions may be unclear because a study report fails to include the required information. Some decisions are unclear because the included studies themselves never obtained the information required: for example, the outcomes of those who were lost to follow-up. Further decisions are unclear because there is no consensus on the best statistical method to use for a particular problem.

It is highly desirable to demonstrate that the findings from a systematic review are not dependent on such arbitrary or unclear decisions by using sensitivity analysis (see MECIR Box 10.14.a ). A sensitivity analysis is a repeat of the primary analysis or meta-analysis in which alternative decisions or ranges of values are substituted for decisions that were arbitrary or unclear. For example, if the eligibility of some studies in the meta-analysis is dubious because they do not contain full details, sensitivity analysis may involve undertaking the meta-analysis twice: the first time including all studies and the second including only those that are definitely known to be eligible. A sensitivity analysis asks the question, ‘Are the findings robust to the decisions made in the process of obtaining them?’
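The 'run the meta-analysis twice' example can be sketched with a basic fixed-effect inverse-variance pooling over hypothetical studies, of which the last two are of dubious eligibility:

```python
import math

def pool_fixed(effects, ses):
    """Fixed-effect inverse-variance pooled estimate and its standard error."""
    w = [1 / se ** 2 for se in ses]
    est = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    return est, 1 / math.sqrt(sum(w))

effects = [-0.20, -0.35, -0.10, -0.60, -0.55]  # log risk ratios
ses = [0.10, 0.15, 0.12, 0.25, 0.30]

all_est, all_se = pool_fixed(effects, ses)            # all five studies
firm_est, firm_se = pool_fixed(effects[:3], ses[:3])  # definitely eligible only
# If both analyses support the same conclusion, the finding is robust
# to the eligibility decision; if not, the decision needs resolving.
```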

MECIR Box 10.14.a Relevant expectations for conduct of intervention reviews

There are many decision nodes within the systematic review process that can generate a need for a sensitivity analysis. Examples include:

Searching for studies:

  • Should abstracts whose results cannot be confirmed in subsequent publications be included in the review?

Eligibility criteria:

  • Characteristics of participants: where a majority but not all people in a study meet an age range, should the study be included?
  • Characteristics of the intervention: what range of doses should be included in the meta-analysis?
  • Characteristics of the comparator: what criteria are required to define usual care to be used as a comparator group?
  • Characteristics of the outcome: what time point or range of time points are eligible for inclusion?
  • Study design: should blinded and unblinded outcome assessment be included, or should study inclusion be restricted by other aspects of methodological criteria?

What data should be analysed?

  • Time-to-event data: what assumptions of the distribution of censored data should be made?
  • Continuous data: where standard deviations are missing, when and how should they be imputed? Should analyses be based on change scores or on post-intervention values?
  • Ordinal scales: what cut-point should be used to dichotomize short ordinal scales into two groups?
  • Cluster-randomized trials: what values of the intraclass correlation coefficient should be used when trial analyses have not been adjusted for clustering?
  • Crossover trials: what values of the within-subject correlation coefficient should be used when this is not available in primary reports?
  • All analyses: what assumptions should be made about missing outcomes? Should adjusted or unadjusted estimates of intervention effects be used?

Analysis methods:

  • Should fixed-effect or random-effects methods be used for the analysis?
  • For dichotomous outcomes, should odds ratios, risk ratios or risk differences be used?
  • For continuous outcomes, where several scales have assessed the same dimension, should results be analysed as a standardized mean difference across all scales or as mean differences individually for each scale?

Some sensitivity analyses can be pre-specified in the study protocol, but many issues suitable for sensitivity analysis are only identified during the review process, as the individual peculiarities of the studies under investigation emerge. When sensitivity analyses show that the overall result and conclusions are not affected by the different decisions that could be made during the review process, the results of the review can be regarded with a higher degree of certainty. Where sensitivity analyses identify particular decisions or missing information that greatly influence the findings of the review, greater resources can be deployed to try to resolve uncertainties and obtain extra information, possibly through contacting trial authors and obtaining individual participant data. If this cannot be achieved, the results must be interpreted with an appropriate degree of caution. Such findings may generate proposals for further investigations and future research.

Reporting of sensitivity analyses in a systematic review may best be done by producing a summary table. Rarely is it informative to produce individual forest plots for each sensitivity analysis undertaken.

Sensitivity analyses are sometimes confused with subgroup analysis. Although some sensitivity analyses involve restricting the analysis to a subset of the totality of studies, the two methods differ in two ways. First, sensitivity analyses do not attempt to estimate the effect of the intervention in the group of studies removed from the analysis, whereas in subgroup analyses, estimates are produced for each subgroup. Second, in sensitivity analyses, informal comparisons are made between different ways of estimating the same thing, whereas in subgroup analyses, formal statistical comparisons are made across the subgroups.

10.15 Chapter information

Editors: Jonathan J Deeks, Julian PT Higgins, Douglas G Altman; on behalf of the Cochrane Statistical Methods Group

Contributing authors: Douglas Altman, Deborah Ashby, Jacqueline Birks, Michael Borenstein, Marion Campbell, Jonathan Deeks, Matthias Egger, Julian Higgins, Joseph Lau, Keith O’Rourke, Gerta Rücker, Rob Scholten, Jonathan Sterne, Simon Thompson, Anne Whitehead

Acknowledgements: We are grateful to the following for commenting helpfully on earlier drafts: Bodil Als-Nielsen, Deborah Ashby, Jesse Berlin, Joseph Beyene, Jacqueline Birks, Michael Bracken, Marion Campbell, Chris Cates, Wendong Chen, Mike Clarke, Albert Cobos, Esther Coren, Francois Curtin, Roberto D’Amico, Keith Dear, Heather Dickinson, Diana Elbourne, Simon Gates, Paul Glasziou, Christian Gluud, Peter Herbison, Sally Hollis, David Jones, Steff Lewis, Tianjing Li, Joanne McKenzie, Philippa Middleton, Nathan Pace, Craig Ramsey, Keith O’Rourke, Rob Scholten, Guido Schwarzer, Jack Sinclair, Jonathan Sterne, Simon Thompson, Andy Vail, Clarine van Oel, Paula Williamson and Fred Wolf.

Funding: JJD received support from the National Institute for Health Research (NIHR) Birmingham Biomedical Research Centre at the University Hospitals Birmingham NHS Foundation Trust and the University of Birmingham. JPTH is a member of the NIHR Biomedical Research Centre at University Hospitals Bristol NHS Foundation Trust and the University of Bristol. JPTH received funding from National Institute for Health Research Senior Investigator award NF-SI-0617-10145. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.

10.16 References

Agresti A. An Introduction to Categorical Data Analysis . New York (NY): John Wiley & Sons; 1996.

Akl EA, Kahale LA, Agoritsas T, Brignardello-Petersen R, Busse JW, Carrasco-Labra A, Ebrahim S, Johnston BC, Neumann I, Sola I, Sun X, Vandvik P, Zhang Y, Alonso-Coello P, Guyatt G. Handling trial participants with missing outcome data when conducting a meta-analysis: a systematic survey of proposed approaches. Systematic Reviews 2015; 4 : 98.

Akl EA, Kahale LA, Ebrahim S, Alonso-Coello P, Schünemann HJ, Guyatt GH. Three challenges described for identifying participants with missing data in trials reports, and potential solutions suggested to systematic reviewers. Journal of Clinical Epidemiology 2016; 76 : 147-154.

Altman DG, Bland JM. Detecting skewness from summary information. BMJ 1996; 313 : 1200.

Anzures-Cabrera J, Sarpatwari A, Higgins JPT. Expressing findings from meta-analyses of continuous outcomes in terms of risks. Statistics in Medicine 2011; 30 : 2967-2985.

Berlin JA, Longnecker MP, Greenland S. Meta-analysis of epidemiologic dose-response data. Epidemiology 1993; 4 : 218-228.

Berlin JA, Antman EM. Advantages and limitations of metaanalytic regressions of clinical trials data. Online Journal of Current Clinical Trials 1994; Doc No 134 .

Berlin JA, Santanna J, Schmid CH, Szczech LA, Feldman KA, Group A-LAITS. Individual patient- versus group-level data meta-regressions for the investigation of treatment effect modifiers: ecological bias rears its ugly head. Statistics in Medicine 2002; 21 : 371-387.

Borenstein M, Hedges LV, Higgins JPT, Rothstein HR. A basic introduction to fixed-effect and random-effects models for meta-analysis. Research Synthesis Methods 2010; 1 : 97-111.

Borenstein M, Higgins JPT. Meta-analysis and subgroups. Prev Sci 2013; 14 : 134-143.

Bradburn MJ, Deeks JJ, Berlin JA, Russell Localio A. Much ado about nothing: a comparison of the performance of meta-analytical methods with rare events. Statistics in Medicine 2007; 26 : 53-77.

Chinn S. A simple method for converting an odds ratio to effect size for use in meta-analysis. Statistics in Medicine 2000; 19 : 3127-3131.

da Costa BR, Nuesch E, Rutjes AW, Johnston BC, Reichenbach S, Trelle S, Guyatt GH, Jüni P. Combining follow-up and change data is valid in meta-analyses of continuous outcomes: a meta-epidemiological study. Journal of Clinical Epidemiology 2013; 66 : 847-855.

Deeks JJ. Systematic reviews of published evidence: Miracles or minefields? Annals of Oncology 1998; 9 : 703-709.

Deeks JJ, Altman DG, Bradburn MJ. Statistical methods for examining heterogeneity and combining results from several studies in meta-analysis. In: Egger M, Davey Smith G, Altman DG, editors. Systematic Reviews in Health Care: Meta-analysis in Context . 2nd ed. London (UK): BMJ Publication Group; 2001. p. 285-312.

Deeks JJ. Issues in the selection of a summary statistic for meta-analysis of clinical trials with binary outcomes. Statistics in Medicine 2002; 21 : 1575-1600.

DerSimonian R, Laird N. Meta-analysis in clinical trials. Controlled Clinical Trials 1986; 7 : 177-188.

DiGuiseppi C, Higgins JPT. Interventions for promoting smoke alarm ownership and function. Cochrane Database of Systematic Reviews 2001; 2 : CD002246.

Ebrahim S, Akl EA, Mustafa RA, Sun X, Walter SD, Heels-Ansdell D, Alonso-Coello P, Johnston BC, Guyatt GH. Addressing continuous data for participants excluded from trial analysis: a guide for systematic reviewers. Journal of Clinical Epidemiology 2013; 66 : 1014-1021 e1011.

Ebrahim S, Johnston BC, Akl EA, Mustafa RA, Sun X, Walter SD, Heels-Ansdell D, Alonso-Coello P, Guyatt GH. Addressing continuous data measured with different instruments for participants excluded from trial analysis: a guide for systematic reviewers. Journal of Clinical Epidemiology 2014; 67 : 560-570.

Efthimiou O. Practical guide to the meta-analysis of rare events. Evidence-Based Mental Health 2018; 21 : 72-76.

Egger M, Davey Smith G, Schneider M, Minder C. Bias in meta-analysis detected by a simple, graphical test. BMJ 1997; 315 : 629-634.




Meta-Analysis – Guide with Definition, Steps & Examples

Published by Owen Ingram on April 26, 2023; revised April 26, 2023

“A meta-analysis is a formal, epidemiological, quantitative study design that uses statistical methods to generalise the findings of the selected independent studies.”

Meta-analysis and systematic review are two of the most rigorous strategies in research. When researchers start looking for the best available evidence concerning their research question, they are advised to begin at the top of the evidence pyramid. Evidence in the form of meta-analyses or systematic reviews addressing important questions is significant in academia because it informs decision-making.

What is Meta-Analysis?

Meta-analysis estimates the overall effect across individual independent research studies by systematically synthesising, or pooling, their results. Meta-analysis isn't only about reaching a wider population by combining several smaller studies. It involves systematic methods for evaluating differences among participants, variability in findings (also known as heterogeneity), and how sensitive the conclusions are to the selected systematic review protocol.

When Should you Conduct a Meta-Analysis?

Meta-analysis has become a widely used research method in the medical sciences and other fields for several reasons. The technique involves summarising the results of the independent studies identified by a systematic review.

The Cochrane Handbook explains that “an important step in a systematic review is the thoughtful consideration of whether it is appropriate to combine the numerical results of all, or perhaps some, of the studies. Such a meta-analysis yields an overall statistic (together with its confidence interval) that summarizes the effectiveness of an experimental intervention compared with a comparator intervention” (section 10.2).

A researcher or a practitioner should choose meta-analysis when the following outcomes are desirable. 

To generate new hypotheses or settle controversies arising from conflicting research studies. Meta-analysis makes it possible to quantify and evaluate variable results and to identify the extent of conflict in the literature.

To find research gaps left unfilled and address questions not posed by individual studies. Primary research studies involve specific types of participants and interventions. A review of these studies with variable characteristics and methodologies can allow the researcher to gauge the consistency of findings across a wider range of participants and interventions. With the help of meta-analysis, the reasons for differences in the effect can also be explored. 

To provide convincing evidence. Estimating effects from a larger pooled sample of participants and interventions can provide convincing evidence. Many academic studies are based on very small datasets, so their estimated intervention effects are not fully reliable in isolation.

Elements of a Meta-Analysis

Deeks et al. (2019), Haidich (2010), and Grant and Booth (2009) explored the characteristics, strengths, and weaknesses of meta-analysis. These are briefly explained below.

Characteristics: 

  • A systematic review must be completed before conducting a meta-analysis because it identifies and summarises the findings of the individual studies to be synthesised. 
  • A meta-analysis can only synthesise studies that were identified through a systematic review. 
  • The studies selected for statistical pooling should be similar in terms of population, intervention, and comparison. 

Strengths: 

  • A meta-analysis takes place after the systematic review. The end product is a comprehensive quantitative analysis that is complex but reliable. 
  • It gives more value and weight to existing studies that hold little evidential value on their own. 
  • Policy-makers and academics cannot base their decisions on individual research studies. Meta-analysis provides them with a rigorous, consolidated analysis of the evidence for making informed decisions. 

Criticisms: 

  • A meta-analysis pools studies exploring similar topics, and finding sufficiently similar studies can be challenging.
  • When the individual studies are biased, or when reporting or methodological biases are involved, the results of the meta-analysis can be misleading.

Steps of Conducting the Meta-Analysis 

The process of conducting the meta-analysis has remained a topic of debate among researchers and scientists. However, the following 5-step process is widely accepted. 

Step 1: Research Question

The first step involves identifying a research question and proposing a hypothesis. The potential clinical significance of the research question is then explained, and the study design and analytical plan are justified.

Step 2: Systematic Review 

The purpose of a systematic review (SR) is to address a research question by identifying all relevant studies that meet the required quality standards for inclusion. While established journals typically serve as the primary source for identified studies, it is important to also consider unpublished data to avoid publication bias or the exclusion of studies with negative results.

While some meta-analyses may limit their focus to randomized controlled trials (RCTs) for the sake of obtaining the highest quality evidence, other experimental and quasi-experimental studies may be included if they meet the specific inclusion/exclusion criteria established for the review.

Step 3: Data Extraction

After selecting studies for the meta-analysis, researchers extract summary data or outcomes, as well as sample sizes and measures of data variability for both intervention and control groups. The choice of outcome measures depends on the research question and the type of study, and may include numerical or categorical measures.

For instance, numerical means may be used to report differences in scores on a questionnaire or changes in a measurement, such as blood pressure. In contrast, risk measures like odds ratios (OR) or relative risks (RR) are typically used to report differences in the probability of belonging to one category or another, such as vaginal birth versus cesarean birth.
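A minimal sketch of these two risk measures, using a single hypothetical study (the counts and group sizes are invented for illustration):

```python
# Hypothetical 2x2 table for one study of mode of birth.
# a, b: events / non-events in the intervention group
# c, d: events / non-events in the control group
a, b = 30, 70   # intervention: 30 cesarean births out of 100
c, d = 50, 50   # control: 50 cesarean births out of 100

risk_treated = a / (a + b)
risk_control = c / (c + d)

relative_risk = risk_treated / risk_control      # ratio of probabilities
odds_ratio = (a / b) / (c / d)                   # ratio of odds

print(f"RR = {relative_risk:.2f}, OR = {odds_ratio:.2f}")  # RR = 0.60, OR = 0.43
```

Note how the OR and RR diverge when the outcome is common, which is one reason the choice of measure matters when pooling studies.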

Step 4: Standardisation and Weighting Studies

After gathering all the required data, the fourth step involves computing suitable summary measures from each study for further examination. These measures are typically referred to as Effect Sizes and indicate the difference in average scores between the control and intervention groups. For instance, it could be the variation in blood pressure changes between study participants who used drug X and those who used a placebo.

Since the units of measurement often differ across the included studies, standardization is necessary to create comparable effect size estimates. Standardization is accomplished by determining, for each study, the average score for the intervention group, subtracting the average score for the control group, and dividing the result by the relevant measure of variability in that dataset.
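As a sketch of that computation, with invented summary statistics for one study (change in systolic blood pressure):

```python
import math

# Hypothetical summary data for one study (change in systolic BP, mmHg).
mean_trt, sd_trt, n_trt = -8.0, 10.0, 60   # intervention group
mean_ctl, sd_ctl, n_ctl = -3.0, 12.0, 55   # control group

# Pooled standard deviation: the "relevant measure of variability".
sd_pooled = math.sqrt(
    ((n_trt - 1) * sd_trt**2 + (n_ctl - 1) * sd_ctl**2) / (n_trt + n_ctl - 2)
)

# Standardised mean difference: unit-free, so comparable across studies.
smd = (mean_trt - mean_ctl) / sd_pooled
print(f"SMD = {smd:.2f}")  # SMD = -0.45
```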

In some cases, the results of certain studies must carry more significance than others. Larger studies, as measured by their sample sizes, are deemed to produce more precise estimates of effect size than smaller studies. Additionally, studies with less variability in data, such as smaller standard deviation or narrower confidence intervals, are typically regarded as higher quality in study design. A weighting statistic that aims to incorporate both of these factors, known as inverse variance, is commonly employed.
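Inverse-variance weighting and the resulting pooled estimate can be sketched as follows; the effect sizes and standard errors are invented:

```python
# Hypothetical standardised effect sizes and their standard errors.
effects = [-0.45, -0.30, -0.60]
ses = [0.12, 0.20, 0.25]

# Inverse-variance weights: precise (small-SE) studies count for more.
weights = [1 / se**2 for se in ses]

# Pooled estimate: the weighted average of the study effects.
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)

# The pooled standard error is smaller than any single study's SE.
se_pooled = (1 / sum(weights)) ** 0.5
```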

Step 5: Absolute Effect Estimation

The ultimate step in conducting a meta-analysis is to choose and utilize an appropriate model for comparing Effect Sizes among diverse studies. Two popular models for this purpose are the Fixed Effects and Random Effects models. The Fixed Effects model relies on the premise that each study is evaluating a common treatment effect, implying that all studies would have estimated the same Effect Size if sample variability were equal across all studies.

Conversely, the Random Effects model posits that the true treatment effects in individual studies may vary from each other, and endeavors to consider this additional source of interstudy variation in Effect Sizes. The existence and magnitude of this latter variability is usually evaluated within the meta-analysis through a test for ‘heterogeneity.’
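One common way to implement the Random Effects model is the DerSimonian-Laird method-of-moments estimator of the between-study variance (tau-squared); other estimators exist, and the inputs below are invented for illustration:

```python
# Hypothetical effect sizes and standard errors from three studies.
effects = [-0.45, -0.10, -0.80]
ses = [0.12, 0.20, 0.25]

w = [1 / se**2 for se in ses]
fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)

# Cochran's Q: weighted squared deviations from the fixed-effect estimate.
Q = sum(wi * (e - fixed)**2 for wi, e in zip(w, effects))
df = len(effects) - 1

# DerSimonian-Laird estimate of the between-study variance tau^2.
c = sum(w) - sum(wi**2 for wi in w) / sum(w)
tau2 = max(0.0, (Q - df) / c)

# Random-effects weights add tau^2 to each study's variance, which
# evens out the weights relative to the fixed-effect analysis.
w_re = [1 / (se**2 + tau2) for se in ses]
pooled_re = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
```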

Forest Plot

The results of a meta-analysis are often presented visually using a “Forest Plot”. For each study included in the analysis, this type of plot displays a horizontal line indicating the standardised effect size estimate and its 95% confidence interval (here, a risk ratio). Figure A provides an example of a hypothetical Forest Plot in which drug X reduces the risk of death in all three studies.

However, the first study was larger than the other two, and the estimates from the smaller studies were not statistically significant, as indicated by the lines emanating from their boxes crossing the value of 1. The size of each box represents the relative weight assigned to that study by the meta-analysis. The diamond represents the combined estimate of the drug's effect, indicating both the pooled risk ratio and its 95% confidence interval limits; it provides a more precise estimate than any single study.


Figure A: Hypothetical Forest Plot
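The interval lines in such a plot are typically computed on the log scale. A sketch with invented risk ratios and log-scale standard errors, mirroring the three-study pattern described above:

```python
import math

# Hypothetical risk ratios and log-scale standard errors for three studies;
# the first (largest) study has the smallest standard error.
studies = {
    "Study 1": (0.70, 0.10),
    "Study 2": (0.80, 0.30),
    "Study 3": (0.75, 0.28),
}

results = {}
for name, (rr, se_log) in studies.items():
    lo = math.exp(math.log(rr) - 1.96 * se_log)   # lower 95% limit
    hi = math.exp(math.log(rr) + 1.96 * se_log)   # upper 95% limit
    crosses_one = lo <= 1.0 <= hi                 # CI containing 1: not significant
    results[name] = (lo, hi, crosses_one)
    print(f"{name}: RR {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Only the large, precise study yields an interval entirely below 1; the two small studies cross 1, just as the narrative describes.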

Relevance to Practice and Research 

Evidence Based Nursing commentaries often include recently published systematic reviews and meta-analyses, as they can provide new insights and strengthen recommendations for effective healthcare practices. Additionally, they can identify gaps or limitations in current evidence and guide future research directions.

The quality of the data available for synthesis is a critical factor in the strength of conclusions drawn from meta-analyses, and this is influenced by the quality of individual studies and the systematic review itself. However, meta-analysis cannot overcome issues related to underpowered or poorly designed studies.

Therefore, clinicians may still encounter situations where the evidence is weak or uncertain, and where higher-quality research is required to improve clinical decision-making. While such findings can be frustrating, they remain important for informing practice and highlighting the need for further research to fill gaps in the evidence base.

Methods and Assumptions in Meta-Analysis 

Ensuring the credibility of findings is imperative in all types of research, including meta-analyses. To validate the outcomes of a meta-analysis, the researcher must confirm that the research techniques used were accurate in measuring the intended variables. Typically, researchers establish the validity of a meta-analysis by testing the outcomes for homogeneity or the degree of similarity between the results of the combined studies.

Homogeneity is preferred in meta-analyses because it allows the data to be combined without adjustment. To determine homogeneity, researchers assess its opposite, heterogeneity. Two widely used statistical methods for evaluating heterogeneity in research results are Cochran's Q and the I² (I-squared) index.
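Given Cochran's Q, the I² index expresses the percentage of total variation across studies that is due to heterogeneity rather than chance. A sketch with an invented Q for three studies:

```python
# Hypothetical Cochran's Q from a meta-analysis of k = 3 studies.
Q, k = 4.93, 3
df = k - 1

# I^2 is bounded below at 0%; values above roughly 50% are often
# read as substantial heterogeneity.
i_squared = max(0.0, (Q - df) / Q) * 100
print(f"I^2 = {i_squared:.0f}%")
```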

Difference Between Meta-Analysis and Systematic Reviews

Meta-analysis and systematic reviews are both research methods used to synthesise evidence from multiple studies on a particular topic. However, there are some key differences between the two.

Systematic reviews involve a comprehensive and structured approach to identifying, selecting, and critically appraising all available evidence relevant to a specific research question. This process involves searching multiple databases, screening the identified studies for relevance and quality, and summarizing the findings in a narrative report.

Meta-analysis, on the other hand, involves using statistical methods to combine and analyze the data from multiple studies, with the aim of producing a quantitative summary of the overall effect size. Meta-analysis requires the studies to be similar enough in terms of their design, methodology, and outcome measures to allow for meaningful comparison and analysis.

Therefore, systematic reviews are broader in scope and summarize the findings of all studies on a topic, while meta-analyses are more focused on producing a quantitative estimate of the effect size of an intervention across multiple studies that meet certain criteria. In some cases, a systematic review may be conducted without a meta-analysis if the studies are too diverse or the quality of the data is not sufficient to allow for statistical pooling.

Software Packages For Meta-Analysis

Meta-analysis can be done through software packages, including free and paid options. One of the most commonly used software packages for meta-analysis is RevMan by the Cochrane Collaboration.

Assessing the Quality of Meta-Analysis 

Assessing the quality of a meta-analysis involves evaluating the methods used to conduct the analysis and the quality of the studies included. Here are some key factors to consider:

  • Study selection: The studies included in the meta-analysis should be relevant to the research question and meet predetermined criteria for quality.
  • Search strategy: The search strategy should be comprehensive and transparent, including databases and search terms used to identify relevant studies.
  • Study quality assessment: The quality of included studies should be assessed using appropriate tools, and this assessment should be reported in the meta-analysis.
  • Data extraction: The data extraction process should be systematic and clearly reported, including any discrepancies that arose.
  • Analysis methods: The meta-analysis should use appropriate statistical methods to combine the results of the included studies, and these methods should be transparently reported.
  • Publication bias: The potential for publication bias should be assessed and reported in the meta-analysis, including any efforts to identify and include unpublished studies.
  • Interpretation of results: The results should be interpreted in the context of the study limitations and the overall quality of the evidence.
  • Sensitivity analysis: Sensitivity analysis should be conducted to evaluate the impact of study quality, inclusion criteria, and other factors on the overall results.

Overall, a high-quality meta-analysis should be transparent in its methods and clearly report the included studies’ limitations and the evidence’s overall quality.


Examples of Meta-Analysis

  • Stanley, T.D. & Jarrell, S.B. (1989). Meta-regression analysis: a quantitative method of literature surveys. Journal of Economic Surveys, 3(2), 161-170.
  • Datta, D.K., Pinches, G.E. & Narayanan, V.K. (1992). Factors influencing wealth creation from mergers and acquisitions: a meta-analysis. Strategic Management Journal, 13, 67-84.
  • Glass, G. (1983). Synthesising empirical research: meta-analysis. In S.A. Ward & L.J. Reed (Eds.), Knowledge Structure and Use: Implications for Synthesis and Interpretation. Philadelphia: Temple University Press.
  • Wolf, F.M. (1986). Meta-Analysis: Quantitative Methods for Research Synthesis. Sage University Paper No. 59.
  • Hunter, J.E., Schmidt, F.L. & Jackson, G.B. (1982). Meta-Analysis: Cumulating Research Findings Across Studies. Beverly Hills, CA: Sage.

Frequently Asked Questions

What is a meta-analysis in research?

Meta-analysis is a statistical method used to combine results from multiple studies on a specific topic. By pooling data from various sources, meta-analysis can provide a more precise estimate of the effect size of a treatment or intervention and identify areas for future research.

Why is meta-analysis important?

Meta-analysis is important because it combines and summarizes results from multiple studies to provide a more precise and reliable estimate of the effect of a treatment or intervention. This helps clinicians and policymakers make evidence-based decisions and identify areas for further research.

What is an example of a meta-analysis?

A meta-analysis of studies evaluating physical exercise’s effect on depression in adults is an example. Researchers gathered data from 49 studies involving a total of 2669 participants. The studies used different types of exercise and measures of depression, which made it difficult to compare the results.

Through meta-analysis, the researchers calculated an overall effect size and determined that exercise was associated with a statistically significant reduction in depression symptoms. The study also identified that moderate-intensity aerobic exercise, performed three to five times per week, was the most effective. The meta-analysis provided a more comprehensive understanding of the impact of exercise on depression than any single study could provide.

What is the definition of meta-analysis in clinical research?

Meta-analysis in clinical research is a statistical technique that combines data from multiple independent studies on a particular topic to generate a summary or “meta” estimate of the effect of a particular intervention or exposure.

This type of analysis allows researchers to synthesise the results of multiple studies, potentially increasing the statistical power and providing more precise estimates of treatment effects. Meta-analyses are commonly used in clinical research to evaluate the effectiveness and safety of medical interventions and to inform clinical practice guidelines.

Is meta-analysis qualitative or quantitative?

Meta-analysis is a quantitative method used to combine and analyze data from multiple studies. It involves the statistical synthesis of results from individual studies to obtain a pooled estimate of the effect size of a particular intervention or treatment. Therefore, meta-analysis is considered a quantitative approach to research synthesis.



Study Design 101: Meta-Analysis

A subset of systematic reviews; a method for systematically combining pertinent qualitative and quantitative study data from several selected studies to develop a single conclusion that has greater statistical power. This conclusion is statistically stronger than the analysis of any single study, due to increased numbers of subjects, greater diversity among subjects, or accumulated effects and results.

Meta-analysis would be used for the following purposes:

  • To establish statistical significance with studies that have conflicting results
  • To develop a more correct estimate of effect magnitude
  • To provide a more complex analysis of harms, safety data, and benefits
  • To examine subgroups with individual numbers that are not statistically significant

If the individual studies are randomized controlled trials (RCTs), combining several selected RCT results represents the highest level of evidence on the evidence hierarchy, followed by systematic reviews, which analyze all available studies on a topic.

Advantages

  • Greater statistical power
  • Confirmatory data analysis
  • Greater ability to extrapolate to general population affected
  • Considered an evidence-based resource

Disadvantages

  • Difficult and time consuming to identify appropriate studies
  • Not all studies provide adequate data for inclusion and analysis
  • Requires advanced statistical techniques
  • Heterogeneity of study populations

Design pitfalls to look out for

The studies pooled for review should be similar in type (e.g. all randomized controlled trials).

Are the studies being reviewed all the same type of study or are they a mixture of different types?

The analysis should include published and unpublished results to avoid publication bias.

Does the meta-analysis include any appropriate relevant studies that may have had negative outcomes?

Fictitious Example

Do individuals who wear sunscreen have fewer cases of melanoma than those who do not wear sunscreen? A MEDLINE search was conducted using the terms melanoma, sunscreening agents, and zinc oxide, resulting in 8 randomized controlled studies, each with between 100 and 120 subjects. All of the studies showed an association between wearing sunscreen and a reduced incidence of melanoma. The subjects from all eight studies (860 in total) were pooled and statistically analyzed to determine the effect of the relationship between wearing sunscreen and melanoma. This meta-analysis showed a 50% reduction in melanoma diagnosis among sunscreen-wearers.

Real-life Examples

Goyal, A., Elminawy, M., Kerezoudis, P., Lu, V., Yolcu, Y., Alvi, M., & Bydon, M. (2019). Impact of obesity on outcomes following lumbar spine surgery: A systematic review and meta-analysis. Clinical Neurology and Neurosurgery, 177 , 27-36. https://doi.org/10.1016/j.clineuro.2018.12.012

This meta-analysis was interested in determining whether obesity affects the outcome of spinal surgery. Some previous studies have shown higher perioperative morbidity in patients with obesity while other studies have not shown this effect. This study looked at surgical outcomes including "blood loss, operative time, length of stay, complication and reoperation rates and functional outcomes" between patients with and without obesity. A meta-analysis of 32 studies (23,415 patients) was conducted. There were no significant differences for patients undergoing minimally invasive surgery, but patients with obesity who had open surgery had experienced higher blood loss and longer operative times (not clinically meaningful) as well as higher complication and reoperation rates. Further research is needed to explore this issue in patients with morbid obesity.

Nakamura, A., van Der Waerden, J., Melchior, M., Bolze, C., El-Khoury, F., & Pryor, L. (2019). Physical activity during pregnancy and postpartum depression: Systematic review and meta-analysis. Journal of Affective Disorders, 246 , 29-41. https://doi.org/10.1016/j.jad.2018.12.009

This meta-analysis explored whether physical activity during pregnancy prevents postpartum depression. Seventeen studies were included (93,676 women) and analysis showed a "significant reduction in postpartum depression scores in women who were physically active during their pregnancies when compared with inactive women." Possible limitations or moderators of this effect include intensity and frequency of physical activity, type of physical activity, and timepoint in pregnancy (e.g. trimester).

Related Terms

Systematic Review

A document often written by a panel that provides a comprehensive review of all relevant studies on a particular clinical or health-related topic/question.

Publication Bias

A phenomenon in which studies with positive results have a better chance of being published, are published earlier, and are published in journals with higher impact factors. Therefore, conclusions based exclusively on published studies can be misleading.

Now test yourself!

1. A Meta-Analysis pools together the sample populations from different studies, such as Randomized Controlled Trials, into one statistical analysis and treats them as one large sample population with one conclusion.

a) True b) False

2. One potential design pitfall of Meta-Analyses that is important to pay attention to is:

a) Whether it is evidence-based.
b) If the authors combined studies with conflicting results.
c) If the authors appropriately combined studies so they did not compare apples and oranges.
d) If the authors used only quantitative data.


  • Last Updated: Sep 25, 2023 10:59 AM
  • URL: https://guides.himmelfarb.gwu.edu/studydesign101



  • Volume 16, Issue 1

What is meta-analysis?


Allison Shorten, School of Nursing, Yale University, New Haven, Connecticut, USA
Brett Shorten, Informed Health Choices Trust, Wollongong, New South Wales, Australia
Correspondence to: Dr Allison Shorten, Yale University School of Nursing, 100 Church Street South, PO Box 9740, New Haven, CT 06536, USA; allison.shorten{at}yale.edu

https://doi.org/10.1136/eb-2012-101118


When clinicians begin their search for the best available evidence to inform decision-making, they are usually directed to the top of the ‘evidence pyramid’ to find out whether a systematic review and meta-analysis have been conducted. The Cochrane Library 1 is fast filling with systematic reviews and meta-analyses that aim to answer important clinical questions and provide the most reliable evidence to inform practice and research. So what is meta-analysis and how can it contribute to practice?

The Five-step process

There is debate about best practice for meta-analysis; however, there are five common steps.

Step 1: the research question

A clinical research question is identified and a hypothesis proposed. The likely clinical significance is explained and the study design and analytical plan are justified.

Step 2: systematic review

A systematic review (SR) is specifically designed to address the research question and conducted to identify all studies considered to be both relevant and of sufficiently good quality to warrant inclusion. Often, only studies published in established journals are identified, but identification of ‘unpublished’ data is important to avoid ‘publication bias’ or exclusion of studies with negative findings. 4 Some meta-analyses only consider randomised controlled trials (RCTs) in the quest for highest quality evidence. Other types of ‘experimental’ and ‘quasi-experimental’ studies may be included if they satisfy the defined inclusion/exclusion criteria.

Step 3: data extraction

Once studies are selected for inclusion in the meta-analysis, summary data or outcomes are extracted from each study. In addition, sample sizes and measures of data variability for both intervention and control groups are required. Depending on the study and the research question, outcome measures could include numerical measures or categorical measures. For example, differences in scores on a questionnaire or differences in a measurement level such as blood pressure would be reported as a numerical mean. However, differences in the likelihood of being in one category versus another (eg, vaginal birth versus cesarean birth) are usually reported in terms of risk measures such as OR or relative risk (RR).
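The categorical risk measures mentioned above can be sketched from a hypothetical 2×2 table. The counts below are invented for illustration, not taken from any real study:

```python
# Hypothetical 2x2 table: event counts in the intervention and control groups.
a, b = 30, 70   # intervention group: events, non-events
c, d = 50, 50   # control group: events, non-events

risk_treat = a / (a + b)          # risk of the event in the intervention group
risk_ctrl = c / (c + d)           # risk of the event in the control group

rr = risk_treat / risk_ctrl       # relative risk (RR)
odds_ratio = (a * d) / (b * c)    # odds ratio (OR), cross-product of the table

print(f"RR = {rr:.2f}, OR = {odds_ratio:.2f}")
```

An RR or OR below 1 indicates the event (for example, caesarean birth) was less likely in the intervention group.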

Step 4: standardisation and weighting studies

Having assembled all the necessary data, the fourth step is to calculate appropriate summary measures from each study for further analysis. These measures are usually called Effect Sizes and represent the difference in average scores between intervention and control groups. For example, the difference in change in blood pressure between study participants who used drug X compared with participants who used a placebo. Since units of measurement typically vary across included studies, they usually need to be ‘standardised’ in order to produce comparable estimates of this effect. When different outcome measures are used, such as when researchers use different tests, standardisation is imperative. Standardisation is achieved by taking, for each study, the mean score for the intervention group, subtracting the mean for the control group and dividing this result by the appropriate measure of variability in that data set.
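The standardisation step described above (intervention mean minus control mean, divided by a measure of variability) can be sketched in a few lines. The summary statistics are invented, and the small-sample correction at the end (Hedges' g) is a common refinement not spelled out in the text:

```python
import math

# Invented summary data from one hypothetical study:
# mean, SD and n for the intervention and control groups.
m1, sd1, n1 = 12.0, 4.0, 40   # intervention group
m2, sd2, n2 = 10.0, 5.0, 40   # control group

# Pooled SD serves as the "appropriate measure of variability".
sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

d = (m1 - m2) / sp            # standardized mean difference (Cohen's d)

# Hedges' correction factor J shrinks d slightly to remove small-sample bias.
j = 1 - 3 / (4 * (n1 + n2 - 2) - 1)
g = j * d                     # Hedges' g

print(f"d = {d:.3f}, g = {g:.3f}")
```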

The results of some studies need to carry more weight than others. First, larger studies (as measured by sample sizes) are thought to produce more precise effect size estimates than smaller studies. Second, studies with less data variability, for example, smaller SD or narrower CIs, are often regarded as ‘better quality’ in study design. A weighting statistic that seeks to incorporate both these factors, known as inverse variance , is commonly used.
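A minimal sketch of inverse-variance weighting, using made-up effect sizes and sampling variances: each study's weight is the reciprocal of its variance, so large, precise studies count for more in the pooled estimate.

```python
# Invented standardized effect sizes and their sampling variances.
effects = [0.40, 0.55, 0.25]
variances = [0.010, 0.040, 0.090]   # smaller variance -> more precise study

# Inverse-variance weights: 1 / variance.
weights = [1 / v for v in variances]

# Weighted average gives the fixed-effect pooled estimate.
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
print(f"fixed-effect pooled estimate = {pooled:.3f}")
```

Here the first study (variance 0.010) carries four times the weight of the second and nine times the weight of the third, so the pooled estimate sits close to 0.40.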

Step 5: final estimates of effect

The final stage is to select and apply an appropriate model to compare Effect Sizes across different studies. The most common models used are Fixed Effects and Random Effects models. Fixed Effects models are based on the ‘assumption that every study is evaluating a common treatment effect’. 5 This means that the assumption is that all studies would estimate the same Effect Size were it not for different levels of sample variability across different studies. In contrast, the Random Effects model ‘assumes that the true treatment effects in the individual studies may be different from each other’ 5 and attempts to allow for this additional source of interstudy variation in Effect Sizes . Whether this latter source of variability is likely to be important is often assessed within the meta-analysis by testing for ‘heterogeneity’.
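The fixed- versus random-effects distinction, and the heterogeneity test, can be illustrated with the DerSimonian-Laird estimator, one common (though not the only) way to implement a random-effects model. All inputs below are invented:

```python
# Invented effect sizes and sampling variances from three studies.
effects = [0.20, 0.70, 0.45]
variances = [0.010, 0.020, 0.050]

w = [1 / v for v in variances]
fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)  # fixed-effect mean

# Cochran's Q: heterogeneity statistic testing the fixed-effect assumption
# that all studies share one true effect.
q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
df = len(effects) - 1
c = sum(w) - sum(wi**2 for wi in w) / sum(w)

# Between-study variance tau^2 (truncated at zero) and the I^2 percentage.
tau2 = max(0.0, (q - df) / c)
i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

# Random-effects weights add tau^2 to each study's own variance, which
# pulls the weights closer together than in the fixed-effect model.
w_re = [1 / (v + tau2) for v in variances]
random_mean = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)

print(f"Q = {q:.2f}, I2 = {i2:.1f}%, random-effects estimate = {random_mean:.3f}")
```

With these inputs Q well exceeds its degrees of freedom, so tau^2 is positive and the random-effects estimate drifts away from the fixed-effect one toward an unweighted average.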

Forest plot

The final estimates from a meta-analysis are often graphically reported in the form of a ‘Forest Plot’.

In the hypothetical Forest Plot shown in figure 1 , for each study, a horizontal line indicates the standardised Effect Size estimate (the rectangular box in the centre of each line) and 95% CI for the risk ratio used. For each of the studies, drug X reduced the risk of death (the risk ratio is less than 1.0). However, the first study was larger than the other two (the size of the boxes represents the relative weights calculated by the meta-analysis). Perhaps because of this, the estimates for the two smaller studies were not statistically significant (the lines emanating from their boxes include the value of 1). When all three studies were combined in the meta-analysis, as represented by the diamond, we get a more precise estimate of the effect of the drug, where the diamond represents both the combined risk ratio estimate and the limits of the 95% CI.
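The pooling behind such a plot can be sketched as follows, using three invented studies that mirror the pattern described above (one large significant study, two small non-significant ones). Risk ratios are combined on the log scale with inverse-variance weights under a fixed-effect model:

```python
import math

# Invented studies: (risk ratio, lower 95% CI limit, upper 95% CI limit).
studies = [
    (0.80, 0.68, 0.94),   # large study: CI excludes 1, significant
    (0.75, 0.50, 1.13),   # small study: CI crosses 1, not significant
    (0.85, 0.55, 1.31),   # small study: CI crosses 1, not significant
]

log_effects, weights = [], []
for rr, lo, hi in studies:
    # Recover the standard error of log(RR) from the CI width.
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
    log_effects.append(math.log(rr))
    weights.append(1 / se**2)

# Inverse-variance pooled log risk ratio and its standard error.
pooled_log = sum(w * e for w, e in zip(weights, log_effects)) / sum(weights)
se_pooled = math.sqrt(1 / sum(weights))

rr_pooled = math.exp(pooled_log)
ci = (math.exp(pooled_log - 1.96 * se_pooled),
      math.exp(pooled_log + 1.96 * se_pooled))
print(f"diamond: RR = {rr_pooled:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}")
```

As in the figure, the pooled "diamond" has a narrower CI than any single study, and its upper limit stays below 1.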


Figure 1. Hypothetical Forest Plot

Relevance to practice and research

Many Evidence Based Nursing commentaries feature recently published systematic reviews and meta-analyses because they not only bring new insight or strength to recommendations about the most effective healthcare practices but also identify where future research should be directed to bridge the gaps or limitations in current evidence. The strength of conclusions from a meta-analysis largely depends on the quality of the data available for synthesis, which reflects the quality of the individual studies and of the systematic review. Meta-analysis does not magically resolve the problem of underpowered or poorly designed studies, and clinicians can be frustrated to find that, even when a meta-analysis has been conducted, all the researchers can conclude is that the evidence is weak, that there is uncertainty about the effects of treatment and that higher-quality research is needed to better inform practice. This is still an important finding and can inform our practice and challenge us to fill the evidence gaps with better-quality research in the future.

  • The Cochrane Library. http://www.thecochranelibrary.com/view/0/index.html (accessed 23 Oct 2012).
  • Davey Smith G
  • Higgins JPT,
Competing interests: None.

  • Open access
  • Published: 08 April 2024

A systematic review and multivariate meta-analysis of the physical and mental health benefits of touch interventions

  • Julian Packheiser (ORCID: orcid.org/0000-0001-9805-6755),
  • Helena Hartmann,
  • Kelly Fredriksen,
  • Valeria Gazzola (ORCID: orcid.org/0000-0003-0324-0619),
  • Christian Keysers (ORCID: orcid.org/0000-0002-2845-5467) &
  • Frédéric Michon (ORCID: orcid.org/0000-0003-1289-2133)
Nature Human Behaviour (2024)


Subjects: Human behaviour, Paediatric research, Randomized controlled trials

Receiving touch is of critical importance, as many studies have shown that touch promotes mental and physical well-being. We conducted a pre-registered (PROSPERO: CRD42022304281) systematic review and multilevel meta-analysis encompassing 137 studies in the meta-analysis and 75 additional studies in the systematic review (n = 12,966 individuals, search via Google Scholar, PubMed and Web of Science until 1 October 2022) to identify critical factors moderating touch intervention efficacy. Included studies always featured a touch versus no touch control intervention with diverse health outcomes as dependent variables. Risk of bias was assessed via small study, randomization, sequencing, performance and attrition bias. Touch interventions were especially effective in regulating cortisol levels (Hedges’ g = 0.78, 95% confidence interval (CI) 0.24 to 1.31) and increasing weight (0.65, 95% CI 0.37 to 0.94) in newborns as well as in reducing pain (0.69, 95% CI 0.48 to 0.89), feelings of depression (0.59, 95% CI 0.40 to 0.78) and state (0.64, 95% CI 0.44 to 0.84) or trait anxiety (0.59, 95% CI 0.40 to 0.77) for adults. Touch interventions involving objects or robots resulted in similar physical (0.56, 95% CI 0.24 to 0.88 versus 0.51, 95% CI 0.38 to 0.64) but lower mental health benefits (0.34, 95% CI 0.19 to 0.49 versus 0.58, 95% CI 0.43 to 0.73). Adult clinical cohorts profited more strongly in mental health domains compared with healthy individuals (0.63, 95% CI 0.46 to 0.80 versus 0.37, 95% CI 0.20 to 0.55). We found no difference in health benefits in adults when comparing touch applied by a familiar person or a health care professional (0.51, 95% CI 0.29 to 0.73 versus 0.50, 95% CI 0.38 to 0.61), but parental touch was more beneficial in newborns (0.69, 95% CI 0.50 to 0.88 versus 0.39, 95% CI 0.18 to 0.61). Small but significant small-study bias and the impossibility of blinding experimental conditions need to be considered. Leveraging factors that influence touch intervention efficacy will help maximize the benefits of future interventions and focus research in this field.


The sense of touch has immense importance for many aspects of our life. It is the first of all the senses to develop in newborns 1 and the most direct experience of contact with our physical and social environment 2 . Complementing our own touch experience, we also regularly receive touch from others around us, for example, through consensual hugs, kisses or massages 3 .

The recent coronavirus pandemic has raised awareness regarding the need to better understand the effects that touch—and its reduction during social distancing—can have on our mental and physical well-being. The most common touch interventions, for example, massage for adults or kangaroo care for newborns, have been shown to have a wide range of both mental and physical health benefits, from facilitating growth and development to buffering against anxiety and stress, over the lifespan of humans and animals alike 4 . Despite the substantial weight this literature gives to support the benefits of touch, it is also characterized by a large variability in, for example, studied cohorts (adults, children, newborns and animals), type and duration of applied touch (for example, one-time hug versus repeated 60-min massages), measured health outcomes (ranging from physical health outcomes such as sleep and blood pressure to mental health outcomes such as depression or mood) and who actually applies the touch (for example, partner versus stranger).

A meaningful tool to make sense of this vast amount of research is through meta-analysis. While previous meta-analyses on this topic exist, they were limited in scope, focusing only on particular types of touch, cohorts or specific health outcomes (for example, refs. 5 , 6 ). Furthermore, despite best efforts, meaningful variables that moderate the efficacy of touch interventions could not yet be identified. However, understanding these variables is critical to tailor touch interventions and guide future research to navigate this diverse field with the ultimate aim of promoting well-being in the population.

In this Article, we describe a pre-registered, large-scale systematic review and multilevel, multivariate meta-analysis to address this need with quantitative evidence for (1) the effect of touch interventions on physical and mental health and (2) which moderators influence the efficacy of the intervention. In particular, we ask whether and how strongly health outcomes depend on the dynamics of the touching dyad (for example, humans or robots/objects, familiarity and touch directionality), demographics (for example, clinical status, age or sex), delivery means (for example, type of touch intervention or touched body part) and procedure (for example, duration or number of sessions). We did so separately for newborns and for children and adults, as the health outcomes in newborns differed substantially from those in the other age groups. Despite the focus of the analysis being on humans, it is widely known that many animal species benefit from touch interactions and that engaging in touch promotes their well-being as well 7 . Since animal models are essential for the investigation of the mechanisms underlying biological processes and for the development of therapeutic approaches, we accordingly included health benefits of touch interventions in non-human animals as part of our systematic review. However, this search yielded only a small number of studies, suggesting a lack of research in this domain, and as such, was insufficient to be included in the meta-analysis. We evaluate the identified animal studies and their findings in the discussion.

Touch interventions have a medium-sized effect

The pre-registration can be found at ref. 8 . The flowchart for data collection and extraction is depicted in Fig. 1 .

Figure 1

Animal outcomes refer to outcomes measured in non-human species that were solely considered as part of a systematic review. Included languages were French, Dutch, German and English, but our search did not identify any articles in French, Dutch or German. MA, meta-analysis.

For adults, a total of n  = 2,841 and n  = 2,556 individuals in the touch and control groups, respectively, across 85 studies and 103 cohorts were included. The effect of touch overall was medium-sized ( t (102) = 9.74, P  < 0.001, Hedges’ g  = 0.52, 95% confidence interval (CI) 0.42 to 0.63; Fig. 2a ). For newborns, we could include 63 cohorts across 52 studies comprising a total of n  = 2,134 and n  = 2,086 newborns in the touch and control groups, respectively, with an overall effect almost identical to the older age group ( t (62) = 7.53, P  < 0.001, Hedges’ g  = 0.56, 95% CI 0.41 to 0.71; Fig. 2b ), suggesting that, despite distinct health outcomes, touch interventions show comparable effects across newborns and adults. Using these overall effect estimates, we conducted a power sensitivity analysis of all the included primary studies to investigate whether such effects could be reliably detected 9 . Sufficient power to detect such effect sizes was rare in individual studies, as investigated by firepower plots 10 (Supplementary Figs. 1 and 2 ). No individual effect size from either meta-analysis was overly influential (Cook’s D  < 0.06). The benefits were similar for mental and physical outcomes (mental versus physical; adults: t (101) = 0.79, P  = 0.432, Hedges’ g difference of −0.05, 95% CI −0.16 to 0.07, Fig. 2c ; newborns: t (61) = 1.08, P  = 0.284, Hedges’ g difference of −0.19, 95% CI −0.53 to 0.16, Fig. 2d ).

Figure 2

a , Orchard plot illustrating the overall benefits across all health outcomes for adults/children across 469 in part dependent effect sizes from 85 studies and 103 cohorts. b , The same as a but for newborns across 174 in part dependent effect sizes from 52 studies and 63 cohorts. c , The same as a but separating the results for physical versus mental health benefits across 469 in part dependent effect sizes from 85 studies and 103 cohorts. d , The same as b but separating the results for physical versus mental health benefits across 172 in part dependent effect sizes from 52 studies and 63 cohorts. Each dot reflects a measured effect, and the number of effects ( k ) included in the analysis is depicted in the bottom left. Mean effects and 95% CIs are presented in the bottom right and are indicated by the central black dot (mean effect) and its error bars (95% CI). The heterogeneity Q statistic is presented in the top left. Overall effects of moderator impact were assessed via an F test, and post hoc comparisons were done using t tests (two-sided test). Note that the P values above the mean effects indicate whether an effect differed significantly from a zero effect. P values were not corrected for multiple comparisons. The dot size reflects the precision of each individual effect (larger indicates higher precision). Small-study bias for the overall effect was significant ( F test, two-sided test) in the adult meta-analysis ( F (1, 101) = 21.24, P  < 0.001; Supplementary Fig. 3 ) as well as in the newborn meta-analysis ( F (1, 61) = 5.25, P  = 0.025; Supplementary Fig. 4 ).

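The small-study bias reported for both meta-analyses is often probed with an Egger-style regression: regress each standardized effect (g divided by its standard error) on precision (1 divided by the standard error); an intercept far from zero suggests that smaller studies report systematically larger effects. The sketch below is a generic illustration with invented data, not the authors' exact model (they report F tests from a multilevel framework):

```python
# Invented Hedges' g values and standard errors; note that the smaller,
# noisier studies (larger SE) report the larger effects.
effects = [0.90, 0.70, 0.55, 0.45, 0.40]
ses = [0.40, 0.30, 0.20, 0.12, 0.08]

y = [g / se for g, se in zip(effects, ses)]   # standardized effects
x = [1 / se for se in ses]                    # precisions

# Ordinary least-squares fit of y on x, computed by hand.
n = len(x)
mx, my = sum(x) / n, sum(y) / n
slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
intercept = my - slope * mx

print(f"Egger intercept = {intercept:.2f} (near 0 ~ no small-study bias)")
```

With this made-up pattern the intercept is clearly positive, the signature of small studies inflating the apparent effect.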

On the basis of the overall effect of both meta-analyses as well as their median sample sizes, the minimum number of studies necessary for subgroup analyses to achieve 80% power was k  = 9 effects for adults and k  = 8 effects for newborns (Supplementary Figs. 5 and 6 ). Assessing specific health outcomes with sufficient power in more detail in adults (Fig. 3a ) revealed smaller benefits to sleep and heart rate parameters, moderate benefits to positive and negative affect, diastolic blood and systolic blood pressure, mobility and reductions of the stress hormone cortisol and larger benefits to trait and state anxiety, depression, fatigue and pain. Post hoc tests revealed stronger benefits for pain, state anxiety, depression and trait anxiety compared with respiratory, sleep and heart rate parameters (see Fig. 3 for all post hoc comparisons). Reductions in pain and state anxiety were increased compared with reductions in negative affect ( t (83) = 2.54, P  = 0.013, Hedges’ g difference of 0.31, 95% CI 0.07 to 0.55; t (83) = 2.31, P  = 0.024, Hedges’ g difference of 0.27, 95% CI 0.03 to 0.51). Benefits to pain symptoms were higher compared with benefits to positive affect ( t (83) = 2.22, P  = 0.030, Hedges’ g difference of 0.29, 95% CI 0.04 to 0.54). Finally, touch resulted in larger benefits to cortisol release compared with heart rate parameters ( t (83) = 2.30, P  = 0.024, Hedges’ g difference of 0.26, 95% CI 0.04–0.48).

Figure 3

a , b , Health outcomes in adults analysed across 405 in part dependent effect sizes from 79 studies and 97 cohorts ( a ) and in newborns analysed across 105 in part dependent effect sizes from 46 studies and 56 cohorts ( b ). The type of health outcomes measured differed between adults and newborns and were thus analysed separately. Numbers on the right represent the mean effect with its 95% CI in square brackets and the significance level estimating the likelihood that the effect is equal to zero. Overall effects of moderator impact were assessed via an F test, and post hoc comparisons were done using t tests (two-sided test). The F value in the top right represents a test of the hypothesis that all effects within the subpanel are equal. The Q statistic represents the heterogeneity. P values of post hoc tests are depicted whenever significant. P values above the horizontal whiskers indicate whether an effect differed significantly from a zero effect. Vertical lines indicate significant post hoc tests between moderator levels. P values were not corrected for multiple comparisons. Physical outcomes are marked in red. Mental outcomes are marked in blue.

In newborns, only physical health effects offered sufficient data for further analysis. We found no benefits for digestion and heart rate parameters. All other health outcomes (cortisol, liver enzymes, respiration, temperature regulation and weight gain) showed medium to large effects (Fig. 3b ). We found no significant differences among any specific health outcomes.

Non-human touch and skin-to-skin contact

In some situations, a fellow human is not readily available to provide affective touch, raising the question of the efficacy of touch delivered by objects and robots 11 . Overall, we found humans engaging in touch with other humans or objects to have medium-sized health benefits in adults, without significant differences ( t (99) = 1.05, P  = 0.295, Hedges’ g difference of 0.12, 95% CI −0.11 to 0.35; Fig. 4a ). However, differentiating physical versus mental health benefits revealed similar benefits for human and object touch on physical health outcomes, but larger benefits on mental outcomes when humans were touched by humans ( t (97) = 2.32, P  = 0.022, Hedges’ g difference of 0.24, 95% CI 0.04 to 0.44; Fig. 4b ). It must be noted that touching with an object still showed a significant effect (see Supplementary Fig. 7 for the corresponding orchard plot).

Figure 4

a , Forest plot comparing humans versus objects touching a human on health outcomes overall across 467 in part dependent effect sizes from 85 studies and 101 cohorts. b , The same as a but separately for mental versus physical health outcomes across 467 in part dependent effect sizes from 85 studies and 101 cohorts. c , Results with the removal of all object studies, leaving 406 in part dependent effect sizes from 71 studies and 88 cohorts to identify whether missing skin-to-skin contact is the relevant mediator of higher mental health effects in human–human interactions. Numbers on the right represent the mean effect with its 95% CI in square brackets and the significance level estimating the likelihood that the effect is equal to zero. Overall effects of moderator impact were assessed via an F test, and post hoc comparisons were done using t tests (two-sided test). The F value in the top right represents a test of the hypothesis that all effects within the subpanel are equal. The Q statistic represents the heterogeneity. P values of post hoc tests are depicted whenever significant. P values above the horizontal whiskers indicate whether an effect differed significantly from a zero effect. Vertical lines indicate significant post hoc tests between moderator levels. P values were not corrected for multiple comparisons. Physical outcomes are marked in red. Mental outcomes are marked in blue.

We considered the possibility that this effect was due to missing skin-to-skin contact in human–object interactions. Thus, we investigated human–human interactions with and without skin-to-skin contact (Fig. 4c ). In line with the hypothesis that skin-to-skin contact is highly relevant, we again found stronger mental health benefits in the presence of skin-to-skin contact, although the difference did not achieve nominal significance ( t (69) = 1.95, P  = 0.055, Hedges’ g difference of 0.41, 95% CI −0.00 to 0.82), possibly because skin-to-skin contact was rarely absent in human–human interactions, leading to a decrease in power of this analysis. Results for skin-to-skin contact as an overall moderator can be found in Supplementary Fig. 8 .

Influences of type of touch

The large majority of touch interventions comprised massage therapy in adults and kangaroo care in newborns (see Supplementary Table 1 for a complete list of interventions across studies). However, comparing the different types of touch explored across studies did not reveal significant differences in effect sizes based on touch type, be it on overall health benefits (adults: t (101) = 0.11, P  = 0.916, Hedges’ g difference of 0.02, 95% CI −0.32 to 0.29; Fig. 5a ) or comparing different forms of touch separately for physical (massage therapy versus other forms: t (99) = 0.99, P  = 0.325, Hedges’ g difference 0.16, 95% CI −0.15 to 0.47) or for mental health benefits (massage therapy versus other forms: t (99) = 0.75, P  = 0.458, Hedges’ g difference of 0.13, 95% CI −0.22 to 0.48) in adults (Fig. 5c ; see Supplementary Fig. 9 for the corresponding orchard plot). A similar picture emerged for physical health effects in newborns (massage therapy versus kangaroo care: t (58) = 0.94, P  = 0.353, Hedges’ g difference of 0.15, 95% CI −0.17 to 0.47; massage therapy versus other forms: t (58) = 0.56, P  = 0.577, Hedges’ g difference of 0.13, 95% CI −0.34 to 0.60; kangaroo care versus other forms: t (58) = 0.07, P  = 0.947, Hedges’ g difference of 0.02, 95% CI −0.46 to 0.50; Fig. 5d ; see also Supplementary Fig. 10 for the corresponding orchard plot). This suggests that touch types may be flexibly adapted to the setting of every touch intervention.

Figure 5

a , Forest plot of health benefits comparing massage therapy versus other forms of touch in adult cohorts across 469 in part dependent effect sizes from 85 studies and 103 cohorts. b , Forest plot of health benefits comparing massage therapy, kangaroo care and other forms of touch for newborns across 174 in part dependent effect sizes from 52 studies and 63 cohorts. c , The same as a but separating mental and physical health benefits across 469 in part dependent effect sizes from 85 studies and 103 cohorts. d , The same as b but separating mental and physical health outcomes where possible across 164 in part dependent effect sizes from 51 studies and 62 cohorts. Note that an insufficient number of studies assessed mental health benefits of massage therapy or other forms of touch to be included. Numbers on the right represent the mean effect with its 95% CI in square brackets and the significance level estimating the likelihood that the effect is equal to zero. Overall effects of moderator impact were assessed via an F test, and post hoc comparisons were done using t tests (two-sided test). The F value in the top right represents a test of the hypothesis that all effects within the subpanel are equal. The Q statistic represents heterogeneity. P values of post hoc tests are depicted whenever significant. P values above the horizontal whiskers indicate whether an effect differed significantly from a zero effect. Vertical lines indicate significant post hoc tests between moderator levels. P values were not corrected for multiple comparisons. Physical outcomes are marked in red. Mental outcomes are marked in blue.

The role of clinical status

Most research on touch interventions has focused on clinical samples, but are benefits restricted to clinical cohorts? We found health benefits to be significant in clinical and healthy populations (Fig. 6 ), whether all outcomes are considered (Fig. 6a,b ) or physical and mental health outcomes are separated (Fig. 6c,d , see Supplementary Figs. 11 and 12 for the corresponding orchard plots). In adults, however, we found higher mental health benefits for clinical populations compared with healthy ones (Fig. 6c ; t (99) = 2.11, P  = 0.037, Hedges’ g difference of 0.25, 95% CI 0.01 to 0.49).

Figure 6

a , Health benefits for clinical cohorts of adults versus healthy cohorts of adults across 469 in part dependent effect sizes from 85 studies and 103 cohorts. b , The same as a but for newborn cohorts across 174 in part dependent effect sizes from 52 studies and 63 cohorts. c , The same as a but separating mental versus physical health benefits across 469 in part dependent effect sizes from 85 studies and 103 cohorts. d , The same as b but separating mental versus physical health benefits across 172 in part dependent effect sizes from 52 studies and 63 cohorts. Numbers on the right represent the mean effect with its 95% CI in square brackets and the significance level estimating the likelihood that the effect is equal to zero. Overall effects of moderator impact were assessed via an F test, and post hoc comparisons were done using t tests (two-sided test).The F value in the top right represents a test of the hypothesis that all effects within the subpanel are equal. The Q statistic represents the heterogeneity. P values of post hoc tests are depicted whenever significant. P values above the horizontal whiskers indicate whether an effect differed significantly from a zero effect. Vertical lines indicate significant post hoc tests between moderator levels. P values were not corrected for multiple comparisons. Physical outcomes are marked in red. Mental outcomes are marked in blue.

A more detailed analysis of specific clinical conditions in adults revealed positive mental and physical health benefits for almost all assessed clinical disorders. Differences between disorders were not found, with the exception of increased effectiveness of touch interventions in neurological disorders (Supplementary Fig. 13 ).

Familiarity in the touching dyad and intervention location

Touch interventions can be performed either by familiar touchers (partners, family members or friends) or by unfamiliar touchers (health care professionals). In adults, we did not find an impact of familiarity of the toucher ( t (99) = 0.12, P  = 0.905, Hedges’ g difference of 0.02, 95% CI −0.27 to 0.24; Fig. 7a ; see Supplementary Fig. 14 for the corresponding orchard plot). Similarly, investigating the impact on mental and physical health benefits specifically, no significant differences could be detected, suggesting that familiarity is irrelevant in adults. In contrast, touch applied to newborns by their parents (almost all studies only included touch by the mother) was significantly more beneficial compared with unfamiliar touch ( t (60) = 2.09, P  = 0.041, Hedges’ g difference of 0.30, 95% CI 0.01 to 0.59) (Fig. 7b ; see Supplementary Fig. 15 for the corresponding orchard plot). Investigating mental and physical health benefits specifically revealed no significant differences. Familiarity with the location in which the touch was applied (familiar being, for example, the participants’ home) did not influence the efficacy of touch interventions (Supplementary Fig. 16 ).

Figure 7

a , Health benefits for being touched by a familiar (for example, partner, family member or friend) versus unfamiliar toucher (health care professional) across 463 in part dependent effect sizes from 83 studies and 101 cohorts. b , The same as a but for newborn cohorts across 171 in part dependent effect sizes from 51 studies and 62 cohorts. c , The same as a but separating mental versus physical health benefits across 463 in part dependent effect sizes from 83 studies and 101 cohorts. d , The same as b but separating mental versus physical health benefits across 169 in part dependent effect sizes from 51 studies and 62 cohorts. Numbers on the right represent the mean effect with its 95% CI in square brackets and the significance level estimating the likelihood that the effect is equal to zero. Overall effects of moderator impact were assessed via an F test, and post hoc comparisons were done using t tests (two-sided test). The F value in the top right represents a test of the hypothesis that all effects within the subpanel are equal. The Q statistic represents the heterogeneity. P values of post hoc tests are depicted whenever significant. P values above the horizontal whiskers indicate whether an effect differed significantly from a zero effect. Vertical lines indicate significant post hoc tests between moderator levels. P values were not corrected for multiple comparisons. Physical outcomes are marked in red. Mental outcomes are marked in blue.

Frequency and duration of touch interventions

How often and for how long should touch be delivered? For adults, the median touch duration across studies was 20 min and the median number of touch interventions was four sessions with an average time interval of 2.3 days between each session. For newborns, the median touch duration across studies was 17.5 min and the median number of touch interventions was seven sessions with an average time interval of 1.3 days between each session.

Delivering more touch sessions increased benefits in adults, whether overall ( t (101) = 4.90, P  < 0.001, Hedges’ g  = 0.02, 95% CI 0.01 to 0.03), physical ( t (81) = 3.07, P  = 0.003, Hedges’ g  = 0.02, 95% CI 0.01–0.03) or mental benefits ( t (72) = 5.43, P  < 0.001, Hedges’ g  = 0.02, 95% CI 0.01–0.03) were measured (Fig. 8a ). A closer look at specific outcomes for which sufficient data were available revealed that positive associations between the number of sessions and outcomes were found for trait anxiety ( t (12) = 7.90, P  < 0.001, Hedges’ g  = 0.03, 95% CI 0.02–0.04), depression ( t (20) = 10.69, P  < 0.001, Hedges’ g  = 0.03, 95% CI 0.03–0.04) and pain ( t (37) = 3.65, P  < 0.001, Hedges’ g  = 0.03, 95% CI 0.02–0.05), indicating a need for repeated sessions to improve these adverse health outcomes. Neither increasing the number of sessions for newborns nor increasing the duration of touch per session in adults or newborns increased health benefits, be they physical or mental (Fig. 8b–d ). For continuous moderators in adults, we also looked at specific health outcomes as sufficient data were generally available for further analysis. Surprisingly, we found significant negative associations between touch duration and reductions of cortisol ( t (24) = 2.71, P  = 0.012, Hedges’ g  = −0.01, 95% CI −0.01 to −0.00) and heart rate parameters ( t (21) = 2.35, P  = 0.029, Hedges’ g  = −0.01, 95% CI −0.02 to −0.00).

Figure 8

a, Meta-regression analysis examining the association between the number of sessions applied and the effect size in adults, either on overall health benefits (left, 469 in part dependent effect sizes from 85 studies and 103 cohorts) or for physical (middle, 245 in part dependent effect sizes from 69 studies and 83 cohorts) or mental benefits (right, 224 in part dependent effect sizes from 60 studies and 74 cohorts) separately. b, The same as a for newborns (overall: 150 in part dependent effect sizes from 46 studies and 53 cohorts; physical health: 127 in part dependent effect sizes from 44 studies and 51 cohorts; mental health: 21 in part dependent effect sizes from 11 studies and 12 cohorts). c, d, The same as a (c) and b (d) but for the duration of the individual sessions. For adults, 449 in part dependent effect sizes across 80 studies and 96 cohorts were included in the overall analysis. The analysis of physical health benefits included 240 in part dependent effect sizes across 67 studies and 80 cohorts, and the analysis of mental health benefits included 209 in part dependent effect sizes from 56 studies and 69 cohorts. For newborns, 145 in part dependent effect sizes across 45 studies and 52 cohorts were included in the overall analysis. The analysis of physical health benefits included 122 in part dependent effect sizes across 43 studies and 50 cohorts, and the analysis of mental health benefits included 21 in part dependent effect sizes from 11 studies and 12 cohorts. Each dot represents an effect size. Its size indicates the precision of the study (larger indicates better). Overall effects of moderator impact were assessed via an F test (two-sided test). The P values in each panel represent the result of a regression analysis testing the hypothesis that the slope of the relationship is equal to zero. P values are not corrected for multiple testing. The shaded area around the regression line represents the 95% CI.

Demographic influences of sex and age

We used the ratio between women and men in the single-study samples as a proxy for sex-specific effects. Sex ratios were heavily skewed towards larger numbers of women in each cohort (median 83% women), and we could not find significant associations between sex ratio and overall (t(62) = 0.08, P = 0.935, Hedges’ g = 0.00, 95% CI −0.00 to 0.01), mental (t(43) = 0.55, P = 0.588, Hedges’ g = 0.00, 95% CI −0.00 to 0.01) or physical health benefits (t(51) = 0.15, P = 0.882, Hedges’ g = −0.00, 95% CI −0.01 to 0.01). For specific outcomes that could be further analysed, we found a significant positive association of sex ratio with reductions in cortisol secretion (t(18) = 2.31, P = 0.033, Hedges’ g = 0.01, 95% CI 0.00 to 0.01), suggesting stronger benefits in women. In contrast to adults, sex ratios were balanced in samples of newborns (median 53% girls). For newborns, there was no significant association with overall (t(36) = 0.77, P = 0.447, Hedges’ g = −0.01, 95% CI −0.02 to 0.01) or physical health benefits of touch (t(35) = 0.93, P = 0.359, Hedges’ g = −0.01, 95% CI −0.02 to 0.01). There were insufficient data on mental health benefits for further analysis.

The median age in the adult meta-analysis was 42.6 years (s.d. 21.16 years, range 4.5–88.4 years). There was no association between age and the overall ( t (73) = 0.35, P  = 0.727, Hedges’ g = 0.00, 95% CI −0.01 to 0.01), mental ( t (53) = 0.94, P  = 0.353, Hedges’ g  = 0.01, 95% CI −0.01 to 0.02) and physical health benefits of touch ( t (60) = 0.16, P  = 0.870, Hedges’ g  = 0.00, 95% CI −0.01 to 0.01). Looking at specific health outcomes, we found significant positive associations between mean age and improved positive affect ( t (10) = 2.54, P  = 0.030, Hedges’ g  = 0.01, 95% CI 0.00 to 0.02) as well as systolic blood pressure ( t (11) = 2.39, P  = 0.036, Hedges’ g  = 0.02, 95% CI 0.00 to 0.04).

A list of touched body parts can be found in Supplementary Table 1. We found significantly higher health benefits for head touch compared with arm touch (t(40) = 2.14, P = 0.039, Hedges’ g difference of 0.78, 95% CI 0.07 to 1.49) and torso touch (t(40) = 2.23, P = 0.031, Hedges’ g difference of 0.84, 95% CI 0.10 to 1.58; Supplementary Fig. 17). Touching the arm resulted in lower mental health benefits than physical health benefits (t(37) = 2.29, P = 0.028, Hedges’ g difference of −0.35, 95% CI −0.65 to −0.05). Furthermore, we found a significantly increased physical health benefit when the head was touched as opposed to the torso (t(37) = 2.10, P = 0.043, Hedges’ g difference of 0.96, 95% CI 0.06 to 1.86). Thus, head touch, such as a face or scalp massage, could be especially beneficial.

Directionality

In adults, we tested whether a uni- or bidirectional application of touch mattered. The large majority of touch was applied unidirectionally ( k  = 442 of 469 effects). Unidirectional touch had higher health benefits ( t (101) = 2.17, P  = 0.032, Hedges’ g difference of 0.30, 95% CI 0.03 to 0.58) than bidirectional touch. Specifically, mental health benefits were higher in unidirectional touch ( t (99) = 2.33, P  = 0.022, Hedges’ g difference of 0.46, 95% CI 0.06 to 0.66).

Study location

For adults, we found significantly stronger health benefits of touch in South American compared with North American cohorts (t(95) = 2.03, P = 0.046, Hedges’ g difference of 0.37, 95% CI 0.01 to 0.73) and European cohorts (t(95) = 2.22, P = 0.029, Hedges’ g difference of 0.36, 95% CI 0.04 to 0.68). For newborns, we found weaker effects in North American cohorts compared with Asian (t(55) = 2.28, P = 0.026, Hedges’ g difference of −0.37, 95% CI −0.69 to −0.05) and European cohorts (t(55) = 2.36, P = 0.022, Hedges’ g difference of −0.40, 95% CI −0.74 to −0.06). Investigating the interaction with mental and physical health benefits did not reveal any effect of study location in either meta-analysis (Supplementary Fig. 18).

Systematic review of studies without effect sizes

All studies for which effect size data could not be obtained or that did not meet the meta-analysis inclusion criteria can be found in the OSF project 12 in the file ‘Study_lists_final_revised.xlsx’ (sheet ‘Studies_without_effect_sizes’). Specific reasons for exclusion are furthermore documented in Supplementary Table 2. For human health outcomes assessed across 56 studies and n = 2,438 individuals, interventions mostly comprised massage therapy (k = 86 health outcomes) and kangaroo care (k = 33 health outcomes). For datasets where no effect size could be computed, 90.0% of mental health and 84.3% of physical health parameters were positively impacted by touch. The positive impact of touch did not differ between types of touch interventions. These results accord well with the meta-analysis, which found a strongly positive overall benefit of touch irrespective of whether massage or another intervention was applied.

We also assessed health outcomes in animals across 19 studies and n  = 911 subjects. Most research was conducted in rodents. Animals that received touch were rats (ten studies, k  = 16 health outcomes), mice (four studies, k  = 7 health outcomes), macaques (two studies, k  = 3 health outcomes), cats (one study, k  = 3 health outcomes), lambs (one study, k  = 2 health outcomes) and coral reef fish (one study, k  = 1 health outcome). Touch interventions mostly comprised stroking ( k  = 13 health outcomes) and tickling ( k  = 10 health outcomes). For animal studies, 71.4% of effects showed benefits to mental health-like parameters and 81.8% showed positive physical health effects. We thus found strong evidence that touch interventions, which were mostly conducted by humans (16 studies with human touch versus 3 studies with object touch), had positive health effects in animal species as well.

The key aim of the present study was twofold: (1) to provide an estimate of the effect size of touch interventions and (2) to disentangle moderating factors so that future interventions can potentially be tailored more precisely. Overall, touch interventions were beneficial for both physical and mental health, with a medium effect size. Our work illustrates that touch interventions are best suited for reducing pain, depression and anxiety in adults and children, as well as for increasing weight gain in newborns. These findings are in line with previous meta-analyses on this topic, supporting their conclusions and their robustness to the addition of more datasets. One limitation of previous meta-analyses is that they focused on specific health outcomes or populations, despite primary studies often reporting effects on multiple health parameters simultaneously (for example, ref. 13 focusing on neck and shoulder pain and ref. 14 focusing on massage therapy in preterms). To our knowledge, only ref. 5 provides a multivariate picture for a large number of dependent variables. However, that study analysed its data in separate random-effects models that accounted for neither multivariate reporting nor the multilevel structure of the data, as such approaches have only become available recently. Thus, in addition to adding a substantial amount of new data, our statistical approach provides a more accurate depiction of effect size estimates. Additionally, our study investigated a variety of moderating effects that did not reach significance (for example, sex ratio, mean age or intervention duration) or were not considered (for example, the benefits of robot or object touch) in previous meta-analyses of touch intervention efficacy 5, probably because few past studies reported information on these moderators. Owing to our large-scale approach, we reached high statistical power for many moderator analyses.
Finally, previous meta-analyses on this topic exclusively focused on massage therapy in adults or kangaroo care in newborns 15 , leaving out a large number of interventions that are being carried out in research as well as in everyday life to improve well-being. Incorporating these studies into our study, we found that, in general, both massages and other types of touch, such as gentle touch, stroking or kangaroo care, showed similar health benefits.

While the specific type of touch intervention seems less critical, the frequency of interventions seems to matter. More sessions were positively associated with the improvement of trait outcomes such as depression and anxiety, but also with pain reductions in adults. In contrast to session number, increasing the duration of individual sessions did not improve health effects. In fact, we found some indications of negative relationships in adults for cortisol and heart rate. This could be due to habituating effects of touch on the sympathetic nervous system and hypothalamic–pituitary–adrenal axis, ultimately resulting in diminished effects with longer exposure, or to decreased pleasantness ratings of affective touch with increasing duration 16. For newborns, we could not support previous notions that the duration of the touch intervention is linked to benefits in weight gain 17. Thus, an ideal intervention protocol does not seem to require excessively long sessions. It should be noted that very few interventions lasted less than 5 min, and it therefore remains unclear whether very short interventions have the same effect.

A critical issue highlighted in the pandemic was the lack of touch due to social restrictions 18 . To accommodate the need for touch in individuals with small social networks (for example, institutionalized or isolated individuals), touch interventions using objects/robots have been explored in the past (for a review, see ref. 11 ). We show here that touch interactions outside of the human–human domain are beneficial for mental and physical health outcomes. Importantly, object/robot touch was not as effective in improving mental health as human-applied touch. A sub-analysis of missing skin-to-skin contact among humans indicated that mental health effects of touch might be mediated by the presence of skin-to-skin contact. Thus, it seems profitable to include skin-to-skin contact in future touch interventions, in line with previous findings in newborns 19 . In robots, recent advancements in synthetic skin 20 should be investigated further in this regard. It should be noted that, although we did not observe significant differences in physical health benefits between human–human and human–object touch, the variability of effect sizes was higher in human–object touch. The conditions enabling object or robot interactions to improve well-being should therefore be explored in more detail in the future.

Touch was beneficial for both healthy and clinical cohorts. These data are critical as most previous meta-analytic research has focused on individuals diagnosed with clinical disorders (for example, ref. 6 ). For mental health outcomes, we found larger effects in clinical cohorts. A possible reason could relate to increased touch wanting 21 in patients. For example, loneliness often co-occurs with chronic illnesses 22 , which are linked to depressed mood and feelings of anxiety 23 . Touch can be used to counteract this negative development 24 , 25 . In adults and children, knowing the toucher did not influence health benefits. In contrast, familiarity affected overall health benefits in newborns, with parental touch being more beneficial than touch applied by medical staff. Previous studies have suggested that early skin-to-skin contact and exposure to maternal odour is critical for a newborn’s ability to adapt to a new environment 26 , supporting the notion that parental care is difficult to substitute in this time period.

With respect to age-related effects, our data further suggest that increasing age was associated with greater benefits of touch for systolic blood pressure. These findings could potentially be attributed to higher basal blood pressure 27 with increasing age, allowing for a stronger modulation of this parameter. For sex differences, our study provides some evidence of differences between women and men with respect to the health benefits of touch. Overall, research on sex differences in touch processing is relatively sparse (but see refs. 28, 29). Our results suggest that buffering effects against physiological stress are stronger in women, in line with the increased buffering effects of hugs found in women compared with men 30. The female-biased primary research in adults, however, calls for more research in men and non-binary individuals. Unfortunately, our study could not explore this topic further, as health benefits broken down by sex or gender were almost never reported. Recent research has demonstrated that sensory pleasantness is affected by sex and that this also interacts with the familiarity of the other person in the touching dyad 29, 31. In general, contextual factors such as sex and gender, the relationship of the touching dyad, differences in cultural background or internal states such as stress have been demonstrated to be highly influential in the perception of affective touch and are thus relevant to maximizing the pleasantness and ultimately the health benefits of touch interactions 32, 33, 34. As a positive personal relationship within the touching dyad is paramount to inducing positive health effects, future research applying robot touch to promote well-being should not only explore synthetic skin options but also focus on improving robots as social agents that form a close relationship with the person receiving the touch 35.

As part of the systematic review, we also assessed the effects of touch interventions in non-human animals. Mimicking the results of the meta-analysis in humans, beneficial effects of touch in animals were comparably strong for mental health-like and physical health outcomes. This may inform interventions to promote animal welfare in the context of animal experiments 36 , farming 37 and pets 38 . While most studies investigated effects in rodents, which are mostly used as laboratory animals, these results probably transfer to livestock and common pets as well. Indeed, touch was beneficial in lambs, fish and cats 39 , 40 , 41 . The positive impact of human touch in rodents also allows for future mechanistic studies in animal models to investigate how interventions such as tickling or stroking modulate hormonal and neuronal responses to touch in the brain. Furthermore, the commonly proposed oxytocin hypothesis can be causally investigated in these animal models through, for example, optogenetic or chemogenetic techniques 42 . We believe that such translational approaches will further help in optimizing future interventions in humans by uncovering the underlying mechanisms and brain circuits involved in touch.

Our results offer many promising avenues for improving future touch interventions, but they also need to be discussed in light of their limitations. While the majority of findings showed robust health benefits of touch interventions across moderators when compared with a null effect, post hoc tests of, for example, familiarity effects in newborns or differences in mental health benefits between human and object touch only barely reached significance. Since we computed a large number of statistical tests in the present study, there is a risk that these results are false positives. We hope that these intriguing results stimulate researchers in this field to target these questions with well-powered, controlled experimental designs in primary research. Furthermore, the presence of small-study bias in both meta-analyses indicates that the effect size estimates presented here might be overestimated, as null results often remain unpublished. We stress, however, that this bias is probably reduced by the multivariate reporting of primary studies: most studies that reported on multiple health outcomes showed significant findings for only one or two among many, so the multivariate nature of primary research in this field allowed us to include many non-significant findings in the present study. Another limitation pertains to the fact that we only included articles in languages mostly spoken in Western countries. As a large body of evidence comes from Asian countries, primary research may have been published in languages other than those specified in the inclusion criteria. Thus, despite the large and inclusive nature of our study, some studies could have been missed regardless. Another factor that could not be accounted for in our meta-analysis is that an important prerequisite for touch to be beneficial is its perceived pleasantness.
The level of pleasantness associated with being touched is modulated by several parameters 34, including cultural acceptability 43, perceived humanness 44 and the need for touch 45, which could explain the observed differences for certain moderators, such as human–human versus robot–human interaction. Moreover, the fact that secondary categorical moderators could not be investigated with respect to specific health outcomes, owing to the lack of data points, limits the specificity of our conclusions in this regard. It thus remains unclear whether, for example, a decreased mental health benefit in the absence of skin-to-skin contact is linked mostly to decreased anxiolytic effects, changes in positive/negative affect or something else. Since these health outcomes are, however, highly correlated 46, such effects are likely driven by multiple health outcomes. Similarly, it is important to note that our conclusions mainly refer to outcomes measured close to the touch intervention, as we did not include long-term outcomes. Finally, blinding towards the experimental condition is essentially impossible in touch interventions. Although we compared the touch intervention with other interventions, such as relaxation therapy, as a control whenever possible, contributions of placebo effects cannot be ruled out.

In conclusion, we show clear evidence that touch interventions are beneficial across a large number of both physical and mental health outcomes, for both healthy and clinical cohorts, and for all ages. These benefits, while influenced in their magnitude by study cohorts and intervention characteristics, were robustly present, promoting the conclusion that touch interventions can be systematically employed across the population to preserve and improve our health.

Open science practices

All data and code are accessible in the corresponding OSF project 12 . The systematic review was registered on PROSPERO (CRD42022304281) before the start of data collection. We deviated from the pre-registered plan as follows:

Deviation 1: During our initial screening for the systematic review, we were confronted with a large number of potential health outcomes to look at. This observation of multivariate outcomes led us to register an amendment during data collection (but before any effect size or moderator screening). In doing so, we aimed to additionally extract meta-analytic effects for a more quantitative assessment of our review question that can account for multivariate data reporting and dependencies of effects within the same study. Furthermore, as we noted a severe lack of studies with respect to health outcomes for animals during the inclusion assessment for the systematic review, we decided that the meta-analysis would only focus on outcomes that could be meaningfully analysed on the meta-analytic level and therefore only included health outcomes of human participants.

Deviation 2: In the pre-registration, we did not explicitly exclude non-randomized trials. Since explicitly non-randomized group allocation substantially increases the risk of bias, we decided a posteriori to exclude such trials from data analysis.

Deviation 3: In the pre-registration, we outlined a tertiary moderator level, namely benefits of touch application versus touch reception. This level was ignored since no included study specifically investigated the benefits of touch application by itself.

Deviation 4: In the pre-registration, we suggested using the RoBMA function 47 to provide a Bayesian framework that allows for a more accurate assessment of publication bias beyond small-study bias. Unfortunately, neither multilevel nor multivariate data structures are supported by the RoBMA function, to our knowledge. For this reason, we did not further pursue this analysis, as the hierarchical nature of the data would not be accounted for.

Deviation 5: Beyond the pre-registered inclusion and exclusion criteria, we also excluded dissertations owing to their lack of peer review.

Deviation 6: In the pre-registration, we stated that we would investigate the impact of the sex of the person applying the touch. This moderator was not analysed further, as this information was rarely given and the individuals applying the touch were almost exclusively women (7 males, 24 mixed and 85 females in studies on adults/children; 3 males, 17 mixed and 80 females in studies on newborns).

Deviation 7: The time span of the touch intervention as assessed by subtracting the final day of the intervention from the first day was not investigated further owing to its very high correlation with the number of sessions ( r (461) = 0.81 in the adult meta-analysis, r (145) = 0.84 in the newborn meta-analysis).

Inclusion and exclusion criteria

To be included in the systematic review, studies had to investigate the relationship between at least one health outcome (physical and/or mental) in humans or animals and a touch intervention, include explicit physical touch by another human, animal or object as part of an intervention, and include an experimental and a control condition/group differentiated by touch alone. Of note, as a result of this selection process, no animal-to-animal touch intervention study was included, as none featured a proper no-touch control. Human touch was always explicit touch by a human (that is, no brushes or other tools), either with or without skin-to-skin contact. Regarding the included health outcomes, we aimed to be as broad as possible but excluded parameters such as neurophysiological responses or pleasantness ratings after touch application, as they do not reflect health outcomes. All studies included in the meta-analysis and systematic review (refs. 48–263) are listed in Supplementary Table 2. All excluded studies are listed in Supplementary Table 3, together with a reason for exclusion. We then applied a two-step process: first, we identified all potential health outcomes and extracted qualitative information on those outcomes (for example, direction of effect); second, we extracted quantitative information from all possible outcomes (for example, effect sizes). The meta-analysis additionally required a between-subjects design (to clearly distinguish touch from no-touch effects and owing to missing information about the correlation between repeated measurements 264). Studies that explicitly did not apply a randomized protocol were excluded before further analysis to reduce risk of bias. The full study lists for excluded and included studies can be found in the OSF project 12 in the file ‘Study_lists_final_revised.xlsx’. In terms of the time frame, we conducted an open-start search of studies up to 2022 and identified studies conducted between 1965 and 2022.

Data collection

We used Google Scholar, PubMed and Web of Science for our literature search, with no limitations regarding the publication date and using pre-specified search queries (see Supplementary Information for the exact keywords used). All procedures were in accordance with the updated Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines 265 . Articles were assessed in French, Dutch, German or English. The above databases were searched from 2 December 2021 until 1 October 2022. Two independent coders evaluated each paper against the inclusion and exclusion criteria. Inconsistencies between coders were checked and resolved by J.P. and H.H. Studies excluded/included for the review and meta-analysis can be found on the OSF project.

Search queries

We used the following keywords to search the chosen databases. Agent terms (human versus animal versus object versus robot) and touch outcome terms (physical versus mental) were each combined with the touch keywords in separate searches.

TOUCH: Touch OR Social OR Affective OR Contact OR Tactile interaction OR Hug OR Massage OR Embrace OR Kiss OR Cradling OR Stroking OR Haptic interaction OR tickling

AGENT: Object OR Robot OR human OR animal OR rodent OR primate

MENTAL OUTCOME: Health OR mood OR Depression OR Loneliness OR happiness OR life satisfaction OR Mental Disorder OR well-being OR welfare OR dementia OR psychological OR psychiatric OR anxiety OR Distress

PHYSICAL OUTCOME: Health OR Stress OR Pain OR cardiovascular health OR infection risk OR immune response OR blood pressure OR heart rate
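For illustration, keyword blocks like those above can be assembled into a single boolean query string, with each block OR-joined internally and the blocks AND-combined. This is a sketch with abridged term lists; the quoting of multi-word phrases and the exact grouping syntax accepted by each database are assumptions and differ in practice:

```python
# Sketch: OR-join each (abridged) keyword block, then AND-combine the
# blocks into one query string. Database-specific field tags are omitted.
BLOCKS = {
    "TOUCH": ["Touch", "Massage", "Hug", "Tactile interaction", "Stroking"],
    "AGENT": ["Object", "Robot", "human", "animal"],
    "MENTAL OUTCOME": ["Health", "Depression", "anxiety", "well-being"],
}

def or_join(terms):
    """Quote multi-word phrases and OR-join one keyword block."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"

def build_query(blocks):
    """AND-combine the OR-joined keyword blocks."""
    return " AND ".join(or_join(terms) for terms in blocks.values())

query = build_query(BLOCKS)
```

A separate query would be built per agent/outcome combination, mirroring the separate searches described above.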

Data extraction and preparation

Data extraction began on 10 October 2022 and was concluded on 25 February 2023. J.P. and H.H. oversaw the data collection process, and checked and resolved all inconsistencies between coders.

Health benefits of touch were always coded by positive summary effects, whereas adverse health effects of touch were represented by negative summary effects. If multiple time points were measured for the same outcome on the same day after a single touch intervention, we extracted the peak effect size (in either the positive or negative direction). If the touch intervention occurred multiple times and health outcomes were assessed for each time point, we extracted data points separately. However, we only extracted immediate effects, as long-term effects not controlled through the experimental conditions could be due to influences other than the initial touch intervention. Measurements assessing long-term effects without explicit touch sessions in the breaks were excluded for the same reason. Common control groups for touch interventions comprised active (for example, relaxation therapy) as well as passive control groups (for example, standard medical care). In the case of multiple control groups, we always contrasted the touch group to the group that most closely matched the touch condition (for example, relaxation therapy was preferred over standard medical care). We extracted information from all moderators listed in the pre-registration (Supplementary Table 4 ). A list of included and excluded health outcomes is presented in Supplementary Table 5 . Authors of studies with possible effects but missing information to calculate those effects were contacted via email and asked to provide the missing data (response rate 35.7%).
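The peak-extraction rule described above (taking the effect in either the positive or negative direction with the largest magnitude across same-day time points) can be sketched as follows; the function name and data layout are illustrative, not the authors' actual extraction pipeline:

```python
def peak_effect(effect_sizes):
    """Return the effect size with the largest absolute magnitude,
    preserving its sign, from same-day repeated measurements of one
    outcome after a single touch intervention."""
    return max(effect_sizes, key=abs)

# e.g. three post-intervention time points for one outcome:
# the -0.48 measurement is extracted (largest magnitude, sign kept)
peak = peak_effect([0.21, -0.48, 0.30])
```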

After finalizing the list of included studies for the systematic review, we added columns for moderators and the coding schema for our meta-analysis per our updated registration. Each study was then assessed for its eligibility for the meta-analysis by two independent coders (J.P., H.H., K.F. or F.M.). All coders followed an a priori specified procedure: first, the PDF was skimmed for possible effects to extract, and the study was excluded if no PDF was available or the study was in a language other than those specified in ‘Data collection’. Effects were extracted from all eligible studies listing descriptive values or statistical parameters from which effect sizes could be calculated. A website 266 was used to convert the descriptive and statistical values available in the included studies (means and standard deviations/standard errors/confidence intervals, sample sizes, F values, t values, t test P values or frequencies) into Cohen’s d, which was then converted into Hedges’ g. If only a P value threshold was reported (for example, P < 0.01), we used this most conservative value as the P value to calculate the effect size (for example, P = 0.01). If only the total sample size was given but that number was even and participants were randomly assigned to each group, we assumed equal sample sizes per group. If delta change scores (for example, pre- to post-touch intervention) were reported, we used those over post-touch-only scores. When frequency tables were used to determine effect sizes and a cell frequency was 0, we substituted a value of 0.5 to calculate the effect (the default setting in the ‘metafor’ function 267). From these data, Hedges’ g and its variance could be derived. Effect sizes were always computed between the experimental and the control group.
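The conversion chain can be sketched with standard meta-analytic formulas. This is a minimal illustration under assumed inputs (the authors used an online calculator); the function names are hypothetical:

```python
import math

# Illustrative conversion from group descriptives to Cohen's d and then to
# Hedges' g with its sampling variance (standard meta-analytic formulas).
def cohens_d_from_means(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d from means and SDs of two groups, using the pooled SD."""
    s_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (m1 - m2) / s_pooled

def hedges_g(d, n1, n2):
    """Small-sample-corrected Hedges' g and its sampling variance."""
    df = n1 + n2 - 2
    j = 1 - 3 / (4 * df - 1)                       # small-sample correction factor
    v_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))  # variance of d
    return j * d, j**2 * v_d

d = cohens_d_from_means(10.2, 2.0, 20, 9.0, 2.0, 20)  # d ≈ 0.6
g, v = hedges_g(d, 20, 20)                            # g is slightly below d
```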

Statistical analysis and risk of bias assessment

Owing to the lack of identified studies, health benefits to animals were not included in the statistical analysis. A single meta-analysis was performed for adults, adolescents and children, as their outcomes were highly comparable. We refer to this meta-analysis as the adult meta-analysis, as child/adolescent cohorts were targeted in only a minority of studies. A separate meta-analysis was performed for newborns, as their health outcomes differed substantially from those of any other age group.

Data were analysed using R (version 4.2.2) with the ‘rma.mv’ function from the ‘metafor’ package 267 in a multistep, multivariate and multilevel fashion.

We calculated an overall effect of touch interventions across all studies, cohorts and health outcomes. To account for the hierarchical structure of the data, we used a multilevel structure with random effects at the study, cohort and effects level. Furthermore, we calculated the variance–covariance matrix of all data points to account for the dependencies of measured effects within each individual cohort and study. The variance–covariance matrix was calculated by default with an assumed correlation of effect sizes within each cohort of ρ = 0.6. As ρ needed to be assumed, sensitivity analyses for all computed effect estimates were conducted using correlations between effects of 0, 0.2, 0.4 and 0.8. The results of these sensitivity analyses can be found in ref. 12. No conclusion drawn in the present manuscript was altered by changing the level of ρ. The sensitivity analyses, however, showed that higher assumed correlations lead to more conservative effect size estimates (see Supplementary Figs. 19 and 20 for the adult and newborn meta-analyses, respectively), reducing the type I error risk in general 268. In addition to these procedures, we used robust variance estimation with cluster-robust inference at the cohort level. This step is recommended to more accurately determine the confidence intervals in complex multivariate models 269. The data distribution was assumed to be normal, but this was not formally tested.
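The within-cohort dependency structure described above can be sketched as a block-diagonal variance–covariance matrix. This is an assumed illustration, not the authors' R code ('metafor' provides equivalent machinery); the function name is hypothetical:

```python
import numpy as np

# Illustrative construction of the variance-covariance matrix of effects:
# effects from the same cohort get covariance rho * sqrt(v_i * v_j);
# effects from different cohorts are treated as independent.
def impute_vcov(variances, cohorts, rho=0.6):
    v = np.asarray(variances, dtype=float)
    V = np.diag(v)
    for i in range(len(v)):
        for j in range(i + 1, len(v)):
            if cohorts[i] == cohorts[j]:
                V[i, j] = V[j, i] = rho * np.sqrt(v[i] * v[j])
    return V

# Two effects from cohort A, one from cohort B:
V = impute_vcov([0.04, 0.09, 0.25], cohorts=["A", "A", "B"], rho=0.6)
# Off-diagonal for the two cohort-A effects: 0.6 * sqrt(0.04 * 0.09) = 0.036
```

Raising `rho` inflates the shared covariance, which is why the sensitivity analyses with higher assumed correlations yield more conservative pooled estimates.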

To determine whether individual effects had a strong influence on our results, we calculated Cook’s distance D. Here, a threshold of D > 0.5 was used to qualify a study as influential 270. Heterogeneity in the present study was assessed using Cochran’s Q, which tests whether the extracted effect sizes estimate a common population effect size. Although the Q statistic in the ‘rma.mv’ function accounts for the hierarchical nature of the data, we also quantified the heterogeneity estimator σ² for each random-effects level to provide a comprehensive overview of heterogeneity indicators. These indicators for all models can be found on the OSF project 12 in the ‘Model estimates’ table. To assess small-study bias, we visually inspected the funnel plot and used the standard error as a moderator in the overarching meta-analyses.
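Cochran's Q can be sketched as follows. Note this simplified version uses plain inverse-variance weights, whereas the paper's 'rma.mv' Q additionally respects the multilevel structure; the function name is hypothetical:

```python
import numpy as np

# Illustrative Cochran's Q under simple inverse-variance weighting: large Q
# relative to a chi-squared distribution with k-1 df indicates that the
# effects do not estimate a single common population effect.
def cochran_q(effects, variances):
    y = np.asarray(effects, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)
    mu = np.sum(w * y) / np.sum(w)            # inverse-variance pooled effect
    return float(np.sum(w * (y - mu) ** 2))   # weighted squared deviations

q = cochran_q([0.2, 0.5, 0.8], [0.04, 0.04, 0.04])  # pooled effect is 0.5
```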

Before any subgroup analysis, the overall effect size was used as input for power calculations. Although such post hoc power calculations have limitations, we reasoned that a minimum number of effects was necessary for subgroup analyses to allow for meaningful conclusions. Such a medium effect size would also probably be the minimum effect size of interest for researchers as well as clinical practitioners. Power calculation for random-effects models further requires a sample size for each individual effect as well as an approximation of the expected heterogeneity between studies. For the sample size input, we used the median sample size of the included studies. For heterogeneity, we assumed a value between medium and high levels (I² = 62.5% 271), as moderator analyses typically aim at reducing overall heterogeneity. Subgroups were only investigated further if the number of observed effects achieved ~80% power under these assumptions, to allow for a more robust interpretation of the observed effects (see Supplementary Figs. 5 and 6 for the adult and newborn meta-analyses, respectively). We then investigated all pre-registered moderators for which sufficient power was detected. We first examined our primary moderators (mental versus physical health) and how effect sizes varied systematically as a function of our secondary moderators (for example, human–human versus human–object touch, duration or the presence of skin-to-skin contact). We always included random slopes to allow moderator effects to vary with the random effects at our clustering variable, which is recommended in multilevel models to reduce false positives 272. All statistical tests were two-sided. Significance of moderators was determined using omnibus F tests. Effect size differences between moderator levels and their confidence intervals were assessed via t tests.
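A power check of this kind can be sketched as follows. This is an assumed approximation in the spirit of study-level power calculations for random-effects models 271, not the authors' code; the function name and the exact variance formulas are illustrative:

```python
import math
from statistics import NormalDist

# Illustrative power approximation for a random-effects subgroup: given k
# effects of size delta with a median per-group sample size and assumed
# heterogeneity I^2, is ~80% power reached for a two-sided test?
def meta_power(delta, n_per_group, k, i2=0.625, alpha=0.05):
    v = 2 / n_per_group + delta**2 / (4 * n_per_group)  # typical variance of an SMD
    tau2 = v * i2 / (1 - i2)                            # heterogeneity implied by I^2
    se = math.sqrt((v + tau2) / k)                      # SE of the pooled effect
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lam = delta / se                                    # noncentrality
    return 1 - NormalDist().cdf(z - lam) + NormalDist().cdf(-z - lam)

# More effects in a subgroup yield higher power, all else equal:
p_small = meta_power(delta=0.5, n_per_group=30, k=5)
p_large = meta_power(delta=0.5, n_per_group=30, k=20)
```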

Post hoc t tests were performed comparing mental and physical health benefits within each interacting moderator (for example, mental versus physical health benefits in cancer patients) and mental or physical health benefits across levels of the interacting moderator (for example, mental health benefits in cancer versus pain patients). The post hoc tests were not pre-registered. Data were visualized using forest plots and orchard plots 273 for categorical moderators and scatter plots for continuous moderators.

For a broad overview of prior work and its biases, risk of bias was assessed for all studies included in the meta-analyses and the systematic review. We assessed the risk of bias for the following parameters:

Bias from randomization, including whether a randomization procedure was performed, whether it was a between- or within-participant design and whether there were any baseline differences for demographic or dependent variables.

Sequence bias resulting from a lack of counterbalancing in within-subject designs.

Performance bias resulting from the participants or experimenters not being blinded to the experimental conditions.

Attrition bias resulting from different dropout rates between experimental groups.

Note that four studies in the adult meta-analysis did not explicitly mention randomization as part of their protocol. However, since these studies showed no baseline differences in any relevant variables (see the ‘Risk of Bias’ table on the OSF project), we assumed that randomization was performed but not reported. Sequence bias was of no concern for studies in the meta-analysis, since cross-over designs were excluded. It was, however, assessed for studies within the scope of the systematic review. Importantly, performance bias was always high in the adult/children meta-analysis, as blinding of participants and experimenters to the experimental conditions was not possible owing to the nature of the intervention (touch versus no touch). For studies with newborns and animals, we assessed performance bias as medium, since neither newborns nor animals are likely to be aware of being part of an experiment or a specific group. An overview of the results is presented in Supplementary Fig. 21, and the precise assessment for each study can be found on the OSF project 12 in the ‘Risk of Bias’ table.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

All data are available via Open Science Framework at https://doi.org/10.17605/OSF.IO/C8RVW (ref. 12). Source data are provided with this paper.

Code availability

All code is available via Open Science Framework at https://doi.org/10.17605/OSF.IO/C8RVW (ref. 12).

Fulkerson, M. The First Sense: a Philosophical Study of Human Touch (MIT Press, 2013).

Farroni, T., Della Longa, L. & Valori, I. The self-regulatory affective touch: a speculative framework for the development of executive functioning. Curr. Opin. Behav. Sci. 43 , 167–173 (2022).

Ocklenburg, S. et al. Hugs and kisses—the role of motor preferences and emotional lateralization for hemispheric asymmetries in human social touch. Neurosci. Biobehav. Rev. 95 , 353–360 (2018).

Ardiel, E. L. & Rankin, C. H. The importance of touch in development. Paediatr. Child Health 15 , 153–156 (2010).

Moyer, C. A., Rounds, J. & Hannum, J. W. A meta-analysis of massage therapy research. Psychol. Bull. 130 , 3–18 (2004).

Lee, S. H., Kim, J. Y., Yeo, S., Kim, S. H. & Lim, S. Meta-analysis of massage therapy on cancer pain. Integr. Cancer Ther. 14 , 297–304 (2015).

LaFollette, M. R., O’Haire, M. E., Cloutier, S. & Gaskill, B. N. A happier rat pack: the impacts of tickling pet store rats on human–animal interactions and rat welfare. Appl. Anim. Behav. Sci. 203 , 92–102 (2018).

Packheiser, J., Michon, F., Eva, C., Fredriksen, K. & Hartmann, H. The physical and mental health benefits of social touch: a comparative systematic review and meta-analysis. PROSPERO https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42022304281 (2023).

Lakens, D. Sample size justification. Collabra. Psychol. 8 , 33267 (2022).

Quintana, D. S. A guide for calculating study-level statistical power for meta-analyses. Adv. Meth. Pract. Psychol. Sci. https://doi.org/10.1177/25152459221147260 (2023).

Eckstein, M., Mamaev, I., Ditzen, B. & Sailer, U. Calming effects of touch in human, animal, and robotic interaction—scientific state-of-the-art and technical advances. Front. Psychiatry 11 , 555058 (2020).

Packheiser, J. et al. The physical and mental health benefits of affective touch: a comparative systematic review and multivariate meta-analysis. Open Science Framework https://doi.org/10.17605/OSF.IO/C8RVW (2023).

Kong, L. J. et al. Massage therapy for neck and shoulder pain: a systematic review and meta-analysis. Evid. Based Complement. Altern. Med. 2013 , 613279 (2013).

Wang, L., He, J. L. & Zhang, X. H. The efficacy of massage on preterm infants: a meta-analysis. Am. J. Perinatol. 30 , 731–738 (2013).

Field, T. Massage therapy research review. Complement. Ther. Clin. Pract. 24 , 19–31 (2016).

Bendas, J., Ree, A., Pabel, L., Sailer, U. & Croy, I. Dynamics of affective habituation to touch differ on the group and individual level. Neuroscience 464 , 44–52 (2021).

Charpak, N., Montealegre‐Pomar, A. & Bohorquez, A. Systematic review and meta‐analysis suggest that the duration of Kangaroo mother care has a direct impact on neonatal growth. Acta Paediatr. 110 , 45–59 (2021).

Packheiser, J. et al. A comparison of hugging frequency and its association with momentary mood before and during COVID-19 using ecological momentary assessment. Health Commun. https://doi.org/10.1080/10410236.2023.2198058 (2023).

Whitelaw, A., Heisterkamp, G., Sleath, K., Acolet, D. & Richards, M. Skin to skin contact for very low birthweight infants and their mothers. Arch. Dis. Child. 63 , 1377–1381 (1988).

Yogeswaran, N. et al. New materials and advances in making electronic skin for interactive robots. Adv. Robot. 29 , 1359–1373 (2015).

Durkin, J., Jackson, D. & Usher, K. Touch in times of COVID‐19: touch hunger hurts. J. Clin. Nurs. https://doi.org/10.1111/jocn.15488 (2021).

Rokach, A., Lechcier-Kimel, R. & Safarov, A. Loneliness of people with physical disabilities. Soc. Behav. Personal. Int. J. 34 , 681–700 (2006).

Palgi, Y. et al. The loneliness pandemic: loneliness and other concomitants of depression, anxiety and their comorbidity during the COVID-19 outbreak. J. Affect. Disord. 275 , 109–111 (2020).

Heatley-Tejada, A., Dunbar, R. I. M. & Montero, M. Physical contact and loneliness: being touched reduces perceptions of loneliness. Adapt. Hum. Behav. Physiol. 6 , 292–306 (2020).

Packheiser, J. et al. The association of embracing with daily mood and general life satisfaction: an ecological momentary assessment study. J. Nonverbal Behav. 46 , 519–536 (2022).

Porter, R. The biological significance of skin-to-skin contact and maternal odours. Acta Paediatr. 93 , 1560–1562 (2007).

Hawkley, L. C., Masi, C. M., Berry, J. D. & Cacioppo, J. T. Loneliness is a unique predictor of age-related differences in systolic blood pressure. Psychol. Aging 21 , 152–164 (2006).

Russo, V., Ottaviani, C. & Spitoni, G. F. Affective touch: a meta-analysis on sex differences. Neurosci. Biobehav. Rev. 108 , 445–452 (2020).

Schirmer, A. et al. Understanding sex differences in affective touch: sensory pleasantness, social comfort, and precursive experiences. Physiol. Behav. 250 , 113797 (2022).

Berretz, G. et al. Romantic partner embraces reduce cortisol release after acute stress induction in women but not in men. PLoS ONE 17 , e0266887 (2022).

Gazzola, V. et al. Primary somatosensory cortex discriminates affective significance in social touch. Proc. Natl Acad. Sci. USA 109 , E1657–E1666 (2012).

Sorokowska, A. et al. Affective interpersonal touch in close relationships: a cross-cultural perspective. Personal. Soc. Psychol. Bull. 47 , 1705–1721 (2021).

Ravaja, N., Harjunen, V., Ahmed, I., Jacucci, G. & Spapé, M. M. Feeling touched: emotional modulation of somatosensory potentials to interpersonal touch. Sci. Rep. 7 , 40504 (2017).

Saarinen, A., Harjunen, V., Jasinskaja-Lahti, I., Jääskeläinen, I. P. & Ravaja, N. Social touch experience in different contexts: a review. Neurosci. Biobehav. Rev. 131 , 360–372 (2021).

Huisman, G. Social touch technology: a survey of haptic technology for social touch. IEEE Trans. Haptics 10 , 391–408 (2017).

Lewejohann, L., Schwabe, K., Häger, C. & Jirkof, P. Impulse for animal welfare outside the experiment. Lab. Anim. https://doi.org/10.17169/REFUBIUM-26765 (2020).

Sørensen, J. T., Sandøe, P. & Halberg, N. Animal welfare as one among several values to be considered at farm level: the idea of an ethical account for livestock farming. Acta Agric. Scand. A 51 , 11–16 (2001).

Verga, M. & Michelazzi, M. Companion animal welfare and possible implications on the human–pet relationship. Ital. J. Anim. Sci. 8 , 231–240 (2009).

Coulon, M. et al. Do lambs perceive regular human stroking as pleasant? Behavior and heart rate variability analyses. PLoS ONE 10 , e0118617 (2015).

Soares, M. C., Oliveira, R. F., Ros, A. F. H., Grutter, A. S. & Bshary, R. Tactile stimulation lowers stress in fish. Nat. Commun. 2 , 534 (2011).

Gourkow, N., Hamon, S. C. & Phillips, C. J. C. Effect of gentle stroking and vocalization on behaviour, mucosal immunity and upper respiratory disease in anxious shelter cats. Prev. Vet. Med. 117 , 266–275 (2014).

Oliveira, V. E. et al. Oxytocin and vasopressin within the ventral and dorsal lateral septum modulate aggression in female rats. Nat. Commun. 12 , 2900 (2021).

Burleson, M. H., Roberts, N. A., Coon, D. W. & Soto, J. A. Perceived cultural acceptability and comfort with affectionate touch: differences between Mexican Americans and European Americans. J. Soc. Personal. Relatsh. 36 , 1000–1022 (2019).

Wijaya, M. et al. The human ‘feel’ of touch contributes to its perceived pleasantness. J. Exp. Psychol. Hum. Percept. Perform. 46 , 155–171 (2020).

Golaya, S. Touch-hunger: an unexplored consequence of the COVID-19 pandemic. Indian J. Psychol. Med. 43 , 362–363 (2021).

Ng, T. W. H., Sorensen, K. L., Zhang, Y. & Yim, F. H. K. Anger, anxiety, depression, and negative affect: convergent or divergent? J. Vocat. Behav. 110 , 186–202 (2019).

Maier, M., Bartoš, F. & Wagenmakers, E.-J. Robust Bayesian meta-analysis: addressing publication bias with model-averaging. Psychol. Methods 28 , 107–122 (2022).

Ahles, T. A. et al. Massage therapy for patients undergoing autologous bone marrow transplantation. J. Pain. Symptom Manag. 18 , 157–163 (1999).

Albert, N. M. et al. A randomized trial of massage therapy after heart surgery. Heart Lung 38 , 480–490 (2009).

Ang, J. Y. et al. A randomized placebo-controlled trial of massage therapy on the immune system of preterm infants. Pediatrics 130 , e1549–e1558 (2012).

Arditi, H., Feldman, R. & Eidelman, A. I. Effects of human contact and vagal regulation on pain reactivity and visual attention in newborns. Dev. Psychobiol. 48 , 561–573 (2006).

Arora, J., Kumar, A. & Ramji, S. Effect of oil massage on growth and neurobehavior in very low birth weight preterm neonates. Indian Pediatr. 42 , 1092–1100 (2005).

Asadollahi, M., Jabraeili, M., Mahallei, M., Asgari Jafarabadi, M. & Ebrahimi, S. Effects of gentle human touch and field massage on urine cortisol level in premature infants: a randomized, controlled clinical trial. J. Caring Sci. 5 , 187–194 (2016).

Basiri-Moghadam, M., Basiri-Moghadam, K., Kianmehr, M. & Jani, S. The effect of massage on neonatal jaundice in stable preterm newborn infants: a randomized controlled trial. J. Pak. Med. Assoc. 65 , 602–606 (2015).

Bauer, B. A. et al. Effect of massage therapy on pain, anxiety, and tension after cardiac surgery: a randomized study. Complement. Ther. Clin. Pract. 16 , 70–75 (2010).

Beijers, R., Cillessen, L. & Zijlmans, M. A. C. An experimental study on mother-infant skin-to-skin contact in full-terms. Infant Behav. Dev. 43 , 58–65 (2016).

Bennett, S. et al. Acute effects of traditional Thai massage on cortisol levels, arterial blood pressure and stress perception in academic stress condition: a single blind randomised controlled trial. J. Bodyw. Mov. Therapies 20 , 286–292 (2016).

Bergman, N., Linley, L. & Fawcus, S. Randomized controlled trial of skin-to-skin contact from birth versus conventional incubator for physiological stabilization in 1200- to 2199-gram newborns. Acta Paediatr. 93 , 779–785 (2004).

Bigelow, A., Power, M., MacLellan‐Peters, J., Alex, M. & McDonald, C. Effect of mother/infant skin‐to‐skin contact on postpartum depressive symptoms and maternal physiological stress. J. Obstet. Gynecol. Neonatal Nurs. 41 , 369–382 (2012).

Billhult, A., Bergbom, I. & Stener-Victorin, E. Massage relieves nausea in women with breast cancer who are undergoing chemotherapy. J. Altern. Complement. Med. 13 , 53–57 (2007).

Billhult, A., Lindholm, C., Gunnarsson, R. & Stener-Victorin, E. The effect of massage on cellular immunity, endocrine and psychological factors in women with breast cancer—a randomized controlled clinical trial. Auton. Neurosci. 140 , 88–95 (2008).

Braun, L. A. et al. Massage therapy for cardiac surgery patients—a randomized trial. J. Thorac. Cardiovasc. Surg. 144 , 1453–1459 (2012).

Cabibihan, J.-J. & Chauhan, S. S. Physiological responses to affective tele-touch during induced emotional stimuli. IEEE Trans. Affect. Comput. 8 , 108–118 (2017).

Campeau, M.-P. et al. Impact of massage therapy on anxiety levels in patients undergoing radiation therapy: randomized controlled trial. J. Soc. Integr. Oncol. 5 , 133–138 (2007).

Can, Ş. & Kaya, H. The effects of yakson or gentle human touch training given to mothers with preterm babies on attachment levels and the responses of the baby: a randomized controlled trial. Health Care Women Int. 43 , 479–498 (2021).

Carfoot, S., Williamson, P. & Dickson, R. A randomised controlled trial in the north of England examining the effects of skin-to-skin care on breast feeding. Midwifery 21 , 71–79 (2005).

Castral, T. C., Warnock, F., Leite, A. M., Haas, V. J. & Scochi, C. G. S. The effects of skin-to-skin contact during acute pain in preterm newborns. Eur. J. Pain. 12 , 464–471 (2008).

Cattaneo, A. et al. Kangaroo mother care for low birthweight infants: a randomized controlled trial in different settings. Acta Paediatr. 87 , 976–985 (1998).

Charpak, N., Ruiz-Peláez, J. G. & Charpak, Y. Rey-Martinez kangaroo mother program: an alternative way of caring for low birth weight infants? One year mortality in a two cohort study. Pediatrics 94 , 804–810 (1994).

Chermont, A. G., Falcão, L. F. M., de Souza Silva, E. H. L., de Cássia Xavier Balda, R. & Guinsburg, R. Skin-to-skin contact and/or oral 25% dextrose for procedural pain relief for term newborn infants. Pediatrics 124 , e1101–e1107 (2009).

Chi Luong, K., Long Nguyen, T., Huynh Thi, D. H., Carrara, H. P. O. & Bergman, N. J. Newly born low birthweight infants stabilise better in skin-to-skin contact than when separated from their mothers: a randomised controlled trial. Acta Paediatr. 105 , 381–390 (2016).

Cho, E.-S. et al. The effects of kangaroo care in the neonatal intensive care unit on the physiological functions of preterm infants, maternal–infant attachment, and maternal stress. J. Pediatr. Nurs. 31 , 430–438 (2016).

Choi, H. et al. The effects of massage therapy on physical growth and gastrointestinal function in premature infants: a pilot study. J. Child Health Care 20 , 394–404 (2016).

Choudhary, M. et al. To study the effect of Kangaroo mother care on pain response in preterm neonates and to determine the behavioral and physiological responses to painful stimuli in preterm neonates: a study from western Rajasthan. J. Matern. Fetal Neonatal Med. 29 , 826–831 (2016).

Christensson, K. et al. Temperature, metabolic adaptation and crying in healthy full-term newborns cared for skin-to-skin or in a cot. Acta Paediatr. 81 , 488–493 (1992).

Cloutier, S. & Newberry, R. C. Use of a conditioning technique to reduce stress associated with repeated intra-peritoneal injections in laboratory rats. Appl. Anim. Behav. Sci. 112 , 158–173 (2008).

Cloutier, S., Wahl, K., Baker, C. & Newberry, R. C. The social buffering effect of playful handling on responses to repeated intraperitoneal injections in laboratory rats. J. Am. Assoc. Lab. Anim. Sci. 53 , 168–173 (2014).

Cloutier, S., Wahl, K. L., Panksepp, J. & Newberry, R. C. Playful handling of laboratory rats is more beneficial when applied before than after routine injections. Appl. Anim. Behav. Sci. 164 , 81–90 (2015).

Cong, X. et al. Effects of skin-to-skin contact on autonomic pain responses in preterm infants. J. Pain. 13 , 636–645 (2012).

Cong, X., Ludington-Hoe, S. M., McCain, G. & Fu, P. Kangaroo care modifies preterm infant heart rate variability in response to heel stick pain: pilot study. Early Hum. Dev. 85 , 561–567 (2009).

Cong, X., Ludington-Hoe, S. M. & Walsh, S. Randomized crossover trial of kangaroo care to reduce biobehavioral pain responses in preterm infants: a pilot study. Biol. Res. Nurs. 13 , 204–216 (2011).

Costa, R. et al. Tactile stimulation of adult rats modulates hormonal responses, depression-like behaviors, and memory impairment induced by chronic mild stress: role of angiotensin II. Behav. Brain Res. 379 , 112250 (2020).

Cutshall, S. M. et al. Effect of massage therapy on pain, anxiety, and tension in cardiac surgical patients: a pilot study. Complement. Ther. Clin. Pract. 16 , 92–95 (2010).

Dalili, H., Sheikhi, S., Shariat, M. & Haghnazarian, E. Effects of baby massage on neonatal jaundice in healthy Iranian infants: a pilot study. Infant Behav. Dev. 42 , 22–26 (2016).

Diego, M. A., Field, T. & Hernandez-Reif, M. Vagal activity, gastric motility, and weight gain in massaged preterm neonates. J. Pediatr. 147 , 50–55 (2005).

Diego, M. A., Field, T. & Hernandez-Reif, M. Temperature increases in preterm infants during massage therapy. Infant Behav. Dev. 31 , 149–152 (2008).

Diego, M. A. et al. Preterm infant massage elicits consistent increases in vagal activity and gastric motility that are associated with greater weight gain. Acta Paediatr. 96 , 1588–1591 (2007).

Diego, M. A. et al. Spinal cord patients benefit from massage therapy. Int. J. Neurosci. 112 , 133–142 (2002).

Diego, M. A. et al. Aggressive adolescents benefit from massage therapy. Adolescence 37 , 597–607 (2002).

Diego, M. A. et al. HIV adolescents show improved immune function following massage therapy. Int. J. Neurosci. 106 , 35–45 (2001).

Dieter, J. N. I., Field, T., Hernandez-Reif, M., Emory, E. K. & Redzepi, M. Stable preterm infants gain more weight and sleep less after five days of massage therapy. J. Pediatr. Psychol. 28 , 403–411 (2003).

Ditzen, B. et al. Effects of different kinds of couple interaction on cortisol and heart rate responses to stress in women. Psychoneuroendocrinology 32 , 565–574 (2007).

Dreisoerner, A. et al. Self-soothing touch and being hugged reduce cortisol responses to stress: a randomized controlled trial on stress, physical touch, and social identity. Compr. Psychoneuroendocrinol. 8 , 100091 (2021).

Eaton, M., Mitchell-Bonair, I. L. & Friedmann, E. The effect of touch on nutritional intake of chronic organic brain syndrome patients. J. Gerontol. 41 , 611–616 (1986).

Edens, J. L., Larkin, K. T. & Abel, J. L. The effect of social support and physical touch on cardiovascular reactions to mental stress. J. Psychosom. Res. 36 , 371–382 (1992).

El-Farrash, R. A. et al. Longer duration of kangaroo care improves neurobehavioral performance and feeding in preterm infants: a randomized controlled trial. Pediatr. Res. 87 , 683–688 (2020).

Erlandsson, K., Dsilna, A., Fagerberg, I. & Christensson, K. Skin-to-skin care with the father after cesarean birth and its effect on newborn crying and prefeeding behavior. Birth 34 , 105–114 (2007).

Escalona, A., Field, T., Singer-Strunck, R., Cullen, C. & Hartshorn, K. Brief report: improvements in the behavior of children with autism following massage therapy. J. Autism Dev. Disord. 31 , 513–516 (2001).

Fattah, M. A. & Hamdy, B. Pulmonary functions of children with asthma improve following massage therapy. J. Altern. Complement. Med. 17 , 1065–1068 (2011).

Feldman, R. & Eidelman, A. I. Skin-to-skin contact (kangaroo care) accelerates autonomic and neurobehavioural maturation in preterm infants. Dev. Med. Child Neurol. 45 , 274–281 (2003).

Feldman, R., Eidelman, A. I., Sirota, L. & Weller, A. Comparison of skin-to-skin (kangaroo) and traditional care: parenting outcomes and preterm infant development. Pediatrics 110 , 16–26 (2002).

Feldman, R., Singer, M. & Zagoory, O. Touch attenuates infants’ physiological reactivity to stress. Dev. Sci. 13 , 271–278 (2010).

Feldman, R., Weller, A., Sirota, L. & Eidelman, A. I. Testing a family intervention hypothesis: the contribution of mother–infant skin-to-skin contact (kangaroo care) to family interaction, proximity, and touch. J. Fam. Psychol. 17 , 94–107 (2003).

Ferber, S. G. et al. Massage therapy by mothers and trained professionals enhances weight gain in preterm infants. Early Hum. Dev. 67 , 37–45 (2002).

Ferber, S. G. & Makhoul, I. R. The effect of skin-to-skin contact (kangaroo care) shortly after birth on the neurobehavioral responses of the term newborn: a randomized, controlled trial. Pediatrics 113 , 858–865 (2004).

Ferreira, A. M. & Bergamasco, N. H. P. Behavioral analysis of preterm neonates included in a tactile and kinesthetic stimulation program during hospitalization. Rev. Bras. Fisioter. 14 , 141–148 (2010).

Fidanza, F., Polimeni, E., Pierangeli, V. & Martini, M. A better touch: C-tactile fibers related activity is associated to pain reduction during temporal summation of second pain. J. Pain. 22 , 567–576 (2021).

Field, T. et al. Leukemia immune changes following massage therapy. J. Bodyw. Mov. Ther. 5 , 271–274 (2001).

Field, T. et al. Benefits of combining massage therapy with group interpersonal psychotherapy in prenatally depressed women. J. Bodyw. Mov. Ther. 13 , 297–303 (2009).

Field, T., Delage, J. & Hernandez-Reif, M. Movement and massage therapy reduce fibromyalgia pain. J. Bodyw. Mov. Ther. 7 , 49–52 (2003).

Field, T. et al. Fibromyalgia pain and substance P decrease and sleep improves after massage therapy. J. Clin. Rheumatol. 8 , 72–76 (2002).

Field, T., Diego, M., Gonzalez, G. & Funk, C. G. Neck arthritis pain is reduced and range of motion is increased by massage therapy. Complement. Ther. Clin. Pract. 20 , 219–223 (2014).

Field, T., Diego, M., Hernandez-Reif, M., Deeds, O. & Figueiredo, B. Pregnancy massage reduces prematurity, low birthweight and postpartum depression. Infant Behav. Dev. 32 , 454–460 (2009).

Field, T. et al. Insulin and insulin-like growth factor-1 increased in preterm neonates following massage therapy. J. Dev. Behav. Pediatr. 29 , 463–466 (2008).

Field, T. et al. Yoga and massage therapy reduce prenatal depression and prematurity. J. Bodyw. Mov. Ther. 16 , 204–209 (2012).

Field, T., Diego, M., Hernandez-Reif, M., Schanberg, S. & Kuhn, C. Massage therapy effects on depressed pregnant women. J. Psychosom. Obstet. Gynecol. 25 , 115–122 (2004).

Field, T., Diego, M., Hernandez-Reif, M. & Shea, J. Hand arthritis pain is reduced by massage therapy. J. Bodyw. Mov. Ther. 11 , 21–24 (2007).

Field, T., Gonzalez, G., Diego, M. & Mindell, J. Mothers massaging their newborns with lotion versus no lotion enhances mothers’ and newborns’ sleep. Infant Behav. Dev. 45 , 31–37 (2016).

Field, T. et al. Children with asthma have improved pulmonary functions after massage therapy. J. Pediatr. 132 , 854–858 (1998).

Field, T., Hernandez-Reif, M., Diego, M. & Fraser, M. Lower back pain and sleep disturbance are reduced following massage therapy. J. Bodyw. Mov. Ther. 11 , 141–145 (2007).

Field, T. et al. Effects of sexual abuse are lessened by massage therapy. J. Bodyw. Mov. Ther. 1 , 65–69 (1997).

Field, T. et al. Pregnant women benefit from massage therapy. J. Psychosom. Obstet. Gynecol. 20 , 31–38 (1999).

Field, T. et al. Juvenile rheumatoid arthritis: benefits from massage therapy. J. Pediatr. Psychol. 22, 607–617 (1997).

Field, T., Hernandez-Reif, M., Taylor, S., Quintino, O. & Burman, I. Labor pain is reduced by massage therapy. J. Psychosom. Obstet. Gynecol. 18 , 286–291 (1997).

Field, T. et al. Massage therapy reduces anxiety and enhances EEG pattern of alertness and math computations. Int. J. Neurosci. 86 , 197–205 (1996).

Field, T. et al. Brief report: autistic children’s attentiveness and responsivity improve after touch therapy. J. Autism Dev. Disord. 27 , 333–338 (1997).

Field, T. M. et al. Tactile/kinesthetic stimulation effects on preterm neonates. Pediatrics 77 , 654–658 (1986).

Field, T. et al. Massage reduces anxiety in child and adolescent psychiatric patients. J. Am. Acad. Child Adolesc. Psychiatry 31 , 125–131 (1992).

Field, T. et al. Burn injuries benefit from massage therapy. J. Burn Care Res. 19 , 241–244 (1998).

Filho, F. L. et al. Effect of maternal skin-to-skin contact on decolonization of methicillin-oxacillin-resistant Staphylococcus in neonatal intensive care units: a randomized controlled trial. BMC Pregnancy Childbirth https://doi.org/10.1186/s12884-015-0496-1 (2015).

Forward, J. B., Greuter, N. E., Crisall, S. J. & Lester, H. F. Effect of structured touch and guided imagery for pain and anxiety in elective joint replacement patients—a randomized controlled trial: M-TIJRP. Perm. J. 19 , 18–28 (2015).

Fraser, J. & Ross Kerr, J. Psychophysiological effects of back massage on elderly institutionalized patients. J. Adv. Nurs. 18 , 238–245 (1993).

Frey Law, L. A. et al. Massage reduces pain perception and hyperalgesia in experimental muscle pain: a randomized, controlled trial. J. Pain. 9 , 714–721 (2008).

Gao, H. et al. Effect of repeated kangaroo mother care on repeated procedural pain in preterm infants: a randomized controlled trial. Int. J. Nurs. Stud. 52 , 1157–1165 (2015).

Garner, B. et al. Pilot study evaluating the effect of massage therapy on stress, anxiety and aggression in a young adult psychiatric inpatient unit. Aust. N. Z. J. Psychiatry 42 , 414–422 (2008).

Gathwala, G., Singh, B. & Singh, J. Effect of kangaroo mother care on physical growth, breastfeeding and its acceptability. Trop. Dr. 40 , 199–202 (2010).

Geva, N., Uzefovsky, F. & Levy-Tzedek, S. Touching the social robot PARO reduces pain perception and salivary oxytocin levels. Sci. Rep. 10 , 9814 (2020).

Gitau, R. et al. Acute effects of maternal skin-to-skin contact and massage on saliva cortisol in preterm babies. J. Reprod. Infant Psychol. 20 , 83–88 (2002).

Givi, M. Durability of effect of massage therapy on blood pressure. Int. J. Prev. Med. 4 , 511–516 (2013).

Glover, V., Onozawa, K. & Hodgkinson, A. Benefits of infant massage for mothers with postnatal depression. Semin. Neonatol. 7 , 495–500 (2002).

Gonzalez, A. et al. Weight gain in preterm infants following parent-administered vimala massage: a randomized controlled trial. Am. J. Perinatol. 26 , 247–252 (2009).

Gray, L., Watt, L. & Blass, E. M. Skin-to-skin contact is analgesic in healthy newborns. Pediatrics 105 , e14 (2000).

Grewen, K. M., Anderson, B. J., Girdler, S. S. & Light, K. C. Warm partner contact is related to lower cardiovascular reactivity. Behav. Med. 29 , 123–130 (2003).

Groër, M. W., Hill, J., Wilkinson, J. E. & Stuart, A. Effects of separation and separation with supplemental stroking in BALB/c infant mice. Biol. Res. Nurs. 3 , 119–131 (2002).

Gürol, A. P., Polat, S. & Nuran Akçay, M. Itching, pain, and anxiety levels are reduced with massage therapy in burned adolescents. J. Burn Care Res. 31 , 429–432 (2010).

Haley, S. et al. Tactile/kinesthetic stimulation (TKS) increases tibial speed of sound and urinary osteocalcin (U-MidOC and unOC) in premature infants (29–32 weeks PMA). Bone 51 , 661–666 (2012).

Harris, M., Richards, K. C. & Grando, V. T. The effects of slow-stroke back massage on minutes of nighttime sleep in persons with dementia and sleep disturbances in the nursing home: a pilot study. J. Holist. Nurs. 30 , 255–263 (2012).

Hart, S. et al. Anorexia nervosa symptoms are reduced by massage therapy. Eat. Disord. 9 , 289–299 (2001).

Hattan, J., King, L. & Griffiths, P. The impact of foot massage and guided relaxation following cardiac surgery: a randomized controlled trial. Issues Innov. Nurs. Pract. 37 , 199–207 (2002).

Haynes, A. C. et al. A calming hug: design and validation of a tactile aid to ease anxiety. PLoS ONE 17 , e0259838 (2022).

Henricson, M., Ersson, A., Määttä, S., Segesten, K. & Berglund, A.-L. The outcome of tactile touch on stress parameters in intensive care: a randomized controlled trial. Complement. Ther. Clin. Pract. 14 , 244–254 (2008).

Hernandez-Reif, M., Diego, M. & Field, T. Preterm infants show reduced stress behaviors and activity after 5 days of massage therapy. Infant Behav. Dev. 30 , 557–561 (2007).

Hernandez-Reif, M., Dieter, J. N. I., Field, T., Swerdlow, B. & Diego, M. Migraine headaches are reduced by massage therapy. Int. J. Neurosci. 96 , 1–11 (1998).

Hernandez-Reif, M. et al. Natural killer cells and lymphocytes increase in women with breast cancer following massage therapy. Int. J. Neurosci. 115 , 495–510 (2005).

Hernandez-Reif, M. et al. Children with cystic fibrosis benefit from massage therapy. J. Pediatr. Psychol. 24 , 175–181 (1999).

Hernandez-Reif, M., Field, T., Krasnegor, J. & Theakston, H. Lower back pain is reduced and range of motion increased after massage therapy. Int. J. Neurosci. 106 , 131–145 (2001).

Hernandez-Reif, M. et al. High blood pressure and associated symptoms were reduced by massage therapy. J. Bodyw. Mov. Ther. 4 , 31–38 (2000).

Hernandez-Reif, M. et al. Parkinson’s disease symptoms are differentially affected by massage therapy vs. progressive muscle relaxation: a pilot study. J. Bodyw. Mov. Ther. 6 , 177–182 (2002).

Hernandez-Reif, M., Field, T. & Theakston, H. Multiple sclerosis patients benefit from massage therapy. J. Bodyw. Mov. Ther. 2 , 168–174 (1998).

Hernandez-Reif, M. et al. Breast cancer patients have improved immune and neuroendocrine functions following massage therapy. J. Psychosom. Res. 57 , 45–52 (2004).

Hertenstein, M. J. & Campos, J. J. Emotion regulation via maternal touch. Infancy 2 , 549–566 (2001).

Hinchcliffe, J. K., Mendl, M. & Robinson, E. S. J. Rat 50 kHz calls reflect graded tickling-induced positive emotion. Curr. Biol. 30 , R1034–R1035 (2020).

Hodgson, N. A. & Andersen, S. The clinical efficacy of reflexology in nursing home residents with dementia. J. Altern. Complement. Med. 14 , 269–275 (2008).

Hoffmann, L. & Krämer, N. C. The persuasive power of robot touch. Behavioral and evaluative consequences of non-functional touch from a robot. PLoS ONE 16 , e0249554 (2021).

Holst, S., Lund, I., Petersson, M. & Uvnäs-Moberg, K. Massage-like stroking influences plasma levels of gastrointestinal hormones, including insulin, and increases weight gain in male rats. Auton. Neurosci. 120 , 73–79 (2005).

Hori, M. et al. Tickling during adolescence alters fear-related and cognitive behaviors in rats after prolonged isolation. Physiol. Behav. 131 , 62–67 (2014).

Hori, M. et al. Effects of repeated tickling on conditioned fear and hormonal responses in socially isolated rats. Neurosci. Lett. 536 , 85–89 (2013).

Hucklenbruch-Rother, E. et al. Delivery room skin-to-skin contact in preterm infants affects long-term expression of stress response genes. Psychoneuroendocrinology 122 , 104883 (2020).

Im, H. & Kim, E. Effect of yakson and gentle human touch versus usual care on urine stress hormones and behaviors in preterm infants: a quasi-experimental study. Int. J. Nurs. Stud. 46 , 450–458 (2009).

Jain, S., Kumar, P. & McMillan, D. D. Prior leg massage decreases pain responses to heel stick in preterm babies. J. Paediatr. Child Health 42 , 505–508 (2006).

Jane, S.-W. et al. Effects of massage on pain, mood status, relaxation, and sleep in Taiwanese patients with metastatic bone pain: a randomized clinical trial. Pain 152 , 2432–2442 (2011).

Johnston, C. C. et al. Kangaroo mother care diminishes pain from heel lance in very preterm neonates: a crossover trial. BMC Pediatr. 8 , 13 (2008).

Johnston, C. C. et al. Kangaroo care is effective in diminishing pain response in preterm neonates. Arch. Pediatr. Adolesc. Med. 157 , 1084–1088 (2003).

Jung, M. J., Shin, B.-C., Kim, Y.-S., Shin, Y.-I. & Lee, M. S. Is there any difference in the effects of QI therapy (external QIGONG) with and without touching? a pilot study. Int. J. Neurosci. 116 , 1055–1064 (2006).

Kapoor, Y. & Orr, R. Effect of therapeutic massage on pain in patients with dementia. Dementia 16 , 119–125 (2017).

Karagozoglu, S. & Kahve, E. Effects of back massage on chemotherapy-related fatigue and anxiety: supportive care and therapeutic touch in cancer nursing. Appl. Nurs. Res. 26 , 210–217 (2013).

Karbasi, S. A., Golestan, M., Fallah, R., Golshan, M. & Dehghan, Z. Effect of body massage on increase of low birth weight neonates growth parameters: a randomized clinical trial. Iran. J. Reprod. Med. 11 , 583–588 (2013).

Kashaninia, Z., Sajedi, F., Rahgozar, M. & Noghabi, F. A. The effect of kangaroo care on behavioral responses to pain of an intramuscular injection in neonates. J. Pediatr. Nurs. 3 , 275–280 (2008).

Kelling, C., Pitaro, D. & Rantala, J. Good vibes: The impact of haptic patterns on stress levels. In Proc. 20th International Academic Mindtrek Conference 130–136 (Association for Computing Machinery, 2016).

Khilnani, S., Field, T., Hernandez-Reif, M. & Schanberg, S. Massage therapy improves mood and behavior of students with attention-deficit/hyperactivity disorder. Adolescence 38 , 623–638 (2003).

Kianmehr, M. et al. The effect of massage on serum bilirubin levels in term neonates with hyperbilirubinemia undergoing phototherapy. Nautilus 128 , 36–41 (2014).

Kim, I.-H., Kim, T.-Y. & Ko, Y.-W. The effect of a scalp massage on stress hormone, blood pressure, and heart rate of healthy female. J. Phys. Ther. Sci. 28 , 2703–2707 (2016).

Kim, M. A., Kim, S.-J. & Cho, H. Effects of tactile stimulation by fathers on physiological responses and paternal attachment in infants in the NICU: a pilot study. J. Child Health Care 21 , 36–45 (2017).

Kim, M. S., Sook Cho, K., Woo, H.-M. & Kim, J. H. Effects of hand massage on anxiety in cataract surgery using local anesthesia. J. Cataract Refr. Surg. 27 , 884–890 (2001).

Koole, S. L., Tjew A Sin, M. & Schneider, I. K. Embodied terror management: interpersonal touch alleviates existential concerns among individuals with low self-esteem. Psychol. Sci. 25 , 30–37 (2014).

Krohn, M. et al. Depression, mood, stress, and Th1/Th2 immune balance in primary breast cancer patients undergoing classical massage therapy. Support. Care Cancer 19 , 1303–1311 (2011).

Kuhn, C. et al. Tactile-kinesthetic stimulation effects sympathetic and adrenocortical function in preterm infants. J. Pediatr. 119 , 434–440 (1991).

Kumar, J. et al. Effect of oil massage on growth in preterm neonates less than 1800 g: a randomized control trial. Indian J. Pediatr. 80 , 465–469 (2013).

Lee, H.-K. The effects of infant massage on weight, height, and mother–infant interaction. J. Korean Acad. Nurs. 36 , 1331–1339 (2006).

Leivadi, S. et al. Massage therapy and relaxation effects on university dance students. J. Dance Med. Sci. 3 , 108–112 (1999).

Lindgren, L. et al. Touch massage: a pilot study of a complex intervention. Nurs. Crit. Care 18 , 269–277 (2013).

Lindgren, L. et al. Physiological responses to touch massage in healthy volunteers. Auton. Neurosci. Basic Clin. 158 , 105–110 (2010).

Listing, M. et al. Massage therapy reduces physical discomfort and improves mood disturbances in women with breast cancer. Psycho-Oncol. 18 , 1290–1299 (2009).

Ludington-Hoe, S. M., Cranston Anderson, G., Swinth, J. Y., Thompson, C. & Hadeed, A. J. Randomized controlled trial of kangaroo care: cardiorespiratory and thermal effects on healthy preterm infants. Neonatal Netw. 23 , 39–48 (2004).

Lund, I. et al. Corticotropin releasing factor in urine—a possible biochemical marker of fibromyalgia. Neurosci. Lett. 403 , 166–171 (2006).

Ma, Y.-K. et al. Lack of social touch alters anxiety-like and social behaviors in male mice. Stress 25 , 134–144 (2022).

Massaro, A. N., Hammad, T. A., Jazzo, B. & Aly, H. Massage with kinesthetic stimulation improves weight gain in preterm infants. J. Perinatol. 29 , 352–357 (2009).

Mathai, S., Fernandez, A., Mondkar, J. & Kanbur, W. Effects of tactile-kinesthetic stimulation in preterms–a controlled trial. Indian Pediatr. 38 , 1091–1098 (2001).

Matsunaga, M. et al. Profiling of serum proteins influenced by warm partner contact in healthy couples. Neuroendocrinol. Lett. 30 , 227–236 (2009).

Mendes, E. W. & Procianoy, R. S. Massage therapy reduces hospital stay and occurrence of late-onset sepsis in very preterm neonates. J. Perinatol. 28 , 815–820 (2008).

Mirnia, K., Arshadi Bostanabad, M., Asadollahi, M. & Hamid Razzaghi, M. Paternal skin-to-skin care and its effect on cortisol levels of the infants. Iran. J. Pediatrics 27 , e8151 (2017).

Mitchell, A. J., Yates, C., Williams, K. & Hall, R. W. Effects of daily kangaroo care on cardiorespiratory parameters in preterm infants. J. Neonatal-Perinat. Med. 6 , 243–249 (2013).

Mitchinson, A. R. et al. Acute postoperative pain management using massage as an adjuvant therapy: a randomized trial. Arch. Surg. 142 , 1158–1167 (2007).

Modrcin-Talbott, M. A., Harrison, L. L., Groer, M. W. & Younger, M. S. The biobehavioral effects of gentle human touch on preterm infants. Nurs. Sci. Q. 16 , 60–67 (2003).

Mok, E. & Pang Woo, C. The effects of slow-stroke back massage on anxiety and shoulder pain in elderly stroke patients. Complement. Ther. Nurs. Midwifery 10 , 209–216 (2004).

Mokaberian, M., Noripour, S., Sheikh, M. & Mills, P. J. Examining the effectiveness of body massage on physical status of premature neonates and their mothers’ psychological status. Early Child Dev. Care 192 , 2311–2325 (2021).

Mori, H. et al. Effect of massage on blood flow and muscle fatigue following isometric lumbar exercise. Med. Sci. Monit. Int. Med. J. Exp. Clin. Res. 10 , CR173–CR178 (2004).

Moyer-Mileur, L. J., Haley, S., Slater, H., Beachy, J. & Smith, S. L. Massage improves growth quality by decreasing body fat deposition in male preterm infants. J. Pediatr. 162 , 490–495 (2013).

Moyle, W. et al. Foot massage and physiological stress in people with dementia: a randomized controlled trial. J. Altern. Complement. Med. 20 , 305–311 (2014).

Muntsant, A., Shrivastava, K., Recasens, M. & Giménez-Llort, L. Severe perinatal hypoxic-ischemic brain injury induces long-term sensorimotor deficits, anxiety-like behaviors and cognitive impairment in a sex-, age- and task-selective manner in C57BL/6 mice but can be modulated by neonatal handling. Front. Behav. Neurosci. 13 , 7 (2019).

Negahban, H., Rezaie, S. & Goharpey, S. Massage therapy and exercise therapy in patients with multiple sclerosis: a randomized controlled pilot study. Clin. Rehabil. 27 , 1126–1136 (2013).

Nelson, D., Heitman, R. & Jennings, C. Effects of tactile stimulation on premature infant weight gain. J. Obstet. Gynecol. Neonatal Nurs. 15 , 262–267 (1986).

Griffin, J. W. Calculating statistical power for meta-analysis using metapower. Quant. Meth. Psychol. 17 , 24–39 (2021).

Nunes, G. S. et al. Massage therapy decreases pain and perceived fatigue after long-distance Ironman triathlon: a randomised trial. J. Physiother. 62 , 83–87 (2016).

Ohgi, S. et al. Comparison of kangaroo care and standard care: behavioral organization, development, and temperament in healthy, low-birth-weight infants through 1 year. J. Perinatol. 22 , 374–379 (2002).

O’Higgins, M., St. James Roberts, I. & Glover, V. Postnatal depression and mother and infant outcomes after infant massage. J. Affect. Disord. 109 , 189–192 (2008).

Okan, F., Ozdil, A., Bulbul, A., Yapici, Z. & Nuhoglu, A. Analgesic effects of skin-to-skin contact and breastfeeding in procedural pain in healthy term neonates. Ann. Trop. Paediatr. 30 , 119–128 (2010).

Oliveira, D. S., Hachul, H., Goto, V., Tufik, S. & Bittencourt, L. R. A. Effect of therapeutic massage on insomnia and climacteric symptoms in postmenopausal women. Climacteric 15 , 21–29 (2012).

Olsson, E., Ahlsén, G. & Eriksson, M. Skin-to-skin contact reduces near-infrared spectroscopy pain responses in premature infants during blood sampling. Acta Paediatr. 105 , 376–380 (2016).

Pauk, J., Kuhn, C. M., Field, T. M. & Schanberg, S. M. Positive effects of tactile versus kinesthetic or vestibular stimulation on neuroendocrine and ODC activity in maternally-deprived rat pups. Life Sci. 39 , 2081–2087 (1986).

Pinazo, D., Arahuete, L. & Correas, N. Hugging as a buffer against distal fear of death. Calid. Vida Salud 13 , 11–20 (2020).

Pope, M. H. et al. A prospective randomized three-week trial of spinal manipulation, transcutaneous muscle stimulation, massage and corset in the treatment of subacute low back pain. Spine 19 , 2571–2577 (1994).

Preyde, M. Effectiveness of massage therapy for subacute low-back pain: a randomized controlled trial. Can. Med. Assoc. J. 162 , 1815–1820 (2000).

Ramanathan, K., Paul, V. K., Deorari, A. K., Taneja, U. & George, G. Kangaroo mother care in very low birth weight infants. Indian J. Pediatr. 68 , 1019–1023 (2001).

Reddan, M. C., Young, H., Falkner, J., López-Solà, M. & Wager, T. D. Touch and social support influence interpersonal synchrony and pain. Soc. Cogn. Affect. Neurosci. 15 , 1064–1075 (2020).

Rodríguez-Mansilla, J. et al. The effects of ear acupressure, massage therapy and no therapy on symptoms of dementia: a randomized controlled trial. Clin. Rehabil. 29 , 683–693 (2015).

Rose, S. A., Schmidt, K., Riese, M. L. & Bridger, W. H. Effects of prematurity and early intervention on responsivity to tactual stimuli: a comparison of preterm and full-term infants. Child Dev. 51 , 416–425 (1980).

Scafidi, F. A. et al. Massage stimulates growth in preterm infants: a replication. Infant Behav. Dev. 13 , 167–188 (1990).

Scafidi, F. A. et al. Effects of tactile/kinesthetic stimulation on the clinical course and sleep/wake behavior of preterm neonates. Infant Behav. Dev. 9 , 91–105 (1986).

Scafidi, F. & Field, T. Massage therapy improves behavior in neonates born to HIV-positive mothers. J. Pediatr. Psychol. 21 , 889–897 (1996).

Scarr-Salapatek, S. & Williams, M. L. A stimulation program for low birth weight infants. Am. J. Public Health 62 , 662–667 (1972).

Serrano, B., Baños, R. M. & Botella, C. Virtual reality and stimulation of touch and smell for inducing relaxation: a randomized controlled trial. Comput. Hum. Behav. 55 , 1–8 (2016).

Seyyedrasooli, A., Valizadeh, L., Hosseini, M. B., Asgari Jafarabadi, M. & Mohammadzad, M. Effect of vimala massage on physiological jaundice in infants: a randomized controlled trial. J. Caring Sci. 3 , 165–173 (2014).

Sharpe, P. A., Williams, H. G., Granner, M. L. & Hussey, J. R. A randomised study of the effects of massage therapy compared to guided relaxation on well-being and stress perception among older adults. Complement. Therap. Med. 15 , 157–163 (2007).

Sherman, K. J., Cherkin, D. C., Hawkes, R. J., Miglioretti, D. L. & Deyo, R. A. Randomized trial of therapeutic massage for chronic neck pain. Clin. J. Pain. 25 , 233–238 (2009).

Shiloh, S., Sorek, G. & Terkel, J. Reduction of state-anxiety by petting animals in a controlled laboratory experiment. Anxiety, Stress Coping 16 , 387–395 (2003).

Shor-Posner, G. et al. Impact of a massage therapy clinical trial on immune status in young Dominican children infected with HIV-1. J. Altern. Complement. Med. 12 , 511–516 (2006).

Simpson, E. A. et al. Social touch alters newborn monkey behavior. Infant Behav. Dev. 57 , 101368 (2019).

Smith, S. L., Haley, S., Slater, H. & Moyer-Mileur, L. J. Heart rate variability during caregiving and sleep after massage therapy in preterm infants. Early Hum. Dev. 89 , 525–529 (2013).

Smith, S. L. et al. The effect of massage on heart rate variability in preterm infants. J. Perinatol. 33 , 59–64 (2013).

Solkoff, N. & Matuszak, D. Tactile stimulation and behavioral development among low-birthweight infants. Child Psychiatry Hum. Dev. 6 , 3337 (1975).

Srivastava, S., Gupta, A., Bhatnagar, A. & Dutta, S. Effect of very early skin to skin contact on success at breastfeeding and preventing early hypothermia in neonates. Indian J. Public Health 58 , 22–26 (2014).

Stringer, J., Swindell, R. & Dennis, M. Massage in patients undergoing intensive chemotherapy reduces serum cortisol and prolactin: massage in oncology patients reduces serum cortisol. Psycho-Oncol. 17 , 1024–1031 (2008).

Suman Rao, P. N., Udani, R. & Nanavati, R. Kangaroo mother care for low birth weight infants: a randomized controlled trial. Indian Pediatr. 45 , 17–23 (2008).

Sumioka, H. et al. A huggable device can reduce the stress of calling an unfamiliar person on the phone for individuals with ASD. PLoS ONE 16 , e0254675 (2021).

Sumioka, H., Nakae, A., Kanai, R. & Ishiguro, H. Huggable communication medium decreases cortisol levels. Sci. Rep. 3 , 3034 (2013).

Suzuki, M. et al. Physical and psychological effects of 6-week tactile massage on elderly patients with severe dementia. Am. J. Alzheimer’s Dis. Other Dement. 25 , 680–686 (2010).

Thomson, L. J. M., Ander, E. E., Menon, U., Lanceley, A. & Chatterjee, H. J. Quantitative evidence for wellbeing benefits from a heritage-in-health intervention with hospital patients. Int. J. Art. Ther. 17 , 63–79 (2012).

Triplett, J. L. & Arneson, S. W. The use of verbal and tactile comfort to alleviate distress in young hospitalized children. Res. Nurs. Health 2 , 17–23 (1979).

Walach, H., Güthlin, C. & König, M. Efficacy of massage therapy in chronic pain: a pragmatic randomized trial. J. Altern. Complement. Med. 9 , 837–846 (2003).

Walker, S. C. et al. C‐low threshold mechanoafferent targeted dynamic touch modulates stress resilience in rats exposed to chronic mild stress. Eur. J. Neurosci. 55 , 2925–2938 (2022).

Weinrich, S. P. & Weinrich, M. C. The effect of massage on pain in cancer patients. Appl. Nurs. Res. 3 , 140–145 (1990).

Wheeden, A. et al. Massage effects on cocaine-exposed preterm neonates. Dev. Behav. Pediatr. 14 , 318–322 (1993).

White, J. L. & Labarba, R. C. The effects of tactile and kinesthetic stimulation on neonatal development in the premature infant. Dev. Psychobiol. 9 , 569–577 (1976).

Wilkie, D. J. et al. Effects of massage on pain intensity, analgesics and quality of life in patients with cancer pain: a pilot study of a randomized clinical trial conducted within hospice care delivery. Hosp. J. 15 , 31–53 (2000).

Willemse, C. J. A. M., Toet, A. & van Erp, J. B. F. Affective and behavioral responses to robot-initiated social touch: toward understanding the opportunities and limitations of physical contact in human–robot interaction. Front. ICT 4 , 12 (2017).

Willemse, C. J. A. M. & van Erp, J. B. F. Social touch in human–robot interaction: robot-initiated touches can induce positive responses without extensive prior bonding. Int. J. Soc. Robot. 11 , 285–304 (2019).

Woods, D. L., Beck, C. & Sinha, K. The effect of therapeutic touch on behavioral symptoms and cortisol in persons with dementia. Res. Complement. Med. 16 , 181–189 (2009).

Yamaguchi, M., Sekine, T. & Shetty, V. A salivary cytokine panel discriminates moods states following a touch massage intervention. Int. J. Affect. Eng. 19 , 189–198 (2020).

Yamazaki, R. et al. Intimacy in phone conversations: anxiety reduction for Danish seniors with hugvie. Front. Psychol. 7 , 537 (2016).

Yang, M.-H. et al. Comparison of the efficacy of aroma-acupressure and aromatherapy for the treatment of dementia-associated agitation. BMC Complement. Altern. Med. 15 , 93 (2015).

Yates, C. C. et al. The effects of massage therapy to induce sleep in infants born preterm. Pediatr. Phys. Ther. 26 , 405–410 (2014).

Yu, H. et al. Social touch-like tactile stimulation activates a tachykinin 1-oxytocin pathway to promote social interactions. Neuron 110 , 1051–1067 (2022).

Lakens, D. Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOVAs. Front. Psychol. 4 , 863 (2013).

Page, M. J. et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. Syst. Rev. https://doi.org/10.1186/s13643-021-01626-4 (2021).

Wilson, D. B. Practical meta-analysis effect size calculator (Version 2023.11.27). https://campbellcollaboration.org/research-resources/effect-size-calculator.html (2023).

Viechtbauer, W. Conducting meta-analyses in R with the metafor package. J. Stat. Softw. https://doi.org/10.18637/jss.v036.i03 (2010).

Scammacca, N., Roberts, G. & Stuebing, K. K. Meta-analysis with complex research designs: dealing with dependence from multiple measures and multiple group comparisons. Rev. Educ. Res. 84 , 328–364 (2014).

Pustejovsky, J. E. & Tipton, E. Meta-analysis with robust variance estimation: expanding the range of working models. Prev. Sci. Off. J. Soc. Prev. Res. 23 , 425–438 (2022).

Cook, R. D. in International Encyclopedia of Statistical Science (ed. M. Lovric) 301–302 (Springer, 2011).

Higgins, J. P. T., Thompson, S. & Deeks, J. Measuring inconsistency in meta-analyses. BMJ https://doi.org/10.1136/bmj.327.7414.557 (2003).

Oberauer, K. The importance of random slopes in mixed models for Bayesian hypothesis testing. Psychol. Sci. 33 , 648–665 (2022).

Nakagawa, S. et al. The orchard plot: cultivating a forest plot for use in ecology, evolution, and beyond. Res. Synth. Methods 12 , 4–12 (2021).

Acknowledgements

We thank A. Frick and E. Chris for supporting the initial literature search and coding. We also thank A. Dreisoerner, T. Field, S. Koole, C. Kuhn, M. Henricson, L. Frey Law, J. Fraser, M. Cumella Reddan, and J. Stringer, who kindly responded to our data requests and provided additional information or data with respect to single studies. J.P. was supported by the German National Academy of Sciences Leopoldina (LPDS 2021-05). H.H. was supported by the Marietta-Blau scholarship of the Austrian Agency for Education and Internationalisation (OeAD) and the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation, project ID 422744262 – TRR 289). C.K. received funding from OCENW.XL21.XL21.069 and V.G. from the European Research Council (ERC) under European Union’s Horizon 2020 research and innovation programme, grant ‘HelpUS’ (758703) and from the Dutch Research Council (NWO) grant OCENW.XL21.XL21.069. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Open access funding provided by Ruhr-Universität Bochum.

Author information

Julian Packheiser

Present address: Social Neuroscience, Faculty of Medicine, Ruhr University Bochum, Bochum, Germany

These authors contributed equally: Julian Packheiser, Helena Hartmann.

Authors and Affiliations

Social Brain Lab, Netherlands Institute for Neuroscience, Royal Netherlands Academy of Art and Sciences, Amsterdam, the Netherlands

Julian Packheiser, Helena Hartmann, Kelly Fredriksen, Valeria Gazzola, Christian Keysers & Frédéric Michon

Center for Translational and Behavioral Neuroscience, University Hospital Essen, Essen, Germany

Helena Hartmann

Clinical Neurosciences, Department for Neurology, University Hospital Essen, Essen, Germany

Contributions

J.P. and H.H. contributed to conceptualization, methodology, formal analysis, investigation, data curation, writing the original draft, review and editing, visualization, supervision and project administration. K.F. contributed to investigation, data curation, and review and editing. C.K. and V.G. contributed to conceptualization, and review and editing. F.M. contributed to conceptualization, methodology, formal analysis, investigation, writing the original draft, and review and editing.

Corresponding author

Correspondence to Julian Packheiser .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Peer review

Peer review information.

Nature Human Behaviour thanks Ville Harjunen, Rebecca Boehme and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary information.

Supplementary Figs. 1–21 and Tables 1–4.

Reporting Summary

Peer Review File

Supplementary Table 1

List of studies included in and excluded from the meta-analyses/review.

Supplementary Table 2

PRISMA checklist, manuscript.

Supplementary Table 3

PRISMA checklist, abstract.

Source Data Fig. 2

Effect size/error (columns ‘Hedges_g’ and ‘variance’) information for each study/cohort/effect included in the analysis.

Source Data Fig. 3

Effect size/error (columns ‘Hedges_g’ and ‘variance’) together with moderator data (column ‘Outcome’) for each study/cohort/effect included in the analysis.

Source Data Fig. 4

Effect size/error (columns ‘Hedges_g’ and ‘variance’) together with moderator data (columns ‘dyad_type’ and ‘skin_to_skin’) for each study/cohort/effect included in the analysis.

Source Data Fig. 5

Effect size/error (columns ‘Hedges_g’ and ‘variance’) together with moderator data (column ‘touch_type’) for each study/cohort/effect included in the analysis.

Source Data Fig. 6

Effect size/error (columns ‘Hedges_g’ and ‘variance’) together with moderator data (column ‘clin_sample’) for each study/cohort/effect included in the analysis.

Source Data Fig. 7

Effect size/error (columns ‘Hedges_g’ and ‘variance’) together with moderator data (column ‘familiarity’) for each study/cohort/effect included in the analysis.

Source Data Fig. 7

Effect size/error (columns ‘Hedges_g’ and ‘variance’) together with moderator data (columns ‘touch_duration’ and ‘sessions’) for each study/cohort/effect included in the analysis.
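The repeated column layout ('Hedges_g', 'variance', plus one moderator column per figure) suggests how the source data can be explored. The sketch below is hypothetical: it uses invented rows and simple fixed-effect subgroup pooling, not the multivariate robust-variance model the article itself fitted with metafor.

```python
from collections import defaultdict

def pooled_by_moderator(rows, moderator):
    """Inverse-variance pooled Hedges' g per level of a moderator column."""
    sums = defaultdict(lambda: [0.0, 0.0])  # level -> [sum(w*g), sum(w)]
    for row in rows:
        w = 1.0 / float(row["variance"])        # inverse-variance weight
        sums[row[moderator]][0] += w * float(row["Hedges_g"])
        sums[row[moderator]][1] += w
    return {level: num / den for level, (num, den) in sums.items()}

# Invented rows mimicking the described columns (e.g. as read by csv.DictReader):
rows = [
    {"Hedges_g": "0.6", "variance": "0.05", "Outcome": "mental"},
    {"Hedges_g": "0.4", "variance": "0.05", "Outcome": "mental"},
    {"Hedges_g": "0.3", "variance": "0.10", "Outcome": "physical"},
]
by_outcome = pooled_by_moderator(rows, "Outcome")
```

Each level's pooled estimate is simply the weighted mean of its studies' effect sizes; a full analysis would also model dependence between effects from the same cohort.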

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Cite this article.

Packheiser, J., Hartmann, H., Fredriksen, K. et al. A systematic review and multivariate meta-analysis of the physical and mental health benefits of touch interventions. Nat Hum Behav (2024). https://doi.org/10.1038/s41562-024-01841-8

Download citation

Received: 16 August 2023

Accepted: 29 January 2024

Published: 08 April 2024

DOI: https://doi.org/10.1038/s41562-024-01841-8

Meta-analysis

Reviewed by Psychology Today Staff

Meta-analysis is an objective examination of published data from many studies of the same research topic identified through a literature search. Through the use of rigorous statistical methods, it can reveal patterns hidden in individual studies and can yield conclusions that have a high degree of reliability. It is a method of analysis that is especially useful for gaining an understanding of complex phenomena when independent studies have produced conflicting findings.
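As a concrete sketch of those statistical methods (with invented numbers, not data from any particular review), the simplest pooling scheme weights each study's effect size by the inverse of its sampling variance, so that precise studies count for more:

```python
import math

def pool_effects(effects, variances):
    """Fixed-effect (inverse-variance) pooling of study effect sizes.

    effects   -- per-study effect sizes (e.g. Hedges' g)
    variances -- per-study sampling variances
    Returns (pooled_effect, standard_error).
    """
    weights = [1.0 / v for v in variances]  # precise studies get larger weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))      # standard error of the weighted mean
    return pooled, se

# Three hypothetical studies with conflicting point estimates:
g = [0.80, 0.10, 0.45]
v = [0.04, 0.02, 0.03]
est, se = pool_effects(g, v)
ci = (est - 1.96 * se, est + 1.96 * se)     # 95% confidence interval
```

Published meta-analyses typically go further and fit random-effects or multivariate models that also estimate between-study variance; the fixed-effect version shown here is only the minimal building block.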

Meta-analysis provides much of the underpinning for evidence-based medicine. It is particularly helpful in identifying risk factors for a disorder, diagnostic criteria, and the effects of treatments on specific populations of people, as well as quantifying the size of the effects. Meta-analysis is well-suited to understanding the complexities of human behavior.

  • How Does It Differ From Other Studies?
  • When Is It Used?
  • What Are Some Important Things Revealed by Meta-analysis?


There are well-established scientific criteria for selecting studies for meta-analysis. Usually, meta-analysis is conducted on the gold standard of scientific research—randomized, controlled, double-blind trials. In addition, published guidelines not only describe standards for the inclusion of studies to be analyzed but also rank the quality of different types of studies. For example, cohort studies are likely to provide more reliable information than case reports.

Through statistical methods applied to the original data collected in the included studies, meta-analysis can account for and overcome many differences in the way the studies were conducted, such as the populations studied, how interventions were administered, and what outcomes were assessed and how. Meta-analyses, and the questions they are attempting to answer, are typically specified and registered with a scientific organization, and, with the protocols and methods openly described and reviewed independently by outside investigators, the research process is highly transparent.


Meta-analysis is often used to validate observed phenomena, determine the conditions under which effects occur, and get enough clarity in clinical decision-making to indicate a course of therapeutic action when individual studies have produced disparate findings. In reviewing the aggregate results of well-controlled studies meeting criteria for inclusion, meta-analysis can also reveal which research questions, test conditions, and research methods yield the most reliable results, not only providing findings of immediate clinical utility but furthering science.

The technique can be used to answer social and behavioral questions large and small. For example, to clarify whether or not having more options makes it harder for people to settle on any one item, a meta-analysis of over 53 conflicting studies on the phenomenon was conducted. The meta-analysis revealed that choice overload exists—but only under certain conditions. You will have difficulty selecting a TV show to watch from the massive array of possibilities, for example, if the shows differ from each other in multiple ways or if you don’t have any strong preferences when you finally get to sit down in front of the TV.


A meta-analysis conducted in 2000, for example, answered the question of whether physically attractive people have “better” personalities. Among other traits, they prove to be more extroverted and have more social skills than others. Another meta-analysis, in 2014, showed strong ties between physical attractiveness as rated by others and having good mental and physical health. The effects on such personality factors as extraversion are too small to reliably show up in individual studies but real enough to be detected in the aggregate number of study participants. Together, the studies validate hypotheses put forth by evolutionary psychologists that physical attractiveness is important in mate selection because it is a reliable cue of health and, likely, fertility.




Meta-analysis in a digitalized world: A step-by-step primer

In recent years, much research and many data sources have become digital. Some advantages of digital or Internet-based research compared to traditional lab research (e.g., comprehensive data collection and storage, availability of data) are ideal for an improved approach to meta-analysis. Meanwhile, different types of meta-analysis have been developed to provide research syntheses with accurate quantitative estimates. Due to its rich and unique palette of corrections, we recommend using the Schmidt and Hunter approach for meta-analyses in a digitalized world. Our primer shows, step by step, how to conduct a high-quality meta-analysis that incorporates digital data, and highlights the most common pitfalls (e.g., using only a bare-bones meta-analysis, no data comparison), not only in aggregating the data but also in the literature search and coding procedure, which are essential steps in any meta-analysis. This primer is therefore especially suited to where much future research is headed: digital research. To map Internet-based research and reveal any research gaps, we further synthesize meta-analyses on Internet-based research (15 articles containing 24 different meta-analyses, covering 745 studies and 1,601 effect sizes), resulting in the first mega meta-analysis of the field. We found a lack of individual participant data (e.g., age and nationality). Hence, we provide a primer for high-quality meta-analyses and mega meta-analyses that applies to much of the coming research, along with basic hands-on knowledge for conducting or judging the quality of a meta-analysis in a digitalized world.



The Role of Meta-Analysis in Scientific Studies

Sean is a fact-checker and researcher with experience in sociology, field research, and data analytics.


  • Why It Matters
  • Reasons for Use

Disadvantages

At a glance.

Psychological researchers can use meta-analysis to review and analyze many studies on the same subject. While it can be a very helpful way to get a “big picture” view of a topic, meta-analysis also has limitations.

A meta-analysis is a type of statistical analysis in which researchers review, combine, and then analyze the results of multiple studies. It is most useful when many previous studies have examined the same topic or asked the same question.

This article discusses when meta-analysis might be used and why it’s important. It also covers some advantages and disadvantages of using meta-analysis in psychology research.

What Is Meta-Analysis?

A simple definition of meta-analysis in psychology is that it’s a study of past studies on a subject that can give researchers a “big picture” view of the topic. To do a meta-analysis, a researcher reviews the published studies on a topic and then analyzes all the results to look for trends. Meta-analysis is used in psychology, medicine, and other fields.

New studies from around the world are constantly being published, so the amount of research that’s out there on any given topic can be overwhelming. A meta-analysis is helpful because it's designed to summarize all the research information on a subject. There are a few general principles that a meta-analysis follows:

  • It is done systematically.
  • It uses certain criteria.
  • It contains a pool of results.
  • It is based on quantitative analysis (mathematical and statistical techniques to measure, model, and understand aspects of human behavior).

Why Is Meta-Analysis Important?

The data provided by a meta-analysis is bigger-picture than a single study, so it gives psychology researchers a better sense of the magnitude of the effect of whatever it is that is being studied—for example, a treatment. A meta-analysis also makes important conclusions clear and can identify trends that can inform future studies, policy decisions, and patient care.

Reasons Researchers Do Meta-Analysis

In addition to summarizing and analyzing integrated results, a meta-analysis also has other uses. For example, psychology researchers can use a meta-analysis to:

  • Evaluate effects in different subsets of participants.
  • Create new hypotheses to be studied in future research.
  • Overcome the limitations of small sample sizes.
  • Establish statistical significance.

Increasing Sample Size

One of the reasons why meta-analyses are used is to overcome a very common problem in research: small sample sizes.

Even though researchers would often prefer to have a large sample size for a study, it requires more resources, such as funds and personnel, than a small sample size does. When individual studies do not use a large number of subjects, it can be harder to draw reliable and valid conclusions from the findings. 

A meta-analysis helps overcome the issue of small sample sizes because it reviews multiple studies from the same subject area, essentially creating a larger sample size.

Establishing Statistical Significance

Meta-analyses can also help establish statistical significance across studies that might otherwise seem to have conflicting results. Statistical significance refers to how unlikely it is that a study’s results arose by random chance rather than reflecting a real difference.

When multiple studies are considered together, the combined sample is larger, so the analysis has more statistical power: real effects that are too small to reach significance in any single study can reach it in the pooled analysis. This strengthens the validity of any observed differences and, in turn, the reliability of the conclusions researchers may draw from the findings.
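The mechanics behind this pooling can be sketched with a toy inverse-variance (fixed-effect) calculation. The studies and numbers below are hypothetical, and real meta-analyses use dedicated packages such as R's `metafor` rather than hand-rolled code; this is only a minimal illustration of why pooling raises power:

```python
import math

def pool_fixed_effect(effects, ses):
    # Inverse-variance weights: more precise studies count more.
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se, pooled / pooled_se  # estimate, SE, z-score

# Three hypothetical small studies, none significant on its own (|z| < 1.96):
effects = [0.30, 0.25, 0.35]   # standardized mean differences
ses = [0.20, 0.18, 0.22]       # standard errors
pooled, se, z = pool_fixed_effect(effects, ses)
print(f"pooled = {pooled:.2f}, SE = {se:.2f}, z = {z:.2f}")  # z clears 1.96
```

The pooled standard error shrinks as studies accumulate, which is exactly how three individually inconclusive studies can yield a significant combined result.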

Benefits of a Meta-Analysis

Meta-analyses offer many advantages over individual studies. Here are just a few benefits of meta-analysis:

  • It has greater statistical power and the ability to extrapolate to the broader population.
  • It is evidence-based.
  • It is more likely to show an effect because smaller studies are combined into one larger study.
  • It has better accuracy (because smaller studies are pooled and analyzed).
  • It is more efficient (because researchers can collect a large amount of data without spending a lot of time, money, and resources since the bulk of the data collection work has already been completed).

Meta-analysis provides a view of the research that has been done in a particular field, summarizes and integrates the different findings, and provides possible directions for future research.

A meta-analysis also reduces the amount of work required to research a topic for other researchers and policymakers. For example, instead of having to look at the results of many smaller studies, people can get a more accurate view of what might be happening in a population by looking at the results of one meta-analysis.

Although it can be a powerful research tool, meta-analysis does have disadvantages:

  • It can be difficult and time-consuming to find all of the appropriate studies to look at.
  • It requires complex statistical skills and techniques (which can be intimidating and challenging for researchers who may lack experience with this type of research).
  • It may have the effect of halting research on a particular topic (for example, rather than giving directions for future research, a meta-analysis may imply that a specific question has been answered sufficiently and no more research is needed).

Types of Bias in Meta-Analysis

The way researchers do a meta-analysis (procedure) can affect the results. Following certain principles is crucial to making sure they draw valid and reliable conclusions from their work.

Even straying slightly from the protocol can produce biased and misleading results. The three main types of bias that can be a problem in meta-analysis are:

  • Publication bias: When "positive" studies are more likely to be accepted and printed.
  • Search bias: When the search for studies produces unintentionally biased results, for example through an incomplete set of keywords, inconsistent strategies across databases, or the choice of search engine.
  • Selection bias: When researchers do not clearly define criteria for choosing, from the long list of potential studies, which ones to include in the meta-analysis.
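Publication bias is often probed with funnel-plot asymmetry tests. The sketch below implements the core of Egger's regression test, which regresses each study's standardized effect on its precision; a nonzero intercept suggests small-study effects. The data are hypothetical, and a real analysis would also report a p-value for the intercept:

```python
def eggers_intercept(effects, ses):
    # Regress (effect / SE) on (1 / SE); plain OLS, no dependencies.
    y = [e / s for e, s in zip(effects, ses)]
    x = [1.0 / s for s in ses]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx  # intercept far from 0 => funnel asymmetry

# Hypothetical small-study effects: the noisier (high-SE) studies
# report systematically larger effects, as publication bias predicts.
intercept = eggers_intercept([0.80, 0.60, 0.35, 0.20],
                             [0.40, 0.30, 0.15, 0.10])
print(round(intercept, 2))
```

With unbiased data the intercept hovers near zero; the skewed toy data above push it well above zero.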

Examples of Meta-Analysis in Psychology

It can be helpful to look at how a meta-analysis might be used in psychology to research specific topics. For example, imagine that a small study showed that consuming sugar before an exam was correlated to decreased test performance. Taken alone, such results would imply that students should avoid sugar consumption before taking an exam. However, a meta-analysis that pools data looking at eating behavior and subsequent test results might demonstrate that this previous study was an outlier.

Here are a few examples of meta-analysis that have been published on topics in psychology:

  • Sokouti M, Shafiee-Kandjani AR, Sokouti M, Sokouti B. A meta-analysis of systematic reviews and meta-analyses to evaluate the psychological consequences of COVID-19. BMC Psychology. 2023;11(1). doi:10.1186/s40359-023-01313-0
  • Cuijpers P, Franco P, Čihařová M, et al. Psychological treatment of perinatal depression: a meta-analysis. Psychological Medicine. 2021;53(6):2596-2608. doi:10.1017/s0033291721004529
  • Xu C, Miao LL, Turner DA, DeRubeis RJ. Urbanicity and depression: A global meta-analysis. Journal of Affective Disorders. 2023;340:299-311. doi:10.1016/j.jad.2023.08.030
  • Pauley D, Cuijpers P, Papola D, Miguel C, Karyotaki E. Two decades of digital interventions for anxiety disorders: a systematic review and meta-analysis of treatment effectiveness. Psychological Medicine. Published online May 28, 2021:1-13. doi:10.1017/s0033291721001999
  • Bhattacharya S, Goicoechea C, Heshmati S, Carpenter JK, Hofmann S. Efficacy of cognitive behavioral therapy for anxiety-related disorders: A meta-analysis of recent literature. Current Psychiatry Reports. 2022;25(1):19-30. doi:10.1007/s11920-022-01402-8

A meta-analysis can be a useful research tool in psychology. In addition to providing an accurate, big-picture view of a specific topic, these studies can also make it easier for policymakers and other decision-makers to see a summary of findings quickly. Meta-analysis can run into problems with bias and may suggest that more research is needed on a particular topic, but researchers can avoid these pitfalls by closely following established meta-analysis procedures.


By Kristalyn Salters-Pedneault, PhD  Kristalyn Salters-Pedneault, PhD, is a clinical psychologist and associate professor of psychology at Eastern Connecticut State University.

  • Open access
  • Published: 23 February 2021

Beta-blocker therapy in patients with COPD: a systematic literature review and meta-analysis with multiple treatment comparison

  • Claudia Gulea (ORCID: orcid.org/0000-0001-9607-5901) 1,2,
  • Rosita Zakeri 3,
  • Vanessa Alderman 4,
  • Alexander Morgan 5,
  • Jack Ross 6 &
  • Jennifer K. Quint 1,2,7

Respiratory Research volume 22, Article number: 64 (2021)


Beta-blockers are associated with reduced mortality in patients with cardiovascular disease but are often underprescribed in those with concomitant COPD, due to concerns regarding respiratory side-effects. We investigated the effects of beta-blockers on outcomes in patients with COPD and explored within-class differences between different agents.

We searched the Cochrane Central Register of Controlled Trials, Embase, Cumulative Index to Nursing and Allied Health Literature (CINAHL) and Medline for observational studies and randomized controlled trials (RCTs) investigating the effects of beta-blocker exposure versus no exposure or placebo, in patients with COPD, with and without cardiovascular indications. A meta-analysis was performed to assess the association of beta-blocker therapy with acute exacerbations of COPD (AECOPD), and a network meta-analysis was conducted to investigate the effects of individual beta-blockers on FEV1. Mortality, all-cause hospitalization, and quality of life outcomes were narratively synthesized.

We included 23 observational studies and 14 RCTs. In pooled observational data, beta-blocker therapy was associated with an overall reduced risk of AECOPD versus no therapy (HR 0.77, 95% CI 0.70 to 0.85). Among individual beta-blockers, only propranolol was associated with a relative reduction in FEV1 versus placebo, among 199 patients evaluated in RCTs. Narrative syntheses on mortality, all-cause hospitalization and quality of life outcomes indicated a high degree of heterogeneity in study design and patient characteristics but suggested no detrimental effects of beta-blocker therapy on these outcomes.

The class effect of beta-blockers remains generally positive in patients with COPD. Reduced rates of AECOPD, mortality, and improved quality of life were identified in observational studies, while propranolol was the only agent associated with a deterioration of lung function in RCTs.

COPD and cardiovascular disease (CVD) often co-occur, in an interaction characterized by complex biological mechanisms and shared risk factors such as smoking. Beta-blockers are recommended in treatment regimens for people with heart failure (HF), following myocardial infarction (MI), angina or hypertension, due to proven mortality benefits [1, 2, 3, 4]. Seventeen years after the publication of the first robust meta-analysis demonstrating that beta-blockers do not impair lung function in patients with COPD [5], prescription rates remain lower than for people without COPD, among those with an indication for treatment. This treatment gap is thought to be, in part, due to concerns regarding adverse respiratory effects (such as a decrease in lung function) despite accumulating evidence to the contrary [6]. Concomitant CVD independently affects mortality and hospitalization in patients with COPD, further adding to the clinical burden and complexity of treatment pathways in these patients [7, 8].

COPD guidelines recommend the use of cardioselective beta-blockers when appropriate, reinforced by evidence gathered in a Cochrane review [9]. Evidence on the association of beta-blocker therapy with mortality and acute exacerbations of COPD (AECOPD) derives mostly from observational data, and previous reviews have aggregated results for cardioselective and non-cardioselective agents [10, 11]. However, a recent single RCT [12] reported more hospitalizations due to AECOPD in patients treated with metoprolol as compared to placebo, though results on mortality and FEV1 were inconclusive.

Our study expands on previous literature by dissecting the effects of beta-blockers, from both RCTs and observational studies, on a wide range of clinically relevant end points (mortality, AECOPD, FEV1, all-cause hospitalization, and quality of life outcomes such as the St. George’s Respiratory Questionnaire (SGRQ), the 12 and 6 Minute Walking Tests (12MWT, 6MWT) and the Short-Form Health Survey Questionnaire (SF-36)), thereby providing a comprehensive assessment of the effects of beta-blocker treatment in COPD. We have two overarching aims: (1) to identify and assess the class effect of beta-blockers and (2) to compare within-class effects of beta-blockers on the aforementioned outcomes. If all studies have a minimum of one intervention in common with another, it is possible to create a network of treatments, allowing both direct and indirect evidence to be used in deriving comparisons between beta-blockers not studied head-to-head, using a network meta-analysis (NMA). Importantly, we also address a current gap in knowledge: we investigate whether the potential benefits of beta-blockers are limited to those with CVD or extend to the wider COPD population with or without undiagnosed CVD.

The protocol for this review was previously published [13]. Searches were conducted from inception to January 2021 in MEDLINE, Embase and CINAHL via Ovid and The Cochrane Collection Central Register of Clinical Trials to identify studies that examined the association between beta-blockers in patients with COPD (defined as post-bronchodilator FEV1/FVC of < 0.70, or as being in accordance with GOLD guidelines [6]; patients with a clinical diagnosis of COPD) and clinical, safety and quality of life outcomes. To ensure we captured all relevant evidence, we included prospective interventional trials (RCTs) and prospective observational studies (single-arm studies were excluded). At the screening stage, due to a scarcity of prospective observational studies, we decided to also include retrospective observational studies. We required all studies to report on mortality, AECOPD, all-cause hospitalization and quality of life outcomes. We also manually searched reference lists of previously published reviews. Abstracts were screened for inclusion by two independent reviewers, with any discrepancies resolved through discussion. Full texts of included abstracts were screened by a single investigator, and 25% of articles were additionally validated by a second investigator. Full inclusion/exclusion criteria applied at each stage are available in Additional file 1: Table S1.

Data extraction and quality assessment

For each accepted study, data was extracted on design, characteristics of the study population including comorbidities, inclusion and exclusion criteria, treatment administered and the reported effect of beta-blockers on included outcomes. Details on planned data extraction are available in the protocol [13]. Authors were contacted to clarify ambiguously reported data from published reports. Included observational studies were assessed for risk of bias using the ROBINS-I tool [14] for cohort studies, and RCTs were assessed using the RoB tool [15]. Bias domains evaluated include confounding, reporting, attrition, and measurement of outcomes. Each domain was assigned to a risk category such as “low”, “moderate”, “high” or “unclear” for observational studies and “low”, “high” or “some concerns” for RCTs. Additionally, we assessed the certainty of the evidence using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) framework [16].

Searches identified studies reporting on all-cause mortality, AECOPD, FEV1, all-cause hospitalization, SGRQ, the 12 and 6 MWT, and the SF-36. Four researchers extracted data from the included articles, and all were validated by a second researcher.

Data analysis

Where included studies were reasonably statistically and clinically similar, we pooled results using meta-analysis (to investigate the class effect of beta-blocker treatment), or NMA, where data on individual therapeutic compounds was available. Publication bias was assessed using funnel plots if there were at least 10 studies included in a meta-analysis [17]. For binary outcomes we initially included studies that reported on outcomes in any format (hazard ratio [HR], odds ratio [OR], risk ratio, incidence rate); however, the final inclusion list contains only studies reporting HRs, since this was the most common format amongst included studies. Heterogeneity was assessed using I² [18].
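As a rough illustration of the I² statistic mentioned above, here is a minimal sketch with hypothetical effect estimates (the review itself used standard R tooling); I² expresses the share of between-study variation beyond what chance alone would produce:

```python
def heterogeneity(effects, ses):
    # Cochran's Q: weighted squared deviations from the fixed-effect pool.
    w = [1.0 / s ** 2 for s in ses]
    pooled = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - pooled) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    # I^2 = (Q - df) / Q, floored at 0, expressed as a percentage.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

# Hypothetical log-effect estimates: consistent studies vs. conflicting ones.
_, i2_low = heterogeneity([0.10, 0.12, 0.11], [0.05, 0.05, 0.05])
_, i2_high = heterogeneity([0.10, 0.80], [0.05, 0.05])
print(i2_low, i2_high)  # the consistent set collapses to 0%
```

Conventionally, I² below roughly 25% is read as low heterogeneity and above 75% as high, which is the threshold this review uses to decide between pooling and narrative synthesis.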

FEV1—Network meta-analysis of RCTs

We performed a random-effects Bayesian NMA to estimate the mean change in FEV1 between patients who received individual beta-blockers versus (vs.) placebo, with 95% credible intervals (CrI), using the package gemtc [19] in R v3.6. CrIs represent the 95% probability that the true underlying effect lies in the interval specified. In cases where the standard deviation (SD) for the FEV1 measures was not reported, the SD was extrapolated by averaging the SDs from other studies with similar sample characteristics. Random-effects analyses are widely accepted as the appropriate, more conservative approach when there is heterogeneity across study methods. By contrast, fixed-effect models assume that the effect size associated with an intervention does not vary from study to study, and they may be particularly appropriate when only a few studies are available for analysis. The best model fit for each network was selected based on a review of the deviance information criterion (DIC) and an evaluation of the different model assumptions.

NMAs include direct and indirect evidence from trials to determine the best available treatment with respect to an outcome of interest. For the results to be valid, NMA assumptions need to be met, including the transitivity and consistency assumptions. For the transitivity assumption to be met, the studies that contribute direct evidence must be similar in the distribution of covariates and effect modifiers across the trial populations. Inconsistency occurs when the indirect evidence in a network differs from the direct evidence. Assessing consistency of data in the network model is done implicitly in the package “gemtc”, which uses a decision rule, the “node-splitting” method, to choose which comparisons may be potentially inconsistent. Small-study effects were explored by constructing comparison-adjusted funnel plots [20], and publication bias was assessed by Egger’s test among comparisons of beta-blockers and placebo. A value of p < 0.1 indicated significant publication bias. To assess the probability that a treatment is the best within a network, rank probabilities were determined: the probability for each treatment to obtain each possible rank in terms of their relative effects. Interpretation needs to be made with caution, because a treatment may have a high probability of being ranked first (or last) while its benefit over other treatments is of little clinical value [21]. For this reason, we report a full ranking profile (where each treatment is assigned a probability of being the first, second, and so on, best treatment in the network), derived using the surface under the cumulative ranking curve (SUCRA) [22].
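The indirect comparisons at the heart of an NMA can be illustrated, under strong simplifying assumptions, with the Bucher adjusted indirect comparison for a single common comparator. The numbers below are hypothetical, and a full Bayesian NMA (e.g., in gemtc) handles entire networks with many closed loops rather than this one-step case:

```python
import math

def bucher_indirect(d_ab, se_ab, d_cb, se_cb):
    # Indirect A-vs-C effect via common comparator B: subtract the two
    # direct effects; their uncertainties add on the variance scale.
    d_ac = d_ab - d_cb
    se_ac = math.sqrt(se_ab ** 2 + se_cb ** 2)
    return d_ac, se_ac

# Hypothetical FEV1 mean differences (litres) vs. a shared placebo arm:
# drug A vs. placebo = -0.30 (SE 0.10); drug C vs. placebo = -0.10 (SE 0.12).
d_ac, se_ac = bucher_indirect(-0.30, 0.10, -0.10, 0.12)
print(d_ac, round(se_ac, 3))  # indirect A vs. C, with a wider SE than either input
```

Because variances add, indirect estimates are always less precise than the direct comparisons they are built from, which is one reason NMA results are interpreted cautiously.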

Sensitivity analyses

We conducted two meta-regressions to establish whether FEV1 measurement at baseline or study duration influenced the main NMA results. These variables were added, separately, as covariates in the main NMA model; FEV1 as a continuous variable and follow-up dichotomised into short follow-up (less than 24 h) vs. long follow-up (more than 24 h). We compared model fit between models with and without covariates using the DIC. Where possible, we analyzed patients with and without CVD separately.

AECOPD—meta-analysis of observational studies

We pooled HRs denoting the association between beta-blocker exposure (vs. no exposure) amongst patients with COPD, using random-effects meta-analysis with the DerSimonian-Laird estimator in the “metafor” package [23] in R v3.6.
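A minimal sketch of the DerSimonian-Laird estimator described above, using hypothetical hazard ratios rather than the studies in this review (the authors used the metafor package in R); HRs are pooled on the log scale, with each study's SE recovered from its 95% CI:

```python
import math

def dl_pool_hazard_ratios(hrs, lowers, uppers):
    # Work on the log scale; back out each SE from the 95% CI width.
    y = [math.log(h) for h in hrs]
    se = [(math.log(u) - math.log(l)) / (2 * 1.96) for l, u in zip(lowers, uppers)]
    w = [1.0 / s ** 2 for s in se]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(hrs) - 1)) / c)     # between-study variance
    w_star = [1.0 / (s ** 2 + tau2) for s in se]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    half = 1.96 * math.sqrt(1.0 / sum(w_star))
    return tuple(math.exp(v) for v in (pooled, pooled - half, pooled + half))

# Hypothetical study-level hazard ratios with their 95% CIs:
hr, lo, hi = dl_pool_hazard_ratios([0.72, 0.81, 0.78],
                                   [0.60, 0.70, 0.62],
                                   [0.86, 0.94, 0.98])
print(f"pooled HR {hr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

When the estimated between-study variance tau² is zero, as happens here, the random-effects weights collapse to the fixed-effect weights, mirroring the low-heterogeneity situation the review reports.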

Mortality; quality of life—narrative synthesis

If studies were too heterogeneous (I² > 75%), or where outcomes were reported in fewer than three studies per treatment comparison, quantitative analysis was not reported; instead, summary results were graphed on forest plots without pooling (mortality) and/or synthesized qualitatively (quality of life outcomes).

The database search identified 2932 potentially relevant articles, whilst other sources revealed six. After title and abstract screening, 187 articles underwent full-text review. We included 23 observational studies and 14 RCTs that reported on patients with COPD in the systematic literature review. Of the 23 observational studies, 21 reported on mortality [24-44], five reported on AECOPD [24, 33, 35, 45, 46], three reported on all-cause hospitalization [47, 48, 49], one reported on SGRQ [45] and one reported on SF-36 [42]. Of the 14 RCTs, 12 reported on FEV1 [50-61], two each reported on the 12MWT [59, 62] and 6MWT [12, 56], and two reported on SGRQ [12, 56] (Fig. 1).


According to our protocol, we intended to include data on the effect of beta-blockers on AECOPD from RCTs; however, our search strategy revealed only one study of this kind [12]. Based on a population of 532 patients with moderate to severe COPD, the authors reported no significant difference in time to first AECOPD (of any severity) between metoprolol and placebo; however, use of the beta-blocker was associated with a higher risk of severe exacerbation (requiring hospitalization). This study could not be included in the quantitative analysis, as there were no other RCT data to corroborate it.

Quantitative analyses

There were five observational studies [24, 33, 35, 45, 46] reporting on the effect of beta-blockers on AECOPD in patients from at least five countries across Europe. Follow-up varied from 0.76 [46] to 7.2 years [33]. The average age of the patients ranged from 62.8 [24] to 74 [46] years and the proportion of males from 49.8% [33] to 72.3% [45]. Only two studies reported on smoking status [33, 45], which indicated that the majority of patients were either current or former smokers. Comorbidities were frequent in all cohorts, particularly CVD, which was reported in all but one study [35]. Body mass index (BMI) was reported in only two studies and ranged between 25.5 [45] and 29.9 kg/m² [24]. All study characteristics are available in Additional file 1: Table S2 and Table S3.

In the presence of low statistical heterogeneity (< 25%), the random-effects and fixed-effects methods for pooling effect estimates give near-identical results, coinciding exactly when the estimated between-study variance is zero. Given low heterogeneity (I² = 0, owing to the large weight attributed to a single study [46]) and the small overall number of studies, we report both random- and fixed-effects meta-analyses of AECOPD. In the random-effects analysis, the pooled estimate of AECOPD risk associated with beta-blocker use, from a total of 27,717 patients, was HR 0.78 (95% CI 0.74–0.82), suggesting a reduction in relative risk in the presence of beta-blockers (Fig. 2; Additional file 1: Table S4 for individual study outcomes). The fixed-effects meta-analysis yielded similar results (Additional file 1: Figure S1). Due to the low number of studies, we could not formally assess the extent of publication bias. The GRADE assessment indicated that the overall quality of evidence on which the meta-analysis was based was low (Additional file 1: Table S18).

Figure 2. Forest plot illustrating results of the meta-analysis evaluating the impact of beta-blocker therapy versus no beta-blocker therapy on AECOPD in patients with COPD (Estimate: HR hazard ratio, 95% CI confidence interval)

We evaluated data from 12 RCTs assessing FEV1 in 199 patients across seven beta-blockers (atenolol, bisoprolol, carvedilol, celiprolol, metoprolol, propranolol, labetalol) [50, 51, 52, 53, 54, 55, 57, 58, 59, 60, 61, 63]. Trial duration varied from 1 hour [53, 59] to 3–4 months [57], and baseline FEV1 measurements between 1.15 L [59] and 2.41 L [61]. Most patients were over 40 years old, except in one study where mean age was 39 [60]. Across all studies, over 50% of the patient population were male, and four studies explicitly included only patients with CVD or hypertension [50, 54, 55, 57] (Additional file 1: Table S5). A comparison between studies enrolling patients with CVD and those enrolling patients with COPD only is difficult due to the scarcity of reported data. BMI was available in two studies of COPD and CVD [55, 57] and in only one study which excluded CVD [58]; estimates were, however, similar and denoted overweight, but not obese, patient populations. Celiprolol was the only treatment evaluated exclusively in patients without CVD, in a single trial [61]. Sample size, age and proportion of males were similar across all studies.

Figure  3 shows the network of eligible comparisons for FEV1 mean change from baseline to time-point, including seven treatments. All beta-blockers except carvedilol were evaluated in at least one placebo-controlled trial. Individual study FEV1 measurements are presented in Additional file 1 : Table S6. Figure  4 and Additional file 1 : Table S7 show the NMA results for FEV1. Consistency results are illustrated in Additional file 1 : Figure S2. Effects relative to placebo are presented separately for each treatment.

Figure 3. Network of beta-blockers used to treat patients with COPD, from RCTs assessing FEV1

Figure 4. Network meta-analysis results for mean difference in FEV1 (95% CrI), beta-blockers compared to placebo [measured in liters; CrI credible interval]

There was no significant difference in FEV1 amongst all beta-blockers except for propranolol, which was the only treatment associated with a decrease in FEV1 (mean difference [MD] −0.14 L, 95% CrI −0.28 to −0.016). Individual medications were ranked and are presented with estimates of the probability that each is the best treatment (i.e. the probability that the treatment improves lung function). Figure 5 shows that celiprolol had the highest likelihood of being ranked best treatment, followed by labetalol; for the second rank, the same treatments were the most likely. Overall, the SUCRA results based on the rankogram values suggest labetalol (86.2%) and celiprolol (80%) are the most likely to be the best treatments for positively affecting FEV1, whilst propranolol was the least likely (16.2%) (Additional file 1: Table S9). According to the comparison-adjusted funnel plot, Egger's test found no evidence of publication bias (p = 0.1286, Additional file 1: Figure S3).
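SUCRA values like those quoted above summarize a treatment's rankogram as a single number between 0 (certainly worst) and 1 (certainly best); multiplied by 100 they give percentages such as the 86.2% for labetalol. A minimal sketch, with illustrative rank-probability vectors rather than the study's actual posterior ranks:

```python
def sucra(rank_probs):
    """Surface under the cumulative ranking curve for one treatment.
    rank_probs[k] is the probability of being ranked (k+1)-th among
    K treatments; the vector must sum to 1."""
    k = len(rank_probs)
    cumulative, area = 0.0, 0.0
    for p in rank_probs[:-1]:        # cumulative rank probabilities up to K-1
        cumulative += p
        area += cumulative
    return area / (k - 1)

# Illustrative 4-treatment rankograms (hypothetical probabilities):
certain_best = sucra([1.0, 0.0, 0.0, 0.0])     # 1.0
certain_worst = sucra([0.0, 0.0, 0.0, 1.0])    # 0.0
uninformative = sucra([0.25, 0.25, 0.25, 0.25])  # 0.5
```

A treatment concentrated in the top ranks accumulates cumulative probability early and scores near 1, which is why a high SUCRA is read as "most likely to be among the best".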

Figure 5. Rankogram illustrating probabilities that each treatment is first, second, third…eighth with regards to FEV1 improvement

The meta-regression analyses, with baseline FEV1 measurement and follow-up duration added separately as covariates, showed results similar to the main analysis (model fit did not improve in either model with the added covariates; Additional file 1: Figure S4).

Beta-blocker therapy effect on FEV1 in patients with COPD with and without explicit CVD

We evaluated data from eight trials of six beta-blockers (atenolol, bisoprolol, carvedilol, celiprolol, metoprolol and propranolol) in 137 patients with COPD and no explicit CVD [51, 52, 53, 56, 58, 59, 60, 61]. No significant difference in FEV1 was detected when comparing each of the active treatments with placebo (Additional file 1: Figure S5A). Additional file 1: Figure S6 shows celiprolol was similarly likely to rank first in terms of increasing FEV1, while the second rank was, surprisingly, most likely to be occupied by placebo, followed by celiprolol. There were four trials investigating six beta-blockers (carvedilol, bisoprolol, atenolol, propranolol, metoprolol, labetalol) in patients with COPD and CVD [50, 54, 55, 57]. No significant difference in FEV1 was detected when comparing each of the active treatments with placebo (Additional file 1: Figure S5B, Additional file 1: Figure S7).

Narrative synthesis

There were 21 observational studies reporting on mortality [24, 25, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 43, 44, 64] which evaluated the effect of beta-blocker use vs. no beta-blocker use in an overall population of 422,552 patients from at least 11 countries (Additional file 1: Table S2). According to inclusion criteria, 15 studies enrolled patients with COPD and a CVD indication [25, 27, 28, 29, 34, 37, 39, 40, 43, 64, 65], while the remaining six [24, 26, 32, 33, 35, 44] did not specify whether those with CVD were specifically excluded; however, all studies had varying percentages of CVD comorbidities. Overall, patient characteristics varied: mean age ranged between 62.8 [24] and 84.6 [44] years, and the proportion of males between 37% [26] and 100% [44]. The distribution of comorbidities was mixed, with hypertension being the most widely reported, ranging between 27.5% [33] and 88.3% [37]. Smoking status was reported in seven studies [25, 26, 28, 31, 33, 41, 44], in which most patients were recorded as current or former smokers; however, data were not consistently available. BMI was reported in only five studies [24, 28, 29, 41, 44], ranging between 20.4 [29] and 29.9 kg/m² [24]. Follow-up time was also highly variable, ranging from 2 [30] to 112 months [40].

Individual adjusted study risk estimates for mortality associated with beta-blocker use vs. no beta-blocker use ranged from HR 0.46 (95% CI 0.19–1.11) [29] to 1.19 (95% CI 1.04–1.37) [26] (Fig. 6). While age and sex were the most common covariates adjusted for, the majority of studies also used a variety of study-specific variables: medications for specific indications (such as hypertension or HF) and other comorbidities or clinical variables (Additional file 1: Table S10). Two studies reported unadjusted analyses [27, 28]. Only one study reported an increase in mortality risk associated with beta-blockers (HR 1.19, 95% CI 1.04–1.37); however, the population assessed in this report consisted of patients with severe COPD undergoing long-term oxygen therapy [26]. There was a very high degree of heterogeneity amongst studies (I² = 99.3%). This was explored by conducting stratified analyses (i.e. stratifying by type of beta-blocker [cardioselective vs. non-cardioselective, Additional file 1: Figure S8]; excluding unadjusted estimates; excluding the only study which exclusively included very severe COPD patients). However, because heterogeneity remained very high (I² > 75%), results from the outcome analysis are presented graphically (Fig. 6).

Figure 6. Forest plot illustrating the impact of beta-blocker therapy versus no beta-blocker therapy on mortality, in patients with COPD (Estimate: HR hazard ratio, 95% CI confidence interval)

All-cause hospitalization

All-cause hospitalization was reported in three observational studies [47, 48, 49]. One compared cardioselective to non-selective beta-blockers (presenting odds ratios [OR]) [48]; one compared non-cardioselective to selective beta-blockers (presenting HRs) [49]; and one compared cardioselective beta-blockers to no beta-blocker treatment (presenting relative risks) [47]; therefore, no class-effect comparison could be inferred. None of the studies found significant differences in all-cause hospitalization associated with the investigated treatments (Additional file 1: Table S11).

Quality of life

SGRQ was assessed in two RCTs [ 12 , 56 ] and one observational study [ 45 ], but none reported mean change from baseline to follow-up per treatment arm. One RCT [ 66 ] compared metoprolol to placebo and one observational study [ 45 ] assessed any beta-blocker compared to lack of beta-blocker treatment; both reported no significant difference in SGRQ between the two treatment arms at one-year follow-up (Additional file 1 : Table S12).

12MWT was investigated in two RCTs [59, 62]: one study investigated atenolol and metoprolol vs. placebo but did not report mean change in score at four-week follow-up [62]; the second did not find a significant difference in distance walked between patients receiving metoprolol vs. propranolol six hours after treatment was administered [59] (Additional file 1: Table S13).

Data on 6MWT were reported in two recent RCTs, from 2017 [56] and 2019 [12]. The first evaluated the effect of bisoprolol compared to carvedilol and did not present mean change between treatment groups; however, the calculated estimates suggest both agents decreased distance walked in patients with COPD, with no apparent difference between the two. The second trial [12] did not identify a significant difference between metoprolol and placebo on 6MWT (Additional file 1: Table S14).

Data on SF-36 were available in one observational study [42]. Whilst overall scores were not available per treatment group, the authors reported no significant association between beta-blocker treatment and individual domains of the quality-of-life assessment tool, either at baseline or at 6.4 years of follow-up (Additional file 1: Table S15).

Risk of bias

Observational studies were mostly judged to be at moderate risk of bias (23 studies [24, 25, 26, 28, 30, 31, 32, 33, 34, 35, 37, 38, 39, 40, 41, 42, 46, 47, 48, 49, 64, 65]); two studies [25, 45] were considered to be at low risk of bias, one [44] at serious risk of bias, and one [27] did not provide enough information for a judgment to be made. The domains most often given a "moderate" rating were "bias due to confounding" and "bias in selection of participants into the study", as the majority of studies recruited patients from databases which did not provide clinical diagnoses and relied on ICD coding (without confirming the validity of the diagnosis) (Additional file 1: Table S16). Ten RCTs [53, 55, 58, 60, 61] had moderate risk of bias, denoted by ratings of "some concerns"; two studies [56, 57] were deemed to be at serious risk of bias, both due to lack of blinding (Additional file 1: Figure S9).

This comprehensive and up-to-date evaluation of the effects of beta-blockers in patients with COPD adds to the previous literature in several ways: we included all studies reporting on any type of beta-blocker treatment in patients with COPD, showing overall beneficial effects on AECOPD and mortality. For the first time, we used a probabilistic approach to evaluate the effect of beta-blockers on FEV1 using direct and indirect evidence from RCTs in an NMA, comparing seven treatments against placebo, and presented results separately for patients with COPD with and without CVD. No beta-blocker significantly affected lung function except propranolol, and the treatments least likely to have a detrimental effect on FEV1 were labetalol (in those with COPD and CVD) and celiprolol (in those with COPD without explicit CVD). Lastly, we found that data on all-cause hospitalization and quality-of-life endpoints such as SGRQ, 12MWT, 6MWT and SF-36 were scarcely reported across the literature and did not lend themselves to formal quantitative analysis, suggesting an area of focus for future studies.

Despite heterogeneous elements such as follow-up time, baseline characteristics (including age, sex and comorbidities) and geographical location, individual results from 17 of the 21 studies reporting on mortality suggested that beta-blocker therapy was associated with a diminished risk of death in patients with COPD, compared to those not prescribed beta-blockers. However, the quality of this evidence was deemed "low" per the GRADE assessment (Additional file 1: Table S17), and we were not able to quantify the effect of beta-blockers on mortality due to considerable heterogeneity (I² > 75%). Previous reports [10, 11, 67] have provided pooled estimates of reductions in mortality risk associated with beta-blocker treatment; however, all reported degrees of heterogeneity above the Cochrane I² threshold of 75% (89.3% [10], 83% [11] and, most recently, 96% [67]), bringing into question the validity and interpretability of these results as applied to the general COPD population. Reasons for very high heterogeneity in previous meta-analyses include differences in study populations (i.e. including patients with differing degrees of severity), inaccurate risk of bias assessment and the inclusion of different comparators for the intervention effect of interest (i.e. including studies where comparator arms received calcium channel blockers, despite aiming to assess the effect of beta-blocker treatment vs. lack of treatment) [67].

In our analysis, most studies were affected by bias, particularly due to confounding: two studies did not adjust for any covariates [27, 55], whilst nine did not adjust for COPD severity either directly or indirectly (by including COPD medication regimen/exacerbation history in the final model) [25, 26, 27, 28, 30, 32, 34, 36, 37]. These studies may therefore overestimate the prognostic effect of beta-blocker therapy in patients with COPD and may, in turn, skew results towards showing benefit. One reason for the lack of adjustment for COPD-related variables may be the use of data from existing drug trials or CVD-specific registries that included subgroups of patients with COPD, reiterating the need for trials designed specifically for patients with COPD (with and without additional CVD), which would allow reliable assessment of the true effect of beta-blockers in these patients. Furthermore, a decrease in mortality is not surprising, as it could be related to the established effect of beta-blockers on patients' comorbid conditions (i.e. CVD). A previous study [33] suggested that long-term treatment with beta-blockers improved survival of patients with COPD without CVD; however, future studies are needed to confirm this result and to assess whether beta-blockers provide non-CV mortality benefits.

We found evidence to suggest that patients with COPD who are given beta-blockers are at decreased risk of AECOPD (HR 0.78 [95% CI 0.74–0.82]), replicating findings from Du and colleagues [10], who reported an even larger reduction in risk, of 37% (RR 0.63 [95% CI 0.57–0.71]). This previous meta-analysis, however, had methodological limitations inherent to the observational nature of the pooled studies (i.e. residual confounding, immortal time bias), which may limit the generalizability of its results. Moreover, the GRADE assessment revealed that the body of observational evidence from which our estimate was derived was of "low" quality (Additional file 1: Table S19). A recent RCT [12], less likely to be affected by the biases of previous observational studies, found no significant difference between metoprolol and placebo in time to AECOPD of any severity, but revealed a significant increase in the risk of AECOPD requiring hospitalization in patients with COPD without an indication for beta-blocker treatment, bringing into question the protective effect of this specific beta-blocker agent.

However, this trial did not evaluate other beta-blockers; therefore, future RCTs evaluating multiple regimens are needed to confirm the benefit of these agents. Whether beta-blockers have an indirect effect on exacerbations of COPD could be assessed in clinical trials including patients with COPD and comorbid CVD, allowing assessment of these agents in a more representative COPD population.

FEV1 was assessed in 199 patients enrolled in 12 RCTs, and we found that none of the individual cardioselective beta-blockers included in our NMA (atenolol, bisoprolol, celiprolol, metoprolol) was associated with significant effects on lung function in patients with COPD, regardless of baseline FEV1 or follow-up time. This is in line with a Cochrane review [9] which concluded that cardioselective beta-blockers, given either as a single dose or for longer durations, do not affect FEV1 in patients with COPD, even in those with the lowest baseline FEV1 measurements. Furthermore, our report extends this to a lack of effect on FEV1 of non-selective beta-blockers such as carvedilol and labetalol. Propranolol was the only medication associated with a reduction in FEV1, of 140 ml (95% CrI 16–280 ml), which is larger than the 100 ml change deemed clinically significant by American Thoracic Society and European Respiratory Society guidelines. This result is based on high-quality evidence according to the GRADE assessment (Additional file 1: Table S19), and thus supports current recommendations not to use this medication in patients with COPD.

For the first time in the literature, we ranked beta-blockers with respect to their effect on lung function. Propranolol had the lowest probability of being ranked first (suggesting the worst impact on lung function) compared to all other individual treatments considered in our NMA, including placebo. Labetalol and celiprolol, drugs used in hypertension, were the drugs least likely to negatively impact FEV1 compared to all other beta-blockers; however, neither affected FEV1 with certainty compared to placebo, and the results were inferred from very low quality evidence according to GRADE (Additional file 1: Table S18), bringing into question their leading positions in the hierarchy. Since the choice of beta-blocker may be influenced by CVD comorbidity (i.e. carvedilol, metoprolol and bisoprolol are recommended in stable HF; atenolol is more often prescribed in patients with asymptomatic hypertension; bisoprolol is also used in atrial fibrillation; and propranolol is infrequently used to treat tachyarrhythmias), it is perhaps not surprising that we did not identify a clear "best" beta-blocker for use in COPD. The fact that the beta-blockers least likely to decrease lung function are mainly used to treat hypertension may simply reflect that this subgroup of patients is less prone to detrimental side-effects (i.e. indication bias) compared to others with COPD and more severe comorbidities. Indeed, the prescription of beta-blockers in COPD needs to weigh clinically significant lung function alteration against mortality benefits in those with CVD, particularly MI [68] and HF [69].

Whilst CVD is diagnosed in 20 to 60% of patients with COPD [70], our main analysis included primarily small trials, and only three explicitly included patients with a cardiac comorbidity (one angina [54], two HF [55, 57]), with one further trial including patients with hypertension, a common CVD risk factor [50]. In line with previous research [9], we report no significant FEV1 treatment effect in patients with COPD and CVD.

The remaining eight trials excluded those with CVD (or simply did not report whether it was present), and results mirrored those observed for patients with CVD. Whilst results from this subgroup analysis are encouraging, previous clinical data in this subgroup are scarce. A recent single RCT including COPD patients without an indication for beta-blockers (therefore excluding those with HF, previous MI or revascularization) failed to demonstrate clear benefits of metoprolol over placebo. Observational studies have included a more varied breadth of specific beta-blockers; however, they do not present a clear picture: the population-based Rotterdam Study [71] reported significant decreases in FEV1 associated with both cardioselective and non-cardioselective beta-blockers, while two other studies, one from Scotland [35] and one from Japan [72], reported no significant difference in FEV1. Yet these results may be affected by confounding by indication, which may explain the variability of estimates. Additionally, the longer follow-up times in these studies (ranging from 4 to 6 years) may conflate treatment effects with the natural FEV1 decline documented in patients with COPD, regardless of CVD comorbidities.

Overall, our FEV1 analysis suggests that the beta-blockers included in this review do not affect lung function in patients with COPD regardless of CVD status, and selectivity of the agent does not appear to have an impact. However, the two treatment networks contained different medications (celiprolol was assessed in one trial excluding CVD, labetalol in one trial including CVD), so we cannot rule out differential results had the whole range of beta-blockers been included in both. Finally, we included evidence based on a relatively small population, and some of the studies were conducted decades ago; therefore, large clinical studies are needed to assess other agents which may confer lung function benefits across contemporary COPD patients.

The effect of beta-blocker exposure on all-cause hospitalization and quality-of-life outcomes in patients with COPD could not be quantified, due to a paucity of data. Narrative results from studies investigating quality-of-life outcomes such as SGRQ, 12MWT, 6MWT and SF-36 all suggest a non-significant effect of beta-blockers, in both RCTs and observational studies, albeit based on data deemed of "very low" quality according to GRADE (Additional file 1: Table S17). Currently, COPD management focuses on preventing exacerbations and improving functioning and health-related quality of life. Clinical studies of beta-blocker treatment in cardiac disease suggest improvements in exercise tolerance and functional status, so whether beta-blockers impair or improve these outcomes in patients with COPD as well is a topic of importance for clinical management. Both randomized trials and, importantly, prospective observational studies with longer follow-up times are needed.

Limitations

There are several limitations to our analysis. First, we included only published, peer-reviewed literature; thus, results may be affected by publication bias, as studies reporting positive results (i.e. those that did not find beta-blockers to be associated with negative outcomes) are more likely to be published than negative studies. Nevertheless, our data are based on the most recent available evidence and portray a nuanced picture of specific beta-blocker treatment in patients with COPD, emphasizing the need for targeted treatment of CVD comorbidity in these patients.

We only included stable COPD patients, and whilst we showed that FEV1 reduction (or increase) was not significant according to beta-blocker exposure (apart from propranolol), we could not verify whether these therapeutic agents diminish the response to rescue COPD medication, such as beta-agonists, administered during an exacerbation of COPD. We also did not examine the long-term effects of co-administration of beta-blockers and beta-agonists, or how their interaction may affect outcomes in patients receiving both types of medication.

Another issue is undiagnosed CVD in patients with COPD. Symptoms of ischemic heart disease or HF may be misattributed to, or overlap with, COPD and thus not be formally diagnosed, posing difficulties in disentangling possible non-cardiac effects of beta-blockers, independent of their proven cardiac benefits. One advantage of our FEV1 analysis is that we included only RCTs, in which concomitant CVD is often ascertained more rigorously, and therefore CVD status was known with a greater degree of confidence than may be the case in observational studies.

Furthermore, no statistically significant effect was detected in subgroup analyses stratified by CVD status, which may be due to limited sample size. Future, adequately powered RCTs are needed to assess the effect of beta-blockers in a diverse COPD population, allowing for accurate comparisons based on CVD status to be made.

A recent RCT [12] comparing metoprolol with placebo failed to find a significant effect on FEV1 but reported worsening of dyspnea and overall COPD symptoms, suggestive of respiratory effects not captured by spirometry. This confirms the need to evaluate a spectrum of respiratory outcomes to fully assess the implications of beta-blocker treatment in patients with COPD, which future studies should address.

Confounding by contraindication is likely to affect the interpretation of results: if clinicians knowingly withheld treatment from patients due to concerns regarding breathlessness, this may have reduced the pool of COPD patients who would otherwise have been eligible for beta-blocker therapy. Alternatively, doctors may preferentially prescribe beta-blockers to less severe patients, limiting generalizability.

Our AECOPD analysis is also limited by the low number of included studies, all of which were observational; we identified only one RCT (evaluating metoprolol). This reiterates the need for more carefully conducted RCTs evaluating a range of beta-blockers and their effects on AECOPD, in order to validate observational data.

Findings from this analysis represent the most comprehensive and up-to-date available evidence synthesis to assess the effects of beta-blocker use in patients with COPD, spanning data published over four decades. A reduction in COPD exacerbation risk was inferred from observational data while clinical data were pooled to assess lung function. Mortality and quality of life were narratively described owing to high heterogeneity or sparsity of data, respectively. FEV1 was significantly impacted by propranolol, but not by atenolol, bisoprolol, carvedilol, celiprolol, labetalol or metoprolol. In the subset of individuals with CVD, no individual beta-blocker was associated with a reduction in lung function. Treatment choice in patients with COPD should be made according to CVD comorbidity guidelines on management.

Availability of data and materials

The datasets analyzed during this study are available from the corresponding author upon reasonable request.

Abbreviations

6MWT: 6-Minute walking test
12MWT: 12-Minute walking test
AECOPD: Acute exacerbation due to COPD
BMI: Body mass index
COPD: Chronic obstructive pulmonary disease
CI: Confidence interval
CrI: Credible interval
CVD: Cardiovascular disease
FEV1: Forced expiratory volume in one second
HF: Heart failure
HR: Hazard ratio
NMA: Network meta-analysis
MI: Myocardial infarction
SF-36: Short-Form Health Survey Questionnaire
SGRQ: St. George's Respiratory Questionnaire
SUCRA: Surface under the cumulative ranking
RCT: Randomized controlled trial

Heidenreich PA, McDonald KM, Hastie T, Fadel B, Hagan V, Lee BK, Hlatky MA. Meta-analysis of trials comparing β-blockers, calcium antagonists, and nitrates for stable angina. JAMA. 1999;281(20):1927–36.


Ponikowski P, Voors AA, Anker SD, Bueno H, Cleland JGF, Coats AJS, et al. 2016 ESC Guidelines for the diagnosis and treatment of acute and chronic heart failure: The Task Force for the diagnosis and treatment of acute and chronic heart failure of the European Society of Cardiology (ESC) Developed with the special contribution of the Heart Failure Association (HFA) of the ESC. Eur Heart J. 2016;37(27):2129–200.


Task Force on the management of ST-segment elevation acute myocardial infarction of the European Society of Cardiology (ESC), Steg PG, James SK, Atar D, Badano LP, Blomstrom-Lundqvist C, et al. ESC Guidelines for the management of acute myocardial infarction in patients presenting with ST-segment elevation. Eur Heart J. 2012;33(20):2569–619.


Whelton PK, Carey RM, Aronow WS, Casey DE Jr, Collins KJ, Dennison Himmelfarb C, et al. 2017 ACC/AHA/AAPA/ABC/ACPM/AGS/APhA/ASH/ASPC/NMA/PCNA Guideline for the Prevention, Detection, Evaluation, and Management of High Blood Pressure in Adults: Executive Summary: a report of the American College of Cardiology/American Heart Association Task Force on Clinical Practice Guidelines. Hypertension. 2018;71(6):1269–324.

Salpeter SR, Ormiston TM, Salpeter EE, Poole PJ, Cates CJ. Cardioselective beta-blockers for chronic obstructive pulmonary disease: a meta-analysis. Respir Med. 2003;97(10):1094–101.

Vestbo J, Hurd SS, Agusti AG, Jones PW, Vogelmeier C, Anzueto A, et al. Global strategy for the diagnosis, management, and prevention of chronic obstructive pulmonary disease: GOLD executive summary. Am J Respir Crit Care Med. 2013;187(4):347–65.

Morgan AD, Zakeri R, Quint JK. Defining the relationship between COPD and CVD: what are the implications for clinical practice? Ther Adv Respir Dis. 2018;12:1753465817750524.


Rabe KF, Hurst JR, Suissa S. Cardiovascular disease and COPD: dangerous liaisons? Eur Respir Rev 2018;27(149).

Salpeter S, Ormiston T, Salpeter E. Cardioselective beta-blockers for chronic obstructive pulmonary disease. Cochrane Database Syst Rev. 2016;4:CD003566.


Du Q, Sun Y, Ding N, Lu L, Chen Y. Beta-blockers reduced the risk of mortality and exacerbation in patients with COPD: a meta-analysis of observational studies. PLoS ONE. 2014;9(11):e113048.

Article   PubMed   PubMed Central   CAS   Google Scholar  

Etminan MJS, Carleton B, FitzGerald JM. Beta-blocker use and COPD mortality: a systematic review and meta-analysis. BMC Pulm Med. 2012;12(1):48.

Article   CAS   PubMed   PubMed Central   Google Scholar  

Dransfield MT, Voelker H, Bhatt SP, Brenner K, Casaburi R, Come CE, et al. Metoprolol for the Prevention of Acute Exacerbations of COPD. N Engl J Med. 2019;381(24):2304–14.

Gulea C, Zakeri R, Quint JK. Effect of beta-blocker therapy on clinical outcomes, safety, health-related quality of life and functional capacity in patients with chronic obstructive pulmonary disease (COPD): a protocol for a systematic literature review and meta-analysis with multiple treatment comparison. BMJ Open. 2018;8(11):e024736.

Sterne JA, Hernan MA, Reeves BC, Savovic J, Berkman ND, Viswanathan M, et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ. 2016;355:i4919.

Sterne JAC, Savovic J, Page MJ, Elbers RG, Blencowe NS, Boutron I, et al. RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ. 2019;366:l4898.

Puhan MA, Schunemann HJ, Murad MH, Li T, Brignardello-Petersen R, Singh JA, et al. A GRADE Working Group approach for rating the quality of treatment effect estimates from network meta-analysis. BMJ. 2014;349:g5630.

Higgins JPTJ, Chandler J, Cumpston M, Li T, Page MJ, Welch VA, editors. Cochrane handbook for systematic reviews of interventions. New Jersey: Wiley; 2019.

Higgins JP, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med. 2002;21(11):1539–58.

van Valkenhoef G, Lu G, de Brock B, Hillege H, Ades AE, Welton NJ. Automating network meta-analysis. Res Synth Methods. 2012;3(4):285–99.

Salanti G, Del Giovane C, Chaimani A, Caldwell DM, Higgins JP. Evaluating the quality of evidence from a network meta-analysis. PloS one. 2014;3;9(7):e99682.

Trinquart L, Attiche N, Bafeta A, Porcher R, Ravaud P. Uncertainty in treatment rankings: reanalysis of network meta-analyses of randomized trials. Ann Intern Med. 2016;164(10):666–73.

Furukawa TA, Salanti G, Atkinson LZ, Leucht S, Ruhe HG, Turner EH, et al. Comparative efficacy and acceptability of first-generation and second-generation antidepressants in the acute treatment of major depression: protocol for a network meta-analysis. BMJ Open. 2016;6(7):e010919.

Viechtbauer W. Conducting meta-analyses in R with the metafor package. J Stat Softw. 2010;36(3):1–48.

Article   Google Scholar  

Bhatt SP, Wells JM, Kinney GL, Washko GR Jr, Budoff M, Kim YI, et al. Beta-blockers are associated with a reduction in COPD exacerbations. Thorax. 2016;71(1):8–14.

Coiro S, Girerd N, Rossignol P, Ferreira JP, Maggioni A, Pitt B, et al. Association of beta-blocker treatment with mortality following myocardial infarction in patients with chronic obstructive pulmonary disease and heart failure or left ventricular dysfunction: a propensity matched-cohort analysis from the High-Risk Myocardial Infarction Database Initiative. Eur J Heart Fail. 2017;19(2):271–9.

Ekstrom MP, Hermansson AB, Strom KE. Effects of cardiovascular drugs on mortality in severe chronic obstructive pulmonary disease. Am J Respir Crit Care Med. 2013;187(7):715–20.

Gottlieb SSMR, Vogel RA. Effect of beta-blockade on mortality among high-risk and low-risk patients after myocardial infarction. N Engl J Med. 1998;339(8):489–97.

Hawkins NM, Huang Z, Pieper KS, Solomon SD, Kober L, Velazquez EJ, et al. Chronic obstructive pulmonary disease is an independent predictor of death but not atherosclerotic events in patients with myocardial infarction: analysis of the Valsartan in Acute Myocardial Infarction Trial (VALIANT). Eur J Heart Fail. 2009;11(3):292–8.

Kubota Y, Asai K, Furuse E, Nakamura S, Murai K, Tsukada YT, et al. Impact of beta-blocker selectivity on long-term outcomes in congestive heart failure patients with chronic obstructive pulmonary disease. Int J Chron Obstruct Pulmon Dis. 2015;10:515–23.

Mentz RJ, Wojdyla D, Fiuzat M, Chiswell K, Fonarow GC, O’Connor CM. Association of beta-blocker use and selectivity with outcomes in patients with heart failure and chronic obstructive pulmonary disease (from OPTIMIZE-HF). Am J Cardiol. 2013;111(4):582–7.

Quint JK, Herrett E, Bhaskaran K, Timmis A, Hemingway H, Wedzicha JA, et al. Effect of beta blockers on mortality after myocardial infarction in adults with COPD: population based cohort study of UK electronic healthcare records. BMJ. 2013;347:f6650.

Rodriguez-Manero M, Lopez-Pardo E, Cordero A, Ruano-Ravina A, Novo-Platas J, Pereira-Vazquez M, et al. A prospective study of the clinical outcomes and prognosis associated with comorbid COPD in the atrial fibrillation population. Int J Chron Obstruct Pulmon Dis. 2019;14:371–80.

Rutten FH, Hak Eelko ZNPA, Grobbee DE, Hoes AW. Blockers may reduce mortality and risk of exacerbations in patients with chronic obstructive pulmonary disease. Arch Intern Med. 2010;170(10):880–7.

Scrutinio D, Guida P, Passantino A, Ammirati E, Oliva F, Lagioia R, et al. Acutely decompensated heart failure with chronic obstructive pulmonary disease: clinical characteristics and long-term survival. Eur J Intern Med. 2019;60:31–8.

Short PM, Lipworth SI, Elder DH, Schembri S, Lipworth BJ. Effect of beta blockers in treatment of chronic obstructive pulmonary disease: a retrospective cohort study. BMJ. 2011;342:d2549.

Sin DD, McAlister FA. The effects of beta-blockers on morbidity and mortality in a population-based cohort of 11,942 elderly patients with heart failure. Am J Med. 2002;113(8):650–6.

Staszewsky L, Cortesi L, Tettamanti M, Dal Bo GA, Fortino I, Bortolotti A, et al. Outcomes in patients hospitalized for heart failure and chronic obstructive pulmonary disease: differences in clinical profile and treatment between 2002 and 2009. Eur J Heart Fail. 2016;18(7):840–8.

Su TH, Chang SH, Kuo CF, Liu PH, Chan YL. beta-blockers after acute myocardial infarction in patients with chronic obstructive pulmonary disease: a nationwide population-based observational study. PLoS ONE. 2019;14(3):e0213187.

Su VY, Chang YS, Hu YW, Hung MH, Ou SM, Lee FY, et al. Carvedilol, bisoprolol, and metoprolol use in patients with coexistent heart failure and chronic obstructive pulmonary disease. Medicine (Baltimore). 2016;95(5):e2427.

Article   CAS   PubMed Central   Google Scholar  

Su VYYY, Perng DW, Tsai YH, Chou KT, Su KC, Su WJ, Chen PC, Yang KY. Real-world effectiveness of medications on survival in patients with COPD-heart failure overlap. Aging. 2019;11(11):3650.

van Gestel YR, Hoeks SE, Sin DD, Welten GM, Schouten O, Witteveen HJ, et al. Impact of cardioselective beta-blockers on mortality in patients with chronic obstructive pulmonary disease and atherosclerosis. Am J Respir Crit Care Med. 2008;178(7):695–700.

van Gestel YR, Hoeks SE, Sin DD, Stam H, Mertens FW, Bax JJ, van Domburg RT, Poldermans D. Beta-blockers and health-related quality of life in patients with peripheral arterial disease and COPD. Int J Chronic Obstructive Pulm Dis. 2009;4:177.

Wang WH, Cheng CC, Mar GY, Wei KC, Huang WC, Liu CP. Improving outcomes in chronic obstructive pulmonary disease by taking beta-blockers after acute myocardial infarction: a nationwide observational study. Heart Vessels. 2019;34(7):1158–67.

Zeng LH, Hu YX, Liu L, Zhang M, Cui H. Impact of beta2-agonists, beta-blockers, and their combination on cardiac function in elderly male patients with chronic obstructive pulmonary disease. Clin Interv Aging. 2013;8:1157–65.

CAS   PubMed   PubMed Central   Google Scholar  

Maltais F, Buhl R, Koch A, Amatto VC, Reid J, Gronke L, et al. Beta-blockers in COPD: a cohort study from the TONADO Research Program. Chest. 2018;153(6):1315–25.

Rasmussen DB, Bodtger U, Lamberts M, Torp-Pedersen C, Gislason G, Lange P, et al. Beta-blocker use and acute exacerbations of COPD following myocardial infarction: a Danish nationwide cohort study. Thorax. 2020;75(11):928–33.

Brooks TW, Creekmore F, Young DC, Asche CV, Oberg B, Samuelson WM. Rates of hospitalizations and emergency department visits in patients with asthma and chronic obstructive pulmonary disease taking β-blockers. Pharmacotherapy. 2007;27(5):684–90.

Farland MZ, Peters CJ, Williams JD, Bielak KM, Heidel RE, Ray SM. beta-Blocker use and incidence of chronic obstructive pulmonary disease exacerbations. Ann Pharmacother. 2013;47(5):651–6.

Article   PubMed   CAS   Google Scholar  

Sessa M, Mascolo A, Mortensen RN, Andersen MP, Rosano GMC, Capuano A, et al. Relationship between heart failure, concurrent chronic obstructive pulmonary disease and beta-blocker use: a Danish nationwide cohort study. Eur J Heart Fail. 2018;20(3):548–56.

Adam WR, Meagher EJ, Barter CE. Labetalol, beta blockers, and acute deterioration of chronic airway obstruction. Clin Exp Hypertens A. 1982;4(8):1419–28.

CAS   PubMed   Google Scholar  

Chang CL, Mills GD, McLachlan JD, Karalus NC, Hancox RJ. Cardio-selective and non-selective beta-blockers in chronic obstructive pulmonary disease: effects on bronchodilator response and exercise. Intern Med J. 2010;40(3):193–200.

Chester EH, Schwartz HJ, Fleming GM. Adverse effect of propranolol on airway function in nonasthmatic chronic obstructive lung disease. Chest. 1981;79(5):540–4.

Sinclair DJ. Comparison of effects of propranolol and metoprolol on airways obstruction in chronic bronchitis. Br Med J. 1979;1(6157):168.

Dorow PBH, Tönnesmann U. Effects of single oral doses of bisoprolol and atenolol on airway function in nonasthmatic chronic obstructive lung disease and angina pectoris. Eur J Clin Pharmacol. 1986;31(2):143–7.

Hawkins NM, MacDonald MR, Petrie MC, Chalmers GW, Carter R, Dunn FG, et al. Bisoprolol in patients with heart failure and moderate to severe chronic obstructive pulmonary disease: a randomized controlled trial. Eur J Heart Fail. 2009;11(7):684–90.

Jabbal S, Anderson W, Short P, Morrison A, Manoharan A, Lipworth BJ. Cardiopulmonary interactions with beta-blockers and inhaled therapy in COPD. QJM. 2017;110(12):785–92.

Lainscak M, Podbregar M, Kovacic D, Rozman J, von Haehling S. Differences between bisoprolol and carvedilol in patients with chronic heart failure and chronic obstructive pulmonary disease: a randomized trial. Respir Med. 2011;105:S44–9.

Mainguy V, Girard D, Maltais F, Saey D, Milot J, Senechal M, et al. Effect of bisoprolol on respiratory function and exercise capacity in chronic obstructive pulmonary disease. Am J Cardiol. 2012;110(2):258–63.

McGavin CRWI. The effects of oral propranolol and metoprolol on lung function and exercise performance in chronic airways obstruction. Br J Dis Chest. 1978;72:327–32.

Ranchod SR. The effect of beta-blockers on ventilatory function in chronic bronchitis. South African Med J. 1982;61(12):423–4.

CAS   Google Scholar  

van der Woude HJZJ, Postma DS, Winter TH, van Hulst M, Aalbers R. Detrimental effects of β-blockers in COPD: a concern for nonselective β-blockers. Chest. 2005;127(3):818–24.

Butland RJPJ, Geddes DM. Effect of beta-adrenergic blockade on hyperventilation and exercise tolerance in emphysema. J Appl Physiol. 1983;54(5):1368–73.

Jabbal S, Lipworth BJ. Tolerability of bisoprolol on domiciliary spirometry in COPD. Lung. 2018;196(1):11–4.

Ellingsen J, Johansson G, Larsson K, Lisspers K, Malinovschi A, Stallberg B, et al. Impact of comorbidities and commonly used drugs on mortality in COPD—Real-world data from a primary care setting. Int J Chron Obstruct Pulmon Dis. 2020;15:235–45.

Sin DD, Anthonisen NR, Soriano JB, Agusti AG. Mortality in COPD: role of comorbidities. Eur Respir J. 2006;28(6):1245–57.

Dransfield MT, Rowe SM, Johnson JE, Bailey WC, Gerald LB. Use of beta blockers and the risk of death in hospitalised patients with acute exacerbations of COPD. Thorax. 2008;63(4):301–5.

Yang YL, Xiang ZJ, Yang JH, Wang WJ, Xu ZC, Xiang RL. Association of β-blocker use with survival and pulmonary function in patients with chronic obstructive pulmonary and cardiovascular disease: a systematic review and meta-analysis. Eur Heart J. 2020;41(46):4415–22.

Hjalmarson ÅHJ, Malek I, Ryden L, Vedin A, Waldenström A, Wedel H, Elmfeldt D, Holmberg S, Nyberg G, Swedberg K. Effect on mortality of metoprolol in acute myocardial infarction: a double-blind randomised trial. Lancet. 1981;17(318):823–7.

Hjalmarson ÅGS, Fagerberg B, Wedel H, Waagstein F, Kjekshus J, Wikstrand J, El Allaf D, Vítovec J, Aldershvile J, Halinen M. Effects of controlled-release metoprolol on total mortality, hospitalizations, and well-being in patients with heart failure: the Metoprolol CR/XL Randomized Intervention Trial in congestive heart failure (MERIT-HF). JAMA. 2000;283(10):1295–302.

Loth DW, Brusselle GG, Lahousse L, Hofman A, Leufkens HG, Stricker BH. beta-Adrenoceptor blockers and pulmonary function in the general population: the Rotterdam Study. Br J Clin Pharmacol. 2014;77(1):190–200.

Oda N, Miyahara N, Ichikawa H, Tanimoto Y, Kajimoto K, Sakugawa M, et al. Long-term effects of beta-blocker use on lung function in Japanese patients with chronic obstructive pulmonary disease. Int J Chron Obstruct Pulmon Dis. 2017;12:1119–24.

Download references

CG is funded by an NHLI studentship.

Author information

Authors and affiliations

National Heart and Lung Institute, Imperial College London, Manresa Road, London, UK

Claudia Gulea & Jennifer K. Quint

NIHR Imperial Biomedical Research Centre, London, UK

British Heart Foundation Centre for Research Excellence, King’s College London, London, UK

Rosita Zakeri

Homerton University Hospital NHS Foundation Trust, London, UK

Vanessa Alderman

Epsom and St. Helier University Hospitals NHS Trust, Epsom, UK

Alexander Morgan

Guy’s & St Thomas’ NHS Foundation Trust, London, UK

Royal Brompton & Harefield NHS Foundation Trust, London, UK

Jennifer K. Quint


Contributions

CG, RZ and JKQ made substantial contributions to the conception and design of the study. CG, VA, AM and JR screened abstracts and full-texts and extracted the data. CG carried out statistical analyses and wrote the first draft. CG, RZ, JKQ, VA, AM and JR contributed to data interpretation and provided revisions to the final manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Claudia Gulea.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Competing interests

CG, RZ, VA, AM and JR have no conflict of interest. JKQ’s research group has received funds from AZ, GSK, The Health Foundation, MRC, British Lung Foundation, IQVIA, Chiesi, and Asthma UK outside the submitted work; grants and personal fees from GlaxoSmithKline, Boehringer Ingelheim, AstraZeneca, Bayer, Insmed outside the submitted work.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Figure S1.

Forest plot illustrating results of the meta-analysis evaluating the impact of beta-blocker therapy vs. no beta-blocker therapy on AECOPD in patients with COPD. Figure S2. Consistency results illustrating no significant difference between direct and indirect evidence across all comparisons that were assessed in the FEV1 network meta-analysis. Figure S3. Comparison-adjusted funnel plot. Figure S4. Network meta-analysis with meta-regression results (long vs. short follow-up). Figure S5. Network meta-analysis results for patients with COPD A) without explicit cardiovascular disease; B) with cardiovascular disease. Figure S6. Rankogram illustrating probabilities of being 1st, 2nd, 3rd…7th with respect to improvement in lung function, for each beta-blocker (and placebo) in patients with COPD without explicit cardiovascular disease. Figure S7. Rankogram illustrating probabilities of being 1st, 2nd, 3rd…7th with respect to improvement in lung function for each beta-blocker (and placebo) in patients with COPD with cardiovascular disease. Figure S8. Forest plot showing hazard ratios associated with A) cardioselective beta-blockers and B) non-cardioselective beta-blockers and mortality in patients with COPD. Figure S9. Risk of bias assessment, RCTs. Table S1. Screening criteria. Table S2. Summary of observational studies. Table S3. Patient characteristics—observational studies. Table S4. AECOPD estimates for beta-blocker versus no beta-blocker use, from individual observational studies. Table S5. Study characteristics—RCTs. Table S6. Baseline characteristics—RCTs. Table S7. FEV1 measurements—RCTs. Table S8. Network meta-analysis results—league table. Table S9. SUCRA ranking probability of being the best treatment. Table S10. Mortality estimates for beta-blocker versus no beta-blocker use, from individual studies. Table S11. All-cause hospitalization results. Table S12. SGRQ results. Table S13. 12MWT results. Table S14. 6MWT results. Table S15. SF-36 results. Table S16. Risk of bias assessment, observational studies. Table S17. GRADE assessment (mortality, quality of life). Table S18. GRADE assessment (AECOPD). Table S19. GRADE assessment from each pair-wise comparison within the NMA network (FEV1 analysis).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Gulea, C., Zakeri, R., Alderman, V. et al. Beta-blocker therapy in patients with COPD: a systematic literature review and meta-analysis with multiple treatment comparison. Respir Res 22, 64 (2021). https://doi.org/10.1186/s12931-021-01661-8


Received : 21 December 2020

Accepted : 10 February 2021

Published : 23 February 2021

DOI : https://doi.org/10.1186/s12931-021-01661-8


Keywords: Beta-blockers

Respiratory Research

ISSN: 1465-993X

Hippokratia. 2010 Dec;14(Suppl 1)

Meta-analysis in medical research

The objectives of this paper are to provide an introduction to meta-analysis and to discuss the rationale for this type of research and other general considerations. Methods used to produce a rigorous meta-analysis are highlighted and some aspects of presentation and interpretation of meta-analysis are discussed.

Meta-analysis is a quantitative, formal, epidemiological study design used to systematically assess previous research studies to derive conclusions about that body of research. Outcomes from a meta-analysis may include a more precise estimate of the effect of treatment or risk factor for disease, or other outcomes, than any individual study contributing to the pooled analysis. The examination of variability or heterogeneity in study results is also a critical outcome. The benefits of meta-analysis include a consolidated and quantitative review of a large, and often complex, sometimes apparently conflicting, body of literature. The specification of the outcome and hypotheses that are tested is critical to the conduct of meta-analyses, as is a sensitive literature search. A failure to identify the majority of existing studies can lead to erroneous conclusions; however, there are methods of examining data to identify the potential for studies to be missing; for example, by the use of funnel plots. Rigorously conducted meta-analyses are useful tools in evidence-based medicine. The need to integrate findings from many studies ensures that meta-analytic research is desirable and the large body of research now generated makes the conduct of this research feasible.

Important medical questions are typically studied more than once, often by different research teams in different locations. In many instances, the results of these multiple small studies of an issue are diverse and conflicting, which makes clinical decision-making difficult. The need to arrive at decisions affecting clinical practice fostered the momentum toward "evidence-based medicine" 1 – 2 . Evidence-based medicine may be defined as the systematic, quantitative, preferentially experimental approach to obtaining and using medical information. Therefore, meta-analysis, a statistical procedure that integrates the results of several independent studies, plays a central role in evidence-based medicine. In fact, in the hierarchy of evidence ( Figure 1 ), where clinical evidence is ranked according to its freedom from the various biases that beset medical research, meta-analyses are at the top. In contrast, animal research, laboratory studies, case series and case reports have little clinical value as proof, hence their place at the bottom.

[Figure 1: the hierarchy of clinical evidence]

Meta-analysis did not begin to appear regularly in the medical literature until the late 1970s but since then a plethora of meta-analyses have emerged and the growth is exponential over time ( Figure 2 ) 3 . Moreover, it has been shown that meta-analyses are the most frequently cited form of clinical research 4 . The merits and perils of the somewhat mysterious procedure of meta-analysis, however, continue to be debated in the medical community 5 – 8 . The objectives of this paper are to introduce meta-analysis and to discuss the rationale for this type of research and other general considerations.

[Figure 2: growth of meta-analyses in the medical literature over time]

Meta-Analysis and Systematic Review

Glass first defined meta-analysis in the social science literature as "The statistical analysis of a large collection of analysis results from individual studies for the purpose of integrating the findings" 9 . Meta-analysis is a quantitative, formal, epidemiological study design used to systematically assess the results of previous research to derive conclusions about that body of research. Typically, but not necessarily, the study is based on randomized, controlled clinical trials. Outcomes from a meta-analysis may include a more precise estimate of the effect of treatment or risk factor for disease, or other outcomes, than any individual study contributing to the pooled analysis. Identifying sources of variation in responses; that is, examining heterogeneity of a group of studies, and generalizability of responses can lead to more effective treatments or modifications of management. Examination of heterogeneity is perhaps the most important task in meta-analysis. The Cochrane collaboration has been a long-standing, rigorous, and innovative leader in developing methods in the field 10 . Major contributions include the development of protocols that provide structure for literature search methods, and new and extended analytic and diagnostic methods for evaluating the output of meta-analyses. Use of the methods outlined in the handbook should provide a consistent approach to the conduct of meta-analysis. Moreover, a useful guide to improve reporting of systematic reviews and meta-analyses is the PRISMA (Preferred Reporting Items for Systematic reviews and Meta-analyses) statement that replaced the QUOROM (QUality Of Reporting of Meta-analyses) statement 11 – 13 .
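The heterogeneity examination described above is usually quantified with Cochran's Q statistic and Higgins and Thompson's I² (the percentage of variation across studies attributable to heterogeneity rather than chance). As a rough illustration only, here is a minimal, stdlib-only Python sketch; the study effects and variances are invented, not taken from any real review:

```python
def heterogeneity(effects, variances):
    """Cochran's Q and Higgins' I^2 for a set of study effect estimates
    (e.g. log odds ratios) and their within-study variances."""
    w = [1.0 / v for v in variances]                  # inverse-variance weights
    pooled = sum(wi * ei for wi, ei in zip(w, effects)) / sum(w)
    q = sum(wi * (ei - pooled) ** 2 for wi, ei in zip(w, effects))
    df = len(effects) - 1
    # I^2: percentage of total variation across studies due to heterogeneity
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

# hypothetical effect estimates and variances from four small trials
q, i2 = heterogeneity([0.2, 0.3, 0.25, 0.9], [0.04, 0.05, 0.03, 0.06])
```

An I² near 0% suggests the studies estimate a common effect; values toward 100% signal substantial heterogeneity that should be explored rather than averaged away.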

Meta-analyses are a subset of systematic reviews. A systematic review attempts to collate empirical evidence that fits prespecified eligibility criteria to answer a specific research question. The key characteristics of a systematic review are a clearly stated set of objectives with predefined eligibility criteria for studies; an explicit, reproducible methodology; a systematic search that attempts to identify all studies that meet the eligibility criteria; an assessment of the validity of the findings of the included studies (e.g., through the assessment of risk of bias); and a systematic presentation and synthesis of the attributes and findings from the studies used. Systematic methods are used to minimize bias, thus providing more reliable findings from which conclusions can be drawn and decisions made than traditional review methods 14 , 15 . Systematic reviews need not contain a meta-analysis; there are times when it is not appropriate or possible. Nevertheless, many systematic reviews contain meta-analyses 16 .

The inclusion of observational medical studies in meta-analyses led to considerable debate over the validity of meta-analytical approaches, as there was necessarily a concern that the observational studies were likely to be subject to unidentified sources of confounding and risk modification 17 . Pooling such findings may not lead to more certain outcomes. Moreover, an empirical study showed that in meta-analyses in which both randomized and non-randomized studies were included, the non-randomized studies tended to show larger treatment effects 18 .

Meta-analyses are conducted to assess the strength of evidence present on a disease and treatment. One aim is to determine whether an effect exists; another aim is to determine whether the effect is positive or negative and, ideally, to obtain a single summary estimate of the effect. The results of a meta-analysis can improve precision of estimates of effect, answer questions not posed by the individual studies, settle controversies arising from apparently conflicting studies, and generate new hypotheses. In particular, the examination of heterogeneity is vital to the development of new hypotheses.
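The single summary estimate mentioned above is typically an inverse-variance weighted average of the study effects. As a hedged sketch (not the method of any particular review), here is the widely used DerSimonian-Laird random-effects calculation in stdlib-only Python, with invented numbers for illustration:

```python
import math

def dersimonian_laird(effects, variances):
    """Random-effects summary estimate (DerSimonian-Laird) with a 95% CI.
    Inputs: per-study effects (e.g. log odds ratios) and their variances."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * ei for wi, ei in zip(w, effects)) / sw   # fixed-effect mean
    q = sum(wi * (ei - fixed) ** 2 for wi, ei in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                  # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]   # random-effects weights
    est = sum(wi * ei for wi, ei in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return est, est - 1.96 * se, est + 1.96 * se

# toy data: four hypothetical studies
est, lo, hi = dersimonian_laird([0.2, 0.3, 0.25, 0.9], [0.04, 0.05, 0.03, 0.06])
```

When the between-study variance tau² is estimated as zero, the result collapses to the fixed-effect estimate; otherwise the confidence interval widens to reflect heterogeneity across studies.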

Individual or Aggregated Data

The majority of meta-analyses are based on a series of studies to produce a point estimate of an effect and measures of the precision of that estimate. However, methods have been developed for meta-analyses to be conducted on the individual-level data obtained from the original trials 19 , 20 . This approach may be considered the "gold standard" in meta-analysis because it offers advantages over analyses using aggregated data, including a greater ability to validate the quality of data and to conduct appropriate statistical analysis. Further, it is easier to explore differences in effect across subgroups within the study population than with aggregated data. The use of standardized individual-level information may help to avoid the problems encountered in meta-analyses of prognostic factors 21 , 22 . It is the best way to obtain a more global picture of the natural history and predictors of risk for major outcomes, such as in scleroderma 23 – 26 . This approach relies on cooperation between the researchers who conducted the relevant studies. Researchers who are aware of the potential to contribute to or conduct these studies will provide and obtain additional benefits by careful maintenance of original databases and by making these available for future studies.

Literature Search

A sound meta-analysis is characterized by a thorough and disciplined literature search. A clear definition of hypotheses to be investigated provides the framework for such an investigation. According to the PRISMA statement, an explicit statement of questions being addressed with reference to participants, interventions, comparisons, outcomes and study design (PICOS) should be provided 11 , 12 . It is important to obtain all relevant studies, because loss of studies can lead to bias in the study. Typically, published papers and abstracts are identified by a computerized literature search of electronic databases that can include PubMed ( www.ncbi.nlm.nih.gov./entrez/query.fcgi ), ScienceDirect ( www.sciencedirect.com ), Scirus ( www.scirus.com/srsapp ), ISI Web of Knowledge ( http://www.isiwebofknowledge.com ), Google Scholar ( http://scholar.google.com ) and CENTRAL (Cochrane Central Register of Controlled Trials, http://www.mrw.interscience.wiley.com/cochrane/cochrane_clcentral_articles_fs.htm ). The PRISMA statement recommends that a full electronic search strategy for at least one major database be presented 12 . Database searches should be augmented with hand searches of library resources for relevant papers, books, abstracts, and conference proceedings. Crosschecking of references, citations in review papers, and communication with scientists who have been working in the relevant field are important methods used to provide a comprehensive search. Communication with pharmaceutical companies manufacturing and distributing test products can be appropriate for studies examining the use of pharmaceutical interventions.

It is not feasible to find absolutely every relevant study on a subject. Some or even many studies may not be published, and those that are might not be indexed in computer-searchable databases. Useful sources for unpublished trials are the clinical trials registers, such as the National Library of Medicine's ClinicalTrials.gov Website. The reviews should attempt to be sensitive; that is, find as many studies as possible, to minimize bias and be efficient. It may be appropriate to frame a hypothesis that considers the time over which a study is conducted or to target a particular subpopulation. The decision whether to include unpublished studies is difficult. Although the language of publication can present a difficulty, it is important to overcome it, provided that the populations studied are relevant to the hypothesis being tested.

Inclusion or Exclusion Criteria and Potential for Bias

Studies are chosen for meta-analysis based on inclusion criteria. If there is more than one hypothesis to be tested, separate selection criteria should be defined for each hypothesis. Inclusion criteria are ideally defined at the stage of initial development of the study protocol. The rationale for the criteria for study selection used should be clearly stated.

One important potential source of bias in meta-analysis is the loss of trials and subjects. Ideally, all randomized subjects in all studies satisfy all of the trial selection criteria, comply with all the trial procedures, and provide complete data. Under these conditions, an "intention-to-treat" analysis is straightforward to implement; that is, statistical analysis is conducted on all subjects that are enrolled in a study rather than only those that complete all stages of the study considered desirable. Some empirical studies have shown that certain methodological characteristics, such as poor concealment of treatment allocation or lack of blinding, exaggerate treatment effects 27 . Therefore, it is important to critically appraise the quality of studies in order to assess the risk of bias.

The study design, including details of the method of randomization of subjects to treatment groups, criteria for eligibility, blinding, method of assessing the outcome, and handling of protocol deviations, comprises important features defining study quality. When studies are excluded from a meta-analysis, reasons for exclusion should be provided for each excluded study. Usually, more than one assessor decides independently which studies to include or exclude, using a well-defined checklist and a pre-specified procedure for resolving disagreements. Typically, two people familiar with the study topic perform the quality assessment for each study independently, followed by a consensus meeting to discuss the studies excluded or included. In practice, blinding reviewers to details of a study such as authorship and journal source is difficult.

Before assessing study quality, a quality assessment protocol and data forms should be developed. The goal of this process is to reduce the risk of bias in the estimate of effect. Quality scores that summarize multiple components into a single number exist but are misleading and unhelpful 28. Rather, investigators should assess individual components of quality, describe trials that do not meet the specified quality standards, and, as part of the sensitivity analyses, assess the effect on the overall results of excluding those trials.

Further, not all studies are completed, because of protocol failure, treatment failure, or other factors. Nonetheless, missing subjects and studies can provide important evidence. It is desirable to obtain data from all relevant randomized trials so that the most appropriate analysis can be undertaken. Previous studies have discussed the significance of missing trials for the interpretation of intervention studies in medicine 29, 30. Journal editors and reviewers need to be aware of the existing bias toward publishing positive findings and should ensure that negative or even failed trials are published, as long as they meet the quality guidelines for publication.

There are occasions when authors of the selected papers have chosen different outcome criteria for their main analysis. In practice, it may be necessary to revise the inclusion criteria for a meta-analysis after reviewing all of the studies found through the search strategy. Variation among studies reflects the type of study design used, the type and application of experimental and control therapies, whether or not the study was published and, if published, subjected to peer review, and the definition used for the outcome of interest. There are no standardized criteria for inclusion of studies in meta-analysis; universal criteria would not be appropriate, because meta-analysis can be applied to a broad spectrum of topics. Published data in journal papers should also be cross-checked against conference papers to avoid duplication of presented data.

Clearly, unpublished studies are not found by searching the literature. It is possible that published studies are systematically different from unpublished studies; for example, positive trial findings may be more likely to be published. Therefore, a meta-analysis based on a literature search alone may suffer from publication bias.

Efforts to minimize this potential bias include working from the references in published studies, searching computerized databases of unpublished material, and investigating other sources of information including conference proceedings, graduate dissertations and clinical trial registers.

Statistical analysis

The most common measures of effect used for dichotomous data are the risk ratio (also called relative risk) and the odds ratio. The dominant approach for continuous data is standardized mean difference (SMD) estimation. Methods used in meta-analysis for post hoc analysis of findings are relatively specific to meta-analysis and include heterogeneity analysis, sensitivity analysis, and evaluation of publication bias.
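For a single study with a dichotomous outcome, both measures can be computed directly from the study's 2×2 table. The following is a minimal illustrative sketch (the function name and the counts in the usage line are invented for the example), working on the log scale as is conventional for pooling:

```python
import math

def effect_measures(a, b, c, d):
    """Risk ratio and odds ratio from a 2x2 table:
    a/b = events/non-events in the treated group, c/d = in the control group."""
    rr = (a / (a + b)) / (c / (c + d))      # risk ratio (relative risk)
    odds_ratio = (a * d) / (b * c)          # odds ratio
    # standard errors on the log scale (Woolf's method for the odds ratio)
    se_log_rr = math.sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))
    se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
    return rr, odds_ratio, se_log_rr, se_log_or

# e.g., 10/100 events under treatment vs 20/100 under control
rr, or_, se_rr, se_or = effect_measures(10, 90, 20, 80)
```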

All methods used should allow for the weighting of studies. The concept of weighting reflects the value of the evidence of any particular study. Usually, studies are weighted according to the inverse of their variance 31 . It is important to recognize that smaller studies, therefore, usually contribute less to the estimates of overall effect. However, well-conducted studies with tight control of measurement variation and sources of confounding contribute more to estimates of overall effect than a study of identical size less well conducted.
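Inverse-variance weighting as described above can be sketched in a few lines; larger, more precise studies (smaller variances) contribute more to the pooled estimate. This is an illustrative sketch, not a full meta-analysis implementation:

```python
def fixed_effect_pool(effects, variances):
    """Inverse-variance (fixed-effect) pooling of per-study effect estimates.
    Returns the pooled estimate and its variance."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return pooled, 1.0 / sum(weights)
```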

One of the foremost decisions to be made when conducting a meta-analysis is whether to use a fixed-effects or a random-effects model. A fixed-effects model is based on the assumption that the sole source of variation in observed outcomes is that occurring within each study; that is, the effect expected from each study is the same. Consequently, it is assumed that the studies are homogeneous: there are no differences in the underlying study population, no differences in subject selection criteria, and treatments are applied in the same way 32. Fixed-effect methods for dichotomous data most often include the Mantel-Haenszel method 33 and the Peto method 34 (the latter only for odds ratios).

Random-effects models assume that a distribution of true effects exists, resulting in heterogeneity among study results quantified by the between-study variance τ2. As software has improved, random-effects models, which require greater computing power, have been conducted more frequently. This is desirable because the strong assumption that the effect of interest is the same in all studies is frequently untenable. Moreover, the fixed-effects model is not appropriate when statistical heterogeneity (τ2) is present in the results of the studies in the meta-analysis. In the random-effects model, studies are weighted by the inverse of their variance plus the heterogeneity parameter; it is therefore usually a more conservative approach, with wider confidence intervals than the fixed-effects model, in which studies are weighted only by the inverse of their variance. The most commonly used random-effects method is the DerSimonian and Laird method 35. Furthermore, it is suggested that the fixed-effects and random-effects models be compared, as this process can yield insights into the data 36.
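The DerSimonian and Laird method estimates τ2 by the method of moments from Cochran's Q and folds it into the study weights. A simplified sketch under the usual assumptions (per-study effect estimates with known within-study variances):

```python
def dersimonian_laird(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimate."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)       # method of moments, truncated at 0
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return pooled, 1.0 / sum(w_star), tau2
```

Note that when τ2 = 0 the weights reduce to the fixed-effect inverse-variance weights, so the two models coincide in the absence of heterogeneity.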

Heterogeneity

Arguably, the greatest benefit of conducting a meta-analysis is the ability to examine sources of heterogeneity, if present, among studies. If heterogeneity is present, the summary measure must be interpreted with caution 37, and one should question whether and how to generalize the results. Understanding sources of heterogeneity leads to more effective targeting of prevention and treatment strategies and to the identification of new research topics. Part of the strategy in conducting a meta-analysis is to identify factors that may be significant determinants of subpopulations to analyze, or covariates that may be appropriate to explore in all studies.

To understand the nature of variability in studies, it is important to distinguish between different sources of heterogeneity. Variability in the participants, interventions, and outcomes studied has been described as clinical diversity, and variability in study design and risk of bias has been described as methodological diversity 10 . Variability in the intervention effects being evaluated among the different studies is known as statistical heterogeneity and is a consequence of clinical or methodological diversity, or both, among the studies. Statistical heterogeneity manifests itself in the observed intervention effects varying by more than the differences expected among studies that would be attributable to random error alone. Usually, in the literature, statistical heterogeneity is simply referred to as heterogeneity.

Clinical variation will cause heterogeneity if the intervention effect is modified by the factors that vary across studies; most obviously, the specific interventions or participant characteristics that are often reflected in different levels of risk in the control group when the outcome is dichotomous. In other words, the true intervention effect will differ for different studies. Differences between studies in terms of methods used, such as use of blinding or differences between studies in the definition or measurement of outcomes, may lead to differences in observed effects. Significant statistical heterogeneity arising from differences in methods used or differences in outcome assessments suggests that the studies are not all estimating the same effect, but does not necessarily suggest that the true intervention effect varies. In particular, heterogeneity associated solely with methodological diversity indicates that studies suffer from different degrees of bias. Empirical evidence suggests that some aspects of design can affect the result of clinical trials, although this may not always be the case.

The scope of a meta-analysis will largely determine the extent to which studies included in a review are diverse. Meta-analysis should be conducted when a group of studies is sufficiently homogeneous in terms of subjects involved, interventions, and outcomes to provide a meaningful summary. However, it is often appropriate to take a broader perspective in a meta-analysis than in a single clinical trial. Combining studies that differ substantially in design and other factors can yield a meaningless summary result, but the evaluation of reasons for the heterogeneity among studies can be very insightful. It may be argued that these studies are of intrinsic interest on their own, even though it is not appropriate to produce a single summary estimate of effect.

Variation among k trials is usually assessed using Cochran's Q statistic, a chi-squared (χ2) test of heterogeneity with k-1 degrees of freedom. This test has relatively poor power to detect heterogeneity when the number of trials is small; consequently, an α-level of 0.10 is often used to test the hypothesis 38, 39.

Heterogeneity of results among trials is better quantified using the inconsistency index I2, which describes the percentage of total variation across studies that is attributable to heterogeneity rather than chance 40. Uncertainty intervals for I2 (dependent on Q and k) are calculated using the method described by Higgins and Thompson 41. Negative values of I2 are set to zero, so I2 lies between 0% and 100%. A value >75% may be considered substantial heterogeneity 41. This statistic is less influenced by the number of trials than other estimates of heterogeneity and provides a logical and readily interpretable metric, but it can still be unstable when only a few studies are combined 42.
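Both statistics follow directly from the inverse-variance weights. A minimal sketch, for illustration only (it omits the uncertainty interval for I2):

```python
def heterogeneity(effects, variances):
    """Cochran's Q and the I^2 inconsistency index (in %)."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    # I^2 = 100% * (Q - df) / Q, truncated at zero
    i2 = max(0.0, 100.0 * (q - (k - 1)) / q) if q > 0 else 0.0
    return q, i2
```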

Given that there are several potential sources of heterogeneity in the data, several steps should be considered in investigating their causes. Although random-effects models are appropriate, it may still be very desirable to examine the data to identify sources of heterogeneity and to take steps, where appropriate, to produce models with a lower level of heterogeneity. Further, if the studies examined are highly heterogeneous, it may not be appropriate to present an overall summary estimate, even when random-effects models are used. As Petitti notes 43, statistical analysis alone will not make contradictory studies agree; critically, one should use common sense in decision-making. Despite heterogeneity in responses, if all studies had a point estimate in the positive direction and the pooled confidence interval did not include zero, it would not be logical to conclude that there was no positive effect, provided that sufficient studies and subject numbers were present. It is the appropriateness of the point estimate of the effect that is much more in question.

Among the ways to investigate the reasons for heterogeneity are subgroup analysis and meta-regression. The subgroup analysis approach, a variation on those described above, groups categories of subjects (e.g., by age or sex) to compare effect sizes. The meta-regression approach uses regression analysis to determine the influence of selected variables (the independent variables) on the effect size (the dependent variable). In a meta-regression, studies are treated as if they were individual patients, but their effects are properly weighted to account for their different variances 44.

Sensitivity analyses have also been used to examine the effects of studies identified as being aberrant concerning conduct or result, or being highly influential in the analysis. Recently, another method has been proposed that reduces the weight of studies that are outliers in meta-analyses 45 . All of these methods for examining heterogeneity have merit, and the variety of methods available reflects the importance of this activity.

Presentation of results

A useful graph, presented in the PRISMA statement 11 , is the four-phase flow diagram ( Figure 3 ).


This flow-diagram depicts the flow of information through the different phases of a systematic review or meta-analysis. It maps out the number of records identified, included and excluded, and the reasons for exclusions. The results of meta-analyses are often presented in a forest plot, where each study is shown with its effect size and the corresponding 95% confidence interval ( Figure 4 ).


The pooled effect and its 95% confidence interval are shown at the bottom, on the line labeled "Overall". In the right panel of Figure 4 , the cumulative meta-analysis is graphically displayed, where data are entered successively, typically in the order of their chronological appearance 46, 47. Such a cumulative meta-analysis can retrospectively identify the point in time when a treatment effect first reached conventional levels of significance. Cumulative meta-analysis is a compelling way to examine trends in the evolution of the summary effect size and to assess the impact of a specific study on the overall conclusions 46. The figure shows that many studies were performed long after cumulative meta-analysis would have shown a significant beneficial effect of antibiotic prophylaxis in colon surgery.
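A cumulative meta-analysis is simply the pooled estimate recomputed after each study is added in chronological order. A sketch using fixed-effect (inverse-variance) pooling for simplicity:

```python
def cumulative_meta(effects, variances):
    """Pooled estimate after each successive study (chronological order)."""
    trace = []
    for i in range(1, len(effects) + 1):
        w = [1.0 / v for v in variances[:i]]
        trace.append(sum(wi * yi for wi, yi in zip(w, effects[:i])) / sum(w))
    return trace
```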

Biases in meta-analysis

Although the intent of a meta-analysis is to find and assess all studies meeting the inclusion criteria, this is not always possible. A critical concern is the papers that may have been missed. There is good reason for this concern, because studies with significant, positive results (positive studies) are more likely to be published and, in the case of interventions with a commercial value, to be promoted, than studies with non-significant or "negative" results (negative studies). Studies that produce a positive result, especially large studies, are more likely to have been published, and, conversely, there has been a reluctance to publish small studies with non-significant results. Further, publication bias is not solely the responsibility of editorial policy, as researchers themselves are reluctant to publish results that are uninteresting or come from studies that were not randomized 48. There are, however, problems with simply including all studies that have failed to meet peer-review standards. All methods of retrospectively dealing with bias in studies are imperfect.

It is important to examine the results of each meta-analysis for evidence of publication bias. An estimate of the likely size of the publication bias in the review, and an approach to dealing with it, is inherent to the conduct of many meta-analyses. Several methods have been developed to assess publication bias; the most commonly used is the funnel plot. The funnel plot provides a graphical evaluation of the potential for bias; it was developed by Light and Pillemer 49 and discussed in detail by Egger and colleagues 50, 51. A funnel plot is a scatterplot of treatment effect against a measure of study size. If publication bias is not present, the plot is expected to have a symmetric inverted funnel shape, as shown in Figure 5A .


In the absence of publication bias, larger studies (i.e., those with lower standard error) cluster closely around the point estimate. As studies become less precise, as in smaller trials (i.e., those with higher standard error), their results can be expected to be more variable and are scattered to both sides of the more precise larger studies. Figure 5A shows that the smaller, less precise studies are indeed scattered symmetrically to both sides of the point estimate of effect, forming an inverted funnel and showing no evidence of publication bias. In contrast, Figure 5B shows evidence of publication bias: it appears that studies with smaller numbers of subjects showing a decrease in effect size (lower odds ratio) were not published.

Asymmetry of funnel plots is not solely attributable to publication bias; it may also result from clinical heterogeneity among studies (for example, differences in control or exposure of subjects to confounders or effect modifiers) or from methodological heterogeneity between studies (for example, failure to conceal treatment allocation). There are several statistical tests for detecting funnel plot asymmetry, such as Egger's linear regression test 50 and Begg's rank correlation test 52, but these have limited power and are rarely used. The funnel plot itself is not without problems: if high-precision studies truly differ from low-precision studies with respect to effect size (e.g., due to different populations examined), a funnel plot may give a false impression of publication bias 53. The appearance of the funnel plot can also change quite dramatically depending on the scale used on the y-axis, for example the inverse squared error versus the trial size 54.
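Egger's test regresses the standardized effect (effect divided by its standard error) on precision (one divided by the standard error); an intercept far from zero suggests small-study asymmetry. A simplified sketch that computes only the intercept by ordinary least squares, omitting the significance test:

```python
def egger_intercept(effects, std_errors):
    """Intercept of the Egger regression of y/SE on 1/SE (ordinary least squares)."""
    x = [1.0 / se for se in std_errors]                   # precision
    y = [yi / se for yi, se in zip(effects, std_errors)]  # standardized effect
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
            sum((xi - mx) ** 2 for xi in x)
    return my - slope * mx
```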

Other types of bias in meta-analysis include time lag bias, selective reporting bias, and language bias. Time lag bias arises when studies with striking results are published earlier than those with non-significant findings 55. Moreover, it has been shown that positive studies with high early accrual of patients are published sooner than negative trials with low early accrual 56. However, missing studies, whether due to publication bias or time lag bias, may increasingly be identified from trials registries.

The selective reporting bias exists when published articles have incomplete or inadequate reporting. Empirical studies have shown that this bias is widespread and of considerable importance when published studies were compared with their study protocols 29 , 30 . Furthermore, recent evidence suggests that selective reporting might be an issue in safety outcomes and the reporting of harms in clinical trials is still suboptimal 57 . Therefore, it might not be possible to use quantitative objective evidence for harms in performing meta-analyses and making therapeutic decisions.

Excluding clinical trials reported in languages other than English from meta-analyses may introduce language bias and reduce the precision of combined estimates of treatment effects. Trials with statistically significant results have been shown to be more likely to be published in English 58. In contrast, a later, more extensive investigation showed that trials published in languages other than English tend to be of lower quality and to produce more favourable treatment effects than trials published in English, and concluded that excluding non-English-language trials generally has only modest effects on summary treatment effect estimates, although the effect is difficult to predict for individual meta-analyses 59.

Evolution of meta-analyses

Classical meta-analysis compares two treatments, while network meta-analysis (or multiple-treatment meta-analysis) can provide estimates of treatment efficacy for multiple treatment regimens, even when direct comparisons are unavailable, by using indirect comparisons 60. An example of a network analysis would be the following: an initial trial compares drug A to drug B, and a different trial in the same patient population compares drug B to drug C. Assume that drug A is found to be superior to drug B in the first trial, and that drug B is found to be equivalent to drug C in the second. Network analysis then allows one to say, statistically, that drug A is also superior to drug C for this particular patient population (since drug A is better than drug B, and drug B is equivalent to drug C, drug A is also better than drug C, even though it was never directly tested against drug C).
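On the log odds ratio scale, the indirect A-versus-C estimate is obtained by adding the A-versus-B and B-versus-C estimates, with the variances of the independent estimates adding as well (the adjusted indirect comparison). A minimal sketch with hypothetical inputs:

```python
import math

def indirect_comparison(log_or_ab, se_ab, log_or_bc, se_bc):
    """Indirect estimate of A vs C from A-vs-B and B-vs-C trials,
    on the log odds ratio scale; variances of independent estimates add."""
    log_or_ac = log_or_ab + log_or_bc
    se_ac = math.sqrt(se_ab ** 2 + se_bc ** 2)
    return log_or_ac, se_ac
```

Because the variances add, an indirect comparison is always less precise than a direct head-to-head trial of the same size.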

Meta-analysis can also be used to summarize the performance of diagnostic and prognostic tests. However, studies that evaluate the accuracy of tests have a unique design requiring different criteria to appropriately assess the quality of studies and the potential for bias. Additionally, each study reports a pair of related summary statistics (for example, sensitivity and specificity) rather than a single statistic (such as a risk ratio) and hence requires different statistical methods to pool the results of the studies 61 . Various techniques to summarize results from diagnostic and prognostic test results have been proposed 62 – 64 . Furthermore, there are many methodologies for advanced meta-analysis that have been developed to address specific concerns, such as multivariate meta-analysis 65 – 67 , and special types of meta-analysis in genetics 68 but will not be discussed here.

Meta-analysis is no longer a novelty in medicine. Numerous meta-analyses have been conducted on the same medical topic by different researchers. Recently, there has been a trend to combine the results of different meta-analyses, known as a meta-epidemiological study, to assess the risk of bias 69, 70.

Conclusions

The traditional basis of medical practice has been changed by randomized, blinded, multicenter clinical trials and by meta-analysis, leading to the widely used term "evidence-based medicine". A leader in initiating this change has been the Cochrane Collaboration, which has produced guidelines for conducting systematic reviews and meta-analyses 10; more recently, the PRISMA statement, a helpful resource for improving the reporting of systematic reviews and meta-analyses, has been released 11. Moreover, standards for conducting and reporting meta-analyses of observational studies have been published to improve the quality of reporting 71.

Meta-analysis of randomized clinical trials is not an infallible tool, however, and several examples exist of meta-analyses that were later contradicted by single large randomized controlled trials, and of meta-analyses addressing the same issue that reached opposite conclusions 72. A recent example was the controversy between a meta-analysis of 42 studies 73 and a subsequent large-scale trial (the RECORD trial) that did not support the cardiovascular risk of rosiglitazone 74. However, this controversy was explained by the numerous methodological flaws found in both the meta-analysis and the large clinical trial 75.

No single study, whether meta-analytic or not, will provide the definitive understanding of responses to treatment, diagnostic tests, or risk factors influencing disease. Despite this limitation, meta-analytic approaches have demonstrable benefits in addressing the limitations of study size, can include diverse populations, provide the opportunity to evaluate new hypotheses, and are more valuable than any single study contributing to the analysis. The conduct of a meta-analysis is critical to its value, and the methods used need to be as rigorous as those of any other study.

  • Open access
  • Published: 12 April 2024

Risk of conversion to mild cognitive impairment or dementia among subjects with amyloid and tau pathology: a systematic review and meta-analysis

  • Zsolt Huszár 1 , 2 ,
  • Marie Anne Engh 1 ,
  • Márk Pavlekovics 1 , 3 ,
  • Tomoya Sato 1 ,
  • Yalea Steenkamp 1 ,
  • Bernard Hanseeuw 4 , 5 ,
  • Tamás Terebessy 1 ,
  • Zsolt Molnár 1 , 6 , 7 ,
  • Péter Hegyi 1 , 8 , 9 , 10 &
  • Gábor Csukly 1 , 2  

Alzheimer's Research & Therapy, volume 16, Article number: 81 (2024)


Measurement of beta-amyloid (Aβ) and phosphorylated tau (p-tau) levels offers the potential for early detection of neurocognitive impairment. Still, the probability of developing a clinical syndrome in the presence of these protein changes (A+ and T+) remains unclear. By performing a systematic review and meta-analysis, we investigated the risk of mild cognitive impairment (MCI) or dementia in the non-demented population with A+ and A- alone and in combination with T+ and T- as confirmed by PET or cerebrospinal fluid examination.

A systematic search of prospective and retrospective studies investigating the association of Aβ and p-tau with cognitive decline was performed in three databases (MEDLINE via PubMed, EMBASE, and CENTRAL) on January 9, 2024. The risk of bias was assessed using the Cochrane QUIPS tool. Odds ratios (OR) and Hazard Ratios (HR) were pooled using a random-effects model. The effect of neurodegeneration was not studied due to its non-specific nature.

A total of 18,162 records were found, and at the end of the selection process, data from 36 cohorts were pooled ( n = 7,793). Compared to the unexposed group, the odds ratio (OR) for conversion to dementia in A+ MCI patients was 5.18 [95% CI 3.93; 6.81]. In A+ CU subjects, the OR for conversion to MCI or dementia was 5.79 [95% CI 2.88; 11.64]. Cerebrospinal fluid Aβ42 or Aβ42/40 analysis and amyloid PET imaging showed consistent results. The OR for conversion in A+T+ MCI subjects (11.60 [95% CI 7.96; 16.91]) was significantly higher than in A+T- subjects (2.73 [95% CI 1.65; 4.52]). The OR for A-T+ MCI subjects was non-significant (1.47 [95% CI 0.55; 3.92]). CU subjects with A+T+ status had a significantly higher OR for conversion (13.46 [95% CI 3.69; 49.11]) than A+T- subjects (2.04 [95% CI 0.70; 5.97]). Meta-regression showed that the ORs for Aβ exposure decreased with age in MCI (beta = -0.04 [95% CI -0.083 to -0.03]).

Conclusions

Identifying Aβ-positive individuals, irrespective of the measurement technique employed (CSF or PET), enables the detection of the most at-risk population before disease onset, or at least at a mild stage. The inclusion of tau status in addition to Aβ, especially in A+T+ cases, further refines the risk assessment. Notably, the higher odds ratio associated with Aβ decreases with age.

Trial registration

The study was registered in PROSPERO (ID: CRD42021288100).

Affecting 55 million people worldwide, dementia is one of the leading causes of years spent with disability and one of the costliest long-term illnesses in society. The most common cause of dementia is Alzheimer's disease (AD), responsible for 60-80% of cases [ 1 , 2 ].

Two specific protein aggregates play a crucial role in the pathophysiology of AD. One is amyloid plaque formation in the extracellular space, predominantly through Aβ aggregation; these plaques, among other pathological effects, inhibit the signaling function of neurons [ 3 ]. The other is the appearance of neurofibrillary tangles within the neurons, which are formed by the phosphorylation of tau proteins (p-tau) and inhibit axonal transport inside the cell [ 4 ]. Whereas the specific pathology could only be confirmed by autopsy in the past, in vivo tests are available today. In parallel with this development, the diagnostic definitions of AD have evolved significantly over time, moving from purely clinical assessments and post-mortem examinations to the integration of in vivo amyloid and later p-tau biomarkers, emphasizing the role of preclinical stages [ 5 , 6 , 7 , 8 ]. Accordingly, researchers are increasingly trying to link the diagnosis of the disease to biological parameters. In general, however, clinical practice considers only the quality of the symptoms of dementia and the fact of neurodegeneration confirmed by radiology when establishing an AD diagnosis.

The International Working Group (IWG) [ 5 ] emphasizes that diagnosis should align with clinical symptoms. However, for researchers in the field, the U.S. National Institute on Aging – Alzheimer’s Association (NIA-AA) has issued a new framework recommendation [ 6 ]. This recommendation defines AD purely in terms of specific biological changes based on the Aβ (A) and p-tau (T) protein status, while neurodegeneration (N) is considered a non-specific marker that can be used for staging. In the recommendation, the category ‘Alzheimer’s disease continuum’ is proposed for all A+ cases, ‘Alzheimer’s pathological changes’ for A+T- cases, and ‘Alzheimer’s disease’ for A+T+ cases. A-(TN)+ cases are classified as ‘non-Alzheimer pathological changes’.

Aβ and p-tau proteins have long been known to be associated with AD development, and their accumulation can begin up to 15-20 years before the onset of cognitive symptoms [ 9 ]. Pathological amyloid changes are highly prevalent in dementia: 88% of those clinically diagnosed with AD and between 12 and 51% of those with non-AD are A+, according to a meta-analysis [ 10 ]. At the same time, the specificity of the abnormal beta-amyloid level for AD and its central role in its pathomechanism have been questioned [ 11 ]. Their use as a preventive screening target is a subject of ongoing discourse [ 12 ]. Yet it is still unclear to what extent their presence accelerates cognitive decline. What are the predictive prospects for an individual with abnormal protein levels who is otherwise cognitively healthy or with only mild cognitive impairment (MCI), meaning cases where there is a detectable decline in cognitive ability with maintained ability to perform most activities of daily living independently? [ 13 ] Research on non-demented populations shows substantial variation; for example, studies have shown OR values for conversion to dementia ranging from 2.25 [95% CI 0.71; 7.09] [ 14 ] to 137.5 [95% CI 17.8; 1059.6] [ 15 ]. Comparing conversion data systematically is necessary to provide a clearer picture.

In the CU population over 50 years of age, the prevalence of A+ ranges from 10 to 44%, while in MCI it ranges from 27 to 71%, depending on age [ 16 ]. Taking this into consideration, we aim to investigate the effect of Aβ alone and in combination with p-tau on conversion to MCI and dementia, through a systematic review and meta-analysis of the available literature. Knowing the prognostic effect can highlight the clinical potential of the current research framework, given that, at present, therapy for MCI or dementia can only slow the decline. Prevention starting at an early stage, or even before symptoms appear, provides the best chance against the disease.

Study registration

Our study was registered in the PROSPERO database (ID: CRD42021288100) with a pre-defined research plan and detailed objectives. It is reported strictly in accordance with the recommendations of the PRISMA 2020 guideline and was performed following the guidance of the Cochrane Handbook [ 17 ].

We aimed to determine the change in odds of progression to MCI or dementia among non-demented subjects based on abnormal Aβ levels alone, or in combination with abnormal p-tau levels.

Search and selection

We included longitudinal prospective and retrospective studies that used the NIA-AA 2018 recommended measurement of Aβ and p-tau (for Aβ: amyloid PET, CSF Aβ42, or Aβ42/40 ratio; for p-tau: tau PET, or CSF p-tau) and investigated the role of Aβ and +/- p-tau in CU and MCI subjects in progression to MCI or dementia. Case reports and case series were excluded. Overlapping populations were taken into account during the data extraction. Our search key was run in the Medline, Embase, and Central databases on 31 October 2021, and the search was updated on 9 January 2024 (see Supplementary Material, Appendix 1 ). After removing duplicates, we screened publications by title and abstract, and in the second round by full text. Two independent reviewers conducted the selection (ZH, MP), and a third reviewer (GC) resolved disagreements. The degree of the agreement was quantified using Cohen’s kappa statistics at each selection stage.
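For two reviewers making binary include/exclude decisions, Cohen's kappa compares the observed agreement with the agreement expected by chance. An illustrative sketch (the counts in the test are hypothetical):

```python
def cohens_kappa(a, b, c, d):
    """Cohen's kappa from a 2x2 agreement table:
    a = both include, b = reviewer 1 only, c = reviewer 2 only, d = both exclude."""
    n = a + b + c + d
    p_obs = (a + d) / n                                      # observed agreement
    p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2   # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)
```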

As part of the selection process, articles that only examined the ADNI database [ 18 ] were excluded, as patient-level data were used instead (see Supplementary Material Appendix 2 for details of the patient-level data analysis of the ADNI).

A standardized Excel (Microsoft Corporation, Redmond, Washington, USA) spreadsheet was used for data extraction (for one special case of data extraction, see Supplementary Material Appendix 3). Where data were available in graphical form only, we used online software (Plot Digitizer) [19, 20]. The following data were extracted: the source of the data used in the studies (place of clinical trial or name of database), baseline characteristics of the population (age, gender, APOE status, and education level), type of exposure (Aβ, p-tau, and neurodegeneration), measurement technique of the exposure, and data on cognitive impairment for each exposure group.

Data synthesis

Generally, where several studies used the same population sample or cohort, only data from the study with the largest sample size were used. Conversion to Alzheimer’s dementia and to unspecified dementia was assessed together, as the definition of Alzheimer’s dementia varied between the studies and the diagnosis was based on neurocognitive tests. If conversion to both types of dementia was given, the value for conversion to unspecified dementia was used. The population with subjective cognitive symptoms was analyzed jointly with the CU population, as these subpopulations could not be differentiated objectively.

Odds ratio and hazard ratio values were used or calculated based on the available information (for details on the methodology, see Supplementary Material Appendix 4). Considering that the studies report their results for different age groups, a meta-regression analysis was performed to investigate how age affects the likelihood of developing dementia in relation to Aβ levels.
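Where studies reported only conversion counts, an OR and its log-scale (Woolf) confidence interval can be derived from the 2×2 table. An illustrative Python sketch with hypothetical counts (the actual extraction rules are described in Supplementary Material Appendix 4):

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """OR with a Woolf (log-scale) 95% CI from a 2x2 conversion table:
    a = exposed converters, b = exposed non-converters,
    c = unexposed converters, d = unexposed non-converters."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi

# Hypothetical study: 30/100 A+ subjects and 10/120 A- subjects converted
print(tuple(round(v, 2) for v in odds_ratio_ci(30, 70, 10, 110)))
```

The CI is symmetric on the log scale, which is why pooled ORs are combined as log ORs in the meta-analytic models.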

Studies applied different analysis methods to identify Aβ positivity. Where multiple amyloid categories were available, the preferred method was amyloid PET. When relying on CSF analysis, the Aβ42/40 ratio was given precedence over Aβ42, since the 42/40 ratio has a higher concordance with amyloid PET [21]. To estimate the confounding effect of the different amyloid measurement techniques, a subgroup analysis was performed. For the assessment of p-tau, studies measured p-tau181 levels in CSF samples or employed tau PET. Although the ADNI also contains a limited number of tau PET measurements, to ensure consistency in the analyses we used exclusively the CSF p-tau181 levels from the ADNI database.

For the OR analysis, studies with varying follow-up times were pooled. To estimate the resulting bias, a meta-regression analysis was performed to explore how follow-up time affected the results.
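Both meta-regressions (on mean age and on follow-up time) amount to weighted least squares of the log OR on a study-level covariate. A fixed-effect sketch with hypothetical data (the full analysis in R's "meta" package is a mixed-effects model that also estimates residual tau²):

```python
import math

def meta_regression(x, log_ors, variances):
    """Weighted least-squares slope of log OR on a study-level covariate
    (e.g. mean age or follow-up time), weighted by inverse variance.
    Fixed-effect sketch: no residual tau^2 term is estimated."""
    w = [1 / v for v in variances]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, log_ors)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar)
              for wi, xi, yi in zip(w, x, log_ors))
    beta = sxy / sxx          # slope of log OR per unit of the covariate
    se_beta = math.sqrt(1 / sxx)
    return beta, se_beta

# Hypothetical studies: mean ages 60/70/80, log ORs, equal variances
beta, se = meta_regression([60, 70, 80], [2.0, 1.5, 1.1], [0.1, 0.1, 0.1])
print(round(beta, 3))  # → -0.045
```

A negative slope, as in this toy example, corresponds to the reported pattern of ORs decreasing with mean age.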

Statistical analysis

Statistical analyses were performed in the R programming environment (version 4.1.2) using the “meta” software package version 5.2-0. To visualize synthesized data, we used forest plots showing ORs or HRs and corresponding confidence intervals for each individual study and pooled effect sizes in terms of ORs and HRs. For dichotomous outcomes, odds ratios and hazard ratios with 95% confidence intervals (CI) were used as effect measures. To calculate odds ratios, the total number of patients in each study and the number of patients with the event of interest in each group were extracted from each study. Raw data from the selected studies were pooled using a random-effects model with the Mantel-Haenszel method [ 22 , 23 , 24 ]. The random-effects model was used as we assumed that the true effect would vary between studies due to differences in demographics and clinical measures, such as age or baseline cognitive impairment.
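The pooling step can be illustrated as follows. The analysis itself was run in R's "meta" package; this Python sketch shows the standard inverse-variance random-effects computation with a DerSimonian-Laird tau² estimate, applied to hypothetical per-study log ORs:

```python
import math

def random_effects_pool(log_ors, variances):
    """Inverse-variance random-effects pooling of log odds ratios with a
    DerSimonian-Laird tau^2 estimate; returns (OR, CI low, CI high)."""
    w = [1 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    # Cochran's Q around the fixed-effect mean
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))
    df = len(log_ors) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)              # DerSimonian-Laird
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_star, log_ors)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return (math.exp(pooled),
            math.exp(pooled - 1.96 * se),
            math.exp(pooled + 1.96 * se))

# Hypothetical per-study log ORs and their variances
print(random_effects_pool([1.6, 1.0, 2.2, 1.2], [0.04, 0.09, 0.16, 0.06]))
```

This is a sketch, not the paper's exact pipeline: the "meta" package additionally uses Mantel-Haenszel weighting for the 2×2 data and the Hartung-Knapp adjustment for the confidence interval.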

Heterogeneity was assessed by calculating I², tau², and the prediction interval. I² is defined as the percentage of variability in the effect size that is not caused by sampling error, whereas tau² is the variance of the true effect sizes (its square root, tau, is their standard deviation). As I² is heavily dependent on the precision of the studies, and tau² is sometimes hard to interpret (as it is insensitive to the number of studies and their precision), the prediction interval was also calculated. The great advantage of the prediction interval is that it is easy to interpret: if the interval does not include the null value, further studies are expected to show a similar result.
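The heterogeneity measures described above can be computed from the same per-study inputs. An illustrative Python sketch (the prediction interval is shown with a normal approximation and around the inverse-variance mean for brevity; the published formula uses a t distribution with k-2 degrees of freedom):

```python
import math

def heterogeneity(log_ors, variances):
    """Cochran's Q-based I^2 (%), DerSimonian-Laird tau^2, and an
    approximate 95% prediction interval on the OR scale."""
    w = [1 / v for v in variances]
    mean = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    q = sum(wi * (y - mean) ** 2 for wi, y in zip(w, log_ors))
    df = len(log_ors) - 1
    i2 = max(0.0, (q - df) / q) * 100      # % variability beyond sampling error
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    se = math.sqrt(1 / sum(w))             # SE of the pooled mean
    half = 1.96 * math.sqrt(tau2 + se ** 2)
    return i2, tau2, (math.exp(mean - half), math.exp(mean + half))

i2, tau2, pi = heterogeneity([1.6, 1.0, 2.2, 1.2], [0.04, 0.09, 0.16, 0.06])
print(round(i2, 1), round(tau2, 3), tuple(round(v, 2) for v in pi))
```

Note how the prediction interval combines tau² with the pooled estimate's uncertainty: it describes where the effect of a new study is expected to fall, not just the mean effect.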

Sensitivity analysis

We performed outlier detection according to Viechtbauer et al. [25]. A study is considered an outlier if its confidence interval does not overlap with the confidence interval of the pooled effect. The idea behind this is to detect effect sizes that differ significantly from the overall effect. As a sensitivity analysis, we repeated the analyses after removing any outliers and compared the pooled effects before and after the exclusion, in order to detect whether outliers had a substantial impact on the overall effect.
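The outlier criterion described above reduces to a CI-overlap check. A minimal sketch with hypothetical study CIs:

```python
def find_outliers(study_cis, pooled_ci):
    """Flag studies whose 95% CI does not overlap the pooled 95% CI
    (the non-overlap criterion described above)."""
    p_lo, p_hi = pooled_ci
    return [name for name, (lo, hi) in study_cis.items()
            if hi < p_lo or lo > p_hi]

# Hypothetical study CIs on the OR scale against a pooled CI of (3.9, 6.8)
studies = {"A": (2.1, 7.5), "B": (8.2, 20.0), "C": (4.0, 6.5)}
print(find_outliers(studies, (3.9, 6.8)))  # → ['B']
```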

Risk of bias assessment

The risk of bias was assessed according to the recommendation of the Cochrane Collaboration; using the QUIPS tool [26], two investigators (ZH and YS) independently assessed the quality of the studies, and a third author resolved disagreements. Publication bias was examined using Peters' regression test [27] and visual inspection of the adjusted funnel plots.

Search results

During the systematic search (Fig. 1), 18,162 records were found, and finally, 46 eligible articles were obtained (Supplementary Material eTable 1). While some of the articles analyzed the same cohorts, we were able to pool data from 36 different cohorts or centres. Cohen's kappa was 0.91 for the title and abstract selection and 0.86 for the full-text selection. Given the amount of data found, we decided to examine the targeted outcomes separately and focus only on the conversion data in this report.

Figure 1. PRISMA flowchart of selection. Flowchart of the study screening process following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 statement.

The investigated studies expressed their results in different ways: they calculated unadjusted or adjusted hazard ratios, or presented the number of conversions for different follow-up periods. In the latter case, we calculated odds ratios for the defined time periods. The measured exposures also differed: data were given for Aβ alone or in combination with p-tau or neurodegeneration. There were also differences in the techniques used to measure exposure, with CSF samples being used in some cases and PET scans in others.

During data extraction, one [ 28 ] article was excluded because of inconsistently reported conversion data, and four [ 15 , 29 , 30 , 31 ] were excluded from the A/T analysis because the definition of the pathologic Aβ and p-tau was based on Aβ/p-tau ratio, which did not comply with the NIA-AA 2018 recommendation.

The eligible studies investigated three groups: CU, MCI, and mixed, in which the results were expressed collectively for both the MCI and CU groups. The CU group comprised either cognitively healthy subjects or individuals with only subjective cognitive complaints. To define the MCI group, all studies followed the Petersen criteria [32]. Four studies examined mixed groups. Since all of them studied large samples (n>180), it was considered more valuable to analyze them jointly with MCI, as the outcome was also conversion to dementia. As a result of the joint analysis, our findings are based on a substantially larger sample. To support this decision, we performed a subgroup analysis comparing the Aβ positive MCI and mixed population studies. The OR differed significantly from the unexposed group in both the MCI (OR 5.83 [95% CI 3.80; 8.93]) and the mixed (OR 4.64 [95% CI 1.16; 18.61]) subgroups, and there was no significant difference between the two subgroups (p=0.55) (Supplementary Material eFigure 1).

Conversion from MCI to dementia

The effect of Aβ exposition in terms of OR

Based on a mixed model meta-analysis of 3,576 subjects (Table 1), we observed a significant association between Aβ positivity and higher conversion rates. Compared to the unexposed, the OR for conversion to dementia in the amyloid positives was 5.18 [95% CI 3.93; 6.81]; t(21)=12.47; p<0.0001. The I² test for heterogeneity revealed that 44.8% of the variance across studies was due to heterogeneity (Fig. 2A). As a result of the outlier detection, we excluded the Balasa study and found a very similar overall effect and reduced heterogeneity (5.05 [95% CI 3.98; 6.40]; t(20)=14.2; p<0.0001; I²=31.4%). Meta-regression analysis of mean age showed a statistically significant decrease in OR values with increasing age (R²=59.05%, beta=-0.04, SE=0.019, [95% CI -0.083 to -0.03], df=18, t=-2.27, p=0.036) (Fig. 2B). The Hartung-Knapp method was applied to adjust test statistics and confidence intervals to reduce the risk of false positives.

Figure 2. Conversion of Aβ exposed MCI groups to dementia in OR. The squares and bars represent the mean values and 95% CIs of the effect sizes, and the squares’ area reflects the weight of the studies. Diamonds represent the combined effects, and the vertical dotted line represents the line of no association. A: OR for Aβ exposition; B: meta-regression of age and ORs for conversion regarding Aβ exposure. The size of the circle is proportional to the weight of each study in the meta-analysis. The line corresponds to a meta-regression with age as covariate, and beta represents the slope of ORs by mean age.

Beta-amyloid was determined by CSF Aβ42, CSF Aβ42/40 ratio or amyloid PET. When the three groups were compared in a subgroup analysis, the OR was 5.87 (2.83; 12.19) for CSF Aβ42, 5.00 (3.31; 7.55) for CSF Aβ42/40 ratio, and 5.32 (2.53; 11.18) for amyloid PET. The difference between the subgroups was not significant ( p =0.88) (Supplementary Material eFigure 2 ).

The meta-regression analysis performed to examine the role of follow-up time showed no association with the ORs (R²=0%, beta=-0.002, SE=0.07, [95% CI -0.02 to 0.01], df=11, p=0.77) (Supplementary Material eFigure 3A).

We used a funnel plot to examine publication bias (Supplementary Material eFigure 4A). Most of the studies with large sample sizes lie close to the midline, which suggests that the pooled effect size is valid. However, visual inspection of the plot raised the possibility of some publication bias in two ways: (1) studies in the bottom right corner of the plot have significant results despite having large standard errors; (2) the absence of studies in the bottom left corner (blank area in the figure) may indicate that studies with nonsignificant results were not published. To quantify funnel plot asymmetry, Peters' regression test was applied. The test result was not significant (t=1.7, df=20, p=0.11), so no asymmetry was demonstrated in the funnel plot.

The effect of Aβ exposition in terms of HR

Several studies reported their results as HRs instead of, or in addition to, ORs (Supplementary Material eTable 2). The advantage of the HR is that this measure is independent of the length of follow-up of the studies. For this reason, we also considered it important to analyze the results expressed as HRs. Based on pooled data of the patients studied (n=1,888), the HR for conversion to dementia was 3.16 [95% CI 2.07; 4.83], p<0.001 (Fig. 3A).

Figure 3. Conversion of Aβ exposed MCI groups to dementia in HR. The squares and bars represent the mean values and 95% CIs of the effect sizes, and the squares’ area reflects the weight of the studies. Diamonds represent the combined effects, and the vertical dotted line represents the line of no association. A: HR for Aβ exposition; B: subgroup analysis of studies with adjusted and unadjusted HR values.

To investigate the effect of adjustment, we conducted a subgroup analysis between the unadjusted and adjusted measurements. Although there was a trend toward higher unadjusted HR values compared to the adjusted HRs, the difference did not reach statistical significance (unadjusted HR: 5.07 [95% CI 2.77; 9.26], adjusted HR: 2.86 [95% CI 1.70; 4.83], p=0.055) (Fig. 3B). We could not analyze HRs in the A+T-, A+T+, and A-T+ subgroups due to the low number of available studies.

The effect of Aβ and p-tau exposition in terms of OR

We examined the combined effect of p-tau and Aβ (Table 2) and compared A+T+, A+T-, and A-T+ exposures to A-T-. Based on pooled data of the patients studied (n=1,327), the OR for conversion to dementia in A+T- was 2.73 [95% CI 1.65; 4.52], and the odds ratio was significantly higher in the presence of both exposures (A+T+) (p<0.001), with an OR of 11.60 [95% CI 7.96; 16.91]. The effect of A-T+ exposure on conversion was not significant (OR 1.47 [95% CI 0.55; 3.92]) (Fig. 4A).

Figure 4. Conversion of Aβ and p-tau exposed MCI groups to dementia in OR. The squares and bars represent the mean values and 95% CIs of the effect sizes, and the squares’ area reflects the weight of the studies. Diamonds represent the combined effects, and the vertical dotted line represents the line of no association. A: Aβ and p-tau expositions in OR; B: subgroup analysis of comparisons between the A+T+ and A+T- groups; C: subgroup analysis of comparisons between the A+T- and A-T+ groups.

Subgroup analyses showed that the A+T+ group had a significantly higher odds of conversion compared to the A+T- group ( p <0.001), while the A+T- and A-T+ groups did not differ significantly ( p =0.15) (Fig. 4 B and C).

Conversion from CU to MCI or dementia

The effect of Aβ exposition in terms of OR

Analyses of the CU population (n=4,217) yielded very similar results to the MCI sample. The OR for conversion to MCI or dementia was 5.79 [95% CI 2.88; 11.64] (t(13)=5.43; p=0.0001); the results of the studies did, however, show a high degree of heterogeneity (I²=73% [55%; 84%]) (Table 3, Fig. 5A). As a result of the outlier detection, we removed the Arruda study and found a very similar overall effect (6.33 [95% CI 3.42; 11.71]; t(12)=6.54; p<0.0001; I²=72.1%).

Figure 5. Conversion of Aβ and p-tau exposed CU groups to MCI or dementia in OR. The squares and bars represent the mean values and 95% CIs of the effect sizes, and the squares' area reflects the weight of the studies. Diamonds represent the combined effects, and the vertical dotted line represents the line of no association. A: Aβ exposition in OR; B: Aβ and p-tau expositions in OR.

Meta-regression analysis of mean age did not show a significant association with the ORs (R²=8.22%, beta=-0.05, SE=0.05, [95% CI -0.17 to 0.07], df=11, p=0.37).

Meta-regression analysis also showed no association between follow-up time and the ORs (R²=0.35%, beta=-0.014, SE=0.024, [95% CI -0.07 to 0.04], df=8, p=0.58) (Supplementary Material eFigure 3B).

We applied a funnel plot to examine publication bias (Supplementary Material eFigure 4B). Most of the studies with large sample sizes lie close to the midline, which reaffirms the validity of the pooled effect size. To quantify funnel plot asymmetry, Peters' regression test was applied. The test result was not significant (t=0.9, df=12, p=0.31), indicating that no asymmetry was demonstrated in the funnel plot.

Four cohorts provided HRs for the CU population (n=2,700), with one cohort (ADNI) representing 55.3% of the total sample (weight: 78.5%) (Supplementary Material eTable 3). The pooled HR for conversion was 2.33 [95% CI 1.88; 2.88] (p=0.001) (Supplementary Material eFigure 5).

The combined effect of Aβ and p-tau exposition in terms of OR

Using data from a total of 2,228 subjects, we investigated the effect of p-tau in combination with Aβ (Table 4) in the CU population. Compared to the A-T- group, the OR for conversion was 2.04 [95% CI 0.70; 5.97] for A+T- and 13.46 [95% CI 3.69; 49.11] for A+T+. The OR shows a trend-level increased risk (t=2.1, p=0.12) for the A+T- group compared to the A-T- group.

Similarly to the MCI population, subgroup analyses showed that the A+T+ group had a significantly higher OR for conversion than the A+T- group (p<0.01). The analysis could not be performed for A-T+ due to the low number of such cases.

Risk of bias assessment

The risk of bias was assessed separately for the analyses discussed above. The overall risk of bias of the studies ranged from low to moderate, except in three cases: twice we found a high risk of bias due to attrition above 50% [59, 60], and once due to a focus on monozygotic twins [61] (Supplementary Material, eFigure 6). These articles (n=197) were excluded from all analyses.

Summary and context

A pathological Aβ state is strongly correlated with the risk of clinical progression: the odds ratio for conversion is 5.18 in the MCI population and 5.79 in the CU population. Therefore, measuring Aβ levels alone can identify a population at high risk. The OR for conversion to dementia differs significantly between the A+T+ and A+T- groups in both the MCI and CU populations: while the OR is 2.73 [95% CI 1.65; 4.52] for MCI and 2.04 [95% CI 0.70; 5.97] for CU subjects in the A+T- group, it increases to 11.60 [95% CI 7.96; 16.91] for MCI and 13.46 [95% CI 3.69; 49.11] for CU in the A+T+ group. Note that in the case of A+T- in the CU population, only a trend-level statistical association is visible.

The results of the meta-regression show a decrease in OR with mean age (Fig. 2B). Based on this result, it seems that the impact of amyloid positivity on conversion decreases with age. The fact that age is itself a risk factor for dementia, and that vascular and other neurodegenerative damage is more frequent at older ages, is a possible explanation for this finding. Our findings, combined with the results of Rodrigue et al. [62], suggest that amyloid burden increases with age, while its impact on conversion rates slightly decreases with age.

The appearance of Aβ is assumed to be one of the earliest signs of AD [63, 64]. Our results fit into this picture by showing that only the A+T+ and A+T- groups showed an increased risk for conversion compared to A-T-; the A-T+ group did not. Thus, Aβ alone is suitable for detecting the population at risk, while p-tau alone is not as effective in predicting conversion. Our result is in line with previous studies showing that the A-T+ group has a weaker association with cognitive decline compared to the A+T- or A+T+ groups [65, 66]. However, it is important to emphasize previous results showing that T+ status is closely associated with neurodegeneration and that the A-T+ group is related to frontotemporal dementia [67]. More research is needed to fully explain the significance of the A-T+ group.

The PET scan is known to be a more sensitive tool for detecting amyloid positivity than CSF sampling [68]. However, from a prognostic point of view, our results did not show a significant difference (p=0.73) between PET measurements (OR 6.02) and the more cost-effective but invasive CSF Aβ42 measurements (OR 5.11). It is important to note here that the present meta-analysis is underpowered for detecting prognostic differences between these methods. Due to the heterogeneity among studies and the impact of confounding factors, standardised studies are required to evaluate the comparative prognostic value of these biomarkers accurately.

Our results based on ORs are further strengthened by the HR analyses, which gave similar results for Aβ exposure in the MCI (HR 3.16) and CU (HR 2.33) populations. It should be noted that in the HR analysis of the CU group, ADNI accounts for 78.5% of the weight, which is a limitation of this meta-analysis; this disproportionate representation may affect the overall result. Regarding the trend-level association with higher unadjusted HRs, it should be noted that in the presence of a random distribution of other risk factors (e.g., baseline MMSE score or educational level), the unadjusted value may overestimate the HR, whereas in the case of a non-random distribution, the adjusted value may underestimate the HR. With this in mind, we recommend reporting both values in the future.

Our analyses were performed on CU and MCI populations. Including mixed populations with the MCI population was a practical simplification, as several studies with a large number of cases reported their results combining MCI subjects with CU subjects, and we aimed to answer the set of questions based on the largest population. To investigate the potential bias of this method, we performed a subgroup analysis comparing the mixed and MCI populations, and the result was not significant. The Aβ OR based on the mixed-only group is 4.64 [95% CI 1.16; 18.61], and the OR calculated from the MCI-only studies is 5.83 [95% CI 3.80; 8.93]. Thus, the inclusion of the mixed population in the pool slightly decreases the OR of the main analysis (5.21 [95% CI 3.93; 6.90]) (Supplementary Material eFigure 1).

Strengths and limitations

There are several limitations to consider when interpreting our results. The study populations differ in several aspects; in terms of cognitive status, the population ranges from those with no cognitive symptoms through those with subjective cognitive symptoms (these two groups were considered CU) to MCI groups. Therefore, the distance from the cognitive state corresponding to MCI or dementia also varies. Due to the different cut-offs used in the studies, subjects with grey-area scores may oscillate between the A- and A+ groups, increasing heterogeneity. Our study could not examine the role of other risk factors such as education, cardiovascular status, obesity, diabetes, depression, social and physical activity [69], or genetic status [70, 71], which may also contribute to heterogeneity. Furthermore, there is considerable heterogeneity by mean age, and our meta-regression analysis of the MCI group showed a significant decreasing effect of mean age on the ORs.

In the OR analysis of Aβ in the CU group, the outlier value of the Arruda study may represent a statistical extreme caused by the small number of A+ subjects relative to the much larger A- group. Similarly, in the Grøntvedt [14] and Hanseeuw [41] studies, which show exceptionally high values, the A+ and A- groups show a similarly uneven distribution, and the outliers in the MCI amyloid OR analysis are also associated with small sample sizes. For the Aβ HR analysis in the CU group, the interpretability of the result is strongly influenced by one specific cohort (ADNI), which accounts for 78% of the overall weight. In the A+T+/A+T-/A-T+ analyses, no outliers were found in either the MCI or CU groups.

Furthermore, we note that although the Aβ OR analyses could be confirmed by also calculating the HRs, the inability to analyze the effect of p-tau on HR due to the low number of studies limits the completeness of the A/T analysis.

We pooled studies reporting AD-type dementia conversion and studies reporting conversion to unspecified dementia. This simplification was necessary because different studies defined Alzheimer’s dementia differently, generally considering the amnestic clinical symptoms rather than biomarkers.

The fact that the studies used different neuropsychological tests to define MCI may contribute to the heterogeneity of the pooled sample. Another contributing factor would be heterogeneity in the definition of MCI; however, among the studies in our pool, only one, by Riemschneider et al. [48] (sample size = 28), precedes the 2003 ‘Key Symposium’ [72] that transformed the MCI concept. All other studies were published subsequent to it. While MCI subgroups were defined after the 2003 Symposium, the definition of MCI (objective cognitive impairment, essentially preserved general cognitive functioning, preserved independence in functional abilities) did not change afterwards. Furthermore, most of the studies pooled in the analyses were published after 2010.

Another source of heterogeneity is the relatively small sample size of some studies, leading to a higher variability of results. However, we thought that including studies with lower sample sizes was also important to get a complete picture.

It is essential to discuss the difference in follow-up times between the studies, which ranged from 20 months to more than 10 years. Follow-up times were given in different ways: as a mean, a median, or up to a certain point. While the odds of conversion naturally increase over time, our meta-regression analysis suggests that there is no significant difference in the odds ratios over (follow-up) time. The moderate heterogeneity of the studies also points in this direction. We also note that hazard ratios, which are independent of follow-up time, showed similar results to the OR analyses. Finally, and importantly, we would like to point out that pathological protein changes can begin up to 20 years before the appearance of symptoms [6]. Such an extended follow-up is very difficult to carry out; therefore, all studies were shorter than that.

The results for Aβ are based on 7,793 individuals, and the combined analyses of Aβ and p-tau are based on data from over 3,500 individuals. Studies using CSF sampling or amyloid/tau PET to detect Aβ and p-tau were pooled together despite using different kits and thresholds for positivity, contributing to the heterogeneity of the results. This variation is acknowledged in Tables 1, 2, 3 and 4, where the cut-off values are provided. Previous large population studies have indicated that amyloid and tau PET scans exhibit slightly higher sensitivity than CSF sampling techniques [73, 74, 68]. Nonetheless, the concordance between these diagnostic methods remains substantial. Moreover, findings from prior research (Lee et al. [75], Toledo et al. [76], Palmqvist et al. [77]) demonstrating high concordance across different amyloid CSF and amyloid PET measurements suggest that the impact of methodological differences on heterogeneity may be limited. All techniques are recommended by the National Institute on Aging-Alzheimer’s Association (NIA-AA) [6] for measurement.

Future directions

Conversion to Alzheimer’s disease could not be analyzed specifically, as most of the articles examining conversion either did not define Alzheimer’s disease or the definition was based on neuropsychological testing but not on biomarkers (i.e., Aβ and p-tau status were assessed only at baseline). According to the NIA-AA guideline [ 6 ] and our results, we recommend biomarker-based studies to assess conversion rates to Alzheimer’s disease.

In view of the Aβ and p-tau status, the most endangered population can be identified before the appearance of cognitive symptoms or at least at a mild stage. While the significance of Aβ in conversion is clear, it appears that its ability to predict the onset decreases with age. If we consider the current therapeutic limitations and the importance of early prevention, we believe that the initiation of non-pharmacological and pharmacological treatments should be related to Aβ and p-tau status rather than cognitive status.

Identifying the most endangered population also makes research more effective. The efficacy of different dementia prevention approaches can be more accurately assessed by knowing the Aβ and p-tau status of the patient. As the population targeted by the interventions can be more homogeneous, the effectiveness can be measured more precisely by identifying the population most at risk of conversion.

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Abbreviations

A-: Non-pathologic levels of beta-amyloid

A+: Pathologic levels of beta-amyloid

Aβ: Beta-amyloid

AD: Alzheimer’s disease

ADNI: Alzheimer’s Disease Neuroimaging Initiative

CI: Confidence interval

CU: Cognitively unimpaired

CSF: Cerebrospinal fluid

HR: Hazard ratio

MCI: Mild cognitive impairment

N-: Absence of neurodegeneration

N+: Presence of neurodegeneration

NIA-AA: National Institute on Aging-Alzheimer’s Association

PET: Positron emission tomography

p-tau: Phosphorylated tau

T-: Non-pathologic levels of phosphorylated tau

T+: Pathologic levels of phosphorylated tau

Risk Reduction of Cognitive Decline and Dementia: WHO Guidelines. Geneva: World Health Organization; 2019. Available from: https://www.ncbi.nlm.nih.gov/books/NBK542796/ .

Gauthier S, Rosa-Neto P, Morais JA, Webster C. World Alzheimer Report 2021: Journey through the diagnosis of dementia. London: Alzheimer’s Disease International; 2021.

De Strooper B. The cellular phase of Alzheimer’s disease. Cell. 2016;164(4):603–15. https://doi.org/10.1016/j.cell.2015.12.056 .

Scheltens P, De Strooper B, Kivipelto M, et al. Alzheimer’s disease. The Lancet. 2021;397(10284):1577–90. https://doi.org/10.1016/s0140-6736(20)32205-4 .

Dubois B, Villain N, Frisoni GB, et al. Clinical diagnosis of Alzheimer’s disease: recommendations of the international working group. Lancet Neurol. 2021;20(6):484–96. https://doi.org/10.1016/s1474-4422(21)00066-1 .

Jack CR Jr, Bennett DA, Blennow K, et al. NIA-AA Research framework: toward a biological definition of alzheimer’s disease. Alzheimers Dement. 2018;14(4):535–62. https://doi.org/10.1016/j.jalz.2018.02.018 .

McKhann G, Drachman D, Folstein M, Katzman R, Price D, Stadlan EM. Clinical diagnosis of Alzheimer’s disease: report of the NINCDS-ADRDA work group under the auspices of department of health and human services task force on alzheimer’s disease. Neurology. 1984;34(7):939–44. https://doi.org/10.1212/wnl.34.7.939 .

McKhann GM, Knopman DS, Chertkow H, et al. The diagnosis of dementia due to Alzheimer’s disease: recommendations from the national institute on aging-Alzheimer’s association workgroups on diagnostic guidelines for alzheimer’s disease. Alzheimers Dement. 2011;7(3):263–9. https://doi.org/10.1016/j.jalz.2011.03.005 .

Rowe CC, Ellis KA, Rimajova M, et al. Amyloid imaging results from the australian imaging, biomarkers and lifestyle (AIBL) study of aging. Neurobiol Aging. 2010;31(8):1275–83. https://doi.org/10.1016/j.neurobiolaging.2010.04.007 .

Ossenkoppele R, Jansen WJ, Rabinovici GD, et al. Prevalence of amyloid PET positivity in dementia syndromes. JAMA. 2015;313(19):1939. https://doi.org/10.1001/jama.2015.4669 .

Morris GP, Clark IA, Vissel B. Questions concerning the role of amyloid-β in the definition, aetiology and diagnosis of Alzheimer’s disease. Acta Neuropathol. 2018;136(5):663–89. https://doi.org/10.1007/s00401-018-1918-8 .

Van Der Flier WM, Scheltens P. The ATN framework—moving preclinical Alzheimer disease to clinical relevance. JAMA Neurology. 2022;79(10):968. https://doi.org/10.1001/jamaneurol.2022.2967 .

Petersen RC, Smith GE, Waring SC, Ivnik RJ, Tangalos EG, Kokmen E. Mild cognitive impairment: clinical characterization and outcome. Arch Neurol. 1999;56(3):303–8. https://doi.org/10.1001/archneur.56.3.303 .

Grøntvedt GR, Lauridsen C, Berge G, et al. The amyloid, tau, and neurodegeneration (A/T/N) classification applied to a clinical research cohort with long-term follow-up. J Alzheimers Dis. 2020;74(3):829–37. https://doi.org/10.3233/jad-191227 .

Balasa M, Sánchez-Valle R, Antonell A, et al. Usefulness of biomarkers in the diagnosis and prognosis of early-onset cognitive impairment. J Alzheimers Dis. 2014;40(4):919–27. https://doi.org/10.3233/JAD-132195 .

Jansen WJ, Ossenkoppele R, Knol DL, et al. Prevalence of cerebral amyloid pathology in persons without dementia: a meta-analysis. JAMA. 2015;313(19):1924–38. https://doi.org/10.1001/jama.2015.4668 .

Page MJ, McKenzie JE, Bossuyt PM, The PRISMA, et al. statement: an updated guideline for reporting systematic reviews. BMJ. 2020;2021: n71. https://doi.org/10.1136/bmj.n71 .

Weiner MW. Alzheimer’s disease neuroimaging initiative. Available from: https://adni.loni.usc.edu/ .

Aydin O, Yassikaya MY. Validity and reliability analysis of the plotdigitizer software program for data extraction from single-case graphs. Perspect Behav Sci. 2022;45(1):239–57. https://doi.org/10.1007/s40614-021-00284-0 .

Huwaldt, J. A., & Steinhorst, S. (2020). Plot digitizer 2.6.9.PlotDigitizer-Software. http://plotdigitizer.sourceforge.net/ .

Lewczuk P, Matzen A, Blennow K, et al. Cerebrospinal Fluid Aβ42/40 Corresponds better than Aβ42 to amyloid PET in Alzheimer’s disease. J Alzheimers Dis. 2017;55(2):813–22. https://doi.org/10.3233/jad-160722 .

Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst. 1959;22(4):719–48.

CAS   PubMed   Google Scholar  

Robins J, Greenland S, Breslow NE. A general estimator for the variance of the Mantel-Haenszel odds ratio. Am J Epidemiol. 1986;124(5):719–23. https://doi.org/10.1093/oxfordjournals.aje.a114447 .

Thompson SG, Turner RM, Warn DE. Multilevel models for meta-analysis, and their application to absolute risk differences. Stat Methods Med Res. 2001;10(6):375–92. https://doi.org/10.1177/096228020101000602 .

Viechtbauer W, Cheung MW. Outlier and influence diagnostics for meta-analysis. Res Synth Methods. 2010;1(2):112–25. https://doi.org/10.1002/jrsm.11 .

Hayden JA, van der Windt DA, Cartwright JL, Côté P, Bombardier C. Assessing bias in studies of prognostic factors. Ann Intern Med. 2013;158(4):280–6. https://doi.org/10.7326/0003-4819-158-4-201302190-00009 .

Peters JL, Sutton AJ, Jones DR, Abrams KR, Rushton L. Comparison of two methods to detect publication bias in meta-analysis. Jama. 2006;295(6):676–80. https://doi.org/10.1001/jama.295.6.676 .

Kemppainen NM, Scheinin NM, Koivunen J, et al. Five-year follow-up of 11C-PIB uptake in Alzheimer’s disease and MCI. Eur J Nucl Med Mol Imaging. 2014;41(2):283–9. https://doi.org/10.1007/s00259-013-2562-0 .

Buchhave P, Minthon L, Zetterberg H, Wallin AK, Blennow K, Hansson O. Cerebrospinal fluid levels of β-amyloid 1–42, but not of tau, are fully changed already 5 to 10 years before the onset of Alzheimer dementia. Arch Gen Psychiatry. 2012;69(1):98–106. https://doi.org/10.1001/archgenpsychiatry.2011.155 .

Forlenza OV, Radanovic M, Talib LL, et al. Cerebrospinal fluid biomarkers in Alzheimer’s disease: diagnostic accuracy and prediction of dementia. Alzheimers Dement (Amst). 2015;1(4):455–63. https://doi.org/10.1016/j.dadm.2015.09.003 .

Hansson O, Buchhave P, Zetterberg H, Blennow K, Minthon L, Warkentin S. Combined rCBF and CSF biomarkers predict progression from mild cognitive impairment to Alzheimer’s disease. Neurobiol Aging. 2009;30(2):165–73. https://doi.org/10.1016/j.neurobiolaging.2007.06.009 .

Petersen RC. Mild cognitive impairment as a diagnostic entity. J Intern Med. 2004;256(3):183–94. https://doi.org/10.1111/j.1365-2796.2004.01388.x .

Arruda F, Rosselli M, Mejia Kurasz A, et al. Stability in cognitive classification as a function of severity of impairment and ethnicity: a longitudinal analysis. Article in Press. Appl Neuropsychol Adult. 2023:1-14. https://doi.org/10.1080/23279095.2023.2222861 .

Baldeiras I, Silva-Spínola A, Lima M, et al. Alzheimer’s disease diagnosis based on the amyloid, tau, and neurodegeneration scheme (ATN) in a real-life multicenter cohort of general neurological centers. J Alzheimer’s Dis. 2022;90(1):419–32. https://doi.org/10.3233/JAD-220587 .

Bos I, Verhey FR, Ramakers I, et al. Cerebrovascular and amyloid pathology in predementia stages: the relationship with neurodegeneration and cognitive decline. Alzheimers Res Ther. 2017;9(1):101. https://doi.org/10.1186/s13195-017-0328-9 .

Cerami C, Della Rosa PA, Magnani G, et al. Brain metabolic maps in Mild cognitive impairment predict heterogeneity of progression to dementia. Neuroimage Clin. 2015;7:187–94. https://doi.org/10.1016/j.nicl.2014.12.004 .

de Wilde A, Reimand J, Teunissen CE, et al. Discordant amyloid-β PET and CSF biomarkers and its clinical consequences. Alzheimers Res Ther. 2019;11(1):78. https://doi.org/10.1186/s13195-019-0532-x .

Eckerström C, Svensson J, Kettunen P, Jonsson M, Eckerström M. Evaluation of the ATN model in a longitudinal memory clinic sample with different underlying disorders. Alzheimers Dement (Amst). 2021;13(1): e12031. https://doi.org/10.1002/dad2.12031 .

Frölich L, Peters O, Lewczuk P, et al. Incremental value of biomarker combinations to predict progression of mild cognitive impairment to Alzheimer’s dementia. Alzheimers Res Ther. 2017;9(1):84. https://doi.org/10.1186/s13195-017-0301-7 .

Groot C, Cicognola C, Bali D, et al. Diagnostic and prognostic performance to detect alzheimer’s disease and clinical progression of a novel assay for plasma p-tau217. Article Alzheimer’s Res Ther. 2022;14(1):67. https://doi.org/10.1186/s13195-022-01005-8 .

Hanseeuw BJ, Malotaux V, Dricot L, et al. Defining a Centiloid scale threshold predicting long-term progression to dementia in patients attending the memory clinic: an [(18)F] flutemetamol amyloid PET study. Eur J Nucl Med Mol Imaging. 2021;48(1):302–10. https://doi.org/10.1007/s00259-020-04942-4 .

Herukka SK, Hallikainen M, Soininen H, Pirttilä T. CSF Aβ42 and tau or phosphorylated tau and prediction of progressive mild cognitive impairment. Article Neurology. 2005;64(7):1294–7. https://doi.org/10.1212/01.WNL.0000156914.16988.56 .

Jiménez-Bonilla JF, Quirce R, De Arcocha-Torres M, et al. A 5-year longitudinal evaluation in patients with mild cognitive impairment by 11C-PIB PET/CT: a visual analysis. Nucl Med Commun. 2019;40(5):525–31. https://doi.org/10.1097/mnm.0000000000001004 .

Lopez OL, Becker JT, Chang Y, et al. Amyloid deposition and brain structure as long-term predictors of MCI, dementia, and mortality. Neurology. 2018;90(21):E1920–8. https://doi.org/10.1212/WNL.0000000000005549 .

Okello A, Koivunen J, Edison P, et al. Conversion of amyloid positive and negative MCI to AD over 3 years: an 11C-PIB PET study. Neurology. 2009;73(10):754–60. https://doi.org/10.1212/WNL.0b013e3181b23564 .

Orellana A, García-González P, Valero S, et al. Establishing in-house cutoffs of CSF Alzheimer’s disease biomarkers for the AT(N) stratification of the Alzheimer center barcelona cohort. Int J Mol Sci. 2022;23(13):6891. https://doi.org/10.3390/ijms23136891 .

Ortega RL, Dakterzada F, Arias A, et al. Usefulness of CSF biomarkers in predicting the progression of amnesic and nonamnesic mild cognitive impairment to Alzheimer’s disease. Curr Aging Sci. 2019;12(1):35–42. https://doi.org/10.2174/1874609812666190112095430 .

Riemenschneider M, Lautenschlager N, Wagenpfeil S, Diehl J, Drzezga A, Kurz A. Cerebrospinal fluid tau and beta-amyloid 42 proteins identify Alzheimer disease in subjects with mild cognitive impairment. Arch Neurol. 2002;59(11):1729–34. https://doi.org/10.1001/archneur.59.11.1729 .

Rizzi L, Missiaggia L, Schwartz IVD, Roriz-Cruz M. Value of CSF biomarkers in predicting risk of progression from aMCI to ADD in a 5-year follow-up cohort. SN Compr Clin Med. 2020;2(9):1543–50. https://doi.org/10.1007/s42399-020-00437-3 .

Roberts RO, Aakre JA, Kremers WK, et al. Prevalence and Outcomes of amyloid positivity among persons without dementia in a longitudinal population-based setting. JAMA Neurol. 2018;75(8):970–9. https://doi.org/10.1001/jamaneurol.2018.0629 .

Villemagne VL, Pike KE, Chételat G, et al. Longitudinal assessment of Aβ and cognition in aging and Alzheimer disease. Ann Neurol. 2011;69(1):181–92. https://doi.org/10.1002/ana.22248 .

Hansson O, Zetterberg H, Buchhave P, Londos E, Blennow K, Minthon L. Association between CSF biomarkers and incipient Alzheimer’s disease in patients with mild cognitive impairment: a follow-up study. Lancet Neurol. 2006;5(3):228–34. https://doi.org/10.1016/s1474-4422(06)70355-6 .

Dang C, Harrington KD, Lim YY, et al. Relationship Between amyloid-β positivity and progression to mild cognitive impairment or dementia over 8 years in cognitively normal older adults. J Alzheimers Dis. 2018;65(4):1313–25. https://doi.org/10.3233/jad-180507 .

Ebenau JL, Timmers T, Wesselman LMP, et al. ATN classification and clinical progression in subjective cognitive decline: The SCIENCe project. Neurology. 2020;95(1):e46–58. https://doi.org/10.1212/wnl.0000000000009724 .

Hatashita S, Wakebe D. Amyloid β deposition and glucose metabolism on the long-term progression of preclinical Alzheimer’s disease. Future Sci OA. 2019;5(3):Fso356. https://doi.org/10.4155/fsoa-2018-0069 .

Ossenkoppele R, Pichet Binette A, Groot C, et al. Amyloid and tau PET-positive cognitively unimpaired individuals are at high risk for future cognitive decline. Nature Medicine. 2022;28(11):2381–7. https://doi.org/10.1038/s41591-022-02049-x .

Strikwerda-Brown C, Hobbs DA, Gonneaud J, et al. Association of elevated amyloid and tau positron emission tomography signal with near-term development of alzheimer disease symptoms in older adults without cognitive impairment. JAMA Neurology. 2022;79(10):975. https://doi.org/10.1001/jamaneurol.2022.2379 .

Vos SJ, Xiong C, Visser PJ, et al. Preclinical Alzheimer’s disease and its outcome: a longitudinal cohort study. Lancet Neurol. 2013;12(10):957–65. https://doi.org/10.1016/s1474-4422(13)70194-7 .

Blom ES, Giedraitis V, Zetterberg H, et al. Rapid progression from mild cognitive impairment to Alzheimer’s disease in subjects with elevated levels of tau in cerebrospinal fluid and the APOE epsilon4/epsilon4 genotype. Dement Geriatr Cogn Disord. 2009;27(5):458–64. https://doi.org/10.1159/000216841 .

Hong YJ, Park JW, Lee SB, et al. The influence of amyloid burden on cognitive decline over 2 years in older adults with subjective cognitive decline: a prospective cohort study. Dement Geriatr Cogn Disord. 2021;50(5):437–45. https://doi.org/10.1159/000519766 .

Tomassen J, den Braber A, van der Landen SM, et al. Abnormal cerebrospinal fluid levels of amyloid and tau are associated with cognitive decline over time in cognitively normal older adults: A monozygotic twin study. Alzheimers Dement (N Y). 2022;8(1): e12346. https://doi.org/10.1002/trc2.12346 .

Rodrigue KM, Kennedy KM, Devous MD Sr, et al. β-Amyloid burden in healthy aging: regional distribution and cognitive consequences. Neurology. 2012;78(6):387–95. https://doi.org/10.1212/WNL.0b013e318245d295 .

Donohue MC, Jacqmin-Gadda H, Le Goff M, et al. Estimating long-term multivariate progression from short-term data. Alzheimers Dement. 2014;10(5 Suppl):S400–10. https://doi.org/10.1016/j.jalz.2013.10.003 .

Young AL, Oxtoby NP, Daga P, et al. A data-driven model of biomarker changes in sporadic Alzheimer’s disease. Brain. 2014;137(Pt 9):2564–77. https://doi.org/10.1093/brain/awu176 .

Oberstein TJ, Schmidt MA, Florvaag A, et al. Amyloid-β levels and cognitive trajectories in non-demented pTau181-positive subjects without amyloidopathy. Brain. 2022;145(11):4032–41. https://doi.org/10.1093/brain/awac297 .

Wisse LEM, Butala N, Das SR, et al. Suspected non-AD pathology in mild cognitive impairment. Neurobiol Aging. 2015;36(12):3152–62. https://doi.org/10.1016/j.neurobiolaging.2015.08.029 .

Pouclet-Courtemanche H, Nguyen TB, Skrobala E, et al. Frontotemporal dementia is the leading cause of “true” A-/T+ profiles defined with Aβ(42/40) ratio. Alzheimers Dement (Amst). 2019;11:161–9. https://doi.org/10.1016/j.dadm.2019.01.001 .

Vos SJB, Gordon BA, Su Y, et al. NIA-AA staging of preclinical Alzheimer disease: discordance and concordance of CSF and imaging biomarkers. Neurobiol Aging. 2016;44:1–8. https://doi.org/10.1016/j.neurobiolaging.2016.03.025 .

Livingston G, Huntley J, Sommerlad A, et al. Dementia prevention, intervention, and care: 2020 report of the Lancet Commission. Lancet. 2020;396(10248):413–46. https://doi.org/10.1016/s0140-6736(20)30367-6 .

Lourida I, Hannon E, Littlejohns TJ, et al. Association of lifestyle and genetic risk with incidence of dementia. JAMA. 2019;322(5):430–7. https://doi.org/10.1001/jama.2019.9879 .

Licher S, Ahmad S, Karamujić-Čomić H, et al. Genetic predisposition, modifiable-risk-factor profile and long-term dementia risk in the general population. Nat Med. 2019;25(9):1364–9. https://doi.org/10.1038/s41591-019-0547-7 .

Winblad B, Palmer K, Kivipelto M, et al. Mild cognitive impairment–beyond controversies, towards a consensus: report of the international working group on mild cognitive impairment. J Intern Med. 2004;256(3):240–6. https://doi.org/10.1111/j.1365-2796.2004.01380.x .

La Joie R, Bejanin A, Fagan AM, et al. Associations between [(18)F]AV1451 tau PET and CSF measures of tau pathology in a clinical sample. Neurology. 2018;90(4):e282–90. https://doi.org/10.1212/wnl.0000000000004860 .

Wolters EE, Ossenkoppele R, Verfaillie SCJ, et al. Regional [(18)F]flortaucipir PET is more closely associated with disease severity than CSF p-tau in Alzheimer’s disease. Eur J Nucl Med Mol Imaging. 2020;47(12):2866–78. https://doi.org/10.1007/s00259-020-04758-2 .

Lee J, Jang H, Kang SH, et al. Cerebrospinal fluid biomarkers for the diagnosis and classification of Alzheimer’s disease spectrum. J Korean Med Sci. 2020;35(44):361. https://doi.org/10.3346/jkms.2020.35.e361 .

Toledo JB, Brettschneider J, Grossman M, et al. CSF biomarkers cutoffs: the importance of coincident neuropathological diseases. Acta Neuropathol. 2012;124(1):23–35. https://doi.org/10.1007/s00401-012-0983-7 .

Palmqvist S, Zetterberg H, Mattsson N, et al. Detailed comparison of amyloid PET and CSF biomarkers for identifying early Alzheimer disease. Neurology. 2015;85(14):1240–9. https://doi.org/10.1212/wnl.0000000000001991 .

Download references

Acknowledgements

Not applicable.

Funding

Open access funding provided by Semmelweis University.

1. Supported by the GINOP-2.3.4-15-2020-00008 project. The project is co-financed by the European Union and the European Regional Development Fund.

2. This is an EU Joint Programme- Neurodegenerative Disease Research (JPND) project. The project is supported through the following funding organization under the aegis of JPND - www.jpnd.eu (National Research, Development and Innovation, Hungary, 2019-2.1.7-ERA-NET-2020-00006).

3. Supported by the National Research, Development and Innovation Office (NKFI/OTKA FK 138385).

Role of funding source: The sponsor(s) did not participate in study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the article for publication.

Author information

Authors and affiliations

Centre for Translational Medicine, Semmelweis University, Üllői út 26, Budapest, 1085, Hungary

Zsolt Huszár, Marie Anne Engh, Márk Pavlekovics, Tomoya Sato, Yalea Steenkamp, Tamás Terebessy, Zsolt Molnár, Péter Hegyi & Gábor Csukly

Department of Psychiatry and Psychotherapy, Semmelweis University, Balassa utca 6, Budapest, 1083, Hungary

Zsolt Huszár & Gábor Csukly

Department of Neurology, Jahn Ferenc Teaching Hospital, Köves utca 1, Budapest, 1204, Hungary

Márk Pavlekovics

Department of Neurology and Institute of Neuroscience, Cliniques Universitaires Saint-Luc, Université Catholique de Louvain, Brussels, 1200, Belgium

Bernard Hanseeuw

Department of Radiology, Gordon Center for Medical Imaging, Massachusetts General Hospital, Harvard Medical School, Boston, MA, 02155, USA

Department of Anesthesiology and Intensive Therapy, Semmelweis University, Üllői út 78/A, Budapest, Hungary

Zsolt Molnár

Department of Anesthesiology and Intensive Therapy, Poznan University of Medical Sciences, 49 Przybyszewskiego St, Poznan, Poland

Institute for Translational Medicine, Medical School, University of Pécs, Pécs, 7624, Hungary

Péter Hegyi

Institute of Pancreatic Diseases, Semmelweis University, Tömő 25-29, Budapest, 1083, Hungary

Translational Pancreatology Research Group, Interdisciplinary Centre of Excellence for Research Development and Innovation University of Szeged, Budapesti 9, Szeged, 6728, Hungary


Contributions

ZH: conceptualisation, project administration, methodology, formal analysis, writing – original draft; ME: conceptualisation, methodology, formal analysis, writing – review and editing; MP: conceptualisation, formal analysis, writing – review and editing; TS: formal analysis, writing – review and editing; YS: formal analysis, writing – review and editing; BH: writing – review and editing; TT: conceptualisation, writing – review and editing; ZM: conceptualisation, supervision, writing – review and editing; PH: conceptualisation, supervision, writing – review and editing; GCs: conceptualisation, methodology, formal analysis, supervision, writing – original draft, visualisation. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Gábor Csukly.

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Material 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Huszár, Z., Engh, M., Pavlekovics, M. et al. Risk of conversion to mild cognitive impairment or dementia among subjects with amyloid and tau pathology: a systematic review and meta-analysis. Alz Res Therapy 16, 81 (2024). https://doi.org/10.1186/s13195-024-01455-2


Received : 07 July 2023

Accepted : 08 April 2024

Published : 12 April 2024

DOI : https://doi.org/10.1186/s13195-024-01455-2


Alzheimer's Research & Therapy

ISSN: 1758-9193


[Figure notes: RCT indicates randomized clinical trial. In the forest plots, bold text marks the random-effects and common-effect model estimates; square size indicates the weight of the study; diamonds, the total weight. ADE indicates adverse drug effect; RR, risk ratio. Superscript a, high-dose psilocybin; superscript b, moderate-dose psilocybin.]

eFigure 1. Sensitivity Analysis

eFigure 2. Funnel Plots

Data Sharing Statement


Yerubandi A, Thomas JE, Bhuiya NMMA, Harrington C, Villa Zapata L, Caballero J. Acute Adverse Effects of Therapeutic Doses of Psilocybin: A Systematic Review and Meta-Analysis. JAMA Netw Open. 2024;7(4):e245960. doi:10.1001/jamanetworkopen.2024.5960


Acute Adverse Effects of Therapeutic Doses of Psilocybin: A Systematic Review and Meta-Analysis

  • 1 Department of Clinical and Administrative Pharmacy, College of Pharmacy, University of Georgia, Athens
  • 2 Department of Clinical and Administrative Sciences, Larkin University, Miami, Florida
  • 3 Lloyd L. Gregory School of Pharmacy, Palm Beach Atlantic University, West Palm Beach, Florida

Question   What are the notable acute adverse effects for therapeutic doses of psilocybin in the treatment of depression and anxiety?

Findings   In this meta-analysis of 6 randomized, double-blind clinical trials with 528 patients, headaches, nausea, anxiety, dizziness, and fluctuations in blood pressure occurred significantly more frequently with psilocybin than with comparators. Psilocybin use was not associated with increased risk of paranoia or transient thought disorder.

Meaning   The findings of this study suggest a tolerable acute adverse effect profile for therapeutic doses of psilocybin, but rare and long-term adverse effects need to be further elucidated.

Importance   Psilocybin has been studied in the treatment of depression and anxiety disorders. Clinical studies have mainly focused on efficacy, with systematic reviews showing favorable efficacy; however, none have primarily focused on psilocybin safety.

Objective   To evaluate the acute adverse effects of psilocybin at therapeutic doses in the treatment of depression and anxiety.

Data Sources   MEDLINE via PubMed, Web of Science, and ClinicalTrials.gov were searched for publications available between 1966 and November 30, 2023.

Study Selection   Randomized, double-blind clinical trials that reported adverse effects of psilocybin in patients treated for depression and anxiety were screened.

Data Extraction and Synthesis   Data were independently extracted by 2 authors and verified by 2 additional authors following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guideline. The inverse variance method with the Hartung-Knapp adjustment for the random-effects model was used, with a continuity correction of 0.5 for studies with 0 cell frequencies. Sensitivity analysis was conducted by sequentially removing 1 study at a time to assess the robustness of the results.

Main Outcomes and Measures   The primary outcome was the adverse effects of psilocybin at high- and moderate- (ie, therapeutic) dose regimens compared with placebo, low-dose psilocybin, or another comparator in the treatment of depression and/or anxiety.

Results   Six studies met the inclusion criteria, with a total sample of 528 participants (approximately 51% female; median age, 39.8 years [IQR, 39.8-41.2]). Seven adverse effects were reported in multiple studies and included in the analysis. Among these, headache (relative risk [RR], 1.99; 95% CI, 1.06-3.74), nausea (RR, 8.85; 95% CI, 5.68-13.79), anxiety (RR, 2.27; 95% CI, 1.11-4.64), dizziness (RR, 5.81; 95% CI, 1.02-33.03), and elevated blood pressure (RR, 2.29; 95% CI, 1.15-4.53) were statistically significant. Psilocybin use was not associated with increased risk of paranoia or transient thought disorder.

Conclusions and Relevance   In this meta-analysis, the acute adverse effect profile of therapeutic single-dose psilocybin appeared to be tolerable and resolved within 48 hours. However, future studies need to more actively evaluate the appropriate management of adverse effects.

Psilocybin is classified as a serotonergic psychedelic and a prodrug of psilocin (4-hydroxy-dimethyltryptamine), which converts to the active form once ingested. 1 The theoretical mechanism of action involves binding to serotonin 2A (5-HT2A) receptors, predominantly in the amygdala, thalamus, and prefrontal cortex. The psychopharmacologic profile of psilocybin was examined in the 1960s. It was proposed that oral administration of approximately 10 mg was needed to induce psychological effects, with more potent effects developing with increasing doses. 2-4 Psilocybin’s psychological effects were comparable to those of lysergic acid diethylamide but thought to be more vividly visual, less emotionally intense, more euphoric, and less likely to cause panic attacks or paranoia. 4 Clinical studies suggest psilocybin produces an antidepressant benefit in patients with treatment-resistant depression. 5 This effect is believed to be connected to its affinity for the serotonergic pathway in the brain, which is essential in controlling mood. 5

In recent years, there has been renewed interest in the therapeutic potential of psilocybin in the treatment of mental health (eg, depression, anxiety) disorders. 6-8 Psilocybin-assisted therapy typically involves 1 or 2 dosing sessions, with individuals encouraged to explore their thoughts and emotions with the support of a therapist. Clinical studies have focused on psilocybin efficacy, resulting in studies pooling and presenting aggregate results. 6, 7 One recent meta-analysis investigated psilocybin adverse effects as a secondary aim, using a dose-dependent approach focused on select adverse effects. 8 However, these studies have not primarily focused on or explored the adverse effect profile of psilocybin in depth. 6-8 Therefore, the purpose of this study was to summarize and examine the relative risk (RR) of acute adverse effects of therapeutic doses of psilocybin in patients with depression and anxiety.

A systematic review of the literature following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) reporting guideline 9 was conducted to identify studies involving participants receiving psilocybin in the treatment of major depressive disorder or depression associated with other related disorders (eg, cancer-related anxiety and depression). Studies included randomized clinical trials comparing psilocybin with either placebo or another comparator (eg, niacin, escitalopram, low-dose psilocybin). Doses were grouped into low (1-3 mg), moderate (10-20 mg), and high (20-30 mg) categories. These dosing ranges were based on previous clinical data. 10, 11 All studies were evaluated for adverse effects of psilocybin in the treatment of depression and anxiety in study participants. When extracting adverse event rates, the shortest time period available was analyzed (eg, day 1 instead of day 30), since the half-life of psilocin is 3 ± 1.1 hours when taken orally and the duration of action ranges from 3 to 12 hours. 12, 13 Therefore, psilocin concentrations are expected to be minimal by 24 hours.
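The dose bands above can be captured in a small helper. This is an illustrative sketch, not code from the study; note that the published ranges overlap at 20 mg and leave 3-10 mg unassigned, so treating 20 mg as high and returning "unclassified" for the gaps is our assumption.

```python
def dose_category(mg: float) -> str:
    """Assign an oral psilocybin dose (mg) to the study's bands:
    low 1-3 mg, moderate 10-20 mg, high 20-30 mg.
    Boundary handling (20 mg counted as high, gaps flagged as
    "unclassified") is an illustrative assumption, not the authors'."""
    if 1 <= mg <= 3:
        return "low"
    if 10 <= mg < 20:
        return "moderate"
    if 20 <= mg <= 30:
        return "high"
    return "unclassified"
```

A classifier like this makes the grouping reproducible when pooling arms across trials that used different absolute doses.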

Published studies were identified by conducting a search of MEDLINE via PubMed, Web of Science, and ClinicalTrials.gov for publications available between 1966 and November 30, 2023. Search terms included psilocybin , side effects , depression , anxiety , adverse effects , and adverse side effects . Only randomized, double-blind clinical trials published in the English language were included in the systematic review. Studies were included in the analysis if they reported the adverse effects of psilocybin. Studies were excluded if they reported the same adverse events from previous studies or follow-up studies in which adverse event data were already described (eg, deferring adverse event profile to most recent exposure). When appropriate and based on our expertise, adverse effects having different terminology but similar definitions were grouped. These included nausea and transient nausea, headache and transient headache, anxiety and transient anxiety, paranoia and paranoid ideation, and transient thought disorder and abnormal thinking.
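The synonym grouping described above amounts to a lookup table. The sketch below is hypothetical: the mapped label pairs come from the text, but the table and function names are ours.

```python
# Lookup table built from the synonym pairs named in the text;
# any label not listed passes through unchanged.
SYNONYMS = {
    "transient nausea": "nausea",
    "transient headache": "headache",
    "transient anxiety": "anxiety",
    "paranoid ideation": "paranoia",
    "abnormal thinking": "transient thought disorder",
}

def normalize_ae(term: str) -> str:
    """Collapse study-specific adverse-effect labels onto one name."""
    key = term.strip().lower()
    return SYNONYMS.get(key, key)
```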

Data were independently extracted by 2 of us (A.Y. and N.M.M.A.B.) and verified by an additional 2 of us (J.E.T. and J.C.). The primary outcome was the adverse effects of psilocybin at high- and moderate-dose regimens (ie, therapeutic doses) compared with placebo, low-dose psilocybin, or another comparator in the treatment of depression and anxiety with other related disorders.

The meta-analysis was conducted using R statistical software, version 4.3.1 (R Foundation for Statistical Computing). We used the inverse variance method with the Hartung-Knapp adjustment for the random-effects model and applied a continuity correction of 0.5 for studies with 0 cell frequencies. Sensitivity analysis was conducted by sequentially removing 1 study at a time to assess the robustness of our results. For outcomes with 4 or fewer studies, a common-effect model was considered. Heterogeneity among the studies was quantified using the I² measure. Additionally, for each outcome, a funnel plot was created to evaluate publication bias. Study quality was assessed with the risk-of-bias tool for randomized trials (RoB2). 14 Two of us (A.Y. and N.M.M.A.B.) determined initial RoB2 scores for the included studies. An additional 2 of us (J.C. and J.E.T.) then independently assessed and verified the initial scores. Any discrepancies were discussed and rescored as needed. All tests were 2-sided, with a significance threshold of P ≤ .05.
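As a rough illustration of the pipeline just described (not the authors' R code), the following Python sketch computes per-study log risk ratios with a 0.5 continuity correction, pools them under a DerSimonian-Laird random-effects model, applies the Hartung-Knapp variance adjustment, and reports I². All study counts used below are made up.

```python
import math

def log_rr(a, n1, c, n2, cc=0.5):
    """Log risk ratio and its variance for one study:
    a events out of n1 in the psilocybin arm, c out of n2 in the
    comparator arm. A continuity correction (0.5 by default) is
    added when any cell of the 2x2 table is zero."""
    if 0 in (a, c, n1 - a, n2 - c):
        a, c, n1, n2 = a + cc, c + cc, n1 + 2 * cc, n2 + 2 * cc
    y = math.log((a / n1) / (c / n2))
    v = 1 / a - 1 / n1 + 1 / c - 1 / n2
    return y, v

def random_effects_hk(y, v):
    """Pool log risk ratios with inverse-variance weights:
    DerSimonian-Laird tau^2, then the Hartung-Knapp estimator for
    the variance of the pooled effect. Returns (pooled log RR,
    HK standard error, I^2 in percent)."""
    k = len(y)
    w = [1 / vi for vi in v]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)
    ws = [1 / (vi + tau2) for vi in v]
    sws = sum(ws)
    mu = sum(wi * yi for wi, yi in zip(ws, y)) / sws
    # Hartung-Knapp: weighted residual variance replaces 1/sum(w*)
    hk_var = sum(wi * (yi - mu) ** 2 for wi, yi in zip(ws, y)) / ((k - 1) * sws)
    i2 = 100 * max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    return mu, math.sqrt(hk_var), i2
```

In practice this is what R's meta-analysis packages do under the hood; the sketch is only meant to make the stated choices (inverse variance, Hartung-Knapp, continuity correction, I²) concrete.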

Overall, of 70 published studies identified through PubMed, Web of Science, and ClinicalTrials.gov, 64 were excluded (Figure 1). Therefore, 6 studies (total sample of 528 participants; approximately 51% female and 49% male; median age, 39.8 [IQR, 39.8-41.2] years) met our inclusion criteria (Table).11,15-19 In general, the population was middle-aged adults, and more than 90% of the participants were White. The RoB2 assessment showed an overall low risk of bias for all included studies. Several adverse effects were identified across the studies. Comparators in these studies included placebo, niacin, escitalopram, and low-dose psilocybin (1-3 mg). In general, participants experienced adverse effects immediately or within 24 hours after administration of various doses of psilocybin.

Safety data (Figure 2 and Figure 3) from the randomized clinical trials were available to assess risk ratios via meta-analysis for the following 7 adverse effects: headache, nausea, anxiety, dizziness, paranoia, transient thought disorder, and elevated blood pressure. Overall, psilocybin was associated with a greater risk of headache (RR, 1.99; 95% CI, 1.06-3.74; P = .04), nausea (RR, 8.85; 95% CI, 5.68-13.79; P < .001), anxiety (RR, 2.27; 95% CI, 1.11-4.64; P = .02), dizziness (RR, 5.81; 95% CI, 1.02-33.03; P = .047), and elevated blood pressure (RR, 2.29; 95% CI, 1.15-4.53; P = .02) compared with control. Psilocybin use was not associated with risk of paranoia or transient thought disorder. Two adverse effects appeared in all 6 studies: headache, with an incidence ranging from 2% to 66%, and nausea, with an incidence ranging from 4% to 48%.11,15-19 Anxiety was documented in 3 studies, with an incidence ranging from 4% to 26%.11,16,17 All adverse effects had an estimated I² value of less than 50% except elevated blood pressure (I² = 78%), suggesting most results were not substantially affected by heterogeneity. In the sensitivity analysis, minor adjustments in RR were observed for headache and nausea, while anxiety, dizziness, paranoia, and transient thought disorder showed unchanged RR values. This consistency, despite the limited number of studies for some outcomes, highlights the robustness of our findings. Sensitivity and funnel plot analyses are located in eFigure 1 and eFigure 2 in Supplement 1.
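The leave-one-out sensitivity check used here can be sketched as follows — a hedged, minimal Python illustration using hypothetical (log RR, variance) pairs rather than the extracted study data: each study is dropped in turn and the inverse-variance pool is recomputed to see how much the summary RR moves.

```python
import math

def pooled_rr(estimates):
    """Inverse-variance pooled RR from (log_rr, variance) pairs."""
    w = [1 / v for _, v in estimates]
    num = sum(wi * y for wi, (y, _) in zip(w, estimates))
    return math.exp(num / sum(w))

def leave_one_out(estimates):
    """Re-pool after dropping each study in turn; a pooled value that
    swings widely when one study is removed flags an influential study."""
    return [pooled_rr(estimates[:i] + estimates[i + 1:])
            for i in range(len(estimates))]

# Hypothetical per-study (log RR, variance) pairs for six trials
ests = [(0.70, 0.30), (0.60, 0.40), (0.80, 0.50),
        (0.65, 0.35), (0.75, 0.45), (0.70, 0.30)]
overall = pooled_rr(ests)
loo = leave_one_out(ests)
spread = max(loo) - min(loo)  # small spread suggests a robust summary RR
```

A small spread between the full-data RR and each leave-one-out RR is what the text above describes as "minor adjustments" or "unchanged RR values."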

Additionally, 6 acute adverse effects appeared in greater than 5% of the population: elevated heart rate (76%), visual perceptual effects (44%), physical discomfort (21%), fatigue (approximately 6%), euphoric mood (approximately 5%), and mood alteration (approximately 5%).11,15-17 However, each of these adverse effects was identified in only 1 study and therefore was not included in the meta-analysis. The studies reported that none of the adverse events listed were considered serious. A summary of the studies is presented in the Table.

A summary of the acute adverse effects of psilocybin in treating depression and anxiety is needed for health care professionals to identify expected adverse effects and provide effective patient counseling. This study focused on therapeutic doses to clarify the adverse effects to be expected in potential future practice. The results overall suggest a statistically significant incidence of headache, nausea, anxiety, dizziness, and elevated blood pressure. However, caution is advised in interpreting the elevated blood pressure result because of its heterogeneity (I² = 78%), indicating potential variability. Given psilocybin's mechanism of action, these adverse effects are expected, as they are similar to those of serotonergic antidepressants.1 The adverse effects were also anticipated based on previous survey data from adult participants who ingested active doses of psilocybin mushrooms.20 Additionally, data on adverse effect severity appear to align with psychedelic adverse effects documented over the past 60 years or more.21-23

All 6 studies identified headache as a statistically significant adverse effect of psilocybin, with an incidence ranging from 2% to 66%.11,15-19 Headaches were typically mild to moderate in severity, and none required medications for relief. Literature reports corroborate headaches as a known adverse effect of psilocybin. A recent study found a significant dose-response relationship, with the RR of headaches/migraines increasing by 1.42% for each unit increase in psilocybin dose.8 Moreover, a small double-blind study in healthy individuals indicated a dose-related response to psilocybin with respect to headache occurrence, duration, and intensity.24 These headaches subsided within 24 hours of psilocybin administration.

All 6 studies identified nausea as an adverse effect,11,15-19 with an incidence varying between 4% and 48%.15,18 One study reported nausea in 22% of participants with a high dose, but the rate decreased with lower doses (eg, 7% with a moderate dose and 1% with a low dose).11 An additional study reported that 15% of participants experienced nausea with a high dose and none with a low dose.16 Recent data support a dose-response association, with a relative risk of 1.25% (P < .001).8 Five studies stated nausea was not severe and resolved within 60 minutes.11,16-19 While the severity of nausea was not discussed in detail in the studies, none reported using any pharmacologic agent to manage nausea. Additionally, medications to alleviate any severe nausea or vomiting, if needed, were not identified in the study protocols. There are anecdotal reports suggesting that eating 1 to 2 hours before taking psilocybin or taking it with a small snack, using lemon juice and/or ginger, and hydrating well may be helpful.25,26 Others advise that patients can concentrate on themselves in an act of surrendering to the psychedelic experience.27,28 However, there are no clinical studies to support these suggestions or allopathic treatments, and caution is warranted. For example, ginger may potentiate the effects of psilocybin through its ability to increase serotonin and produce negative consequences.29 The impact a therapist may have in assisting patients in managing nausea is also unknown. For example, anecdotal evidence dating to the late 1950s and 1960s suggests that a patient fully immersing themselves in the experience under the guidance of a therapist may alleviate nausea, but this has not been validated.27,28

Three studies identified anxiety as an adverse effect.11,16,17 According to 1 study, anxiety was reported in 4% of participants administered high-dose psilocybin, 8% with a moderate dose, and none with a low dose.11 However, another study stated that 26% of participants with high-dose psilocybin and 15% with low-dose psilocybin experienced an anxiety episode.16 All 3 studies identifying anxiety stated that, similar to nausea, anxiety resolved within 24 to 48 hours. While the severity of anxiety was not thoroughly discussed in any of the studies, the studies mentioned that anxiety was not serious. In the data set reviewed, 1 case was identified in which a patient received a pharmacologic intervention (ie, lorazepam, 2 mg) after taking high-dose psilocybin and experiencing acute anxiety.11 In general, the study protocols discussed using medications to treat anxiety not resolving after nonpharmacologic interventions (eg, guidance from the therapist). For example, Goodwin et al11 noted that benzodiazepine anxiolytics, such as lorazepam or alprazolam, given orally may be preferred because of their rapid onset, short duration of action, and the possibility that another route (eg, intravenous injection) may exacerbate anxiety. Two additional studies listed oral diazepam, 5 to 10 mg, and oral olanzapine, 5 to 10 mg, as rescue medications for severe adverse psychological distress or severe anxiety in their protocols.17,19 Another study mentioned diazepam and risperidone as rescue medications for anxiety or psychosis, with no dosing range.15 Carhart-Harris et al18 listed lorazepam (oral or injectable) as a rescue medication for treating events of severe panic that would place people at risk after not responding to psychological intervention. Having the therapist present may assist in decreasing or managing anxiety through simple arm-holding18; however, given the observed increase in anxiety, protocols may need to transparently specify pharmacologic options for managing anxiety not controlled by the therapist.

Although a recent meta-analysis focused on dose-dependent response did not report dizziness as an adverse effect of psilocybin,8 our analysis found it statistically significant. Two studies identified dizziness as an adverse effect.11,19 One study reported dizziness in 6% of participants with high-dose psilocybin, 1% with a moderate dose, and none with a low dose.11 The other study reported that 8% of participants experienced dizziness after administration of psilocybin.19 Both studies stated dizziness resolved within 24 to 48 hours. Neither study used any medication to treat dizziness, and both reported it was a nonserious adverse effect. None of the study protocols defined any medications to treat severe dizziness. Similar to interventions for other adverse effects, the role a therapist has in managing dizziness (eg, asking patients to lie down, encouraging them to trust the experience) is also unknown. Some protocols state that a patient should lie down with eye coverings,11,16,17,19 and such measures may be enough to decrease the severity of dizziness and avoid the need for any medications.

Two studies reported elevated blood pressure and heart rate.16,17 In one trial, 76% of participants experienced elevated blood pressure at a therapeutic dose of 21 mg.17 In another study, 34% of participants had elevated systolic blood pressure (>160 mm Hg) and 13% had elevated diastolic blood pressure (>100 mm Hg) with high-dose psilocybin.16 However, patients taking low-dose psilocybin (the control group) also had elevated systolic (17%) and diastolic (2%) blood pressure. It is possible that the blood pressure increases in the low-dose psilocybin control group in one study,16 which were absent in the other17 (which used niacin as the control), contributed to the high heterogeneity. The high I² statistic points to heterogeneity among the studies, particularly because only 2 studies contributed to this result. The limited number of studies and the difference in control groups may inflate the heterogeneity measure; therefore, the findings should be interpreted within the context of this limitation. However, recent findings suggesting a dose-related increase in blood pressure, with a relative risk of 1.04% (P = .04), appear to support our results.8 Peak heart rate was 71 beats/min at 300 minutes postdosing in one study, while peak heart rate was 84 beats/min postdosing with high-dose psilocybin in another.16,17 Elevated heart rate in both studies was not considered serious and resolved within 24 hours. Additionally, 1 of the studies noted elevated blood pressure with the moderate dose, reported as not significantly different from placebo.19 However, the sample size for this adverse effect was not described in the study or supplemental materials, so it was not included in the meta-analysis. Also, 4 of the 6 studies11,15-17 excluded patients with uncontrolled hypertension or elevated blood pressure at baseline; therefore, the effects of psilocybin on blood pressure need to be further explored.
At this time, we are not aware of any medications used to treat this effect, even though elevated heart rate has been described in the literature.17 Despite limited data listing medications to treat increased blood pressure or heart rate, based on its mechanism of action and pharmacokinetic profile, clonidine may be an option for psychedelic-induced increases in blood pressure and/or heart rate, although it has only been studied in mice.30 Protocols and guidelines note oral nifedipine, 10 mg, or intravenous labetalol as rescue medications for hypertension.19,23 Data suggest clonidine and nifedipine are equally effective in treating urgent hypertension in general.31 While dosing per se has not been studied in this target population, based on prior studies and the pharmacokinetic profile of psilocybin, clonidine, 0.1 to 0.2 mg, or nifedipine, 10 to 20 mg, orally per dose appears reasonable for short-term resolution.31,32

Other adverse effects analyzed in the meta-analysis but not showing a significant difference included paranoia and transient thought disorder. Three studies reported a total of 3 cases of paranoia with high-dose psilocybin across 128 patients.15-17 Additionally, 5 patients in 2 studies totaling 103 patients experienced transient thought disorder with psilocybin (1 with a high dose, 4 with a moderate dose).11,17 One study listed risperidone, while another reported olanzapine and diazepam, as agents to treat psychological distress.11,17 Risperidone can also be a potential option for treating acute psychological distress.11 While the incidence of both paranoia and transient thought disorder appears to be low, these may be adverse effects worth monitoring in the future, as supported by a recent study suggesting a dose-response relationship.8 All 6 studies used a therapist/facilitator to assist patients during treatment. The use of these therapists may have played a role in supporting these cases and preventing increased severity or complications. Published guidelines provide recommendations for study personnel to assist participants during their psychedelic experience.23 Additionally, there are a number of certificate programs designed to provide psychedelic psychotherapy training.33 None of the studies stated whether any of the therapists had specific psychedelic certifications. Therefore, the utility of such certifications in managing patient response to adverse effects also merits further study.

Single studies reported other acute adverse effects, including visual perceptual effects (44%), physical discomfort (21%), fatigue (approximately 6%), euphoric mood (approximately 5%), and mood alteration (approximately 5%).11,15,16 Because these adverse effects were identified only in single studies, it is unknown whether they were specifically evaluated across other studies, and it is difficult to speculate about any dose-related response. Regardless, none of the adverse effects identified in single trials was considered serious, and all resolved within 24 hours with no pharmacologic treatment needed. Visual perceptual effects, an expected adverse effect of psilocybin, were identified in only 1 study.15 While the incidence was 44% of participants, 6% continued to experience symptoms after the dosing day, which resolved by day 9.

Overall, the strengths of this study include the ability to evaluate the adverse event profile of therapeutic doses of psilocybin using a meta-analytic approach where possible, given the currently small number of studies with limited sample sizes; notably, these studies had a low risk of bias. There are several limitations to our study results. First, our meta-analysis is based on 6 randomized controlled studies published only in English, which have small sample sizes for drawing conclusions about the potential adverse effects of psilocybin. Additionally, the studies focus mostly on acute adverse effects, usually concentrating on the first 48 hours, and appear to be less stringent about longer-term follow-up. Some adverse effects are mentioned in only 1 study and cannot be further analyzed. A bias within the studies toward solicitation of certain adverse effects and not others may be possible. Selection bias may also be a limitation: participants in these studies have been predominantly White adults without comorbidities (eg, hypertension) that may be exacerbated by psilocybin use. Also, because psilocybin appears to act as an antidepressant, future studies need to evaluate suicidality, which, although rare, may pose a risk in younger adults and is the subject of a boxed warning on all antidepressants. This is of particular interest because 5 of the 6 studies appeared to exclude patients with potential suicide risk.11,15,16,18,19 Additionally, there is a lack of recent research data discussing the treatment of psychedelic adverse effects. One study describes safety guidelines for hallucinogen research, and while rescue medications are briefly mentioned, the guidelines primarily focus on participant selection and preparation, study personnel and appropriate conduct, physical environment, and postsession procedures. It is also unknown whether future studies exploring higher doses or more frequent use of psilocybin may identify additional adverse effects or increased severity of symptoms. Furthermore, it is important to assess the impact a therapist may have in mitigating any of these adverse effects.

In this systematic review and meta-analysis, therapeutic doses of psilocybin appeared to produce tolerable acute adverse effects that typically resolved within 24 to 48 hours. However, less common adverse effects, such as paranoia and prolonged visual perceptual effects, warrant attention. Larger trials are necessary to fully assess these adverse effects, particularly in populations with comorbid health conditions. Recommendations for solicited acute adverse effects should, at a minimum, include headache, nausea, anxiety, dizziness, paranoia, blood pressure and/or heart rate changes, visual perceptual effects, physical discomfort, and mood changes. Although infrequent, the possibility of suicidality, prolonged paranoia, and persistent visual perceptual effects should be monitored over the long term. The effectiveness of medications and alternative treatments in managing these symptoms requires further investigation. Additionally, the role of licensed therapists in managing adverse effects presents an avenue for future research.

Accepted for Publication: February 12, 2024.

Published: April 10, 2024. doi:10.1001/jamanetworkopen.2024.5960

Open Access: This is an open access article distributed under the terms of the CC-BY License . © 2024 Yerubandi A et al. JAMA Network Open .

Corresponding Author: Joshua Caballero, PharmD, BCPP, FCCP, University of Georgia, 250 W Green St, Athens, GA 30602 ([email protected]).

Author Contributions: Drs Joshua Caballero and Akhila Yerubandi had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.

Concept and design: Thomas, Harrington, Caballero.

Acquisition, analysis, or interpretation of data: All authors.

Drafting of the manuscript: Yerubandi, Thomas, Bhuiya, Caballero.

Critical review of the manuscript for important intellectual content: All authors.

Statistical analysis: Harrington, Villa Zapata.

Administrative, technical, or material support: Yerubandi, Harrington, Villa Zapata, Caballero.

Supervision: Villa Zapata, Caballero.

Conflict of Interest Disclosures: None reported.

Data Sharing Statement: See Supplement 2 .
