How to Do a Meta-Analysis: A Beginner's Guide

A meta-analysis is a statistical technique that combines the results of two or more independent studies on the same question into a single pooled estimate. When done correctly, it provides a more precise estimate of the treatment effect than any individual study can offer, because it synthesizes data across a larger total sample. When done incorrectly, it combines studies that should not be combined and produces a pooled estimate with false precision.

This guide covers every decision in the process: when to pool, which effect size to use, which model to apply, and how to assess heterogeneity. It also covers how to rate certainty using GRADE. It reflects the January 2024 Cochrane RevMan update, which changed the default random-effects estimator from DerSimonian-Laird to REML and added prediction intervals as a standard output.

REML, prediction intervals, HKSJ confidence interval adjustment, and reproducible R or Stata code are standard outputs, not add-ons, at ScribeLab Writer's meta-analysis service, starting from $750.

Quick Answer:

A meta-analysis has eight core steps: confirm that pooling is clinically and methodologically appropriate, choose the correct effect size measure (risk ratio, odds ratio, mean difference, or SMD), choose the statistical model (almost always random effects for clinical reviews), estimate heterogeneity using I-squared, tau-squared, and prediction intervals, produce the forest plot, assess publication bias (funnel plot and Egger's test if ten or more studies), run subgroup and sensitivity analyses, and rate certainty using GRADE. Since January 2024, Cochrane RevMan has used REML as the default tau-squared estimator and includes prediction intervals as a standard forest plot output. A meta-analysis that still uses DerSimonian-Laird without documented justification does not meet current Cochrane Handbook v6.5 standards.

What a Meta-Analysis Is and When to Conduct One

A meta-analysis is appropriate when at least two studies address the same research question with comparable populations, interventions, comparators, and outcomes. Their data must also be combinable into a single pooled estimate.

Pooling is not always appropriate, even when two or more studies exist. Narrative synthesis is preferred when there is substantial clinical heterogeneity that makes the pooled estimate uninterpretable, methodological heterogeneity introducing systematic bias, or insufficient data for reliable calculations.

The decision to pool should be pre-specified in the systematic review protocol. A decision to pool that is made only after seeing the data, because the results looked consistent enough, is post hoc, and the resulting meta-analysis should be clearly labelled as exploratory.

Step 1: Confirm That Pooling Is Appropriate

Before writing any code or opening any software, confirm three things.

Clinical similarity. The included studies must address the same PICO question in a sufficiently similar way. Pooling cardiovascular outcomes from trials in South Asian populations with trials in North American populations requires a pre-specified clinical rationale for why the effect is expected to be consistent across those populations. If no such rationale exists, a subgroup analysis by geography is more defensible than a pooled estimate.

Statistical compatibility. The included studies must report the outcome in a format that can be pooled. Studies reporting mean and standard deviation for a continuous outcome can be pooled using mean difference or standardized mean difference. Studies that report median and interquartile range for the same outcome cannot be pooled directly using standard methods without data transformation.

Sufficient sample size for a reliable estimate. A meta-analysis of two small trials with a total of 80 participants is not meaningless, but its confidence interval will be wide and its prediction interval will be wider. The findings should be presented with appropriate uncertainty and labelled as preliminary.
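
To illustrate the data transformation mentioned under statistical compatibility, here is a minimal Python sketch of one common normal-theory approximation for converting a median and interquartile range into a mean and standard deviation. It assumes the outcome is roughly symmetric and approximately normal. The function name and the example numbers are hypothetical, and skewed outcomes need a more careful method.

```python
from statistics import NormalDist

# Width of the interquartile range of a normal distribution in SD units (about 1.35).
IQR_IN_SD_UNITS = 2 * NormalDist().inv_cdf(0.75)

def mean_sd_from_median_iqr(q1, median, q3):
    """Approximate the mean and SD from quartiles, assuming near-normal data."""
    mean = (q1 + median + q3) / 3
    sd = (q3 - q1) / IQR_IN_SD_UNITS
    return mean, sd

# Hypothetical study reporting a median of 14 (IQR 10 to 18) on a symptom scale.
mean, sd = mean_sd_from_median_iqr(10, 14, 18)
print(f"approximate mean = {mean:.1f}, approximate SD = {sd:.2f}")
```

Studies entered this way are usually flagged, and a sensitivity analysis that excludes them shows whether the conversion drives the pooled result.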

Step 2: Choose the Right Effect Size Measure

The effect size measure determines how the results from each study are extracted, calculated, and combined. The choice depends on the outcome type.

Table 1: Effect Size Measures in Meta-Analysis by Outcome Type

Effect Size	Outcome Type	When to Use	Interpretation Example
Risk Ratio (RR)	Binary (dichotomous)	RCTs and cohort studies with a control group. Most Cochrane intervention reviews use RR as the primary effect measure.	RR 0.72 (95% CI 0.60–0.86): the intervention reduces the risk of the outcome by 28% compared to control.
Odds Ratio (OR)	Binary (dichotomous)	Case-control studies, logistic regression outputs. Can be used in RCTs but may overestimate the effect when the outcome is common (>10%).	OR 0.65 (95% CI 0.50–0.84): the odds of the outcome are 35% lower in the intervention group than in the control group.
Mean Difference (MD)	Continuous	All studies measure the same outcome on the same scale with the same units (e.g., all measure systolic blood pressure in mmHg).	MD -8.2 mmHg (95% CI -12.4 to -4.0): the intervention reduced systolic blood pressure by 8.2 mmHg compared to control.
Standardized Mean Difference (SMD)	Continuous	Studies measure the same construct on different scales (e.g., depression measured by PHQ-9 in some studies and HAM-D in others). SMD standardizes results to a common unit.	SMD -0.52 (95% CI -0.78 to -0.26): the intervention reduced depression scores by 0.52 standard deviations — generally considered a moderate effect.
Hazard Ratio (HR)	Time-to-event (survival)	Outcomes where timing matters: mortality, time to disease progression, time to readmission. Accounts for censored observations.	HR 0.78 (95% CI 0.68–0.89): the instantaneous risk of the event at any given time is 22% lower in the intervention group.
Diagnostic Odds Ratio (DOR)	Diagnostic accuracy	Meta-analysis of diagnostic test accuracy studies where sensitivity and specificity vary across studies. The DOR combines both into a single measure of discriminative ability.	DOR 25.4 (95% CI 15.3–42.1): the odds of a positive test result are about 25 times higher in patients with the condition than in those without it.

Risk ratio (relative risk). Used for binary outcomes in cohort studies and RCTs where the reference group is a control or unexposed group. A risk ratio of 0.75 means the intervention group has 75 percent of the risk of the control group, a 25 percent relative reduction.

Odds ratio. Used for binary outcomes in case-control studies, logistic regression, and some RCTs. The odds ratio is not the same as the risk ratio, and the difference matters when outcome prevalence is high. An odds ratio is interpreted as the odds of the outcome in the intervention group divided by the odds in the control group.
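
The gap between the risk ratio and the odds ratio is easiest to see with numbers. The sketch below computes both on the log scale, which is the scale on which they are pooled, using the usual large-sample variance formulas. The function names and the two trials are hypothetical, and tables with zero cells would need a continuity correction that is not handled here.

```python
import math

def log_risk_ratio(events_t, n_t, events_c, n_c):
    """Log risk ratio and its variance from a 2x2 table (no zero cells)."""
    log_rr = math.log((events_t / n_t) / (events_c / n_c))
    var = 1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c
    return log_rr, var

def log_odds_ratio(events_t, n_t, events_c, n_c):
    """Log odds ratio and its variance from a 2x2 table (no zero cells)."""
    non_t, non_c = n_t - events_t, n_c - events_c
    log_or = math.log((events_t * non_c) / (non_t * events_c))
    var = 1 / events_t + 1 / non_t + 1 / events_c + 1 / non_c
    return log_or, var

# Hypothetical trials: (events, total) in intervention, then in control.
rare = (10, 1000, 20, 1000)      # control risk 2 percent
common = (300, 1000, 600, 1000)  # control risk 60 percent

for label, table in (("rare", rare), ("common", common)):
    rr = math.exp(log_risk_ratio(*table)[0])
    odds_ratio = math.exp(log_odds_ratio(*table)[0])
    print(f"{label} outcome: RR = {rr:.2f}, OR = {odds_ratio:.2f}")
```

Both trials halve the risk, but only in the rare-outcome trial does the odds ratio sit close to the risk ratio. In the common-outcome trial the odds ratio is much further from 1.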

Mean difference (MD). Used when two or more studies measure the same continuous outcome on the same scale with the same units. If all studies measure systolic blood pressure in mmHg, the mean difference is interpretable in those units.

Standardized mean difference (SMD). Used when studies measure the same construct on different scales. Depression measured by the PHQ-9 and HAM-D cannot be pooled as mean differences but can be pooled as SMDs (the difference in means divided by the pooled standard deviation), which converts all measurements to a common unit of standard deviations.
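
As a minimal sketch of the calculation, the code below computes an SMD with the small-sample correction usually called Hedges' g, together with one common large-sample approximation to its variance. The function name and the two trials are hypothetical. Software packages differ slightly in the variance formula they use.

```python
import math

def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference with Hedges' small-sample correction."""
    df = n_t + n_c - 2
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_t - mean_c) / pooled_sd
    correction = 1 - 3 / (4 * df - 1)  # shrinks d slightly in small samples
    g = correction * d
    # Large-sample approximation to the variance (an assumption of this sketch).
    var_g = correction**2 * ((n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c)))
    return g, var_g

# Hypothetical trials measuring depression on two different scales.
g_phq, var_phq = hedges_g(10.0, 5.0, 50, 12.5, 5.0, 50)    # PHQ-9
g_hamd, var_hamd = hedges_g(14.0, 8.0, 50, 18.0, 8.0, 50)  # HAM-D
print(f"PHQ-9 trial: g = {g_phq:.3f}; HAM-D trial: g = {g_hamd:.3f}")
```

The raw mean differences are 2.5 and 4 points, which cannot be averaged meaningfully, but each is half a pooled standard deviation, so the two trials contribute the same standardized effect.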

Hazard ratio. Used for time-to-event outcomes where the timing of the event matters. The hazard ratio measures the instantaneous risk of the event in the intervention group relative to the control group at any given time point.

Step 3: Choose the Statistical Model

The choice between a fixed-effect model and a random-effects model determines how between-study variation is handled in the pooled estimate.

A fixed-effect model assumes that all included studies estimate the same true effect and that observed differences between study results are due to sampling variation alone. This assumption is almost never plausible in clinical systematic reviews, where different patient populations, intervention doses, follow-up periods, and outcome measures introduce genuine between-study variation in the true effect.

A random-effects model assumes the true effects vary across studies and that the included studies represent a sample from a distribution of true effects. The model estimates both the average true effect and the between-study variance (tau-squared). This is the appropriate model for most clinical systematic reviews.

Since January 2024, Cochrane RevMan uses REML (restricted maximum likelihood) as the default estimator for tau-squared. REML replaced the DerSimonian-Laird moment estimator because DerSimonian-Laird performs poorly in meta-analyses with few studies, which is a common situation in Cochrane reviews. The R metafor package (Viechtbauer, Journal of Statistical Software, 2010) also uses REML as its default. A meta-analysis submitted to a methods-focused journal in 2026 that uses DerSimonian-Laird without a documented reason for the choice may receive a reviewer comment.
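
To make the difference between the two estimators concrete, here is a plain-Python sketch, not the RevMan or metafor implementation. It computes the DerSimonian-Laird moment estimate in closed form and approximates the REML estimate with a simple fixed-point iteration. The function names and the five log risk ratios are hypothetical, and a real analysis should rely on tested software.

```python
import math

def pooled(y, v, tau2=0.0):
    """Inverse-variance pooled estimate and SE; tau2 = 0 gives the fixed-effect model."""
    w = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return mu, math.sqrt(1 / sum(w))

def tau2_dl(y, v):
    """DerSimonian-Laird moment estimator of the between-study variance."""
    w = [1 / vi for vi in v]
    mu, _ = pooled(y, v)
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    return max(0.0, (q - (len(y) - 1)) / c)

def tau2_reml(y, v, tol=1e-10, max_iter=1000):
    """REML estimate via fixed-point iteration, started from the DL value."""
    tau2 = tau2_dl(y, v)
    for _ in range(max_iter):
        w = [1 / (vi + tau2) for vi in v]
        mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        num = sum(wi**2 * ((yi - mu) ** 2 - vi) for wi, yi, vi in zip(w, y, v))
        new = max(0.0, num / sum(wi**2 for wi in w) + 1 / sum(w))
        if abs(new - tau2) < tol:
            return new
        tau2 = new
    return tau2

# Hypothetical log risk ratios and their variances from five trials.
y = [-0.45, -0.10, -0.60, 0.05, -0.30]
v = [0.04, 0.01, 0.09, 0.02, 0.06]

for name, t2 in (("fixed", 0.0), ("DL", tau2_dl(y, v)), ("REML", tau2_reml(y, v))):
    mu, se = pooled(y, v, t2)
    print(f"{name}: tau2 = {t2:.4f}, pooled RR = {math.exp(mu):.2f}, SE = {se:.3f}")
```

The confidence interval around the random-effects estimate would normally also use the HKSJ adjustment, which this sketch leaves out.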

Step 4: Estimate and Interpret Heterogeneity

Heterogeneity in meta-analysis is the variation in results between studies beyond what would be expected from sampling variation alone. It is assessed using three complementary statistics.

I-squared. I-squared estimates the proportion of the total variance in the meta-analysis that is due to between-study variation. An I-squared of 0 percent indicates no detected between-study variation. An I-squared of 75 percent indicates that an estimated three-quarters of the observed variation reflects differences between studies rather than sampling error. I-squared does not measure the magnitude of heterogeneity, only the proportion. Two meta-analyses can have the same I-squared but very different magnitudes of between-study variation.
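
The point that I-squared is a proportion rather than a magnitude can be checked directly. This sketch computes Cochran's Q and I-squared as (Q - df) / Q for two hypothetical sets of studies. The second set has effects three times as spread out and is proportionally less precise. The function name and all numbers are illustrative assumptions.

```python
def cochran_q_and_i2(y, v):
    """Cochran's Q and I-squared (in percent) from effects y and variances v."""
    w = [1 / vi for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

# Hypothetical effects on the log scale with equal within-study variances.
y_narrow = [-0.6, -0.3, 0.0, 0.3, 0.6]
v_narrow = [0.025] * 5

# Same pattern, three times the spread, with standard errors three times larger.
y_wide = [3 * yi for yi in y_narrow]
v_wide = [9 * vi for vi in v_narrow]

q_narrow, i2_narrow = cochran_q_and_i2(y_narrow, v_narrow)
q_wide, i2_wide = cochran_q_and_i2(y_wide, v_wide)
print(f"narrow: Q = {q_narrow:.1f}, I2 = {i2_narrow:.1f}%")
print(f"wide:   Q = {q_wide:.1f}, I2 = {i2_wide:.1f}%")
```

Both sets return the same I-squared even though the study effects in the second set are three times further apart, which is why tau and the prediction interval are needed alongside it.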

Tau-squared and tau. Tau-squared is the estimated variance of the true effects across studies. Tau (its square root) is the standard deviation of the true effects, in the same units as the effect size, and provides an interpretable measure of how much the true effect varies across studies. If the true effects are roughly normally distributed, about 95 percent of them fall within 1.96 tau of the average. A tau of 0.5 for a log odds ratio therefore places most true effects within about 1 log-odds unit either side of the average, meaning true odds ratios from roughly 0.38 to 2.7 times the average odds ratio, which is substantial variation.

The 95 percent prediction interval. The prediction interval estimates the range within which the true effect would be expected to fall in a new study drawn from the same distribution as the included studies. It is derived from the average effect, the tau estimate, and the uncertainty around both. Under Cochrane Handbook v6.5, prediction intervals are now a standard output of all random-effects meta-analyses and must be reported and interpreted. A confidence interval of 0.65 to 0.85 for a risk ratio suggests a consistent beneficial effect. A prediction interval of 0.30 to 1.20 for the same meta-analysis suggests that the effect may be harmful in some settings, even though the pooled estimate is beneficial. These two statistics tell very different stories.
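
The contrast between the two intervals can be reproduced with a short calculation. The sketch below uses one widely used form of the prediction interval, the pooled estimate plus or minus a t critical value times the square root of tau-squared plus the squared standard error, with k - 2 degrees of freedom. That choice of formula, the function name, and every number are assumptions for illustration, and software may use a slightly different variant. The t critical value is passed in from tables because the Python standard library has no t distribution.

```python
import math

def prediction_interval(mu, se_mu, tau2, t_crit):
    """Approximate 95% prediction interval on the analysis (log) scale."""
    half_width = t_crit * math.sqrt(tau2 + se_mu**2)
    return mu - half_width, mu + half_width

# Hypothetical random-effects result from k = 10 trials, on the log risk ratio scale.
mu, se_mu, tau2 = math.log(0.75), 0.07, 0.09
T_CRIT_DF8 = 2.306  # 97.5th percentile of t with k - 2 = 8 df, taken from tables

ci = (math.exp(mu - 1.96 * se_mu), math.exp(mu + 1.96 * se_mu))
pi = tuple(math.exp(x) for x in prediction_interval(mu, se_mu, tau2, T_CRIT_DF8))

print(f"95% confidence interval: {ci[0]:.2f} to {ci[1]:.2f}")
print(f"95% prediction interval: {pi[0]:.2f} to {pi[1]:.2f}")
```

With these inputs the confidence interval stays below 1 while the prediction interval extends above it, the same pattern described in the paragraph above.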

Step 5: Produce the Forest Plot

The forest plot shows the effect estimate and confidence interval from each study as a horizontal line with a box at the point estimate. The pooled estimate appears as a diamond at the bottom. The width of the diamond represents the confidence interval for the pooled estimate.

Each study's contribution to the pooled estimate is indicated by the weight, expressed as a percentage. In a random-effects model, weights are more evenly distributed. The model down-weights very large studies and up-weights smaller ones compared to a fixed-effect approach.
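
A short calculation shows how the weights shift. In this sketch each study's weight is the inverse of its within-study variance plus tau-squared, expressed as a percentage. The variances and the tau-squared value are hypothetical.

```python
def percent_weights(variances, tau2=0.0):
    """Percentage weights; tau2 = 0 corresponds to the fixed-effect model."""
    raw = [1 / (vi + tau2) for vi in variances]
    total = sum(raw)
    return [100 * wi / total for wi in raw]

# Hypothetical within-study variances, from the largest trial to the smallest.
variances = [0.01, 0.04, 0.09, 0.16]
fixed = percent_weights(variances)
random_effects = percent_weights(variances, tau2=0.05)

for vi, wf, wr in zip(variances, fixed, random_effects):
    print(f"variance {vi:.2f}: fixed {wf:.1f}%, random-effects {wr:.1f}%")
```

The largest trial loses weight and the smallest gains weight once tau-squared is added to every study's variance.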

The forest plot should display study effect sizes and confidence intervals, study weights, the pooled estimate and its confidence interval, the prediction interval, I-squared, tau-squared, and total events and participants.

Step 6: Assess Publication Bias

Publication bias occurs when studies with statistically significant or positive results are more likely to be published than studies with null or negative results. This creates a selective literature that overstates the true effect size.

The funnel plot displays each study's effect size on the horizontal axis and its precision (standard error or sample size) on the vertical axis. In the absence of publication bias, the plot should resemble a symmetric inverted funnel. Asymmetry, particularly a gap in one lower corner of the plot where small studies with null or negative results would be expected, is consistent with publication bias, although it can also arise from other small-study effects such as genuine heterogeneity.

Cochrane Handbook v6.5 recommends funnel plots only when ten or more studies contribute to the meta-analysis, because with fewer studies asymmetry cannot be distinguished reliably from chance. Egger's test provides a statistical test of funnel plot asymmetry, but like the visual assessment, it lacks power with fewer than ten studies.
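
As a sketch of what Egger's test does, the code below regresses each study's standardized effect (effect divided by standard error) on its precision (one divided by standard error) and reports the intercept, which is the asymmetry term. This is one common formulation, written in plain Python. The function name, the ten-study guard, and the data are assumptions for illustration.

```python
import math

def egger_test(effects, std_errors, min_studies=10):
    """Egger regression; returns (intercept, SE of intercept, t statistic, df)."""
    k = len(effects)
    if k < min_studies:
        raise ValueError(f"only {k} studies; asymmetry tests need about {min_studies}")
    precision = [1 / se for se in std_errors]
    z = [y / se for y, se in zip(effects, std_errors)]
    mean_p, mean_z = sum(precision) / k, sum(z) / k
    sxx = sum((p - mean_p) ** 2 for p in precision)
    sxy = sum((p - mean_p) * (zi - mean_z) for p, zi in zip(precision, z))
    slope = sxy / sxx
    intercept = mean_z - slope * mean_p
    sse = sum((zi - intercept - slope * p) ** 2 for p, zi in zip(precision, z))
    df = k - 2
    se_intercept = math.sqrt(sse / df * (1 / k + mean_p**2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else math.nan
    return intercept, se_intercept, t_stat, df

# Hypothetical log risk ratios in which smaller trials report larger benefits.
std_errors = [0.05, 0.08, 0.10, 0.12, 0.15, 0.18, 0.20, 0.25, 0.30, 0.35]
log_rr = [-0.22, -0.25, -0.30, -0.28, -0.40, -0.35, -0.50, -0.55, -0.60, -0.75]

intercept, se_int, t_stat, df = egger_test(log_rr, std_errors)
print(f"intercept = {intercept:.2f}, t = {t_stat:.2f} on {df} df")
```

The t statistic is compared with a t distribution on k - 2 degrees of freedom. An intercept close to zero is what a symmetric funnel would produce.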

The trim-and-fill method estimates the number of missing studies and recalculates the pooled estimate after imputing them. Because it assumes that funnel plot asymmetry is caused by publication bias alone, the adjusted pooled estimate is best treated as a sensitivity analysis rather than as a corrected result.

Step 7: Conduct Subgroup and Sensitivity Analyses

Subgroup analyses investigate whether the effect differs between pre-specified groups of studies or participants. Common subgroup variables include age group, severity of the condition at baseline, dose of the intervention, risk of bias category, and study design. Subgroup analyses must be pre-specified in the protocol. Post-hoc subgroup analyses, conducted after seeing the results, carry a high risk of false-positive findings and should be clearly labelled as exploratory.
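
Whether two subgroup estimates differ by more than chance is usually judged with a test of subgroup differences rather than by comparing their separate p-values. The sketch below computes a between-subgroup Q statistic from the pooled estimate and standard error of each subgroup. The function name and the numbers are hypothetical, and the p-value is only computed for the two-subgroup case, where the chi-square distribution with 1 degree of freedom can be expressed through the error function in the standard library.

```python
import math

def subgroup_difference_test(estimates, std_errors):
    """Between-subgroup Q, its degrees of freedom, and a p-value when df == 1."""
    w = [1 / se**2 for se in std_errors]
    overall = sum(wi * e for wi, e in zip(w, estimates)) / sum(w)
    q_between = sum(wi * (e - overall) ** 2 for wi, e in zip(w, estimates))
    df = len(estimates) - 1
    # For 1 df, P(chi-square > q) equals erfc(sqrt(q / 2)).
    p_value = math.erfc(math.sqrt(q_between / 2)) if df == 1 else None
    return q_between, df, p_value

# Hypothetical pooled log risk ratios for two pre-specified severity subgroups.
estimates = [-0.40, -0.10]   # mild, severe
std_errors = [0.10, 0.12]

q_between, df, p_value = subgroup_difference_test(estimates, std_errors)
print(f"Q_between = {q_between:.2f} on {df} df, p = {p_value:.3f}")
```

One subgroup can be statistically significant and the other not without the test of differences itself being significant, which is the comparison that matters.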

Sensitivity analyses test whether the pooled estimate is robust to specific methodological decisions. Common sensitivity analyses include re-running the meta-analysis excluding high-risk-of-bias studies, excluding outlier studies with effect sizes far from the pooled estimate, and re-running under a fixed-effect model to compare with the random-effects result.
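
One simple way to see whether a single study drives the result is a leave-one-out analysis. This sketch re-pools the data with each study removed in turn. It uses the DerSimonian-Laird estimator only because it has a closed form that keeps the example short, and a real analysis would substitute the estimator chosen for the primary analysis. All names and data are hypothetical.

```python
import math

def random_effects_estimate(y, v):
    """Random-effects pooled estimate using the closed-form DL tau-squared."""
    w = [1 / vi for vi in v]
    mu_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (vi + tau2) for vi in v]
    return sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)

def leave_one_out(y, v):
    """Pooled estimate with each study omitted in turn (needs at least 3 studies)."""
    return [
        random_effects_estimate(y[:i] + y[i + 1:], v[:i] + v[i + 1:])
        for i in range(len(y))
    ]

# Hypothetical log risk ratios; the last trial is an outlier.
labels = ["Trial A", "Trial B", "Trial C", "Trial D", "Trial E"]
y = [-0.30, -0.25, -0.35, -0.20, 0.60]
v = [0.02, 0.03, 0.05, 0.01, 0.04]

print(f"all studies: RR = {math.exp(random_effects_estimate(y, v)):.2f}")
for label, est in zip(labels, leave_one_out(y, v)):
    print(f"omitting {label}: RR = {math.exp(est):.2f}")
```

If omitting one study changes the direction or the clinical interpretation of the pooled estimate, that belongs in the sensitivity analysis section of the manuscript.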

Both subgroup and sensitivity analyses should be reported in a dedicated section of the manuscript, with the primary analysis clearly distinguished from exploratory investigations.

Need a meta-analysis run to current Cochrane standards with reproducible code and GRADE tables?
ScribeLab Writer's meta-analysis service covers model selection, REML tau-squared estimation, I², tau², and prediction intervals, forest plots with reproducible R or Stata code, funnel plots, subgroup and sensitivity analyses, and GRADE Summary of Findings tables. The service starts from $750 with a free itemized quote within 24 hours. Submit your project details and a PhD methodologist will respond within 2-4 hours.

Step 8: Apply GRADE to the Meta-Analytic Evidence

GRADE certainty ratings for meta-analytic evidence start at High (for RCTs) or Low (for observational studies) and are adjusted based on five downgrading criteria and three upgrading criteria.

Downgrading reasons:

Risk of bias: if most of the included studies have serious or critical risk-of-bias concerns, certainty is downgraded by one or two levels.

Inconsistency: if I-squared is high (above 50 percent), the prediction interval crosses the null, and no pre-specified explanation resolves the heterogeneity, certainty is downgraded.

Indirectness: if the population, intervention, comparator, or outcome in the included studies differs from the population and question of interest, certainty is downgraded.

Imprecision: if the 95 percent confidence interval of the pooled estimate crosses a meaningful threshold (the minimal important difference or the null), certainty is downgraded.

Publication bias: if the funnel plot or other evidence suggests important publication bias, certainty is downgraded.

Upgrading reasons for observational evidence:

A large effect size (risk ratio greater than 2.0 or less than 0.5), a dose-response relationship, or a situation in which all plausible residual confounding would have reduced the observed effect. Each can upgrade certainty by one level if present.
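
The bookkeeping behind a GRADE rating can be sketched as a small function. It is only an arithmetic aid under stated assumptions. The starting level is supplied by the reviewer, each domain contributes the number of levels judged appropriate, and the result is clamped between Very Low and High. The function name and domain labels are hypothetical, and a real GRADE judgement is qualitative, with a written rationale for each domain.

```python
LEVELS = ["Very Low", "Low", "Moderate", "High"]

def grade_certainty(start, downgrades, upgrades=None):
    """Combine a starting level with per-domain downgrades and upgrades."""
    score = LEVELS.index(start)
    score -= sum(downgrades.values())
    score += sum((upgrades or {}).values())
    return LEVELS[max(0, min(score, len(LEVELS) - 1))]

# Hypothetical outcome from a meta-analysis of randomized trials.
rating = grade_certainty(
    "High",
    {
        "risk_of_bias": 0,
        "inconsistency": 1,   # unexplained heterogeneity
        "indirectness": 0,
        "imprecision": 1,     # confidence interval crosses the threshold
        "publication_bias": 0,
    },
)
print(f"certainty: {rating}")
```

Each outcome in the Summary of Findings table gets its own rating, so the function would be called once per pre-specified outcome.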

Common Errors in Published Meta-Analyses

Table 2: Common Meta-Analysis Errors and How to Avoid Them in 2026

Error	Why It Matters	How to Avoid It
DerSimonian-Laird without justification	Produces biased tau² estimates in small meta-analyses. Not the Cochrane default since January 2024.	Use REML as the default. Document the estimator choice. If using DerSimonian-Laird, justify why REML is not appropriate for your specific situation.
Reporting I² only	I² measures the proportion, not the magnitude, of heterogeneity. Without tau² and prediction intervals, the reader cannot assess the clinical significance of between-study variation.	Always report I², tau², and the 95% prediction interval together. All three are standard outputs under the Cochrane Handbook v6.5.
Funnel plot with fewer than 10 studies	Insufficient power to detect asymmetry. A symmetric funnel plot with 5 studies cannot be interpreted as evidence against publication bias.	Only produce funnel plots and run Egger's test when 10 or more studies contribute to the analysis. For fewer studies, acknowledge the inability to formally assess publication bias.
Fixed-effect model without justification	Assumes all included studies estimate the same true effect. Almost never satisfied in clinical reviews. Produces artificially narrow confidence intervals.	Use random-effects as the default. Document the clinical and methodological rationale if a fixed-effect model is chosen. A zero I² is not sufficient justification.
Post-hoc subgroup analyses presented as confirmatory	Subgroup analyses not pre-specified in the protocol have a high false-positive rate and are exploratory, not confirmatory.	Pre-specify all subgroup analyses in the PROSPERO protocol. Label any post-hoc analyses clearly as exploratory. Interpret with corresponding caution in the discussion.
Pooling studies with incompatible populations or interventions	Produces a pooled estimate that does not correspond to any real clinical scenario and may mislead clinical decision-making.	Assess clinical similarity before pooling. If populations or interventions differ substantially, use narrative synthesis or pre-specified subgroup analyses rather than a pooled estimate.
Missing GRADE Summary of Findings tables	Most clinical journals and all Cochrane reviews require GRADE certainty ratings. A meta-analysis without GRADE will be returned for revision at the majority of Tier 1 journals.	Use GRADEpro GDT (gradepro.org) to construct Summary of Findings tables. Rate certainty for each pre-specified outcome. Include the completed GRADE tables as supplementary material if journal word limits prevent inclusion in the main text.

Using DerSimonian-Laird without justification. Since the January 2024 RevMan update, REML is the Cochrane-recommended default. DerSimonian-Laird underperforms in meta-analyses with few studies, which describes many Cochrane reviews.

Reporting I-squared alone. I-squared measures the proportion, not the magnitude, of heterogeneity. Without tau-squared and a prediction interval, the reader cannot assess whether the between-study variation is large enough to matter clinically.

Running a funnel plot with fewer than ten studies. Funnel plots have insufficient power to detect asymmetry reliably with fewer than ten studies. Reporting a symmetric funnel plot with five or six studies as evidence against publication bias is not valid.

Treating a non-significant Egger's test as evidence of no publication bias. A non-significant Egger's test does not prove the absence of publication bias. The test may simply have lacked the statistical power to detect it, which is the expected situation in most small meta-analyses.

Applying a fixed-effect model without justification. A fixed-effect model assumes all included studies estimate the same true effect. This assumption is almost never satisfied in clinical systematic reviews. A fixed-effect model used because the I-squared is zero is not a justified choice: a zero I-squared in a small meta-analysis does not mean the true effects are identical, only that the estimate of between-study variance is imprecise.

Presenting post-hoc subgroup analyses as confirmatory. Subgroup analyses that were not pre-specified in the protocol and are conducted after seeing the overall results have a very high false-positive rate. These analyses should be clearly labelled as exploratory and interpreted with corresponding caution.

Software for Meta-Analysis in 2026

RevMan 6 (Cochrane's review management software) is the standard platform for Cochrane reviews. The January 2024 update added REML, HKSJ confidence intervals, and prediction intervals to the random-effects output. RevMan 6 is available free for Cochrane authors.

The R metafor package (Viechtbauer, 2010) is the most flexible platform for meta-analysis in R. It handles a wide range of effect sizes, random-effects models, moderator analyses, and diagnostic plots. The package defaults to REML. The metafor package syntax for a basic random-effects meta-analysis is rma(yi, vi, data=dataset, method="REML").

Stata meta-analysis commands (meta, metan) provide similar functionality for researchers in a Stata environment. Stata is widely used in health economics, epidemiology, and clinical research.

Frequently Asked Questions

What is the difference between a fixed-effect and a random-effects meta-analysis?

A fixed-effect model assumes all included studies estimate the same true underlying effect. All observed variation is attributed to sampling variation. A random-effects model assumes that the true effect varies across studies and that the included studies are a sample from a distribution of true effects. In clinical systematic reviews, the random-effects model is almost always the more appropriate choice because clinical and methodological variation across studies means true effects are unlikely to be identical.

Why did Cochrane change from DerSimonian-Laird to REML in 2024?

DerSimonian-Laird produces biased estimates of between-study variance (tau-squared) when the number of included studies is small, which is common in Cochrane reviews. REML produces less biased tau-squared estimates and, when combined with HKSJ confidence interval adjustment, produces more accurate confidence intervals in meta-analyses with few studies. The January 2024 RevMan update reflects years of methodological development documented in the Cochrane Handbook.

How many studies do I need to conduct a meta-analysis?

There is no minimum threshold, but meta-analyses with two or three studies should be interpreted very cautiously. With so few studies, the tau-squared estimate is unreliable, the prediction interval is very wide, and the pooled estimate can be unduly influenced by a single outlier study. Some methodologists recommend a minimum of five studies for a reliable random-effects estimate. A meta-analysis with fewer studies can still be reported, but should be clearly labelled as preliminary.

What should I do when I-squared is very high?

A high I-squared (above 75 percent) indicates substantial between-study variation. The appropriate response is not to exclude studies arbitrarily until I-squared falls to an acceptable level. Instead, investigate the source of heterogeneity using pre-specified subgroup analyses. If no explanation is found, consider whether the included studies are sufficiently similar to be pooled at all. A narrative synthesis may be more appropriate when clinical or methodological heterogeneity is large and unexplained.

Can I use SPSS for meta-analysis?

SPSS Statistics gained built-in meta-analysis procedures in version 28; earlier versions have none. Researchers on older versions should consider using R (free, with the metafor package), RevMan (free for Cochrane authors), or Stata (paid license). SPSS macros for meta-analysis have been published, but they do not support the full range of analyses required for a current Cochrane Handbook-compliant meta-analysis, including REML and prediction intervals. If you rely on the built-in procedures, check that your version reports the estimator and the prediction interval you need.

Running a Meta-Analysis That Meets Peer Review Standards

A meta-analysis that uses REML, reports tau-squared and prediction intervals, documents pre-specified subgroup analyses, and produces a GRADE Summary of Findings table meets the current Cochrane Handbook v6.5 and PRISMA 2020 standards. A meta-analysis that uses DerSimonian-Laird, reports I-squared only, and omits the prediction interval does not meet that standard and will receive reviewer comments requesting these statistics at any methods-focused journal.

ScribeLab Writer's meta-analysis service is led by credentialed researchers with published systematic reviews in the biomedical literature. The team delivers reproducible R or Stata code, REML-based models, I-squared, tau-squared, prediction intervals, subgroup and sensitivity analyses, and GRADE Summary of Findings tables. The service starts from $750. Submit your project details, and a PhD methodologist will respond within 24 hours.
