Stuck on My Systematic Review? Diagnose and Move Forward

If you are stuck on a systematic review, you are not in an unusual position. The mean completion time for a systematic review is 67.3 weeks with a team of five (Borah et al., BMJ Open, 2017). Demetres and colleagues' 2023 analysis of Weill Cornell Medicine's systematic review service found that the average review took 642 days from initiation to publication, with the longest taking 1,408 days. Studies tracking PROSPERO-registered reviews suggest that a substantial share are never published: in one cohort, approximately half remained unpublished more than a year after registration.

Long timelines and stalled projects are not failures of effort. They are almost always failures of infrastructure, team composition, or methodology. Each stall point has a specific cause and a specific fix.

This article covers the six most common stall points in a systematic review, which diagnostic question identifies each one, and the specific action to take at each stage.

None of these fixes requires restarting the entire review. ScribeLab Writer's systematic review service provides modular support for researchers who are stuck at any single stage, without requiring full-review engagement.

Quick Answer:

The six most common systematic review stall points are: (1) PICO not defined precisely enough to build a search, (2) search volume overwhelm after results retrieval, (3) screening stalled because no second reviewer is available, (4) data extraction complexity or missing data, (5) risk-of-bias assessment using the wrong tool or completed incorrectly, and (6) meta-analysis or GRADE assessment requiring statistical expertise the team does not have. Diagnose your stall point by identifying the last action you completed and asking why the next step has not started. Each stall point has a modular fix that does not require restarting from the protocol.

Why Stalling Is Normal, and Why That Does Not Help

The data on systematic review completion timelines is striking. Borah et al.'s 2017 BMJ Open analysis of 195 PROSPERO-registered reviews found a mean of 67.3 weeks to publication and a range of 6 to 186 weeks. The mean team size was five authors. At the Weill Cornell systematic review service, the average of 642 days to publication spans nearly two academic years. One review in that dataset took 1,408 days.

Non-publication rates add a more sobering dimension. A cohort study of PROSPERO-registered pain and anesthesiology systematic reviews found that approximately half were unpublished at 1.3 years after registration. A separate meta-epidemiological analysis found that 26% of PROSPERO protocols remained unpublished five or more years after registration. The most common abandonment point, per the Weill Cornell dataset, was the title/abstract screening phase.

These figures establish something important: a systematic review that is stalled is not unusual. But they do not tell you what to do next. For that, you need to diagnose which stall point you have hit.

The Diagnostic Question

Before anything else, write down the answer to this question: what is the last complete action you have taken, and what specifically has prevented the next step from starting?

The answer almost always maps onto one of six stall points. The diagnosis is not about motivation or time management. It is about infrastructure: what specifically is missing, wrong, or blocked.

Stall Point 1: PICO Is Not Precise Enough to Build a Search

What it looks like. You have a topic, a general interest area, and a sense of what you want to examine. But when you sit down to design the search strategy or define the eligibility criteria, you cannot make the inclusion and exclusion criteria specific enough to screen consistently.

Why it happens. The PICO question is still at the research-interest level rather than the eligibility-criterion level. "The effect of exercise on mental health in older adults" is a research interest. "The effect of structured aerobic exercise programs of at least 12 weeks duration on depression scores (PHQ-9 or equivalent) in community-dwelling adults aged 65 and older" is a PICO question that can generate eligibility criteria.

The fix. Return to the PICO and apply specificity tests to each component. For P (population): what age range, diagnosis, setting, and country qualify? For I (intervention): what type, duration, frequency, and setting? For C (comparison): usual care, waitlist, active control, or any comparator? For O (outcome): which specific measures, at which timepoints? For T (time, if you are using the PICOT extension): what study duration threshold qualifies? Write one inclusion criterion and one exclusion criterion for each component before moving to the search.
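The "one inclusion and one exclusion criterion per component" check can be sketched as a small completeness test. This is a minimal illustration, not a validated tool: the `Criterion` class and `missing_criteria` function are hypothetical names, and the exclusion rules shown are made-up examples built around the exercise question quoted earlier.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One PICO component with its inclusion and exclusion rule (hypothetical structure)."""
    component: str  # "P", "I", "C", or "O"
    include: str = ""
    exclude: str = ""


def missing_criteria(criteria):
    """Return the PICO components that lack an inclusion or an exclusion rule."""
    by_component = {c.component: c for c in criteria}
    missing = []
    for comp in ["P", "I", "C", "O"]:
        c = by_component.get(comp)
        if c is None or not c.include.strip() or not c.exclude.strip():
            missing.append(comp)
    return missing


# Illustrative draft: the comparator has not been operationalized yet.
draft = [
    Criterion("P", "community-dwelling adults aged 65 and older",
              "adults living in residential care (illustrative)"),
    Criterion("I", "structured aerobic exercise programs of at least 12 weeks",
              "programs shorter than 12 weeks"),
    Criterion("C", "", ""),
    Criterion("O", "depression scores (PHQ-9 or equivalent)",
              "unvalidated mood measures (illustrative)"),
]

gaps = missing_criteria(draft)
print(gaps)  # ['C']
```

An empty result means every component has both rules written down, which is the point at which the search can be built.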

Stall Point 2: Search Volume Overwhelm

What it looks like. You have run the searches, the results have been imported, and the number of records is much larger than expected. Screening 12,000, 20,000, or 40,000 records without a structured plan and appropriate tools feels impossible. The review has stopped here.

Why it happens. A search strategy that uses broad MeSH terms without appropriate Boolean narrowing, or that searches too many databases without deduplication, generates volumes that exceed what an individual researcher can screen in a reasonable timeframe. A search retrieving 40,000 records before deduplication is not inherently wrong, but it requires a screening infrastructure that most individual researchers have not built.

The fix. There are three parallel actions. First, run deduplication in your reference manager (Zotero, EndNote, or a dedicated platform) before importing to the screening tool. Records imported from multiple databases commonly overlap by 30% to 60%. Second, implement AI-assisted prioritization using Abstrackr or ASReview. Van de Schoot et al.'s 2021 Nature Machine Intelligence study found a mean work savings of 83% (range 67% to 92%) at 95% recall using active-learning tools. For a 20,000-record set, this can reduce the screening workload to roughly 3,400 records at the mean savings (about 1,600 to 6,600 across the reported range) while retaining 95% of eligible studies. Third, revisit the search strategy with a research librarian to identify whether terms can be narrowed without losing sensitivity. Removing irrelevant subfield terms often reduces volume substantially without changing the PICO scope.
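The workload arithmetic can be checked with a short sketch. It treats the reported work savings as a simple proportion of records that no longer need manual screening, which is a simplification of how the metric is defined; the function name is hypothetical.

```python
def records_left_to_screen(total_records, work_saved):
    """Records still screened manually if a fraction `work_saved` of the work is saved."""
    if not 0 <= work_saved <= 1:
        raise ValueError("work_saved must be a fraction between 0 and 1")
    return round(total_records * (1 - work_saved))


total = 20_000
best = records_left_to_screen(total, 0.92)   # upper end of the reported range
mean = records_left_to_screen(total, 0.83)   # reported mean
worst = records_left_to_screen(total, 0.67)  # lower end of the reported range

print(best, mean, worst)  # 1600 3400 6600
```

Planning against the lower end of the range (6,600 records here) is the more conservative choice when budgeting reviewer time.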

Stall Point 3: No Second Reviewer for Screening

What it looks like. You have completed title/abstract screening alone. You know you need a second reviewer for the full-text stage, or for both stages. You do not have one. The review has stopped at the screening stage because the MECIR C39 standard requires dual independent full-text inclusion decisions, and you do not know how to find or train a qualified second reviewer.

Why it happens. Most individual researchers who start a systematic review outside a team context do not arrange a second reviewer at the protocol stage. Finding one after searching has begun is more difficult because the second reviewer must be trained on eligibility criteria that may not be fully documented.

The fix. Our dedicated article on finding and calibrating a second reviewer covers this in full. The practical options are a research librarian, a colleague who can be trained through the 30 to 50 record pilot calibration process, or a professional second-reviewer service. Before anything else, write out the eligibility criteria as explicit inclusion and exclusion rules. This document is what the second reviewer will work from.
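Agreement in the pilot calibration round is commonly summarized with Cohen's kappa. The sketch below computes it from two reviewers' decisions on the same records. The decision lists are invented for illustration, and the agreement level required before proceeding is something the team would need to pre-specify.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical decisions on the same records."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of records")
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    # Observed agreement: share of records with identical decisions.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over labels.
    expected = sum((rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical 40-record pilot: 8 joint includes, 26 joint excludes, 6 disagreements.
reviewer_1 = ["include"] * 8 + ["exclude"] * 26 + ["include"] * 4 + ["exclude"] * 2
reviewer_2 = ["include"] * 8 + ["exclude"] * 26 + ["exclude"] * 4 + ["include"] * 2

kappa = cohens_kappa(reviewer_1, reviewer_2)
print(round(kappa, 3))  # 0.625
```

The six disagreements in this example are the records to discuss against the written eligibility rules before the main screening starts.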

Stall Point 4: Data Extraction Complexity or Missing Data

What it looks like. Included studies are diverse in how they report outcomes, populations, and methods. Some studies report the outcomes you need; others report related but different outcomes. Some report means and standard deviations; others report medians and ranges. Some are missing critical data elements. The extraction process has stalled because the data does not map cleanly onto the extraction form.

Why it happens. A data extraction form that was designed during the protocol stage for ideally structured studies encounters the reality of heterogeneous reporting. Missing data requires contacting authors or making judgments that the extraction form does not accommodate. Studies using incompatible outcome measures call for a transformation or conversion decision, and that decision requires methodological expertise.

The fix. Three specific actions address most data extraction stalls. First, expand the extraction form to capture what is actually reported, not only what was ideally planned. Add fields for the actual reported measure, the actual sample descriptor, and notes on data quality for each study. Second, for missing outcome data, draft a standardized author contact email requesting the specific data elements you need. Many authors will respond within two to four weeks. Third, for incompatible outcome measures, decide at the extraction stage which transformation or conversion rule you will apply and document it. Converting standard error to standard deviation, or extracting log-transformed data, requires a documented decision, not a workaround.
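As an illustration of a documented conversion rule, the sketch below recovers a standard deviation from a reported standard error, and from a 95% confidence interval around a single group mean. The function names are hypothetical. The confidence-interval version assumes a large sample (normal approximation, divisor 3.92); small samples would need a t-distribution value instead, so treat it as a sketch rather than a general rule.

```python
import math


def sd_from_se(se, n):
    """Standard deviation from the standard error of a mean: SD = SE * sqrt(n)."""
    if n <= 0 or se < 0:
        raise ValueError("n must be positive and se non-negative")
    return se * math.sqrt(n)


def sd_from_ci95(lower, upper, n):
    """Standard deviation from a 95% CI of a single mean (large-sample assumption)."""
    if n <= 0 or upper < lower:
        raise ValueError("n must be positive and upper >= lower")
    return math.sqrt(n) * (upper - lower) / 3.92


# Hypothetical study arm: n = 64, mean 11.0, SE 1.5, 95% CI 8.06 to 13.94.
sd_a = sd_from_se(1.5, 64)
sd_b = sd_from_ci95(8.06, 13.94, 64)
print(round(sd_a, 2), round(sd_b, 2))  # 12.0 12.0
```

Recording which of the two routes was used for each study, in a notes field on the extraction form, keeps the conversion auditable.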

Stall Point 5: Risk-of-Bias Assessment Problems

What it looks like. You have reached the risk-of-bias stage but are uncertain which tool applies to your study designs, how to answer the domain-level questions, or how to reconcile disagreements between reviewers on RoB judgments.

Why it happens. Risk-of-bias tool selection is specific to study design, and the domains within each tool require methodological judgment that is not always self-evident from the tool's instructions. ROBINS-I V2 (updated November 2025) introduced algorithm-based domain judgments and explicit handling of immortal-time bias and prevalent-user bias. These are not easy to apply without prior methodological training. Applying the Newcastle-Ottawa Scale to randomized controlled trials, or applying RoB 2 to non-randomized studies, are both errors that peer reviewers will flag.

The fix. Confirm the correct tool for each study design in your review: RoB 2 for RCTs (applied per outcome, not per study); ROBINS-I V2 for non-randomized studies of interventions; QUADAS-2 for diagnostic accuracy studies; AMSTAR-2 for included systematic reviews; QUIPS for prognosis studies. If your review includes multiple study designs, apply the appropriate tool separately for each design and present the results by tool in the manuscript. Our full guide on ROBINS-I V2 covers the November 2025 changes in detail.
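The design-to-tool mapping above can be written down as a lookup so that every included study is assigned a tool explicitly. This is only a sketch: the design keys, the function name, and the study labels are hypothetical, and the mapping simply restates the list in the text.

```python
# Mapping restated from the text; the dictionary keys are illustrative labels.
ROB_TOOL_BY_DESIGN = {
    "rct": "RoB 2",  # applied per outcome, not per study
    "non_randomized_intervention": "ROBINS-I V2",
    "diagnostic_accuracy": "QUADAS-2",
    "systematic_review": "AMSTAR-2",
    "prognosis": "QUIPS",
}


def group_studies_by_tool(designs_by_study):
    """Group study IDs under the tool that matches each study's design."""
    grouped = {}
    for study_id, design in designs_by_study.items():
        if design not in ROB_TOOL_BY_DESIGN:
            raise ValueError(f"No tool mapped for design {design!r} (study {study_id})")
        grouped.setdefault(ROB_TOOL_BY_DESIGN[design], []).append(study_id)
    return grouped


# Hypothetical included studies.
included = {
    "Study A": "rct",
    "Study B": "non_randomized_intervention",
    "Study C": "rct",
}
grouped = group_studies_by_tool(included)
print(grouped)  # {'RoB 2': ['Study A', 'Study C'], 'ROBINS-I V2': ['Study B']}
```

Raising an error for an unmapped design forces a deliberate decision instead of silently defaulting to whichever tool the team already knows.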

Stall Point 6: Meta-Analysis, Heterogeneity, and GRADE

What it looks like. You have extracted the data, completed the risk-of-bias assessments, and are now facing the statistical synthesis and GRADE certainty rating. You do not have the statistical training to run the meta-analysis correctly, interpret the heterogeneity statistics, or complete the GRADE Summary of Findings tables. The review has stopped here.

Why it happens. Meta-analysis requires specific statistical knowledge: choice of effect measure, heterogeneity assessment (I², tau², prediction interval), model selection (fixed vs random effects, REML estimator), subgroup analysis, and sensitivity analysis. GRADE requires a structured judgmental process across five domains (risk of bias, inconsistency, indirectness, imprecision, publication bias) for each pre-specified outcome. Neither process is amenable to self-teaching at the analysis stage of a live review.

The fix. Meta-analysis is the stage where specialist statistical support has the clearest return. Our meta-analysis guide covers the methodology. If the synthesis is stalled because of statistical complexity, a biostatistician with systematic review experience can complete the analysis, run the forest plots, and produce the GRADE Summary of Findings tables as a standalone engagement. The key inputs they need are your extracted data table, your pre-specified outcomes, and your PROSPERO registration. The key outputs are the analysis file, forest plots, and GRADE SoF tables formatted to your target journal's specifications.
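To make the heterogeneity quantities named above concrete, here is a minimal random-effects sketch. It uses the DerSimonian-Laird estimator for tau² because it has a closed form, not the REML estimator mentioned in the text, and the three effect sizes are invented. It illustrates what the numbers mean; it is not a substitute for a properly specified analysis in dedicated software.

```python
import math


def random_effects_dl(effects, variances):
    """Inverse-variance pooling with a DerSimonian-Laird tau² estimate."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # Random-effects weights add tau² to each study's variance.
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {
        "pooled": pooled,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "q": q,
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical standardized mean differences with equal variances.
result = random_effects_dl([0.2, 0.5, 0.8], [0.04, 0.04, 0.04])
print(round(result["pooled"], 3), round(result["tau2"], 3), round(result["i2"], 1))
# 0.5 0.05 55.6
```

With only three studies, the tau² and I² estimates are themselves very imprecise, which is one reason the text recommends specialist input at this stage.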

Table 1: Systematic Review Stall Point Diagnostic: Stage, Symptom, Cause, and Fix

Stall Point	Symptom	Root Cause	Specific Fix
1. PICO not specific	Cannot write inclusion/exclusion criteria that are specific enough to screen consistently	The research question is still at the "interest area" level, not operationalized to a specific population, intervention type, comparator, and outcome measure	Write one explicit inclusion criterion and one explicit exclusion criterion for each PICO component before proceeding to the search
2. Search volume overwhelm	The record set is larger than expected (10,000+); screening feels impossible; the review has not progressed since importing results	Broad search terms without appropriate Boolean narrowing, inadequate deduplication, and no AI-prioritization tool in place	Run deduplication first (30–60% of records typically removed); implement ASReview or Abstrackr for AI prioritization (mean 83% workload reduction at 95% recall); consult a librarian to narrow terms without changing PICO
3. No second reviewer	Title/abstract screening completed by one person; the full-text stage has not started because no second reviewer is available	The second reviewer was not arranged at the protocol stage; the team member who was to serve as the second reviewer is no longer available	Document eligibility criteria as explicit rules; recruit from: institutional colleagues, research librarians, professional services. Run 30–50 record pilot calibration before main full-text screening
4. Data extraction complexity	Studies report outcomes in incompatible formats; some studies are missing critical data; the extraction form does not accommodate what studies actually report	Extraction form designed for ideally structured studies encounters heterogeneous reporting; no pre-specified plan for missing data or outcome conversion	Expand the extraction form to capture what is actually reported; draft standardized author contact emails for missing data; document outcome conversion rules (e.g., SE to SD) before applying them
5. Risk-of-bias problems	Uncertain which tool to use; unable to answer domain-level questions; reviewer disagreements on RoB judgments cannot be resolved	Wrong tool applied to study design (e.g., NOS for RCTs); RoB 2 applied at study level rather than per outcome; ROBINS-I V2 complexity without prior training	Confirm correct tool per design: RoB 2 (RCTs, per outcome); ROBINS-I V2 (NRS interventions); QUADAS-2 (diagnostic); AMSTAR-2 (included SRs); QUIPS (prognosis). Contact a methodologist for ROBINS-I V2 training if needed
6. Meta-analysis / GRADE	Data is extracted, and RoB is complete; the review has stalled at the statistical synthesis stage because no team member has meta-analysis or GRADE expertise	Meta-analysis and GRADE require specific statistical training that was not identified as a requirement at the protocol stage; learning while doing in a live review is slow and error-prone	Commission a biostatistician with SR experience as a standalone engagement. Required inputs: extracted data table, pre-specified outcomes, PROSPERO registration. Outputs: forest plots, heterogeneity statistics, GRADE SoF tables

Stuck at a specific stage and need modular support rather than a full review engagement?
ScribeLab Writer provides standalone support for every stage of a systematic review: second-reviewer screening with kappa documentation, data extraction form development and completion, risk-of-bias assessment using the correct validated tool, meta-analysis and GRADE Summary of Findings table production, and methods section writing. You do not need to hand over the full review. Submit the stage you are stuck at, and we will scope the specific support required. Tell us where the review has stopped, and a methodologist will respond within 2-4 hours.

Should You Rescue the Review or Restart It?

Most stalled reviews are rescuable without restarting. The diagnostic question is simple: has the stall compromised the fundamental research question, methodology, or data foundation? Or has the review paused at a stage that needs specialist input to continue?

A review should be rescued when: the PICO is clearly defined; the search was conducted according to a pre-specified strategy; PROSPERO registration is in place; the screening process, even if incomplete, followed a documented protocol; and the data that exists is reliable.

A review may need to restart from a specific stage when: the search strategy was fundamentally flawed (wrong databases, incorrect terms, no reproducible syntax); the screening criteria were applied inconsistently without a documented resolution process; the data extraction form does not capture what the synthesis requires; or protocol deviations occurred that would need to be disclosed and are significant enough to compromise the review's credibility.

Restarting the full review from protocol is rarely necessary. Restarting from the search stage (with a corrected strategy) or from the screening stage (with a corrected process) is common and appropriate. The PROSPERO registration can be updated with a protocol deviation note explaining the reason and the change.

Table 2: Should You Rescue or Restart? Systematic Review Recovery Decision Framework

Condition	Rescue (continue from current stage)	Restart from a specific stage	Full restart from protocol
PICO and protocol status	PICO clearly defined; protocol documented and PROSPERO-registered	PICO defined; minor protocol deviation occurred; deviation can be disclosed, and the stage repeated	PICO was never clearly defined; the research question has fundamentally changed; no protocol exists
Search strategy integrity	Search was reproducible and multi-database; a full search log exists; the search date was documented	Search was adequate but outdated (more than 12 months ago); re-run the search from the original end date	Search was PubMed-only or Google Scholar; no search syntax was documented; search was conducted after registration
Screening process	Screening criteria were applied consistently; the second reviewer was available for the remaining stages	Single-reviewer title/abstract completed; recruit second reviewer for all remaining full-text decisions	Criteria were applied inconsistently, with no resolution protocol; no record of which studies were screened or excluded, and why
Extracted data quality	Extraction form captured required PICO elements; data is usable as-is or with minor additions	Extraction form was incomplete; expand form and add missing fields; re-extract missing elements from included studies	Extraction was unsystematic; outcome data is missing for most studies; no standardized form was used
PROSPERO record	Registration in place; any deviations can be noted as amendments	Registration in place; update the PROSPERO record with a protocol deviation note explaining the change and its reason	No PROSPERO registration exists, and the review is at a stage where post-hoc registration would be a significant deviation; a new registration is required

PROSPERO accepts protocol amendments and deviation notes throughout the review lifecycle. The PROSPERO record should be kept up to date, and any significant changes should be disclosed in the manuscript's methods section.

Frequently Asked Questions

How long is it normal for a systematic review to take?

Borah et al.'s 2017 analysis of 195 PROSPERO-registered reviews found a mean of 67.3 weeks to publication with a team of five, and a range of 6 to 186 weeks. The Demetres 2023 Weill Cornell analysis found an average of 642 days. Both studies describe completed and published reviews, so reviews that stalled and were abandoned are not reflected in these averages. A review that is progressing steadily through each stage with appropriate resources is likely to be completed in 9 to 18 months. A review conducted part-time by an individual researcher without a team commonly takes significantly longer.

Is it possible to publish a systematic review that took several years to complete?

Yes, but you need to address the implications in the discussion. If the search was conducted three or more years ago, update it before submission. If that is not feasible, disclose the original search date and note that recent literature was not included. Most editors will ask when the search was last conducted. An undisclosed, outdated search is a peer-review flag; a disclosed and contextualized one is not.

I am stuck because I cannot find enough included studies. What should I do?

Finding fewer studies than expected at the eligibility stage typically has one of three causes: overly restrictive eligibility criteria, insufficient database coverage, or a genuine evidence gap. First, review the eligibility criteria and consider whether any can be broadened without changing the core PICO question. Second, add supplementary search strategies: citation searching of included studies' reference lists, forward citation searching, and trial registry searching. Third, if the evidence base is sparse, consider whether a scoping review or a narrative synthesis is a more appropriate methodology than a quantitative systematic review with meta-analysis.

My co-author stopped contributing. Can I complete the review alone?

You can complete the analysis and writing alone, but the dual independent screening and data extraction requirements remain. If your co-author was also your second reviewer, recruit a replacement for the remaining stages or document a verified-exclusion approach where a new reviewer screens all excluded records. Update the PROSPERO record to reflect the change in team composition.

How do I restart a search without invalidating my existing work?

A search update does not invalidate previous work. Run the updated search from the date the original search ended. Import new records through the same screening platform and screen them against the same eligibility criteria. Add eligible new studies to the existing extracted dataset. Report both search dates in the methods section. The PRISMA flow diagram will need a note indicating that the results reflect a combined search from two dates. This is standard practice for reviews that took longer than expected to complete.

The Problem Is Not the Review. It Is the Stage.

A systematic review that is stuck at the screening stage is not a failed project. It is a project that needs a second reviewer. A review stuck at the meta-analysis stage is not a failed project. It is a project that needs a biostatistician for a defined deliverable.

The stall points described in this article are modular. Each one requires specific input at a specific stage. None of them requires discarding the work that has already been done.

If your review has stopped and you know which stage it is stuck at, ScribeLab Writer's systematic review team can scope the specific support you need for that stage. Submit the stage and what you have completed so far, and a methodologist will respond within 2-4 hours with a specific quote.
