AI in Systematic Reviews: What RAISE Requires in 2026

The use of artificial intelligence in systematic reviews moved from an experimental practice to a methodological standard faster than the reporting infrastructure could keep up. By 2024, tools such as ASReview, Rayyan, Covidence's AI screening features, and large language models were being used at multiple stages of systematic review production, including screening, data extraction, risk-of-bias assessment, and manuscript drafting, without any consensus on what needed to be disclosed, how, or to whom.

In November 2025, four major evidence synthesis organizations published a joint position statement that established that consensus. Published simultaneously across Cochrane Database of Systematic Reviews, Campbell Systematic Reviews, JBI Evidence Synthesis, and Environmental Evidence, the paper is titled "Position Statement on Artificial Intelligence (AI) Use in Evidence Synthesis Across Cochrane, the Campbell Collaboration, JBI, and the Collaboration for Environmental Evidence 2025" (Flemyng et al., Campbell Systematic Reviews 2025;21:e70074). It formally endorses the RAISE framework, where RAISE stands for Responsible use of AI in evidence SynthEsis. For systematic review teams who need support writing RAISE-compliant methods sections or AI disclosure statements that meet Cochrane, Campbell, and JBI submission requirements, ScribeLab Writer's systematic review writing service covers the full disclosure documentation stage.

This guide explains what RAISE requires at each stage of a systematic review, which tools fall within its scope, what a compliant methods section looks like, and what happens when AI use goes undisclosed under current journal submission policies.

Quick Answer:

RAISE requires that any AI or automation tool that "makes or suggests judgments" at any stage of a systematic review be fully and transparently disclosed in the methods section. This covers AI-assisted screening, data extraction, risk-of-bias assessment, and any other judgment-involving stage. The disclosure must name the specific tool and version, describe the human-AI workflow, and report validation data showing the tool's accuracy on a test set. RAISE does not prohibit AI use. It prohibits undisclosed AI use.

What RAISE Is and Who It Applies To

RAISE is a joint position statement from Cochrane, Campbell, JBI, and the Collaboration for Environmental Evidence, published in November 2025. It applies to all systematic reviews submitted to any of these four organizations' journals. It has been adopted as guidance by journals across the biomedical, social science, and environmental evidence synthesis fields.

The statement is not a blanket restriction on AI. Its core principle is transparency: "any use of AI or automation that makes or suggests judgments should be fully and transparently reported." The key word in that sentence is "suggests." RAISE covers not only tools that make autonomous decisions but also tools that propose decisions for human approval or rejection. A machine learning classifier that ranks records as likely relevant and presents them to a human reviewer for confirmation is an AI tool that suggests judgments, and it falls within RAISE's disclosure scope.

RAISE applies to all stages of the systematic review process where AI is involved in a judgment-relevant action. It does not distinguish between major and minor AI use. It does not have a threshold below which disclosure is not required.

AI Tools at Each Stage of a Systematic Review

Stage 1: Database Searching

AI tools used in database searching include semantic search engines (such as those built into some PubMed interfaces), AI-assisted query expansion tools, and tools that generate Boolean search strategies from natural language descriptions of the research question.

RAISE requires disclosure of any AI tool used to develop, refine, or execute the search strategy. This includes the tool name, its version, and whether it was used to supplement or replace traditional Boolean searching. A search strategy developed by an AI tool that was then peer-reviewed using the PRESS checklist should document both the AI-generated strategy and the PRESS peer review in the methods section. Our search strategy development service builds and PRESS-reviews the strategy, and documents AI-assisted query development to RAISE standard.

What RAISE does not require: disclosure of standard database platform features (such as MeSH term suggestion in PubMed) that are built into the database infrastructure and not applied as a separate AI layer. The distinction between native platform features and supplementary AI tools is not perfectly defined in the current guidance and is likely to be clarified in future updates.

A specific caution on LLM-generated search strategies. Large language models will produce a complete Boolean search string from a plain-language research question, and the output reads as authoritative. It frequently is not. LLM-generated strategies commonly omit relevant controlled-vocabulary terms (MeSH in MEDLINE, Emtree in Embase), misapply field tags, or introduce syntax errors that silently return incomplete results for an entire concept, none of which is visible from the string itself. Under RAISE, an AI-generated or AI-refined search must be disclosed as such, but disclosure does not make the strategy sound. The strategy still has to be validated the conventional way: tested against a set of known relevant articles to confirm they are retrieved, and peer-reviewed against the PRESS checklist by a second information specialist before the search is run (McGowan et al., 2016). An AI-drafted search is a first draft, not a finished one. Our search strategy development service builds the strategy to PRESS standard, validates recall against known studies, and documents any AI-assisted query development so the search satisfies both PRISMA-S and RAISE.
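The known-item check described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed RAISE or PRESS procedure: the function name `known_item_recall` and the record identifiers are hypothetical, and it assumes the search results and the benchmark set have already been exported as lists of identifiers.

```python
def known_item_recall(retrieved_ids, known_relevant_ids):
    """Return (recall, missed) for a benchmark set of known relevant records.

    recall is the share of benchmark records the search retrieved;
    missed lists the benchmark records it failed to retrieve.
    """
    known = set(known_relevant_ids)
    if not known:
        raise ValueError("known_relevant_ids must not be empty")
    found = known & set(retrieved_ids)
    missed = sorted(known - found)
    return len(found) / len(known), missed


# Hypothetical identifiers, for illustration only.
retrieved = ["rec001", "rec002", "rec003", "rec005", "rec008"]
benchmark = ["rec002", "rec003", "rec004", "rec008"]

recall, missed = known_item_recall(retrieved, benchmark)
print(f"Known-item recall: {recall:.0%}; missed: {missed}")
```

Any record in `missed` is a prompt to inspect the strategy for an absent controlled-vocabulary term, a wrong field tag, or a syntax error before the search is run in full.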

Stage 2: Title and Abstract Screening

This is the stage where AI tool use is currently most widespread and where RAISE has the most immediate practical impact. Tools commonly used include ASReview (active learning-based prioritization), Rayyan (ML-assisted relevance classification), Covidence's AI screening assistance, and custom machine learning classifiers built for specific review topics.

RAISE requires disclosure of all of the following for any AI tool used in title and abstract screening:

The tool name and version. "AI-assisted screening" is insufficient. "ASReview v2.1.5 with TF-IDF feature extraction and logistic regression model" meets the standard.

The human-AI workflow. Did human reviewers screen all records or only records that the AI flagged as potentially relevant? At what confidence threshold did AI-flagged records proceed to human review versus automatic exclusion? A workflow that automatically excludes records below a certain AI confidence threshold without human review must be disclosed, and the threshold specified.

The stopping rule, if one was used. ASReview and similar active learning tools are often used with a stopping rule: for example, screening stops when a defined number of consecutive records have been rejected by both the AI and the human reviewer. If a stopping rule was used, the rule must be specified.

Validation data. How accurate was the AI classification? This is typically reported as sensitivity (the proportion of relevant records that the AI correctly identified as relevant) and specificity (the proportion of irrelevant records that the AI correctly excluded), measured against a gold-standard human classification on a test set. The composition of the test set and its relationship to the main review dataset should be described.
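As a rough sketch of how the sensitivity and specificity figures are derived, the Python below compares AI decisions with a human gold standard on a test set. The function name `screening_accuracy` and the example labels are illustrative assumptions, not output from any named tool.

```python
def screening_accuracy(ai_labels, human_labels):
    """Compare AI include/exclude decisions with a human gold standard.

    Both arguments are equal-length sequences of booleans (True = relevant).
    Returns the confusion counts plus sensitivity and specificity.
    """
    if len(ai_labels) != len(human_labels):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(ai_labels, human_labels))
    tp = sum(1 for ai, human in pairs if ai and human)          # relevant, AI included
    fn = sum(1 for ai, human in pairs if not ai and human)      # relevant, AI excluded
    tn = sum(1 for ai, human in pairs if not ai and not human)  # irrelevant, AI excluded
    fp = sum(1 for ai, human in pairs if ai and not human)      # irrelevant, AI included
    return {
        "tp": tp, "fn": fn, "tn": tn, "fp": fp,
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }


# Hypothetical test set of ten records, for illustration only.
human = [True, True, True, True, False, False, False, False, False, False]
ai = [True, True, True, False, False, False, False, False, True, False]

result = screening_accuracy(ai, human)
print(result)
```

In screening, the false negatives (`fn`) are the count that matters most, because each one is a relevant study the AI would have excluded.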

Stage 3: Full-Text Screening

AI tools applied to full-text screening face a more demanding task than title-abstract screening because full-text documents are longer, more variable in structure, and more nuanced in their relevance to specific eligibility criteria. LLM-based tools have been applied to this stage, but validation data for full-text AI screening is less developed than for title-abstract screening.

The RAISE disclosure requirements are identical to those for title-abstract screening: tool, version, workflow, stopping rule if applicable, and validation data.
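One way a team might keep track of these items is a small per-stage record that flags what is still missing before the methods section is drafted. The sketch below is an assumption about how such a record could be structured, not a format defined by RAISE; the class name `AIDisclosure`, the helper `missing_items`, and the tool name "ExampleScreener" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AIDisclosure:
    """Disclosure items for one AI-assisted stage (hypothetical structure)."""
    stage: str
    tool_name: str
    tool_version: str
    workflow: str
    validation: str
    stopping_rule: Optional[str] = None  # only where a stopping rule applies


def missing_items(disclosure):
    """Return the names of required items that are still blank."""
    required = ["stage", "tool_name", "tool_version", "workflow", "validation"]
    return [name for name in required if not getattr(disclosure, name).strip()]


# Hypothetical draft record with two gaps, for illustration only.
draft = AIDisclosure(
    stage="full-text screening",
    tool_name="ExampleScreener",
    tool_version="",
    workflow="AI suggested a decision; two human reviewers confirmed or overrode it",
    validation="",
)
print(missing_items(draft))
```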

Stage 4: Data Extraction

AI-assisted data extraction tools prompt LLMs to extract specific data fields (population characteristics, intervention details, outcome data, follow-up duration, and funding sources) from full-text papers. The extracted data are then reviewed by a human extractor.

RAISE requires disclosure of AI data extraction tool use with the same specificity as screening disclosure. Additionally, because data extraction errors directly affect the pooled estimates in a meta-analysis, the methods section should describe the validation procedure used to assess the accuracy of AI-extracted data against human extraction on a sample of included studies.

The single most consequential data extraction error in meta-analysis is the confusion of standard deviation and standard error. An AI tool that systematically misidentifies which measure of spread has been reported in a paper will introduce a systematic error into the meta-analysis that human verification of a small sample may not catch. The validation sample for AI data extraction should be large enough and representative enough to detect systematic misclassification of this kind. Because these extraction errors propagate directly into the pooled estimate, our meta-analysis service validates the extracted dataset before synthesis and delivers the analysis with reproducible R or Stata code.
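To see why this error is so damaging, recall that the standard error of a mean equals the standard deviation divided by the square root of the sample size. The Python sketch below uses hypothetical numbers and assumes simple inverse-variance weighting of a single-arm mean, which may differ from the model used in a given meta-analysis; the function names are illustrative.

```python
import math


def se_to_sd(se, n):
    """Convert a standard error of the mean to a standard deviation."""
    if n <= 0:
        raise ValueError("n must be positive")
    return se * math.sqrt(n)


def inverse_variance_weight(sd, n):
    """Weight of a study mean under inverse-variance weighting: n / sd^2."""
    if sd <= 0 or n <= 0:
        raise ValueError("sd and n must be positive")
    return n / sd ** 2


# Hypothetical study: the paper reports a standard error of 2.0 with n = 100.
reported_se, n = 2.0, 100

correct_sd = se_to_sd(reported_se, n)  # what should be entered as the SD
wrong_sd = reported_se                 # the SE mistakenly entered as the SD

correct_weight = inverse_variance_weight(correct_sd, n)
wrong_weight = inverse_variance_weight(wrong_sd, n)

print(f"Correct SD: {correct_sd:.1f}, mistaken SD: {wrong_sd:.1f}")
print(f"Weight inflation from the error: {wrong_weight / correct_weight:.0f}x")
```

Under these assumptions the mistaken entry inflates the study's weight by a factor equal to its sample size, so a single misread value can dominate the pooled estimate.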

Stage 5: Risk-of-Bias Assessment

Several studies have tested LLM performance on RoB 2 risk-of-bias assessment, with mixed results. LLMs perform reasonably on signaling questions when the trial report is complete and unambiguous, but produce confident, incorrect judgments on trials with incomplete reporting or complex designs.

RAISE requires disclosure of any AI tool used to generate, suggest, or assist with risk-of-bias judgments. The disclosure must specify which domains the AI tool was applied to, how human reviewers interacted with the AI-generated judgments (review and confirm, review and override, or independent parallel assessment), and what validation data exists for the AI tool's performance on the study designs included in this review.
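A simple way to document how reviewers interacted with AI-suggested judgments is to tally, per domain, how often the final human judgment matched the AI suggestion. The sketch below is illustrative only: the function name `domain_agreement` and the example records are hypothetical, and raw agreement counts are a descriptive summary rather than a validation statistic required by RAISE.

```python
from collections import Counter


def domain_agreement(records):
    """Tally agreement between AI-suggested and final human judgments.

    records: iterable of (domain, ai_judgment, human_judgment) tuples.
    Returns {domain: (n_agreed, n_total)}.
    """
    totals = Counter()
    agreed = Counter()
    for domain, ai_judgment, human_judgment in records:
        totals[domain] += 1
        if ai_judgment == human_judgment:
            agreed[domain] += 1
    return {domain: (agreed[domain], totals[domain]) for domain in totals}


# Hypothetical judgments on two RoB 2 domains, for illustration only.
records = [
    ("randomization process", "low", "low"),
    ("randomization process", "low", "some concerns"),
    ("missing outcome data", "high", "high"),
    ("missing outcome data", "low", "low"),
]

summary = domain_agreement(records)
for domain, (n_agreed, n_total) in summary.items():
    print(f"{domain}: human confirmed AI in {n_agreed} of {n_total}; overrode {n_total - n_agreed}")
```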

A risk-of-bias assessment completed with AI assistance, where no validation data is reported, is a methodological weakness that peer reviewers at methodology-focused journals will flag. The validation requirement is particularly important for risk-of-bias because the certainty ratings in the GRADE Summary of Findings table depend directly on the quality of the risk-of-bias data.

Accountability stays with the authors, not the tool. The first principle of the RAISE recommendations is that evidence synthesists remain ultimately responsible for their synthesis, including the decision to use AI and the consequences of that use (Flemyng et al., 2025). This has a practical meaning that disclosure language can obscure: if an AI tool misreads a standard error as a standard deviation during extraction, or returns a confident but incorrect risk-of-bias judgment on an incompletely reported trial, the resulting error attaches to the named authors and, if it reaches publication, to any correction or retraction that follows. RAISE does not treat AI as a party that shares responsibility. It treats AI as a tool whose output the human team is obligated to verify. That is why a defined second-human check at each AI-assisted judgment stage, on extracted outcome data, on suggested risk-of-bias domains, and on prioritized screening decisions, is not administrative caution but the mechanism by which authors discharge accountability they cannot delegate. Where a team prefers that verification be performed by an independent methodologist rather than a second member of the same lab, our screening, data extraction, and risk-of-bias service supplies the independent human review and documents where each human judgment is entered into the process.

Need a RAISE-compliant methods section before your next submission deadline?
Writing a methods section that correctly discloses AI tool use across screening, data extraction, and risk-of-bias assessment with tool names, version numbers, human-AI workflow descriptions, stopping rules, and validation data requires applying RAISE to stages most teams complete before reading the guidance. ScribeLab Writer's systematic review writing team works with research teams to develop RAISE-compliant methods sections and AI disclosure statements for reviews targeting Cochrane, Campbell, JBI, and peer-reviewed journals.

Stage 6: Manuscript Preparation

The use of AI tools for manuscript writing, grammar checking, sentence-level editing, prose drafting, or structural assistance falls outside the core RAISE framework, which is focused on judgment-making in the systematic review process itself. However, most high-impact clinical journals, including BMJ, Lancet, JAMA, and Annals of Internal Medicine, have separate author policies on AI use in manuscript preparation that require disclosure. These are distinct from RAISE and should be checked in the author guidelines of the specific target journal.

What a RAISE-Compliant Methods Section Looks Like

A RAISE-compliant methods section addresses AI disclosure at each stage where AI was used. Below is an example of what that disclosure looks like for a review that used AI-assisted screening.

"Title and abstract screening was conducted using ASReview v2.1.5 with active learning and TF-IDF feature extraction. All records were imported, and an initial training set of 50 records (25 relevant, 25 irrelevant) was manually classified to initialize the model. The AI tool then prioritized remaining records by predicted relevance, and two independent human reviewers screened records in the order presented by the tool. A stopping rule was applied: screening concluded when 200 consecutive records had been classified as irrelevant by both the AI model and both human reviewers. Sensitivity of the AI classification was validated against a gold-standard human classification of a random sample of 200 records drawn from the full dataset, achieving 96.4% sensitivity (95% CI 92.1 to 98.7). Two independent human reviewers conducted full-text screening without AI assistance."

This disclosure is specific, stage-appropriate, and provides the information a peer reviewer needs to evaluate the reliability of the screening process.
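The stopping rule in that example can be expressed as a short Python sketch. It simplifies the rule to one final decision per record (relevant or not) in the order the tool presented them, which is an assumption; the function name `records_screened_before_stop` and the example decisions are hypothetical and do not reproduce any tool's built-in behavior.

```python
def records_screened_before_stop(decisions, consecutive_irrelevant=200):
    """Return how many records are screened before the stopping rule fires.

    decisions: booleans in screening order (True = classified relevant).
    Returns None if the run of irrelevant records never reaches the threshold.
    """
    if consecutive_irrelevant <= 0:
        raise ValueError("consecutive_irrelevant must be positive")
    run = 0
    for count, relevant in enumerate(decisions, start=1):
        run = 0 if relevant else run + 1  # a relevant record resets the run
        if run >= consecutive_irrelevant:
            return count
    return None


# Hypothetical screening order with a small threshold, for illustration only.
decisions = [True, False, True, False, False, False, True]
stopped_at = records_screened_before_stop(decisions, consecutive_irrelevant=3)
print(stopped_at)
```

Reporting both the threshold and the number of records screened when the rule fired lets a reader judge how much of the dataset was never seen by a human.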

How to produce validation data

The sensitivity statistic in the disclosure above is not generated automatically by the screening tool. It requires a deliberate validation step, and RAISE's guidance on evaluating AI tools treats this as central rather than optional (Thomas et al., 2025). To produce it, a random sample of records is drawn from the full retrieved set, screened independently by human reviewers who are blind to the AI's classification, and used as a gold standard against which the AI's decisions are compared to calculate sensitivity and specificity. Two design choices determine whether the validation is meaningful. The sample must be drawn from the whole dataset, not only from records the AI already ranked as relevant, because a validation set built from likely-relevant records cannot detect the relevant studies the AI wrongly excluded, which is precisely the error that matters. And the sample must be large enough that a low miss rate is measurable rather than an artefact of small numbers. A review that reaches submission without this step faces a choice between validating retrospectively on a drawn sample or disclosing that no validation was performed, and the latter invites a reviewer query at any journal applying RAISE. Our screening, data extraction, and risk-of-bias service builds this validation into the workflow and reports inter-rater agreement and AI sensitivity against a human-screened sample as standard.
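Both design choices can be illustrated with a short Python sketch. The Wilson score interval used here is one common way to put a confidence interval on a proportion; it is my choice for illustration, not a method specified by RAISE, and the function names, seed, and counts are hypothetical.

```python
import math
import random


def draw_validation_sample(record_ids, k, seed=0):
    """Draw k records at random, without replacement, from the whole dataset."""
    rng = random.Random(seed)  # fixed seed so the draw can be reported and repeated
    return rng.sample(list(record_ids), k)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 for roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Sample from every retrieved record, not only those the AI ranked highly.
all_records = [f"rec{i:05d}" for i in range(5000)]  # hypothetical identifiers
sample = draw_validation_sample(all_records, k=200)

# Same observed sensitivity, very different certainty (hypothetical counts):
# the AI found 27 of 28 relevant records versus 270 of 280.
small = wilson_interval(27, 28)
large = wilson_interval(270, 280)
print(f"27/28:   {small[0]:.3f} to {small[1]:.3f}")
print(f"270/280: {large[0]:.3f} to {large[1]:.3f}")
```

Sensitivity is calculated only over the records the human reviewers judged relevant, so the width of the interval depends on how many relevant records the sample contains, not on the total sample size.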

Reporting Standards and the PRISMA 2020 Connection

RAISE operates alongside, not instead of, PRISMA 2020. The PRISMA 2020 checklist requires transparent reporting of the search strategy (item 7), the selection process (item 8), and the risk-of-bias assessment (item 11). AI use at any of these stages creates additional reporting obligations under RAISE that supplement the PRISMA items rather than replacing them.

Our complete PRISMA 2020 checklist guide covers all 27 items with the most common reporting failures and how to fix each before submission. AI disclosure under RAISE should be integrated into the relevant PRISMA items rather than added as a separate disclosure section.

What Happens When AI Use Is Undisclosed

Journal integrity processes are increasingly using AI detection tools at the pre-review stage. The pattern emerging from the 2025 Peer Review Congress data is that journals with targeted pre-submission screening, particularly for systematic reviews, are identifying undisclosed AI use at higher rates than previously expected. One presentation reported that targeted AI screening of systematic review manuscripts raised the desk-rejection rate from 13% to 40% before any human review occurred.

Undisclosed AI use identified after publication creates a more serious problem. Correction and retraction processes for systematic reviews that used AI without disclosure are still developing, but the trend in journal editorial policy is toward treating undisclosed AI use with the same severity as other transparency failures. A systematic review retracted for undisclosed AI use affects not only the publication record but potentially the clinical guidelines and health policy decisions built on its conclusions. The cleanest way to avoid an AI-disclosure problem at a judgment-making stage is to have that stage conducted by trained human reviewers. ScribeLab Writer's screening, data extraction, and risk-of-bias work is performed by PhD-qualified methodologists, not automated tools, so the stages RAISE scrutinizes most carry a straightforward, defensible disclosure.

Understanding why systematic reviews get rejected increasingly includes this transparency dimension as journals formalize their AI policies.

Which AI Tools Are Currently in Use

The AI tools most commonly used in systematic review production as of 2026, and their primary application stages, are the following.

ASReview (Utrecht University, open-source): active learning-based screening prioritization. Used for title-abstract screening. Validated across multiple review domains. Free. Requires Python installation.

Rayyan (Qatar Computing Research Institute): ML-assisted screening with blind mode for dual-reviewer blinding. Used for title-abstract and initial full-text screening. Free tier available.

Covidence (Cochrane-affiliated): subscription-based review management platform with integrated AI screening assistance. Used across screening stages. Standard tool for Cochrane reviews.

Nested Knowledge (subscription): AI-assisted screening, data extraction, and qualitative coding. Used across multiple stages, including tagging and hierarchy building.

Large language models (GPT-4, Claude, Gemini): used ad hoc for data extraction, risk-of-bias assessment, and manuscript preparation. The most variable in terms of RAISE compliance because use is self-directed and validation is rarely conducted. LLM use at judgment-making stages requires the same validation and disclosure as purpose-built tools.

Frequently Asked Questions

Does RAISE apply to all systematic reviews or only those submitted to Cochrane and JBI?

RAISE was published jointly by Cochrane, Campbell, JBI, and CEE, but its adoption has extended beyond these four organizations. Most major biomedical journals have adopted or are in the process of adopting the RAISE framework, either explicitly by name or through equivalent AI disclosure requirements in their author guidelines. Before submission, check the current author instructions for your target journal regarding AI disclosure. The safest approach is to apply RAISE standards to all systematic reviews regardless of the target journal.

Do I need to disclose AI tools used only for convenience, not for judgments?

The RAISE threshold is whether the tool "makes or suggests judgments." Tools used for administrative convenience without influencing any review decision do not require RAISE disclosure. Examples include reference managers (Zotero, Mendeley), deduplication software, and formatting tools. However, AI features within reference managers or screening platforms that suggest relevance rankings or assist with decision-making fall within scope.

What if I used ChatGPT or another LLM to help write my methods section?

Manuscript writing assistance with LLMs falls under individual journal policies rather than RAISE specifically. RAISE covers AI use in the evidence synthesis process. Most high-impact journals now require disclosure of AI use in manuscript preparation, typically in an author statement. Check the specific journal's current policy before submission.

Is RAISE compliance retroactive for reviews already in progress?

Reviews already in progress should apply RAISE standards from the current stage forward and document what AI tools were used before RAISE was published. For reviews that used AI tools before November 2025 without documentation, a retrospective account of what tools were used, at which stages, and with what human oversight is better than omitting the disclosure.

Can I use AI to replace one of my two independent reviewers?

No. RAISE does not change the dual-independent-reviewer requirement for screening and data extraction. AI tools assist or prioritize human review; they do not substitute for independent human judgment. A single-reviewer process supplemented by AI is still a single-reviewer process and carries the same risk-of-bias designation. If your team lacks a second independent reviewer, our screening, data extraction, and risk-of-bias service provides the trained second reviewer AI cannot substitute for, with a documented inter-rater reliability log.

When Undisclosed AI Use Becomes a Submission Problem

A systematic review that used AI tools at any judgment-making stage without a RAISE-compliant disclosure statement now risks desk rejection at Cochrane, Campbell, JBI, and most major biomedical journals, or correction and retraction after publication when the gap is identified during post-publication review. The risk is not theoretical: the 2025 Peer Review Congress data shows that targeted pre-submission screening raised desk-rejection rates for systematic reviews from 13 to 40 percent when undisclosed AI use was identified. ScribeLab Writer's systematic review writing team, led by credentialed researchers with published systematic reviews in the biomedical literature, works with research teams to develop RAISE-compliant methods sections, stage-specific AI disclosure statements, and validation documentation for reviews using AI tools at any stage of the process. Submit your disclosure requirements and target journal through the inquiry form, and a member of the team will respond within 2-4 hours.