The use of artificial intelligence in systematic reviews moved from experimental practice to methodological standard faster than the reporting infrastructure could keep up. By 2024, tools such as ASReview, Rayyan, Covidence's AI screening features, and large language models were being used at multiple stages of systematic review production (screening, data extraction, risk-of-bias assessment, and manuscript drafting) without any consensus on what needed to be disclosed, how, or to whom.
In November 2025, four major evidence synthesis organizations published a joint position statement establishing that consensus. Published simultaneously in Cochrane Database of Systematic Reviews, Campbell Systematic Reviews, JBI Evidence Synthesis, and Environmental Evidence, the paper is titled "Position Statement on Artificial Intelligence (AI) Use in Evidence Synthesis Across Cochrane, the Campbell Collaboration, JBI, and the Collaboration for Environmental Evidence 2025" (Flemyng et al., Campbell Systematic Reviews 2025;21:e70074). It formally endorses the RAISE framework, where RAISE stands for Responsible use of AI in evidence SynthEsis.
This guide explains what RAISE requires at each stage of a systematic review, which tools fall within its scope, what a compliant methods section looks like, and what happens when AI use goes undisclosed under current journal submission policies.
Quick Answer:
RAISE requires that any AI or automation tool that "makes or suggests judgements" at any stage of a systematic review must be fully and transparently disclosed in the methods section. This covers AI-assisted screening, data extraction, risk-of-bias assessment, and any other judgment-involving stage. The disclosure must name the specific tool and version, describe the human-AI workflow, and report validation data showing the tool's accuracy on a test set. RAISE does not prohibit AI use. It prohibits undisclosed AI use.
What RAISE Is and Who It Applies To
RAISE is the framework endorsed in a joint position statement from Cochrane, Campbell, JBI, and the Collaboration for Environmental Evidence, published in November 2025. The statement applies to all systematic reviews submitted to any of these four organizations' journals, and its guidance has been adopted by journals across the biomedical, social science, and environmental evidence synthesis fields.
The statement is not a blanket restriction on AI. Its core principle is transparency: "any use of AI or automation that makes or suggests judgements should be fully and transparently reported." The keyword in that sentence is "suggests." RAISE covers not only tools that make autonomous decisions but also tools that propose decisions for human approval or rejection. A machine learning classifier that ranks records as likely relevant and presents them to a human reviewer for confirmation is an AI tool that suggests judgments, and it falls within RAISE's disclosure scope.
RAISE applies to every stage of the systematic review process where AI is involved in a judgment-relevant action. It does not distinguish between major and minor AI use, and it sets no minimum level of AI involvement below which disclosure is waived.
AI Tools at Each Stage of a Systematic Review
Stage 1: Database Searching
AI tools used in database searching include semantic search engines (such as those built into some PubMed interfaces), AI-assisted query expansion tools, and tools that generate Boolean search strategies from natural language descriptions of the research question.
RAISE requires disclosure of any AI tool used to develop, refine, or execute the search strategy, including the tool name, its version, and whether it was used to supplement or replace traditional Boolean searching. If an AI tool drafted the search strategy and the strategy was then peer-reviewed using the PRESS checklist, the methods section should document both the AI-generated strategy and the PRESS peer review.
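One low-effort way to document both versions is to keep the AI-generated strategy and the PRESS-revised strategy as plain text and record exactly what changed between them. The sketch below uses Python's standard `difflib` module for that; the function name `strategy_diff` and the example search lines are hypothetical illustrations, not something RAISE or PRESS prescribes.

```python
import difflib


def strategy_diff(ai_draft: str, press_revised: str) -> str:
    """Unified diff between an AI-drafted search strategy and the version
    revised after PRESS peer review, compared one search line at a time."""
    diff_lines = difflib.unified_diff(
        ai_draft.splitlines(),
        press_revised.splitlines(),
        fromfile="ai_generated_strategy",
        tofile="press_revised_strategy",
        lineterm="",  # lines from splitlines() carry no newline characters
    )
    return "\n".join(diff_lines)


# Hypothetical Ovid-style search lines, for illustration only.
ai_draft = """1. exp Hypertension/
2. (high blood pressure or hypertens*).ti,ab.
3. 1 or 2"""
press_revised = """1. exp Hypertension/
2. (high blood pressure or hypertens* or elevated blood pressure).ti,ab.
3. 1 or 2"""

print(strategy_diff(ai_draft, press_revised))
```

The output marks the reviewer's change to line 2 with `-` and `+` prefixes, and could sit in a supplementary appendix next to both full strategies.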
What RAISE does not require: disclosure of standard database platform features (such as MeSH term suggestion in PubMed) that are built into the database infrastructure and not applied as a separate AI layer. The distinction between native platform features and supplementary AI tools is not perfectly defined in the current guidance and is likely to be clarified in future updates.
Stage 2: Title and Abstract Screening
This is the stage where AI tool use is currently most widespread and where RAISE has the most immediate practical impact. Tools commonly used include ASReview (active learning-based prioritization), Rayyan (ML-assisted relevance classification), Covidence's AI screening assistance, and custom machine learning classifiers built for specific review topics.
RAISE requires disclosure of all of the following for any AI tool used in title and abstract screening:
The tool name and version. "AI-assisted screening" is insufficient; "ASReview v2.1.5 with TF-IDF feature extraction and a logistic regression model" meets the standard.
The human-AI workflow. Did human reviewers screen all records or only records that the AI flagged as potentially relevant? At what confidence threshold did AI-flagged records proceed to human review versus automatic exclusion? A workflow that automatically excludes records below a certain AI confidence threshold without human review must be disclosed, and the threshold specified.
The stopping rule, if one was used. ASReview and similar active learning tools are often used with a stopping rule, under which screening ends once a defined number of consecutive records have been rejected by both the AI and the human reviewer. If a stopping rule was used, the rule and its threshold must be specified.
Validation data. How accurate was the AI classification? This is typically reported as sensitivity (the proportion of relevant records that the AI correctly identified as relevant) and specificity (the proportion of irrelevant records that the AI correctly excluded), measured against a gold-standard human classification on a test set. The composition of the test set and its relationship to the main review dataset should be described.
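The validation figures above come from a simple two-by-two comparison of AI calls against the gold standard. The following is a minimal sketch, assuming each test-set record carries one gold-standard human label and one AI include/exclude call; it reports sensitivity and specificity with Wilson score 95% confidence intervals. The function names and the choice of the Wilson interval are our assumptions; RAISE does not prescribe a particular interval method.

```python
import math
from typing import Sequence


def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


def screening_accuracy(gold: Sequence[bool], ai: Sequence[bool]) -> dict:
    """Compare AI include (True) / exclude (False) calls with gold-standard human labels."""
    if len(gold) != len(ai):
        raise ValueError("gold and ai must label the same records")
    pairs = list(zip(gold, ai))
    tp = sum(1 for g, a in pairs if g and a)          # relevant, AI included
    fn = sum(1 for g, a in pairs if g and not a)      # relevant, AI excluded (the costly error)
    tn = sum(1 for g, a in pairs if not g and not a)  # irrelevant, AI excluded
    fp = sum(1 for g, a in pairs if not g and a)      # irrelevant, AI included
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("test set needs both relevant and irrelevant records")
    return {
        "sensitivity": tp / (tp + fn),
        "sensitivity_ci95": wilson_ci(tp, tp + fn),
        "specificity": tn / (tn + fp),
        "specificity_ci95": wilson_ci(tn, tn + fp),
    }


# Hypothetical test set: 20 relevant and 80 irrelevant records.
gold = [True] * 20 + [False] * 80
ai = [True] * 19 + [False] + [True] * 8 + [False] * 72
result = screening_accuracy(gold, ai)
print(result["sensitivity"], result["specificity"])  # 0.95 0.9
```

Here the AI misses one of 20 relevant records and wrongly includes 8 of 80 irrelevant ones. Note how few relevant records drive the sensitivity estimate: its interval is much wider than the specificity interval, which is why the test set's composition should be reported.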
Stage 3: Full-Text Screening
AI tools applied to full-text screening face a more demanding task than title-abstract screening because full-text documents are longer, more variable in structure, and more nuanced in their relevance to specific eligibility criteria. LLM-based tools have been applied to this stage, but validation data for full-text AI screening is less developed than for title-abstract screening.
The RAISE disclosure requirements are identical to those for title-abstract screening: tool, version, workflow, stopping rule if applicable, and validation data.
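Because the same items recur at every AI-assisted stage, a team may find it useful to track them in a structured record and check for gaps before drafting the methods section. This is a hypothetical sketch: the `AIDisclosure` class and its field names are our own shorthand for the items listed in this guide, not an official RAISE schema.

```python
from dataclasses import dataclass


@dataclass
class AIDisclosure:
    """Hypothetical checklist record for one AI-assisted review stage.

    Field names are illustrative shorthand for the items this guide lists
    (tool, version, workflow, stopping rule if applicable, validation data);
    they are not an official RAISE schema.
    """

    stage: str
    tool_name: str = ""
    tool_version: str = ""
    workflow: str = ""  # e.g. who screened what, and any auto-exclusion threshold
    uses_stopping_rule: bool = False
    stopping_rule: str = ""
    validation: str = ""  # e.g. sensitivity/specificity on a described test set

    def missing_items(self) -> list[str]:
        """Return the names of required items that are still blank."""
        required = ["tool_name", "tool_version", "workflow", "validation"]
        missing = [name for name in required if not getattr(self, name).strip()]
        # The stopping rule is only required when one was actually used.
        if self.uses_stopping_rule and not self.stopping_rule.strip():
            missing.append("stopping_rule")
        return missing


draft = AIDisclosure(
    stage="full-text screening",
    tool_name="ExampleTool",  # hypothetical tool name
    workflow="Two reviewers independently assessed every AI suggestion.",
    uses_stopping_rule=True,
)
print(draft.missing_items())  # ['tool_version', 'validation', 'stopping_rule']
```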
Stage 4: Data Extraction
AI-assisted data extraction tools prompt LLMs to extract specific data fields from full-text papers: population characteristics, intervention details, outcome data, follow-up duration, and funding sources. The extracted data is then reviewed by a human extractor.
RAISE requires disclosure of AI data extraction tool use with the same specificity as screening disclosure. Additionally, because data extraction errors directly affect the pooled estimates in a meta-analysis, the methods section should describe the validation procedure used to assess the accuracy of AI-extracted data against human extraction on a sample of included studies.
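A validation of AI extraction can start from a per-field agreement rate on the sample of studies extracted by both the AI tool and a human. The sketch below assumes, hypothetically, that each study's extraction is stored as a dictionary of field names to values, with the human extraction treated as the reference. It uses exact matching, so in practice numbers and free-text values would need normalizing (units, rounding, spelling) first.

```python
from collections import defaultdict


def field_agreement(human: list[dict], ai: list[dict]) -> dict[str, float]:
    """Per-field proportion of studies where the AI value equals the human value.

    `human` and `ai` are aligned by study (same order); the human extraction is
    the reference. Matching is exact, so normalize values before calling this.
    """
    if len(human) != len(ai):
        raise ValueError("human and ai must cover the same studies")
    matches = defaultdict(int)
    totals = defaultdict(int)
    for human_row, ai_row in zip(human, ai):
        for field_name, human_value in human_row.items():
            totals[field_name] += 1
            # A field the AI left out counts as a mismatch.
            if field_name in ai_row and ai_row[field_name] == human_value:
                matches[field_name] += 1
    return {name: matches[name] / totals[name] for name in totals}


# Hypothetical field names and values for two studies.
human = [{"n_randomized": 120, "spread_type": "SD"}, {"n_randomized": 80, "spread_type": "SE"}]
ai = [{"n_randomized": 120, "spread_type": "SD"}, {"n_randomized": 80, "spread_type": "SD"}]
print(field_agreement(human, ai))  # {'n_randomized': 1.0, 'spread_type': 0.5}
```

Reporting agreement per field, rather than one pooled figure, keeps a weak field such as the type of spread measure from being hidden by fields the tool extracts reliably.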
One of the most consequential data extraction errors in meta-analysis is confusing the standard deviation with the standard error. An AI tool that systematically misidentifies which measure of spread has been reported in a paper will introduce a systematic error into the meta-analysis that human verification of a small sample may not catch. The validation sample for AI data extraction should be large enough and representative enough to detect systematic misclassification of this kind.
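Two pieces of arithmetic make this concrete. For a single group mean, SD = SE × √n, so an SE mistaken for an SD understates the spread by a factor of √n (a factor of 10 when n = 100). And if a fraction p of studies carry a given extraction error, a random validation sample of k studies contains at least one affected study with probability 1 - (1 - p)^k. The sketch below solves that for k; it assumes independent sampling (a binomial approximation), which slightly overstates the sample needed when the pool of included studies is small.

```python
import math


def sd_from_se(se: float, n: int) -> float:
    """Convert the standard error of a single group mean to a standard deviation: SD = SE * sqrt(n)."""
    return se * math.sqrt(n)


def min_validation_sample(error_rate: float, confidence: float = 0.95) -> int:
    """Smallest k with 1 - (1 - error_rate)**k >= confidence.

    That is, how many randomly sampled studies must be checked so that, if a
    fraction `error_rate` of studies carry a given error, at least one affected
    study is checked with probability >= `confidence` (binomial approximation).
    """
    if not (0 < error_rate < 1 and 0 < confidence < 1):
        raise ValueError("error_rate and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - error_rate))


print(sd_from_se(2.0, 100))         # 20.0: an SE of 2.0 with n = 100 is an SD of 20.0
print(min_validation_sample(0.05))  # 59
print(min_validation_sample(0.10))  # 29
```

If 5% of studies are affected, roughly 59 randomly chosen studies must be checked for a 95% chance of seeing even one instance, which is more than many reviews include in total.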
Stage 5: Risk-of-Bias Assessment
Several studies have tested LLM performance on RoB 2 risk-of-bias assessment, with mixed results. LLMs perform reasonably on signaling questions when the trial report is complete and unambiguous, but produce confident, incorrect judgments on trials with incomplete reporting or complex designs.
RAISE requires disclosure of any AI tool used to generate, suggest, or assist with risk-of-bias judgments. The disclosure must specify which domains the AI tool was applied to, how human reviewers interacted with the AI-generated judgments (review and confirm, review and override, or independent parallel assessment), and what validation data exists for the AI tool's performance on the study designs included in this review.
A risk-of-bias assessment completed with AI assistance but without reported validation data is a methodological weakness that peer reviewers at methodology-focused journals are likely to flag. The validation requirement is particularly important for risk of bias because the certainty ratings in the GRADE Summary of Findings table depend directly on the quality of the risk-of-bias data.
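Agreement between AI-suggested and human risk-of-bias judgments can be summarized per domain with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. This is a minimal sketch, assuming one AI judgment and one final human judgment per study for a single RoB 2 domain, using the RoB 2 levels "Low", "Some concerns", and "High". Unweighted kappa is our choice here; RAISE does not mandate a specific agreement statistic.

```python
import math
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Unweighted Cohen's kappa for two sets of categorical judgments on the same studies."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal frequencies, summed over categories.
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    if expected == 1.0:
        return math.nan  # both raters used one identical category; kappa is undefined
    return (observed - expected) / (1 - expected)


# Hypothetical judgments for one RoB 2 domain across six trials.
human = ["Low", "Some concerns", "High", "Low", "Low", "High"]
ai = ["Low", "Low", "High", "Low", "Some concerns", "High"]
print(round(cohens_kappa(human, ai), 3))  # 0.455
```

Raw agreement in this example is 4 of 6 (67%), but kappa is only about 0.45, which illustrates why raw agreement alone can flatter an AI tool.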
Stage 6: Manuscript Preparation
The use of AI tools for manuscript writing, grammar checking, sentence-level editing, prose drafting, or structural assistance falls outside the core RAISE framework, which is focused on judgment-making in the systematic review process itself. However, most high-impact clinical journals, including BMJ, Lancet, JAMA, and Annals of Internal Medicine, have separate author policies on AI use in manuscript preparation that require disclosure. These are distinct from RAISE and should be checked in the author guidelines of the specific target journal.
What a RAISE-Compliant Methods Section Looks Like
A RAISE-compliant methods section addresses AI disclosure at each stage where AI was used. Below is an example of what that disclosure looks like for a review that used AI-assisted screening.
"Title and abstract screening was conducted using ASReview v2.1.5 with active learning and TF-IDF feature extraction. All records were imported, and an initial training set of 50 records (25 relevant, 25 irrelevant) was manually classified to initialize the model. The AI tool then prioritized remaining records by predicted relevance, and two independent human reviewers screened records in the order presented by the tool. A stopping rule was applied: screening concluded when 200 consecutive records had been classified as irrelevant by both the AI model and both human reviewers. Sensitivity of the AI classification was validated against a gold-standard human classification of a random sample of 200 records drawn from the full dataset, achieving 96.4% sensitivity (95% CI 92.1 to 98.7). Full-text screening was conducted by two independent human reviewers without AI assistance."
This disclosure is specific, stage-appropriate, and provides the information a peer reviewer needs to evaluate the reliability of the screening process.
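A consecutive-irrelevant stopping rule like the one in this example can be checked retrospectively: on a fully labelled dataset (for instance, a completed earlier review), replaying the screening order shows where the rule would have fired and how many relevant records it would have missed. This is a simplified sketch under stated assumptions: each record has a single consensus relevance label (the example's agreement between two reviewers is collapsed into one boolean), and the function names are hypothetical.

```python
from typing import Iterable, Optional


def stopping_point(labels: Iterable[bool], n_consecutive: int) -> Optional[int]:
    """Return how many records had been screened when `n_consecutive` irrelevant
    labels (False) occurred in a row, or None if the rule never fires."""
    if n_consecutive < 1:
        raise ValueError("n_consecutive must be at least 1")
    run = 0
    for screened, relevant in enumerate(labels, start=1):
        run = 0 if relevant else run + 1  # any relevant record resets the run
        if run >= n_consecutive:
            return screened
    return None


def missed_relevant(labels: list[bool], n_consecutive: int) -> int:
    """In a retrospective simulation with complete labels, count relevant records
    that would never have been screened because the rule fired first."""
    stop = stopping_point(labels, n_consecutive)
    return 0 if stop is None else sum(labels[stop:])


# Hypothetical screening order: True = relevant, False = irrelevant.
order = [True, False, True, False, False, False, False, False, True]
print(stopping_point(order, 3))   # 6
print(missed_relevant(order, 3))  # 1
```

With a threshold of 3, the rule fires at the sixth record and leaves one relevant record unscreened; a larger threshold, such as the 200 in the example, trades extra screening effort for a lower risk of such misses.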
Reporting Standards and the PRISMA 2020 Connection
RAISE operates alongside, not instead of, PRISMA 2020. The PRISMA 2020 checklist requires transparent reporting of the search strategy (item 7), the selection process (item 8), the data collection process (item 9), and the study risk-of-bias assessment (item 11). AI use at any of these stages creates additional reporting obligations under RAISE that supplement the PRISMA items rather than replace them.
Our complete PRISMA 2020 checklist guide covers all 27 items with the most common reporting failures and how to fix each before submission. AI disclosure under RAISE should be integrated into the relevant PRISMA items rather than added as a separate disclosure section.
What Happens When AI Use Is Undisclosed
Journal integrity processes are increasingly using AI detection tools at the pre-review stage. The pattern emerging from the 2025 Peer Review Congress data is that journals with targeted pre-submission screening, particularly for systematic reviews, are identifying undisclosed AI use at higher rates than previously expected. At one conference presentation, targeted AI screening of systematic review manuscripts raised the desk-rejection rate from 13% to 40% before any human review occurred.
Undisclosed AI use identified after publication creates a more serious problem. Correction and retraction processes for systematic reviews that used AI without disclosure are still developing, but the trend in journal editorial policy is toward treating undisclosed AI use with the same severity as other transparency failures. A systematic review retracted for undisclosed AI use affects not only the publication record but potentially the clinical guidelines and health policy decisions built on its conclusions.
As journals formalize their AI policies, undisclosed AI use is becoming part of why systematic reviews get rejected.
Which AI Tools Are Currently in Use
The AI tools most commonly used in systematic review production as of 2026, and their primary application stages, are the following.
ASReview (Utrecht University, open-source): active learning-based screening prioritization. Used for title-abstract screening. Validated across multiple review domains. Free. Requires Python installation.
Rayyan (Qatar Computing Research Institute): ML-assisted screening with blind mode for dual-reviewer blinding. Used for title-abstract and initial full-text screening. Free tier available.
Covidence (Cochrane-affiliated): subscription-based review management platform with integrated AI screening assistance. Used across screening stages. Standard tool for Cochrane reviews.
Nested Knowledge (subscription): AI-assisted screening, data extraction, and qualitative coding. Used across multiple stages, including tagging and hierarchy building.
Large language models (GPT-4, Claude, Gemini): used ad hoc for data extraction, risk-of-bias assessment, and manuscript preparation. The most variable in terms of RAISE compliance because use is self-directed and validation is rarely conducted. LLM use at judgment-making stages requires the same validation and disclosure as purpose-built tools.
Frequently Asked Questions
Does RAISE apply to all systematic reviews or only those submitted to Cochrane and JBI?
The position statement endorsing RAISE was published jointly by Cochrane, Campbell, JBI, and CEE, but adoption of the framework has extended beyond these four organizations. Many biomedical journals have adopted or are in the process of adopting it, either explicitly by name or through equivalent AI disclosure requirements in their author guidelines. Before submission, check the current author instructions for your target journal regarding AI disclosure. The safest approach is to apply RAISE standards to all systematic reviews regardless of the target journal.
Do I need to disclose AI tools used only for convenience, not for judgments?
The RAISE threshold is whether the tool "makes or suggests judgements." Tools used for administrative convenience without influencing any review decision do not require RAISE disclosure. Examples include reference managers (Zotero, Mendeley), deduplication software, and formatting tools. However, AI features within reference managers or screening platforms that suggest relevance rankings or assist with decision-making fall within scope.
What if I used ChatGPT or another LLM to help write my methods section?
Manuscript writing assistance with LLMs falls under individual journal policies rather than RAISE specifically. RAISE covers AI use in the evidence synthesis process. Most high-impact journals now require disclosure of AI use in manuscript preparation, typically in an author statement. Check the specific journal's current policy before submission.
Is RAISE compliance retroactive for reviews already in progress?
Reviews already in progress should apply RAISE standards from the current stage forward and document what AI tools were used before RAISE was published. For reviews that used AI tools before November 2025 without documentation, a retrospective account of what tools were used, at which stages, and with what human oversight is better than omitting the disclosure.
Can I use AI to replace one of my two independent reviewers?
No. RAISE does not change the dual-independent-reviewer requirement for screening and data extraction. AI tools assist or prioritize human review; they do not substitute for independent human judgment. A single-reviewer process supplemented by AI is still a single-reviewer process and should be reported and appraised as one.



