ROBINS-I V2 (2025): What Changed

ROBINS-I stands for Risk Of Bias In Non-randomized Studies of Interventions. The original tool was published in BMJ in 2016 by Sterne, Hernán, McAleenan, Reeves, and Higgins. It became the standard for assessing risk of bias in non-randomized intervention studies included in systematic reviews. Version 2 was first released in November 2024 and underwent a substantive revision in November 2025. Two of the original developers, Jonathan Sterne and Julian Higgins, were involved in the V2 development.

The changes in ROBINS-I V2 are not cosmetic. The tool introduces an algorithm-based approach to domain-level judgment, new signaling question response categories, and explicit handling of two time-related biases not addressed in the original. The missing data domain has also been reconceived. Any systematic review submitted to a Tier 1 journal in 2026 that includes non-randomized studies of interventions should apply ROBINS-I V2, not the 2016 original.

ScribeLab Writer's risk-of-bias service applies ROBINS-I V2 per outcome, by two independent reviewers, using the November 2025 revision (the same version this article covers), starting from $400.

Quick Answer:

ROBINS-I V2 (November 2025 revision) changes four things in the original 2016 tool: (1) signaling questions now use "strong yes/no" and "weak yes/no" response categories, not just yes/no/probably yes/probably no; (2) domain-level judgments (Low/Moderate/Serious/Critical) are assigned using an algorithm based on signaling question responses rather than reviewer holistic judgment; (3) immortal-time bias and prevalent-user bias are now explicitly addressed with dedicated signaling questions; and (4) the missing data domain has been reconceived with new signaling questions. The scope of ROBINS-I V2 is cohort and follow-up study designs only. Cross-sectional studies, case-control studies, and before-and-after designs are excluded from V2's current scope, with separate tools under development. The tool is available at riskofbias.info.

What ROBINS-I Is and Why the Version Matters

ROBINS-I assesses seven domains of potential bias in non-randomized intervention studies. Non-randomized studies of interventions include cohort studies, registry-based studies, electronic health record studies, and other observational designs that evaluate the effect of an intervention without randomization. These study designs are common in systematic reviews of public health interventions, policy interventions, implementation science, and clinical areas where RCTs are not feasible.

The distinction between ROBINS-I and RoB 2 is a study design distinction, not a preference. RoB 2 is used for randomized controlled trials. ROBINS-I is used for non-randomized studies. Using ROBINS-I for an RCT, or RoB 2 for an observational cohort study, is a methodological error that peer reviewers at methods-focused journals will flag.

Why the version matters. PROSPERO's updated registration form (February 2025) now requires selection from the LATITUDES dropdown for risk-of-bias tools. The LATITUDES taxonomy distinguishes between ROBINS-I (original 2016 version) and ROBINS-I V2. Systematic reviews registered in 2025 or 2026 and applying the original 2016 ROBINS-I will be applying a superseded tool. Peer reviewers familiar with the V2 update will notice.

The original ROBINS-I (Sterne et al., BMJ, 2016;355:i4919) used holistic reviewer judgment to convert signaling question responses into domain-level judgments. This approach introduced variability between reviewers conducting the same assessments. ROBINS-I V2 replaces holistic judgment with an algorithm that maps specific signaling question response patterns to domain-level outcomes, reducing variability and improving reproducibility.

What Changed in ROBINS-I V2: Domain by Domain

Table 1: ROBINS-I V1 vs V2: Domain-by-Domain Changes

Domain	ROBINS-I V1 (2016)	ROBINS-I V2 (November 2025)	Key Change
D1: Confounding	Addressed confounding by measured variables. No specific questions for time-related bias.	Added signaling questions for immortal-time bias and prevalent-user bias. Algorithm-based domain judgment.	Immortal-time bias and prevalent-user bias now explicitly addressed with dedicated signaling questions.
D2: Selection of participants	Assessed whether participant selection depended on intervention or outcome status. Holistic judgment.	Strengthened signaling questions. Algorithm-based judgment. Strong/weak yes/no response options.	Algorithm replaces holistic judgment. New strong/weak response categories.
D3: Classification of interventions	Assessed whether intervention status was correctly classified. Standard yes/probably yes/probably no/no responses.	Updated signaling question language. Strong/weak yes/no responses. Algorithm judgment.	Language update and algorithm integration. New strong/weak response categories.
D4: Deviations from intended interventions	Included a co-interventions question for per-protocol effects. Assessed both ITT and per-protocol estimands.	More explicit ITT vs per-protocol distinction. Co-interventions question removed from this domain. Now addressed by confounding domain.	Co-interventions question dropped. ITT/per-protocol distinction made more explicit.
D5: Missing data	Assessed whether missing data could bias the result. Signaling questions considered less specific by developers.	Domain reconceived with new signaling questions. Better distinction between data missing at random vs missing by outcome status.	Reconceived domain with new signaling questions. Higher specificity than V1.
D6: Measurement of outcomes	Assessed whether outcome measurement could depend on intervention knowledge. Holistic judgment.	Updated signaling question language. Strong/weak yes/no responses. Algorithm-based domain judgment.	Algorithm replaces holistic judgment. New response categories.
D7: Selection of reported result	Assessed reporting bias. Holistic domain-level judgment after signaling questions.	Updated signaling questions. Algorithm-based judgment. Strong/weak response categories.	Algorithm replaces holistic judgment. New response categories.

Source: Sterne JAC, Higgins JPT. ROBINS-I V2. November 2025 revision. riskofbias.info

Domain 1: Confounding. The confounding domain in V2 retains its primary focus on whether all important confounders were measured and controlled. V2 adds explicit signaling questions for two time-related biases that were not addressed in V1:

Immortal-time bias occurs when follow-up time that accrues before exposure begins is assigned to the exposed group, inflating the apparent protective effect of the intervention. In pharmacy database studies, for example, patients must survive long enough to receive a second prescription. If this initial period of follow-up is classified as exposed time, the exposed group accumulates person-time during which, by design, no event could have occurred, so its event rate looks lower than it truly is. V2 adds signaling questions asking whether the study design or analysis could have introduced immortal-time bias and whether this was appropriately handled.

Prevalent-user bias occurs when participants who have already been using the intervention at study entry are included in the study. Studies of drug effects that include prevalent users (rather than only new users) may underestimate adverse effects because people who experienced early adverse effects have already stopped using the drug. V2 adds signaling questions asking whether the study is restricted to new users of the intervention.
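
The immortal-time mechanism described above can be made concrete with a small person-time calculation. The sketch below uses invented numbers (not taken from any study) and a deliberately simplified cohort in which every treated patient waits one year before the prescription and is then followed for four years. It only illustrates the direction of the bias; it is not part of the ROBINS-I V2 tool.

```python
# Hypothetical cohort with invented numbers, for illustration only.
treated_patients = 100
immortal_years_each = 1.0   # wait before exposure starts; no events possible by design
exposed_years_each = 4.0    # follow-up after exposure actually starts
events_in_treated = 20      # all events occur during truly exposed time


def rate_per_100_person_years(events, person_years):
    """Crude event rate per 100 person-years."""
    return 100.0 * events / person_years


# Misclassified analysis: the immortal period is counted as exposed follow-up.
biased_person_years = treated_patients * (immortal_years_each + exposed_years_each)
biased_rate = rate_per_100_person_years(events_in_treated, biased_person_years)

# Correct analysis: only time after exposure starts counts as exposed.
correct_person_years = treated_patients * exposed_years_each
correct_rate = rate_per_100_person_years(events_in_treated, correct_person_years)

print(f"Rate with immortal time counted as exposed: {biased_rate:.1f}")
print(f"Rate with exposed time only: {correct_rate:.1f}")
```

With these numbers the misclassified analysis reports a lower event rate in the treated group, which is the artificial protective effect the V2 signaling questions are meant to detect.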

Domain 2: Selection of participants. V2 strengthens the signaling questions around whether selection into the study depends on the intervention or outcome status.

Domain 3: Classification of interventions. V2 retains the core question of whether the intervention was correctly classified, with updated signaling question language.

Domain 4: Deviations from intended interventions. V2 more explicitly distinguishes between the intention-to-treat effect (what happens when the intervention is offered) and the per-protocol effect (what happens when the intervention is taken as specified). The signaling questions vary depending on which effect is being estimated. V2 dropped a question about co-interventions that was present in some versions of the V1 tool for per-protocol effect assessment, on the basis that co-interventions are addressed by the confounding domain.

Domain 5: Missing data. The missing data domain has been reconceived in V2 with new signaling questions that better distinguish between data missing at random (likely not introducing bias) and data missing in a way that depends on outcome status (potentially introducing bias). The V1 missing data signaling questions were considered by the developers to be insufficiently specific.

Domain 6: Measurement of outcomes. V2 retains the core question of whether outcomes were measured in a way that could depend on knowledge of intervention status, with updated signaling question language.

Domain 7: Selection of the reported result. V2 retains the reporting bias domain, assessing whether the reported effect estimate is selected from multiple analyses.

The Algorithm-Based Judgment Approach

In the original ROBINS-I, reviewers answered signaling questions within each domain and then used holistic judgment to assign the domain-level rating (Low/Moderate/Serious/Critical). This holistic step introduced inter-reviewer variability, because two reviewers who answered the signaling questions identically could reach different domain-level judgments.

ROBINS-I V2 replaces holistic judgment with an algorithm. The algorithm maps specific patterns of signaling question responses to specific domain-level judgments. If the signaling question responses meet the pattern for Moderate risk, the algorithm assigns Moderate. There is no override by reviewer judgment.

The algorithm approach serves two purposes. First, it increases reproducibility: two reviewers who answer the signaling questions in the same way will reach the same domain-level judgment. Second, it makes the basis for the domain-level judgment transparent and auditable.
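
The reproducibility argument can be sketched in a few lines. The example below is not the ROBINS-I V2 algorithm: the real algorithms are domain-specific and are published with the tool. It uses a made-up rule set for an invented domain with two signaling questions, and the response codes (SY, WY, WN, SN, NI) and the function name are assumptions chosen for illustration. The point is only that a deterministic mapping gives identical judgments for identical answers.

```python
VALID_CODES = {"SY", "WY", "WN", "SN", "NI"}  # strong/weak yes/no, no information


def hypothetical_domain_judgment(q1: str, q2: str) -> str:
    """Map two responses to a domain rating using made-up rules.

    q1: could the bias have arisen?  q2: was it appropriately handled?
    These rules are illustrative and are NOT the published V2 algorithm.
    """
    if q1 not in VALID_CODES or q2 not in VALID_CODES:
        raise ValueError("unknown response code")
    if q1 == "SN":
        return "Low"
    if q1 == "WN":
        return "Moderate"
    # q1 is SY, WY, or NI: the rating depends on how well the bias was handled.
    if q2 == "SY":
        return "Moderate"
    if q2 == "WY":
        return "Serious"
    return "Critical" if q1 == "SY" else "Serious"


# Two reviewers who give the same answers necessarily get the same judgment.
reviewer_1 = ("WY", "WY")
reviewer_2 = ("WY", "WY")
print(hypothetical_domain_judgment(*reviewer_1))
print(hypothetical_domain_judgment(*reviewer_2))
```

Because the mapping is a pure function of the responses, the rule path that produced a rating can also be logged alongside it, which is what makes the judgment auditable.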

The New Signaling Question Response Categories

The original ROBINS-I used four response categories for signaling questions: Yes, Probably Yes, Probably No, and No, with an additional No Information option. ROBINS-I V2 introduces a new distinction within the Yes and No categories: "strong yes/no" and "weak yes/no."

A strong yes indicates that the reviewer is confident the condition described in the signaling question is met. A weak yes indicates that the evidence suggests the condition is probably met, but with some uncertainty. The same logic applies to strong no and weak no.

This distinction matters for the algorithm. The algorithm produces different domain-level judgments depending on whether the signaling question was answered with a strong or weak response. A weak yes on a critical signaling question may produce a higher risk-of-bias rating than a strong yes.
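
One way a team might record the strong/weak distinction in its own extraction sheet is shown below. This is a minimal sketch: the enum, its member names, and the rating rule are assumptions for illustration, and the rule is invented for a single question of the kind "Were all important confounders controlled?" rather than taken from the V2 tool.

```python
from enum import Enum


class Response(Enum):
    """Signaling question responses as (direction, strength) pairs."""
    STRONG_YES = ("yes", "strong")
    WEAK_YES = ("yes", "weak")
    WEAK_NO = ("no", "weak")
    STRONG_NO = ("no", "strong")
    NO_INFORMATION = (None, None)

    @property
    def direction(self):
        return self.value[0]

    @property
    def strength(self):
        return self.value[1]


def hypothetical_rating(response: Response) -> str:
    """Made-up rule: same direction, different strength, different rating."""
    if response is Response.STRONG_YES:
        return "Low"
    if response is Response.WEAK_YES:
        return "Moderate"
    return "Serious"


for r in (Response.STRONG_YES, Response.WEAK_YES):
    print(r.name, r.direction, r.strength, hypothetical_rating(r))
```

Both responses point in the same direction, yet under this invented rule the weak yes leads to a higher risk-of-bias rating than the strong yes.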

The practical implication for research teams is that ROBINS-I V2 assessments take longer per study than V1 assessments. The additional resolution in the signaling question responses and the need to document the basis for strong vs weak responses add time to the assessment process. Research teams should account for this in their project timeline.

Scope: Which Studies ROBINS-I V2 Covers

ROBINS-I V2 is designed for cohort and follow-up study designs. This includes prospective cohort studies, retrospective cohort studies, registry-based cohort studies, and electronic health record studies that follow a defined cohort over time and compare outcomes by intervention status.

ROBINS-I V2 is not designed for:

Case-control studies, which identify participants based on outcome status and work backward to compare exposure. A separate tool is under development for case-control study risk-of-bias assessment.

Cross-sectional studies, which measure intervention and outcome status at the same point in time. These have a different causal inference problem from cohort designs.

Before-and-after studies and interrupted time series designs, which fall outside V2's current scope. Separate tools for these designs are under development.

For a systematic review that includes a mix of RCTs and non-randomized cohort studies, apply RoB 2 to the RCTs and ROBINS-I V2 to the non-randomized cohort studies. List both tools in the PROSPERO registration form under the LATITUDES dropdown. Report the risk-of-bias results for each tool separately in the manuscript.
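
The tool-selection logic in this section can be written down as a simple lookup, which some teams find useful when screening a mixed set of included studies. The sketch below is an assumption about how one might encode it: the design labels, the dictionary, and the function name are hypothetical, and the mapping covers only the designs and tools named in this article.

```python
# Designs this article assigns to a specific tool.
TOOL_BY_DESIGN = {
    "rct": "RoB 2",
    "prospective cohort": "ROBINS-I V2",
    "retrospective cohort": "ROBINS-I V2",
    "registry-based cohort": "ROBINS-I V2",
    "diagnostic accuracy": "QUADAS-2",
}

# Designs the article lists as outside ROBINS-I V2's current scope.
OUT_OF_SCOPE = {
    "case-control",
    "cross-sectional",
    "before-and-after",
    "interrupted time series",
}


def select_tool(design: str):
    """Return the tool name, or None if the design is out of scope for V2."""
    key = design.strip().lower()
    if key in TOOL_BY_DESIGN:
        return TOOL_BY_DESIGN[key]
    if key in OUT_OF_SCOPE:
        return None
    raise ValueError(f"design not covered by this sketch: {design!r}")


for d in ("RCT", "Retrospective cohort", "Case-control"):
    print(d, "->", select_tool(d))
```

Returning None for out-of-scope designs, rather than silently defaulting to ROBINS-I V2, forces the team to decide explicitly how those studies will be handled.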

Need ROBINS-I V2 assessments conducted by an experienced reviewer for your systematic review?
ScribeLab Writer's risk-of-bias and statistical analysis service applies the correct tool for each study design: RoB 2 per outcome for RCTs, ROBINS-I V2 for non-randomized cohort studies, and QUADAS-2 for diagnostic accuracy studies. Assessments are conducted by two independent reviewers, with Cohen's kappa reported. Robvis traffic-light plots are included as standard. Submit your project details and a PhD methodologist will respond within 2-4 hours.


ROBINS-I V2 vs RoB 2 vs NOS: Which Tool to Use

Table 2: Risk-of-Bias Tool Selection Guide: ROBINS-I V2 vs RoB 2 vs NOS

Criterion	ROBINS-I V2	RoB 2	NOS (Newcastle-Ottawa Scale)
Study design	Non-randomized studies of interventions: cohort and follow-up designs.	Randomized controlled trials only.	Observational studies (cohort and case-control). Not specifically designed for intervention studies.
Cochrane recommended	Yes. Recommended for non-randomized studies of interventions in Cochrane Handbook v6.5.	Yes. Recommended for RCTs in Cochrane Handbook v6.5.	No. NOS is not recommended in Cochrane Handbook v6.5 for intervention studies.
Domain-level judgment approach	Algorithm-based (V2). Signaling question responses are mapped to judgments by an algorithm, not holistic reviewer judgment.	Algorithm-based. Signaling question responses mapped to domain and overall judgment.	Score-based. Produces a numeric score, not a risk-of-bias judgment. Low inter-rater reliability documented in literature.
Per-outcome assessment	Yes. ROBINS-I V2 should be applied per outcome where feasible, as risk of bias may differ between outcomes.	Yes. RoB 2 is applied per outcome. Required for Cochrane reviews.	No. NOS is applied per study, not per outcome.
Immortal-time and prevalent-user bias	Yes. V2 addresses both with dedicated signaling questions in Domain 1.	Not applicable. These biases are specific to non-randomized designs.	No dedicated assessment of immortal-time or prevalent-user bias.
PROSPERO LATITUDES recognition	Yes. ROBINS-I V2 is listed in the LATITUDES dropdown on the updated PROSPERO platform (February 2025).	Yes. RoB 2 is listed in the LATITUDES dropdown.	Yes. NOS is listed in LATITUDES but is not recommended for Cochrane-standard intervention reviews.
When to use	For any non-randomized cohort study of an intervention. Use V2, not the 2016 original, for any review submitted in 2026.	For all RCTs. Including cluster RCTs with the appropriate extension.	Avoid for Tier 1 journal submissions of intervention reviews. Acceptable in older reviews or where specifically required by a journal.

Never use NOS for intervention studies in a systematic review targeting a Tier 1 journal. The Newcastle-Ottawa Scale (NOS) is a legacy tool with low inter-rater reliability and no algorithm-based approach to domain-level judgments. It was not designed specifically for intervention studies and does not address confounding, deviations from intended interventions, or selection of reported results with the specificity required for Cochrane Handbook-compliant reviews. NOS remains common in practice but is widely regarded as methodologically weaker than either RoB 2 or ROBINS-I V2. Cochrane reviews do not use NOS.

Applying ROBINS-I V2: A Practical Workflow

Before the review begins. Register the use of ROBINS-I V2 in the PROSPERO protocol and select it from the LATITUDES dropdown in the registration form. Specify in the protocol which study designs will be assessed using ROBINS-I V2. Pre-specify any planned subgroup analyses stratified by ROBINS-I V2 risk-of-bias category.

Pilot testing. Before applying ROBINS-I V2 to all included studies, pilot the tool on three to five studies with both reviewers conducting assessments independently. Compare the pilot results and discuss any disagreements. This identifies ambiguous signaling questions or differences in interpretation before they affect the full assessment.

Independent assessment by two reviewers. Each included non-randomized cohort study is assessed independently by two reviewers. Both reviewers complete the full set of signaling questions for all seven domains. They record their strong/weak yes/no responses and the algorithm generates the domain-level judgment.

Disagreement resolution. After independent assessment, reviewers compare their domain-level judgments. Disagreements at the domain level are resolved by discussion. If discussion does not resolve the disagreement, a third reviewer adjudicates.
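
The comparison step can be sketched as follows: list the domains where the two reviewers differ, and summarize agreement with Cohen's kappa. The ratings below are invented, and the function implements the standard unweighted kappa. Using the unweighted form is an assumption here, since a weighted kappa may suit ordered ratings better.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters over the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    # Agreement expected by chance, from each rater's marginal frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1.0 - expected)


# Invented domain-level judgments for four domains of one study.
reviewer_a = ["Low", "Low", "Moderate", "Serious"]
reviewer_b = ["Low", "Moderate", "Moderate", "Serious"]

# Zero-based positions where the reviewers disagree and need to discuss.
disagreements = [i for i, (a, b) in enumerate(zip(reviewer_a, reviewer_b)) if a != b]
kappa = cohens_kappa(reviewer_a, reviewer_b)
print("Positions to discuss:", disagreements)
print(f"Cohen's kappa: {kappa:.2f}")
```

In practice kappa would be computed across all studies and domains, not the four ratings used here.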

Overall risk-of-bias judgment. The overall ROBINS-I V2 judgment for a study is determined by the highest domain-level rating. A study with six Low domains and one Serious domain receives an overall Serious rating. This rule is stricter than some reviewers expect, and teams should account for it when planning sensitivity analyses stratified by risk of bias.
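
The highest-rating rule stated above amounts to taking a maximum over an ordered scale. The sketch below encodes that rule exactly as this article describes it. The function and variable names are hypothetical, and the handling of any additional categories in the published tool is not covered.

```python
# Ratings in increasing order of severity.
RATING_ORDER = ["Low", "Moderate", "Serious", "Critical"]


def overall_judgment(domain_ratings: dict) -> str:
    """Return the most severe rating among the domain-level ratings."""
    if not domain_ratings:
        raise ValueError("no domain ratings supplied")
    return max(domain_ratings.values(), key=RATING_ORDER.index)


# The article's example: six Low domains and one Serious domain.
study = {f"D{i}": "Low" for i in range(1, 8)}
study["D5"] = "Serious"
print(overall_judgment(study))
```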

Visualization. Results are displayed in traffic-light plots using the robvis R package (McGuinness and Higgins, Research Synthesis Methods, 2021). A domain-by-domain heatmap and a study-level summary plot should be included in the manuscript or supplementary materials.

Common Errors When Applying ROBINS-I V2

Using the 2016 ROBINS-I version. The most common error in 2026 is applying the original 2016 ROBINS-I rather than V2. The two tools differ in signaling question structure, response categories, and the algorithm-vs-holistic approach to domain-level judgment. A reviewer who applies the original tool and labels it simply as ROBINS-I in the manuscript is using a superseded tool and leaves readers unable to tell which version was applied.

Applying ROBINS-I V2 to case-control studies. ROBINS-I V2 is for cohort studies. Applying it to a case-control study misapplies the tool and produces domain-level judgments that may not reflect the actual risk-of-bias structure of case-control designs.

Applying a holistic judgment after completing the signaling questions. ROBINS-I V2 uses an algorithm to convert signaling question responses to domain-level judgments. A reviewer who answers the signaling questions and then overrides the algorithm with their own judgment is not applying V2 correctly. The algorithm output is the domain-level judgment.

Not distinguishing strong from weak responses. A reviewer who answers all signaling questions as strong yes or strong no without considering the uncertainty in their evidence will produce less accurate domain-level judgments than a reviewer who reflects on the strength of evidence for each response.

Using ROBINS-I V2 per study rather than per outcome. ROBINS-I V2, like its predecessor and like RoB 2, should be applied per outcome where feasible. An included study may have Low risk of bias for the mortality outcome and Serious risk of bias for the quality-of-life outcome if quality of life was assessed by unblinded assessors who knew the intervention status.
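
Per-outcome assessment is easiest to enforce when the record structure itself is keyed by study and outcome. The sketch below shows one possible layout. The class, field names, and study label are hypothetical, and the two ratings mirror the mortality and quality-of-life example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assessment:
    study: str
    outcome: str
    overall: str  # overall risk-of-bias judgment for this study-outcome pair


assessments = [
    Assessment("Study A", "mortality", "Low"),
    Assessment("Study A", "quality of life", "Serious"),
]


def ratings_for_outcome(records, outcome):
    """Collect the overall judgment of each study for a single outcome."""
    return {r.study: r.overall for r in records if r.outcome == outcome}


print(ratings_for_outcome(assessments, "mortality"))
print(ratings_for_outcome(assessments, "quality of life"))
```

Keeping one record per study-outcome pair means a sensitivity analysis for a given outcome draws on the rating that applies to that outcome, not a single study-level label.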

Frequently Asked Questions

Is ROBINS-I V2 the same as ROBINS-E?

No. ROBINS-E is a related but separate tool for assessing risk of bias in non-randomized studies of exposures (rather than interventions). It is used in systematic reviews that examine the effect of harmful exposures (such as air pollution or dietary factors) rather than deliberate interventions. ROBINS-I V2 is for interventions. ROBINS-E is for exposures. They share a similar structure but address different causal questions.

Can I use ROBINS-I V2 for both RCTs and observational studies in the same review?

No. ROBINS-I V2 is for non-randomized studies of interventions only. RCTs must be assessed using RoB 2. In a review that includes both RCTs and non-randomized cohort studies, apply RoB 2 to the RCTs and ROBINS-I V2 to the non-randomized studies. Report the results from each tool separately in the methods and results sections.

Where can I access ROBINS-I V2?

ROBINS-I V2 is available at riskofbias.info/welcome/robins-i-v2. The site provides the current version of the tool, the signaling questions, the domain structure, and accompanying guidance. CoRATES (corates.org/resources/robins-i) provides additional implementation resources. Julian Higgins delivered a webinar on ROBINS-I V2 methodology in October 2025; the recording may be available through Cochrane's training resources.

How long does the ROBINS-I V2 assessment take per study?

The original ROBINS-I typically required 20 to 40 minutes per study for an experienced reviewer. ROBINS-I V2 adds time because of the strong/weak distinction in signaling question responses and the documentation requirement for the basis of each response. Expect 30 to 60 minutes per study for reviewers in their first 10 to 20 assessments. Speed increases with experience, but initial assessments in V2 should be budgeted generously.
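
For timeline planning, the 30 to 60 minute range translates into reviewer-hours with simple arithmetic. The sketch below assumes two independent reviewers and one assessment per study. Per-outcome assessments, piloting, and disagreement resolution would add to the total. The function name and defaults are illustrative.

```python
def assessment_hours(n_studies, minutes_low=30, minutes_high=60, n_reviewers=2):
    """Return (low, high) total reviewer-hours for independent assessment."""
    low = n_studies * n_reviewers * minutes_low / 60
    high = n_studies * n_reviewers * minutes_high / 60
    return low, high


low, high = assessment_hours(20)
print(f"20 studies, 2 reviewers: {low:.0f} to {high:.0f} reviewer-hours")
```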

Does ROBINS-I V2 replace ACROBAT-NRSI?

Yes. ACROBAT-NRSI was a predecessor tool that was superseded by the original ROBINS-I in 2016. ROBINS-I V2 supersedes both the original ROBINS-I and ACROBAT-NRSI. Reviews that used ACROBAT-NRSI or the original ROBINS-I should state in the manuscript which tool and version were applied. They should also acknowledge that V2 represents a methodological update since the review was conducted.

How does ROBINS-I V2 integrate with GRADE?

ROBINS-I V2 domain-level judgments feed directly into the GRADE risk-of-bias criterion. Studies rated Serious or Critical across most domains will trigger a GRADE downgrade for risk of bias. The floor for GRADE is Very Low, so evidence already rated Very Low cannot be downgraded further. Studies with predominantly Low or Moderate domain ratings may not trigger a GRADE downgrade for risk of bias. Under the conventional GRADE approach, the starting certainty for observational evidence is Low, not Very Low, regardless of ROBINS-I domain ratings; GRADE guidance on using ROBINS-I also allows the assessment to start at High certainty, with the ROBINS-I judgments then driving the risk-of-bias downgrade.

Applying the Current Standard Before Your Review Is Submitted

A systematic review that includes non-randomized cohort studies and applies the original 2016 ROBINS-I to risk-of-bias assessment in 2026 is applying a superseded tool. Peer reviewers at methods-focused journals, systematic review methodology journals, and Cochrane editorial teams are aware of ROBINS-I V2 and will notice if the 2016 version is used without explanation.

ScribeLab Writer's risk-of-bias and statistical analysis service applies ROBINS-I V2 to non-randomized cohort studies, RoB 2 per outcome to RCTs, and QUADAS-2 to diagnostic accuracy studies. Assessments are conducted by two independent reviewers, with robvis traffic-light plots included as standard. Submit your project details, and a PhD methodologist will respond within 2-4 hours.
