Causal Inference · Pharmacoepidemiology · Study Design

Prevalent-User Bias: When Your Drug Study Starts After the Interesting Harm Already Happened

May 18, 2026 · 16 min read

Anas H. Alzahrani, MD PhD MPH

Department of Preventive Medicine and Public Health

Faculty of Medicine, King Abdulaziz University

Many drug studies begin with a treated cohort that looks reassuringly stable: patients already on therapy, still refilling it, and apparently doing fine. That calm surface is often exactly the problem. If the study starts after treatment initiation, some of the clinically important story may already be over.

Prevalent-user bias appears when a study includes people who are already using treatment at cohort entry instead of following them from initiation. The treated group is then no longer a cohort observed from the start of treatment. It is a selected set of survivors, tolerators, and continuers. Early harms, early discontinuers, and some high-risk patients have already been filtered out before the comparison even begins.

The Core Mistake

The problem is not merely that treatment started earlier. The problem is that treatment history before study entry has already changed who remains eligible to be observed as treated. If early adverse events, intolerance, contraindication discovery, or lack of response happen soon after initiation, a prevalent-user cohort quietly excludes some of the very patients needed to estimate initiation risk honestly.

Decision rule:

If the clinical question concerns what happens when treatment is started, do not begin by observing only people who already proved they could stay on it.

That may sound impolite to the design. Good. Some designs need the discourtesy.

Why Prevalent Users Are a Selected Population

1. Depletion of susceptibles

Patients most vulnerable to early harm or intolerance can experience the event and disappear from the treated cohort before study entry.

2. Survivor and adherer selection

Remaining treated patients are the ones who survived, tolerated treatment, stayed engaged with care, and often looked clinically suitable enough to continue.

3. Post-treatment baseline covariates

Covariates measured at late entry may already reflect treatment effects, adherence, or early clinical evolution rather than true pretreatment state.

A Two-Cohort Thought Experiment

Imagine a drug with an early bleeding risk concentrated in the first weeks after initiation. If you compare patients from the day they start treatment, you observe those early events. If you instead build the treated cohort from patients who are already using the drug three months later, some early bleeds, discontinuations, and treatment abandonments are already missing from the file.

Cohort definition	Who is included?	What gets missed?
New-user cohort	Everyone at treatment initiation	Very little of the early treatment story, if follow-up starts at time zero
Prevalent-user cohort	Only patients still on treatment later	Early harms, early discontinuation, and some of the most vulnerable patients

By the time the prevalent-user cohort enters analysis, treatment may look safer partly because the design waited until the risk set had been edited by reality.
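To put rough numbers on the thought experiment, here is a minimal sketch in Python. It assumes a piecewise-constant daily bleeding hazard that is ten times higher in the first 30 days than afterwards. The hazard values, the 30-day window, and the function name `cumulative_risk` are illustrative assumptions, not estimates for any real drug, and the sketch ignores discontinuation entirely.

```python
import math

# Hypothetical inputs, chosen only for illustration.
EARLY_HAZARD = 0.002      # daily event hazard during the early window
LATE_HAZARD = 0.0002      # daily event hazard after the early window
EARLY_WINDOW_DAYS = 30    # length of the front-loaded risk period


def cumulative_risk(start_day, end_day):
    """Risk of a first event between start_day and end_day among patients
    who are still event-free at start_day (days counted from initiation)."""
    early_days = max(0, min(end_day, EARLY_WINDOW_DAYS) - start_day)
    late_days = (end_day - start_day) - early_days
    return 1 - math.exp(-(EARLY_HAZARD * early_days + LATE_HAZARD * late_days))


# New-user cohort: follow-up starts at initiation (day 0) and runs 90 days.
new_user_risk = cumulative_risk(0, 90)

# Prevalent-user cohort: follow-up starts at day 90 and also runs 90 days.
prevalent_user_risk = cumulative_risk(90, 180)

print(f"90-day risk from initiation: {new_user_risk:.1%}")
print(f"90-day risk from day 90:     {prevalent_user_risk:.1%}")
```

Under these assumed hazards, the same drug shows roughly a 6.9% 90-day risk when followed from initiation and roughly 1.8% when followed from day 90. Nothing about the drug changed; only the start of the clock did.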

The prevalence trap: a toy model

How many early harms disappear when the cohort starts late?

This toy model starts with 1,000 treatment initiators. With the settings listed below, it shows how a prevalent-user cohort quietly drops early events, enriches for survivors, and makes the treatment period look calmer than the one patients actually experienced.

Bias signal: 93 early events erased. Apparent risk attenuation: 68.8%.

Share of patients vulnerable to an early treatment-related event: 30.0%

Early event risk among vulnerable patients: 24.0%

Early event risk among other patients: 3.0%

Risk after the early vulnerable period, as a share of early risk: 35%

Scenario	Value	What it measures
At treatment initiation	9.3%	Risk during the period where early harms are actually allowed to occur.
If you start late	2.9%	Observed risk after the vulnerable window has already filtered the cohort.
Who is left?	25.1%	Share of survivors who still belong to the early-vulnerable group after depletion.

Quantity	Value	Why it matters
Early events among vulnerable patients	72	These are the classic harms or intolerance events that vanish when users must survive long enough to be counted.
Other early events	21	Even lower-risk patients contribute events that a late-entry cohort simply never sees.
Patients remaining for the prevalent-user cohort	907	The treated group is now a selected set of survivors and tolerators, not a baseline treatment cohort.
Apparent risk drop created by late entry	68.8%	The treatment can look safer partly because the study began after the interesting damage already happened.
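The figures in this table can be reproduced with a few lines of arithmetic. The sketch below is a reconstruction, not the page's original code. It assumes that risk after the early window equals 35% of each group's early risk, which is the reading that matches the 2.9% and 68.8% values shown. The function name `prevalence_trap` and its parameter names are hypothetical.

```python
def prevalence_trap(n=1000, share_vulnerable=0.30, risk_vulnerable=0.24,
                    risk_other=0.03, later_risk_ratio=0.35):
    """Toy model: how late cohort entry hides early events.

    later_risk_ratio is an assumption: later risk = ratio * early risk.
    """
    vulnerable = n * share_vulnerable
    other = n - vulnerable

    # Early events occur before a prevalent-user cohort would enter.
    events_vulnerable = vulnerable * risk_vulnerable
    events_other = other * risk_other
    early_events = events_vulnerable + events_other

    # Whoever has not had an early event remains for late entry.
    surviving_vulnerable = vulnerable - events_vulnerable
    surviving_other = other - events_other
    survivors = surviving_vulnerable + surviving_other

    risk_at_initiation = early_events / n
    late_events = later_risk_ratio * (
        surviving_vulnerable * risk_vulnerable + surviving_other * risk_other
    )
    risk_if_late = late_events / survivors

    return {
        "early_events_vulnerable": events_vulnerable,
        "early_events_other": events_other,
        "survivors": survivors,
        "risk_at_initiation": risk_at_initiation,
        "risk_if_late": risk_if_late,
        "vulnerable_share_of_survivors": surviving_vulnerable / survivors,
        "apparent_attenuation": 1 - risk_if_late / risk_at_initiation,
    }


result = prevalence_trap()
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With the default settings this gives 72 and 21 early events, 907 survivors, risks of about 9.3% and 2.9%, a vulnerable share of about 25.1% among survivors, and an apparent attenuation of about 68.8%.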

How to read the toy model

This is deliberately simple. Real studies have competing risks, treatment switching, dose changes, and time-varying confounding. The point is narrower: if you only count people who are still on treatment later, you have already selected away some of the patients and events that define initiation risk.

Decision rule: if clinically important harms or discontinuation cluster soon after treatment starts, a prevalent-user design is answering a quieter, later question than the one readers usually think they are seeing.

• Late entry removes early events from observation and changes who remains under treatment.
• Baseline covariates measured after treatment initiation may already be partly affected by treatment.
• Comparing prevalent users to new initiators is usually a timeline mismatch wearing a model.

Where the Bias Shows Up in Practice

Drug safety studies

If nausea, bleeding, dizziness, rash, or arrhythmia risk is front-loaded after initiation, late-entry treated cohorts can understate harms.

Comparative effectiveness studies

Comparing prevalent users of one drug to initiators of another often creates a timeline mismatch before confounding control even starts.

Chronic-disease maintenance cohorts

Even when long-term maintenance is the interest, you still need to say plainly that the estimand is a later-treatment effect among survivors, not the effect of starting treatment.

Why Adjustment Rarely Fixes It

Researchers sometimes hope a rich propensity score or outcome model can neutralize the problem. Usually it cannot, because the issue is not just covariate imbalance. The issue is that the treatment cohort was conditioned on future survival, future tolerance, and future continuation before the analytic clock began.
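A small worked example, reusing the toy model's numbers, shows how little adjustment can recover. Suppose vulnerability were measured perfectly and the late-entry risks were standardized back to the case mix at initiation. The sketch below assumes later risk is 35% of early risk within each group, the reading consistent with the toy model's 2.9% figure. The variable names are hypothetical.

```python
# Toy-model inputs from the text; LATER_RISK_RATIO is an inferred assumption.
STRATA = {
    "vulnerable": {"share_at_initiation": 0.30, "early_risk": 0.24},
    "other": {"share_at_initiation": 0.70, "early_risk": 0.03},
}
LATER_RISK_RATIO = 0.35

# Target quantity: risk over the early window, counted from initiation.
risk_at_initiation = sum(
    s["share_at_initiation"] * s["early_risk"] for s in STRATA.values()
)

# Crude late-entry risk: survivors only, weighted by the depleted case mix.
survivor_weight = {
    name: s["share_at_initiation"] * (1 - s["early_risk"])
    for name, s in STRATA.items()
}
crude_late_risk = sum(
    survivor_weight[name] * LATER_RISK_RATIO * s["early_risk"]
    for name, s in STRATA.items()
) / sum(survivor_weight.values())

# "Perfect" adjustment: stratum-specific late risks, re-weighted to the
# case mix at initiation. This repairs depletion but not the missing window.
standardized_late_risk = sum(
    s["share_at_initiation"] * LATER_RISK_RATIO * s["early_risk"]
    for s in STRATA.values()
)

print(f"Risk at initiation:           {risk_at_initiation:.1%}")
print(f"Crude late-entry risk:        {crude_late_risk:.1%}")
print(f"Standardized late-entry risk: {standardized_late_risk:.1%}")
```

Under these assumptions, standardization moves the late-entry estimate from about 2.9% to about 3.3%, still far from the 9.3% at initiation. Re-weighting can restore the case mix, but it cannot restore events that occurred before follow-up began.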

Problem	Why standard adjustment struggles	Better move
Early events happened before entry	You cannot model events that the design never allowed into follow-up	Use a new-user design with time zero at initiation
Continuation selects survivors and tolerators	Selection depends on post-baseline history that may be poorly measured or already treatment-affected	Define the estimand around initiation, or state a maintenance estimand explicitly
Baseline covariates measured late	“Baseline” now contains post-treatment information masquerading as pre-exposure state	Measure covariates before initiation whenever the question is about starting treatment
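The "better moves" in this table can be sketched against dispensing records. The example below is a minimal illustration, assuming each patient has an enrollment start date, a list of dispensing dates, and dated covariate measurements. The `Patient` class, the function names, and the 365-day washout are all hypothetical choices, not a standard taken from the article.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Dict, List, Optional


@dataclass
class Patient:
    patient_id: str
    enrolled_from: date                  # start of observable history
    dispensings: List[date]              # dates the study drug was dispensed
    covariate_dates: Dict[str, date] = field(default_factory=dict)


def new_user_time_zero(patient: Patient, washout_days: int = 365) -> Optional[date]:
    """Return the first dispensing date if it follows at least washout_days
    of observable, drug-free history; otherwise return None."""
    if not patient.dispensings:
        return None
    first = min(patient.dispensings)
    if first - patient.enrolled_from < timedelta(days=washout_days):
        # Too little lookback to rule out earlier use: possible prevalent user.
        return None
    return first


def pre_initiation_covariates(patient: Patient, time_zero: date) -> List[str]:
    """Keep only covariates measured strictly before treatment initiation."""
    return sorted(
        name for name, measured in patient.covariate_dates.items()
        if measured < time_zero
    )


patient_a = Patient(
    "A", date(2022, 1, 1), [date(2023, 6, 1), date(2023, 7, 1)],
    {"egfr": date(2023, 5, 15), "hemoglobin": date(2023, 7, 1)},
)
patient_b = Patient("B", date(2023, 5, 1), [date(2023, 6, 1)])

time_zero_a = new_user_time_zero(patient_a)
time_zero_b = new_user_time_zero(patient_b)
baseline_a = pre_initiation_covariates(patient_a, time_zero_a)

print(time_zero_a, time_zero_b, baseline_a)
```

Patient A qualifies as a new user with time zero on the first dispensing, and only the covariate measured before that date counts as baseline. Patient B has about one month of lookback, so earlier use cannot be ruled out and the sketch excludes that patient from the new-user cohort.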

When a Prevalent-User Design Might Be Defensible

Not every late-entry cohort is automatically nonsense. Sometimes the causal question truly concerns ongoing maintenance among people who already persisted on therapy. That can be legitimate. It is just a different question.

What the paper should say plainly

We are not estimating the effect of treatment initiation. We are estimating outcomes among patients who have already remained on treatment up to a later landmark or maintenance point. That is a narrower, more selected, and often less transportable estimand.

If the manuscript never states that distinction, readers will usually infer a broader initiation claim than the design can support.

Reviewer Red Flags

Use this table when a treatment cohort seems oddly serene

Red flag	Why it matters
Treated patients are required to have prior exposure before cohort entry	The treated group is already conditioned on persistence and survival.
The paper asks an initiation question but measures covariates after treatment began	Some so-called baseline information may already be downstream of exposure.
Comparator patients are new users while treated patients are prevalent users	The comparison is mixing different disease and treatment timelines.
Early adverse events or discontinuation are clinically plausible but never discussed	The design may be hiding the very period where bias is largest.
Long follow-up stability is treated as proof of safety	Stable later users are not evidence that initiation was benign.

Decision Rules for Authors and Reviewers

Match time zero to the treatment question. If the claim is about starting treatment, cohort entry should usually be treatment initiation.
Do not call post-treatment information baseline. If treatment already started, say what has already had time to change.
Ask where early harms went. If the paper cannot account for early discontinuation or acute events, the treated cohort may already be edited by selection.
Prefer active-comparator new-user designs for comparative effectiveness. They do not solve everything, but they at least respect the timeline.
If using prevalent users on purpose, narrow the claim. State clearly that the estimand concerns ongoing users who persisted to the entry point.

Where Aqrab Fits

Prevalent-user bias is exactly the kind of methodological slippage that hides inside a respectable cohort table and a smooth regression output. If you want treatment-effect claims stress-tested before peer reviewers do it more publicly, try Aqrab. If you are building methods-aware review or protocol tooling into your own pipeline, the developer tools are the cleaner place to start.

The Bottom Line

A prevalent-user cohort is not automatically wrong. It is automatically selected. If the paper claims to tell you what happens when treatment starts, but it only studies the people still standing later, the estimate may be calmer, cleaner, and more misleading than it looks.