Prevalent-User Bias: When Your Drug Study Starts After the Interesting Harm Already Happened
Anas H. Alzahrani, MD PhD MPH
Department of Preventive Medicine and Public Health
Faculty of Medicine, King Abdulaziz University
Many drug studies begin with a treated cohort that looks reassuringly stable: patients already on therapy, still refilling it, and apparently doing fine. That calm surface is often exactly the problem. If the study starts after treatment initiation, some of the clinically important story may already be over.
Prevalent-user bias appears when a study includes people who are already using treatment at cohort entry instead of following them from initiation. The treated group is no longer a baseline treatment cohort. It is a selected set of survivors, tolerators, and continuers. Early harms, early discontinuation, and some high-risk patients have already been filtered out before the comparison even begins.
The Core Mistake
The problem is not merely that treatment started earlier. The problem is that treatment history before study entry has already changed who remains eligible to be observed as treated. If early adverse events, intolerance, contraindication discovery, or lack of response happen soon after initiation, a prevalent-user cohort quietly excludes some of the very patients needed to estimate initiation risk honestly.
Decision rule:
If the clinical question concerns what happens when treatment is started, do not begin by observing only people who already proved they could stay on it.
That may sound impolite to the design. Good. Some designs need the discourtesy.
Why Prevalent Users Are a Selected Population
1. Depletion of susceptibles
Patients most vulnerable to early harm or intolerance can experience the event and disappear from the treated cohort before study entry.
2. Survivor and adherer selection
Remaining treated patients are the ones who survived, tolerated treatment, stayed engaged with care, and often looked clinically suitable enough to continue.
3. Post-treatment baseline covariates
Covariates measured at late entry may already reflect treatment effects, adherence, or early clinical evolution rather than true pretreatment state.
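Depletion of susceptibles does not require the drug's effect to change over time. Here is a minimal Python sketch. It assumes two hypothetical subgroups with constant weekly hazards: a vulnerable 30% and everyone else. The hazard among patients who are still event-free falls simply because the vulnerable patients leave the risk set first. The function name `survivor_hazard` and every parameter value are illustrative, not taken from any study.

```python
import math

def survivor_hazard(t, p_vulnerable, h_vulnerable, h_other):
    """Population hazard at time t among patients who are still event-free.

    Each subgroup has a constant hazard; only the mix of survivors changes over time.
    """
    at_risk_vulnerable = p_vulnerable * math.exp(-h_vulnerable * t)
    at_risk_other = (1 - p_vulnerable) * math.exp(-h_other * t)
    weighted_rate = at_risk_vulnerable * h_vulnerable + at_risk_other * h_other
    return weighted_rate / (at_risk_vulnerable + at_risk_other)

# Hypothetical weekly hazards: 0.05 for a vulnerable 30%, 0.005 for everyone else.
for week in (0, 4, 12, 26, 52):
    print(f"week {week:>2}: survivor hazard {survivor_hazard(week, 0.30, 0.05, 0.005):.4f}")
```

As time passes, the hazard among survivors drifts toward the lower-risk subgroup's value. That is the pattern a late-entry cohort inherits.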
A Two-Cohort Thought Experiment
Imagine a drug with an early bleeding risk concentrated in the first weeks after initiation. If you compare patients from the day they start treatment, you observe those early events. If you instead build the treated cohort from patients who are already using the drug three months later, some early bleeds, discontinuations, and treatment abandonments are already missing from the file.
| Cohort definition | Who is included? | What gets missed? |
|---|---|---|
| New-user cohort | Everyone at treatment initiation | Very little of the early treatment story, if follow-up starts at time zero |
| Prevalent-user cohort | Only patients still on treatment later | Early harms, early discontinuation, and some of the most vulnerable patients |
By the time the prevalent-user cohort enters analysis, treatment may look safer partly because the design waited until the risk set had been edited by reality.
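A piecewise-constant hazard makes the thought experiment concrete. This is a sketch under stated assumptions. The daily hazards, the 28-day early window, the 90-day entry point, and the 180-day follow-up are all hypothetical, and the function names are illustrative. The sketch models only the outcome event, not discontinuation, so it understates the selection a real prevalent-user cohort would undergo.

```python
import math

def cumulative_hazard(t, h_early, h_late, early_days):
    """Cumulative hazard H(t): daily hazard h_early on [0, early_days), h_late afterwards."""
    return h_early * min(t, early_days) + h_late * max(0.0, t - early_days)

def window_risk(start, length, h_early, h_late, early_days):
    """Risk of an event in [start, start + length] among patients event-free at `start`."""
    h = (cumulative_hazard(start + length, h_early, h_late, early_days)
         - cumulative_hazard(start, h_early, h_late, early_days))
    return 1 - math.exp(-h)

# Hypothetical daily bleeding hazards: high for the first 28 days, low afterwards.
params = dict(h_early=0.002, h_late=0.0003, early_days=28)
new_user_risk = window_risk(0, 180, **params)         # clock starts at initiation
prevalent_user_risk = window_risk(90, 180, **params)  # clock starts at day 90, survivors only
print(f"new-user 180-day risk:       {new_user_risk:.3f}")
print(f"prevalent-user 180-day risk: {prevalent_user_risk:.3f}")
```

Both cohorts take the same drug with the same hazard. The prevalent-user estimate is lower only because its clock starts after the high-hazard weeks.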
The Prevalence Trap: How Many Early Harms Disappear When the Cohort Starts Late?
This toy model starts with 1,000 treatment initiators. It shows how a prevalent-user cohort quietly drops early events, enriches for survivors, and makes the treatment period look calmer than the one patients actually experienced.
| Measure | Value | What it means |
|---|---|---|
| Risk from treatment initiation | 9.3% | Risk during the period where early harms are actually allowed to occur. |
| Risk if follow-up starts late | 2.9% | Observed risk after the vulnerable window has already filtered the cohort. |
| Early-vulnerable share among survivors | 25.1% | Share of survivors who still belong to the early-vulnerable group after depletion. |
The counts behind these percentages:
| Quantity | Value | Why it matters |
|---|---|---|
| Early events among vulnerable patients | 72 | These are the classic harms or intolerance events that vanish when users must survive long enough to be counted. |
| Other early events | 21 | Even lower-risk patients contribute events that a late-entry cohort simply never sees. |
| Patients remaining for the prevalent-user cohort | 907 | The treated group is now a selected set of survivors and tolerators, not a baseline treatment cohort. |
| Apparent relative risk drop created by late entry | 68.8% | Late-entry risk is 68.8% lower than initiation risk (1 - 2.9/9.3 ≈ 0.688). The treatment can look safer partly because the study began after the interesting damage already happened. |
How to read the toy model
This is deliberately simple. Real studies have competing risks, treatment switching, dose changes, and time-varying confounding. The point is narrower: if you only count people who are still on treatment later, you have already selected away some of the patients and events that define initiation risk.
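Simple bookkeeping reproduces the toy model's early-window figures. The parameters below are assumptions: 30% of initiators are vulnerable, with an early risk of 24% among them and 3% among everyone else. They were chosen because they reproduce the counts of 72, 21, and 907, and the model's actual settings are not stated here. The 2.9% late-entry risk depends on late-period assumptions that are also not given, so the sketch takes it as an input rather than deriving it. Function names are hypothetical.

```python
def toy_prevalence_trap(n=1000, p_vulnerable=0.30, risk_vulnerable=0.24, risk_other=0.03):
    """Early-window bookkeeping for a cohort of treatment initiators (hypothetical parameters)."""
    n_vulnerable = round(n * p_vulnerable)
    n_other = n - n_vulnerable
    events_vulnerable = round(n_vulnerable * risk_vulnerable)  # early harms among the vulnerable
    events_other = round(n_other * risk_other)                 # early harms among everyone else
    survivors = n - events_vulnerable - events_other           # who is left for late entry
    return {
        "early_events_vulnerable": events_vulnerable,
        "early_events_other": events_other,
        "initiation_risk": (events_vulnerable + events_other) / n,
        "survivors": survivors,
        "vulnerable_share_of_survivors": (n_vulnerable - events_vulnerable) / survivors,
    }

def apparent_risk_drop(initiation_risk, late_risk):
    """Relative risk reduction that appears only because follow-up starts late."""
    return 1 - late_risk / initiation_risk

result = toy_prevalence_trap()
print(result)
# The late-entry risk is supplied as an input (assumed), not derived.
print(f"apparent risk drop: {apparent_risk_drop(result['initiation_risk'], 0.029):.1%}")  # 68.8%
```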
Decision rule: if clinically important harms or discontinuation cluster soon after treatment starts, a prevalent-user design is answering a quieter, later question than the one readers usually think they are seeing.
- Late entry removes early events from observation and changes who remains under treatment.
- Baseline covariates measured after treatment initiation may already be partly affected by treatment.
- Comparing prevalent users to new initiators is usually a timeline mismatch wearing a model.
Where the Bias Shows Up in Practice
Drug safety studies
If nausea, bleeding, dizziness, rash, or arrhythmia risk is front-loaded after initiation, late-entry treated cohorts can understate harms.
Comparative effectiveness studies
Comparing prevalent users of one drug to initiators of another often creates a timeline mismatch before confounding control even starts.
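The following sketch shows that timeline mismatch. It assumes drug A and drug B share an identical, hypothetical front-loaded hazard. Comparing prevalent users of A, who enter after 90 days on treatment, with initiators of B produces an apparent risk ratio below 1, even though neither drug is safer. All numbers and names are illustrative.

```python
import math

def window_risk(start, length, h_early=0.002, h_late=0.0003, early_days=28):
    """Risk in [start, start + length] among event-free patients under a front-loaded hazard."""
    def cum_hazard(t):
        return h_early * min(t, early_days) + h_late * max(0.0, t - early_days)
    return 1 - math.exp(-(cum_hazard(start + length) - cum_hazard(start)))

# Drug A and drug B share the SAME hypothetical hazard profile: no true difference.
risk_prevalent_a = window_risk(start=90, length=180)   # A users already 90 days into treatment
risk_initiators_b = window_risk(start=0, length=180)   # B users followed from initiation
apparent_rr = risk_prevalent_a / risk_initiators_b
print(f"apparent risk ratio, A vs B: {apparent_rr:.2f}")
```

This setup contains no confounding, so the entire apparent benefit comes from where each cohort's clock starts.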
Chronic-disease maintenance cohorts
Even when long-term maintenance is the interest, you still need to say plainly that the estimand is a later-treatment effect among survivors, not the effect of starting treatment.
Why Adjustment Rarely Fixes It
Researchers sometimes hope a rich propensity score or outcome model can neutralize the problem. Usually it cannot, because the issue is not just covariate imbalance. The issue is that the treatment cohort was conditioned on future survival, future tolerance, and future continuation before the analytic clock began.
| Problem | Why standard adjustment struggles | Better move |
|---|---|---|
| Early events happened before entry | You cannot model events that the design never allowed into follow-up | Use a new-user design with time zero at initiation |
| Continuation selects survivors and tolerators | Selection depends on post-baseline history that may be poorly measured or already treatment-affected | Define the estimand around initiation, or state a maintenance estimand explicitly |
| Baseline covariates measured late | “Baseline” now contains post-treatment information masquerading as pre-exposure state | Measure covariates before initiation whenever the question is about starting treatment |
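To see why even perfect adjustment falls short, assume, hypothetically, that vulnerability is fully measured, so late entrants can be reweighted back to the initiators' mix. The sketch uses an illustrative front-loaded hazard with a fivefold multiplier for a vulnerable 30%. It compares three quantities: the initiation risk, the crude late-entry risk, and the late-entry risk standardized to the initiator mix. Standardization corrects the composition shift but cannot restore events that occurred before the landmark. All parameter values and names are assumptions.

```python
import math

def cum_hazard(t, mult, h_early=0.002, h_late=0.0003, early_days=28):
    """Cumulative hazard to day t: front-loaded baseline daily hazard scaled by a subgroup multiplier."""
    return mult * (h_early * min(t, early_days) + h_late * max(0.0, t - early_days))

def window_risk(start, length, mult):
    """Risk of an event in [start, start + length] among patients event-free at `start`."""
    return 1 - math.exp(-(cum_hazard(start + length, mult) - cum_hazard(start, mult)))

# Hypothetical subgroups: (share at initiation, hazard multiplier).
groups = {"vulnerable": (0.30, 5.0), "other": (0.70, 1.0)}
LANDMARK, WINDOW = 90, 180

# 1. Initiation risk: everyone followed from day 0.
initiation = sum(p * window_risk(0, WINDOW, m) for p, m in groups.values())

# 2. Crude late-entry risk: only patients event-free at the landmark, in their surviving mix.
alive = {g: p * math.exp(-cum_hazard(LANDMARK, m)) for g, (p, m) in groups.items()}
total_alive = sum(alive.values())
late_crude = sum(alive[g] / total_alive * window_risk(LANDMARK, WINDOW, m)
                 for g, (_, m) in groups.items())

# 3. "Perfect" adjustment: late-entry risks reweighted back to the initiator mix.
late_standardized = sum(p * window_risk(LANDMARK, WINDOW, m) for p, m in groups.values())

print(f"initiation risk:          {initiation:.3f}")
print(f"late entry, crude:        {late_crude:.3f}")
print(f"late entry, standardized: {late_standardized:.3f}")
```

Reweighting moves the late estimate up but not back to the initiation risk. The missing early events were never in follow-up, so there is nothing to reweight.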
When a Prevalent-User Design Might Be Defensible
Not every late-entry cohort is automatically nonsense. Sometimes the causal question truly concerns ongoing maintenance among people who already persisted on therapy. That can be legitimate. It is just a different question.
What the paper should say plainly
We are not estimating the effect of treatment initiation. We are estimating outcomes among patients who have already remained on treatment up to a later landmark or maintenance point. That is a narrower, more selected, and often less transportable estimand.
If the manuscript never states that distinction, readers will usually infer a broader initiation claim than the design can support.
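If the maintenance question is the one you intend, a landmark cohort makes the estimand explicit. Include only patients still on treatment and event-free at a fixed landmark, start the clock there, and report how many were dropped along the way. This is a minimal sketch, assuming a simple hypothetical `Patient` record with initiation, stop, and event dates. The field names and the 90-day landmark are illustrative, not a standard schema.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple

@dataclass
class Patient:
    """Hypothetical treatment record; field names are illustrative."""
    pid: str
    initiation: date                 # first day on treatment
    stop: Optional[date] = None      # last day on treatment (None = still on treatment)
    event: Optional[date] = None     # outcome date (None = no event observed)

def landmark_cohort(patients: List[Patient], landmark_days: int):
    """Maintenance cohort: patients on treatment and event-free at initiation + landmark_days.

    Returns (cohort, dropped). Each cohort entry is (pid, time_zero), where time zero
    is the landmark date, not the initiation date.
    """
    cohort: List[Tuple[str, date]] = []
    dropped = {"event": 0, "stopped": 0}
    for p in patients:
        t0 = p.initiation + timedelta(days=landmark_days)
        if p.event is not None and p.event <= t0:
            dropped["event"] += 1        # early harm: never enters the maintenance cohort
        elif p.stop is not None and p.stop < t0:
            dropped["stopped"] += 1      # early discontinuation: also filtered out
        else:
            cohort.append((p.pid, t0))
    return cohort, dropped

patients = [
    Patient("A", date(2023, 1, 1), stop=date(2023, 1, 21)),
    Patient("B", date(2023, 1, 1), event=date(2023, 1, 11)),
    Patient("C", date(2023, 1, 1)),
]
cohort, dropped = landmark_cohort(patients, landmark_days=90)
print(cohort, dropped)
```

The `dropped` counts are the numbers a reader needs to judge how selected the maintenance cohort is.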
Reviewer Red Flags
Use this table when a treatment cohort seems oddly serene
| Red flag | Why it matters |
|---|---|
| Treated patients are required to have prior exposure before cohort entry | The treated group is already conditioned on persistence and survival. |
| The paper asks an initiation question but measures covariates after treatment began | Some so-called baseline information may already be downstream of exposure. |
| Comparator patients are new users while treated patients are prevalent users | The comparison is mixing different disease and treatment timelines. |
| Early adverse events or discontinuation are clinically plausible but never discussed | The design may be hiding the very period where bias is largest. |
| Long follow-up stability is treated as proof of safety | Stable later users are not evidence that initiation was benign. |
Decision Rules for Authors and Reviewers
- Match time zero to the treatment question. If the claim is about starting treatment, cohort entry should usually be treatment initiation.
- Do not call post-treatment information baseline. If treatment already started, say what has already had time to change.
- Ask where early harms went. If the paper cannot account for early discontinuation or acute events, the treated cohort may already be edited by selection.
- Prefer active-comparator new-user designs for comparative effectiveness. They do not solve everything, but they at least respect the timeline.
- If using prevalent users on purpose, narrow the claim. State clearly that the estimand concerns ongoing users who persisted to the entry point.
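The first two rules can be sketched for dispensing data. This minimal example assumes hypothetical inputs: a list of `(patient_id, dispensing_date)` fills, a per-patient enrollment start date, and a 365-day washout and covariate lookback chosen for illustration. Time zero is the first observed fill. A patient counts as a new user only if the data could have shown a prior fill across the full washout, and covariates are kept only if recorded strictly before time zero. A washout identifies patients new to the data, not necessarily patients who have never used the drug.

```python
from collections import defaultdict
from datetime import date, timedelta

def new_user_cohort(dispensings, enrollment_start, washout_days=365):
    """Map patient_id -> index date (time zero) for patients meeting a washout-based new-user rule.

    dispensings: iterable of (patient_id, dispensing_date)
    enrollment_start: dict of patient_id -> date observation begins
    """
    first_fill = {}
    for pid, fill_date in dispensings:
        if pid not in first_fill or fill_date < first_fill[pid]:
            first_fill[pid] = fill_date
    cohort = {}
    for pid, index_date in first_fill.items():
        # Require a full washout of observable time with no earlier fill.
        if index_date - enrollment_start[pid] >= timedelta(days=washout_days):
            cohort[pid] = index_date
    return cohort

def baseline_records(records, cohort, lookback_days=365):
    """Keep covariate records dated strictly before time zero and inside the lookback window."""
    baseline = defaultdict(list)
    for pid, record_date, value in records:
        t0 = cohort.get(pid)
        if t0 is not None and t0 - timedelta(days=lookback_days) <= record_date < t0:
            baseline[pid].append((record_date, value))
    return dict(baseline)

# Hypothetical example data.
fills = [("p1", date(2022, 7, 1)), ("p1", date(2022, 6, 1)), ("p2", date(2021, 2, 1))]
enrolled = {"p1": date(2020, 1, 1), "p2": date(2021, 1, 1)}
cohort = new_user_cohort(fills, enrolled)     # p2 has only 31 days of washout: excluded
labs = [("p1", date(2022, 5, 15), 62.0), ("p1", date(2022, 6, 20), 58.0)]
print(cohort)
print(baseline_records(labs, cohort))         # the post-initiation value is not "baseline"
```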
Where Aqrab Fits
Prevalent-user bias is exactly the kind of methodological slippage that hides inside a respectable cohort table and a smooth regression output. If you want treatment-effect claims stress-tested before peer reviewers do it more publicly, try Aqrab. If you are building methods-aware review or protocol tooling into your own pipeline, the developer tools are the cleaner place to start.
The Bottom Line
A prevalent-user cohort is not automatically wrong. It is automatically selected. If the paper claims to tell you what happens when treatment starts, but it only studies the people still standing later, the estimate may be calmer, cleaner, and more misleading than it looks.