Washout Periods: When “New Use” Is Just Old Use with Better PR
Anas H. Alzahrani, MD PhD MPH
Department of Preventive Medicine and Public Health
Faculty of Medicine, King Abdulaziz University
Plenty of observational drug studies announce a shiny new-user design and then define “new” with the methodological confidence of someone checking whether the fridge is empty by opening it for half a second. No dispensing in the last six months? Excellent. Probably incident use. Probably.
A washout period is the pre-index interval during which no prior exposure is allowed if a patient is to count as a new or incident user. The idea is sensible: you want cohort entry to align with treatment initiation rather than with some later, calmer chapter of the treatment story. The trouble is that washout windows are often chosen by habit, data availability, or vibes rather than by a defensible account of prescribing rhythm, intermittent use, and the clinical question.
The Core Design Rule
A washout period is not a ceremonial moat around cohort entry. It is an identification rule. If it is too short, continuing users leak into the incident-user cohort and you rebuild prevalent-user bias with cleaner formatting. If it is too long, you may exclude realistic treatment restarters, narrow the cohort into a peculiar subset, and quietly change the estimand.
Decision rule:
Choose the shortest washout that can plausibly exclude continuing use for the treatment pattern you are studying, and adopt it only if the database can actually observe that entire window.
Or in less diplomatic language: a twelve-month washout inside an eight-month claims history is not a design choice. It is a trust fall.
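The observability half of that rule is simple arithmetic. A minimal sketch, assuming both durations are expressed in whole months; the function name `washout_shortfall` is illustrative, not part of any library:

```python
def washout_shortfall(washout_months: int, observed_history_months: int) -> int:
    """Months of the washout window that the data cannot observe (0 if fully observed)."""
    if washout_months < 0 or observed_history_months < 0:
        raise ValueError("durations must be non-negative")
    return max(0, washout_months - observed_history_months)


# The twelve-month washout inside an eight-month claims history from the text:
print(washout_shortfall(12, 8))   # 4 months of the window are taken on trust
# A six-month washout with twelve months of history is fully observable:
print(washout_shortfall(6, 12))   # 0
```

Any nonzero result means part of the incident-user definition rests on history nobody saw.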
Why Washout Periods Matter More Than They First Appear
1. They define who is “new”
The washout does not just tidy the baseline. It decides whether the study observes first use, later continuation, or some ambiguous re-entry after a gap in captured treatment.
2. They shape confounding control
A long washout can consume most of the available lookback history, leaving less room to characterize baseline severity, healthcare utilization, and prior treatment trajectories.
3. They change the clinical question
The right washout for lifetime first initiation is not necessarily the right one for treatment restart, step-up therapy, or episodic medication use.
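To make the first point concrete, here is a rough sketch of how a washout rule labels a single index dispensing. The function name, the four labels, and the use of raw dispensing dates (ignoring days supply) are all assumptions made for illustration; a real protocol would define these from its own data model.

```python
from datetime import date, timedelta


def classify_index_fill(index_date, prior_fills, observation_start, washout_days):
    """Label an index dispensing under a washout rule.

    "continuing"     - an earlier fill falls inside the washout window
    "unverifiable"   - no earlier fill seen, but the window starts before the data do
    "restart"        - clean washout, but an earlier fill exists in observed history
    "first_observed" - clean, fully observed washout and no earlier fill at all
    """
    window_start = index_date - timedelta(days=washout_days)
    earlier = [d for d in prior_fills if d < index_date]
    if any(d >= window_start for d in earlier):
        return "continuing"
    if observation_start > window_start:
        return "unverifiable"
    if earlier:
        return "restart"
    return "first_observed"


index = date(2024, 7, 1)
print(classify_index_fill(index, [date(2024, 3, 15)], date(2023, 1, 1), 180))
print(classify_index_fill(index, [date(2023, 9, 1)], date(2023, 1, 1), 180))
print(classify_index_fill(index, [], date(2023, 1, 1), 180))
print(classify_index_fill(index, [], date(2024, 4, 1), 180))
```

The same patient can move between labels just by changing `washout_days`, which is the sense in which the window defines who is “new”.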
A Concrete Clinical Example
Imagine a comparative effectiveness study of GLP-1 receptor agonists versus DPP-4 inhibitors in adults with type 2 diabetes. The paper uses a six-month washout and calls everyone with no dispensing in that interval a new user.
Why six months may be too short
Some patients stop and restart therapy after side effects, cost barriers, or formulary changes. A six-month gap may identify a restart cohort while the manuscript keeps saying incident initiation.
Why twelve months may not solve everything
If the database only has thirteen months of history, an elegant twelve-month washout leaves almost no room to measure baseline disease trajectory, monitoring intensity, or prior comparator use.
What the protocol should say instead
State whether the target estimand is first observed initiation in available data or treatment restart, justify the washout from prescribing cadence, and show sensitivity analyses with nearby windows.
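A nearby-window sensitivity check can start with something as plain as counting who qualifies under each candidate washout. The sketch below uses made-up gaps for eight hypothetical patients and assumes every window is fully observable; a real analysis would go on to re-estimate the treatment effect in each cohort, not just count heads.

```python
def cohort_size_by_washout(days_since_prior_fill, washouts_days):
    """For each candidate washout, count patients who would qualify as new users.

    days_since_prior_fill: per patient, days between the index fill and the most
    recent earlier fill, or None when no earlier fill is observed.
    """
    sizes = {}
    for w in washouts_days:
        # A patient qualifies if there is no earlier fill or the gap exceeds the washout.
        sizes[w] = sum(1 for gap in days_since_prior_fill if gap is None or gap > w)
    return sizes


# Hypothetical data: None means no earlier fill anywhere in observed history.
gaps = [None, None, 45, 120, 200, 270, 400, None]
print(cohort_size_by_washout(gaps, [90, 180, 365]))
# {90: 7, 180: 6, 365: 4}
```

In this toy example, more than half of the 90-day “new users” have an earlier fill on record, which is the kind of finding a protocol should report rather than discover after modeling.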
Interactive washout triage
Is this washout defining new use, or just flattering the cohort?
Adjust the treatment rhythm, database history, and analytic goal. The tool estimates whether the washout is too short to exclude continuing users, too long for the clinical question, or simply unsupported by the available lookback.
- Leakage risk: 5%. Approximate risk that apparent new users still include continuing users whose prior treatment is hidden by an insufficient gap.
- Restriction cost: 18%. A rough signal for how much the washout may shrink or over-select the cohort away from the decision context.
- Minimum plausible washout: 3 months. Based on the treatment cadence and use pattern entered above, shorter windows are likely to mislabel ongoing users as incident users.
- Practical interpretation: Plausible washout choice. The chosen lookback window broadly matches the refill rhythm and use pattern, so the new-user label is at least clinically defensible.
- Estimated estimand drift: Closer to first observed treatment initiation among patients with enough prior clean history.
You still need to justify why this drug, this database, and this clinical setting support the chosen washout rather than a nearby alternative.
| Design quantity | Value | Why it matters |
|---|---|---|
| Chosen washout | 6 months | This is the rule that determines whether the study sees a patient as a new user or a continuing user. |
| Available pre-index history | 12 months | Without enough observed history, the washout becomes partly aspirational instead of empirically verified. |
| History shortfall | None | If this is nonzero, the database is missing part of the very window used to certify incident treatment. |
| Practical interpretation | Plausible washout choice | This is the reviewer-facing bottom line the protocol should be able to defend before modeling begins. |
How to Pick a Washout Without Pretending the Database Has Better Memory Than It Does
| Design situation | A sensible instinct | What can go wrong | Reviewer question |
|---|---|---|---|
| Chronic maintenance therapy | Washout should exceed the longest plausible refill cycle plus a grace period for irregular fills. | Short windows relabel continuing users as incident users. | How often can stable patients go between fills without truly being off therapy? |
| Intermittent or episodic treatment | Define whether the question is first use, restart, or episode initiation. | A long washout may over-purify the cohort into an unusual subset of long-gap users. | Does the chosen window match the actual treatment rhythm or just a convention from another drug class? |
| Limited baseline history | Keep the washout fully observable and preserve enough history for confounders. | The study can become both under-verified for incident use and under-measured for baseline severity. | How much pre-index history remains after the washout to measure the things that drive treatment choice? |
| Sensitivity analysis | Show nearby plausible windows rather than one enchanted number. | A single favored window can look suspiciously selected for result behavior. | Does the conclusion survive shorter and longer clinically plausible washouts? |
Five Failure Modes That Deserve Less Politeness
1. The washout is shorter than ordinary refill behavior
If a patient can plausibly go four or five months between observed dispensings, a three-month washout does not establish new use. It establishes impatience.
2. The paper never distinguishes incident use from restart
Restarters often differ from first-time initiators in prior tolerance, disease trajectory, and clinician expectations. Lumping them together muddies both design and interpretation.
3. The database cannot observe the full washout
This is common and avoidable. If enrollment, EHR continuity, or claims capture begins after the washout has already started, part of the incident-user definition lives offstage.
4. A long washout quietly hollows out baseline measurement
The more history you reserve for proving no prior treatment, the less history remains to characterize disease severity, utilization patterns, prior therapies, and outcome risk.
5. The chosen window is defended only because the estimate looked nicer
Washout sensitivity analysis is supposed to test robustness, not to provide a scavenger hunt for the most flattering hazard ratio in the neighborhood.
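Failure mode 1 is checkable before any modeling. A minimal sketch, assuming you have raw dispensing dates for patients known to be continuing users and ignoring days supply (which a real analysis should not); the function names are illustrative, and using the single longest gap is a deliberately strict stand-in for a high percentile.

```python
from datetime import date


def longest_refill_gap_days(fill_dates):
    """Largest gap in days between consecutive dispensings for one patient."""
    ordered = sorted(fill_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return max(gaps, default=0)


def washout_shorter_than_refill_gaps(washout_days, patients_fills):
    """True if any patient's longest observed gap exceeds the washout."""
    return any(longest_refill_gap_days(fills) > washout_days for fills in patients_fills)


# One hypothetical continuing user with an irregular refill pattern.
fills = [date(2023, 1, 10), date(2023, 5, 20), date(2023, 8, 1)]
print(longest_refill_gap_days(fills))                  # 130
print(washout_shorter_than_refill_gaps(90, [fills]))   # True
print(washout_shorter_than_refill_gaps(180, [fills]))  # False
```

Under a 90-day washout this patient's May fill would pass as new use, even though nothing about their treatment began in May.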
Reviewer Red Flags for “Incident User” Claims
- The washout window is named, but not justified. “We used 180 days” is not a rationale.
- The treatment pattern is never described. Chronic, intermittent, and episodic therapies do not deserve the same default.
- Available history is shorter than the washout. The cohort is being certified with missing paperwork.
- The manuscript says incident use but the protocol behaves like restart. Those are different patients and often different causal questions.
- No neighboring washout windows are shown. One lucky threshold is not a robustness strategy.
- Baseline covariates depend on history that the washout already consumed. The design may be incident-clean but confounding-blind.
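Several of these flags are mechanical enough to screen for automatically. The sketch below encodes five of the six; every field name is hypothetical, and the last check assumes, as the list above does, that the washout and the covariate lookback draw on the same finite history without overlapping. The incident-versus-restart flag needs human reading of the protocol and is left out.

```python
from dataclasses import dataclass, field


@dataclass
class WashoutSpec:
    # Hypothetical protocol fields; adapt the names to your own template.
    washout_days: int
    observed_history_days: int
    covariate_lookback_days: int
    rationale: str = ""
    treatment_pattern: str = ""  # e.g. "chronic", "intermittent", "episodic"
    sensitivity_washouts_days: list = field(default_factory=list)


def red_flags(spec: WashoutSpec) -> list:
    """Return the reviewer red flags that a washout specification triggers."""
    flags = []
    if not spec.rationale.strip():
        flags.append("washout named but not justified")
    if not spec.treatment_pattern.strip():
        flags.append("treatment pattern not described")
    if spec.observed_history_days < spec.washout_days:
        flags.append("available history shorter than washout")
    if not any(w != spec.washout_days for w in spec.sensitivity_washouts_days):
        flags.append("no neighboring washout windows shown")
    if spec.washout_days + spec.covariate_lookback_days > spec.observed_history_days:
        flags.append("covariate lookback needs history the washout consumed")
    return flags


bare = WashoutSpec(washout_days=180, observed_history_days=365, covariate_lookback_days=365)
for flag in red_flags(bare):
    print(flag)
```

An empty list is not a clean bill of health; it only means the protocol cleared the checks a script can run.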
What Aqrab Should Help Teams Catch
Washout choices are exactly the sort of design detail that gets waved through in protocol review and then determines whether the cohort means what the title claims. This is not glamorous, but it is where a lot of observational credibility leaks out.
Practical takeaway
Before you trust an incident-user cohort, ask three things: how the treatment is actually used, how much history the data truly observe, and whether the chosen washout matches the estimand rather than the analyst’s muscle memory.
If your team wants a faster way to stress-test cohort definitions, reviewer red flags, and estimand drift before the manuscript gets expensive, Aqrab is built for exactly that kind of methods critique. Try it at /try or inspect the workflow ideas on /developers.
The Short Version
Washout periods are not tiny housekeeping variables. They decide who counts as newly treated, what treatment history is being compared, and whether a so-called new-user design still smells suspiciously like a prevalent-user cohort in a fresh coat of paint.
A defensible washout is clinically argued, fully observable in the data, and explicitly tied to the estimand. Anything less is not rigor. It is formatting.