Run-In Periods: When Your Trial Randomizes the Easy Patients First

May 29, 2026 · 16 min read

Anas H. Alzahrani, MD PhD MPH

Department of Preventive Medicine and Public Health

Faculty of Medicine, King Abdulaziz University

A surprising number of trials want credit for studying ordinary clinical practice after first removing the patients who behave most like ordinary clinical practice. That is the run-in period in one sentence.

A run-in period is a pre-randomization interval during which eligible participants may be exposed to placebo, active treatment, adherence monitoring, or procedural rehearsal before investigators decide who actually gets randomized. Sometimes this is useful. Sometimes it is a quiet filtration system for intolerance, poor adherence, early complications, or lack of enthusiasm. The protocol says “optimization.” The estimand often says “we switched populations midway through the recruitment story.”

The Core Design Rule

A run-in period is defensible only if the trial is explicit about the question it creates. If the aim is to estimate treatment effect among participants who can tolerate and sustain the regimen, say that. If the aim is effectiveness for the patients clinicians will actually try to treat, then pre-randomization pruning needs much more skepticism.

Decision rule:

When a run-in excludes people for intolerance, early nonadherence, or early events, assume the trial now estimates a narrower, more treatment-friendly question unless the authors prove otherwise.

Or less politely: if your trial only randomizes the patients who already demonstrated they can live with the intervention, do not market the result as if the difficult patients never existed.

Why Run-In Periods Matter More Than Their Footnote Status Suggests

1. They change who gets randomized

A run-in is not mere logistics. It actively determines whether the randomized cohort includes people prone to adverse effects, practical barriers, early worsening, or weak adherence.

2. They can inflate tolerability and adherence

If participants who stop early never reach randomization, later estimates of side-effect burden and persistence can look suspiciously civilized.

3. They quietly alter generalizability

The headline may sound population-wide, but the true population may be “patients who already passed a rehearsal exam for treatment success.”

A Concrete Clinical Example

Imagine a cardiovascular outcomes trial of a once-daily metabolic drug. Before randomization, everyone enters a four-week active run-in. Participants who miss pill-count targets, report troublesome nausea, or stop therapy early are excluded from randomization.

What the trial gains

Better protocol adherence, fewer immediate discontinuations, cleaner exposure contrast, and less operational chaos after randomization.

What the trial loses

It has now removed exactly the people clinicians most need help understanding: those who feel sick, struggle with routines, or declare early that the regimen is harder than the brochure promised.

What the abstract should say

“Among participants who tolerated and adhered through a pre-randomization run-in, treatment reduced risk…” That sentence is less glamorous and much more honest.
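The run-in screen in this example can be written down as an explicit filter, which makes it easier to see that every exclusion has a recordable reason. The following is a minimal sketch, not the procedure of any real trial: the `RunInRecord` fields, the `exclusion_reasons` helper, and the 80% pill-count target are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class RunInRecord:
    participant_id: str
    pill_count_adherence: float  # fraction of dispensed pills taken
    troublesome_nausea: bool
    stopped_early: bool


def exclusion_reasons(record, adherence_target=0.80):
    """Return every reason a participant would fail the run-in screen.

    The 0.80 adherence target is an assumed threshold, not one from the text.
    """
    reasons = []
    if record.pill_count_adherence < adherence_target:
        reasons.append("missed pill-count target")
    if record.troublesome_nausea:
        reasons.append("troublesome nausea")
    if record.stopped_early:
        reasons.append("stopped therapy early")
    return reasons


# Hypothetical participants, invented for illustration only.
records = [
    RunInRecord("P001", 0.95, False, False),
    RunInRecord("P002", 0.62, False, False),
    RunInRecord("P003", 0.90, True, True),
]

for record in records:
    reasons = exclusion_reasons(record)
    status = "randomize" if not reasons else "exclude: " + ", ".join(reasons)
    print(record.participant_id, status)
```

Returning the full list of reasons, rather than a single pass/fail flag, is a deliberate choice: it preserves the information a reader would need to see which kinds of patients never reached randomization.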

Interactive run-in explorer

Watch a cleaner randomized cohort appear before treatment effect estimation even begins

This toy model assumes a pre-randomization run-in removes people who cannot tolerate or sustain the regimen. With the settings shown below, compare the apparent event risk in the randomized sample with the risk in the full eligible cohort that existed before the run-in started pruning it.

Selection signal: 3.2 percentage points lower observed event risk created by run-in selection alone

Eligible participants before run-in: 1,000

Excluded during run-in for early intolerance: 18.0%

Additional exclusion for poor adherence (among those not already removed for intolerance): 22.0%

One-year event risk among people removed by run-in: 20.0%

One-year event risk among participants retained for randomization: 11.0%

Randomized fraction: 64.0%. Share of the original eligible cohort that survives the tolerability and adherence screen.

Observed trial risk: 11.0%. Event risk inside the polished post-run-in sample that readers often mistake for the answer in ordinary patients.

All-comer risk: 14.2%. Event risk if the full eligible population, including early stoppers and poor adherers, still counted.

Stage	Participants	What happened
Eligible before run-in	1,000	The real target population that investigators usually invoke in the abstract.
Removed for intolerance	180	Early adverse effects, inconvenience, or intolerance can vanish from the randomized sample before treatment comparison begins.
Removed for poor adherence	180	The run-in can also exclude people who struggle with the regimen the real world will later ask them to sustain.
Randomized cohort	640	Cleaner, more adherent, more treatment-tolerant, and usually less representative than the recruitment pitch implies.
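The arithmetic behind these figures can be reproduced in a few lines. The sketch below is a reconstruction under one stated assumption: the two exclusions apply sequentially, so the adherence exclusion is taken among participants not already removed for intolerance. That is the reading consistent with the 64.0% randomized fraction and the 180/180/640 counts shown. The function and parameter names are hypothetical and are not taken from the explorer itself.

```python
def run_in_selection(eligible, p_intolerance, p_poor_adherence,
                     risk_removed, risk_retained):
    """Toy run-in model: sequential exclusions, then a weighted-average risk."""
    # Assumption: the adherence exclusion applies only to participants
    # who were not already removed for intolerance.
    randomized_fraction = (1 - p_intolerance) * (1 - p_poor_adherence)
    removed_fraction = 1 - randomized_fraction

    # All-comer risk is the mix of retained and removed participants.
    all_comer_risk = (randomized_fraction * risk_retained
                      + removed_fraction * risk_removed)

    return {
        "randomized_n": round(eligible * randomized_fraction),
        "randomized_fraction": randomized_fraction,
        "observed_trial_risk": risk_retained,
        "all_comer_risk": all_comer_risk,
        "selection_signal": all_comer_risk - risk_retained,
    }


result = run_in_selection(1000, 0.18, 0.22, 0.20, 0.11)
for name, value in result.items():
    print(f"{name}: {value:.4f}" if isinstance(value, float) else f"{name}: {value}")
```

With the inputs shown, the sketch gives 640 randomized participants, an all-comer risk of about 14.2%, and a selection signal of about 3.2 percentage points, matching the figures above.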

How to read the toy model

This is not a full trial simulator. It isolates one mechanism: pre-randomization selection can make the randomized cohort healthier, more adherent, and easier to treat than the eligible population named in the protocol.

Sometimes that is acceptable because the scientific question is explicitly about sustained use among people who can tolerate the regimen. Trouble starts when the paper quietly generalizes back to all likely users.

Decision rule

If a run-in excludes people for the very behaviors or harms clinicians care about, the trial estimates a narrower question than “what happens when we prescribe this treatment?”

The stronger the pruning before randomization, the less comfortable you should be generalizing benefit, tolerability, or adherence claims to ordinary practice.
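The toy model makes this rule quantitative. As a sketch, write $f$ for the randomized fraction, $r_{\text{ret}}$ for the event risk among participants retained for randomization, and $r_{\text{rem}}$ for the event risk among those removed by the run-in. These symbols are introduced here for illustration and do not appear elsewhere in the guide.

```latex
\begin{align*}
r_{\text{all}} &= f\, r_{\text{ret}} + (1 - f)\, r_{\text{rem}} \\
r_{\text{all}} - r_{\text{ret}} &= (1 - f)\,\bigl(r_{\text{rem}} - r_{\text{ret}}\bigr)
\end{align*}
```

With the toy model's values, the gap is roughly $0.36 \times (0.20 - 0.11) \approx 0.032$, or about 3.2 percentage points. Under this simple mixture assumption the gap scales with both the share excluded and the risk difference between removed and retained participants, and it vanishes only if the run-in removes nobody or the removed participants carry the same risk as those retained.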

Not All Run-Ins Are the Same, and That Distinction Matters

Run-in type	Why investigators use it	Main methodological cost	Reviewer question
Placebo run-in	Rehearse follow-up, wash out prior therapy, or identify participants unlikely to comply with study procedures.	Selects for engagement and persistence before the trial even starts comparing treatments.	Were participants excluded primarily for protocol convenience or for a scientific reason tied to the estimand?
Active-treatment run-in	Identify intolerance, early response, or feasible dose escalation before randomization.	Removes early harms and difficult users, which can bias later safety and effectiveness impressions.	Which participants were removed, for what exact reasons, and how different were they from those retained?
Single-blind enrichment run-in	Stabilize background therapy or enrich for likely responders before formal comparison.	Can turn a pragmatic-sounding trial into a selected responder study with narrower transportability.	Is the paper estimating effect in all eligible patients or only in those who pass the enrichment filter?
Procedural familiarization	Teach device use or workflow without excluding based on treatment response.	Lower risk of selection bias if exclusion is minimal and transparent, but still not irrelevant.	Did the familiarization period meaningfully exclude participants, or was it mostly training without selective attrition?

Five Failure Modes That Should Make Reviewers Sit Up Straighter

1. The run-in is described as “standard” instead of justified

“Standard” is not a scientific defense. Authors should explain what problem the run-in solves and why that problem matters more than the resulting loss of representativeness.

2. Exclusion reasons are not fully reported

If you only learn how many participants were randomized, not how many were filtered out for nausea, nonadherence, withdrawal, or early events, you are being asked to trust a missing denominator.

3. Safety claims ignore pre-randomization harm

Adverse effects that appear during the active run-in are still adverse effects. Hiding them outside the main comparison does not make them clinically irrelevant.

4. The abstract speaks about “patients” when the trial studied survivors of the screening funnel

A selected randomized cohort may be appropriate, but the language should name it honestly rather than quietly expanding the target population after the fact.

5. There is no comparison between pre-run-in eligibles and randomized participants

Without that comparison, readers cannot judge how much the run-in reshaped age, frailty, symptom burden, baseline risk, or the practical tolerability profile of the cohort.
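One common way to summarize such a comparison is a standardized mean difference (SMD) for each baseline variable. The sketch below is illustrative only: the ages are invented, the helper name is hypothetical, and the pooled-SD form used here is one convention among several. Because randomized participants are a subset of the eligible cohort, some analysts would instead compare retained against removed participants; the calculation has the same shape either way.

```python
from math import sqrt
from statistics import mean, stdev


def standardized_mean_difference(eligible_values, randomized_values):
    """SMD for a continuous baseline variable, using the pooled SD."""
    pooled_sd = sqrt((stdev(eligible_values) ** 2
                      + stdev(randomized_values) ** 2) / 2)
    return (mean(randomized_values) - mean(eligible_values)) / pooled_sd


# Hypothetical ages, invented purely for illustration.
eligible_age = [58, 63, 67, 71, 74, 78, 81, 66, 69, 72]
randomized_age = [58, 63, 67, 66, 69, 72]  # subset that passed the run-in

smd = standardized_mean_difference(eligible_age, randomized_age)
print(f"SMD for age: {smd:.2f}")
```

A negative value here means the randomized cohort is younger on average than the eligible cohort, which is the kind of shift a reader cannot detect when only post-run-in baseline tables are reported.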

What Good Reporting Looks Like

Authors should show

The exact duration and procedures of the run-in
Counts and reasons for all pre-randomization exclusions
Whether the run-in used placebo, active drug, dose titration, or adherence thresholds
Baseline characteristics for eligible participants versus randomized participants if feasible
Language in the abstract that matches the selected population actually studied
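The second item on this list, counts and reasons for every pre-randomization exclusion, lends itself to a simple consistency check. The following sketch uses a hypothetical helper, `run_in_flow`, and the counts from the toy example earlier in this guide; it is meant to show the shape of the accounting, not a standard reporting tool.

```python
def run_in_flow(eligible, exclusions, randomized):
    """Tabulate pre-randomization exclusions and check the denominator."""
    accounted = sum(exclusions.values()) + randomized
    if accounted != eligible:
        raise ValueError(
            f"{eligible - accounted} participants unaccounted for "
            "between eligibility and randomization"
        )
    rows = [("Eligible before run-in", eligible, 1.0)]
    for reason, count in exclusions.items():
        rows.append((f"Excluded: {reason}", count, count / eligible))
    rows.append(("Randomized", randomized, randomized / eligible))
    return rows


# Counts taken from the toy example earlier in this guide.
flow = run_in_flow(
    eligible=1000,
    exclusions={"intolerance": 180, "poor adherence": 180},
    randomized=640,
)
for label, count, share in flow:
    print(f"{label:<28}{count:>6}{share:>8.1%}")
```

The function raises an error when exclusions and randomized participants do not add up to the eligible total, which is the programmatic version of refusing to trust a missing denominator.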

Reviewers should ask

Would the headline effect plausibly shrink if early intolerant or poorly adherent patients stayed in view?
Does the run-in answer a mechanistic efficacy question or a real-world treatment decision question?
Are pre-randomization adverse events being undercounted in the main safety narrative?
Is the paper overselling external validity after substantial enrichment?
Would a pragmatic parallel cohort or broader effectiveness study tell a meaningfully different story?

The Practical Takeaway

Run-in periods are not methodological misconduct. They can sharpen adherence-sensitive efficacy questions, reduce operational noise, and sometimes make a trial feasible. But feasibility is not the same as innocence. Every run-in creates a second eligibility screen, and second eligibility screens tend to come with second thoughts about generalizability.

If you are reviewing a paper, the right question is not “did they use a run-in?” It is “what population survived it, what outcome question does that population justify, and which harms or burdens disappeared before the comparison even began?”

Use Aqrab when the protocol sounds tidy but the estimand does not

Aqrab is built for exactly this kind of methods friction: the place where the manuscript says “standard design feature” and the careful reader hears “unreported population change.” If you want a structured critique of eligibility filters, enrichment logic, outcome definitions, or causal language, try the review workflow at /try. Teams building their own review or protocol-audit tooling can also start at /developers.
