Run-In Periods: When Your Trial Randomizes the Easy Patients First
Anas H. Alzahrani, MD PhD MPH
Department of Preventive Medicine and Public Health
Faculty of Medicine, King Abdulaziz University
A surprising number of trials want credit for studying ordinary clinical practice after first removing the patients who behave most like ordinary clinical practice. That is the run-in period in one sentence.
A run-in period is a pre-randomization interval during which eligible participants may be exposed to placebo, active treatment, adherence monitoring, or procedural rehearsal before investigators decide who actually gets randomized. Sometimes this is useful. Sometimes it is a quiet filtration system for intolerance, poor adherence, early complications, or lack of enthusiasm. The protocol says “optimization.” The estimand often says “we switched populations midway through the recruitment story.”
The Core Design Rule
A run-in period is defensible only if the trial is explicit about the question it creates. If the aim is to estimate treatment effect among participants who can tolerate and sustain the regimen, say that. If the aim is effectiveness for the patients clinicians will actually try to treat, then pre-randomization pruning needs much more skepticism.
Decision rule:
When a run-in excludes people for intolerance, early nonadherence, or early events, assume the trial now estimates a narrower, more treatment-friendly question unless the authors prove otherwise.
Or less politely: if your trial only randomizes the patients who already demonstrated they can live with the intervention, do not market the result as if the difficult patients never existed.
Why Run-In Periods Matter More Than Their Footnote Status Suggests
1. They change who gets randomized
A run-in is not mere logistics. It actively determines whether the randomized cohort includes people prone to adverse effects, practical barriers, early worsening, or weak adherence.
2. They can inflate tolerability and adherence
If participants who stop early never reach randomization, later estimates of side-effect burden and persistence can look suspiciously civilized.
3. They quietly alter generalizability
The headline may sound population-wide, but the true population may be “patients who already passed a rehearsal exam for treatment success.”
A Concrete Clinical Example
Imagine a cardiovascular outcomes trial of a once-daily metabolic drug. Before randomization, everyone enters a four-week active run-in. Participants who miss pill-count targets, report troublesome nausea, or stop therapy early are excluded from randomization.
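Writing the screening rule out explicitly makes it obvious which participants never reach randomization. The sketch below is hypothetical. The example names pill-count targets, troublesome nausea, and early stopping but gives no thresholds. The 80% adherence cutoff, the field names, and the first-match ordering of reasons are therefore all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical threshold: the example mentions "pill-count targets" without a number.
MIN_PILL_COUNT_ADHERENCE = 0.80  # assumed minimum share of expected doses taken


@dataclass
class RunInRecord:
    participant_id: str
    pill_count_adherence: float  # doses counted as taken / doses expected during run-in
    troublesome_nausea: bool
    stopped_early: bool


def run_in_exclusion_reason(rec: RunInRecord) -> Optional[str]:
    """Return the first exclusion reason that applies, or None if the participant is randomized."""
    if rec.stopped_early:
        return "stopped therapy early"
    if rec.troublesome_nausea:
        return "troublesome nausea"
    if rec.pill_count_adherence < MIN_PILL_COUNT_ADHERENCE:
        return "missed pill-count target"
    return None


def screen(records: List[RunInRecord]) -> Tuple[List[str], Dict[str, int]]:
    """Split run-in participants into randomized IDs and a count of exclusion reasons."""
    randomized: List[str] = []
    reasons: Dict[str, int] = {}
    for rec in records:
        reason = run_in_exclusion_reason(rec)
        if reason is None:
            randomized.append(rec.participant_id)
        else:
            reasons[reason] = reasons.get(reason, 0) + 1
    return randomized, reasons


# Hypothetical run-in data for four participants.
records = [
    RunInRecord("P01", 0.95, troublesome_nausea=False, stopped_early=False),
    RunInRecord("P02", 0.60, troublesome_nausea=False, stopped_early=False),
    RunInRecord("P03", 0.90, troublesome_nausea=True, stopped_early=False),
    RunInRecord("P04", 0.40, troublesome_nausea=False, stopped_early=True),
]
randomized, reasons = screen(records)
print("Randomized:", randomized)
print("Excluded by reason:", reasons)
```

Assigning one first-matching reason per person keeps the counts additive for a flow diagram. The cost is that someone with both nausea and poor adherence is counted only once, so a real report may also want to list overlapping reasons.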
What the trial gains
Better protocol adherence, fewer immediate discontinuations, cleaner exposure contrast, and less operational chaos after randomization.
What the trial loses
It has now removed exactly the people clinicians most need help understanding: those who feel sick, struggle with routines, or declare early that the regimen is harder than the brochure promised.
What the abstract should say
“Among participants who tolerated and adhered through a pre-randomization run-in, treatment reduced risk…” That sentence is less glamorous and much more honest.
Toy run-in model
A cleaner randomized cohort can appear before treatment effect estimation even begins.
This toy model assumes that a pre-randomization run-in removes people who cannot tolerate or sustain the regimen. Compare the apparent event risk in the randomized sample with the risk in the full eligible cohort that existed before the run-in started pruning it.
| Toy-model metric | Value | Meaning |
|---|---|---|
| Randomized fraction | 64.0% | Share of the original eligible cohort that survives the tolerability and adherence screen. |
| Observed trial risk | 11.0% | Event risk inside the polished post-run-in sample that readers often mistake for the answer in ordinary patients. |
| All-comer risk | 14.2% | Event risk if the full eligible population, including early stoppers and poor adherers, still counted. |
| Stage | Participants | What happened |
|---|---|---|
| Eligible before run-in | 1,000 | The real target population that investigators usually invoke in the abstract. |
| Removed for intolerance | 180 | Early adverse effects, inconvenience, or intolerance can vanish from the randomized sample before treatment comparison begins. |
| Removed for poor adherence | 180 | The run-in can also exclude people who struggle with the regimen the real world will later ask them to sustain. |
| Randomized cohort | 640 | Cleaner, more adherent, more treatment-tolerant, and usually less representative than the recruitment pitch implies. |
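The toy numbers fit together with simple arithmetic. The model does not state the event risk among the 360 excluded participants. The sketch below back-calculates it under one assumption: that the all-comer risk is the head-count-weighted average of the retained and excluded risks. The function names are illustrative.

```python
from typing import List


def randomized_fraction(eligible: int, removed_counts: List[int]) -> float:
    """Share of the eligible cohort that survives the run-in and is randomized."""
    return (eligible - sum(removed_counts)) / eligible


def implied_excluded_risk(eligible: int, randomized: int,
                          all_comer_risk: float, observed_risk: float) -> float:
    """Back-calculate the event risk among excluded participants, assuming the
    all-comer risk is the head-count-weighted average of retained and excluded risks."""
    excluded = eligible - randomized
    return (all_comer_risk * eligible - observed_risk * randomized) / excluded


eligible = 1000
removed = [180, 180]  # intolerance, poor adherence (from the toy table)
frac = randomized_fraction(eligible, removed)
randomized = eligible - sum(removed)
excluded_risk = implied_excluded_risk(eligible, randomized,
                                      all_comer_risk=0.142, observed_risk=0.110)
print(f"Randomized fraction: {frac:.1%}")
print(f"Implied event risk among the {eligible - randomized} excluded: {excluded_risk:.1%}")
```

Under these toy numbers, the excluded group carries an implied event risk of roughly 20%. That is about 1.8 times the 11.0% seen in the randomized cohort, and it is exactly the risk the trial never observes under randomization.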
How to read the toy model
This is not a full trial simulator. It isolates one mechanism: pre-randomization selection can make the randomized cohort healthier, more adherent, and easier to treat than the eligible population named in the protocol.
Sometimes that is acceptable because the scientific question is explicitly about sustained use among people who can tolerate the regimen. Trouble starts when the paper quietly generalizes back to all likely users.
Decision rule
If a run-in excludes people for the very behaviors or harms clinicians care about, the trial estimates a narrower question than “what happens when we prescribe this treatment?”
The stronger the pruning before randomization, the less comfortable you should be generalizing benefit, tolerability, or adherence claims to ordinary practice.
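The claim that stronger pruning widens the gap can be shown with a deliberately extreme hypothetical. Every eligible participant has a baseline event risk spread evenly between 5% and 25%, and the run-in removes the highest-risk participants first, as if intolerance and nonadherence tracked prognosis perfectly. Real run-ins select far less cleanly. Treat this as an upper-bound sketch of the mechanism, not an estimate. The risk range and function names are assumptions.

```python
from typing import List, Tuple


def eligible_risks(n: int = 1000, low: float = 0.05, high: float = 0.25) -> List[float]:
    """Hypothetical eligible cohort whose baseline risks are spread evenly from low to high."""
    return [low + (high - low) * i / (n - 1) for i in range(n)]


def observed_vs_all_comer(risks: List[float], pruned_fraction: float) -> Tuple[float, float]:
    """Mean risk among participants left after removing the highest-risk share,
    and mean risk in the full eligible cohort."""
    if not 0.0 <= pruned_fraction < 1.0:
        raise ValueError("pruned_fraction must be in [0, 1)")
    ranked = sorted(risks)
    n_kept = round(len(ranked) * (1 - pruned_fraction))
    if n_kept == 0:
        raise ValueError("pruning leaves nobody to randomize")
    kept = ranked[:n_kept]
    return sum(kept) / len(kept), sum(ranked) / len(ranked)


risks = eligible_risks()
for pruned in (0.0, 0.2, 0.36, 0.5):
    observed, all_comer = observed_vs_all_comer(risks, pruned)
    print(f"pruned {pruned:4.0%}: trial risk {observed:.1%}, "
          f"all-comer risk {all_comer:.1%}, gap {all_comer - observed:.1%}")
```

The all-comer risk never moves, because the eligible population never changes. Only the randomized slice gets healthier, so the gap between trial risk and all-comer risk widens as the pruned share grows.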
Not All Run-Ins Are the Same, and That Distinction Matters
| Run-in type | Why investigators use it | Main methodological cost | Reviewer question |
|---|---|---|---|
| Placebo run-in | Rehearse follow-up, wash out prior therapy, or identify participants unlikely to comply with study procedures. | Selects for engagement and persistence before the trial even starts comparing treatments. | Were participants excluded primarily for protocol convenience or for a scientific reason tied to the estimand? |
| Active-treatment run-in | Identify intolerance, early response, or feasible dose escalation before randomization. | Removes early harms and difficult users, which can bias later safety and effectiveness impressions. | Which participants were removed, for what exact reasons, and how different were they from those retained? |
| Single-blind enrichment run-in | Stabilize background therapy or enrich for likely responders before formal comparison. | Can turn a pragmatic-sounding trial into a selected responder study with narrower transportability. | Is the paper estimating effect in all eligible patients or only in those who pass the enrichment filter? |
| Procedural familiarization | Teach device use or workflow without excluding based on treatment response. | Lower risk of selection bias if exclusion is minimal and transparent, but still not irrelevant. | Did the familiarization period meaningfully exclude participants, or was it mostly training without selective attrition? |
Five Failure Modes That Should Make Reviewers Sit Up Straighter
1. The run-in is described as “standard” instead of justified
“Standard” is not a scientific defense. Authors should explain what problem the run-in solves and why that problem matters more than the resulting loss of representativeness.
2. Exclusion reasons are not fully reported
If you only learn how many participants were randomized, not how many were filtered out for nausea, nonadherence, withdrawal, or early events, you are being asked to trust a missing denominator.
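The missing-denominator problem is easy to check mechanically. Every participant who entered the run-in should be either randomized or assigned a reported exclusion reason. This is a minimal sketch using the toy-model counts. The function name and the incomplete second report are hypothetical.

```python
from typing import Dict


def unaccounted_participants(entered_run_in: int, randomized: int,
                             exclusions: Dict[str, int]) -> int:
    """Number of run-in entrants who are neither randomized nor given an exclusion reason."""
    accounted = randomized + sum(exclusions.values())
    if accounted > entered_run_in:
        raise ValueError("More participants accounted for than entered the run-in")
    return entered_run_in - accounted


# Complete report, matching the toy table: nobody is missing.
complete = unaccounted_participants(1000, 640, {"intolerance": 180, "poor adherence": 180})

# Hypothetical incomplete report that lists only intolerance exclusions.
incomplete = unaccounted_participants(1000, 640, {"intolerance": 180})

print("Unaccounted (complete report):", complete)
print("Unaccounted (incomplete report):", incomplete)
```

A nonzero result does not prove anything sinister. It does mark exactly the people whose fate the reader is being asked to take on trust.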
3. Safety claims ignore pre-randomization harm
Adverse effects that appear during the active run-in are still adverse effects. Hiding them outside the main comparison does not make them clinically irrelevant.
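One way to keep run-in harm in view is to chain the stage-specific stopping probabilities. That way a low post-randomization discontinuation rate is not read as the whole story. This sketch assumes each probability applies only to people who reached that stage. The 18% run-in figure reuses the toy table's 180 intolerance removals out of 1,000. The 6% post-randomization rate in the active arm is purely hypothetical.

```python
from typing import List


def cumulative_stopping(stage_probabilities: List[float]) -> float:
    """Probability of stopping for intolerance at some stage, where each entry is the
    probability of stopping among those still on treatment when that stage begins."""
    still_on_treatment = 1.0
    for p in stage_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        still_on_treatment *= 1.0 - p
    return 1.0 - still_on_treatment


run_in_intolerance = 180 / 1000  # toy table: removed for intolerance during the run-in
post_randomization = 0.06        # hypothetical active-arm discontinuation for adverse effects

total = cumulative_stopping([run_in_intolerance, post_randomization])
print(f"Reported after randomization: {post_randomization:.1%}")
print(f"Across run-in and trial:      {total:.1%}")
```

Under these assumptions, roughly 23% of the people who started the drug stopped for intolerance. That is nearly four times the 6% a post-randomization safety table alone would show.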
4. The abstract speaks about “patients” when the trial studied survivors of the screening funnel
A selected randomized cohort may be appropriate, but the language should name it honestly rather than quietly expanding the target population after the fact.
5. There is no comparison between pre-run-in eligibles and randomized participants
Without that comparison, readers cannot judge how much the run-in reshaped age, frailty, symptom burden, baseline risk, or the practical tolerability profile of the cohort.
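A common way to summarize that comparison is the standardized mean difference (SMD) for each baseline characteristic: the difference in means divided by the pooled standard deviation. The sketch compares randomized participants with those excluded during the run-in. This shows the same reshaping as an eligible-versus-randomized table. The ages are invented for illustration. The 0.1 flag is a widely used rule of thumb, not a formal test.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence


def standardized_mean_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """SMD for a continuous variable: (mean_a - mean_b) / sqrt((sd_a^2 + sd_b^2) / 2)."""
    pooled_sd = sqrt((stdev(a) ** 2 + stdev(b) ** 2) / 2)
    return (mean(a) - mean(b)) / pooled_sd


# Hypothetical baseline ages (years).
randomized_ages = [58, 61, 63, 60, 57, 62]
excluded_ages = [66, 70, 64, 72, 68]

smd = standardized_mean_difference(randomized_ages, excluded_ages)
print(f"Age SMD (randomized vs excluded): {smd:.2f}")
if abs(smd) > 0.1:  # conventional rough threshold for a meaningful imbalance
    print("Run-in appears to have reshaped the age distribution.")
```

The same function can be repeated for frailty scores, symptom burden, or baseline risk. Binary characteristics need the proportion-based version of the SMD instead.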
What Good Reporting Looks Like
Authors should show
- The exact duration and procedures of the run-in
- Counts and reasons for all pre-randomization exclusions
- Whether the run-in used placebo, active drug, dose titration, or adherence thresholds
- Baseline characteristics for eligible participants versus randomized participants if feasible
- Language in the abstract that matches the selected population actually studied
Reviewers should ask
- Would the headline effect plausibly shrink if early intolerant or poorly adherent patients stayed in view?
- Does the run-in answer a mechanistic efficacy question or a real-world treatment decision question?
- Are pre-randomization adverse events being undercounted in the main safety narrative?
- Is the paper overselling external validity after substantial enrichment?
- Would a pragmatic parallel cohort or broader effectiveness study tell a meaningfully different story?
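The first reviewer question can be turned into a quick sensitivity calculation. Rebuild a risk ratio for the full eligible population from the randomized cohort's result plus an assumed effect among the excluded. Everything here is hypothetical: the 0.80 trial risk ratio, the 20% control-arm risk among excluded participants, and the range of effects they might experience. A ratio of 1.0 means no benefit, for example because those participants would not stay on treatment. The 640/360 split and the 11.0% risk echo the toy model only for continuity.

```python
def all_comer_risk_ratio(n_kept: int, control_risk_kept: float, rr_kept: float,
                         n_excluded: int, control_risk_excluded: float,
                         rr_excluded: float) -> float:
    """Marginal risk ratio if everyone eligible were treated versus untreated,
    assuming the stated stratum-specific control risks and risk ratios."""
    untreated_events = n_kept * control_risk_kept + n_excluded * control_risk_excluded
    treated_events = (n_kept * control_risk_kept * rr_kept
                      + n_excluded * control_risk_excluded * rr_excluded)
    return treated_events / untreated_events


for rr_excluded in (0.80, 0.90, 1.00):
    rr = all_comer_risk_ratio(640, 0.110, 0.80, 360, 0.20, rr_excluded)
    print(f"effect among excluded RR={rr_excluded:.2f} -> all-comer RR={rr:.2f}")
```

If the excluded participants get the same effect, the all-comer ratio stays at 0.80. If they get none, it drifts to about 0.90 in this sketch, roughly halving the relative risk reduction that the headline implies.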
The Practical Takeaway
Run-in periods are not methodological misconduct. They can sharpen adherence-sensitive efficacy questions, reduce operational noise, and sometimes make a trial feasible. But feasibility is not the same as innocence. Every run-in creates a second eligibility screen, and second eligibility screens tend to come with second thoughts about generalizability.
If you are reviewing a paper, the right question is not “did they use a run-in?” It is “what population survived it, what outcome question does that population justify, and which harms or burdens disappeared before the comparison even began?”
Use Aqrab when the protocol sounds tidy but the estimand does not
Aqrab is built for exactly this kind of methods friction: the place where the manuscript says “standard design feature” and the careful reader hears “unreported population change.” If you want a structured critique of eligibility filters, enrichment logic, outcome definitions, or causal language, try the review workflow at /try. Teams building their own review or protocol-audit tooling can also start at /developers.