Causal Inference · Estimands · Trial Design

Per-Protocol Effects: The Estimand Everyone Wants and the Bias Trap They Usually Build

May 11, 2026 · 15 min read

Anas H. Alzahrani, MD PhD MPH

Department of Preventive Medicine and Public Health

Faculty of Medicine, King Abdulaziz University

Researchers say they want the effect of actually staying on treatment. Fair enough. Clinicians care about what happens if patients really follow the strategy, not just what happens after a randomization envelope is opened and life gets messy.

The problem is that many so-called per-protocol analyses are not estimating a clean sustained-adherence effect at all. They are estimating the outcome among the patients healthy enough, motivated enough, or tolerant enough of side effects to keep doing the thing. That is a very different population, and usually a much more flattering one.

What the Per-Protocol Effect Is Trying to Estimate

The intention-to-treat effect asks what happens when patients are assigned to a strategy. The per-protocol effect asks what would happen if patients followed the strategy as specified, over time, allowing only the deviations the protocol explicitly permits.
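For readers who prefer notation, here is one way to write the contrasts. The notation is ours, not standard to this article: Z is randomized assignment, \(\bar{g}_z\) is the full protocol strategy for arm z over follow-up (including the deviations it permits), and superscripts denote potential outcomes.

```latex
\begin{aligned}
\text{ITT}       &= \mathbb{E}\!\left[Y^{z=1}\right] - \mathbb{E}\!\left[Y^{z=0}\right] \\
\text{PP}        &= \mathbb{E}\!\left[Y^{\bar{g}_1}\right] - \mathbb{E}\!\left[Y^{\bar{g}_0}\right] \\
\text{naive PP}  &= \mathbb{E}\!\left[Y \mid Z=1,\ \text{adhered}\right] - \mathbb{E}\!\left[Y \mid Z=0,\ \text{adhered}\right]
\end{aligned}
```

In general the third line recovers the second only when adherence is unrelated to prognosis, which is exactly the assumption the rest of this piece questions.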

| Estimand | Main question | Usual danger |
| --- | --- | --- |
| Intention-to-treat | What is the effect of being assigned to, or starting, the strategy? | Dilution by nonadherence, if readers secretly wanted sustained use |
| Per-protocol | What is the effect if patients adhere to the protocol strategy over follow-up? | Selection bias once adherence depends on evolving prognosis |
| Naive as-treated | What happened among those who ended up taking treatment? | Post-baseline confounding and immortal time bias dressed up as pragmatism |

The per-protocol effect is often scientifically interesting. It can matter for comparative effectiveness, implementation, dose persistence, or policy planning. It just becomes treacherous the moment adherence is no longer random, which is to say: almost immediately.

Why the Simple Version Fails

Suppose a trial compares biologic A versus biologic B in inflammatory bowel disease. Patients on biologic A are more likely to stop early because of infusion reactions. Patients with worsening disease on biologic B are more likely to switch to rescue therapy.

Now imagine the analyst says: “We performed a per-protocol analysis by excluding everyone who discontinued or switched.” That sounds tidy. It is also usually wrong.

The core problem

Discontinuation and switching are often caused by toxicity, early response, disease progression, access, or clinician judgment. Those same forces are tied to the outcome. Conditioning on adherence therefore selects patients in a prognostically distorted way.

The adherers are not just “more protocol-compliant versions” of the original cohort. They are a selected subgroup shaped by post-baseline events. Once you restrict to them without handling that selection properly, the comforting phrase “per-protocol” becomes mostly decorative.
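To see how much damage exclusion alone can do, here is a minimal simulation loosely modeled on the biologic A versus B example. Every number is invented, treatment has no effect on the outcome at all, and a prognostic factor (called `frail` here, an assumption for illustration) makes patients in arm 1 far more likely to stop. The only difference between the two estimates is whether deviators are dropped.

```python
import random

def simulate_trial(n=200_000, seed=2026):
    """Two-arm trial where treatment has NO effect on the outcome, but frail
    patients in arm 1 (biologic A) discontinue far more often than anyone else."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        arm1 = rng.random() < 0.5                     # randomized assignment
        frail = rng.random() < 0.3                    # prognostic factor
        p_stop = 0.5 if (arm1 and frail) else 0.1     # deviation depends on prognosis in arm 1
        stopped = rng.random() < p_stop
        event = rng.random() < (0.40 if frail else 0.10)  # outcome ignores the arm: null effect
        rows.append((arm1, stopped, event))
    return rows

def risk(rows, arm1, adherers_only=False):
    """Event proportion in one arm, optionally after dropping everyone who stopped."""
    events = [e for a, s, e in rows if a == arm1 and not (adherers_only and s)]
    return sum(events) / len(events)

rows = simulate_trial()
itt_rd = risk(rows, True) - risk(rows, False)
naive_pp_rd = risk(rows, True, adherers_only=True) - risk(rows, False, adherers_only=True)
print(f"ITT risk difference:             {itt_rd:+.3f}")
print(f"Naive 'per-protocol' difference: {naive_pp_rd:+.3f}")
```

With these made-up parameters, adherers in arm 1 are about 19% frail versus 30% in arm 0, so the expected naive difference is roughly -0.03 even though the true effect is zero, while the ITT difference sits near zero.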

Estimand triage

Is this a real per-protocol analysis, or just post-baseline wishful thinking? Four questions usually tell you whether you are still answering an assignment question, whether a per-protocol effect is plausible, or whether the analysis is quietly turning into a selected-patient comparison:

  • What is the main question: the effect of assignment, or the effect of sustained adherence?
  • Will clinically meaningful nonadherence or crossover happen?
  • Are you planning to exclude, censor, or reclassify patients once they deviate?
  • How well can you measure the time-varying reasons for adherence or discontinuation?

Red flag: naïve per-protocol

Take the worst combination: a sustained-adherence question, meaningful deviation, patients excluded once they deviate, and weak measurement of why they deviate.

What you are probably estimating: mostly a selected-survivor comparison, not a clean protocol effect.

Dropping nonadherent patients after baseline when adherence depends on evolving symptoms, toxicity, or clinician judgment usually creates selection bias. With weak measurement of those drivers, the estimate becomes very hard to defend.

Main warning

“We excluded patients who stopped treatment” is often just immortal time bias and informative censoring wearing a protocol badge.

What to do next

  • Do not present this as a credible per-protocol effect
  • Either stay with ITT or redesign the per-protocol analysis explicitly
  • Collect or model time-varying predictors of adherence and censoring before trying g-methods
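The four triage questions can also be written down as an explicit decision rule, which makes the logic reviewable. This is a hypothetical encoding: the function, argument names, and category labels are ours, and it reflects one reading of the guidance above, not the logic of any particular tool.

```python
from enum import Enum

class MainQuestion(Enum):
    ASSIGNMENT = "effect of assignment or initiation"
    SUSTAINED = "effect of sustained adherence"

def triage(question, deviation_expected, handling, measurement):
    """Hypothetical encoding of the four triage questions.

    handling: "none", "exclude", or "censor_and_weight" (what happens at deviation)
    measurement: "weak" or "strong" (time-varying reasons for deviation)
    """
    if question is MainQuestion.ASSIGNMENT:
        return "ITT: the assignment question is the target"
    if not deviation_expected:
        return "ITT and per-protocol should nearly coincide"
    if handling == "exclude":
        # Dropping deviators conditions on a post-baseline, prognosis-driven event
        return "Red flag: selected-survivor comparison, not a protocol effect"
    if handling == "censor_and_weight" and measurement == "strong":
        return "Plausible per-protocol effect, pending positivity checks"
    return "Not yet credible: stay with ITT or redesign the per-protocol analysis"

print(triage(MainQuestion.SUSTAINED, True, "exclude", "weak"))
```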

When Per-Protocol Is Worth Estimating

Per-protocol becomes worth the effort when the practical question is genuinely about sustained use rather than mere assignment.

Good fit

  • Does continuing anticoagulation for 12 months reduce recurrence?
  • What is the effect of remaining on first-line biologic therapy unless prespecified toxicity occurs?
  • What happens under sustained use of a monitoring protocol, not just initial enrollment?

Bad fit

  • You mainly need the effect of launching a policy or recommending a first prescription.
  • You cannot define what counts as deviation, grace period, or rescue treatment.
  • You have little information on why patients stop, switch, or drift off protocol.

A useful rule: if you cannot describe the sustained strategy in protocol language a clinician could actually follow, you are not ready to estimate its effect.

The Design Work Most Papers Skip

  1. Define adherence operationally. Dose windows, grace periods, allowable interruptions, rescue medications, and switching rules should be explicit.
  2. State when deviation occurs. The clock matters. “Ever discontinued” is usually too crude and leaks future information backward into baseline language.
  3. Decide how intercurrent events are handled. Toxicity, pregnancy, transplant, treatment escalation, death, and loss to follow-up do not all belong in the same bucket.
  4. Measure post-baseline predictors of deviation. Symptoms, adverse events, biomarkers, access disruptions, and clinician decisions are not optional scenery.
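  5. Choose a method that matches the problem. Once adherence is time-varying, ordinary regression rarely deserves your trust: the factors that drive deviation are often themselves affected by earlier treatment, and conditioning on them in a standard model can block part of the effect or open new biasing paths.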
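Items 1 and 2 are easier to audit when they are executable. Here is a minimal sketch, assuming pharmacy fill records expressed as (day of fill, days supplied) with day 0 as time zero, carry-over of early refills, and a convention that deviation is dated to the day the grace period runs out. The function name and these conventions are illustrative assumptions, not a standard.

```python
def deviation_day(fills, grace_days, follow_up_days):
    """Return the day a patient deviates from continuous use, or None if they
    stay on protocol through follow-up.

    fills: (fill_day, days_supply) pairs, with day 0 = time zero (initiation).
    Assumptions (illustrative): early refills carry over, and deviation is dated
    to covered_until + grace_days unless a refill arrives by that day.
    """
    covered_until = 0  # first day not covered by any supply dispensed so far
    for fill_day, supply in sorted(fills):
        if fill_day > covered_until + grace_days:
            break  # refill arrived after the grace period ended: deviation already occurred
        covered_until = max(covered_until, fill_day) + supply
    deviation = covered_until + grace_days
    return deviation if deviation < follow_up_days else None

print(deviation_day([(0, 30), (25, 30), (100, 30)], grace_days=30, follow_up_days=365))
```

Here fills on days 0 and 25 (30 days each) followed by a refill on day 100 give a deviation on day 90. The patient then contributes follow-up as adherent until that day, rather than being labeled by what they did over the whole year.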

This is where target trial emulation and g-method thinking earn their rent. A defensible per-protocol analysis behaves less like a convenience subgroup and more like an explicit longitudinal intervention question.

Clinical Example: Statin Discontinuation After Myocardial Infarction

Imagine an observational study comparing patients who continue high-intensity statins for one year after myocardial infarction with those who discontinue early.

Early discontinuation is not random. Patients may stop because of myalgias, frailty, drug interactions, liver test changes, cost, or because they are clinically deteriorating and medications are being simplified. Those reasons are related to future cardiovascular risk and mortality.

What a naive analysis does

Compare “continuers” versus “discontinuers” after reclassifying exposure over follow-up. Result: continuers look healthier. Shocking news from the department of exactly what you selected.
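The look-ahead problem can bite even when stopping is unrelated to prognosis. The sketch below is a hypothetical simulation (all rates invented) in which statins have no effect on death and discontinuation is independent of health. The only flaw is labeling patients as continuers because they were still on therapy, and therefore alive, at one year, then counting deaths from time zero.

```python
import random

def immortal_time_demo(n=100_000, seed=7):
    """Two-year mortality by a look-ahead 'continuer' label when statins do nothing."""
    rng = random.Random(seed)
    deaths = {"continuer": 0, "discontinuer": 0}
    counts = {"continuer": 0, "discontinuer": 0}
    for _ in range(n):
        death_yr = rng.expovariate(0.2)  # time to death in years; unaffected by statins
        stop_yr = rng.expovariate(0.5)   # time to stopping; independent of prognosis here
        # Naive label: "continuer" means still on statin, and therefore alive, at 1 year
        group = "continuer" if (death_yr > 1 and stop_yr > 1) else "discontinuer"
        counts[group] += 1
        deaths[group] += death_yr <= 2   # ...but deaths are counted from time zero
    return {g: deaths[g] / counts[g] for g in counts}

risks = immortal_time_demo()
print({g: round(r, 3) for g, r in risks.items()})
```

In expectation, two-year mortality is about 18% among "continuers" versus about 48% among "discontinuers", a large spurious benefit produced purely by the classification rule. Real data add prognosis-driven stopping on top of this.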

What a real per-protocol analysis needs

A defined treatment strategy, explicit deviation rules, time-updated information on symptoms and clinical status, and a method such as inverse-probability weighting to account for informative adherence and censoring.
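Here is a minimal sketch of inverse-probability weighting for informative adherence, under strong simplifying assumptions: a single post-baseline check, one binary marker of worsening (`marker`, hypothetical) that captures every common cause of deviation and outcome, and adherence probabilities estimated by stratum proportions rather than the pooled logistic models a real longitudinal analysis would use. Deviators are artificially censored at deviation, and adherers are reweighted by the inverse of their estimated probability of staying on protocol.

```python
import random
from collections import defaultdict

# Hypothetical probability of deviating, keyed by (arm, worsening marker)
P_DEVIATE = {(1, 1): 0.6, (1, 0): 0.1, (0, 1): 0.2, (0, 0): 0.1}

def simulate(n=400_000, seed=11):
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        arm = int(rng.random() < 0.5)        # randomized assignment
        marker = int(rng.random() < 0.4)     # post-baseline worsening, measured
        adhered = int(rng.random() >= P_DEVIATE[(arm, marker)])
        # Outcome under sustained adherence (true per-protocol RD = -0.05).
        # Deviators are censored at deviation, so their y is never used.
        y = int(rng.random() < 0.10 + 0.30 * marker - 0.05 * arm)
        data.append((arm, marker, adhered, y))
    return data

def per_protocol_risks(data):
    # Estimate P(stay on protocol | arm, marker) within each stratum (saturated model)
    n_total, n_stay = defaultdict(int), defaultdict(int)
    for arm, marker, adhered, _ in data:
        n_total[(arm, marker)] += 1
        n_stay[(arm, marker)] += adhered
    p_stay = {k: n_stay[k] / n_total[k] for k in n_total}

    naive, ipw = {}, {}
    for arm in (0, 1):
        kept = [(m, y) for a, m, adh, y in data if a == arm and adh == 1]
        naive[arm] = sum(y for _, y in kept) / len(kept)
        # Inverse-probability-of-adherence weights restore the full arm's marker mix
        w = [1 / p_stay[(arm, m)] for m, _ in kept]
        ipw[arm] = sum(wi * y for wi, (_, y) in zip(w, kept)) / sum(w)
    return naive, ipw

naive, ipw = per_protocol_risks(simulate())
naive_rd = naive[1] - naive[0]
ipw_rd = ipw[1] - ipw[0]
print(f"naive RD {naive_rd:+.3f} | IPW RD {ipw_rd:+.3f} | truth -0.050")
```

With these invented parameters the true per-protocol risk difference is -0.05; the naive adherers-only contrast lands near -0.09, while the weighted contrast should land close to -0.05. With many intervals, the weight becomes a cumulative product of per-interval inverse probabilities (usually stabilized), and the assumption that measured covariates explain deviation still has to be argued on clinical grounds.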

Decision Rules for Reviewers and Analysts

| If you see... | Assume... | What should happen next |
| --- | --- | --- |
| “Patients who deviated were excluded.” | Selection bias is likely. | Ask what caused deviation and how post-baseline predictors were handled. |
| No definition of grace period, switching, or dose holds. | The estimand is still mushy. | Request protocol-style operational definitions before trusting the result. |
| Ordinary Cox or logistic regression after reclassifying exposure over time. | Time-varying confounding may be running the show. | Consider IP weighting, g-computation, or a better emulation of the sustained strategy. |
| Large ITT/per-protocol divergence with little explanation. | Either adherence matters a lot or bias does. | Demand adherence diagnostics, deviation reasons, and sensitivity analysis. |

The Failure Modes That Keep Reappearing

1. Smuggling in future information

Exposure groups defined by what patients eventually did over follow-up create timeline distortions fast. If someone must survive long enough to be counted as adherent, the bias has already started unpacking.

2. Treating adherence like a personality trait

Patients do not stop treatment because they hate methodology. They stop because symptoms change, toxicities happen, logistics fail, or clinicians intervene. Those reasons matter causally.

3. Using “per-protocol” as a synonym for “secondary analysis”

A secondary analysis can still be good. But calling it per-protocol does not spare it from having to define the protocol and estimate the right contrast.

4. Forgetting positivity

If some clinically important subgroups almost never stay on therapy, the sustained-adherence estimand may become unstable or effectively unidentified for them.
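A first positivity diagnostic can be as simple as tabulating how often patients in each clinically defined stratum actually stay on protocol. The sketch below is hypothetical: the stratum labels, the 5% threshold, and the counts in the usage example are all invented. A stratum where almost nobody adheres implies enormous inverse-probability weights, or no usable information at all if nobody adheres.

```python
from collections import Counter

def positivity_check(records, min_prob=0.05):
    """records: iterable of (stratum_label, adhered) with adhered in {0, 1}.

    Returns {stratum: (p_adhere, implied_weight)} for strata whose empirical
    probability of staying on protocol falls below min_prob.
    """
    total, stayed = Counter(), Counter()
    for stratum, adhered in records:
        total[stratum] += 1
        stayed[stratum] += adhered
    flagged = {}
    for stratum, n in total.items():
        p = stayed[stratum] / n
        if p < min_prob:
            weight = float("inf") if p == 0 else 1 / p  # weight an adherer would carry
            flagged[stratum] = (p, weight)
    return flagged

# Invented example: very few frail older patients stay on therapy
records = ([("age>=85, frail", 1)] * 2 + [("age>=85, frail", 0)] * 48
           + [("other", 1)] * 700 + [("other", 0)] * 300)
flagged = positivity_check(records)
print(flagged)
```

Flagged strata are where the sustained-adherence estimand may be unstable or effectively unidentified, and they deserve explicit discussion rather than silent weight trimming.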

Where Aqrab Fits

Per-protocol claims are exactly where method sections start sounding confident while quietly changing the question. Aqrab is useful here because it can pressure-test the estimand, the timeline, the deviation rules, and whether the analysis actually handles post-baseline selection instead of just renaming it.

If you are reviewing a protocol, manuscript, or AI-generated methods section that suddenly promises the effect of sustained treatment, try Aqrab. If you want that critique embedded upstream in your own workflows, the developer docs are the cleaner place to start.

The Practical Bottom Line

The per-protocol effect is not fake. It is often the question people actually care about. But it only becomes credible when you define the sustained strategy clearly and treat adherence as a time-varying process, not a polite afterthought.

If the analysis removes deviators and calls it a day, be suspicious. If it defines deviations, measures why they happen, aligns the timeline, and uses methods built for longitudinal selection, now we are talking.

In other words: per-protocol is a real estimand. Naive per-protocol is usually just optimism with attrition.
