Estimands: The Causal Question You Should Define Before Running the Analysis
Most causal arguments fall apart before the model runs. Not because the regression was wrong, but because the research team never pinned down the exact effect they wanted. They say they are estimating “the effect of treatment” when that phrase could refer to any of several different causal questions.
My take is blunt: if the estimand is vague, the analysis is performative. You cannot judge whether a method is correct if the causal question itself is still fuzzy.
What an Estimand Actually Is
An estimand is the precise treatment effect you want to estimate. It forces you to define the population, treatment strategies, outcome, how you will handle events after treatment starts, and the summary measure used to compare groups.
Core idea:
The estimand is the question. The estimator is the tool. People love arguing about tools while quietly skipping the question.
This is why two studies can analyze the same treatment and the same dataset yet reach different conclusions without either one being mathematically wrong. They may be targeting different effects.
The Fast Intuition
Suppose you are studying a new antihypertensive drug. Are you asking what happens if patients are assigned to start it, regardless of later discontinuation? Or what happens if they actually stay on it as intended? Or what happens before rescue medication gets added?
Those are not wording tweaks. They are different causal questions, and they lead to different design choices, censoring rules, and interpretations.
The Five Pieces You Need to Specify
| Component | What it answers | Typical failure |
|---|---|---|
| Population | Who is the effect for? | Eligibility is vague or changes after analysis starts. |
| Treatment strategies | What interventions are being compared? | “Treatment” is defined loosely or with future information. |
| Outcome | What endpoint matters? | Composite outcomes hide clinically different events. |
| Intercurrent events | What do we do with discontinuation, switching, rescue therapy, death? | Post-baseline events are handled ad hoc after seeing the data. |
| Summary measure | Risk ratio, risk difference, hazard ratio, mean difference? | Measure is chosen for convenience, not because it matches the decision problem. |
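One way to make the five components hard to skip is to write them down as a structured record before any modeling starts. The sketch below is only an illustration: the `Estimand` class, its field names, and the antihypertensive example values are all hypothetical, and the strategy names are simply the four discussed in this article.

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Tuple

# Strategy labels borrowed from this article; the class itself is a hypothetical helper.
KNOWN_STRATEGIES = {"treatment policy", "hypothetical", "composite", "while on treatment"}


@dataclass(frozen=True)
class Estimand:
    population: str
    treatment_strategies: Tuple[str, str]
    outcome: str
    intercurrent_events: Dict[str, str]  # event name -> handling strategy
    summary_measure: str

    def problems(self) -> List[str]:
        """List empty components and events without a recognized strategy."""
        found = [f"{f.name} is not specified" for f in fields(self)
                 if not getattr(self, f.name)]
        for event, strategy in self.intercurrent_events.items():
            if strategy not in KNOWN_STRATEGIES:
                found.append(f"no recognized strategy for '{event}'")
        return found


# Illustrative values only, not taken from any real protocol.
bp_estimand = Estimand(
    population="adults with newly diagnosed hypertension",
    treatment_strategies=("start new drug", "start standard care"),
    outcome="systolic blood pressure at 12 months",
    intercurrent_events={
        "discontinuation": "treatment policy",
        "rescue medication": "hypothetical",
        "death": "",  # deliberately left undecided
    },
    summary_measure="mean difference",
)

print(bp_estimand.problems())  # ["no recognized strategy for 'death'"]
```

The check is shallow on purpose. It cannot tell whether the choices are sensible, only whether someone made them.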
The Part Everyone Screws Up: Intercurrent Events
Intercurrent events are things that happen after treatment starts and change the exposure a patient actually receives, the interpretation of the outcome, or whether the outcome can be measured at all. Treatment discontinuation, switching, transplant, rescue medication, pregnancy, and death are classic examples.
Most papers deal with these badly. They either ignore them, censor them thoughtlessly, or bury the handling in the supplement. But how you handle them is not a technical footnote. It defines the estimand.
The Common Estimand Strategies
Treatment policy
Estimate the effect of starting treatment regardless of later deviations. This is closest to intention-to-treat logic.
Hypothetical
Estimate what would happen if an intercurrent event did not occur, such as no rescue medication or no discontinuation.
Composite
Fold the intercurrent event into the endpoint itself, for example death or treatment failure as a combined outcome.
While on treatment
Restrict the effect to the period before discontinuation or switching. Useful sometimes, but easy to bias if handled naively, because who stops treatment, and when, is rarely random.
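To see that these strategies are different questions rather than different wordings, it helps to apply them to the same records. The sketch below uses a made-up twelve-patient cohort with 12 months of follow-up; the `Patient` fields and the `had_outcome` rules are illustrative assumptions, and the while-on-treatment rule is the crude, naive version. The hypothetical strategy is left out because it needs a model for what would have happened without the event, not just a counting rule.

```python
from typing import List, NamedTuple, Optional


class Patient(NamedTuple):
    arm: str                    # "new" or "control"
    stop_month: Optional[int]   # month treatment was discontinued, None if never
    event_month: Optional[int]  # month of the outcome event, None if none by month 12


# Entirely fabricated toy data, for illustration only.
COHORT: List[Patient] = [
    Patient("new", None, None), Patient("new", None, 5), Patient("new", 3, 8),
    Patient("new", 4, None), Patient("new", None, None), Patient("new", 6, 2),
    Patient("control", None, 4), Patient("control", None, 9),
    Patient("control", None, None), Patient("control", None, 7),
    Patient("control", None, 11), Patient("control", None, None),
]


def had_outcome(p: Patient, strategy: str) -> bool:
    """Decide whether a patient counts as a failure under one strategy."""
    if strategy == "treatment policy":
        # Discontinuation is ignored; any event during follow-up counts.
        return p.event_month is not None
    if strategy == "composite":
        # Discontinuation itself is treated as a failure.
        return p.event_month is not None or p.stop_month is not None
    if strategy == "while on treatment":
        # Naive version: only events up to discontinuation count.
        if p.event_month is None:
            return False
        return p.stop_month is None or p.event_month <= p.stop_month
    raise ValueError(f"unknown strategy: {strategy}")


def risk_difference(patients: List[Patient], strategy: str) -> float:
    """12-month risk in the new-drug arm minus risk in the control arm."""
    def risk(arm: str) -> float:
        group = [p for p in patients if p.arm == arm]
        return sum(had_outcome(p, strategy) for p in group) / len(group)
    return risk("new") - risk("control")


for name in ("treatment policy", "composite", "while on treatment"):
    print(f"{name}: {risk_difference(COHORT, name):+.2f}")
```

On this toy cohort the three rules give risk differences of roughly -0.17, 0.00, and -0.33 from identical data. None of them is a coding error. They answer different questions.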
Why This Matters in Observational Research Even More
Trial people talk about estimands because the ICH E9(R1) addendum made them do it. Observational researchers should care even more, because they already have extra ambiguity around time zero, eligibility, adherence, switching, and censoring.
- Target trial emulation is basically estimand discipline applied to observational data.
- Clone-censor-weight methods exist because per-protocol-style questions need explicit handling of deviation.
- Immortal time bias often starts when the treatment strategy was never defined sharply enough.
If your observational study has a loose treatment definition, a hand-wavy censoring rule, and a vague causal claim, the estimand problem is not academic. It is the main problem.
A Clinical Example
Imagine comparing early invasive versus conservative management after non-ST elevation acute coronary syndrome.
Estimand A
Effect of being assigned an early invasive strategy at baseline, regardless of later crossover or nonadherence.
Estimand B
Effect if patients actually followed their initially chosen strategy through a prespecified period without crossover.
A is closer to a policy question. B is closer to a biological or adherence-dependent question. Both can matter. But mixing them in the same paper is how interpretation turns into mush.
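The difference between A and B shows up as soon as you decide when each patient's follow-up ends. The sketch below is a minimal illustration with hypothetical names (`followup_end`, `crossover_day`, `last_day`), assuming crossover is the only deviation of interest. Note that for B, censoring at crossover is only the first step: patients who cross over usually differ from those who do not, so an actual analysis would also need to adjust for that selection, for example with inverse probability of censoring weights.

```python
from typing import Optional


def followup_end(estimand: str, crossover_day: Optional[int], last_day: int) -> int:
    """Return the day follow-up ends for one patient under estimand 'A' or 'B'."""
    if estimand == "A":
        # Assigned-strategy question: crossover does not end follow-up.
        return last_day
    if estimand == "B":
        # Adherence-dependent question: follow-up stops at crossover, if any.
        return last_day if crossover_day is None else min(crossover_day, last_day)
    raise ValueError(f"unknown estimand: {estimand}")


# A patient who crossed over on day 40 and was observed through day 365.
print(followup_end("A", crossover_day=40, last_day=365))  # 365
print(followup_end("B", crossover_day=40, last_day=365))  # 40
```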
What Good Papers Do
State the causal contrast plainly
One sentence should tell the reader exactly what happens under strategy A versus strategy B, in whom, over what follow-up.
Define time zero with discipline
Eligibility, treatment assignment, and follow-up start should align. If they do not, your estimand is probably drifting already.
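A simple per-patient check can catch misaligned time zero before it reaches the analysis. The function below is a sketch under assumptions: the three date arguments and the `tolerance_days` parameter are hypothetical names, and a real study would define its own grace period explicitly in the protocol.

```python
from datetime import date
from typing import List


def time_zero_problems(eligibility_met: date, strategy_assigned: date,
                       followup_start: date, tolerance_days: int = 0) -> List[str]:
    """Flag gaps between eligibility, strategy assignment, and follow-up start."""
    issues = []
    if abs((strategy_assigned - eligibility_met).days) > tolerance_days:
        issues.append("assignment is not aligned with eligibility")
    if abs((followup_start - strategy_assigned).days) > tolerance_days:
        issues.append("follow-up start is not aligned with assignment")
    if (strategy_assigned - followup_start).days > tolerance_days:
        # Time between follow-up start and assignment cannot contain events
        # for patients later classified as treated.
        issues.append("follow-up starts before assignment: possible immortal time")
    return issues


# Follow-up counted from diagnosis, treatment classified 30 days later.
print(time_zero_problems(
    eligibility_met=date(2023, 1, 1),
    strategy_assigned=date(2023, 1, 31),
    followup_start=date(2023, 1, 1),
))
```

For the example call, all three checks fire. An empty list means the three dates line up within the stated tolerance, which is necessary for a clean time zero but does not prove the design is sound.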
Pre-specify intercurrent event handling
Do not wait to see treatment switching patterns and then decide whether to censor, ignore, or redefine the endpoint.
Match method to estimand
Treatment-policy questions, per-protocol questions, and hypothetical questions often require different designs and analytic strategies.
Reviewer Red Flags
- The paper says “effect of treatment” without defining what happens after discontinuation, switching, or rescue therapy.
- Eligibility and exposure are defined using future information.
- Censoring rules appear only in the statistical appendix and clearly were not part of the design logic.
- The discussion interprets a while-on-treatment estimate as if it were a treatment-policy (intention-to-treat) effect.
- Different tables in the same paper quietly target different causal questions.
The Practical Bottom Line
Estimands sound abstract until you realize they are really just a demand for intellectual honesty. What exactly are you trying to estimate? For whom? Under what treatment behavior? With what follow-up logic?
Answer that first. Then pick the design and estimator that serve it. Not the other way around. Because when the causal question is sloppy, the rest of the analysis is just polished confusion.