Principal Stratification: Estimating Effects When Post-Treatment Variables Matter

Some causal questions fall apart the moment you touch a post-treatment variable. You want the treatment effect among patients who adhered, among patients who would survive long enough to be measured, or among the people whose treatment status was actually changed by encouragement or assignment. The lazy move is to condition on that post-treatment variable and keep going. That is how you turn a real causal question into selection bias with confidence intervals. Principal stratification exists for exactly this problem.
The central idea is sharp: define groups by their potential values of an intermediate post-treatment variable under different treatment conditions, then ask for causal effects within those latent groups. This is how we talk coherently about compliers, always-takers, never-takers, or patients who would survive regardless of treatment. It is also where a lot of published interpretation goes off the rails, because the groups are generally unobserved.
Why Ordinary Subgroup Analysis Fails
Imagine a randomized trial where some patients assigned to treatment do not take it, and some assigned to control obtain the intervention anyway. Or imagine a cancer study where quality of life is only defined among patients who survive six months. Researchers constantly ask questions like:
- What is the effect among adherers?
- What is the effect among survivors?
- What is the effect among patients who would accept treatment if offered?
Those are not stupid questions. They are just dangerous if answered naively. Adherence, survival, and treatment receipt are post-treatment variables. Conditioning on the observed value can break randomization, induce selection bias, and compare biologically different people across arms.
The bad move:
“Among people who survived, treatment looked better.” That comparison is contaminated because survival itself may have been changed by treatment. You are conditioning on the very thing the intervention may have altered.
The Core Idea of Principal Strata
Let Z be treatment assignment, S be some intermediate post-treatment variable, and Y be the final outcome. Principal stratification defines groups using the pair (S(0), S(1)), meaning what the intermediate variable would be under control and under treatment.
In noncompliance problems, S is treatment received. That gives the classic strata:
- Compliers: take treatment if assigned treatment, do not take it if assigned control.
- Always-takers: take treatment regardless of assignment.
- Never-takers: do not take treatment regardless of assignment.
- Defiers: do the opposite of assignment.
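Once both potential values are known, assigning someone to a stratum is purely mechanical. The minimal Python sketch below makes that explicit. The names `principal_stratum` and `NONCOMPLIANCE_STRATA` are illustrative, not from any library, and S(z) = 1 is assumed to mean "would take treatment if assigned to arm z".

```python
from typing import Dict, Tuple

# Principal strata for noncompliance, keyed by (S(0), S(1)).
# S(z) = 1 means the person would take treatment if assigned to arm z.
NONCOMPLIANCE_STRATA: Dict[Tuple[int, int], str] = {
    (0, 1): "complier",      # takes treatment only when assigned to it
    (1, 1): "always-taker",  # takes treatment under either assignment
    (0, 0): "never-taker",   # never takes treatment
    (1, 0): "defier",        # does the opposite of assignment
}


def principal_stratum(s0: int, s1: int) -> str:
    """Return the principal stratum implied by potential receipt (S(0), S(1))."""
    if (s0, s1) not in NONCOMPLIANCE_STRATA:
        raise ValueError("S(0) and S(1) must each be 0 or 1")
    return NONCOMPLIANCE_STRATA[(s0, s1)]


# This lookup needs BOTH potential values for a person.
print(principal_stratum(0, 1))  # complier
```

The function is trivial on purpose: the hard part is never the lookup, it is that real data supply only one of its two arguments.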
In truncation-by-death problems, S might be survival to a follow-up time. Then principal strata include people who would survive under both arms, survive only under treatment, survive only under control, or die under both. The scientifically appealing estimand is often the effect among the always-survivors, because the outcome is well-defined there under both treatments.
The Catch: You Usually Cannot Observe the Stratum
This is the whole game. A patient only experiences one treatment condition, so you never directly see both S(0) and S(1). You cannot open the dataset and label someone “complier” or “always-survivor” with certainty. Principal strata are usually latent.
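A small simulation shows why the strata stay hidden. It assumes hypothetical stratum shares and no defiers, randomizes Z, reveals only S(Z), and then cross-tabulates each observed (Z, S) cell against the true stratum, which only a simulation can see.

```python
import random
from collections import Counter

random.seed(1)

# Hypothetical stratum shares, chosen for illustration only (no defiers).
SHARES = {"complier": 0.5, "always-taker": 0.2, "never-taker": 0.3}
# Potential treatment receipt (S(0), S(1)) implied by each stratum.
POTENTIAL_S = {"complier": (0, 1), "always-taker": (1, 1), "never-taker": (0, 0)}

n = 10_000
strata = random.choices(list(SHARES), weights=list(SHARES.values()), k=n)
z = [random.randint(0, 1) for _ in range(n)]          # randomized assignment
s = [POTENTIAL_S[g][zi] for g, zi in zip(strata, z)]  # only S(Z) is observed

# Which latent strata hide inside each observed (Z, S) cell?
cells = Counter(zip(z, s, strata))
for zz, ss in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    mix = {g: cells[(zz, ss, g)] for g in SHARES if cells[(zz, ss, g)] > 0}
    print(f"Z={zz}, S={ss}: {mix}")
```

Under no defiers, the cell Z=0, S=1 contains only always-takers and Z=1, S=0 only never-takers, but Z=1, S=1 mixes compliers with always-takers and Z=0, S=0 mixes compliers with never-takers. No individual complier can be labeled from the observed data.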
That means every principal-stratification analysis lives or dies on design assumptions, identifying restrictions, or model structure. If a paper talks about principal strata as though they are observed baseline subgroups, the authors have either simplified too aggressively or do not understand their own method.
The Famous Case: Compliers and the CACE
The most common principal-stratification estimand in practice is the Complier Average Causal Effect (CACE), also called the Local Average Treatment Effect (LATE) in the instrumental variables literature. In a randomized encouragement design, assignment serves as the instrument and identifies the treatment effect among people whose treatment receipt is actually moved by assignment, namely the compliers.
Why CACE matters:
Intention-to-treat tells you the effect of offering treatment. CACE tells you the effect of actually receiving treatment among the subgroup whose behavior is changed by the offer. Those are different questions, and both can be worth knowing.
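Under random assignment, the exclusion restriction, relevance, and monotonicity, the Wald estimator gives:

$$\text{CACE} = \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{E[S \mid Z = 1] - E[S \mid Z = 0]}$$

The numerator is the intention-to-treat effect on the outcome; the denominator is the effect of assignment on treatment received, which under monotonicity equals the share of compliers.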
That is clean, useful, and absolutely not a universal treatment effect. It is local to compliers. If the complier population differs from the broader clinical population, your interpretation must say so.
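To see the Wald ratio recover the complier effect, and an as-treated comparison miss it, here is a minimal simulation under a hypothetical data-generating process: compliers gain 2.0 from treatment, always-takers are healthier at baseline, and Y depends on Z only through S, so the exclusion restriction holds by construction. All strata shares, baselines, and effects are invented for illustration.

```python
import random
from statistics import mean

random.seed(2)

# Hypothetical data-generating process, for illustration only.
STRATA = ["complier", "always-taker", "never-taker"]
WEIGHTS = [0.5, 0.2, 0.3]
POTENTIAL_S = {"complier": (0, 1), "always-taker": (1, 1), "never-taker": (0, 0)}
BASELINE = {"complier": 0.0, "always-taker": 3.0, "never-taker": -1.0}
# Never-takers' effect is never realized, because they never receive treatment.
EFFECT = {"complier": 2.0, "always-taker": 1.0, "never-taker": 0.5}


def simulate(n):
    """Return (Z, S, Y) rows; Y depends on Z only through S (exclusion restriction)."""
    rows = []
    for g in random.choices(STRATA, weights=WEIGHTS, k=n):
        z = random.randint(0, 1)
        s = POTENTIAL_S[g][z]
        y = BASELINE[g] + EFFECT[g] * s + random.gauss(0, 1)
        rows.append((z, s, y))
    return rows


def wald_cace(rows):
    """(E[Y|Z=1] - E[Y|Z=0]) / (E[S|Z=1] - E[S|Z=0])."""
    y1 = mean(y for z, s, y in rows if z == 1)
    y0 = mean(y for z, s, y in rows if z == 0)
    s1 = mean(s for z, s, y in rows if z == 1)
    s0 = mean(s for z, s, y in rows if z == 0)
    return (y1 - y0) / (s1 - s0)


def as_treated(rows):
    """Naive contrast by treatment actually received (not a causal estimate)."""
    return (mean(y for z, s, y in rows if s == 1)
            - mean(y for z, s, y in rows if s == 0))


rows = simulate(50_000)
print(f"Wald CACE estimate:  {wald_cace(rows):.2f} (true complier effect 2.0)")
print(f"As-treated contrast: {as_treated(rows):.2f}")
```

Always-takers and never-takers have the same treatment receipt in both arms, so they cancel out of the numerator; only compliers move it. The as-treated contrast lands well above 2.0 because healthier always-takers pile into the treated group.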
Principal Stratification Is Not Just IV with Fancy Packaging
IV with noncompliance is the entry point, but principal stratification is broader. It gives a general language for causal effects when the outcome is undefined or interpretation changes across post-treatment states. Three recurring use cases matter in clinical research.
Noncompliance
Estimate effects among compliers rather than pretending observed adherers are a clean subgroup.
Truncation by death
Define quality-of-life effects among people for whom the outcome would exist under both arms.
Intermediate response
Separate treatment effects by latent responder types without conditioning on the observed response.
A Clinical Example: Quality of Life When Survival Differs
Suppose an oncology treatment improves survival but causes toxicity. At 12 months, you want to compare quality of life between treatment arms. The naive analysis compares quality of life among survivors only. That is seductive and wrong. Treatment changed who survived, so the survivors in each arm are not the same kind of patients.
Principal stratification reframes the question: what is the effect of treatment on quality of life among patients who would survive to 12 months under either treatment? That is the always-survivor stratum. The outcome is well-defined there under both arms. The price is that you cannot observe who belongs to that stratum directly.
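A minimal simulation makes the survivor bias concrete. It assumes a hypothetical population of always-survivors, "protected" patients who survive only on treatment, and patients who die either way; protected patients are sicker, and treatment lowers quality of life by 5 points in anyone who survives on it. Monotonicity is built in: nobody survives only under control. All numbers are invented for illustration.

```python
import random
from statistics import mean

random.seed(3)

# Hypothetical truncation-by-death setup, for illustration only.
STRATA = ["always-survivor", "protected", "doomed"]
WEIGHTS = [0.5, 0.2, 0.3]
# (S(0), S(1)): survival to 12 months under control and under treatment.
SURVIVES = {"always-survivor": (1, 1), "protected": (0, 1), "doomed": (0, 0)}
BASELINE_QOL = {"always-survivor": 70.0, "protected": 50.0}  # protected are sicker
TOXICITY = -5.0  # assumed QoL effect of treatment among anyone who survives on it

treated, control = [], []  # (stratum, 12-month QoL) for survivors in each arm
for g in random.choices(STRATA, weights=WEIGHTS, k=40_000):
    z = random.randint(0, 1)
    if not SURVIVES[g][z]:
        continue  # died before 12 months: QoL is undefined, not merely missing
    y = BASELINE_QOL[g] + TOXICITY * z + random.gauss(0, 10)
    (treated if z == 1 else control).append((g, y))

naive = mean(y for _, y in treated) - mean(y for _, y in control)
# Oracle contrast: possible only in a simulation, where strata are known.
oracle = (mean(y for g, y in treated if g == "always-survivor")
          - mean(y for g, y in control if g == "always-survivor"))
print(f"Survivors-only contrast: {naive:.1f}")
print(f"Always-survivor effect:  {oracle:.1f} (true value {TOXICITY})")
```

In large samples the survivors-only contrast settles near -10.7 rather than -5, because the treated survivors include sicker protected patients who would have died under control. The always-survivor effect is the honest target, but computing it here required knowing each patient's stratum.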
This is why truncation-by-death problems are hard. The estimand is intellectually honest, but it often requires strong assumptions, bounds, or sensitivity analysis rather than clean point identification.
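One common way to get bounds is trimming, often called Lee bounds. It assumes randomization plus monotonicity (treatment never causes a death that would not have happened under control). Then every control-arm survivor is an always-survivor, and always-survivors make up a share p0/p1 of treated survivors, where p1 and p0 are the survival rates in the treatment and control arms. Bounding E[Y(1) | always-survivor] by the mean of the lowest and the highest p0/p1 fraction of treated-survivor outcomes gives an interval for the effect. The sketch below is a point-estimate version only (no confidence intervals), and the function name and inputs are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def trimming_bounds(y_treated: Sequence[float], n_treated: int,
                    y_control: Sequence[float], n_control: int) -> Tuple[float, float]:
    """Trimming bounds on the always-survivor effect, assuming monotonicity.

    y_treated / y_control: outcomes of survivors in each arm.
    n_treated / n_control: number randomized to each arm (survivors + deaths).
    """
    p1 = len(y_treated) / n_treated  # survival rate under treatment
    p0 = len(y_control) / n_control  # survival rate under control
    if p0 > p1:
        raise ValueError("data contradict monotonicity S(1) >= S(0)")
    q = p0 / p1                              # always-survivor share of treated survivors
    k = max(1, round(q * len(y_treated)))    # number of treated survivors to keep
    ys = sorted(y_treated)
    control_mean = mean(y_control)           # control survivors = always-survivors
    lower = mean(ys[:k]) - control_mean      # always-survivors are the worst-off treated
    upper = mean(ys[-k:]) - control_mean     # always-survivors are the best-off treated
    return lower, upper


# Toy data: 5 of 10 treated survive, 4 of 10 controls survive.
print(trimming_bounds([40.0, 50.0, 60.0, 70.0, 80.0], 10,
                      [50.0, 60.0, 60.0, 70.0], 10))  # (-5.0, 5.0)
```

The width of the interval is the honest price of not knowing who the always-survivors are. It grows as the survival gap between arms grows, and if monotonicity itself is doubtful, even these bounds need revisiting.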
The Assumptions That Do the Heavy Lifting
Different principal-stratification setups need different assumptions, but the usual toolkit includes:
Randomization or conditional exchangeability
Treatment assignment must be ignorable, either by design or after measured confounder adjustment.
Monotonicity
Often stated as no defiers. In encouragement designs, nobody takes treatment only when assigned control. Useful, plausible in some settings, laughable in others.
Exclusion restriction
Assignment affects outcome only through treatment received. This is often the brittle assumption in pragmatic trials and encouragement designs.
Principal ignorability or structural restrictions
Needed in many latent-strata problems to connect observed data to unseen strata. This is where model dependence quietly enters.
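Monotonicity does quiet but essential work in the noncompliance case: it lets observed uptake rates pin down the size of each latent stratum. With no defiers, anyone taking treatment in the control arm must be an always-taker, and anyone refusing it in the treatment arm must be a never-taker. A minimal sketch, assuming randomization and no defiers (the function name is hypothetical):

```python
from typing import Dict


def strata_shares(p_take_if_assigned: float, p_take_if_control: float) -> Dict[str, float]:
    """Stratum proportions implied by randomization plus monotonicity (no defiers).

    p_take_if_assigned = P(S=1 | Z=1); p_take_if_control = P(S=1 | Z=0).
    """
    always = p_take_if_control        # only always-takers take it under control
    never = 1.0 - p_take_if_assigned  # only never-takers refuse when assigned
    complier = p_take_if_assigned - p_take_if_control
    if complier <= 0:
        raise ValueError("no first stage: assignment does not raise uptake (relevance fails)")
    return {"complier": complier, "always-taker": always, "never-taker": never}


print(strata_shares(0.7, 0.2))
```

The complier share is exactly the denominator of the Wald estimator. Note what the code cannot do: it cannot test monotonicity. If defiers exist, the same arithmetic still runs and silently returns only the net difference between compliers and defiers.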
My blunt take: the farther you move from randomized assignment with clean noncompliance, the less you should pretend principal stratification is a plug-and-play estimator. It becomes an assumptions machine.
What Principal Stratification Buys You
- It stops you from conditioning directly on a post-treatment variable and calling the result causal.
- It makes the target population explicit, such as compliers or always-survivors.
- It separates scientifically meaningful questions from statistically convenient ones.
- It forces honesty about when an outcome is undefined for part of the population.
That is all genuinely valuable. A lot of causal rhetoric would improve overnight if people learned this one lesson: the subgroup you want is not always the subgroup you can observe.
What It Does Not Buy You
It does not recover the effect in everyone
CACE is not the average treatment effect. Always-survivor effects are not population-average effects.
It does not make latent groups observed
Posterior probabilities and model-based classifications are not the same thing as directly seeing the principal stratum.
It does not rescue weak design
If assignment is confounded, exclusion restriction is implausible, or positivity is broken, the method will not save you. It will just hide the damage behind notation.
Principal Stratification vs Other Common Moves
| Approach | What it targets | Main problem |
|---|---|---|
| Observed subgroup analysis | Effect among those with observed S = 1 | Usually biased because S is post-treatment |
| Per-protocol / as-treated | Effect of actual treatment patterns | Requires strong confounding control after deviation |
| Instrumental variables | Local effect among compliers | Needs instrument validity and local interpretation |
| Principal stratification | Effect within latent strata defined by S(0), S(1) | Strata usually unobserved, identification may be weak |
How to Report It Without Bullshitting
If you use principal stratification, reviewers should expect answers to these questions:
- What intermediate variable defines the principal strata?
- What is the target estimand, exactly?
- Why is that estimand scientifically meaningful in this context?
- Which strata are latent, and what assumptions identify the effect?
- Did you rely on monotonicity, exclusion restriction, or principal ignorability?
- How sensitive are results to relaxing those assumptions?
- Who does the estimate apply to, and who is left out?
Reporting rule:
Never say “the treatment effect” if you estimated a principal-stratum effect. Say “the effect among compliers” or “the effect among always-survivors.” The population is part of the result.
Where Researchers Usually Mess This Up
- They analyze survivors only and call it a causal effect on quality of life.
- They treat observed adherence as if it were a baseline subgroup.
- They cite CACE as though it were the treatment effect for everybody.
- They assume monotonicity because it is convenient, not because it is plausible.
- They bury the identification assumptions in the supplement and oversell the estimate in the abstract.
The pattern is consistent: strong language, weak target, fuzzy population. Principal stratification is powerful precisely because it makes those choices explicit. If a paper ends up less explicit after using it, something has gone badly wrong.
The Bottom Line
Principal stratification is one of the cleanest ways to think about post-treatment complications. It does not dodge the problem by pretending the intermediate variable is baseline. It asks the harder question: which latent subgroup defines the scientific estimand we actually care about?
That makes it worth learning. But do not romanticize it. The method is only as credible as the design and assumptions holding it up. In randomized noncompliance settings, it can be excellent. In survival-linked or response-defined subgroups, it is often the right conceptual frame, but point estimates can get shaky fast.
My opinionated version is simple: when a subgroup is created after treatment starts, stop doing ordinary subgroup analysis. Either define a principled causal estimand, often through principal stratification, or admit the question is descriptive. Anything in between is just bias with better branding.