Causal Inference · Principal Stratification · Trial Interpretation

Principal Stratification: Estimating Effects When Post-Treatment Variables Matter

April 13, 2026·16 min read·By Anas H. Alzahrani, MD, PhD, MPH
Infographic: Principal Stratification: Estimating Effects When Post-Treatment Variables Matter

Some causal questions fall apart the moment you touch a post-treatment variable. You want the treatment effect among patients who adhered, among patients who would survive long enough to be measured, or among the people whose treatment status was actually changed by encouragement or assignment. The lazy move is to condition on that post-treatment variable and keep going. That is how you turn a real causal question into selection bias with confidence intervals. Principal stratification exists for exactly this problem.

The central idea is sharp: define groups by their potential values of an intermediate post-treatment variable under different treatment conditions, then ask for causal effects within those latent groups. This is how we talk coherently about compliers, always-takers, never-takers, or patients who would survive regardless of treatment. It is also where a lot of published interpretation goes off the rails, because the groups are generally unobserved.

Why Ordinary Subgroup Analysis Fails

Imagine a randomized trial where some patients assigned to treatment do not take it, and some assigned to control obtain the intervention anyway. Or imagine a cancer study where quality of life is only defined among patients who survive six months. Researchers constantly ask questions like:

  • What is the effect among adherers?
  • What is the effect among survivors?
  • What is the effect among patients who would accept treatment if offered?

Those are not stupid questions. They are just dangerous if answered naively. Adherence, survival, and treatment receipt are post-treatment variables. Conditioning on the observed value can break randomization, induce selection bias, and compare biologically different people across arms.

The bad move:

“Among people who survived, treatment looked better.” That comparison is contaminated because survival itself may have been changed by treatment. You are conditioning on the very thing the intervention may have altered.

The Core Idea of Principal Strata

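Let Z be treatment assignment, S some intermediate post-treatment variable, and Y the final outcome. Write S(z) for the value S would take under assignment z. Principal stratification defines groups using the pair (S(0), S(1)), meaning what the intermediate variable would be under control and under treatment. Because that pair is a fixed characteristic of each patient that assignment cannot change, stratum membership behaves like a baseline covariate. The observed S, which shows S(0) for controls and S(1) for treated patients, does not.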

In noncompliance problems, S is treatment received. That gives the classic strata:

  • Compliers: take treatment if assigned treatment, do not take it if assigned control.
  • Always-takers: take treatment regardless of assignment.
  • Never-takers: do not take treatment regardless of assignment.
  • Defiers: do the opposite of assignment.
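Coding treatment received as S = 1 (took treatment) or S = 0 (did not), the four strata are simply the four possible values of the pair (S(0), S(1)). Monotonicity, discussed below, is the assumption that the last row is empty, that is S(1) ≥ S(0) for everyone.

```latex
\begin{aligned}
\text{Compliers:}     &\quad (S(0), S(1)) = (0, 1) \\
\text{Always-takers:} &\quad (S(0), S(1)) = (1, 1) \\
\text{Never-takers:}  &\quad (S(0), S(1)) = (0, 0) \\
\text{Defiers:}       &\quad (S(0), S(1)) = (1, 0)
\end{aligned}
```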

In truncation-by-death problems, S might be survival to a follow-up time. Then principal strata include people who would survive under both arms, survive only under treatment, survive only under control, or die under both. The scientifically appealing estimand is often the effect among the always-survivors, because the outcome is well-defined there under both treatments.

The Catch: You Usually Cannot Observe the Stratum

This is the whole game. A patient only experiences one treatment condition, so you never directly see both S(0) and S(1). You cannot open the dataset and label someone “complier” or “always-survivor” with certainty. Principal strata are usually latent.
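A small sketch makes the problem concrete. The helper `strata_consistent_with` is a hypothetical illustration, not a library function: given the assigned arm z and the observed treatment received, it lists every noncompliance stratum that could have produced that observation. Every observed cell turns out to be a mixture of two strata.

```python
from itertools import product

# Each principal stratum is a pair (S(0), S(1)) of potential values of the
# intermediate variable; here S is treatment received.
STRATA = {
    (0, 1): "complier",
    (1, 1): "always-taker",
    (0, 0): "never-taker",
    (1, 0): "defier",
}


def strata_consistent_with(z, s_observed):
    """Return the strata whose potential value under arm z matches the observation.

    Only S(z) for the assigned arm is ever seen; S(1 - z) is missing,
    so each observed (Z, S) cell is compatible with more than one stratum.
    """
    return sorted(
        name for pair, name in STRATA.items()
        if pair[z] == s_observed
    )


for z, s in product((0, 1), (0, 1)):
    print(f"Z={z}, S={s}: {strata_consistent_with(z, s)}")
```

Ruling out defiers makes the cells (Z=0, S=1) and (Z=1, S=0) pure always-takers and pure never-takers, but the cells (Z=0, S=0) and (Z=1, S=1) still mix compliers with another stratum. That residual mixing is why identification needs assumptions.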

That means every principal-stratification analysis lives or dies on design assumptions, identifying restrictions, or model structure. If a paper talks about principal strata as though they are observed baseline subgroups, the authors have either simplified too aggressively or do not understand their own method.

The Famous Case: Compliers and the CACE

The most common principal-stratification estimand in practice is the Complier Average Causal Effect (CACE), also called the Local Average Treatment Effect (LATE) in the instrumental variables literature. In a randomized encouragement design, assignment serves as the instrument and identifies the treatment effect among people whose treatment receipt is actually moved by assignment, namely the compliers.

Why CACE matters:

Intention-to-treat tells you the effect of offering treatment. CACE tells you the effect of actually receiving treatment among the subgroup whose behavior is changed by the offer. Those are different questions, and both can be worth knowing.

Under random assignment, exclusion restriction, relevance, and monotonicity, the Wald estimator gives:

CACE = (Effect of assignment on outcome) / (Effect of assignment on treatment received)

That is clean, useful, and absolutely not a universal treatment effect. It is local to compliers. If the complier population differs from the broader clinical population, your interpretation must say so.
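A quick simulation shows both halves of that claim. All numbers are hypothetical: 60% compliers, 25% never-takers, 15% always-takers, no defiers, and an effect of receiving treatment that differs by stratum. The Wald ratio recovers the complier effect, which is not the population-average effect.

```python
import numpy as np

rng = np.random.default_rng(2026)
n = 200_000

# Hypothetical strata shares; no defiers, so monotonicity holds by construction.
stratum = rng.choice(["complier", "never", "always"], size=n, p=[0.60, 0.25, 0.15])

# Randomized assignment Z.
z = rng.integers(0, 2, size=n)

# Treatment received D is determined by stratum and assignment.
d = np.where(stratum == "always", 1, np.where(stratum == "never", 0, z))

# Hypothetical stratum-specific effects of receiving treatment:
# compliers +2.0, always-takers +0.5, never-takers 0.0.
effect = np.where(stratum == "complier", 2.0, np.where(stratum == "always", 0.5, 0.0))
y0 = rng.normal(0.0, 1.0, size=n)  # potential outcome without treatment
y = y0 + effect * d                # exclusion restriction: Z acts on Y only through D

itt_y = y[z == 1].mean() - y[z == 0].mean()  # effect of assignment on outcome
itt_d = d[z == 1].mean() - d[z == 0].mean()  # effect of assignment on treatment received
cace_hat = itt_y / itt_d                     # Wald estimator

ate_true = effect.mean()  # average effect of receipt across everyone
print(f"ITT on Y: {itt_y:.3f}, ITT on D: {itt_d:.3f}")
print(f"Wald CACE: {cace_hat:.3f} (true complier effect 2.0)")
print(f"Population-average effect of receipt: {ate_true:.3f}")
```

The estimate lands near 2.0, while the population-average effect here is about 1.275. Neither number is wrong; they answer questions about different populations.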

Principal Stratification Is Not Just IV with Fancy Packaging

IV with noncompliance is the entry point, but principal stratification is broader. It gives a general language for causal effects when the outcome is undefined for some patients or when its interpretation changes across post-treatment states. Three recurring use cases matter in clinical research.

Noncompliance

Estimate effects among compliers rather than pretending observed adherers are a clean subgroup.

Truncation by death

Define quality-of-life effects among people for whom the outcome would exist under both arms.

Intermediate response

Separate treatment effects by latent responder types without conditioning on the observed response.

A Clinical Example: Quality of Life When Survival Differs

Suppose an oncology treatment improves survival but causes toxicity. At 12 months, you want to compare quality of life between treatment arms. The naive analysis compares quality of life among survivors only. That is seductive and wrong. Treatment changed who survived, so the survivors in each arm are not the same kind of patients.

Principal stratification reframes the question: what is the effect of treatment on quality of life among patients who would survive to 12 months under either treatment? That is the always-survivor stratum. The outcome is well-defined there under both arms. The price is that you cannot observe who belongs to that stratum directly.
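A simulation with a known answer shows how far the survivors-only comparison can drift. Everything here is a hypothetical setup: a latent health score drives both survival and quality of life, treatment rescues some frail patients who would die under control, and toxicity lowers quality of life by exactly 3 points for everyone.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 400_000

# Hypothetical latent health score; higher means healthier.
u = rng.normal(size=n)

# Potential survival to 12 months. Anyone alive under control is also alive
# under treatment, so treatment never causes death in this setup.
s0 = u > 0.5
s1 = u > -0.5

# Potential quality-of-life scores; healthier patients score higher,
# and treatment toxicity costs 3 points.
noise = rng.normal(0.0, 5.0, size=n)
y0 = 50 + 10 * u + noise
y1 = y0 - 3.0

z = rng.integers(0, 2, size=n).astype(bool)  # randomized arm
s = np.where(z, s1, s0)                      # observed survival

# Naive comparison among observed survivors in each arm.
naive = y1[z & s].mean() - y0[~z & s].mean()

# True effect in the always-survivor stratum; only computable in a simulation,
# because it uses both potential outcomes for each patient.
always = s0 & s1
true_as = (y1[always] - y0[always]).mean()

print(f"Survivors-only contrast: {naive:.2f}")
print(f"Always-survivor effect:  {true_as:.2f}")
```

The survivors-only contrast comes out near -9.3, roughly three times the true harm of -3. Treatment kept frailer patients alive, and their lower scores get misattributed to the treatment itself.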

This is why truncation-by-death problems are hard. The estimand is intellectually honest, but it often requires strong assumptions, bounds, or sensitivity analysis rather than clean point identification.
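One standard way to get bounds is the trimming construction associated with Zhang and Rubin and with Lee. It assumes randomization and monotonicity in survival, S(1) ≥ S(0). Under those assumptions every control-arm survivor is an always-survivor, and always-survivors make up a known share of treated survivors, but we do not know which treated survivors they are. Taking the worst-off and best-off slices of that share gives the bounds. The function name `trimming_bounds` and the simulated data are hypothetical; this is a sketch, not a validated implementation.

```python
import numpy as np


def trimming_bounds(y, z, s):
    """Bounds on E[Y(1) - Y(0) | S(0) = S(1) = 1] (the always-survivor effect).

    Assumes randomized z and monotonicity S(1) >= S(0).
    y: outcomes (ignored for non-survivors); z, s: boolean arm and survival arrays.
    """
    y = np.asarray(y, float)
    z = np.asarray(z, bool)
    s = np.asarray(s, bool)
    # Under monotonicity, every control-arm survivor is an always-survivor.
    mean_control = y[~z & s].mean()
    # Share of treated survivors who are always-survivors, capped at 1 for sampling noise.
    p = min(s[~z].mean() / s[z].mean(), 1.0)
    treated = np.sort(y[z & s])
    k = max(1, int(round(p * treated.size)))  # how many treated survivors to keep
    lower = treated[:k].mean() - mean_control   # always-survivors are the worst-off
    upper = treated[-k:].mean() - mean_control  # always-survivors are the best-off
    return lower, upper


# Hypothetical data-generating process with a true always-survivor effect of -3.
rng = np.random.default_rng(11)
n = 200_000
u = rng.normal(size=n)
z = rng.integers(0, 2, size=n).astype(bool)
s = np.where(z, u > -0.5, u > 0.5)
y = 50 + 10 * u + rng.normal(0, 5, size=n) - 3.0 * z
lo, hi = trimming_bounds(y, z, s)
print(f"Always-survivor effect bounded in [{lo:.2f}, {hi:.2f}]")
```

The interval contains the true value of -3 but is wide. That is the honest price of not knowing who the always-survivors are; narrowing it requires further assumptions or covariates.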

The Assumptions That Do the Heavy Lifting

Different principal-stratification setups need different assumptions, but the usual toolkit includes:

Randomization or conditional exchangeability

Treatment assignment must be ignorable, either by design or after measured confounder adjustment.

Monotonicity

Often stated as "no defiers": S(1) ≥ S(0) for every patient. In encouragement designs, this means nobody takes treatment only when assigned control. Useful, plausible in some settings, laughable in others.
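Monotonicity is what lets the stratum proportions be read off the data. With no defiers, only always-takers take treatment when assigned control, only never-takers refuse it when assigned treatment, and compliers are whatever remains. The function `strata_shares` below is a hypothetical helper sketching that arithmetic.

```python
import numpy as np


def strata_shares(z, d):
    """Estimate principal-stratum shares in a noncompliance design.

    Assumes randomized assignment z and monotonicity (no defiers).
    """
    z = np.asarray(z, bool)
    d = np.asarray(d, bool)
    p_d1_z1 = float(d[z].mean())   # P(D = 1 | Z = 1): compliers + always-takers
    p_d1_z0 = float(d[~z].mean())  # P(D = 1 | Z = 0): always-takers only
    return {
        "always-taker": p_d1_z0,
        "never-taker": 1.0 - p_d1_z1,
        "complier": p_d1_z1 - p_d1_z0,
    }


print(strata_shares([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 1, 0, 0, 0]))
# -> {'always-taker': 0.25, 'never-taker': 0.25, 'complier': 0.5}
```

If defiers do exist, the "complier" line actually estimates the complier share minus the defier share, which is also the Wald denominator. That is why a monotonicity failure quietly distorts the CACE.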

Exclusion restriction

Assignment affects outcome only through treatment received. This is often the most brittle assumption in pragmatic trials and encouragement designs.
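Why brittle? Because the Wald ratio divides by the complier share, any direct effect of assignment gets amplified. In this hypothetical one-sided noncompliance setup (30% compliers, the rest never-takers), being assigned treatment shifts the outcome by 0.2 on its own, and the estimate absorbs that shift divided by 0.3. The `wald` helper is an illustrative function, not a library call.

```python
import numpy as np


def wald(y, z, d):
    """Wald ratio: ITT effect on the outcome over ITT effect on treatment received."""
    y = np.asarray(y, float)
    z = np.asarray(z, bool)
    d = np.asarray(d, float)
    return (y[z].mean() - y[~z].mean()) / (d[z].mean() - d[~z].mean())


rng = np.random.default_rng(3)
n = 200_000
complier = rng.random(n) < 0.3             # 30% compliers, the rest never-takers
z = rng.integers(0, 2, size=n).astype(bool)
d = complier & z                           # one-sided noncompliance
direct = 0.2                               # hypothetical direct effect of assignment
y = rng.normal(size=n) + 1.0 * d + direct * z  # exclusion restriction violated

print(f"Wald estimate: {wald(y, z, d):.2f} (true CACE 1.0, expected bias about {direct / 0.3:.2f})")
```

A direct effect of 0.2 turns a true CACE of 1.0 into an estimate near 1.67. The weaker the compliance, the larger the amplification.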

Principal ignorability or structural restrictions

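Needed in many latent-strata problems to connect observed data to unseen strata. Principal ignorability, for example, assumes that once baseline covariates are accounted for, potential outcomes do not differ between the strata that are mixed together in an observed cell. This is where model dependence quietly enters.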

My blunt take: the farther you move from randomized assignment with clean noncompliance, the less you should pretend principal stratification is a plug-and-play estimator. It becomes an assumptions machine.

What Principal Stratification Buys You

  • It stops you from conditioning directly on a post-treatment variable and calling the result causal.
  • It makes the target population explicit, such as compliers or always-survivors.
  • It separates scientifically meaningful questions from statistically convenient ones.
  • It forces honesty about when an outcome is undefined for part of the population.

That is all genuinely valuable. A lot of causal rhetoric would improve overnight if people learned this one lesson: the subgroup you want is not always the subgroup you can observe.

What It Does Not Buy You

It does not recover the effect in everyone

CACE is not the average treatment effect. Always-survivor effects are not population-average effects.

It does not make latent groups observed

Posterior probabilities and model-based classifications are not the same thing as directly seeing the principal stratum.

It does not rescue weak design

If assignment is confounded, the exclusion restriction is implausible, or positivity is broken, the method will not save you. It will just hide the damage behind notation.

Principal Stratification vs Other Common Moves

| Approach | What it targets | Main problem |
| --- | --- | --- |
| Observed subgroup analysis | Effect among those with observed S = 1 | Usually biased because S is post-treatment |
| Per-protocol / as-treated | Effect of actual treatment patterns | Requires strong confounding control after deviation |
| Instrumental variables | Local effect among compliers | Needs instrument validity and local interpretation |
| Principal stratification | Effect within latent strata defined by (S(0), S(1)) | Strata usually unobserved; identification may be weak |

How to Report It Without Bullshitting

If you use principal stratification, reviewers should expect answers to these questions:

  • What intermediate variable defines the principal strata?
  • What is the target estimand, exactly?
  • Why is that estimand scientifically meaningful in this context?
  • Which strata are latent, and what assumptions identify the effect?
  • Did you rely on monotonicity, exclusion restriction, or principal ignorability?
  • How sensitive are results to relaxing those assumptions?
  • Who does the estimate apply to, and who is left out?

Reporting rule:

Never say “the treatment effect” if you estimated a principal-stratum effect. Say “the effect among compliers” or “the effect among always-survivors.” The population is part of the result.

Where Researchers Usually Mess This Up

  • They analyze survivors only and call it a causal effect on quality of life.
  • They treat observed adherence as if it were a baseline subgroup.
  • They cite CACE as though it were the treatment effect for everybody.
  • They assume monotonicity because it is convenient, not because it is plausible.
  • They bury the identification assumptions in the supplement and oversell the estimate in the abstract.

The pattern is consistent: strong language, weak target, fuzzy population. Principal stratification is powerful precisely because it makes those choices explicit. If a paper ends up less explicit after using it, something has gone badly wrong.

The Bottom Line

Principal stratification is one of the cleanest ways to think about post-treatment complications. It does not dodge the problem by pretending the intermediate variable is baseline. It asks the harder question: which latent subgroup defines the scientific estimand we actually care about?

That makes it worth learning. But do not romanticize it. The method is only as credible as the design and assumptions holding it up. In randomized noncompliance settings, it can be excellent. In survival-linked or response-defined subgroups, it is often the right conceptual frame, but point estimates can get shaky fast.

My opinionated version is simple: when a subgroup is created after treatment starts, stop doing ordinary subgroup analysis. Either define a principled causal estimand, often through principal stratification, or admit the question is descriptive. Anything in between is just bias with better branding.
