Treatment-Induced Mediator-Outcome Confounding: When Mediation Analysis Starts Chasing the Consequences of Treatment
Anas H. Alzahrani, MD PhD MPH
Department of Preventive Medicine and Public Health
Faculty of Medicine, King Abdulaziz University
Many mediation papers fail in the same polished, dangerous way. The authors know they need to worry about mediator-outcome confounding, so they adjust for a later clinical variable that affects both the mediator and the outcome. What they do not fully admit is that treatment created that later variable. Once that happens, the standard natural direct and indirect effect machinery no longer identifies the clean mechanism story they want.
This is the assumption most people mention in one sentence and then sprint past: there must be no mediator-outcome confounder that is itself affected by treatment. In practical clinical work, that means treatment cannot change later severity, toxicity, clinic attendance, adherence support, rescue therapy, or surveillance intensity in a way that then reshapes both the mediator and the final outcome. Unfortunately, that exact structure is common.
The Core Decision Rule
If treatment changed a later variable L, and L then changed both the mediator and the outcome, do not treat a default natural-effect decomposition as a mechanism estimate just because the software returned one.
Decision rule:
Once the treatment reshapes the later clinical course, part of what you are decomposing is the consequence of treatment on that course, not a clean pathway through the mediator alone.
The Structural Problem in One DAG Sentence
Failure mode:
A → L → M and A → L → Y, where L is the variable you would need to condition on to block mediator-outcome confounding.
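The moment L is both post-treatment and a common cause of mediator and outcome, natural direct and indirect effects are in general no longer identified from the observed data, even when L is measured and nothing else is unmeasured. Leaving L out leaves the mediator-outcome relationship confounded; adjusting for L as if it were a baseline covariate blocks the part of the treatment effect that runs through L. The issue is structural, not a lack of statistical enthusiasm.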
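For readers who want the structural point in symbols, the usual counterfactual definitions are sketched below. The notation is an assumption layered on the article's A, L, M, and Y: C stands for baseline covariates, Y(a, m) and M(a) are potential outcomes, and the contrast is between treatment levels 1 and 0.

```latex
\begin{align*}
\text{NDE} &= E\big[Y(1, M(0))\big] - E\big[Y(0, M(0))\big] \\
\text{NIE} &= E\big[Y(1, M(1))\big] - E\big[Y(1, M(0))\big] \\[4pt]
\text{Cross-world independence:}\quad & Y(a, m) \perp M(a^{*}) \mid C
  \quad \text{for all } a,\ a^{*},\ m
\end{align*}
```

With A → L → M and A → L → Y, the value L takes under treatment a is a cause of Y(a, m), and the value it takes under a* is a cause of M(a*). Conditioning on baseline C alone therefore cannot be expected to deliver the independence, and L cannot simply join C because it is itself a consequence of treatment.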
Why This Fools Smart Reviewers
| What the paper says | Why it sounds careful | Why it can still fail |
|---|---|---|
| We adjusted for post-baseline disease severity in the mediator and outcome models. | It sounds like the authors controlled the obvious source of mediator-outcome confounding. | If treatment changed severity, that adjustment is not a free cleanup step. It is part of the identification problem. |
| The mediator was measured before the final outcome. | Temporal ordering sounds better than same-visit measurement. | Timing alone does not rescue the design if treatment already changed a later common cause of mediator and outcome before the mediator was recorded. |
| We reported percent mediated. | It sounds like the mechanism was neatly quantified. | A precise percentage can be attached to the wrong estimand if the structural assumptions are not credible. |
A Concrete Clinical Example
Case:
Rehabilitation program → exercise adherence → disability at 12 months
Suppose a stroke rehabilitation program improves long-term disability outcomes. Investigators want to claim that the benefit operates mainly through exercise adherence measured at month 3.
The problem is that the program may also improve early post-baseline functional status at week 6. Better function makes later adherence easier and independently improves 12-month disability. That week-6 function variable is not a nuisance covariate floating outside the story. It is a treatment-created common cause of mediator and outcome.
If the analysis adjusts for week-6 function and then reports a natural indirect effect through adherence, the estimate is no longer obviously a clean mechanism measure. It is partly entangled with how treatment changed later clinical trajectory before the mediator was even assessed.
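The entanglement described above can be made concrete with a toy calculation. The sketch below is an illustration only: every probability is an invented number, not taken from any study, all variables are binary, treatment is assumed randomized, and the outcome model is assumed additive with no interactions. It compares a naive mediation formula that treats week-6 function (L) as if it were a baseline covariate against an interventional contrast, which sets adherence to a random draw from its distribution under each arm.

```python
from itertools import product

# Hypothetical structural model (all numbers invented for illustration).
# A = program (randomized), L = good week-6 function,
# M = adherent at month 3, Y = good 12-month outcome.
P_A1 = 0.5  # assumed 1:1 randomization

def p_l(l, a):
    """P(L = l | A = a): the program improves week-6 function."""
    p1 = 0.3 + 0.3 * a
    return p1 if l == 1 else 1 - p1

def p_m(m, a, l):
    """P(M = m | A = a, L = l): function makes adherence easier."""
    p1 = 0.2 + 0.1 * a + 0.5 * l
    return p1 if m == 1 else 1 - p1

def mu(a, l, m):
    """E[Y | A = a, L = l, M = m]: additive, no interactions (assumption)."""
    return 0.2 + 0.05 * a + 0.3 * l + 0.2 * m

LM = list(product((0, 1), (0, 1)))

def mean_y(a):
    """E[Y(a)] by the g-formula; valid here because A is randomized."""
    return sum(mu(a, l, m) * p_l(l, a) * p_m(m, a, l) for l, m in LM)

def p_m_marginal(m, a):
    """P(M = m | A = a), averaging over the L that arm a produces."""
    return sum(p_m(m, a, l) * p_l(l, a) for l in (0, 1))

def mean_y_interventional(a, a_star):
    """E[Y(a, G(a_star))]: L follows arm a, M is drawn from arm a_star."""
    return sum(mu(a, l, m) * p_l(l, a) * p_m_marginal(m, a_star) for l, m in LM)

def naive_formula(a, a_star):
    """Mediation formula with L wrongly treated as a baseline covariate."""
    def p_l_pooled(l):
        return P_A1 * p_l(l, 1) + (1 - P_A1) * p_l(l, 0)
    return sum(mu(a, l, m) * p_m(m, a_star, l) * p_l_pooled(l) for l, m in LM)

total = mean_y(1) - mean_y(0)

naive_nde = naive_formula(1, 0) - naive_formula(0, 0)
naive_nie = naive_formula(1, 1) - naive_formula(1, 0)

ide = mean_y_interventional(1, 0) - mean_y_interventional(0, 0)
iie = mean_y_interventional(1, 1) - mean_y_interventional(1, 0)

print(f"total effect           : {total:.3f}")
print(f"naive 'direct'         : {naive_nde:.3f}")
print(f"naive 'indirect'       : {naive_nie:.3f}")
print(f"interventional direct  : {ide:.3f}")
print(f"interventional indirect: {iie:.3f}")
```

Under these invented numbers the naive pieces sum to about 0.07 against a total effect of about 0.19. The gap is everything the program did through week-6 function, which the adjustment quietly removed. The interventional pieces sum to the total here only because the toy outcome model has no L-by-M interaction; that is a property of this sketch, not a general guarantee.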
Mediation failure-mode explorer
Check whether treatment created the very confounding your mediation model needs to assume away
Treatment changes functional status during follow-up. That evolving status changes both later adherence, your mediator of interest, and the final outcome.
Verdict
Natural direct and indirect effects are not credibly identified by standard mediation methods here.
Ask whether the paper is really estimating a mechanism, or whether it is partly estimating the consequences of treatment on the later clinical course that then reshaped both mediator and outcome.
Safer target
Prefer interventional direct and indirect effects, or a longitudinal g-method if the mediator and confounding evolve over time.
Reviewer question
What exactly prevents L from being the treatment-created common cause of both mediator and outcome?
Common Clinical Variables That Trigger This Problem
Post-baseline disease severity
The treatment changes how sick the patient is. That later severity then changes both the mediator and the final outcome.
Toxicity or tolerability
Toxicity changes dose intensity, discontinuation, or adherence, while also changing prognosis.
Clinic attendance and monitoring
More visits change both the measured mediator and the chance that outcome worsening is detected or prevented.
Rescue care or cointerventions
Treatment changes who receives additional care, and that added care changes both mediator values and outcomes.
What to Ask Before You Believe a Mechanism Claim
1. What later variables did treatment plausibly change before the mediator was measured?
2. Could any of those variables affect both the mediator and the outcome?
3. Did the analysis need to condition on those variables to make the mediator-outcome comparison look exchangeable?
4. If yes, why is the estimand still being described as a natural direct or indirect effect rather than an alternative decomposition?
5. Is the paper explicit about whether interventional effects or longitudinal g-methods would be more defensible?
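The first two questions can be asked mechanically once the assumed DAG is written down. The sketch below is one minimal way to do that with the Python standard library. The graph, the node names, and the helper functions are all hypothetical, chosen to mirror the stroke rehabilitation example, and the check is only as good as the DAG you feed it.

```python
def descendants(dag, node, blocked=frozenset()):
    """All nodes reachable from `node` by directed paths avoiding `blocked`."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for child in dag.get(current, ()):
            if child not in seen and child not in blocked:
                seen.add(child)
                stack.append(child)
    return seen

def treatment_induced_confounders(dag, treatment, mediator, outcome):
    """Flag nodes that treatment affects and that in turn affect both the
    mediator and the outcome (the latter by a path that avoids the mediator)."""
    flagged = []
    for node in sorted(descendants(dag, treatment)):
        if node in (mediator, outcome):
            continue
        affects_mediator = mediator in descendants(dag, node)
        affects_outcome = outcome in descendants(dag, node, blocked={mediator})
        if affects_mediator and affects_outcome:
            flagged.append(node)
    return flagged

# Hypothetical DAG: each key lists the variables it directly affects.
dag = {
    "program": ["week6_function", "adherence", "disability_12m"],
    "week6_function": ["adherence", "disability_12m"],
    "adherence": ["disability_12m"],
    "age": ["adherence", "disability_12m"],  # baseline confounder, not post-treatment
}

flagged = treatment_induced_confounders(
    dag, treatment="program", mediator="adherence", outcome="disability_12m"
)
print(flagged)
```

In this hypothetical graph, age is an ordinary baseline confounder and is left alone, while week-6 function is flagged because the program changes it and it then feeds both adherence and 12-month disability. Any flagged node is a prompt to answer questions 3 through 5, not a verdict on its own.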
What Usually Works Better
The practical answer is often not to abandon mechanism questions. It is to target a more defensible estimand. Interventional direct and indirect effects are often more realistic because they do not rely on the cross-world independence assumptions behind natural effects and can remain identified when a measured treatment-induced confounder is present. If mediator and confounding evolve over time, you may need a full longitudinal g-method instead of a tidy single-mediator decomposition.
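As a sketch of what the alternative estimand looks like, the block below writes out the interventional effects for a single mediator with one treatment-induced confounder L. The notation is an assumption added for illustration: C denotes baseline covariates, and G(a* | C) denotes a random draw from the distribution of the mediator under treatment level a* given C. The identification formula additionally assumes no unmeasured treatment-outcome or treatment-mediator confounding given C, and no unmeasured mediator-outcome confounding given A, C, and L.

```latex
\begin{align*}
\text{IDE} &= E\big[Y(1, G(0 \mid C))\big] - E\big[Y(0, G(0 \mid C))\big] \\
\text{IIE} &= E\big[Y(1, G(1 \mid C))\big] - E\big[Y(1, G(0 \mid C))\big] \\[4pt]
E\big[Y(a, G(a^{*} \mid C))\big]
  &= \sum_{c} \sum_{l} \sum_{m}
     E[Y \mid a, l, m, c]\; P(l \mid a, c)\; P(m \mid a^{*}, c)\; P(c) \\
P(m \mid a^{*}, c)
  &= \sum_{l'} P(m \mid a^{*}, l', c)\; P(l' \mid a^{*}, c)
\end{align*}
```

The key difference from the standard mediation formula is that L is averaged over its distribution under the treatment level actually assigned, rather than being held fixed as if it were a baseline covariate. The price is interpretive: the indirect effect describes shifting the mediator's distribution across a population, not each patient's own mediator value.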
Practical default:
When treatment clearly changes later severity, toxicity, surveillance, or adherence support, treat natural effects as the special case that needs defense. Do not treat them as the software default.
Where Aqrab Fits
This is exactly the sort of methods error that survives peer review because the regression output looks polished while the causal structure stays implicit. If you want a second pass on whether a mediation paper is identifying a mechanism or just decomposing a post-treatment mess, Aqrab can help pressure-test the DAG, the estimand, and the reviewer questions before you trust the headline pathway claim.
Start with Aqrab Try if you want to audit a draft or published paper against these failure modes, or use Aqrab Developers if you want to integrate structured methods critique into your own research workflow.