
Confounding by Indication: When Sicker Patients Make Treatments Look Dangerous

April 22, 2026 · 16 min read · By Anas H. Alzahrani, MD, PhD, MPH
Confounding by indication is one of the oldest traps in clinical research, and researchers still walk straight into it. The problem is brutally simple: treatments are not assigned at random in routine care. Clinicians give them for reasons, and those reasons usually have a lot to do with prognosis.

My take is blunt. If the sickest patients are more likely to get the treatment, then a crude outcome comparison is mostly measuring clinical judgment, not treatment effect. When a paper ignores that, it is not doing comparative effectiveness research. It is comparing people who were treated for different reasons and pretending the therapy caused the difference.

What Confounding by Indication Actually Means

Confounding by indication happens when the reason a patient receives a treatment is itself related to the outcome. In real practice, doctors prescribe stronger, riskier, or newer treatments to patients who look worse, more complicated, or more urgent. That means the treated group often starts out with a different prognosis before the drug, procedure, or device even gets a chance to work.

Core problem:

Treatment choice carries information about baseline risk, severity, contraindications, and clinician judgment.

If you do not account for that decision process, bad outcomes in the treated group can reflect why they were treated rather than what the treatment did.

The Classic Clinical Example

Imagine an observational study comparing mortality in patients with severe infection who did versus did not receive broad-spectrum antibiotics. The treated group has higher mortality. Does that mean the antibiotics are harmful? Not so fast.

  • Patients with higher fever, shock, rising lactate, or organ dysfunction are more likely to get aggressive treatment.
  • Those same features predict death even if treatment is beneficial.
  • The treatment group may therefore look doomed at baseline before the first dose is given.

A naive analysis will happily label that baseline severity difference as a treatment effect.
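To make that concrete, here is a minimal sketch with exact arithmetic on a made-up population. Every number in it (the share of severe patients, the treatment probabilities, the baseline risks, and the true risk ratio of 0.75) is an assumption chosen for illustration, not data from any study. The treatment is beneficial by construction, yet the crude comparison points the other way.

```python
# Hypothetical population: every number below is an illustrative assumption.
P_STRATUM = {"severe": 0.30, "mild": 0.70}   # share of patients in each stratum
P_TREAT = {"severe": 0.80, "mild": 0.20}     # clinicians treat the sick more often
BASE_RISK = {"severe": 0.40, "mild": 0.05}   # mortality risk without treatment
TRUE_RR = 0.75                               # treatment truly lowers risk by 25%


def risk_ratio(strata):
    """Treated-vs-untreated risk ratio among patients in the given strata."""
    totals = {True: [0.0, 0.0], False: [0.0, 0.0]}  # arm -> [patients, deaths]
    for s in strata:
        for treated in (True, False):
            p_arm = P_TREAT[s] if treated else 1 - P_TREAT[s]
            risk = BASE_RISK[s] * (TRUE_RR if treated else 1.0)
            totals[treated][0] += P_STRATUM[s] * p_arm
            totals[treated][1] += P_STRATUM[s] * p_arm * risk
    risk_treated = totals[True][1] / totals[True][0]
    risk_untreated = totals[False][1] / totals[False][0]
    return risk_treated / risk_untreated


crude_rr = risk_ratio(["severe", "mild"])   # pools everyone, ignores severity
rr_severe = risk_ratio(["severe"])          # comparison within severe patients
rr_mild = risk_ratio(["mild"])              # comparison within mild patients

print(f"crude risk ratio:          {crude_rr:.2f}")
print(f"risk ratio, severe only:   {rr_severe:.2f}")
print(f"risk ratio, mild only:     {rr_mild:.2f}")
```

Under these assumptions the crude risk ratio comes out around 2.4, while the ratio inside each severity stratum is 0.75. The only thing that changed is whether the comparison respects who got treated and why. In real data severity is never this cleanly measured, so stratifying is not a guaranteed fix.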

Why the Bias Is So Dangerous

Confounding by indication can make effective treatments look harmful, neutral treatments look helpful, and weak therapies look like miracle rescues if they are preferentially given to carefully selected patients. It is especially vicious in pharmacoepidemiology, ICU studies, oncology, and surgical comparisons where treatment choice is tightly linked to disease severity and fitness for intervention.

What gets tangled together

Treatment effect, baseline severity, clinician intuition, access to care, contraindications, frailty, and timing of deterioration.

What gets missed

The fact that “why this patient got treated” is often more prognostic than any single covariate sitting in the database.

How to Spot It Fast

If I am screening a paper for confounding by indication, I ask four questions immediately:

Why would a clinician choose this treatment?

If the answer is “because the patient looked worse” or “because the patient looked healthier and eligible,” the design is already under stress.

Are severity drivers measured well?

Administrative data usually miss nuance: symptom burden, imaging findings, clinician gestalt, patient preference, and urgency at the bedside.

Could contraindication patterns distort the comparison?

Sometimes untreated patients are not lower risk. They are too frail, too unstable, or too complex to receive the intervention safely.

Does the paper act as if adjustment solved everything?

That is usually the tell. The authors balanced what they measured and quietly ignored everything that drove treatment choice but never made it into the dataset.

Where It Commonly Shows Up

  • Comparing aggressive rescue therapies against standard care in ICU cohorts.
  • Comparing surgery versus no surgery when frailty and operability are poorly captured.
  • Comparing newer cancer regimens with older regimens when line of therapy and tumor burden differ.
  • Comparing anticoagulation use when bleeding risk and physician caution shape who gets treated.
  • Comparing vaccination, screening, or preventive medication uptake when health-seeking behavior is uneven.

Different clinical areas, same core issue: treatment assignment is informative.

Why Regression Usually Does Not Magically Fix It

Standard multivariable adjustment can help, but it does not become causal just because the model has many covariates. If the key drivers of treatment choice are missing, poorly measured, or mis-timed, the regression coefficient is still contaminated.

| Approach | What it helps with | What it cannot rescue |
|---|---|---|
| Ordinary regression | Measured baseline confounders | Unmeasured severity, bad time alignment, treatment indication hidden in clinician judgment |
| Propensity scores | Balancing observed treatment predictors | Missing predictors, contraindication bias, positivity failures, adjustment for post-treatment (future) variables |
| Instrumental variables | Potentially addresses unmeasured confounding | Weak or invalid instruments, local effects misread as universal effects |
| Target trial emulation | Design clarity, time-zero alignment, explicit strategy comparison | Still depends on measuring the right confounders and honoring eligibility logic |
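A small worked example shows why adjusting for what you measured is not the same as adjusting for what drove the decision. The sketch below assumes two binary drivers of treatment: a measured severity flag `s` and an unmeasured bedside impression `g` (clinician gestalt). All probabilities, risks, and the true risk ratio of 0.8 are hypothetical, and the adjustment shown is plain standardization over strata, a stand-in for regression rather than any particular paper's method.

```python
from itertools import product

# Hypothetical setup: s = measured severity, g = unmeasured clinician gestalt.
TRUE_RR = 0.8
P_S, P_G = 0.4, 0.3  # prevalence of each factor, assumed independent


def p_cell(s, g):
    """Share of the population with this combination of s and g."""
    return (P_S if s else 1 - P_S) * (P_G if g else 1 - P_G)


def p_treat(s, g):
    """Probability of treatment: both factors push clinicians to treat."""
    return 0.1 + 0.3 * s + 0.5 * g


def risk(s, g, treated):
    """Outcome risk: both factors worsen prognosis; treatment helps."""
    base = 0.05 + 0.15 * s + 0.30 * g
    return base * TRUE_RR if treated else base


def standardized_rr(stratum_of):
    """Risk ratio after standardizing over strata given by stratum_of(s, g)."""
    cells = list(product((0, 1), repeat=2))
    std_risk = {1: 0.0, 0: 0.0}
    for stratum in {stratum_of(s, g) for s, g in cells}:
        members = [c for c in cells if stratum_of(*c) == stratum]
        p_stratum = sum(p_cell(*c) for c in members)
        for t in (1, 0):
            # Share of the population in each cell that ends up in arm t.
            mass = {c: p_cell(*c) * (p_treat(*c) if t else 1 - p_treat(*c))
                    for c in members}
            arm_risk = sum(mass[c] * risk(*c, t) for c in members) / sum(mass.values())
            std_risk[t] += p_stratum * arm_risk
    return std_risk[1] / std_risk[0]


crude_rr = standardized_rr(lambda s, g: 0)           # no adjustment
rr_measured_only = standardized_rr(lambda s, g: s)   # adjust for s only
rr_full = standardized_rr(lambda s, g: (s, g))       # adjust for s and g

print(f"crude: {crude_rr:.2f}")
print(f"adjusted for measured severity only: {rr_measured_only:.2f}")
print(f"adjusted for both drivers: {rr_full:.2f}")
```

With these assumed numbers, adjusting for measured severity moves the estimate toward the truth but leaves it above 1, so a protective treatment still looks harmful. Only the analysis that sees both drivers recovers 0.8, and in a real database that second driver is exactly the thing nobody recorded.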

Confounding by Indication vs Healthy User Bias

These two get mixed up constantly, but they are not the same. Confounding by indication usually means the treated group is sicker because treatment is triggered by risk or severity. Healthy user bias is almost the reverse: patients who seek preventive care or adhere to chronic therapy often look healthier, more organized, and more advantaged in ways that also improve outcomes.

Same family, different direction: one bias pushes treatment toward the sick; the other pushes treatment toward the health-conscious.

A Practical Example in Surgical Outcomes

Suppose you compare survival after surgical revascularization versus medical therapy in patients with complex coronary disease. The surgical group may be younger, anatomically suitable, and robust enough to tolerate a major procedure. Or the opposite: maybe surgery is reserved for the sickest anatomy and most severe ischemic burden.

Either way, treatment selection is not random. It reflects clinical reasoning about anatomy, comorbidity, frailty, operative risk, and expected benefit. If your data do not capture those factors well, your estimate is partly a shadow of selection rather than a clean treatment effect.

What Better Design Looks Like

There is no magic wand, but there are serious ways to reduce the damage.

Define the clinical decision point clearly

Start with the moment a patient could realistically receive either strategy. If the comparison is not fair at time zero, the rest is theater.

Measure the indication, not just demographics

Severity scores, symptom burden, prior treatment history, contraindications, labs, imaging, and clinician-facing decision variables matter more than generic baseline tables.

Choose an estimand that matches the question

Average treatment effect, average treatment effect in the treated, per-protocol effect, and dynamic strategy effects are not interchangeable. Sloppy estimands create sloppy comparisons.
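A quick illustration of why the estimand matters: when sicker patients both benefit more and get treated more often, the effect averaged over everyone differs from the effect averaged over the people who actually received treatment. The strata, treatment probabilities, and risk differences below are invented for the sketch.

```python
# Hypothetical strata; every number is an illustrative assumption.
# Each entry: (population share, probability of treatment, true risk difference)
STRATA = {
    "severe": (0.30, 0.80, -0.10),  # larger benefit, treated often
    "mild": (0.70, 0.20, -0.01),    # small benefit, treated rarely
}

# Average treatment effect: weight each stratum by its population share.
ate = sum(share * effect for share, _, effect in STRATA.values())

# Average treatment effect in the treated: weight by share of treated patients.
treated_mass = sum(share * p_treat for share, p_treat, _ in STRATA.values())
att = sum(share * p_treat * effect
          for share, p_treat, effect in STRATA.values()) / treated_mass

print(f"ATE (everyone):     {ate:.3f}")
print(f"ATT (treated only): {att:.3f}")
```

Both numbers are correct answers, just to different questions. Under these assumptions the treated population gets nearly twice the average benefit, so a paper that reports one while discussing the other is quietly changing the question.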

Use negative controls and sensitivity analysis

If residual confounding is plausible, act like it. Stress-test the claim instead of pretending your adjustment set achieved purity.
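One common way to stress-test a claim is the E-value: the minimum strength of association, on the risk ratio scale, that an unmeasured confounder would need with both treatment and outcome to fully explain away an observed risk ratio. It is one sensitivity metric among several, not the only option, and the sketch below covers only the point estimate (not the confidence limit). The example risk ratios are hypothetical.

```python
import math


def e_value(rr):
    """E-value for an observed risk ratio (point estimate only)."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    # Protective estimates are inverted first so the formula applies to RR >= 1.
    rr = max(rr, 1 / rr)
    return rr + math.sqrt(rr * (rr - 1))


for observed in (0.5, 1.2, 2.0):  # hypothetical observed risk ratios
    print(f"observed RR {observed}: E-value {e_value(observed):.2f}")
```

A small E-value means a modest unmeasured severity signal could erase the finding. In settings where clinician judgment drives treatment, a confounder of that strength is rarely far-fetched.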

Reviewer Red Flags

  • The treated group is obviously sicker or healthier at baseline and the discussion shrugs.
  • Authors say “we adjusted for all important confounders” without showing how treatment decisions are made clinically.
  • Severity is represented by a few coarse claims codes while bedside decision variables are missing.
  • Massive treatment effects appear in settings where confounding by indication is almost guaranteed.
  • Propensity score balance is presented as if it proves causal identification.

When authors sound too relaxed about selection into treatment, I assume the estimate is carrying more bias than they admit.

What Good Reporting Looks Like

A serious paper should make these points explicit:

  • the exact clinical decision moment being emulated,
  • the observed factors that drive treatment selection,
  • which likely treatment drivers are missing or imperfectly measured,
  • how overlap and positivity were assessed,
  • which estimand is being targeted,
  • what sensitivity analyses probe residual confounding,
  • why the authors believe the comparison is clinically credible.

If a paper never explains why some patients got the treatment and others did not, it probably does not understand its own confounding structure.
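Two of those reporting items, covariate balance and overlap, are easy to compute and easy to show. Here is a minimal sketch using only the Python standard library. The function names, the lactate values, and the propensity scores are all hypothetical, and the min/max common-support rule is one simple convention among several.

```python
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = ((variance(treated) + variance(control)) / 2) ** 0.5
    return (mean(treated) - mean(control)) / pooled_sd


def share_outside_common_support(ps_treated, ps_control):
    """Fraction of patients whose propensity score has no counterpart range."""
    low = max(min(ps_treated), min(ps_control))
    high = min(max(ps_treated), max(ps_control))
    everyone = list(ps_treated) + list(ps_control)
    outside = [p for p in everyone if p < low or p > high]
    return len(outside) / len(everyone)


# Hypothetical baseline lactate values (mmol/L) in each arm.
lactate_treated = [2.0, 3.0, 4.0]
lactate_control = [1.0, 2.0, 3.0]
print("SMD for lactate:",
      standardized_mean_difference(lactate_treated, lactate_control))

# Hypothetical estimated propensity scores in each arm.
ps_treated = [0.30, 0.50, 0.95]
ps_control = [0.05, 0.30, 0.50]
print("share outside common support:",
      share_outside_common_support(ps_treated, ps_control))
```

Good balance and good overlap on measured variables are necessary, not sufficient. They say nothing about the treatment drivers that never made it into the dataset.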

The Bottom Line

Confounding by indication is what happens when the clinical reason for treatment hijacks the analysis. The sickest patients often get the strongest treatments, and the healthiest patients often get the interventions they are fit enough to receive. Either way, treatment choice is telling you something about prognosis.

That means observational comparisons live or die on design quality, variable quality, and honesty about what the data cannot capture. Fancy modeling helps only after the causal question is framed correctly.

My blunt version: if your study treats clinician judgment like randomization, the model is not estimating treatment effect. It is estimating the consequences of pretending medicine is a coin flip.
