Real-World Evidence · Bias Diagnostics · Methods Critique

Confounding by Contraindication: When the Untreated Group Is Too Fragile for the Therapy

June 11, 2026·16 min read

Anas H. Alzahrani, MD PhD MPH

Department of Preventive Medicine and Public Health

Faculty of Medicine, King Abdulaziz University

Clinical researchers learn early that treated patients can look worse because clinicians reserve therapy for the sickest patients. That is confounding by indication. The mirror-image trap is quieter and just as dangerous: the untreated group can look worse because some patients were never reasonable treatment candidates in the first place.

That is confounding by contraindication. Frailty, renal failure, bleeding risk, hemodynamic instability, drug interactions, poor functional status, or anticipated intolerance push higher-risk patients away from treatment. A naive comparison then rewards the therapy for a selection process that happened before the first dose.

The Core Decision Rule

Before comparing treated and untreated patients, ask whether both groups were genuinely eligible to receive either strategy under similar clinical circumstances.

Decision rule:

If the untreated group contains many patients who went untreated because they were too fragile to receive therapy safely, the analysis may be estimating treatment candidacy rather than treatment effect.

This is why “treated versus untreated” is often the wrong first comparison in observational data. The question is not whether both groups avoided randomization. The question is whether both groups were even competing for the same clinical decision.

What the Bias Actually Looks Like

Typical triggers

Anticoagulation withheld in patients with active bleeding risk, surgery avoided in frail patients, nephrotoxic therapies skipped in advanced kidney disease, and intensive regimens deferred in people with poor performance status.

Why it fools analysts

The untreated group looks like a clean comparator on paper. In practice it may be a refuge for patients whose baseline risk was already too high to tolerate the intervention.

What gets misread

Better treated outcomes get interpreted as drug benefit, when part of the contrast is simply that the treatment group passed an invisible eligibility screen.

A Concrete Clinical Example

Imagine a real-world study of anticoagulation after atrial fibrillation diagnosis in very elderly patients. Some patients do not receive anticoagulation because they have recent gastrointestinal bleeding, recurrent falls, severe thrombocytopenia, or advanced frailty with uncertain adherence.

What the raw data show

Treated patients have fewer strokes and sometimes even lower all-cause mortality than untreated patients.

What the clinical reality may be

The untreated group may contain patients at high baseline risk of both thromboembolism and death, plus extra bleeding risk that made clinicians avoid treatment in the first place.

Why the estimate drifts

Untreated patients are not merely unexposed. They are clinically different in the exact ways that matter for outcome risk and treatment candidacy.

Adjustment can help only if the analysis captures the reasons treatment was withheld with enough timing and granularity. A diagnosis code for anemia is not the same thing as the clinician's real judgment that this patient should not be anticoagulated today.

Interactive contraindication-bias explorer

Watch a treatment look protective because the highest-risk patients never receive it

This toy model assumes treatment has the same effect within each risk stratum. The distortion comes only from who is ruled out of treatment because of frailty, bleeding risk, organ failure, or another clinical contraindication.

Naive risk difference: -3.8% (treated minus untreated, without fixing eligibility imbalance)

The explorer has three controls:

  • One sets how strongly the untreated group absorbs patients who were judged too unstable, too frail, or too risky for the therapy.
  • For the second, lower values mean the treatment cohort is increasingly filtered toward patients who look safer to treat.
  • The third sets the true treatment effect. Negative values mean treatment is beneficial. Positive values mean treatment is harmful. If the naive estimate disagrees in sign with this setting, the design is letting contraindication patterns speak louder than the biology.

Observed treated risk

16.2%

Lower partly because treatment works, but also because the cohort is filtered toward patients who could safely receive it.

Observed untreated risk

20.0%

Higher partly because this group contains patients who were not treatment candidates to begin with.

Naive risk ratio

0.81

A ratio below 1 can look reassuring even when the treatment slider is set to zero or net harm.

Quantity | Value | Interpretation
True stratum-specific effect | 3.0% | This is the causal effect you meant to learn inside comparable patients.
Naive observed risk difference | -3.8% | This is what an uncritical cohort comparison reports after mixing treatment effect with eligibility filtering.
Direction check | Naive estimate flips the sign | Sign reversal is the classic warning that contraindication patterns are masquerading as treatment benefit or harm.
Decision cue | Large untreated-risk enrichment | When the untreated group contains many patients who were never reasonable candidates, ask whether the study is estimating treatment effect or operability.
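The snapshot values above can be reproduced with a few lines of arithmetic. The sketch below is not the explorer's actual code. It assumes two risk strata with hypothetical baseline risks of 10% and 30%, a treated cohort that is 16% high-risk, and an untreated cohort that is 50% high-risk. Those inputs were chosen only because, with a true effect of +3.0 percentage points in every stratum, they return the displayed 16.2%, 20.0%, -3.8%, and 0.81.

```python
# Toy two-stratum model of confounding by contraindication.
# All inputs are illustrative assumptions, not the explorer's real parameters.

BASELINE_RISK = {"low": 0.10, "high": 0.30}   # untreated outcome risk per stratum
TRUE_EFFECT = 0.03                             # same risk difference in every stratum
HIGH_RISK_SHARE = {"treated": 0.16, "untreated": 0.50}


def observed_risk(group: str) -> float:
    """Marginal outcome risk in a group, mixing the two risk strata."""
    share_high = HIGH_RISK_SHARE[group]
    effect = TRUE_EFFECT if group == "treated" else 0.0
    low = BASELINE_RISK["low"] + effect
    high = BASELINE_RISK["high"] + effect
    return (1 - share_high) * low + share_high * high


treated = observed_risk("treated")
untreated = observed_risk("untreated")
naive_rd = treated - untreated      # what the uncritical comparison reports
naive_rr = treated / untreated

print(f"treated risk   {treated:.1%}")
print(f"untreated risk {untreated:.1%}")
print(f"naive RD       {naive_rd:+.1%}")
print(f"naive RR       {naive_rr:.2f}")
print(f"true RD        {TRUE_EFFECT:+.1%}")
```

Within each stratum the treated risk is exactly 3 points higher than the untreated risk, so the negative naive difference comes entirely from the different mix of high-risk patients in the two groups.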

How This Differs from Confounding by Indication and Healthy User Bias

Bias pattern | Who gets treatment? | What false story appears?
Confounding by indication | Patients who look sicker, more urgent, or more severe. | Treatment looks harmful because clinicians aim it at patients with worse prognosis.
Confounding by contraindication | Patients judged safe or fit enough to receive therapy. | Treatment looks protective because the highest-risk noncandidates remain untreated.
Healthy user or healthy adherer bias | Patients who are more preventive, engaged, or resourced. | Treatment or prevention looks beneficial because uptake tracks broader health-seeking behavior.

These biases can co-occur. A therapy can be channeled toward clinically suitable patients while also being adopted faster by patients with better support, cleaner access, or more consistent follow-up. The important thing is to separate the mechanisms instead of calling every awkward comparison “residual confounding” and moving on.

Failure Modes That Should Make a Reviewer Slow Down

Red flags

  • The untreated group has obvious frailty, organ failure, or bleeding-risk enrichment.
  • The paper compares treatment versus no treatment when active comparators exist in practice.
  • Contraindications are summarized coarsely or only after treatment initiation.
  • Propensity-score balance looks good despite poor overlap in clinical eligibility.
  • Treated patients show an implausibly large reduction in mortality, or look better across many unrelated outcomes at once.

Better reviewer questions

  • Could both groups realistically have received either strategy on the index date?
  • What exact clinical findings made treatment unsafe or inappropriate, and were they measured?
  • Would an active-comparator new-user design better align candidacy across groups?
  • How much trimming or restriction was needed to create overlap?
  • Do negative controls or falsification outcomes suggest a broader healthy-candidate signal?

Why Statistical Adjustment Often Underperforms Here

Contraindication patterns are difficult because they live partly in structured variables and partly in bedside synthesis. Kidney function, platelet count, prior bleed, frailty score, oxygen requirement, drug interactions, cognitive impairment, and anticipated adherence all matter, but they rarely line up perfectly inside one database snapshot.

Approach | What it can help with | Why it still breaks
Ordinary regression | Measured comorbidity and lab differences. | It cannot recover the unrecorded clinical judgment that made treatment unsafe.
Propensity scores or weighting | Balancing observed treatment predictors when overlap exists. | If the untreated group includes patients with no realistic probability of treatment, positivity fails before the weighting begins.
Target trial emulation | Clarifying eligibility, time zero, and strategy definitions. | Good emulation still requires the eligibility criteria to exclude people who were never clinically in play for treatment.
Instrumental variables | Potential leverage against unmeasured confounding in narrow settings. | Valid instruments are rare, and local effects can be badly misread as broad treatment truth.
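To see why weighting struggles when positivity is weak, it can help to look at the weights themselves. The following is a minimal sketch of standard inverse-probability-of-treatment weights; the patient labels and propensity values are hypothetical and chosen only to show the contrast.

```python
def ip_weight(treated: bool, propensity: float) -> float:
    """Inverse-probability-of-treatment weight for one patient."""
    if not 0.0 < propensity < 1.0:
        # A propensity of exactly 0 or 1 means the weight is undefined.
        raise ValueError("positivity violated: propensity must be strictly between 0 and 1")
    return 1.0 / propensity if treated else 1.0 / (1.0 - propensity)


# Hypothetical patients: (label, treated?, estimated probability of treatment)
patients = [
    ("typical candidate, treated", True, 0.60),
    ("typical candidate, untreated", False, 0.60),
    ("frail patient, treated anyway", True, 0.02),
    ("frail patient, untreated", False, 0.02),
]

weights = {label: ip_weight(t, p) for label, t, p in patients}
for label, w in weights.items():
    print(f"{label:32s} weight = {w:6.2f}")
```

In this made-up example the single frail patient who was treated anyway carries a weight of about 50, so that one unusual patient ends up standing in for dozens of frail patients who were never realistic candidates.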

What Better Design Usually Looks Like

Restrict to patients who were plausible candidates

If half the untreated group had a hard contraindication on day zero, you do not have a clean treatment-versus-no-treatment study. You have an eligibility comparison. Restrict first.
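A restriction step of this kind might be sketched as follows. The `Patient` fields, the platelet cutoff, and the example cohort are all hypothetical and are not clinical guidance; a real study would take its day-zero exclusion rules from the protocol.

```python
from dataclasses import dataclass

# Hypothetical day-zero contraindication rule, for illustration only.
PLATELET_FLOOR = 50.0  # x10^9/L, assumed cutoff


@dataclass
class Patient:
    patient_id: str
    treated: bool
    recent_gi_bleed: bool
    platelets: float  # x10^9/L, measured on or before the index date


def is_plausible_candidate(p: Patient) -> bool:
    """True if no hard contraindication was recorded on day zero."""
    return not p.recent_gi_bleed and p.platelets >= PLATELET_FLOOR


cohort = [
    Patient("A", True, False, 210.0),
    Patient("B", False, False, 180.0),
    Patient("C", False, True, 150.0),   # untreated, recent GI bleed
    Patient("D", False, False, 32.0),   # untreated, severe thrombocytopenia
    Patient("E", True, False, 95.0),
]

# Restrict first, then compare treated and untreated among the eligible.
eligible = [p for p in cohort if is_plausible_candidate(p)]
untreated = [p for p in cohort if not p.treated]
untreated_excluded = [p for p in untreated if not is_plausible_candidate(p)]

print("eligible:", [p.patient_id for p in eligible])
print(f"untreated excluded: {len(untreated_excluded)}/{len(untreated)}")
```

Reporting how many untreated patients the restriction removes is part of the point: a large fraction is itself evidence that the unrestricted comparison was about eligibility.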

Prefer active comparators when possible

Comparing one anticoagulant with another or one systemic option with another often aligns clinical candidacy better than comparing treatment with nonuse.

Audit overlap, not just balance

A tidy standardized-difference table can hide that some patients were effectively untreated by necessity. Overlap plots and explicit exclusion rules tell the more honest story.
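One simple way to audit overlap numerically is to find the range of propensity scores shared by both groups and count how many patients fall outside it. This is a rough sketch using a min/max common-support rule, which is only one of several possible definitions; the scores are invented for illustration.

```python
def common_support(treated_ps, untreated_ps):
    """Range of propensity scores observed in both groups (min/max rule)."""
    lower = max(min(treated_ps), min(untreated_ps))
    upper = min(max(treated_ps), max(untreated_ps))
    return lower, upper


def share_outside(scores, lower, upper):
    """Fraction of a group whose scores fall outside the shared range."""
    outside = [s for s in scores if s < lower or s > upper]
    return len(outside) / len(scores)


# Hypothetical estimated propensity scores.
treated_ps = [0.35, 0.48, 0.55, 0.62, 0.71, 0.80]
untreated_ps = [0.02, 0.03, 0.05, 0.30, 0.45, 0.58]

lower, upper = common_support(treated_ps, untreated_ps)
treated_out = share_outside(treated_ps, lower, upper)
untreated_out = share_outside(untreated_ps, lower, upper)

print(f"common support: [{lower:.2f}, {upper:.2f}]")
print(f"treated outside support:   {treated_out:.0%}")
print(f"untreated outside support: {untreated_out:.0%}")
```

In this invented cohort most untreated patients sit below the shared range, which is the numeric counterpart of patients who were effectively untreated by necessity.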

Use negative controls thoughtfully

If treated patients also look mysteriously protected against outcomes that should not respond to the therapy, you may be observing a healthy-candidate signature rather than a drug effect.
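A negative-control check can be as simple as computing the same crude contrast for an outcome the therapy should not affect. The counts and the 0.9 flag threshold below are hypothetical, and a real analysis would use confidence intervals rather than a point-estimate cutoff.

```python
def risk_ratio(events_treated, n_treated, events_untreated, n_untreated):
    """Crude risk ratio, treated versus untreated."""
    return (events_treated / n_treated) / (events_untreated / n_untreated)


# Hypothetical counts per 1,000 patients in each group.
primary_rr = risk_ratio(40, 1000, 60, 1000)            # outcome the drug targets
negative_control_rr = risk_ratio(30, 1000, 45, 1000)   # outcome it should not affect

FLAG_BELOW = 0.9  # arbitrary illustrative threshold, not a standard
suspicious = negative_control_rr < FLAG_BELOW

print(f"primary outcome RR:          {primary_rr:.2f}")
print(f"negative-control outcome RR: {negative_control_rr:.2f}")
print("healthy-candidate signal suspected" if suspicious else "no obvious signal")
```

When the negative-control ratio is about as protective as the primary one, as in these made-up counts, the primary estimate deserves much less trust as evidence of drug effect.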

Where Aqrab Fits

Contraindication bias tends to hide behind respectable language like “patients were managed according to clinician judgment” or “treatment choice reflected routine care.” Those phrases may be true. They are not enough.

Aqrab is useful precisely when a manuscript needs this kind of design-level skepticism. If you want a fast critique of whether the comparator, eligibility logic, and overlap story actually support a causal claim, start with Aqrab. If you need those checks embedded into a reproducible review workflow, the developer tools are the next layer.

The Bottom Line

When the untreated group includes patients who were too fragile for treatment, better treated outcomes do not automatically mean treatment worked. They may only mean the therapy was reserved for patients who could survive receiving it.

Before you celebrate a reassuring effect estimate, ask the impolite question: was this a treatment comparison, or an operability comparison wearing causal language?
