Causal Inference · Clinical Epidemiology · Effect Measures

Noncollapsibility of Odds Ratios: Why Adjustment Can Change the Number Even When Confounding Did Not

May 13, 2026 · 16 min read

Anas H. Alzahrani, MD PhD MPH

Department of Preventive Medicine and Public Health

Faculty of Medicine, King Abdulaziz University

Few statistical facts create more fake methodological confidence than this one: the adjusted odds ratio changed, therefore confounding was fixed.

Sometimes adjustment does reduce confounding. Sometimes the odds ratio changes because odds ratios are noncollapsible, which is a less glamorous way of saying the crude and conditional versions need not match even when treatment is perfectly balanced within strata and no bias is being repaired at all.

What Noncollapsibility Actually Means

A measure is collapsible if, in the absence of confounding, the marginal effect equals the common stratum-specific effect. Risk differences are collapsible. Risk ratios are collapsible too when the stratum-specific ratio is the same in every stratum. Odds ratios are not.

| Effect measure | If there is no confounding... | Interpretive consequence |
| --- | --- | --- |
| Risk difference | Marginal and conditional effects usually align | Crude versus adjusted changes are easier to read as confounding or effect modification |
| Risk ratio | Often aligns when the stratum-specific ratio is common | Still needs care, but usually behaves more intuitively for readers |
| Odds ratio | May differ even with zero confounding | A shifted adjusted OR is not automatic evidence that bias was removed |

This is not a software bug. It is a property of the measure. The odds ratio operates on odds, and odds are a nonlinear function of risk. Marginal risks are averages of the stratum risks, but the odds of an averaged risk are not the average of the stratum odds, so the summary changes even when the within-stratum odds ratio is identical everywhere.
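One way to write this out, as a sketch in notation that is not in the original post: strata are indexed by k with population weights w_k, and p_{0k} and p_{1k} are the untreated and treated risks in stratum k. The assumption is that treatment is independent of stratum, so the same weights apply to both arms.

```latex
\begin{aligned}
\bar p_a &= \sum_k w_k \, p_{ak}, \qquad a \in \{0, 1\}, \qquad \sum_k w_k = 1 \\
\mathrm{RD}_{\text{marginal}} &= \bar p_1 - \bar p_0 = \sum_k w_k \,(p_{1k} - p_{0k}) \\
\mathrm{OR}_k &= \frac{p_{1k} / (1 - p_{1k})}{p_{0k} / (1 - p_{0k})}, \qquad
\mathrm{OR}_{\text{marginal}} = \frac{\bar p_1 / (1 - \bar p_1)}{\bar p_0 / (1 - \bar p_0)}
\end{aligned}
```

The risk difference is linear in the risks, so a common stratum-specific difference passes straight through the averaging. The odds p/(1 - p) are not linear in p, so a common stratum-specific odds ratio generally does not survive it: when baseline risks differ across strata, the marginal odds ratio sits closer to 1 than the shared conditional value.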

What Noncollapsibility Is Not

Not confounding

You can see noncollapsibility in a randomized trial or a perfectly balanced simulation. No rogue covariate is required.

Not effect modification

The stratum-specific odds ratios can be identical and the crude odds ratio can still differ. The problem is not heterogeneity first; it is the measure itself.

Not harmless wording

If a common-outcome study reports the odds ratio as though it were a risk ratio, readers can walk away with a much louder treatment story than the data support.

The practical mistake is subtle: analysts notice a crude OR of 1.5 and an adjusted OR of 1.9, then tell themselves a satisfying story about hidden confounders being tamed. That story may be true, but the odds ratio alone does not prove it.

Watch the crude odds ratio move even when confounding is zero

Both strata below share the same conditional odds ratio. Treatment is balanced within strata. There is no confounding to fix. Yet the marginal odds ratio still drifts because odds ratios are noncollapsible.

Key contrast: conditional OR 1.80, marginal OR 1.63

Effect framing

| Population | Untreated risk | Treated risk | Risk ratio | Odds ratio |
| --- | --- | --- | --- | --- |
| Lower-risk stratum | 8.0% | 13.5% | 1.69 | 1.80 |
| Higher-risk stratum | 40.0% | 54.5% | 1.36 | 1.80 |
| Marginal population | 19.2% | 27.9% | 1.45 | 1.63 |

How to read this

Here the crude and adjusted odds ratios diverge even though there is no confounding in the setup. The shift comes from odds-ratio geometry, not from adjustment fixing bias.

Marginal risk difference: 8.7 percentage points
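The numbers in this explainer can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the 65/35 stratum weights are an assumption, back-solved from the 19.2% marginal untreated risk, and the function names are made up for this example.

```python
def risk_from_odds_ratio(baseline_risk, odds_ratio):
    """Treated risk implied by an untreated risk and a conditional odds ratio."""
    treated_odds = odds_ratio * baseline_risk / (1 - baseline_risk)
    return treated_odds / (1 + treated_odds)


def odds(risk):
    """Convert a risk (probability) to odds."""
    return risk / (1 - risk)


CONDITIONAL_OR = 1.80

# stratum -> (population weight, untreated risk).
# ASSUMPTION: the 65/35 weights are inferred from the 19.2% marginal
# untreated risk; the post does not state them.
strata = {"lower-risk": (0.65, 0.08), "higher-risk": (0.35, 0.40)}

# Treatment is balanced within strata, so both arms share the same weights.
untreated = sum(w * p0 for w, p0 in strata.values())
treated = sum(
    w * risk_from_odds_ratio(p0, CONDITIONAL_OR) for w, p0 in strata.values()
)

marginal_or = odds(treated) / odds(untreated)
marginal_rr = treated / untreated
marginal_rd = treated - untreated

print(f"untreated {untreated:.1%}, treated {treated:.1%}")
print(f"marginal OR {marginal_or:.2f}, RR {marginal_rr:.2f}, "
      f"RD {100 * marginal_rd:.1f} percentage points")
```

Nothing in this calculation involves confounding: the only inputs are two baseline risks, one shared conditional odds ratio, and the stratum weights.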

Why it matters: the adjusted odds ratio can move away from the crude odds ratio even if adjustment did not remove confounding.

  • Rare outcomes make the gap smaller.
  • Common outcomes and heterogeneous baseline risk make the gap larger.
  • A different crude versus adjusted OR is not automatic evidence that confounding was fixed.
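The first two bullets can be checked numerically. The following is a minimal sketch, assuming two equally weighted strata that share a conditional odds ratio of 1.80; the three baseline-risk scenarios are hypothetical and chosen only to contrast rare, homogeneous, and heterogeneous settings.

```python
def treated_risk(baseline_risk, odds_ratio):
    """Treated risk implied by an untreated risk and a conditional odds ratio."""
    treated_odds = odds_ratio * baseline_risk / (1 - baseline_risk)
    return treated_odds / (1 + treated_odds)


def marginal_odds_ratio(baseline_risks, conditional_or):
    """Marginal OR for equally weighted strata sharing one conditional OR."""
    n = len(baseline_risks)
    p0 = sum(baseline_risks) / n
    p1 = sum(treated_risk(p, conditional_or) for p in baseline_risks) / n
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


CONDITIONAL_OR = 1.80

# Hypothetical scenarios: untreated risk in each of two strata.
scenarios = {
    "rare, heterogeneous (1% vs 5%)": [0.01, 0.05],
    "common, homogeneous (30% vs 30%)": [0.30, 0.30],
    "common, heterogeneous (10% vs 70%)": [0.10, 0.70],
}

results = {
    name: marginal_odds_ratio(risks, CONDITIONAL_OR)
    for name, risks in scenarios.items()
}

for name, value in results.items():
    gap = CONDITIONAL_OR - value
    print(f"{name}: marginal OR {value:.3f}, gap from conditional {gap:.3f}")
```

In this setup the gap is negligible when the outcome is rare, vanishes when baseline risk is identical across strata, and becomes substantial only when the outcome is common and baseline risk varies.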

Why Logistic Regression Keeps Producing This Headache

Logistic regression estimates conditional odds ratios. That is perfectly legitimate if you truly want a conditional odds parameter. The trouble begins when the reported coefficient is casually translated into a population-level treatment story.

Reviewer rule of thumb

If the outcome is common and the paper says “patients were X times more likely” based on an odds ratio, stop reading that sentence as plain English. It has already become more confident than the method deserves.

In low-incidence settings, odds ratios and risk ratios sit close together and the distinction hurts less. As outcomes become common, the gap widens. Once baseline risk varies meaningfully across patients, the adjusted OR can also drift away from the crude OR even in clean designs.

Clinical Example: ICU Delirium and Sedation Exposure

Imagine a multicenter ICU study on deep sedation and delirium. Suppose investigators use logistic regression, adjust for severity, age, and ventilation status, and report an adjusted odds ratio of 2.1 for delirium.

If delirium is common, that 2.1 does not mean the sedated group had about double the risk. It means the adjusted odds were about doubled conditional on covariates in the model. The corresponding risk ratio may be much smaller, and the absolute risk increase may be the clinically useful number.
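To see how far an odds ratio of 2.1 can sit from a risk ratio, the sketch below converts it at several untreated risks using the identity RR = OR / (1 - p0 + p0 × OR). The baseline delirium risks are hypothetical, since the example does not specify any, and the conversion describes a patient at that specific baseline risk rather than a population average.

```python
def risk_ratio_from_odds_ratio(odds_ratio, baseline_risk):
    """Risk ratio implied by an odds ratio at a given untreated risk."""
    return odds_ratio / (1 - baseline_risk + baseline_risk * odds_ratio)


ADJUSTED_OR = 2.1  # the adjusted odds ratio from the ICU example

# HYPOTHETICAL untreated delirium risks, for illustration only.
baseline_risks = (0.02, 0.20, 0.40, 0.60)

for p0 in baseline_risks:
    rr = risk_ratio_from_odds_ratio(ADJUSTED_OR, p0)
    p1 = rr * p0  # treated risk at this baseline
    print(f"baseline {p0:.0%}: risk ratio {rr:.2f}, "
          f"treated risk {p1:.1%}, "
          f"risk difference {100 * (p1 - p0):.1f} percentage points")
```

At a 2% baseline the risk ratio stays close to the odds ratio. At the higher baselines it falls well below 2, which is why "about double the risk" is the wrong reading when delirium is common.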

What a strong paper would show

  • Adjusted risk estimates or standardized probabilities, not only adjusted odds ratios
  • Baseline risk context so the reader can tell whether the outcome is rare or common
  • Language that distinguishes odds from risk instead of quietly swapping the nouns
  • A clear statement of whether covariate adjustment changed the estimand, the confounding structure, or just the metric presentation

Decision Rules for Authors and Reviewers

  1. Ask whether the outcome is common. If yes, assume odds ratios will be more fragile as a communication tool.
  2. Do not treat crude-versus-adjusted OR movement as proof of confounding control. Check the design logic and covariate balance story separately.
  3. Prefer adjusted risks, risk differences, or risk ratios when clinical interpretation matters. Standardization, g-computation, marginal models, or post-estimation risk conversion often help.
  4. Report absolute risks whenever decisions are clinical. Clinicians prescribe to people, not to log-odds coefficients.
  5. Use the word “odds” when you mean odds. If the manuscript translates an OR into “times more likely,” make it earn that wording or delete it.

A nice side effect of this discipline is that it makes AI-assisted manuscript review far better. Once the estimand, scale, and baseline risk are explicit, nonsense has fewer places to hide.

Reviewer Red-Flag Table

| If the paper says... | Possible hidden problem | What to ask next |
| --- | --- | --- |
| “The adjusted OR increased, showing confounding was controlled.” | Noncollapsibility may explain part of the shift. | What happens on the risk scale, and what balance or design evidence supports actual confounding control? |
| “Patients were 2.3 times more likely to develop the outcome.” | Odds are being translated into risk language too aggressively. | What was the event rate, and what are the standardized risks by group? |
| “We used logistic regression and therefore adjusted for confounding.” | Modeling and confounding control are not synonyms. | Why this covariate set, what was the target estimand, and how was positivity or overlap assessed? |
| “Odds ratios were chosen because they are standard.” | Convenience may be outrunning interpretability. | Standard for which purpose: estimation, software defaults, or clinical communication? |

What to Do Instead When Interpretation Matters

Show standardized risks

If you can estimate conditional probabilities, you can usually average them back to the population and report clinically readable risks.
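The averaging step can be sketched as follows. This is a minimal illustration of marginal standardization, not a full analysis: the coefficients stand in for an already fitted logistic model, and both they and the six patients are hypothetical. A real analysis would fit the model to data and add interval estimates, for example by bootstrapping.

```python
import math


def expit(x):
    """Inverse logit: map a log-odds value to a probability."""
    return 1 / (1 + math.exp(-x))


# HYPOTHETICAL fitted coefficients on the log-odds scale (not from any study).
INTERCEPT = -2.0
B_TREAT = math.log(2.1)  # conditional odds ratio of 2.1
B_SEVERITY = 0.9
B_AGE = 0.03

# HYPOTHETICAL patients: (severity score, age in years relative to 60).
patients = [(0, -10), (0, 5), (1, 0), (1, 12), (2, 8), (2, 20)]


def predicted_risk(treated, severity, age):
    """Model-based risk for one patient at a given treatment setting."""
    linear = INTERCEPT + B_TREAT * treated + B_SEVERITY * severity + B_AGE * age
    return expit(linear)


# Predict each patient's risk under both treatment settings, then average
# over the same covariate distribution.
risk_treated = sum(predicted_risk(1, s, a) for s, a in patients) / len(patients)
risk_untreated = sum(predicted_risk(0, s, a) for s, a in patients) / len(patients)

standardized_rd = risk_treated - risk_untreated
standardized_rr = risk_treated / risk_untreated
standardized_or = (risk_treated / (1 - risk_treated)) / (
    risk_untreated / (1 - risk_untreated)
)

print(f"standardized risks: treated {risk_treated:.1%}, untreated {risk_untreated:.1%}")
print(f"risk difference {100 * standardized_rd:.1f} percentage points, "
      f"risk ratio {standardized_rr:.2f}, marginal OR {standardized_or:.2f}")
```

Because both averages run over the same patients, the contrast is marginal rather than conditional, and the resulting risks, risk difference, and risk ratio can be reported alongside the model's conditional odds ratio.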

Add absolute risk differences

A treatment effect framed as six extra events per 100 patients often teaches more than a polished adjusted OR ever will.

Use risk ratios when appropriate

Log-binomial, modified Poisson, or marginal standardization may fit the communication goal better, provided the estimand is clear.

State the scale explicitly

If you still report an odds ratio, say why that scale is scientifically relevant rather than just statistically convenient.

Where Aqrab Fits

Noncollapsibility is exactly the kind of methods detail that slips through when a manuscript looks statistically sophisticated but draws causally sloppy conclusions. Aqrab is useful here because it can pressure-test whether the reported effect measure matches the question, whether baseline risk makes the language misleading, and whether the analysis scale silently changed the story.

If you want to sanity-check a draft before a reviewer does it for sport, try Aqrab. If you want methods critique embedded in a workflow rather than pasted in at midnight, the developer tools are the cleaner path.

The Practical Bottom Line

Odds ratios are not wrong. They are just easier to overread than many people admit.

When the adjusted OR differs from the crude OR, do not automatically applaud the regression for fixing confounding. First ask whether the design changed bias, whether the scale changed interpretation, and whether the outcome was common enough that odds became a theatrical substitute for risk.

In other words: if the number changed, the science may have improved — but the metric may also just be doing its weird little odds-ratio thing.
