Causal Inference · G-Computation · Standardization

G-Computation: Predict the Outcome Under Each Treatment Strategy

May 2, 2026 · 15 min read · By Coefficients Health Analytics

G-computation is one of the cleanest ideas in causal inference: model the outcome, set treatment to the strategy you care about, predict what would happen for every patient, then average. That is it. No matching pairs, no pseudo-populations, no mystical jargon.

The catch is that this apparent simplicity hides real responsibility. If the outcome model is wrong, the causal estimate can be wrong with great confidence. G-computation rewards analysts who think clearly about the estimand, the intervention, and the shape of the data-generating process.

Why This Method Matters

A lot of causal teaching starts with propensity scores, weights, or instrumental variables. Those matter. But g-computation is often the best bridge between ordinary regression and counterfactual reasoning.

Core intuition:

Instead of asking whether the treated and untreated groups can be balanced, g-computation asks: for this same population, what outcome would we predict if everyone received strategy A versus strategy B?

Once you see regression this way, you stop treating adjustment as a box-checking exercise and start treating it as counterfactual prediction followed by standardization.

The Setup in One Diagram

Suppose L is a set of baseline confounders, A is treatment, and Y is the outcome.

L ─────▶ A ─────▶ Y └───────────────▶ Y

If exchangeability, positivity, and consistency hold, then you can model E[Y | A, L], predict outcomes under each intervention, and average those predictions over the observed covariate distribution.
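Written out, the identification result behind that sentence is the standardization formula. The notation below is an assumption of this sketch rather than something the post fixes: Y^a denotes the outcome a person would have under treatment level a, and m-hat is whatever fitted outcome model you end up using. The sum becomes an integral when L has continuous components.

```latex
\mathbb{E}[Y^{a}]
  \;=\; \sum_{l} \mathbb{E}[Y \mid A = a, L = l]\,\Pr(L = l)
  \;\approx\; \frac{1}{n} \sum_{i=1}^{n} \hat{m}(a, L_i),
\qquad
\hat{m}(a, l) \text{ an estimate of } \mathbb{E}[Y \mid A = a, L = l].
```

The right-hand side is the "predict for everyone, then average" recipe: each person's own covariates L_i are kept, and only the treatment value is overwritten with a.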

What G-Computation Actually Does

  1. Define the intervention clearly.
  2. Fit an outcome model conditional on treatment and confounders.
  3. Predict each person's outcome with treatment set to "treated", regardless of what they actually received.
  4. Predict each person's outcome with treatment set to "untreated", again for everyone.
  5. Average the predicted outcomes in each world.
  6. Compare those two averages using the effect measure you care about.

That final averaging step matters. Regression coefficients are usually conditional objects. G-computation turns them into a marginal causal contrast for the target population.
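The six steps above can be sketched in a few lines. This is a minimal illustration, not a production implementation: the cohort is invented, there is a single binary confounder L, and the "outcome model" is simply the observed event rate in each (A, L) cell (a saturated model). All function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical cohort, invented for illustration: (L, A) -> (n, events).
# L is a single binary confounder, A is treatment, Y is a binary outcome.
CELLS = {
    (0, 1): (20, 2),
    (0, 0): (80, 12),
    (1, 1): (60, 18),
    (1, 0): (40, 16),
}


def build_cohort(cells):
    """Expand cell counts into one (L, A, Y) row per person."""
    rows = []
    for (l, a), (n, events) in cells.items():
        rows += [(l, a, 1)] * events + [(l, a, 0)] * (n - events)
    return rows


def fit_outcome_model(rows):
    """Step 2: saturated outcome model, i.e. the mean of Y in each (A, L) cell."""
    total, count = defaultdict(int), defaultdict(int)
    for l, a, y in rows:
        total[(a, l)] += y
        count[(a, l)] += 1
    return {cell: total[cell] / count[cell] for cell in count}


def standardized_risk(rows, model, a):
    """Steps 3-5: set A=a for everyone, predict, and average over observed L."""
    return sum(model[(a, l)] for l, _, _ in rows) / len(rows)


def crude_risk(rows, a):
    """Unadjusted risk among people who actually received A=a."""
    ys = [y for _, obs_a, y in rows if obs_a == a]
    return sum(ys) / len(ys)


rows = build_cohort(CELLS)
model = fit_outcome_model(rows)
risk_treated = standardized_risk(rows, model, a=1)
risk_untreated = standardized_risk(rows, model, a=0)
risk_difference = risk_treated - risk_untreated  # Step 6
crude_difference = crude_risk(rows, 1) - crude_risk(rows, 0)

print(f"standardized risks: {risk_treated:.3f} vs {risk_untreated:.3f}")
print(f"g-computation risk difference: {risk_difference:+.3f}")
print(f"crude risk difference: {crude_difference:+.3f}")
```

On this invented cohort the standardized risks are 0.200 and 0.275, a risk difference of about -0.075, while the crude comparison points the other way (about +0.017) because treated patients are concentrated in the higher-risk stratum. In a real analysis the cell means would be replaced by a regression model with several confounders, but the predict-twice-and-average logic is unchanged.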

What It Is Not

Not just “adjusted regression”

You only get a causal estimate when the model is tied to an explicit intervention and used to generate counterfactual predictions.

Not robust to lazy modeling

If important nonlinearities or interactions are missing, the estimate can fail quietly.

Not limited to binary treatment

The same logic works for multiple treatment levels, continuous doses, and more complex treatment rules.

Not the same as propensity scoring

Propensity methods model treatment assignment. G-computation models the outcome. They fail in different ways.

The Estimand Has to Come First

Before fitting anything, decide what contrast you want. If you cannot state the causal question in plain language, the analysis is not ready.

| Estimand | Question |
| --- | --- |
| ATE | What if everyone were treated versus no one treated? |
| Risk difference | How many percentage points would the risk change? |
| Risk ratio | How many times higher or lower is the risk under one strategy? |
| Mean difference | How much would the average continuous outcome change? |
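In potential-outcome notation, these contrasts are all functions of the same two marginal quantities. The notation is an assumption of this sketch, not something the post defines: Y^1 and Y^0 are the outcomes a person would have if treated and if untreated.

```latex
\begin{aligned}
\text{Risk difference / mean difference (ATE on the additive scale):}\quad
  & \mathbb{E}[Y^{1}] - \mathbb{E}[Y^{0}] \\
\text{Risk ratio:}\quad
  & \frac{\mathbb{E}[Y^{1}]}{\mathbb{E}[Y^{0}]}
\end{aligned}
```

For a binary outcome the expectations are risks. For a continuous outcome they are means, and the difference is the mean difference.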

A Simple Clinical Example

Imagine a baseline cohort study asking whether starting a statin lowers one-year cardiovascular risk. You measure age, diabetes status, baseline LDL, smoking, kidney function, and prior cardiovascular disease.

You fit a logistic outcome model for one-year events, then run two prediction passes:

  • World 1: set everyone to statin treatment and predict one-year risk.
  • World 2: set everyone to no statin treatment and predict one-year risk.

If the average predicted risk is 8% in the treated world and 11% in the untreated world, then the estimated risk difference is -3 percentage points. That is the causal logic: same covariate distribution, different intervention worlds.
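Once the two marginal risks are in hand, every effect measure is simple arithmetic on them. The helper below is a hypothetical illustration using the 8% and 11% figures from the example. The odds ratio is included only to show that it is another function of the same two numbers.

```python
def effect_measures(risk_treated, risk_untreated):
    """Turn two marginal risks into common causal contrasts."""

    def odds(p):
        return p / (1.0 - p)

    return {
        "risk_difference": risk_treated - risk_untreated,
        "risk_ratio": risk_treated / risk_untreated,
        "odds_ratio": odds(risk_treated) / odds(risk_untreated),
    }


# Average predicted risks from the statin example in the text.
measures = effect_measures(risk_treated=0.08, risk_untreated=0.11)

for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

The risk difference is -0.03 (3 percentage points lower) and the risk ratio is roughly 0.73. Which one to report is a question about the estimand, not about the model.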

Why Averaging Over the Population Is the Whole Point

Many people stop at the treatment coefficient and call it a day. That misses the main idea. G-computation is not about the coefficient. It is about the predicted marginal outcome under an intervention.

Fit the model. Intervene on treatment. Predict each patient's outcome. Average. Compare worlds.

That is why g-computation is sometimes called a plug-in estimator: you plug intervention values into the fitted model and then standardize over the covariate distribution you care about.
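One way to see why the coefficient is not the answer: in a logistic model, the exponentiated treatment coefficient is a conditional odds ratio, and it generally differs from the marginal contrast you get after standardizing, even when there is no confounding at all. The sketch below assumes an invented logistic outcome model with made-up coefficients and a single binary covariate. Nothing here is estimated from data.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical outcome model, coefficients invented for illustration:
# logit P(Y=1 | A, L) = B0 + BA * A + BL * L
B0, BA, BL = -1.0, -1.0, 2.0

# Hypothetical target population: 100 patients, half with L = 1.
covariates = [0, 1] * 50


def predicted_risk(a, l):
    """Model-based risk for a patient with covariate l under treatment a."""
    return expit(B0 + BA * a + BL * l)


def marginal_risk(a):
    """Plug in treatment a for everyone, then average over the population."""
    return sum(predicted_risk(a, l) for l in covariates) / len(covariates)


def odds(p):
    return p / (1.0 - p)


conditional_or = math.exp(BA)  # what the coefficient table reports
risk1, risk0 = marginal_risk(1), marginal_risk(0)
marginal_or = odds(risk1) / odds(risk0)  # what standardization gives
risk_difference = risk1 - risk0

print(f"conditional OR from coefficient: {conditional_or:.3f}")
print(f"marginal OR after standardizing: {marginal_or:.3f}")
print(f"marginal risk difference:        {risk_difference:+.3f}")
```

With these made-up numbers the marginal odds ratio sits closer to 1 than the conditional one, and the risk difference is not readable from the coefficient at all. Both come only from the plug-in-and-average step.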

The Assumptions You Still Have to Defend

1. No unmeasured confounding (exchangeability)

If important causes of treatment and outcome are missing, g-computation inherits that bias just like any other backdoor adjustment method.

2. Positivity

If nobody with a given covariate pattern ever receives one of the treatment options, your prediction becomes unsupported extrapolation.
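A crude but useful habit is to tabulate treatment within covariate strata before trusting any prediction. The sketch below uses invented strata and counts, and the function names are hypothetical. With continuous covariates you would need binning or a different overlap diagnostic.

```python
from collections import Counter


def overlap_table(rows):
    """Count untreated and treated people within each covariate stratum."""
    counts = Counter(rows)  # keys are (stratum, treatment) pairs
    strata = sorted({stratum for stratum, _ in rows})
    return {s: (counts[(s, 0)], counts[(s, 1)]) for s in strata}


def unsupported_strata(table, min_per_arm=1):
    """Strata where at least one treatment arm has too few people."""
    return [s for s, (n0, n1) in table.items() if min(n0, n1) < min_per_arm]


# Hypothetical cohort: (stratum label, treatment received).
rows = (
    [("age<65, no CVD", 0)] * 40
    + [("age<65, no CVD", 1)] * 10
    + [("age<65, prior CVD", 0)] * 5
    + [("age<65, prior CVD", 1)] * 30
    + [("age>=65, prior CVD", 1)] * 25  # nobody untreated in this stratum
)

table = overlap_table(rows)
flagged = unsupported_strata(table)

for stratum, (n_untreated, n_treated) in table.items():
    print(f"{stratum}: untreated={n_untreated}, treated={n_treated}")
print("no support in at least one arm:", flagged)
```

Any prediction of the untreated outcome for the flagged stratum rests entirely on the model's functional form, not on observed patients.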

3. Consistency

The intervention has to be well defined. “Treatment” is not enough if dose, timing, formulation, or adherence vary in ways that matter.

4. Correct outcome model specification

This is the method's pressure point. If the outcome surface is nonlinear, interactive, or threshold-driven, a simplistic model can distort the whole estimate.

When G-Computation Is Especially Attractive

Absolute risk is clinically important

Outcome modeling naturally produces risks, risk differences, and other measures clinicians can interpret directly.

The exposure model is unstable

If propensity scores are near 0 or 1, weighting can become noisy and fragile. A good outcome model may behave more calmly.

You want transparent counterfactual predictions

“We predicted each patient twice, then averaged” is often easier to teach than pseudo-population logic.

You are building toward the g-formula

Single-timepoint g-computation is the conceptual gateway to longitudinal g-methods where covariates and treatment evolve over time.

Where It Breaks

1. Model misspecification

The estimate can look polished and reproducible while being wrong because the modeled outcome surface is too rigid.

2. Hidden positivity problems

Weighting methods often scream when overlap is weak. G-computation can whisper by extrapolating silently.

3. Post-treatment adjustment

If you slip mediators or treatment-affected variables into a baseline analysis, you can block part of the effect or induce bias.

4. Overconfident interpretation

A tidy regression output is not proof that the design, covariate support, or intervention definition was credible.

G-Computation vs IPW vs Doubly Robust Estimators

| Method | Main strength | Main failure mode |
| --- | --- | --- |
| G-computation | Efficient and intuitive when the outcome model is good. | Sensitive to outcome-model misspecification. |
| IPW | Separates design from outcome modeling conceptually. | Extreme weights and weak overlap can wreck stability. |
| Doubly robust estimators | Offer protection if either the treatment or outcome model is correct. | Still demand careful nuisance modeling and good support. |
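To make the comparison concrete, here is a small sketch that runs g-computation and IPW on the same invented cohort with one binary confounder. Both nuisance models are saturated (cell means for the outcome, stratum proportions for treatment), and in that special case the two estimators return the same number. The methods diverge once parametric models are used and one of them is misspecified. The data and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical cohort, invented for illustration: (L, A) -> (n, events).
CELLS = {
    (0, 1): (20, 2),
    (0, 0): (80, 12),
    (1, 1): (60, 18),
    (1, 0): (40, 16),
}


def build_cohort(cells):
    """Expand cell counts into one (L, A, Y) row per person."""
    rows = []
    for (l, a), (n, events) in cells.items():
        rows += [(l, a, 1)] * events + [(l, a, 0)] * (n - events)
    return rows


def g_computation_risk(rows, a):
    """Model the outcome: average the (A=a, L) cell mean over everyone's L."""
    total, count = defaultdict(int), defaultdict(int)
    for l, obs_a, y in rows:
        total[(obs_a, l)] += y
        count[(obs_a, l)] += 1
    return sum(total[(a, l)] / count[(a, l)] for l, _, _ in rows) / len(rows)


def ipw_risk(rows, a):
    """Model treatment: weight people who received A=a by 1 / P(A=a | L)."""
    n_l, n_al = defaultdict(int), defaultdict(int)
    for l, obs_a, _ in rows:
        n_l[l] += 1
        n_al[(obs_a, l)] += 1
    weighted = sum(
        y * n_l[l] / n_al[(a, l)] for l, obs_a, y in rows if obs_a == a
    )
    return weighted / len(rows)


rows = build_cohort(CELLS)
gcomp_rd = g_computation_risk(rows, 1) - g_computation_risk(rows, 0)
ipw_rd = ipw_risk(rows, 1) - ipw_risk(rows, 0)

print(f"g-computation risk difference: {gcomp_rd:+.3f}")
print(f"IPW risk difference:           {ipw_rd:+.3f}")
```

The agreement here is a property of the saturated setup, not a general guarantee. It is a reminder that the two methods target the same estimand and differ in which model they lean on.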

My bias is simple: if you can build a genuinely credible outcome model, g-computation is elegant. If you are worried that either model might be wrong, doubly robust methods are usually the safer practical default.

How to Report It Without Hand-Waving

  • State the intervention and estimand in plain language.
  • Show the causal structure or justify the confounder set explicitly.
  • Describe the exact outcome model, including interactions and nonlinear terms.
  • Explain how counterfactual predictions were generated under each strategy.
  • Report how predictions were averaged and how uncertainty was estimated.
  • Discuss overlap and where predictions relied on extrapolation.
  • Show sensitivity to alternate model specifications when possible.
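For the uncertainty item in the checklist above, a common approach is the nonparametric bootstrap: resample patients with replacement, refit the outcome model, and repeat the whole predict-and-average procedure in each resample. The sketch below is one way to do that under simplifying assumptions: an invented cohort, a saturated outcome model, and a percentile interval. The function names, the seed, and the number of resamples are arbitrary choices for illustration.

```python
import random
from collections import defaultdict

# Hypothetical cohort, invented for illustration: (L, A) -> (n, events).
CELLS = {
    (0, 1): (20, 2),
    (0, 0): (80, 12),
    (1, 1): (60, 18),
    (1, 0): (40, 16),
}


def build_cohort(cells):
    """Expand cell counts into one (L, A, Y) row per person."""
    rows = []
    for (l, a), (n, events) in cells.items():
        rows += [(l, a, 1)] * events + [(l, a, 0)] * (n - events)
    return rows


def g_computation_rd(rows):
    """Refit the saturated outcome model and return the risk difference.

    Returns None when some (A, L) cell is empty, i.e. the resample has no
    data to support one of the counterfactual predictions.
    """
    total, count = defaultdict(int), defaultdict(int)
    for l, a, y in rows:
        total[(a, l)] += y
        count[(a, l)] += 1
    strata = {l for l, _, _ in rows}
    if any(count[(a, l)] == 0 for a in (0, 1) for l in strata):
        return None

    def risk(a):
        return sum(total[(a, l)] / count[(a, l)] for l, _, _ in rows) / len(rows)

    return risk(1) - risk(0)


def bootstrap_ci(rows, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the g-computation risk difference."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resample = rng.choices(rows, k=len(rows))
        rd = g_computation_rd(resample)
        if rd is not None:  # skip resamples with an empty cell
            estimates.append(rd)
    estimates.sort()
    lower = estimates[int((alpha / 2) * len(estimates))]
    upper = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lower, upper


rows = build_cohort(CELLS)
point = g_computation_rd(rows)
lo, hi = bootstrap_ci(rows)
print(f"risk difference: {point:+.3f} (95% percentile interval {lo:+.3f} to {hi:+.3f})")
```

The key detail is that the model is refit inside every resample. Bootstrapping only the final averaging step while holding the fitted model fixed would understate the uncertainty.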

Reviewer Red Flags

  • The paper reports an adjusted coefficient and casually calls it g-computation.
  • No estimand is named, so nobody knows what causal contrast was actually estimated.
  • No positivity discussion appears, even though treatment groups look clinically disjoint.
  • The outcome model is rigid despite obvious nonlinearity or effect modification.
  • Post-treatment variables creep into the adjustment set without a clear causal justification.

The Practical Bottom Line

G-computation deserves more respect because it forces an honest causal workflow: define the intervention, model the outcome, generate counterfactual predictions, and average over the target population.

But it is not a free lunch. If confounding is unmeasured, overlap is weak, or the model is naive, the resulting estimate can be precise-looking nonsense. Used well, though, g-computation is one of the clearest ways to turn a clinical causal question into an interpretable causal answer.
