Multiple Imputation: Missing Data Does Not Become Innocent Because MICE Ran
Missing data has a talent for being treated like housekeeping. A few values go missing, someone runs a multiple imputation package, and the paper carries on as if the problem were solved. It usually is not.
My view is simple: multiple imputation is useful, but it is not a moral cleansing ritual. If the missingness mechanism is ugly, the imputation model is thin, or the variable itself was badly measured, the fact that MICE converged is not a scientific achievement.
What Multiple Imputation Is Actually Doing
Multiple imputation replaces each missing value with several plausible draws from a model, analyzes each completed dataset separately, and then combines the results (typically with Rubin's rules) so that the final standard errors reflect the extra uncertainty from missingness.
Core idea:
Imputation does not recover the truth. It creates plausible versions of the unobserved data under modeling assumptions. If those assumptions are weak, the finished analysis can look polished while remaining badly biased.
That is why the real question is not “did you impute?” It is “what story about the missing data are you asking me to believe?”
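To make the "combine the results" step concrete, here is a minimal sketch of Rubin's rules for a single scalar estimate. The function name `pool_rubin` and the input numbers are illustrative assumptions, not output from any real study. The pooled variance adds the average within-imputation variance to the between-imputation variance, inflated by a factor of 1 + 1/m.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m estimates and their squared standard errors with Rubin's rules.

    `estimates` holds one point estimate per imputed dataset;
    `variances` holds the matching squared standard errors.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return {
        "estimate": q_bar,
        "within": within,
        "between": between,
        "total_var": total,
        "se": sqrt(total),
    }


# Made-up numbers: three imputed datasets, same within-imputation variance.
pooled = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled)
```

When the imputed datasets disagree with each other, the between-imputation term grows and the pooled standard error widens. That widening is the honest part of the method, and it is also the part that vanishes if someone analyzes a single imputed dataset.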
The First Mistake: Treating Missingness as a Software Setting
Missing data begins in study design and data generation, not in R or Python. Patients miss visits. Labs are ordered for sicker people. Sensitive behaviors are underreported. Follow-up drops when treatment becomes intolerable. Those are clinical and operational processes, and they often carry prognosis with them.
If missingness depends on severity, adherence, access, clinician judgment, or the future outcome path, then complete-case analysis is rarely neutral, and imputation is only as credible as the causal understanding behind it.
MCAR, MAR, MNAR: Useful Labels, Often Used Lazily
| Mechanism | What it means | What usually goes wrong |
|---|---|---|
| MCAR | Missingness unrelated to observed or unobserved data. | People assume this because it is convenient. Real clinical data rarely deserves that optimism. |
| MAR | Given the observed data, missingness does not depend on the unobserved values. | Researchers say “assumed MAR” without showing the observed predictors that would make it plausible. |
| MNAR | Missingness depends on the unobserved values themselves (or on unmeasured factors related to them), even after conditioning on observed data. | People ignore it because it is inconvenient, then write a confident discussion section anyway. |
MAR is not a free pass. It becomes more plausible only when your imputation model includes the variables that actually predict both missingness and the missing values themselves.
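A small simulation can make the three labels less abstract. The sketch below is a toy setup, not a model of any real dataset: `severity` is always observed, `y` is the value that may go missing, and the missingness probabilities (a flat 0.3, a logistic function of `severity`, a logistic function of `y`) are arbitrary choices made for illustration.

```python
import random
from math import exp
from statistics import mean


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def complete_case_means(n=20000, seed=0):
    """Mean of y among rows where y is observed, under three toy mechanisms."""
    rng = random.Random(seed)
    kept = {"MCAR": [], "MAR": [], "MNAR": []}
    all_y = []
    for _ in range(n):
        severity = rng.gauss(0, 1)      # always observed
        y = severity + rng.gauss(0, 1)  # the value that may go missing
        all_y.append(y)
        p_missing = {
            "MCAR": 0.3,                     # unrelated to anything
            "MAR": sigmoid(1.5 * severity),  # driven by an observed variable
            "MNAR": sigmoid(1.5 * y),        # driven by the missing value itself
        }
        for mechanism, p in p_missing.items():
            if rng.random() >= p:            # y stays observed
                kept[mechanism].append(y)
    result = {mechanism: mean(values) for mechanism, values in kept.items()}
    result["truth"] = mean(all_y)
    return result


means = complete_case_means()
for label, value in means.items():
    print(f"{label:>5}: {value: .3f}")
```

In this setup the observed-only mean of `y` stays close to the truth under MCAR and drifts downward under both MAR and MNAR. The difference is what can be done about it: under MAR the driver (`severity`) is in the data and can go into the imputation model, while under MNAR no observed variable fully explains the gap.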
Why Complete-Case Analysis Fails So Often
Deleting everyone with missing covariates sounds clean. It is usually causal vandalism.
- You lose precision by throwing away observed information.
- You change the study population, often toward healthier, more organized, or more intensively monitored patients.
- You can induce selection bias when inclusion now depends on post-baseline behavior or prognosis.
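Complete-case analysis behaves nicely only under fairly strict conditions: roughly, when data are MCAR, or when missingness in a regression does not depend on the outcome once the model's covariates are accounted for. In practice, it often answers a different question in a different population and acts offended when you notice.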
The Clinical Intuition
Suppose you are estimating the effect of a diabetes drug on hospitalization and baseline HbA1c is missing more often in patients with fragmented care. Those same patients may have worse adherence, worse control, and higher event risk.
If you drop them, you have not merely simplified the data. You have selected a tidier, better-observed subgroup. If you impute HbA1c without including care fragmentation, prior utilization, treatment history, and related severity markers, your shiny imputed values may be too polite for the patients they are supposed to represent.
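The HbA1c scenario above can be sketched with simulated data. Everything here is hypothetical: the 40% share of fragmented-care patients, the 1.5-point HbA1c gap, the missingness rates of 60% and 10%, and the residual SD of 1 are invented for illustration. The imputation is also a single stochastic draw around a group mean, which is far simpler than proper multiple imputation, but it is enough to show what leaving care fragmentation out does.

```python
import random
from statistics import mean


def simulate(n=20000, seed=1):
    """Rows of (fragmented, hba1c, missing) from a made-up data-generating process."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        fragmented = rng.random() < 0.4
        hba1c = 7.0 + 1.5 * fragmented + rng.gauss(0, 1)
        missing = rng.random() < (0.6 if fragmented else 0.1)
        rows.append((fragmented, hba1c, missing))
    return rows


def completed_mean(rows, use_fragmentation, seed=2):
    """Impute missing HbA1c once and return the mean of the completed data."""
    rng = random.Random(seed)
    observed = [(f, h) for f, h, miss in rows if not miss]
    overall = mean(h for _, h in observed)
    by_group = {g: mean(h for f, h in observed if f == g) for g in (False, True)}
    completed = []
    for f, h, miss in rows:
        if miss:
            centre = by_group[f] if use_fragmentation else overall
            h = centre + rng.gauss(0, 1)  # noise SD fixed at 1 for simplicity
        completed.append(h)
    return mean(completed)


rows = simulate()
true_mean = mean(h for _, h, _ in rows)
naive = completed_mean(rows, use_fragmentation=False)
informed = completed_mean(rows, use_fragmentation=True)
print(f"truth {true_mean:.2f} | without fragmentation {naive:.2f} | with {informed:.2f}")
```

In this toy setup the imputation that ignores fragmentation fills in values that look like the well-observed patients, so the completed mean stays too low. Adding the one variable that drives both missingness and HbA1c closes most of the gap.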
What a Good Imputation Model Includes
Predictors of missingness
Visit frequency, site, care intensity, prior measurements, baseline severity, and anything else that helps explain why the value is absent.
Predictors of the variable itself
Correlated labs, demographics, diagnoses, treatment history, and outcomes that carry real signal about the missing value.
The exposure and outcome
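If the missing covariate relates to treatment or outcome, leaving those variables out of the imputation model weakens the very structure needed to make MAR believable, and it tends to pull the imputed covariate's associations with them toward the null.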
Proper functional form
Respect skew, boundaries, interactions, and nonlinearity. Imputing a biomarker as if it were gentle Gaussian wallpaper is how weird datasets become fake-normal.
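The functional-form point is easy to see with a toy skewed biomarker. The sketch below assumes a hypothetical log-normal marker (always positive) and compares two ways of generating imputation draws: treating the raw values as Gaussian, and modeling the log of the values and back-transforming. Neither is a full imputation model. The only aim is to show how a careless distributional choice manufactures impossible values.

```python
import random
from math import exp, log
from statistics import mean, stdev

rng = random.Random(3)

# Hypothetical right-skewed biomarker: log-normal, so every true value is positive.
observed = [exp(rng.gauss(0, 1)) for _ in range(5000)]

# Naive: pretend the raw values are Gaussian and draw from that.
mu, sd = mean(observed), stdev(observed)
raw_draws = [rng.gauss(mu, sd) for _ in range(1000)]

# Transform-aware: model log(value) as Gaussian, then back-transform.
logs = [log(v) for v in observed]
mu_log, sd_log = mean(logs), stdev(logs)
log_draws = [exp(rng.gauss(mu_log, sd_log)) for _ in range(1000)]

share_impossible = sum(d <= 0 for d in raw_draws) / len(raw_draws)
print(f"raw-scale draws at or below zero: {share_impossible:.1%}")
print(f"smallest log-scale draw: {min(log_draws):.3f}")
```

A log transform is only one option. Predictive mean matching or a bounded model may suit other variables better. Whichever is used, comparing imputed and observed distributions will show quickly whether the model respected the variable's shape.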
Multiple Imputation Is Not a License to Ignore the Estimand
The imputation model should match the analysis question. If you are estimating a causal effect, you need the imputation strategy to preserve the relationships among treatment, confounders, and outcomes that matter for identification.
A bad habit is to hand the missing data problem to a generic analyst, run default MICE on a convenience variable set, and then do a sophisticated causal analysis on top. That is the statistical equivalent of ironing a wrinkled shirt while the building is on fire.
Common Failure Modes
1. Imputing without understanding why data are missing
If the missingness mechanism is unexamined, the model is guessing in the dark.
2. Leaving key predictors out of the imputation model
This is how “assumed MAR” quietly becomes “conveniently underinformed MAR.”
3. Imputing post-treatment variables carelessly
If the timing is wrong, you can blur treatment effects, induce bias, or create impossible patient trajectories.
4. Reporting only the fact that MICE was used
“We used multiple imputation” is not methods transparency. It is a shrug in formalwear.
What Reviewers Should Ask For
- How much data were missing for each important variable?
- What process likely caused the missingness?
- Which variables were included in the imputation model, and why?
- Were the exposure, the outcome, and strong predictors of both missingness and the missing values included?
- Did the authors compare complete-case and imputed analyses?
- Was any sensitivity analysis done for departures from MAR?
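The first question on that list is cheap to answer. Below is a minimal sketch of a missingness table, assuming the data arrive as a list of dictionaries with `None` marking a missing value. The function names, the variable names (`hba1c`, `egfr`, `arm`), and the four toy rows are all hypothetical.

```python
def pct_missing(rows, var):
    """Percentage of rows in which `var` is None."""
    return 100.0 * sum(r[var] is None for r in rows) / len(rows)


def missingness_table(rows, variables, group_key):
    """Percent missing for each variable, overall and within each group."""
    groups = sorted({r[group_key] for r in rows})
    table = {}
    for var in variables:
        entry = {"overall": pct_missing(rows, var)}
        for g in groups:
            subset = [r for r in rows if r[group_key] == g]
            entry[g] = pct_missing(subset, var)
        table[var] = entry
    return table


# Toy rows, invented for illustration only.
rows = [
    {"arm": "drug", "hba1c": 7.1, "egfr": 80},
    {"arm": "drug", "hba1c": None, "egfr": 75},
    {"arm": "comparator", "hba1c": None, "egfr": None},
    {"arm": "comparator", "hba1c": None, "egfr": 60},
]

table = missingness_table(rows, ["hba1c", "egfr"], "arm")
for var, entry in table.items():
    print(var, entry)
```

Splitting by exposure group matters because differential missingness across arms is often the first visible hint that the missingness process carries prognostic or treatment-related information.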
Good Practice Looks Boring in the Best Way
Good multiple imputation work is not flashy. It usually involves a thoughtful missingness table, sensible variable selection, diagnostics that compare observed and imputed distributions, and sensitivity analyses that admit the data-generation process may be less than charitable.
It also involves saying when imputation cannot save the design. If a critical confounder is mostly absent, measured badly when present, and missingness tracks severity in ways your data barely capture, the honest conclusion may still be “residual bias is likely.” Not glamorous. Very useful.
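One common way to run the sensitivity analyses mentioned above is a delta adjustment: shift the imputed values by a fixed amount to mimic a departure from MAR, then see how far the estimate moves. The sketch below is deliberately stripped down. It works on one completed dataset and reports a plain mean. In real use the shift would be applied within each imputed dataset before pooling, and the deltas would come from clinical judgment. The values and the function name `delta_sensitivity` are made up.

```python
from statistics import mean


def delta_sensitivity(observed, imputed, deltas):
    """Mean of the completed data after shifting every imputed value by delta."""
    results = {}
    for delta in deltas:
        shifted = [value + delta for value in imputed]
        results[delta] = mean(observed + shifted)
    return results


# Invented HbA1c-like values: four observed, two imputed under MAR.
observed = [6.8, 7.2, 7.0, 7.4]
imputed = [7.1, 7.3]

out = delta_sensitivity(observed, imputed, deltas=[0.0, 0.5, 1.0])
for delta, estimate in out.items():
    print(f"delta = {delta:.1f} -> completed-data mean = {estimate:.3f}")
```

If the substantive conclusion survives every delta a clinician would consider plausible, that is worth reporting. If it flips at a modest delta, that is worth reporting even more.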
The Practical Bottom Line
Multiple imputation is often better than complete-case analysis, and sometimes much better. But it is not magic, and it definitely is not innocence by algorithm. Missing data remains a study-design problem, a measurement problem, and sometimes a causal-identification problem with a few imputation equations draped over it.
Use multiple imputation when it is justified. Build the model with real domain knowledge. Report it like you respect your reader. And if the assumptions are shaky, say so plainly. The point is not to make the dataset look complete. The point is to make the science less wrong.