Last Observation Carried Forward: When Yesterday's Outcome Pretends the Patient Stopped Changing
Anas H. Alzahrani, MD PhD MPH
Department of Preventive Medicine and Public Health
Faculty of Medicine, King Abdulaziz University
Few methods in applied clinical research have aged as strangely as last observation carried forward. It is easy to explain, easy to code, and easy to defend with the sort of confidence that should make reviewers nervous.
The basic move is simple: if a patient disappears before the endpoint visit, freeze their outcome at the last recorded value and call that the final answer. Convenient, yes. Neutral, no. LOCF smuggles in a trajectory assumption about exactly the patients whose trajectories are already most uncertain.
The Core Decision Rule
Do not ask whether LOCF keeps everyone in the analysis. Ask whether the post-dropout path it assumes is clinically believable for the patients who left.
Decision rule:
If patients could plausibly improve, worsen, relapse, or die after their last observed visit, then LOCF is not a harmless imputation shortcut. It is a structural outcome assumption.
That assumption can exaggerate benefit, dilute benefit, or reverse the sign of the effect. The direction depends on who dropped out, when they dropped out, and what would have happened next. In other words, the supposedly simple method becomes simple only by refusing to look at the hard part.
Why LOCF Keeps Surviving
It feels conservative
Analysts often imagine that freezing a dropout prevents over-optimism. Sometimes it does. Sometimes it preserves a transient response that would have vanished a week later.
It sustains sample-size theater
Nobody likes watching randomized patients disappear from the analysis table. LOCF offers the visual comfort of completeness without the inferential honesty of a real missing-data strategy.
It turns dynamics into bookkeeping
Longitudinal disease courses are messy. LOCF replaces that mess with one frozen number and hopes nobody asks whether symptoms, biomarkers, or survival states usually stop moving on command.
A Concrete Clinical Example
Imagine a 12-week randomized pain trial. The active arm improves quickly in the first month, but gastrointestinal adverse effects push a subset of patients out by week 6. Control patients improve more slowly yet stay on treatment more often.
What LOCF does
It takes the last pain score from each dropout, freezes it, and treats that frozen score as if the patient remained stable through week 12.
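A minimal sketch of that mechanical step, assuming long-format data with hypothetical column names (`patient`, `week`, `pain`) and made-up scores. It is an illustration of what the method does, not an analysis plan.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data: pain score (lower is better) at each visit.
# Patient B drops out after week 4; patient C drops out after week 8.
visits = pd.DataFrame({
    "patient": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "week":    [0, 4, 8, 12] * 3,
    "pain":    [7.0, 5.0, 4.0, 3.0,
                7.0, 4.0, np.nan, np.nan,
                6.0, 6.0, 5.0, np.nan],
})

visits = visits.sort_values(["patient", "week"])
# LOCF: within each patient, fill every missing visit with the last observed value.
visits["pain_locf"] = visits.groupby("patient")["pain"].ffill()
# Flag imputed rows so the frozen values stay visible downstream.
visits["imputed"] = visits["pain"].isna()

week12 = visits[visits["week"] == 12].set_index("patient")
print(week12[["pain", "pain_locf", "imputed"]])
```

Keeping the `imputed` flag next to the filled value is a small design choice: it lets a reader see how many week-12 "outcomes" were never actually measured.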
What may actually happen
Patients who leave because of toxicity may discontinue the drug, rebound symptomatically, seek rescue therapy, or have unrecorded worsening. Stability is not the default just because the dataset ran out.
Why the estimate drifts
The active arm can keep the early benefit of patients who tolerated treatment poorly, while the control arm may be penalized or flattered differently depending on its dropout pattern.
This is why LOCF is not just about missingness. It is about outcome trajectory. The method behaves as if the patient became a screenshot.
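The drift can be sketched with a small simulation. Everything numeric here is an assumption chosen for illustration (gains, dropout rates, the rebound after discontinuation), and `simulate_arm` is a hypothetical helper, not part of any library. Outcomes are expressed as gain in pain relief from baseline, so higher is better.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000  # patients per arm; large so the arm means are stable

def simulate_arm(week6_gain, week12_gain, dropout_rate, post_dropout_gain):
    """Return (true week-12 mean gain, LOCF week-12 mean gain) for one arm.

    Dropouts leave right after week 6. Their real week-12 gain is
    post_dropout_gain, which LOCF never gets to see.
    """
    noise = rng.normal(0.0, 1.0, size=n)          # patient-level variation
    dropped = rng.random(n) < dropout_rate
    gain_w6 = week6_gain + noise
    # What actually happens by week 12.
    true_w12 = np.where(dropped, post_dropout_gain + noise, week12_gain + noise)
    # What LOCF records: the week-6 value, frozen, for every dropout.
    locf_w12 = np.where(dropped, gain_w6, true_w12)
    return true_w12.mean(), locf_w12.mean()

# Active arm: fast early gain; 30% leave for toxicity and rebound off drug.
active_true, active_locf = simulate_arm(3.0, 3.5, 0.30, 1.0)
# Control arm: slower gain; 10% leave and keep improving slowly.
control_true, control_locf = simulate_arm(1.0, 2.0, 0.10, 1.5)

true_effect = active_true - control_true
locf_effect = active_locf - control_locf
print(f"true effect: {true_effect:.2f}")
print(f"LOCF effect: {locf_effect:.2f}")
```

Under these assumed inputs the expected true effect is about 0.80 and the expected LOCF effect about 1.45, so the frozen week-6 scores inflate the contrast. Flip the sign of the post-dropout change and the same code dilutes the effect instead.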
Interactive LOCF bias explorer
When dropouts keep changing after they leave, LOCF quietly edits the treatment effect
This toy model uses a continuous outcome where higher gain means more improvement by week 12. LOCF freezes each dropout at the last recorded value, even if the patient would have improved further or deteriorated after that point.
Higher and more differential dropout gives LOCF more room to invent the endpoint.
Even if both groups drop out, different rates can turn a convenience method into asymmetric bias.
Negative values mean the patient worsens after leaving. Positive values mean improvement continues offstage.
| Readout | Value | What it means |
|---|---|---|
| True week-12 effect | 5.9 points | What the study would estimate if all patients were actually observed through the endpoint. |
| LOCF-imputed effect | 7.9 points | The treatment contrast after every dropout is frozen in amber at the last visit. |
| Bias direction cue | LOCF exaggerates benefit | The sign flips depending on who drops out and what would have happened after the last observed visit. |
| Quantity | Value | Why it matters |
|---|---|---|
| Treatment true mean gain | 20.1 points | Includes the post-dropout path LOCF refuses to acknowledge. |
| Treatment LOCF mean gain | 22.3 points | Every dropout is treated as if their outcome trajectory flatlined after the last visit. |
| Control true mean gain | 14.2 points | The same missing-data mechanism can bias the comparator differently. |
| Control LOCF mean gain | 14.4 points | Equal methods do not imply equal bias when dropout timing and prognosis diverge by arm. |
How to read the toy model
This is a teaching device, not a longitudinal mixed model. It strips the issue down to the uncomfortable part: the last observed value is rarely the last meaningful value.
Decision rule
If the unobserved post-dropout path matters clinically and dropout differs by arm or prognosis, LOCF is not conservative. It is just unverified.
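One way to write down the logic of the toy model, using notation that is assumed here rather than taken from the explorer: for arm $a$, let $\pi_a$ be the dropout proportion, $\mu_a^{\text{comp}}$ the mean week-12 gain among completers, $\mu_a^{\text{last}}$ the mean last observed gain among dropouts, and $\delta_a$ the mean change dropouts would have experienced between their last visit and week 12.

```latex
\begin{aligned}
\mu_a^{\text{true}} &= (1-\pi_a)\,\mu_a^{\text{comp}} + \pi_a\,\bigl(\mu_a^{\text{last}} + \delta_a\bigr) \\
\mu_a^{\text{LOCF}} &= (1-\pi_a)\,\mu_a^{\text{comp}} + \pi_a\,\mu_a^{\text{last}} \\
\text{bias}_a &= \mu_a^{\text{LOCF}} - \mu_a^{\text{true}} = -\pi_a\,\delta_a \\
\text{bias}_{\text{effect}} &= \text{bias}_T - \text{bias}_C = \pi_C\,\delta_C - \pi_T\,\delta_T
\end{aligned}
```

With the values shown above, the treatment arm bias is 22.3 - 20.1 = 2.2 points and the control arm bias is 14.4 - 14.2 = 0.2 points, so the effect is overstated by 2.2 - 0.2 = 2.0 points, which matches 7.9 - 5.9. The bias vanishes only when $\pi_T\,\delta_T = \pi_C\,\delta_C$, for example when dropouts truly stop changing in both arms.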
How LOCF Usually Fails
| Failure mode | What LOCF assumes | Why that is risky |
|---|---|---|
| Adverse-event dropout after early response | Improvement would have persisted unchanged off treatment. | This can exaggerate efficacy by preserving a temporary gain that might have decayed quickly. |
| Lack-of-efficacy dropout | The patient would have stayed stuck at the same poor value. | If outcomes would have worsened further, LOCF can hide treatment failure by stopping the decline early. |
| Natural improvement continues after dropout | No additional recovery occurs once observation ends. | The method can unfairly flatten recovery and dilute a real treatment effect. |
| Differential timing of dropout across arms | A frozen week-4 score and a frozen week-10 score are equally defensible. | Earlier dropout means more unobserved time, so the amount of hidden trajectory being guessed can differ sharply by arm. |
What Reviewers Should Ask Instead of Nodding at the Imputation Footnote
Red flags
- Dropout differs meaningfully between arms.
- Patients leave for reasons related to efficacy, tolerability, or prognosis.
- The outcome is expected to keep evolving after the last observed visit.
- LOCF is described as conservative without a clinical argument for why.
- No sensitivity analysis explores alternative post-dropout trajectories.
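The first two red flags can be checked directly from a disposition table before any modeling. The sketch below assumes a hypothetical one-row-per-patient dataset with invented column names (`arm`, `last_week`, `reason`) and a 12-week endpoint.

```python
import pandas as pd

# Hypothetical disposition data: one row per randomized patient.
disposition = pd.DataFrame({
    "arm": ["active"] * 6 + ["control"] * 6,
    "last_week": [12, 12, 6, 4, 6, 12, 12, 12, 12, 10, 12, 12],
    "reason": ["completed", "completed", "adverse event", "adverse event",
               "lack of efficacy", "completed",
               "completed", "completed", "completed", "lack of efficacy",
               "completed", "completed"],
})
disposition["dropped"] = disposition["reason"] != "completed"

# Does the dropout rate differ by arm?
dropout_rate = disposition.groupby("arm")["dropped"].mean()

# Do the reasons for leaving differ by arm?
reasons = pd.crosstab(disposition["arm"], disposition["reason"])

# How many weeks of unobserved trajectory would LOCF paper over in each arm?
dropouts = disposition[disposition["dropped"]].assign(
    weeks_carried=lambda d: 12 - d["last_week"]
)
mean_weeks_carried = dropouts.groupby("arm")["weeks_carried"].mean()

print(dropout_rate, reasons, mean_weeks_carried, sep="\n\n")
```

None of these summaries proves bias, but a table like this makes it harder to describe LOCF as conservative without saying why.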
Better questions
- What estimand is the analysis targeting after intercurrent events and discontinuation?
- Would a mixed model, multiple imputation, or explicit sensitivity analysis match that estimand better?
- How much of the endpoint depends on unobserved post-dropout behavior?
- Are reasons for missingness documented well enough to defend a missing-at-random (MAR) assumption?
- Does the conclusion survive a plausible missing-not-at-random (MNAR) stress test?
When a Frozen Value Might Be Defensible, and Why That Still Does Not Rescue LOCF as a Default
There are narrow cases where a carried-forward value approximates the scientific question reasonably well. An irreversible outcome, a very short gap between the last visit and endpoint, or a prespecified estimand tied to treatment discontinuation can shrink the damage.
Even there, the burden is on the analyst to show why the frozen value corresponds to the estimand rather than merely to software convenience. "We used LOCF because that is what prior studies did" is not a method; it is folklore.
A Practical Replacement Hierarchy
| Situation | Usually better move | Why |
|---|---|---|
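| Repeated continuous outcomes under a treatment-policy estimand | Mixed model for repeated measures, fitted to all collected visits including those after discontinuation | Uses observed longitudinal structure rather than pretending the trajectory stopped moving; it still relies on MAR for whatever remains missing. |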
| Covariate- and history-rich missingness under plausible MAR | Multiple imputation aligned to the analysis model | Makes the assumptions explicit and lets the imputation model learn from observed patterns. |
| Concern that missingness may be MNAR | Pattern-mixture or tipping-point sensitivity analysis | Forces the paper to show how much hidden deterioration or recovery would change the conclusion. |
| Outcome changes meaningfully after discontinuation | Explicit estimand choice before method choice | The real problem is often conceptual: what outcome after what intercurrent event are you trying to estimate? |
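A tipping-point analysis can be sketched in a few lines. This is a deliberately simplified version under stated assumptions: the data are simulated, only active-arm dropouts are shifted, the shift `delta` is applied to their carried-forward gain, and significance uses a normal approximation with an unpooled standard error. `effect_under_delta` is a hypothetical helper written for this example.

```python
import numpy as np

def effect_under_delta(active_completers, active_dropout_last, control_gain, delta):
    """Return (difference in mean gain, z statistic) when every active-arm
    dropout's carried-forward gain is shifted by delta."""
    active = np.concatenate([active_completers, active_dropout_last + delta])
    diff = active.mean() - control_gain.mean()
    se = np.sqrt(active.var(ddof=1) / active.size
                 + control_gain.var(ddof=1) / control_gain.size)
    return diff, diff / se

# Hypothetical data: gain from baseline, higher is better.
rng = np.random.default_rng(1)
active_completers = rng.normal(22.0, 8.0, size=140)
active_dropout_last = rng.normal(20.0, 8.0, size=60)   # last observed gains
control_gain = rng.normal(15.0, 8.0, size=200)

tipping_delta = None
for delta in np.arange(0.0, -30.5, -0.5):   # progressively harsher worsening
    diff, z = effect_under_delta(active_completers, active_dropout_last,
                                 control_gain, delta)
    if z < 1.96:                             # no longer nominally significant
        tipping_delta = delta
        break

print("tipping delta:", tipping_delta)
```

The number that comes out is only useful next to a clinical judgment: if the tipping `delta` is a deterioration that dropouts could plausibly experience after stopping treatment, the headline result depends on the imputation rather than on the data.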
Where Aqrab Fits
LOCF tends to survive in manuscripts because it sits in the methods section wearing a vintage respectability blazer. The abstract sounds serious. The tables look complete. The missing-data assumption is hiding in one acronym.
That is the sort of quiet methodological overreach Aqrab is built to catch. If you want a fast critique of whether the estimand, dropout story, and analysis actually agree, start with Aqrab. If your methods team wants those checks embedded into a reproducible review workflow, the developer tools are the natural next stop.
The Bottom Line
LOCF does not solve missing outcomes. It replaces them with a clinical fairy tale: whatever was true at the last visit stayed true afterward. Sometimes that tale is optimistic. Sometimes punitive. It is never assumption-free.
When the endpoint keeps evolving after patients disappear, the last observed value is not a destination. It is just the last page you managed to read.