
Outcome Switching: When the Primary Endpoint Moves After the Results Get Interesting

May 20, 2026 · 16 min read

Anas H. Alzahrani, MD PhD MPH

Department of Preventive Medicine and Public Health

Faculty of Medicine, King Abdulaziz University

Many papers say the primary endpoint was “refined,” “updated,” or “re-prioritized.” Sometimes that is a legitimate response to feasibility problems or new biological understanding. Quite often it is the manuscript equivalent of moving the goalposts after the ball landed somewhere else.

Outcome switching happens when the reported primary outcome, analysis window, or favored result differs from what was prespecified, or when the prespecification is so vague that several outcomes could later be presented as the obvious main endpoint. The damage is simple: readers think they are seeing a clean confirmatory test when they are really seeing a curated subset of the data.

The Core Mistake

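A p-value only has its advertised meaning if the question was defined before the answer was visible. Once authors can choose among several endpoints, time horizons, populations, or model specifications, the nominal type I error rate stops being honest: the chance that at least one of those choices crosses the threshold by luck is larger than the alpha printed in the methods section.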
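
A small simulation can make this concrete. The sketch below is illustrative only and is not the article's own code: it assumes every endpoint is truly null, that the endpoints are independent, and that each is tested with a two-sided z-test. The function name and defaults are hypothetical.

```python
import random
from statistics import NormalDist


def best_of_k_false_positive_rate(k, alpha=0.05, n_trials=20_000, seed=0):
    """Share of null trials in which the smallest of k p-values is below alpha."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    hits = 0
    for _ in range(n_trials):
        # Assumption: under the null, each endpoint's z-statistic is standard normal.
        p_values = [
            2 * (1 - std_normal.cdf(abs(rng.gauss(0.0, 1.0))))
            for _ in range(k)
        ]
        # The "chosen" endpoint is whichever one looks best after the fact.
        if min(p_values) < alpha:
            hits += 1
    return hits / n_trials


one_locked_endpoint = best_of_k_false_positive_rate(k=1)
best_of_four = best_of_k_false_positive_rate(k=4)
print(f"one prespecified endpoint: {one_locked_endpoint:.3f}")
print(f"best of four endpoints:    {best_of_four:.3f}")
```

With one locked endpoint the rate should sit close to the nominal 0.05. With four to choose from it should land near 0.19, even though no single test was run incorrectly.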

Decision rule:

If the paper cannot show what the primary outcome was, when it was locked, and whether that choice changed after data accrual, treat “statistically significant” as a descriptive event rather than strong confirmatory evidence.

This is not a paperwork obsession. It is how you tell planned inference apart from retrospective storytelling with a confidence interval attached.

What Counts as Outcome Switching?

Move	How it shows up	Why it matters
Primary becomes secondary	The registered main endpoint fails, while a cleaner secondary endpoint takes center stage.	Readers may mistake an exploratory rescue for the original confirmatory target.
Time window drift	The paper emphasizes 30-day, 90-day, or 1-year results depending on where the curve looked friendliest.	A “primary” result can be manufactured by choosing the prettiest calendar slice.
Composite reshuffling	A composite endpoint is rewritten, or one component quietly becomes the headline.	The clinical question changes after the data already answered a different one.
Analysis-population switching	The main story moves between intention-to-treat, modified intention-to-treat, complete-case, or per-protocol analyses.	This can rescue significance by changing who counts or how nonadherence is handled.

A Familiar Trial Story

Consider a pragmatic cardiovascular trial that prespecifies 1-year hospitalization as the primary endpoint. The 1-year analysis is null. By manuscript time, the abstract highlights a 90-day composite of hospitalization or urgent visit, reported in a modified intention-to-treat population. Each move is explainable in isolation. Together they create a new study.

What the reader sees

A concise abstract with one positive headline endpoint and language that sounds confirmatory.

What may have happened

Several outcomes, windows, and populations were available, and the cleanest result won editorial promotion.

Why reviewers should care

The inferential target has shifted, so the reported p-value no longer means what readers assume it means.
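
The trial story above can be written down as a field-by-field comparison between what was registered and what was reported. This is a minimal sketch: the `EndpointSpec` class, its three fields, and the choice to encode "1-year" as 365 days are assumptions made for illustration, not a registry schema.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class EndpointSpec:
    """Hypothetical minimal description of a primary endpoint."""
    outcome: str
    window_days: int
    population: str


def switched_fields(registered, reported):
    """Return the names of the fields that differ between the two specs."""
    return [
        f.name
        for f in fields(EndpointSpec)
        if getattr(registered, f.name) != getattr(reported, f.name)
    ]


# Values taken from the example trial; 365 days stands in for "1-year".
registered = EndpointSpec("hospitalization", 365, "intention-to-treat")
reported = EndpointSpec(
    "hospitalization or urgent visit", 90, "modified intention-to-treat"
)

print(switched_fields(registered, reported))
# → ['outcome', 'window_days', 'population']
```

All three fields differ, which is the point of the story: each edit looks small, but nothing of the registered primary endpoint survives in the headline.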

Interactive reviewer check

How many chances did the paper have to find one “positive” result?

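This explorer assumes there is no real treatment effect anywhere. It estimates how often a paper can still produce at least one nominally significant finding once authors have several outcomes, analysis populations, and timepoints available to shop from. The figure shown is consistent with treating every look as an independent test at the nominal alpha; when the looks are positively correlated, as outcomes in a single trial usually are, the true chance is typically lower but still well above alpha.

Chance at least one false-positive result appears: 70.8%

Total analytic looks: 24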

Candidate primary or co-primary outcomes: 4

Analysis populations or models tried: 3

Timepoints or windows examined: 2

Nominal alpha level: 0.05

Analytic degree of freedom	Count	Why it creates risk
Candidate outcomes	4	Each extra endpoint gives the manuscript another place to hunt for a clean p-value.
Analysis populations or models	3	Modified intention-to-treat, per-protocol, adjusted, unadjusted, complete-case, imputed: the menu matters.
Timepoints or windows	2	Picking the prettiest follow-up window after seeing the data is still a form of switching.
Total analytic looks	24	This is the quiet number behind the phrase “the primary endpoint was refined during analysis.”
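
The arithmetic behind these numbers can be reproduced in a few lines. The sketch assumes the explorer multiplies the three counts and treats the resulting looks as independent tests, which is consistent with the 70.8% it displays but is not stated on the page. The function name is hypothetical.

```python
def chance_of_any_false_positive(outcomes, populations, timepoints, alpha=0.05):
    """Return (total looks, chance of at least one false positive).

    Assumes every look is an independent test of a true null hypothesis.
    """
    looks = outcomes * populations * timepoints
    return looks, 1 - (1 - alpha) ** looks


looks, chance = chance_of_any_false_positive(outcomes=4, populations=3, timepoints=2)
print(f"{looks} looks -> {chance:.1%}")
# → 24 looks -> 70.8%
```

Because the counts multiply, small additions compound quickly: a third timepoint alone would raise the total from 24 to 36 looks.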

Quick read

A nominally positive endpoint here needs protocol discipline, not applause.

Decision rule: if the paper does not clearly state what was primary, when it was locked, and how many alternative analyses were considered, read the p-value as descriptive rather than decisive.

• Switching can happen across outcomes, time horizons, covariate sets, or analysis populations.
• Registration helps, but only if the registered endpoint is specific enough to constrain choices.
• A statistically significant rescue endpoint does not erase a failed prespecified primary outcome.

Failure Modes That Should Slow You Down

Red flag	Why it is weak	What to ask for instead
Registry entry is vague	“Clinical improvement” or similar broad wording leaves too much room to choose a favorable outcome later.	A specific endpoint definition with timing, scale, and analysis rule.
Protocol amendment exists but timing is unclear	A justified amendment before unblinding is very different from a rescue edit after the results looked flat.	Date-stamped amendment history and a statement about data visibility at the time of change.
Main text celebrates a secondary endpoint after a failed primary	Secondary endpoints often deserve reporting, but they do not inherit confirmatory status by enthusiasm.	Explicit separation of confirmatory and exploratory claims, with multiplicity context.
AI or digital-health paper cycles through many proxy outcomes	Usage, engagement, risk score change, and utilization can become a buffet of interchangeable success metrics.	One clearly prespecified patient-relevant endpoint plus transparent reporting of the rest.
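
One way to supply the "multiplicity context" asked for in the table is a step-down correction across the secondary endpoints. The sketch below uses Holm's procedure, which is a standard option chosen here for illustration rather than something this guide prescribes. The p-values are hypothetical.

```python
def holm_rejections(p_values, alpha=0.05):
    """Return a list of booleans: True where Holm's procedure rejects."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, idx in enumerate(order):
        # Compare the rank-th smallest p-value with alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            rejected[idx] = True
        else:
            break  # once one test fails, every larger p-value fails too
    return rejected


# Hypothetical p-values for four secondary endpoints.
secondary_p = [0.010, 0.040, 0.030, 0.200]
print(holm_rejections(secondary_p))
# → [True, False, False, False]
```

Three of the four hypothetical endpoints are nominally below 0.05, yet only one survives the correction. That gap is what a reader loses when a secondary endpoint is promoted without saying how many were examined.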

When Endpoint Changes Are Legitimate

Not every change is misconduct and not every amendment is a scandal. Trials sometimes discover that an endpoint is impossible to measure reliably, event rates are far lower than expected, or a newer consensus definition supersedes the original one. The methodologically important question is not whether the endpoint changed. It is whether the change was transparent, justified, and insulated from knowledge of the outcome data.

1. Show the old and new endpoint side by side

Readers should not need archeology across registry archives to learn what changed.

2. Give the timing, not just the rationale

“Changed due to feasibility” is incomplete without telling readers whether outcomes were already visible.

3. Keep confirmatory language on a short leash

If the final headline comes from a switched endpoint, the discussion should sound exploratory, not triumphant.
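
The timing question in point 2 reduces to comparing two dates, provided both are actually reported. A minimal sketch follows; the function name, the labels it returns, and the example dates are all hypothetical.

```python
from datetime import date


def amendment_timing(amendment_date, outcome_data_first_visible):
    """Classify an endpoint amendment relative to when outcome data became visible."""
    if amendment_date is None or outcome_data_first_visible is None:
        # Without both dates the reader cannot tell a fix from a rescue.
        return "unverifiable"
    if amendment_date < outcome_data_first_visible:
        return "before outcome data were visible"
    return "after outcome data were visible"


# Hypothetical dates, for illustration only.
print(amendment_timing(date(2024, 3, 1), date(2024, 9, 15)))
print(amendment_timing(date(2024, 11, 2), date(2024, 9, 15)))
print(amendment_timing(date(2024, 11, 2), None))
```

The third case is the common one in practice: a rationale is given, but the date at which outcomes became visible is missing, so the amendment can be neither cleared nor faulted.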

Reviewer Checklist

• Compare the manuscript with the registry or protocol before trusting any “primary” label.
• Check whether endpoint definitions include timing, scale, analysis population, and censoring rules.
• If the registered primary failed, do not let a polished secondary endpoint impersonate the main test.
• If multiple windows or models are reported, ask which one was prespecified and why the others were run.
• Read upbeat abstracts with extra suspicion when the methods section sounds like a menu.
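
The second checklist item can be applied mechanically to a registry entry. In this sketch the four required elements come from the checklist itself, while the dictionary layout and the example entries are hypothetical.

```python
# The four elements named in the checklist above.
REQUIRED_ELEMENTS = ("timing", "scale", "analysis_population", "censoring_rule")


def missing_elements(endpoint_definition):
    """List the required elements that are absent or empty in a definition."""
    return [key for key in REQUIRED_ELEMENTS if not endpoint_definition.get(key)]


# Hypothetical registry entries, for illustration only.
vague_entry = {"name": "clinical improvement"}
specific_entry = {
    "name": "all-cause hospitalization",
    "timing": "1 year after randomization",
    "scale": "binary, any admission",
    "analysis_population": "intention-to-treat",
    "censoring_rule": "censored at death or withdrawal",
}

print(missing_elements(vague_entry))
# → ['timing', 'scale', 'analysis_population', 'censoring_rule']
print(missing_elements(specific_entry))
# → []
```

An entry that fails all four checks does not constrain the later choice of headline endpoint, whatever its registration date.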

Why This Matters for Aqrab

Aqrab is useful precisely where endpoint drift becomes hard to audit at reading speed. It helps researchers pressure-check whether the claimed primary outcome is clinically coherent, whether the analysis reads as confirmatory or opportunistic, and which reviewer objections are already visible in the manuscript.

If you want a faster critique of a trial, protocol, or AI-health manuscript before review or submission, start with Aqrab Try. If you want to see how the methodology stack is built, visit /developers.
