Clinical Trials · Biomarkers · Methods Critique

Surrogate Endpoints: When a Biomarker Improvement Pretends to Be Patient Benefit

June 17, 2026 · 16 min read

Anas H. Alzahrani, MD PhD MPH

Department of Preventive Medicine and Public Health

Faculty of Medicine, King Abdulaziz University

Surrogate endpoints are seductive because they let trials read out earlier, with fewer patients, and with cleaner-looking curves than hard outcomes often permit. A lab value improves. A scan shrinks. A biomarker normalizes. Everyone starts speaking as if patients have already lived longer or felt better.

That leap is exactly where methodological trouble begins. A surrogate endpoint is a substitute for direct clinical benefit, not the benefit itself. The right question is not whether the marker moved. It is whether changing that marker in this disease, with this mechanism, in this population, has earned the right to stand in for outcomes patients actually care about.

The Core Decision Rule

Do not let a manuscript translate surrogate success into clinical triumph unless it shows why that surrogate is trustworthy in the exact context being studied, and unless the discussion stays honest about what remains unproven.

Decision rule:

A biomarker result can support an argument. It should not impersonate how patients feel, function, or survive unless the surrogate relationship is already well validated in that setting.

What Makes a Surrogate Endpoint Useful

FDA materials distinguish among candidate surrogate endpoints, those reasonably likely to predict clinical benefit, and validated ones. That ladder matters because people often talk about “surrogates” as if the word itself were a stamp of maturity. It is not.

Candidate surrogate

Biologically interesting, but still mostly a hypothesis about what clinical benefit might follow.

Reasonably likely surrogate

Useful for earlier decisions in some regulatory settings, but still requires confirmatory outcome evidence.

Validated surrogate

Backed by strong context-specific evidence that treatment effects on the marker predict treatment effects on the outcomes patients care about.
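
One way to make the ladder operational is to tie each tier to the strongest claim a discussion section may make. The sketch below is illustrative only. The tier names follow the three labels above, but the CLAIM_CEILING wording, the claim_ceiling function, and the rule that validation borrowed from another setting counts as candidate-level are assumptions made for this example, not FDA language.

```python
from enum import Enum


class SurrogateTier(Enum):
    CANDIDATE = "candidate"
    REASONABLY_LIKELY = "reasonably likely"
    VALIDATED = "validated"


# Hypothetical wording: the strongest claim each tier can carry on its own.
CLAIM_CEILING = {
    SurrogateTier.CANDIDATE: "hypothesis-generating signal",
    SurrogateTier.REASONABLY_LIKELY: "provisional evidence pending confirmatory outcomes",
    SurrogateTier.VALIDATED: "evidence of likely clinical benefit in the validated context",
}


def claim_ceiling(tier: SurrogateTier, validated_in_this_context: bool = True) -> str:
    """Return the strongest claim the surrogate result can support.

    Assumption for this sketch: validation borrowed from another disease
    stage, drug class, or population is treated as candidate-level only.
    """
    if tier is SurrogateTier.VALIDATED and not validated_in_this_context:
        tier = SurrogateTier.CANDIDATE
    return CLAIM_CEILING[tier]


if __name__ == "__main__":
    print(claim_ceiling(SurrogateTier.REASONABLY_LIKELY))
    print(claim_ceiling(SurrogateTier.VALIDATED, validated_in_this_context=False))
```

The downgrade for out-of-context validation is deliberately conservative. A reviewer could reasonably choose a softer rule, but the tier should never be read without asking where the validation came from.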

Why Surrogates Fail Even When the Biology Sounds Elegant

The surrogate captures only one pathway

The treatment may improve the marker while affecting other pathways that matter just as much for symptoms, disability, or survival.

Off-target harms erase the gain

A prettier biomarker can coexist with toxicity, arrhythmia, bleeding, infection, or other damage the surrogate never sees.

Validation is borrowed from the wrong setting

A marker that works for one disease stage, drug class, or population may be unreliable in another.

The trial stops before reality catches up

Early readouts often exaggerate confidence precisely because the patient outcomes are not mature yet.

Three Clinical Anchors Worth Remembering

Anchor 1

Suppressing ventricular ectopy did not guarantee safer hearts

The classic cautionary tale is the Cardiac Arrhythmia Suppression Trial (CAST). Suppressing a rhythm marker that looked ominous did not translate into better patient outcomes; mortality was higher with the antiarrhythmic drugs than with placebo. The lesson is not “never use surrogates.” The lesson is that physiologic plausibility is not the same thing as validated patient benefit.

Anchor 2

Tumor response can be meaningful without being the whole answer

In oncology, radiographic response or progression-based endpoints may be very useful, especially when waiting for overall survival would take years or be confounded by later therapies. But a response curve is still not a synonym for living longer or living better. The claim should match the endpoint.

Anchor 3

Familiar biomarkers can still be over-translated

Blood pressure, LDL cholesterol, viral load, or glycemic markers can be highly informative in the right contexts. The trap is assuming that because a biomarker is familiar, every intervention that changes it deserves a broad patient-benefit narrative without checking mechanism, safety, and follow-up.

Where Reviewers Get Fooled

What the paper says	Why it sounds convincing	What is still missing
The biomarker improved significantly.	A clean p-value feels like therapeutic proof.	Statistical movement in a marker is not validation that patient outcomes improved.
This endpoint is accepted by regulators.	Regulatory use is mistaken for universal evidentiary maturity.	Acceptance is context-specific and can still require later confirmation.
Overall survival is immature, but the surrogate is strongly positive.	The discussion sounds forward-looking and practical.	Immature patient outcomes are exactly when overclaiming risk is highest.

Interactive surrogate stress test

Ask whether the biomarker is carrying more certainty than it earned

This does not estimate real treatment benefit. It is a reviewer-facing teaching tool that shows why surrogate claims depend on validation, mechanistic fit, safety, and actual outcome follow-up together.

Current verdict: Do not sell this as patient benefit. Treat the surrogate result as an early signal or mechanistic clue. It is not enough to support confident claims about how patients feel, function, or survive.

Inputs: validation status, mechanistic link, off-target harm risk, clinical outcome follow-up, magnitude of surrogate effect.

How to read this

Example score: 4 / 10

Higher scores mean the surrogate story is more coherent. They do not mean patient benefit has been proven. The main use is to stop manuscripts from jumping directly from biomarker movement to clinical certainty.

Reviewer rule of thumb

If the paper has weak validation, unresolved harm, or no mature patient outcomes, the discussion should sound provisional no matter how pretty the surrogate curve looks.

Dimension	Why it matters
Validation status	The marker is biologically interesting, but there is no strong proof that changing it predicts patient benefit in this context.
Mechanistic link	A mechanistic story helps, but mechanism alone does not validate a surrogate.
Off-target harm	You cannot let surrogate gains stand in for patient outcomes if safety tradeoffs remain unresolved.
Clinical follow-up	The surrogate may be useful, but the confidence claim should stay narrower than the press release wants.
Effect size	Magnitude can strengthen the signal, but only after the validity question is answered.
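
The five dimensions and the reviewer rule of thumb can be sketched as a small scoring helper. This is not the scoring used by the interactive tool, which the page does not specify. The 0 to 2 rating per dimension, the unweighted sum, the SurrogateAssessment class, and the verdict strings are all assumptions chosen to keep the example simple.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SurrogateAssessment:
    """Reviewer ratings, each 0 (weak), 1 (partial), or 2 (strong)."""

    validation: int         # context-specific validation of the surrogate
    mechanism: int          # fit between treatment mechanism and the marker
    harm_resolved: int      # 2 = off-target harms characterized and acceptable
    outcome_follow_up: int  # maturity of patient-outcome data
    effect_size: int        # magnitude of the surrogate effect

    def __post_init__(self) -> None:
        for f in fields(self):
            if getattr(self, f.name) not in (0, 1, 2):
                raise ValueError(f"{f.name} must be 0, 1, or 2")

    def score(self) -> int:
        """Unweighted sum on a 0-10 scale (an assumed scheme)."""
        return sum(getattr(self, f.name) for f in fields(self))

    def verdict(self) -> str:
        # Rule of thumb from the text: weak validation, unresolved harm, or
        # no mature patient outcomes keep the claim provisional, whatever
        # the total score says.
        if 0 in (self.validation, self.harm_resolved, self.outcome_follow_up):
            return "provisional"
        return "coherent surrogate story; patient benefit still needs outcome data"


if __name__ == "__main__":
    paper = SurrogateAssessment(
        validation=0, mechanism=1, harm_resolved=1,
        outcome_follow_up=0, effect_size=2,
    )
    print(paper.score(), paper.verdict())
```

The design choice worth noting is that the verdict is gated, not averaged: a large effect size cannot buy back missing validation or immature outcomes, which mirrors the point that magnitude matters only after the validity question is answered.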

Red-Flag Checklist for a Biomarker-Driven Claim

The manuscript never states whether the surrogate is candidate, reasonably likely, or well validated.
The discussion uses patient-benefit language that outruns the actual endpoint.
Safety tradeoffs are summarized shallowly while efficacy prose is expansive.
Validation evidence is borrowed from another disease stage, mechanism, or treatment class.
Clinical-outcome follow-up exists but is relegated to the supplement because it is less flattering.
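
For teams that track review findings in a structured form, the checklist above can be encoded directly. The sketch below assumes a reviewer answers each item with true or false. The key names and the triggered_flags function are hypothetical labels for this example, not part of any existing tool.

```python
# Hypothetical keys mapped to the five red flags listed above.
RED_FLAGS = {
    "tier_not_stated": "Surrogate tier (candidate, reasonably likely, validated) is never stated.",
    "benefit_language_outruns_endpoint": "Patient-benefit language outruns the actual endpoint.",
    "safety_summarized_shallowly": "Safety tradeoffs are shallow while efficacy prose is expansive.",
    "validation_borrowed": "Validation comes from another disease stage, mechanism, or treatment class.",
    "outcomes_buried": "Clinical-outcome follow-up exists but is buried in the supplement.",
}


def triggered_flags(answers: dict[str, bool]) -> list[str]:
    """Return descriptions of every red flag the reviewer marked True.

    Unanswered items are treated as not triggered; unknown keys raise,
    so a typo cannot silently hide a flag.
    """
    unknown = set(answers) - set(RED_FLAGS)
    if unknown:
        raise KeyError(f"unknown checklist keys: {sorted(unknown)}")
    return [text for key, text in RED_FLAGS.items() if answers.get(key, False)]


if __name__ == "__main__":
    review = {"tier_not_stated": True, "validation_borrowed": True}
    for flag in triggered_flags(review):
        print("-", flag)
```

Any single triggered flag is a reason to ask the authors for narrower language; the function reports flags and leaves that judgment to the reviewer.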

When Surrogate Endpoints Are Worth Defending

Surrogates are not methodological fraud by default. They are often necessary. Some diseases progress too slowly, some outcomes are too rare, and some settings genuinely need earlier evidence. The point is that using a surrogate responsibly requires humility about what has and has not been shown.

Defensible use

The marker has strong context-specific validation, harms are transparently reported, and the paper clearly limits the claim to what the endpoint can support.

Weak use

The marker is treated as a shortcut around clinical uncertainty instead of a provisional signal that still needs outcome confirmation.

Why This Matters for Study-Design Review

Surrogate endpoints are one of the cleanest examples of why study-design rigor is not the same as statistical polish. A trial can be randomized, blinded, and beautifully analyzed, then still tell a clinically inflated story if the endpoint itself is over-trusted.

That is exactly the kind of manuscript weakness Aqrab is meant to pressure-test. If your team wants a fast critique of whether an endpoint strategy, causal claim, and discussion section still agree, start with Aqrab's trial review workflow. The useful output is not generic skepticism. It is knowing where the claim outruns the design.
