Surrogate Endpoints: When a Biomarker Improvement Pretends to Be Patient Benefit
Anas H. Alzahrani, MD PhD MPH
Department of Preventive Medicine and Public Health
Faculty of Medicine, King Abdulaziz University
Surrogate endpoints are seductive because they let trials read out earlier, with fewer patients, and with cleaner-looking curves than hard outcomes often permit. A lab value improves. A scan shrinks. A biomarker normalizes. Everyone starts speaking as if patients have already lived longer or felt better.
That leap is exactly where methodological trouble begins. A surrogate endpoint is a substitute for direct clinical benefit, not the benefit itself. The right question is not whether the marker moved. It is whether changing that marker in this disease, with this mechanism, in this population, has earned the right to stand in for outcomes patients actually care about.
The Core Decision Rule
Do not let a manuscript translate surrogate success into clinical triumph unless it shows why that surrogate is trustworthy in the exact context being studied, and unless the discussion stays honest about what remains unproven.
Decision rule:
A biomarker result can support an argument. It should not impersonate how patients feel, function, or survive unless the surrogate relationship is already well validated in that setting.
What Makes a Surrogate Endpoint Useful
FDA materials distinguish three levels: candidate surrogates, surrogates that are reasonably likely to predict benefit, and validated surrogates. That ladder matters because people often talk about “surrogates” as if the word itself were a stamp of maturity. It is not.
Candidate surrogate
Biologically interesting, but still mostly a hypothesis about what clinical benefit might follow.
Reasonably likely surrogate
Useful for earlier decisions in some regulatory settings, such as accelerated approval, but still requires confirmatory outcome evidence.
Validated surrogate
Backed by strong context-specific evidence that treatment effects on the marker predict treatment effects on the clinical outcome patients care about.
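The phrase "treatment effects on the marker predict treatment effects on the outcome" can be made concrete with a small sketch. One common way to look at it is across trials: pair each trial's effect on the marker with its effect on the clinical outcome and ask how tightly the two track. The function name `trial_level_r2` and all numbers below are hypothetical and synthetic, chosen only for illustration. The calculation is an unweighted squared correlation, which is far simpler than a formal meta-analytic validation.

```python
from statistics import mean


def trial_level_r2(surrogate_effects, outcome_effects):
    """Squared Pearson correlation between per-trial treatment effects.

    Each position is one trial: its effect on the marker and its effect
    on the clinical outcome. Unweighted, so it ignores trial size and
    estimation error (a simplifying assumption of this sketch).
    """
    if len(surrogate_effects) != len(outcome_effects):
        raise ValueError("effects must be paired trial by trial")
    if len(surrogate_effects) < 3:
        raise ValueError("need at least three trials")
    mx, my = mean(surrogate_effects), mean(outcome_effects)
    sxy = sum((x - mx) * (y - my)
              for x, y in zip(surrogate_effects, outcome_effects))
    sxx = sum((x - mx) ** 2 for x in surrogate_effects)
    syy = sum((y - my) ** 2 for y in outcome_effects)
    if sxx == 0 or syy == 0:
        raise ValueError("effects must vary across trials")
    return sxy * sxy / (sxx * syy)


# Synthetic, illustrative numbers only (not from any real trial).
marker_effects = [-0.1, -0.2, -0.3, -0.4, -0.5]
tracking_outcomes = [-0.05, -0.10, -0.15, -0.20, -0.25]  # outcome follows marker
unrelated_outcomes = [0.1, -0.2, 0.2, -0.2, 0.1]         # outcome ignores marker

print(trial_level_r2(marker_effects, tracking_outcomes))
print(trial_level_r2(marker_effects, unrelated_outcomes))
```

In the first synthetic set, larger marker effects come with larger outcome effects. In the second, every trial moves the marker and the outcome effects show no relationship to it. A manuscript that reports only the marker cannot tell these two worlds apart.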
Why Surrogates Fail Even When the Biology Sounds Elegant
The surrogate captures only one pathway
The treatment may improve the marker while affecting other pathways that matter just as much for symptoms, disability, or survival.
Off-target harms erase the gain
A prettier biomarker can coexist with toxicity, arrhythmia, bleeding, infection, or other damage the surrogate never sees.
Validation is borrowed from the wrong setting
A marker that works for one disease stage, drug class, or population may be unreliable in another.
The trial stops before reality catches up
Early readouts often exaggerate confidence precisely because the patient outcomes are not mature yet.
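The first two failure modes reduce to simple arithmetic. The sketch below assumes a deliberately crude model in which the treatment lowers event risk through the marker pathway by a relative factor and adds a fixed absolute risk through an off-target pathway. The function `net_event_risk`, its parameters, and the numbers are all hypothetical. Real risk pathways rarely combine this neatly.

```python
def net_event_risk(baseline_risk, on_pathway_rr, off_target_added_risk):
    """Event risk under treatment in a crude two-pathway model.

    baseline_risk: untreated event risk (0 to 1).
    on_pathway_rr: relative risk through the marker pathway
        (below 1 means the marker-mediated effect is beneficial).
    off_target_added_risk: absolute risk added by harms the marker
        never sees (for example toxicity or arrhythmia).
    """
    if not 0.0 <= baseline_risk <= 1.0:
        raise ValueError("baseline_risk must be between 0 and 1")
    if on_pathway_rr < 0 or off_target_added_risk < 0:
        raise ValueError("risk inputs must be non-negative")
    treated = baseline_risk * on_pathway_rr + off_target_added_risk
    return min(treated, 1.0)


# Illustrative values only: a 20% relative reduction through the marker
# pathway, offset by 3 percentage points of off-target harm.
baseline = 0.10
treated = net_event_risk(baseline, on_pathway_rr=0.8, off_target_added_risk=0.03)
print(f"untreated risk: {baseline:.2f}, treated risk: {treated:.2f}")
```

With these illustrative inputs the marker-mediated gain is real, yet the treated risk ends up higher than the untreated risk. The surrogate would report only the gain.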
Three Clinical Anchors Worth Remembering
Anchor 1
Suppressing ventricular ectopy did not guarantee safer hearts
The classic cautionary tale is the Cardiac Arrhythmia Suppression Trial (CAST). Suppressing a rhythm marker that looked ominous did not translate into better patient outcomes. In that trial the clinical outcome moved in the wrong direction: mortality was higher, not lower, in patients assigned to the antiarrhythmic drugs. The lesson is not “never use surrogates.” The lesson is that physiologic plausibility is not the same thing as validated patient benefit.
Anchor 2
Tumor response can be meaningful without being the whole answer
In oncology, radiographic response or progression-based endpoints may be very useful, especially when waiting for overall survival would take years or be confounded by later therapies. But a response curve is still not a synonym for living longer or living better. The claim should match the endpoint.
Anchor 3
Familiar biomarkers can still be over-translated
Blood pressure, LDL cholesterol, viral load, or glycemic markers can be highly informative in the right contexts. The trap is assuming that because a biomarker is familiar, every intervention that changes it deserves a broad patient-benefit narrative without checking mechanism, safety, and follow-up.
Where Reviewers Get Fooled
| What the paper says | Why it sounds convincing | What is still missing |
|---|---|---|
| The biomarker improved significantly. | A clean p-value feels like therapeutic proof. | Statistical movement in a marker is not validation that patient outcomes improved. |
| This endpoint is accepted by regulators. | Regulatory use is mistaken for universal evidentiary maturity. | Acceptance is context-specific and can still require later confirmation. |
| Overall survival is immature, but the surrogate is strongly positive. | The discussion sounds forward-looking and practical. | The risk of overclaiming is highest precisely when patient outcomes are still immature. |
Interactive surrogate stress test
Ask whether the biomarker is carrying more certainty than it earned
This does not estimate real treatment benefit. It is a reviewer-facing teaching tool that shows why surrogate claims depend on validation, mechanistic fit, safety, and actual outcome follow-up together.
How to read this
Example score: 4 / 10
Higher scores mean the surrogate story is more coherent. They do not mean patient benefit has been proven. The main use is to stop manuscripts from jumping directly from biomarker movement to clinical certainty.
Reviewer rule of thumb
If the paper has weak validation, unresolved harm, or no mature patient outcomes, the discussion should sound provisional no matter how pretty the surrogate curve looks.
| Dimension | Why it matters |
|---|---|
| Validation status | The marker is biologically interesting, but there is no strong proof that changing it predicts patient benefit in this context. |
| Mechanistic link | A mechanistic story helps, but mechanism alone does not validate a surrogate. |
| Off-target harm | You cannot let surrogate gains stand in for patient outcomes if safety tradeoffs remain unresolved. |
| Clinical follow-up | The surrogate may be useful, but the confidence claim should stay narrower than the press release wants. |
| Effect size | Magnitude can strengthen the signal, but only after the validity question is answered. |
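The five dimensions in the table can be sketched as a small scoring function. This is not the scoring logic of the interactive tool, which the page does not describe. It assumes each dimension is rated 0 (weak), 1 (partial), or 2 (strong), so five dimensions sum to a score out of 10, and it applies the reviewer rule of thumb by flagging the claim as provisional whenever validation, harm, or follow-up is rated 0. The names `DIMENSIONS` and `stress_test` are hypothetical.

```python
DIMENSIONS = (
    "validation_status",
    "mechanistic_link",
    "off_target_harm",      # 2 means harms are well characterized and acceptable
    "clinical_follow_up",
    "effect_size",
)

# Dimensions that force provisional language when rated 0,
# following the reviewer rule of thumb in the text.
GATING = ("validation_status", "off_target_harm", "clinical_follow_up")


def stress_test(ratings):
    """Return (score out of 10, must_sound_provisional).

    ratings maps each dimension name to 0, 1, or 2. The 0-2 scale is an
    assumption of this sketch, not a published scoring system.
    """
    for name in DIMENSIONS:
        if name not in ratings:
            raise ValueError(f"missing rating for {name}")
        if ratings[name] not in (0, 1, 2):
            raise ValueError(f"{name} must be rated 0, 1, or 2")
    score = sum(ratings[name] for name in DIMENSIONS)
    provisional = any(ratings[name] == 0 for name in GATING)
    return score, provisional


# Hypothetical manuscript: elegant mechanism, but a candidate surrogate
# with no mature clinical outcomes.
example = {
    "validation_status": 0,
    "mechanistic_link": 2,
    "off_target_harm": 1,
    "clinical_follow_up": 0,
    "effect_size": 1,
}
score, provisional = stress_test(example)
print(f"{score} / 10, provisional language required: {provisional}")
```

The gating check is kept separate from the sum on purpose: a large effect size and a strong mechanism can raise the score, but they cannot buy back missing validation or missing outcome follow-up.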
Red-Flag Checklist for a Biomarker-Driven Claim
- The manuscript never states whether the surrogate is candidate, reasonably likely, or well validated.
- The discussion uses patient-benefit language that outruns the actual endpoint.
- Safety tradeoffs are summarized shallowly while efficacy prose is expansive.
- Validation evidence is borrowed from another disease stage, mechanism, or treatment class.
- Clinical-outcome follow-up exists but is relegated to the supplement because it is less flattering.
When Surrogate Endpoints Are Worth Defending
Surrogates are not methodological fraud by default. They are often necessary. Some diseases progress too slowly, some outcomes are too rare, and some settings genuinely need earlier evidence. The point is that using a surrogate responsibly requires humility about what has and has not been shown.
Defensible use
The marker has strong context-specific validation, harms are transparently reported, and the paper clearly limits the claim to what the endpoint can support.
Weak use
The marker is treated as a shortcut around clinical uncertainty instead of a provisional signal that still needs outcome confirmation.
Why This Matters for Study-Design Review
Surrogate endpoints are one of the cleanest examples of why study-design rigor is not the same as statistical polish. A trial can be randomized, blinded, and beautifully analyzed, then still tell a clinically inflated story if the endpoint itself is over-trusted.
That is exactly the kind of manuscript weakness Aqrab is meant to pressure-test. If your team wants a fast critique of whether an endpoint strategy, causal claim, and discussion section still agree, start with Aqrab's trial review workflow. The useful output is not generic skepticism. It is knowing where the claim outruns the design.