The Confidence Gap: Why Drug Discovery’s Data Explosion Hasn’t Solved Its Billion-Dollar Decision Problem

Laurence Arnold, PhD
Head of R&D
Pelago Bioscience

We’ve never had more data in drug discovery. Yet despite this explosion in capability, our industry’s most fundamental challenge remains stubbornly intact: making confident early decisions about which drug programs deserve billion-dollar investments, and which should be shelved.

Bringing a drug to market costs two to three billion dollars, and the failure rate is 90%, often higher. These numbers mask something more troubling. We’re not just failing because biology is hard; we’re failing because the mountains of data we’re generating aren’t giving us what we actually need at the decision points that matter.

In my view, we don’t have a data volume problem—we have a data relevance problem.

Biological activity is not relevance

Traditional drug discovery relies on a “dissect and build” approach: isolate one variable, measure it in a controlled environment, then extrapolate. It’s disciplined. It’s reproducible. And it has delivered important medicines.

But the persistently high failure rate in drug development tells us we’re reaching the limits of this approach. In reality, biology operates through cascading networks, feedback loops, and context-dependent equilibria. These are dynamic biological systems where cause and effect rarely follow straight lines.

We’ve successfully drugged only about 650 of 20,000 potentially druggable proteins. Not because scientists lack talent, but because for most targets, we don’t have robust ways to measure what matters—the initiating molecular event in a biologically relevant context.

We’re good at measuring activity. What we struggle with is measuring relevance.

An assay telling you that a compound binds to your target protein is useful, but does it bind in living cells? In the disease context that matters? With the pharmacokinetics to reach the target in patients? A compound that looks brilliant in a purified enzyme assay might never reach its target in cells, or it might hit off-targets and produce its effects through entirely different mechanisms.

The result? Ever-expanding data sets that still don’t answer the critical question in modern drug discovery: Are we making the right decision?

The cost of borrowed confidence

There’s a human dimension here that rarely makes it into industry discussions. Despite what is often repeated in drug discovery circles, scientists in R&D are rewarded for being right, not for being bold.

Most scientists think in terms of “future hindsight”: will we look back and realize we missed something obvious? The responsibility isn’t to push programs forward at all costs. It’s to execute each step well, knowing that most programs will fail. Success stories often appear bold in retrospect. In practice, they are usually built on careful, incremental decisions that gradually improve the odds.

So, teams do their jobs with discipline and rigor. They hit milestones, generate data, and advance programs. Everyone knows 90% of projects will fail, but this one has shown activity in the assay, has a plausible mechanism, and has momentum. The data might not be perfect, but it’s good enough to keep going.

Until it isn’t. And the failure comes late, after years of effort and hundreds of millions spent.

Of course, failure is how science advances. But many of these failures could have been caught earlier. Hard-working teams simply didn’t have data that would let them make the call with confidence when it mattered most, before massive resources were committed.

What decision-ready evidence looks like

The best experiment isn’t always the one that moves your program forward—it’s the one that tells you when to stop.

Think of it as taking a stepladder to look over a thick hedge rather than hacking through it with an axe. You might not learn everything about what’s inside it, but you’ll know much faster whether there’s anything worth pursuing on the other side.

The pharmaceutical industry has been built on a model of going through the hedge, but the resource cost and timelines are increasingly untenable. So, what would an alternative, evidence-driven discovery model look like?

Evidence-driven discovery requires a hierarchy of questions. Before optimizing potency or selectivity, can you prove that engaging this target in this context produces therapeutically relevant effects? Not in an abstract system, but in actual disease biology.

This is about front-loading proof of concept before investing in optimization. Measure the initiating molecular interaction early, free from tags or unnatural expression control, in cells and tissues that approximate disease.

It also requires new frameworks for proof of target engagement. We’re seeing this with technologies that measure binding in native cellular contexts, patient-derived models, and translational designs that test hypotheses much earlier in preclinical development. The goal isn’t to replace traditional assays, but to know which programs deserve the investment in optimization that follows.

Ultimately, the win comes from making the right decision at each stage, even when that decision is to stop.

The path forward

Successful programs will establish coherent lines of evidence from initial target engagement through preclinical models to human proof of concept—and they will do it fast enough to fail early when evidence doesn’t align.

This means rigorously testing hypotheses in the real biological context of disease before perfecting molecules or committing billions of dollars.

Some will argue this is unrealistic—that you need optimized compounds, and that shortcuts lead to false negatives killing promising programs. These concerns aren’t wrong; they’re just insufficient when the old model demonstrably isn’t working.

The real question is whether the risk of earlier translational testing outweighs the cost of spending nine years and a billion dollars on a target that was never going to work.

Making the call with confidence

Here’s what I tell my team: Your job isn’t to get a drug to the clinic. Your job is to do each step exceptionally well, building evidence you can defend. Because if we’re systematic about gathering the right evidence early, and if we’re honest about what the data is—and isn’t—telling us, the statistics start working in our favor.

The industry is moving toward evidence-first approaches—technologies validating targets in relevant contexts, translational frameworks testing hypotheses earlier, and computational tools trained on quality data.

But all this data is just noise until it answers the question keeping many of us up at night: Can I make this call with confidence, or am I crossing my fingers and hoping?

We won’t solve the 90% failure rate entirely. Biology is too complex. But we can close the confidence gap by using the right data, at the right time, to answer the key question: Should we keep going?

And sometimes—often, even—the most valuable answer will be no.

Laurence Arnold, PhD, is the Head of R&D at Pelago Bioscience.