Man vs. Machine: Which Is Better at Predicting Clinical Trial Outcomes?


Artificial intelligence (AI) and machine learning (ML) are making strides in predicting clinical trial failures and successes far faster than human analysts, but are those predictions more accurate? A recent Demy-Colton Virtual Salon pitted human intelligence against AI/ML to weigh not just the outcomes but the process.

“One of the biggest challenges in drug development is the high dropout rate, and the high costs and frustrations around it,” said Friedrich von Bohlen, co-founder and CEO of Molecular Health GmbH, which makes AI-based clinical trial predictions. Currently, it takes more than a decade for a drug entering Phase I development to gain regulatory approval, and only 8% of such drugs ever make it to market. Even among drugs that reach Phase II, fewer than 30% advance to Phase III clinical trials.

With such grim statistics, it’s clear the biopharmaceutical industry needs a more accurate way to predict likely successes and failures as early as possible. With keener insight into the factors that drive success and failure, drug developers can design better trials that deliver the information they need and yield high-quality evidence. The benefits include shorter development times, higher success rates and lower drug costs for patients. Better predictions also help investors make decisions, von Bohlen said.

There are two established approaches to prediction: AI and humans. AI can handle big data, deliver results almost immediately and minimize human bias. Humans bring experience and can make logical connections that AI/ML may miss. Each is a valuable predictor of clinical trial success. Often, the unknowns present the greatest pitfalls.

For this Salon, Diviner (which makes human-based predictions) and Molecular Health (which makes AI-based predictions) examined 10 recent clinical trials and discussed their predictions.

By comparison, AI was faster, offering near-instantaneous results, but it was also more pessimistic. Although neither the AI nor the human forecasters predicted that 8 of the 10 trials would meet their primary endpoints, “humans were more optimistic than the AI algorithm,” Daniel Mytelka, head of methodology for Diviner, said.

The probability of success was specific to an individual trial rather than to the therapeutic as a whole, Mytelka emphasized. The AI looked at clinical records, comparator compounds with relevant mechanisms of action or disease states, related diseases and the experience of the company’s leadership. “A Phase II program will have a couple of Phase II trials…and you need to look at all of them to determine whether to progress into Phase III.”

Matthias Kopf, Ph.D., head of product management for Molecular Health, explained the more negative AI predictions this way: “Typically, humans don’t do predictions in extremes. Also, a lot of the negative predictions are for small companies,” which the clinical trials prediction algorithm associates with clinical failures.

The rationale is that small companies are often inexperienced in conducting Phase III trials, and that their innovations are at the leading edge of what’s possible. As a result, there is no established body of knowledge for developers and regulators to reference. The AI therefore sees an inexperienced company trying to punch above its weight, to borrow a boxing metaphor.

Of course, young companies may have seasoned leadership to compensate. Molecular Health’s software is transparent regarding the factors considered in the AI’s determination, “so you can assess the reliability of the prediction,” Kopf said.

Typically, AI models define success as meeting the primary endpoint, but a broader view considers what the trial must do to deliver value to stakeholders. AI companies include hundreds of additional factors in their analyses. “Ultimately, however, success may come down to what the market identifies as successful, as well as the sponsors and patients,” Mytelka said, recognizing that each may have different needs.

As Mark Gordon, founder and CEO of Diviner, pointed out, “Deep domain expertise alone doesn’t make you a good forecaster.” In that regard, forecasting is something of an art.

“People often focus on the topline results, but we (at Diviner) look at the full set of data from the clinical trial and the quality of the information,” Mytelka said. For example, “We did a forecast for Idera Pharmaceuticals’ drug for metastatic melanoma. It’s a single-arm trial that produced an extremely positive signal that excited people. We saw irregularities in how the data was put together and reported. You need to look at all the information and see that it tells you the truth.”

Meeting the primary endpoint isn’t always enough. The positive results from Sage Therapeutics and Biogen’s joint development of zuranolone (SAGE-217/BIIB125) are a case in point. Even though zuranolone showed a significant reduction in the symptoms of depression at day 15 (a 1.7-point improvement on the 17-item Hamilton Rating Scale for Depression, or HAMD-17), “the marketplace reacted negatively,” according to Dan Chancellor, director of thought leadership at Informa Pharma Intelligence.

Molecular Health’s AI algorithm gave zuranolone a 39% chance of success. As Kopf explained, “It had a low similarity with approved drugs and limited data from clinical trials compared to others in this therapeutic area.” There were also many already-approved drugs for this indication, as well as others with Fast Track designation. Yet although zuranolone failed to meet its Phase III endpoints in 2020, a successful Phase III trial was recently announced. So, he said, “We were wrong, but I still think identifying additional factors into the decision-making adds a good perspective.”

Diviner evaluated that trial, too. “When our experts looked earlier and later than the 15-day endpoint, there was a more moderate benefit…so the 15-day benefit looked like a fluke,” Mytelka said.

In another example, Vertex launched a trial for APOL1-mediated focal segmental glomerulosclerosis (FSGS), a rare, chronic kidney disease for which there are no approved treatments. “Diviner predicted a 74% chance of success and Molecular Health predicted 67%,” Chancellor said.

The predictions of success hinged on the presence of biomarkers. “Known biomarkers indicate the mechanism of action is understood and that there is a clear target,” Kopf said, and that improves the chances of success. Molecular Health’s algorithm factored in trial design, the drug itself, the sponsor, and other factors, “but it was the biomarker that drove us toward predicting success.”

The key to successfully predicting the outcomes of clinical trials lies in considering all the factors, Chancellor summarized. Whether you use human intelligence, AI, or both, “Use all the data available, understand the methodology, and come to your own conclusion.”
