Opinion: For AI to have impact, the industry must align on data


In the future of AI-driven biopharma, reusable data is the most undervalued asset.

Artificial intelligence is rapidly becoming embedded across the biopharmaceutical R&D continuum—from early discovery and preclinical decision‑making to clinical development and regulatory review. Last June, the FDA signaled how far that integration has progressed when it announced the use of Elsa, a generative AI tool, to support aspects of the drug approval process. While regulatory adoption is an important milestone, it represents just one facet of a much larger shift underway.

Biopharma companies increasingly rely on AI to guide scientific, safety and development decisions. As we move into this new era, a foundational question comes into focus: Are we generating and structuring the right data to ensure these tools deliver safer, more effective drugs for patients?

My colleagues at Charles River Laboratories and I are working with both large and small biopharma to help ensure studies conducted today have future-proofed data. Today’s studies and experiments not only provide vital information on the next steps for a specific novel therapeutic, but also contribute to datasets that inform future predictive models. Data quality is of paramount importance to ensure those models are effective.


Think like a machine

AI is only as good as the data we provide it. Understanding how AI sees and uses data is critical to ensuring it serves as a useful tool rather than a misleading one, with real implications for therapeutic development and human risk assessment. Metadata, formatting, data harmonization and avoiding bias in datasets all play a role in ensuring machine learning delivers accurate insights, but it is up to human programmers to think like a machine, anticipating and getting ahead of common pitfalls in data quality.

As a drug moves through research and regulatory processes, any mistakes in the data will be compounded. Small gaps that a human reviewer would catch and fill in based on intuition may be impossible chasms for AI. The earlier you start with clean, organized data, the easier progress will be. Some of the key considerations include:

  • Metadata: Metadata is data about data. Most of us have seen it mainly in data about photographs, such as the date, GPS location, type of camera used and time of day. Metadata for drug research can give context, but it can also serve to make the data reusable in the future. As data becomes more plentiful, metadata becomes more valuable. It might reveal, for example, why an experiment was performed in the first place, so that the data generated is better contextualized for future reuse. We cannot always predict how data might one day be used, so making it as robust and rich as possible from the start can dramatically increase future efficiency and insight.
  • Formatting: For data to be readable, we must be able to format diverse datasets into a common language. To accomplish this, experimentalists and data scientists must collaborate throughout the journey of data generation. This will also help foster a generation of cross-disciplinary scientists.
  • Data Harmonization: The biopharma industry has not yet adopted formal standards for making drug discovery data FAIR (findable, accessible, interoperable and reusable). This is likely the largest hurdle to the success of AI integration in drug discovery programs broadly across the industry. There is progress: working groups and consortia are forming, and standards have been set for reporting of development data to regulatory authorities (such as SEND for nonclinical and CDISC for clinical trial data). Our industry must move beyond simple data collection for an immediate purpose to a place where reuse is central to the thought process. We must actively structure data and provide easy-to-access metadata for every number we collect. This puts each data point in a context that cannot be broken or misunderstood by anyone who accesses the data later—including AI.
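The "metadata for every number" principle above can be sketched in code. The snippet below is an illustrative example only, with hypothetical study identifiers and field names (no formal schema such as SEND or CDISC is implied): it bundles raw measurements with machine-readable context so that a later consumer, whether human or model, never encounters a bare, uninterpretable value.

```python
import json
from datetime import datetime, timezone

# Hypothetical example: wrap each measurement with the context needed
# for future reuse (why the experiment ran, units, protocol version).
record = {
    "metadata": {
        "study_id": "TOX-0421",  # hypothetical identifier
        "purpose": "dose-range finding for hepatotoxicity",
        "assay": "ALT activity",
        "units": "U/L",
        "protocol_version": "v2.3",
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    },
    "measurements": [
        {"subject": "rat-01", "dose_mg_per_kg": 10, "value": 42.5},
        {"subject": "rat-02", "dose_mg_per_kg": 30, "value": 61.0},
    ],
}

# Serializing to JSON keeps the record findable, parseable and
# reusable by downstream pipelines, including AI training sets.
serialized = json.dumps(record, indent=2)
```

Because every value travels with its units, purpose and provenance, a model ingesting this record years later does not need a human reviewer to fill in context by intuition.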

The impact of AI on drug discovery

Once we’ve honed our mechanisms for teaching AI to appropriately read drug discovery and development data, the next, more exciting phase is what we do with it. The FDA’s use of Elsa is focused on analyzing nonclinical and clinical trial data to assess drug safety and efficacy, but how else can the industry capitalize on the power of AI?

AI has the potential to increase both the efficiency and probability of success in drug development. AI-based tools are becoming commonplace in small molecule design and have changed the landscape for protein structure prediction and engineering. However, modeling complex biological processes to identify new target-disease relationships or predict toxicities remains challenging for today’s technology. That said, AI-enabled toxicology models are showing strong predictive accuracy, and computational models have been validated for endpoints such as reproductive toxicity, carcinogenicity, endocrine disruption and skin sensitization.

Furthermore, advanced cell models, which themselves contribute to a future of increased human translation, can be paired with AI to enable deeper analysis and more robust simulation of biological processes. Algorithms based on studies of these models can facilitate comprehensive analysis of complex organoid behaviors, cellular interactions and dynamic responses, leading to more accurate predictive models, disease simulations and personalized medicine approaches.

The use of nonanimal models (NAMs) to support human safety risk assessment has received considerable attention recently, as the FDA and EMA have signaled a new level of support for such methods. Agencies have begun to publish guidance on validating these methods for formal human risk assessment, but drug developers remain cautious about how to adopt them at scale. Drug development teams across a variety of organizations have a unique opportunity to fill that void.


CROs, regulators and health authorities, with buy-in from the industry at large, could democratize access to noncompetitive data to help build, train and validate models for NAMs. With a trusted organization acting as a safe harbor, these data could support a transition to viable alternatives, ultimately supporting quicker drug development timelines.

Overall, we expect AI to have a positive impact on drug discovery: shortening timelines for best-in-class programs, perhaps increasing the choices around lead series and development candidates for first-in-class drugs, and, one day, accelerating and improving our ability to predict safety in humans. For that to happen, companies will need to fundamentally change their approach to generating and sharing data.
