Leveraging Best-of-Breed Algorithms for Accuracy in Precision Medicine
There is no one-size-fits-all AI algorithm that drug developers can apply off the shelf to quickly identify whatever features they seek. Solving this dilemma has great implications for the field of precision medicine.
Currently, researchers are selecting best-of-breed algorithms in a modular approach to build customized analytics engines that answer specific questions in a way that is both unbiased and reproducible.
“We still don’t have a gold standard in terms of implementing and applying reproducible AI/ML approaches,” said Zeeshan Ahmed, Ph.D., assistant professor of medicine at Robert Wood Johnson Medical School, Rutgers Biomedical and Health Sciences, in an interview with BioSpace.
As yet, there has been very little effort to organize and understand the many computing approaches in this field.
Key AI/ML Objectives for Precision Medicine
A review published in Briefings in Bioinformatics is among the first such efforts. Ahmed and colleagues examined five years of literature in whole-genome or whole-exome sequencing to identify 32 of the most frequently used AI/ML algorithms and approaches used to deliver precision medicine insights.
The team compared scientific objectives, methodologies, data sources, ethics and gaps for each of those approaches.
For AI/ML to be more useful for drug developers, several things are required. Chief among them, Ahmed said, are:
- “Efficient data collection, quality inspection, cleansing and AI/ML-ready (data) generation
- “Data modelling, with the establishment of correct associations between predictive input variables and expected outcome
- “Training and validation of the model to evaluate the predictive performance.”
“During cases, when data is of high volume, it is important to ensure the right balance between training and actual datasets to avoid overfitting,” Ahmed noted.
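The training/validation balance Ahmed describes can be sketched as a simple holdout split. This is a minimal pure-Python illustration, not the group's pipeline; the `records` data and the 80/20 ratio are invented for demonstration:

```python
import random

def train_validation_split(records, train_fraction=0.8, seed=42):
    """Shuffle records and hold out a fixed validation set.

    Keeping a validation set the model never trains on is the
    basic guard against overfitting on high-volume data.
    """
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical AI/ML-ready records: (feature vector, outcome label)
records = [([i, i % 3], i % 2) for i in range(100)]
train, validation = train_validation_split(records)
print(len(train), len(validation))  # 80 20
```

A fixed seed keeps the split reproducible across runs, which matters for the reproducibility Ahmed calls for.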
For AI/ML to be most useful, the data should be standardized to enable more accurate searches. Ensuring the data uses the same terms to refer to the same elements helps ensure that all the relevant information can be identified and analyzed.
There should be a method to correct errors in the data, too. Data that is entered by hand, for instance, may well have inaccuracies. The study data also should span multiple diseases and distinct populations to reflect the broad way in which diseases, conditions and symptoms present.
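Term standardization of this kind can be sketched as a lookup against a canonical vocabulary. The synonym map below is a hypothetical toy, not a real clinical terminology; a production system would use an established ontology:

```python
# Hypothetical synonym map: names a dataset might use for the
# same clinical element, each pointing to one canonical term.
CANONICAL_TERMS = {
    "heart attack": "myocardial infarction",
    "mi": "myocardial infarction",
    "myocardial infarction": "myocardial infarction",
    "high blood pressure": "hypertension",
    "htn": "hypertension",
    "hypertension": "hypertension",
}

def standardize(term):
    """Map a free-text diagnosis to its canonical term.

    Unrecognized terms are flagged rather than guessed, so
    hand-entered errors surface for correction instead of
    silently fragmenting the dataset.
    """
    key = term.strip().lower()
    return CANONICAL_TERMS.get(key, f"UNMAPPED:{key}")

print(standardize("Heart Attack"))  # myocardial infarction
print(standardize("HTN"))           # hypertension
```

Flagging unmapped terms rather than passing them through is one simple way to make hand-entry errors visible for correction.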
The Role of AI/ML in COVID-19 Drug Repurposing
Recent research in Biomedicine & Pharmacotherapy, conducted by Kyung Hyun Choi of Jeju National University in Korea and colleagues, noted the value of ML and deep learning in drug repurposing for COVID-19 therapeutics. Those methods helped them distinguish between drug targets and gene products that affect target activity.
Each type of analysis had its own group of algorithms, Choi explained in the paper. Types of analyses used for machine learning included k-nearest neighbors (a supervised learning method), random forest and support vector machine, among others.
Deep learning techniques included artificial neural networks, convolutional neural networks and long short-term memory. AI algorithms were used for link prediction, node prediction or graph prediction and other tasks.
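To make the first of those algorithms concrete, here is a minimal k-nearest neighbors classifier. Note that k-NN is supervised: it requires labeled examples. The data and "active"/"inactive" labels are invented for illustration, not drawn from Choi's study:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest
    labeled neighbors.

    `train` is a list of (feature_vector, label) pairs.
    """
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy compound features with hypothetical activity labels:
train = [((0.1, 0.2), "inactive"), ((0.2, 0.1), "inactive"),
         ((0.9, 0.8), "active"), ((0.8, 0.9), "active"),
         ((0.85, 0.7), "active")]
print(knn_predict(train, (0.9, 0.85)))  # active
print(knn_predict(train, (0.0, 0.1)))   # inactive
```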
In applying AI/ML to research, Choi and colleagues wrote, “The limitations…include the inconsistency…in biological networks,” as well as challenges associated with various networks that can lead to bias in the outcome. To overcome those issues, he recommended using heterogeneous data from multiple sources to enhance the reliability of analyses.
Another study published in Current Drug Targets last year reviews ML tools used to identify biologically active compounds from among millions of candidates.
It found, among other things, that the support vector machine (SVM) algorithm was more effective than others as a classification model for predicting human intestinal absorption, while quantitative structure-activity relationship (QSAR) models better predicted flavonoid inhibitory effects in specific indications. Clearly, the choice of algorithm matters.
Decoding the Black Box
Until a few years ago, AI often was considered a black box that ingested data and expelled findings without providing researchers with the details needed to understand how those results were derived.
“You’re learning how thousands of inputs connect to hundreds or thousands of outputs,” David Longo, co-founder and CEO of Ordaos, told BioSpace. Machine learning algorithms “learn the intrinsic relationships between – for example – amino acids and motifs and domains…in a nonlinear, complex way, so there’s still a kind of black box element to AI/ML, depending on how you construct it.”
Generally, modern AI/ML algorithms allow some degree of insight into how individual algorithms reach their conclusions.
For example, Ordaos, which develops mini-proteins, “provides a trace-back of every single amino acid that was changed and how that affected the properties that come out of that protein,” Longo said. For researchers, that’s a huge benefit.
Innovation in the field of AI/ML today “is not necessarily around creating new individual components, but putting them together in interesting ways,” Longo continued.
He cited Ordaos’ multitask learning model as an example.
Traditionally, ML models were developed by training the algorithm in a specific area – a structure predictor would train only on structures and, with a few more steps, produce a model. Using that model for another, slightly different purpose required retraining. Ordaos’ model, in contrast, learns from multiple tasks simultaneously, somewhat countering Ahmed’s view of algorithm specificity.
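The contrast Longo draws can be sketched structurally: a shared representation feeds multiple task-specific heads, so a new task reuses the learned features instead of requiring a retrained model. The encoder, head weights and the two tasks below are invented for illustration and are not Ordaos’ actual model:

```python
def shared_encoder(sequence):
    """Shared representation used by every task head.

    Here: toy features of an amino-acid string (length, glycine
    fraction); a real model would learn this encoding.
    """
    return [len(sequence), sequence.count("G") / max(len(sequence), 1)]

def stability_head(features):
    # Task 1: hypothetical stability score from the shared features.
    return 0.5 * features[0] + 10.0 * features[1]

def solubility_head(features):
    # Task 2: hypothetical solubility score from the same features.
    return -0.1 * features[0] + 5.0 * features[1]

def predict_all(sequence):
    # One pass through the shared trunk serves both tasks, so
    # adding a task means adding a head, not retraining everything.
    features = shared_encoder(sequence)
    return {"stability": stability_head(features),
            "solubility": solubility_head(features)}

print(predict_all("GGAVLG"))  # {'stability': 8.0, 'solubility': 1.9}
```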
Selecting the Right Algorithms
AI/ML analytic approaches have the potential to help develop enhanced, systems-level understanding of disease mechanisms and treatment impacts, and can replace the homogeneity of existing genetic and statistical approaches with heterogeneity. Realizing that value, however, requires selecting the right algorithms for the job.
“It is important to measure and avoid algorithmic bias,” Ahmed said. “Classifying tasks based on available predictor variables is a key step to correctly address the problem of choosing a suitable AI/ML algorithm.”
“In my lab,” Ahmed said, “we practice AI/ML-driven personalized medicine. We are generating AI/ML-ready datasets based on clinical and multi-omics/genomics profiles and are developing automated pipelines to analyze and perform predictive analysis.
“Furthermore, we are addressing ethical issues, which involve protecting health information associated with multi-omics/genomic datasets,” he continued.
The analytics trend is shifting from generating big data to analyzing and interpreting that data and using it predictively. For those predictions to be accurate, the underlying assumptions also must be accurate, and that requires selecting the right algorithms for the questions.