J&J’s Christopher Whelan on How—and Why—13 Companies Teamed up to Tackle Proteomics

The Pharma Proteomics Project isn’t the first precompetitive collaboration between pharma companies, but it’s one of the largest. Members recently published associations they’ve uncovered in UK Biobank data.

Pictured: A DNA strand and a protein/Getty

Earlier this month, Nature published results from a large-scale study of genetic variation in proteins related to human health. Rather than originating at a university or with the National Institutes of Health, the study came from a consortium of pharmaceutical companies.

The Pharma Proteomics Project (PPP) was first announced in 2020 and has been described by participants as “one of the biggest industry consortia of its kind.” Pharmaceutical companies partnered with the UK Biobank, a repository of environmental, lifestyle and genetic data based in the United Kingdom, and Olink Proteomics to analyze circulating proteins in more than 50,000 biomedical samples. The team looked at proteins related to cardiometabolic function, inflammation, neurology, oncology and COVID-19 outcomes.

Christopher Whelan

Precompetitive collaborations like the PPP are not uncommon. The Pistoia Alliance formed in 2007 to foster innovation in R&D and now has over 100 members globally. In 2011, Pfizer stated it had become part of multiple precompetitive consortia.

BioSpace spoke with project leader Christopher Whelan, a director on the Data Science and Digital Health team at Johnson & Johnson Innovative Medicine, about how the team came together and its findings. Responses are lightly edited for clarity.

BSp: How did the PPP come to focus on proteomics versus any other genomic technology?

CW: The consortium primarily [consists] of geneticists. . . . And I think there’s increasing recognition that genetics gives us the blueprints of diseases, but we need to go beyond genetics to get the whole picture. So we came together to essentially translate the language of the human genome through proteomics. Proteins are the end products of genes, and they represent the majority of our drug targets and a large proportion of our circulating biomarkers. So the idea would be, can we tie together the vast amount of knowledge that we have gathered on the human genome with a larger body of evidence on circulating biomarkers and on our drug targets?

BSp: How did you all decide to come together as a collective to do this rather than [just] one company funding the study?

CW: A couple of explanations there. The first is that we built upon some preexisting consortia. There are, for example, two existing consortia focused on exome sequencing of the UK Biobank, and there’s also another consortium focused on whole genome sequencing of the UK Biobank. So there was precedent for pre-competitive pharma collaboration, especially around the UK Biobank. I think the second component would be [that] proteomics is a very exciting technology. It’s a new technology, and like a lot of new technologies, [it] can be expensive. So the idea was to come together and cost share, and by cost-sharing we were able to do something on a population scale that no one company could achieve by itself.

BSp: The press release from 2020 said there were 10 companies, and when the final study came out there were 13. Is the consortium going to continue to expand?

CW: That would be our hope. As you highlighted, we were 10 companies [J&J, Amgen, AstraZeneca, Biogen, Bristol Myers Squibb, Genentech, GSK, Pfizer, Regeneron and Takeda] when we began. We grew to 13. Alnylam, Novo Nordisk and Calico joined after that announcement. We have received interest from other companies. On joining, we intentionally placed sort of a hard stop or a barrier just so we could get this initial project and 55,000 samples underway. But now that that project is complete, we are sitting down as a group and considering what our next steps would be. And I definitely think that there will be openness for additional companies to join the next phase.

BSp: What was something that was challenging about this process, whether on the front of getting all these people to work together or dealing with the actual research?

CW: I think the contractual process was very challenging, just dealing with the legal components and all of the red tape that might be involved in getting 13 companies to agree on how we should do this. On the scientific end, it was actually quite a pleasant process. I think the vast majority of the companies were very eager to work together in a truly precompetitive, truly collaborative manner. So scientifically, [a] very, very, very, very positive experience.

Just logistically, it can be challenging, but there are 13 companies all working towards a common goal. You’ll see in the paper that we aim towards the lower hanging fruits in conducting [genome-wide association studies] and doing some basic cross-sectional regressions of the most prevalent diseases. We all agreed that it made sense for us to do that together. There was an implicit acknowledgment that if we didn’t do it together, each of the 13 companies would go away and do their own version of the same analysis, and that can be very confusing for the scientific community.

BSp: Do you primarily hope to use the data from this study for developing new molecules or repurposing existing drugs?

CW: It’s a dataset with multiple high-impact use cases. Certainly, finding new targets is one. Validating some of our existing targets is a second. Repurposing of targets could be another opportunity. Beyond drug targets, I think in the future there could be opportunities to use this dataset to develop new diagnostics—you know, multiprotein markers that may be predictive of a disease or predictive of drug response.

We are interested in using this data set to find subtypes of a certain disease. So for example, different biological subtypes of Alzheimer’s disease or major depressive disorder, we can conduct advanced [machine learning] to find those subtypes and then align those subtypes to the mechanisms of action of the drugs that are in our portfolios.

BSp: Can you summarize some of the study’s more detailed findings?

CW: First, we found thousands of associations between proteins and demographic factors like BMI, liver and kidney functions and the top 20 prevalent illnesses in UK Biobank. Some of the more notable associations included increased levels of inflammatory cytokines like CXCL17 in depression. We also saw broad upregulation of a protein called GDF15 across 18 of those 20 prevalent illnesses. That’s a stress response cytokine that’s induced after injury, and our study would suggest that it could be a marker of general health, as it’s strongly upregulated across so many unique health conditions.

Next, we found that proteins in combination do an excellent job of predicting age and sex and BMI, and liver function and kidney function. And then next, perhaps most notably, we built the world’s largest pQTL [protein quantitative trait loci] library, or more simply, associations between gene variants and proteins. So we built a collection of over 14,000 associations between commonly occurring gene variants and protein levels. And then we have a few exemplars of why that proteogenomic library is important. We show how it helps us gain more insights into complex biological networks like the complement system, and that there’s a combination of short- and long-range interactions within the complement system.

We also briefly touch upon the utility of the proteogenomic library to find causal associations between proteins and diseases. Those causal associations could potentially flag more powerful drug targets. The example that we give is [the enzyme] PCSK9 and lipids just because it’s a ubiquitous example—it’s been talked about quite a bit in the literature already. We recapitulate the causal relationship between PCSK9 and lipids and atherosclerosis, just as an example of how this dataset could be useful to find new drug targets. We don’t talk about new drug targets . . . but there will be papers in the future that look at that more systematically.
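For readers curious what a single pQTL "association between a gene variant and a protein" looks like in practice, the idea can be sketched as a regression of protein level on the number of copies of a variant each person carries. The Python below is a minimal illustration under stated assumptions, not the consortium's pipeline: the function name `pqtl_association` and the toy numbers are hypothetical, and a real analysis would typically also adjust for covariates such as age, sex and ancestry and correct for testing many variants.

```python
from math import sqrt


def pqtl_association(dosages, protein_levels):
    """Regress protein level on allele dosage (0, 1 or 2 copies) by ordinary least squares.

    Hypothetical helper for illustration only.
    Returns (slope, intercept, t_statistic), where the t-statistic tests the slope.
    """
    n = len(dosages)
    if n != len(protein_levels) or n < 3:
        raise ValueError("need at least 3 paired observations")

    mean_x = sum(dosages) / n
    mean_y = sum(protein_levels) / n

    # Sum of squares of the genotype and its cross-product with protein level.
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("variant does not vary in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, protein_levels))

    slope = sxy / sxx                      # change in protein level per extra allele copy
    intercept = mean_y - slope * mean_x

    # Residual variance gives the standard error of the slope.
    residual_ss = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in zip(dosages, protein_levels)
    )
    std_err = sqrt(residual_ss / (n - 2) / sxx)
    t_stat = slope / std_err if std_err > 0 else float("inf")
    return slope, intercept, t_stat


# Toy data (invented): nine people, three per genotype group.
dosages = [0, 0, 0, 1, 1, 1, 2, 2, 2]
levels = [1.0, 1.2, 0.8, 1.5, 1.4, 1.6, 2.1, 1.9, 2.0]

slope, intercept, t_stat = pqtl_association(dosages, levels)
print(f"effect per allele: {slope:.2f}, t-statistic: {t_stat:.1f}")
```

In this toy example each additional copy of the variant is associated with a higher protein level. A library like the one described above would repeat a test of this kind across many variants and many proteins, keeping only the associations that pass a stringent significance threshold.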

BSp: What do you think the public should take away from this?

CW: I think that the study is helping us translate the language of the human genome. It’s helping us translate our own instruction books. Basically, we’ve provided a foundation to find new drug targets, we’ve provided a foundation to build new blood tests that could screen for diseases or predict . . . your response to a particular therapeutic. The paper mainly describes how we’ve built that foundation. And we have touched upon some of the exciting things you can do with this dataset. But over the next six to 12 months, I think we’re going to see papers that are as high-impact if not more high-impact based on the foundation that we built.

One thing we didn’t touch upon in the paper was the use of advanced [artificial intelligence] on this data set; there is a lot of untapped potential there and it’s something we’re absolutely exploring further within the data sciences group at J&J. We have such big data [that] we can use artificial intelligence and machine learning to uncover some of these paths that will help us better understand disease and [find] targets.

Nadia Bey is a freelance writer from North Carolina. She can be reached at beynadiaa@gmail.com.