EXCLUSIVE: DNAnexus, Stanford University Genomic Catalog Will Accelerate Science in the Field, Says CEO
Published: Jul 08, 2015
By Riley McDermid, BioSpace.com Breaking News Sr. Editor
A new project from DNAnexus, Inc. and Stanford University to comprehensively catalog biochemical activities of the human genome is a game-changing repository of information that will fundamentally change the way science studies the genomic basis for disease, the company’s CEO told BioSpace this week.
Richard Daly, CEO of DNAnexus, said in a wide-ranging interview that the new ENCODE Phase III data will provide a foundation for studying the genomic basis of human biology and disease.
“The project is the flagship initiative of the NHGRI, and in the first tier of all NIH-funded genome projects,” he said. “Stanford University is the ENCODE Data Coordination Center (DCC) and is in charge of this process. Phase III is using the next generation of technologies and methods to deepen the catalog, expected to produce 10X-100X more data than Phase II.”
That’s an enormous task that will take time, money and manpower—but is worth every second, said Daly. "It’s expected this analysis will require 10 million core-hours of compute and will generate nearly 1 petabyte of raw data over the next 18 months on the DNAnexus platform," he said.
BioSpace interviewed Daly about what scientists and investors both can expect from such a deep dive into genomic data compilation.
BioSpace: What are the key takeaways from this announcement?
The ENCODE DCC was tasked with centralizing the project’s raw sequencing data with uniform metadata standards and bioinformatics analysis. They chose DNAnexus because the company provides:
• A secure and unified platform already connecting thousands of scientists around the world
• A scalable environment to process thousands of datasets and allow collaboration around petabyte-sized genomic analysis results
• Transparency, reproducibility, and data provenance for consistency amongst ENCODE pipelines and results
BioSpace: How does gathering data like this help standardize research?
In the past, researchers wishing to perform large-scale analysis on ENCODE data generally had to download the data and tools to their own research computing infrastructure. This created an environment with no uniform processing pipelines: each lab processed data on pipelines optimized for its local infrastructure and submitted the results to the DCC. As a result, the data behind publications were not uniformly processed.
Now it is possible for researchers from around the world to run uniform ENCODE bioinformatics pipelines in the cloud. The DNAnexus platform enables uniform analytical treatment via version-controlled analyses and tools. Researchers at institutions worldwide have access to data and tools in real-time, promoting secure collaboration and accelerating scientific discovery.
BioSpace: What did this Phase III data show?
Phase III of the ENCODE project has just begun.
The ENCODE project has proceeded in three phases. Phase I (2003-2007) was a pilot effort covering a small portion of the genome. Phase II (2007-2012) scaled up to genome-wide analyses and received widespread media coverage upon completion. The current Phase III (2012-2016) is using the next generation of technologies and methods to deepen the catalog and is expected to produce petabyte-scale data (10X-100X more than Phase II). The Phase III analysis is expected to require 10 million core-hours of compute and to generate nearly 1 petabyte of raw data over the next 18 months on the DNAnexus platform.
BioSpace: How did the Stanford partnership come about?
Stanford University heads the Data Coordination Center (DCC) for the National Institutes of Health (NIH)-funded ENCyclopedia of DNA Elements (ENCODE) Project, a flagship functional genomics consortium funded by the National Human Genome Research Institute at the NIH. A competitive evaluation between DNAnexus and alternatives was conducted and documented. Professor Mike Cherry is Principal Investigator for the ENCODE DCC and professor of genetics at Stanford University.
BioSpace: How many samples were involved in this project?
1,640 datasets were analyzed in Phase II of the ENCODE Project. An important future goal will be to enlarge this dataset to additional factors, modifications, and cell types, complementing other related projects in this area. Together, these projects will constitute foundational resources for human genomics, allowing a deeper interpretation of the organization of gene and regulatory information and the mechanisms of regulation, and thereby providing important insights into human health and disease.
As New Jersey Biotech Booms, Will It Overtake Other States As Prime Location?
Celgene Corporation’s announcement that it is officially the mystery buyer of Merck & Co.’s former 1 million-square-foot R&D site in Summit, N.J., quickly became our most popular story last week.
The company announced last Wednesday that it is buying the space, ending months of speculation about what Big Pharma company might move into the neighborhood.
The Summit, N.J. site is zoned research/office. The New Jersey site would put operations closer to some of the major biotech and pharmaceutical hubs on the East Coast.
But, by far, the most tempting part of doing business in the state remains New Jersey’s operating tax credit, which allows companies to sell their net operating losses to the New Jersey Treasury. One of the state’s most recognizable biotechs, Celgene, used the program until it became profitable, which local officials said was key to keeping the company in the state.
That has BioSpace wondering if New Jersey is becoming the new face of biotech. What do you think? Can the Garden State compete with longtime stalwarts like California or Boston?