Comprehensive Study Finds Mutations In Non-coding Genome Are Infrequent Drivers Of Cancer

A clearer picture of how DNA changes lead to cancer has emerged, following the most comprehensive evaluation of non-coding driver mutations to date by researchers at the Wellcome Sanger Institute, the Broad Institute of MIT and Harvard, Massachusetts General Hospital (MGH), Aarhus University Hospital and their collaborators.

The study, published today (5 February) in Nature as part of a global Pan-Cancer Project*, discovered several new cancer drivers in non-coding genes. The overall conclusion, however, reaffirms that the vast majority of cancer drivers occur in protein-coding regions of the human genome. This knowledge will help to focus efforts on discovering new causes and treatments for cancer.

Also published today in Nature and related journals are 22 further studies from the Pan-Cancer Project. The project represents an unprecedented international exploration of more than 2,600 cancer genomes, which significantly improves our fundamental understanding of cancer and zeroes in on mechanisms of cancer development.

Driver mutations are DNA changes that ‘drive’ cells down the path towards cancer. Depending on the type of cancer, anywhere from one to ten driver mutations are required for cancer to develop**.

Most large-scale genomic studies of cancer to date have focused on detecting driver mutations in protein-coding genes. Because these coding sequences represent less than two per cent of the human genome, researchers have in recent years begun investigating the remaining 98 per cent, the ‘non-coding’ genome***. In 2013, driver mutations were discovered in the non-coding promoter region of the TERT gene across many cancer types, raising the possibility that there may be numerous non-coding driver mutations in the ‘dark matter’ of the genome.

This study is the most comprehensive evaluation of the extent of non-coding driver mutations in cancer to date, in terms of the number of methods employed, number of samples analysed, and the number of cancer, genome region and mutation types studied. Overall, 2,600 genomes of 38 different tumour types were analysed.

The team identified a number of new non-coding cancer-driving mutations, such as non-coding mutations in the 5’ untranslated region of the TP53 gene, which are associated with this gene being less strongly expressed, or ‘turned off’.

The analysis concluded, however, that mutations in the regulatory sequences surrounding cancer genes are relatively rare. Excluding mutations in the TERT gene, the number of non-coding driver mutations identified equated to around one (or fewer) in every 100 tumours. In comparison, protein-coding regions often harbour several driver mutations per tumour. Some non-coding drivers reported in previous studies were found to be artefacts of less accurate methodologies or the result of previously uncharacterised hyper-mutation processes.

Dr Federico Abascal, of the Wellcome Sanger Institute, said: “The fact that our results contrast so strongly with other studies is largely down to how rigorous our analysis has been. Despite using numerous methods, the largest dataset currently available and surveying a wide range of non-coding regions of the genome, we found very few genuine driver mutations outside protein-coding genes.”

Dr Gad Getz, of the Broad Institute and MGH, said: “The non-coding driver mutations we identified, such as in the TP53 gene, add to the short list of non-coding driver mutations that already includes TERT, FOXA1 and a few other genes. By rigorously analysing the mechanisms that contribute to increased mutation rates, we were not only able to find new drivers but also raise doubts about previously reported ones that are affected by local mutational processes and artefacts uncovered in our study. We hope that our analysis will serve as the basis for future cancer genome studies.”

This unexpected result has important implications for the treatment of cancer. While technological advancements and larger cohorts will undoubtedly lead to the discovery of more non-coding driver mutations, it is unlikely that the ratio of coding to non-coding drivers will change significantly. This implies that efforts to develop new cancer treatments should primarily focus on protein-coding genes.

Dr Inigo Martincorena, of the Wellcome Sanger Institute, said: “Overall, our study suggests that while increasingly large datasets will continue to yield new coding and non-coding driver mutations, the vast majority of cancer drivers occur in the two per cent of the genome that codes for proteins. To us, this was an unexpected and important result. For cancer patients, this means that the vast majority of clinically-relevant mutations in a cancer are likely to be found in protein-coding sequences, which will simplify efforts for the clinical use of genome sequencing in cancer.”


Contact details:
Dr Matthew Midgley
Press Office
Wellcome Sanger Institute
Cambridge, CB10 1SA
Phone: 01223 494856

Notes to Editors:

Telephone Press Briefing:

**Please note that a Nature telephone press briefing for the entire Pan-Cancer Project will take place UNDER STRICT EMBARGO on Tuesday 4th February at 15:00 London time (GMT) / 10:00 US Eastern Time**

Authors Peter Campbell and Lincoln Stein will discuss the research. This will be followed by a Q&A session.

To attend this briefing you will need to pre-register by following the link here. Once you are registered, you will receive an email containing the dial-in details for the conference. You will also be provided with the option to save the details of the briefing to your calendar.

The papers and this briefing are subject to an embargo of 18:00 London time (GMT) / 13:00 US Eastern Time on Wednesday 5th February.

Further information on the Pan-Cancer Project:
Background information on the Pan-Cancer Project, related studies and images are available at: WeTransfer Link.

*The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG), known as the Pan-Cancer Project, is the largest and most comprehensive study of whole cancer genomes yet. The collaboration, involving more than 1,300 scientists and clinicians from 37 countries, analysed more than 2,600 genomes of 38 different tumour types and has created a huge resource of primary cancer genomes, available to researchers worldwide to advance cancer research.

Main findings from the Pan-Cancer project:

  • The cancer genome is finite and knowable, but enormously complicated. By combining sequencing of the whole cancer genome with a suite of analysis tools, we can characterise every genetic change found in a cancer, all the processes that have generated those mutations, and even the order of key events during a cancer’s life history.
  • We are close to cataloguing all of the biological pathways involved in cancer and having a fuller picture of their actions in the genome. At least one causal mutation was found in virtually all of the cancers analysed and the processes that generate mutations were found to be hugely diverse -- from changes in single DNA letters to the reorganization of whole chromosomes. Multiple novel regions of the genome controlling how genes switch on and off were identified as targets of cancer-causing mutations.
  • Through a new method of “carbon dating”, the Pan-Cancer Project discovered that we can identify mutations which occurred years, sometimes even decades, before the tumour appears. This opens, theoretically, a window of opportunity for early cancer detection.
  • Tumour types can be identified accurately according to the patterns of genetic changes seen throughout the genome, potentially aiding the diagnosis of a patient’s cancer where conventional clinical tests could not identify its type. Knowledge of the exact tumour type could also help tailor treatments.

For access to all the open tier data in the Pan-Cancer project, go to

**For more information on driver mutations in different types of cancer, see the Sanger Institute website

***More information on protein-coding and non-coding genes is available at:


Esther Rheinbay, Morten Muhlig Nielsen and Federico Abascal et al. (2019). Analyses of non-coding somatic drivers in 2,658 cancer whole genomes. Nature. DOI:

The Nature collection landing page with all PanCancer publications will go live when the papers publish:


This research was funded by GDAC, the Broad Institute of MIT and Harvard, Independent Research Fund Denmark, The Danish Cancer Society, National Institutes of Health and Wellcome.

Selected websites:

Aarhus University and Aarhus University Hospital

Aarhus University is a Danish research-intensive university founded in 1928. It is a top-ten university among universities founded within the past 100 years, and it has a long tradition of partnerships with some of the world’s best research institutions and university networks. The Faculty of Health at Aarhus University seeks to improve public health and benefit society with outstanding basic research, clinical translation, and innovation.

Aarhus University cooperates closely with Aarhus University Hospital, which is one of the largest and most advanced hospital complexes in Northern Europe. The hospital has a large focus on development and application of precision medicine based on genomics and integrative molecular profiling. For more information about The Faculty of Health at Aarhus University and Aarhus University Hospital, go to and

About Massachusetts General Hospital

Massachusetts General Hospital, founded in 1811, is the original and largest teaching hospital of Harvard Medical School. The MGH Research Institute conducts the largest hospital-based research program in the nation, with an annual research budget of more than $1 billion, and comprises more than 8,500 researchers working across more than 30 institutes, centers and departments. In August 2019 the MGH was once again named #2 in the nation by U.S. News & World Report in its list of "America’s Best Hospitals."

About the Broad Institute of MIT and Harvard

Broad Institute of MIT and Harvard was launched in 2004 to empower this generation of creative scientists to transform medicine. The Broad Institute seeks to describe all the molecular components of life and their connections; discover the molecular basis of major human diseases; develop effective new approaches to diagnostics and therapeutics; and disseminate discoveries, tools, methods, and data openly to the entire scientific community.

Founded by MIT, Harvard, Harvard-affiliated hospitals, and the visionary Los Angeles philanthropists Eli and Edythe L. Broad, the Broad Institute includes faculty, professional staff, and students from throughout the MIT and Harvard biomedical research communities and beyond, with collaborations spanning over a hundred private and public institutions in more than 40 countries worldwide. For further information about the Broad Institute, go to

The Wellcome Sanger Institute
The Wellcome Sanger Institute is a world-leading genomics research centre. We undertake large-scale research that forms the foundations of knowledge in biology and medicine. We are open and collaborative; our data, results, tools and technologies are shared across the globe to advance science. Our ambition is vast – we take on projects that are not possible anywhere else. We use the power of genome sequencing to understand and harness the information in DNA. Funded by Wellcome, we have the freedom and support to push the boundaries of genomics. Our findings are used to improve health and to understand life on Earth. Find out more at or follow us on Twitter, Facebook, LinkedIn and on our Blog.

About Wellcome
Wellcome exists to improve health by helping great ideas to thrive. We support researchers, we take on big health challenges, we campaign for better science, and we help everyone get involved with science and health research. We are a politically and financially independent foundation.
