Computer Science - Mathematics - Oncology - Pathology - Science Policy

A Simple but Highly Effective Approach to Evaluate the Prognostic Performance of Gene Expression Signatures
Published: Wednesday, December 07, 2011
Author: Maud H. W. Starmans et al.

by Maud H. W. Starmans, Glenn Fung, Harald Steck, Bradly G. Wouters, Philippe Lambin

Background

Highly parallel analysis of gene expression has recently been used to identify gene sets or ‘signatures’ that improve patient diagnosis and risk stratification. Once a signature has been generated, its prognostic performance is typically evaluated with traditional statistical testing. However, because of the high dimensionality of microarray data (thousands of genes measured per patient), this approach can lead to false interpretation of these signatures.

Principal Findings

A method was developed to test batches of a user-specified number of randomly chosen signatures in patient microarray datasets. The percentage of randomly generated signatures yielding prognostic value was assessed with ROC analysis, by calculating the area under the curve (AUC), in six publicly available cancer patient microarray datasets. We found that a signature consisting of randomly selected genes has, on average, a 10% chance of reaching significance when assessed in a single dataset, but this chance ranges from 1% to ~40% depending on the dataset in question. Increasing the number of validation datasets markedly reduces this chance.
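The random-signature evaluation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: each patient is scored by the mean expression of the signature genes (an assumed scoring rule), the outcome is a binary label, the AUC is computed through its Mann-Whitney formulation, and a signature counts as "prognostic" when its AUC clears a fixed, hypothetical cut-off in either direction. All function names (`fraction_prognostic`, `toy_dataset`, and so on) and the pure-noise toy data are hypothetical.

```python
import random
from statistics import fmean


def auc(scores, labels):
    """ROC AUC via the Mann-Whitney formulation: the probability that a
    randomly chosen positive case (label 1) scores higher than a randomly
    chosen negative case (label 0), counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def signature_auc(expr, labels, genes):
    """Score each patient by the mean expression of `genes` (assumed
    scoring rule) and return the AUC of that score against `labels`."""
    scores = [fmean(expr[g][i] for g in genes) for i in range(len(labels))]
    return auc(scores, labels)


def is_prognostic(a, cutoff):
    # Direction-agnostic check (assumption): an AUC far below 0.5 is as
    # informative as one far above it.
    return max(a, 1.0 - a) >= cutoff


def fraction_prognostic(datasets, size, n_signatures, cutoff, seed=0):
    """Draw `n_signatures` random gene sets of `size` genes and return the
    fraction that pass `cutoff` in every dataset. Each dataset is an
    (expr, labels) pair, where expr maps gene id -> per-patient values."""
    rng = random.Random(seed)
    shared = sorted(set.intersection(*(set(e) for e, _ in datasets)))
    hits = 0
    for _ in range(n_signatures):
        genes = rng.sample(shared, size)
        if all(is_prognostic(signature_auc(e, y, genes), cutoff)
               for e, y in datasets):
            hits += 1
    return hits / n_signatures


def toy_dataset(rng, n_genes=200, n_patients=40):
    # Pure noise: no gene carries any prognostic information.
    expr = {f"g{j}": [rng.gauss(0, 1) for _ in range(n_patients)]
            for j in range(n_genes)}
    labels = [i % 2 for i in range(n_patients)]
    return expr, labels


if __name__ == "__main__":
    rng = random.Random(42)
    data = [toy_dataset(rng) for _ in range(3)]
    for k in (1, 2, 3):
        rate = fraction_prognostic(data[:k], size=10, n_signatures=500,
                                   cutoff=0.65)
        print(f"{k} dataset(s): {rate:.3f} of random signatures pass")
```

Because the same seed draws the same random gene sets for every value of k, requiring a signature to pass in more datasets can only keep or lower the fraction. In this toy setting that mirrors the abstract's observation that additional validation datasets reduce the chance of a random signature appearing prognostic.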

Conclusions

We have shown that an arbitrary cut-off value for evaluating signature significance is not suitable for this type of research; instead, the cut-off should be defined for each dataset separately. Our method can be used to establish and evaluate the performance of any derived gene signature in a dataset by comparing it with thousands of randomly generated signatures. It will be of most interest where few data are available and testing in multiple datasets is limited.
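A dataset-specific threshold of the kind argued for here can be sketched as a comparison against an empirical null distribution. The sketch assumes the AUCs of many random signatures in one dataset have already been computed (how is left open). It takes the cut-off as an upper quantile of those AUCs, using a hypothetical `alpha = 0.05`, and reports an empirical p-value for a derived signature. The direction-agnostic `strength` measure and all function names are assumptions for illustration, and the placeholder null distribution in the demo is simulated, not real data.

```python
import math
import random


def strength(auc):
    # Direction-agnostic prognostic strength (assumption): 0.5 means no
    # information, 1.0 means perfect separation in either direction.
    return max(auc, 1.0 - auc)


def dataset_cutoff(null_aucs, alpha=0.05):
    """Dataset-specific cut-off: the empirical (1 - alpha) quantile of the
    strengths of random-signature AUCs from the same dataset."""
    ranked = sorted(strength(a) for a in null_aucs)
    idx = min(len(ranked) - 1, max(0, math.ceil((1 - alpha) * len(ranked)) - 1))
    return ranked[idx]


def empirical_p_value(observed_auc, null_aucs):
    """Share of random signatures at least as strong as the observed one,
    with the usual +1 correction so the estimate is never exactly zero."""
    obs = strength(observed_auc)
    at_least = sum(1 for a in null_aucs if strength(a) >= obs)
    return (at_least + 1) / (len(null_aucs) + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    # Placeholder null distribution; in practice these would be the AUCs of
    # thousands of random signatures evaluated in the dataset at hand.
    null = [min(max(rng.gauss(0.5, 0.08), 0.0), 1.0) for _ in range(5000)]
    observed = 0.71  # hypothetical AUC of a derived signature
    print("cut-off:", round(dataset_cutoff(null), 3))
    print("empirical p:", round(empirical_p_value(observed, null), 4))
```

Because the cut-off and p-value are computed from random signatures in the same dataset, a dataset in which random signatures often look prognostic automatically demands a stronger AUC from a derived signature.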