BioSpace.com

PLoS By Category | Recent PLoS Articles
Biochemistry - Biophysics

SECOM: A Novel Hash Seed and Community Detection Based-Approach for Genome-Scale Protein Domain Identification
Published: Thursday, June 28, 2012
Author: Ming Fan et al.

by Ming Fan, Ka-Chun Wong, Taewoo Ryu, Timothy Ravasi, Xin Gao

With rapid advances in the development of DNA sequencing technologies, a plethora of high-throughput genome and proteome data from a diverse spectrum of organisms have been generated. The functional annotation and evolutionary history of proteins are usually inferred from domains predicted from the genome sequences. Traditional database-based domain prediction methods cannot identify novel domains, however, and alignment-based methods, which look for recurring segments in the proteome, are computationally demanding. Here, we propose a novel genome-wide domain prediction method, SECOM. Instead of conducting all-against-all sequence alignment, SECOM first indexes all the proteins in the genome by using a hash seed function. Local similarity can thus be detected and encoded into a graph structure, in which each node represents a protein sequence and each edge weight represents the shared hash seeds between the two nodes. SECOM then formulates the domain prediction problem as an overlapping community-finding problem in this graph. A backward graph percolation algorithm that efficiently identifies the domains is proposed. We tested SECOM on five recently sequenced genomes of aquatic animals. Our tests demonstrated that SECOM was able to identify most of the known domains identified by InterProScan. When compared with the alignment-based method, SECOM showed higher sensitivity in detecting putative novel domains and was also three orders of magnitude faster. For example, SECOM was able to predict a novel sponge-specific domain in nucleoside-triphosphatases (NTPases). Furthermore, SECOM discovered two novel domains, likely of bacterial origin, that are taxonomically restricted to sea anemone and hydra. SECOM is an open-source program and is available at http://sfb.kaust.edu.sa/Pages/Software.aspx.
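
The indexing and graph-building steps described in the abstract can be illustrated with a minimal sketch. This is not SECOM's implementation: the abstract does not specify the hash seed function, so contiguous length-k substrings are assumed here as a stand-in, and the function names (`index_seeds`, `build_graph`), the value of `k`, and the toy sequences are all hypothetical. The sketch shows only how a seed index avoids all-against-all alignment and how shared seeds become edge weights; the community-finding and backward percolation steps are not covered.

```python
from collections import defaultdict
from itertools import combinations


def index_seeds(proteins, k=3):
    """Map each length-k substring (an assumed stand-in for a hash seed)
    to the set of proteins that contain it."""
    index = defaultdict(set)
    for name, seq in proteins.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def build_graph(index):
    """Build a weighted graph: nodes are proteins, and each edge weight is
    the number of distinct seeds shared by the two proteins."""
    weights = defaultdict(int)
    for names in index.values():
        # Only proteins that share a seed are ever paired, so no
        # all-against-all comparison of sequences is needed.
        for a, b in combinations(sorted(names), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Toy sequences (made up for illustration): P1 and P2 share the segment TAYIAK.
proteins = {
    "P1": "MKTAYIAKQR",
    "P2": "GGTAYIAKWW",
    "P3": "PLLNDVHSCE",
}
graph = build_graph(index_seeds(proteins, k=3))
print(graph)  # {('P1', 'P2'): 4}
```

In this toy input, P1 and P2 share four 3-mers (TAY, AYI, YIA, IAK), so they are joined by an edge of weight 4, while P3 shares no seed with either and stays isolated. A graph of this kind is what an overlapping community-finding step would then operate on.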