Breakthrough AI Model Maps Hidden Genetic Connections Among Global Populations Through Self-Discovery

Ecotone’s Seq2KING Uses Pioneering “Neural Excavation” to Map Human Heritage Patterns, Advancing Precision Medicine Development

San Francisco, CA – Ecotone today announced the publication of groundbreaking research introducing Seq2KING, the first artificial intelligence model to use transformer attention mechanisms to discover global human genetic relationships without requiring pre-labeled population data. This research demonstrates how the emerging "LifeLanguage Companies" framework—treating biological molecules as encoded information languages—enables breakthrough innovations in genomic AI.

The Challenge: Reading 3 Billion Genetic "Words"

The human genome contains approximately 3 billion base pairs—essentially a 3-billion-word language that scientists are still learning to read. Understanding the genetic relationships between any individual and the world's 8 billion people is crucial for developing targeted therapies for the estimated 10,000 genetic diseases, but current methods rely on predetermined population categories that can introduce bias and miss subtle genetic connections.

The LifeLanguage Framework: A New Way of Thinking

Ecotone exemplifies an emerging class of "LifeLanguage Companies" that treat DNA, RNA, and proteins as encoded information languages rather than physical substances to be manipulated in laboratories. This fundamental shift in thinking, from molecules to information, enables breakthrough innovations like Seq2KING and DeepMind's AlphaFold that would be impossible with traditional biotech approaches.

"The LifeLanguage framework allows us to apply the same computational principles that revolutionized natural language processing to biological sequences," explained Dr. eMalick Njie, CEO and founder of Ecotone. "When you treat DNA as a language, you can use language AI techniques like transformer attention to discover patterns that were previously invisible."

The Innovation: AI That Discovers Rather Than Assumes

Seq2KING employs a revolutionary approach called "neural excavation"—diving deep into the internal workings of transformer AI models to uncover hidden patterns of genetic relatedness. Unlike traditional methods that require researchers to pre-define population groups, Seq2KING discovers these relationships purely from genetic data by treating the genome as an information language.
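
For readers unfamiliar with transformer attention, the quantity being "excavated" can be illustrated with a minimal sketch of scaled dot-product attention. This is not Ecotone's code: the four toy embeddings are hypothetical, and using the same vectors as both queries and keys (with no learned projections) is a simplifying assumption made only for illustration.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(queries, keys):
    """Scaled dot-product attention weights: softmax(Q K^T / sqrt(d))."""
    scale = math.sqrt(len(keys[0]))
    weights = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights.append(softmax(scores))
    return weights

# Hypothetical 2-dimensional embeddings for four individuals (assumption:
# in a real model these would come from learned query/key projections).
embeddings = [
    [1.0, 0.1],
    [0.9, 0.2],
    [0.1, 1.0],
    [0.2, 0.9],
]

# Row i holds how strongly individual i "attends" to every individual.
weights = attention_weights(embeddings, embeddings)
```

In this toy example, individual 0 places more weight on individual 1, whose embedding is similar, than on individual 2. Reading patterns of this kind out of a trained model's attention layers is the general idea behind inspecting attention for relatedness.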

"Just as words in a sentence gain meaning through their relationships to other words, we found that individual genomes can be understood through their genetic connections to others," explained Dr. Njie. "Seq2KING discovered that individuals from the same continent had 85.34% stronger genetic attention connections compared to those from different continents—all without ever being told which continent anyone was from."

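A figure such as "85.34% stronger" can be read as a relative difference between average within-group and between-group attention. The sketch below shows that arithmetic on a hypothetical 4x4 attention matrix; the exact computation used in the paper is not described here, so the function name, the toy values, and the choice of a simple mean are all assumptions. Group labels are used only to evaluate the result afterward, never as model input.

```python
def compare_attention(attn, labels):
    """Return (within mean, between mean, percent by which within exceeds between).

    Self-attention entries (i == j) are skipped so they do not inflate
    the within-group average.
    """
    within, between = [], []
    n = len(labels)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            bucket = within if labels[i] == labels[j] else between
            bucket.append(attn[i][j])
    within_mean = sum(within) / len(within)
    between_mean = sum(between) / len(between)
    pct = 100.0 * (within_mean - between_mean) / between_mean
    return within_mean, between_mean, pct

# Hypothetical attention weights for four individuals in two groups.
attn = [
    [0.375, 0.375, 0.125, 0.125],
    [0.375, 0.375, 0.125, 0.125],
    [0.125, 0.125, 0.375, 0.375],
    [0.125, 0.125, 0.375, 0.375],
]
labels = ["A", "A", "B", "B"]  # used for evaluation only

within_mean, between_mean, pct_stronger = compare_attention(attn, labels)
print(f"within-group attention is {pct_stronger:.2f}% stronger")
```

With these toy values the within-group mean is 0.375 and the between-group mean is 0.125, so within-group attention comes out 200% stronger.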

Key Research Findings

The study, conducted using data from 2,503 individuals across five continents from the 1000 Genomes Project, revealed several breakthrough discoveries:

- Unsupervised Heritage Discovery: The AI model successfully identified familiar population groupings (European, African, Asian, American, and South Asian) entirely through its own analysis, without any human guidance about geographic origins.

- Continuous Relationship Mapping: Unlike existing methods that assign people to discrete categories, Seq2KING provides a continuous spectrum of genetic relatedness, offering more nuanced insights into human population structure.

- Computational Efficiency: By using compressed kinship matrices instead of raw genomic data, the approach could theoretically scale to map relationships among all 8 billion humans, a computational feat previously impossible.
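
To make the idea of a pairwise relatedness matrix concrete, the sketch below builds one from toy genotype vectors. It uses a simple identity-by-state (IBS) similarity as a stand-in, which is an assumption for illustration and not the kinship estimator or the compression used in the paper. Genotypes are coded as 0, 1, or 2 copies of the alternate allele, and all values are hypothetical.

```python
def ibs_similarity(g1, g2):
    """Identity-by-state similarity in [0, 1] for genotypes coded 0/1/2.

    Each site contributes 0, 1, or 2 allele differences, so the total
    is divided by the maximum possible difference of 2 per site.
    """
    diffs = sum(abs(a - b) for a, b in zip(g1, g2))
    return 1.0 - diffs / (2.0 * len(g1))

def relatedness_matrix(genotypes):
    """Symmetric matrix of pairwise similarities, one row per individual."""
    n = len(genotypes)
    return [
        [ibs_similarity(genotypes[i], genotypes[j]) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical genotypes for three individuals at four variant sites.
genotypes = [
    [0, 1, 2, 0],
    [0, 1, 2, 1],
    [2, 1, 0, 2],
]

matrix = relatedness_matrix(genotypes)
for row in matrix:
    print([round(value, 3) for value in row])
```

Each individual is summarized by a row of continuous pairwise values rather than by a discrete population label, which is the sense in which relatedness forms a spectrum.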

Revolutionary Visualization Technology

The research team adapted BERTViz, a tool originally designed for language models, to create the first human-readable visualizations of dense genetic relationship networks. This enables scientists to intuitively understand complex genetic connections that would otherwise be buried in incomprehensible numerical data.

Implications for Precision Medicine

The ability to map genetic relationships without bias has profound implications for developing personalized treatments. Different genetic backgrounds can affect how individuals respond to medications, and understanding these relationships helps remove "background noise" that can obscure disease-causing genetic variants.

"This research brings us one step closer to reading the human genome as a first language," said Dr. Njie. "By providing more precise genetic coordinates for CRISPR-based therapies, we're potentially accelerating the development of treatments for genetic diseases that affect 800 million people worldwide."

Technical Achievement: "Neural Excavation"

The research represents the first application of "neural excavation" to genetics—a cutting-edge technique that explores the internal mathematical representations of AI models to extract useful insights. The team discovered that different layers of their transformer model processed genetic relationships in unexpected ways, with some layers requiring mathematical inversion to reveal meaningful patterns.

Future Applications

Beyond precision medicine, Seq2KING could help scientists trace human migration patterns, discover previously unknown population subgroups, and improve genetic studies by accounting for population structure. Most importantly, Seq2KING will serve as a critical component in Ecotone's upcoming Hawaiian Diamond reasoning model—the first true DNA reasoning model that integrates capabilities to comprehend the human genome as a first language.

Hawaiian Diamond will incorporate dnaSORA (the company’s previously released diffusion model) and Seq2KING as key components for obtaining the genetic coordinates that guide CRISPR molecules as they crawl along DNA strands to locate and replace disease-causing elements. This reasoning model will establish the first true semantic understanding of genomic data, enabling treatment development for thousands of currently untreatable genetic diseases that collectively impact hundreds of millions of people.

About Ecotone

Ecotone is a pioneering LifeLanguage company building foundational AI platforms to read the human genome as a first language. This unique framework—treating biological molecules as encoded information languages—enables breakthrough innovations that would be impossible with traditional biotech approaches. The company's dnaSORA model represents the world's first diffusion transformer for DNA, and Seq2KING is the second component of its comprehensive genomic AI platform. These models will integrate into Hawaiian Diamond, Ecotone's upcoming DNA reasoning model that will establish the first true semantic understanding of genomic data. The company is based in New York City and San Francisco.

Availability and Access

The complete research paper is available on bioRxiv at https://www.biorxiv.org/content/10.1101/2025.06.17.660172v1

The complete code and data are publicly available on GitHub at https://github.com/EcotoneAI/Seq2KING

---

Contact:

info@ecotone.ai

---