How Pangaea Data is Using Artificial Intelligence to Drive Innovation in Cancer Care

Leveraging novel unsupervised AI in a federated, privacy-preserving manner, Pangaea extracts new insights from clinical text such as doctors' notes and discharge summaries. These insights are validated by clinicians and have shown the potential to save $1 billion in annual healthcare costs related to cancer cachexia.

SAN FRANCISCO - April 11, 2022 – At two upcoming conferences, oncologists and surgeons will present details from two studies that used Pangaea Data’s novel artificial intelligence-based software product.

Pangaea has developed a software product called Pangaea's Intelligence Extraction and Summarization (PIES). It uses unsupervised natural language processing (NLP) and natural language generation (NLG) methods to extract and summarize actionable insights from unstructured text, which makes up 80% of the data in electronic health records. This data is largely underutilized in healthcare and the biopharmaceutical industry. Pangaea enables clinicians to combine these new insights with discrete structured values and use them to characterize difficult-to-diagnose conditions and to find undiagnosed or miscoded patients.

The first study, which UK-based oncologists will present at the Bio-IT World Conference & Expo in Boston in May, focuses on how PIES was used to identify 6 times more undiagnosed and miscoded cases of cachexia in cancer patients. Also at Bio-IT, US-based oncologists will present results from a second study designed to assess whether PIES could extract tumor genetic testing (TGT) results from electronic health records (EHRs).

The oncologists will also present the results from the second study at the American Association for Cancer Research Annual Meeting (AACR) in New Orleans this week through a poster titled “Creating research quality cancer genomic data from electronic health records.”

"Both studies demonstrate the value of our unsupervised natural language processing and natural language generation methods to identify miscoded or misdiagnosed patients," Vibhor Gupta, Pangaea's Founder and CEO, said in a statement. "Furthermore, Pangaea's technology makes it possible to extract and use this information without compromising the privacy and security of patients' data."

Cachexia is a muscle-wasting condition that is frequently underdiagnosed in cancer patients and may account for as much as 20% of cancer-related deaths. Patients with cachexia respond less well to treatment and have an overall poor quality of life. In the UK, early diagnosis may cut treatment costs by as much as 50%, potentially saving as much as $1 billion per year.

Manually reviewing individual medical records or searching for relevant ICD-9 codes to identify cases of cachexia is time-consuming and error-prone. Pangaea collaborated with oncologists from NHS Lothian and the University of Edinburgh to analyze data from nearly 60,000 ICU patients, including discharge summaries and family histories. PIES identified 6x more patients with cachexia than ICD-9 codes alone, with 90% accuracy. PIES also identified patients at high risk of cachexia using clinical features such as neoplasm, weight loss, malnutrition, and poor appetite.

"Early identification of cancer cachexia remains a challenge, but the earlier it is identified, the sooner we can start treatment to prevent or treat cachexia," Barry Laird and Richard Skipworth of the Caledonian Cachexia Collaborative said in a joint statement. "Pangaea is helping to fill this gap by providing software that successfully extracts this information from unstructured text in the electronic health record."

In the second study, which focused on extracting TGT results from EHRs, researchers and oncologists assessed whether natural language processing algorithms could extract information from unstructured clinical text and PDF reports for two breast cancer gene expression tests: the 21-gene Recurrence Score (Oncotype DX) and the 70-gene signature (MammaPrint). They applied PIES to 800 medical records and gene expression test results from 21 breast cancer patients and extracted 26 variables, including age at diagnosis and cancer histology. PIES achieved an average accuracy of over 97% across all 26 variables and 100% accuracy for 14 of them.

Both studies were run on Microsoft Azure virtual machines, which allowed the work to scale across multiple hospitals and complemented Pangaea's privacy-preserving approach.

About Pangaea

Pangaea Data is headquartered in London, with offices in San Francisco and Hong Kong. Pangaea provides an AI-driven software product that has been shown to find 6x more suitable patients, including undiagnosed patients, and to automatically generate clinical narratives for regulatory reports, saving 90% of the time required. This is achieved through Pangaea's novel unsupervised natural language processing and first-of-its-kind natural language generation methods, which respectively extract and summarize intelligence from textual and multi-modal health data at scale and in a federated, privacy-preserving manner. Pangaea's product has demonstrated 85-90% accuracy in high-impact peer-reviewed publications, markedly higher than generic language models such as GPT-3, supervised NLP, and relation extraction approaches. The product has been successfully scaled in the pharmaceutical and healthcare industries and has been presented by leading clinicians and scientists at global conferences.

Pangaea's founders have secured more than $200 million in research funding through their work. Pangaea has received investment from several life science VC funds and from individuals including the former CIO of Novartis, a GSK board member, and a Managing Partner of Novo Holdings. Pangaea was recently named a leading digital health innovator by the UK government.

Media Contact: Dr. Carlos Pittol

Email: pr@pangaeadata.ai

Phone number: +44-7887-504-975
