Brain Implants Capable of Turning Thought into Speech
Researchers at the University of California, San Francisco (UCSF) developed a “neural decoder” that translates brain activity into synthesized, audible speech. They published their research in the journal Nature.
Current technology, such as the system used by the late cosmologist Stephen Hawking, relies on assistive devices that translate eye or facial muscle movements into letter-by-letter spelling. These devices are slow, usually producing about 10 words per minute, and have a high error rate. Natural speech averages about 150 words per minute, although almost everyone knows people who talk much faster or much slower.
The UCSF team began by collecting high-resolution brain activity data from five volunteers over several years. All of these volunteers had normal speech function, but had electrodes implanted directly in their brains because they were being monitored for epilepsy treatment. The electrodes were used to track activity in speech-related areas of the brain while the patients read out loud.
The researchers then developed a two-stage process to turn those recordings into speech. The first step was to build a decoder that interprets the recorded brain activity patterns as a set of instructions for speech-related movements of the lips, tongue, jaw and larynx. The second was to engineer a synthesizer that translates those virtual movements into audible speech.
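The two-stage idea can be illustrated with a toy sketch. This is not the researchers' model: the weights, the three-electrode input, and the function names (decode_movements, synthesize, brain_to_speech) are all hypothetical, and simple linear maps stand in for whatever models the study actually used. It only shows the shape of the pipeline, in which brain activity is first turned into virtual vocal tract movements and those movements are then turned into sound features.

```python
from typing import Dict, List

# The four articulators named in the article.
ARTICULATORS = ["lips", "tongue", "jaw", "larynx"]

# Hypothetical toy weights: one row per articulator, one column per electrode.
STAGE1_WEIGHTS = [
    [0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5],
    [0.25, 0.25, 0.0],
]

# Hypothetical toy weights: one row per acoustic feature, one column per articulator.
STAGE2_WEIGHTS = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 2.0, 0.0, 1.0],
]


def matvec(weights: List[List[float]], vector: List[float]) -> List[float]:
    """Multiply a small matrix by a vector using plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def decode_movements(neural_frame: List[float]) -> Dict[str, float]:
    """Stage 1: map one frame of brain activity to virtual articulator movements."""
    return dict(zip(ARTICULATORS, matvec(STAGE1_WEIGHTS, neural_frame)))


def synthesize(movements: Dict[str, float]) -> List[float]:
    """Stage 2: map virtual articulator movements to acoustic features."""
    ordered = [movements[name] for name in ARTICULATORS]
    return matvec(STAGE2_WEIGHTS, ordered)


def brain_to_speech(neural_frames: List[List[float]]) -> List[List[float]]:
    """Run both stages, frame by frame."""
    return [synthesize(decode_movements(frame)) for frame in neural_frames]


if __name__ == "__main__":
    # One made-up frame of activity from three made-up electrodes.
    print(brain_to_speech([[2.0, 4.0, 6.0]]))
```

Keeping the intermediate step explicit, rather than mapping brain activity straight to sound, mirrors the design described above: the decoder's output is a set of movements that a separate synthesizer can turn into speech.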
Gopala Anumanchipalli, co-lead author of the study, told Smithsonian Magazine that in patients who would benefit from the technology, such as ALS patients, “The brain is intact in these patients, but the neurons—the pathways that lead to your arms, or your mouth, or your legs—are broken down. These people have high cognitive functioning and abilities, but they cannot accomplish daily tasks like moving about or saying anything. We are essentially bypassing the pathway that’s broken down.”
The system worked reasonably well, though listeners had problems with about 30% of the synthetic speech. One example was hearing “rabbit” when the computer said “rodent.” There were also some misunderstandings with uncommon words. The test sentences, which included “At twilight on the twelfth day we’ll have Chablis” and “Is this seesaw safe?”, were chosen because they include all the phonetic sounds in English.
Listeners transcribed 43% of sentences perfectly when choosing from a 25-word vocabulary, and 21% of sentences perfectly with a 50-word vocabulary. Overall, about 70% of words were correctly transcribed.
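To make the two kinds of figures concrete, here is a minimal sketch of how listener transcripts could be scored. It is not the study's scoring procedure: the function names are hypothetical, the sentence "the rodent ran away" is invented around the rodent/rabbit mix-up reported above, and word-level edit distance is simply one common way to count transcription errors.

```python
from typing import List, Tuple


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum number of word substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from transcript
                            curr[j - 1] + 1,      # extra word in transcript
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, transcript: str) -> float:
    """Word errors divided by the number of words in the reference sentence."""
    ref = reference.lower().split()
    hyp = transcript.lower().split()
    return word_edit_distance(ref, hyp) / len(ref)


def perfect_sentence_rate(pairs: List[Tuple[str, str]]) -> float:
    """Share of sentences transcribed with no word errors at all."""
    perfect = sum(word_error_rate(ref, hyp) == 0 for ref, hyp in pairs)
    return perfect / len(pairs)


if __name__ == "__main__":
    # Invented listener transcripts, for illustration only.
    pairs = [
        ("is this seesaw safe", "is this seesaw safe"),
        ("the rodent ran away", "the rabbit ran away"),
    ]
    print(word_error_rate(*pairs[1]))    # one wrong word out of four
    print(perfect_sentence_rate(pairs))  # one perfect sentence out of two
```

The sketch shows why the two numbers can differ so much: a single misheard word leaves most of a sentence's words correct, but the sentence no longer counts as perfectly transcribed.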
Edward Chang, a UCSF neurosurgeon and study leader, said the next step will be to improve the quality of the audio so it is more natural and understandable. He also noted that the movements of the vocal tract were similar from person to person, which suggests it may be possible to create a “universal” decoder. “An artificial vocal tract modeled on one person’s voice can be adapted to synthesize speech from another person’s brain activity,” Chang told IEEE Spectrum.
The technology is a long way from being used in clinical practice. One reason is simple—it currently requires inserting electrodes directly into the brain. “That’s a heck of a constraint,” neuroscientist Marcel Just of Carnegie Mellon University told STAT. He is currently working on noninvasive methods to detect thoughts.
The researchers also had a participant silently mime the sentences instead of reading them out loud. The speech synthesized from the mimed sentences wasn’t as accurate, but the fact that it could be decoded at all suggests the approach may not depend on a person actually producing sound.
Josh Chartier, a co-lead author on the study and a UCSF bioengineering graduate student, stated, “It was really remarkable to find that we could still generate an audio signal from an act that did not generate audio at all.”