Technology uses patient’s voice recordings to create synthetic speech
PRAGUE: Vlastimil Gular’s life took an unwelcome turn a year ago: minor surgery on his vocal cords revealed throat cancer, which led to the loss of his larynx and with it, his voice. But the 51-year-old father of four is still chatting away using his own voice rather than the tinny timbre of a robot, thanks to an innovative app developed by two Czech universities. “I find this very useful,” Gular said, using the app to type in what he wanted to say, in his own voice, via a mobile phone.
“I’m not very good at using the voice prosthesis,” he added, pointing at the hole the size of a large coin in his throat. This small silicone device implanted in the throat allows people to speak by pressing the hole with their fingers to regulate airflow through the prosthesis and so create sound. But Gular prefers the new hi-tech voice app. It was developed for patients set to lose their voice due to a laryngectomy, or removal of the larynx, a typical procedure for advanced stages of throat cancer.
The joint project of the University of West Bohemia in Pilsen, Prague’s Charles University and two private companies, CertiCon and SpeechTech, kicked off nearly two years ago. The technology uses recordings of a patient’s voice to create synthetic speech that can be played on their mobile phones, tablets or laptops via the app. Ideally, patients need to record more than 10,000 sentences to provide scientists with enough material to produce their synthetic voice.
“We edit together individual sounds of speech so we need a lot of sentences,” said Jindrich Matousek, an expert on text-to-speech synthesis, speech modeling and acoustics who heads the project at the Pilsen university.
A matter of weeks
But there are drawbacks: patients facing laryngectomies usually have little time or energy to do the recordings in the wake of a diagnosis that requires swift treatment. “It’s usually a matter of weeks,” said Barbora Repova, a doctor at the Motol University Hospital, working on the project for Charles University. “The patients also have to tackle issues like their economic situation, their lives are turned upside down, and the last thing they want to do is to make the recording,” she told AFP.
To address these difficulties, scientists came up with a more streamlined method for the app, which is supported by the Technology Agency of the Czech Republic. Working with fewer sentences (ideally 3,500 but as few as 300), this method uses advanced statistical models such as artificial neural networks. “You use speech models with certain parameters to generate synthesized speech,” said Matousek.
“Having more data is still better, but you can achieve decent quality with less data of a given voice.” The sentences are carefully selected and individual sounds have to be recorded several times as they are pronounced differently next to different sounds or at the beginning and end of a word or sentence, he added. So far, the Pilsen university has recorded 10 to 15 patients, according to Matousek. Besides Czech, the Pilsen scientists have also created synthesized speech samples in English, Russian and Slovak.
Gular, an upholsterer who lost his job due to his handicap, managed to record 477 sentences over the three weeks between his diagnosis and the operation. But he was stressed and less than satisfied with the quality of his voice. “Throat cancer patients often suffer from some form of dysphonia (hoarseness) before the surgery, so in combination with a limited speech sample it makes the voice sound unnatural,” said Repova.
Meanwhile, in a studio at the Pilsen university, entrepreneur Jana Huttova is recording outlandish phrases. The 34-year-old mother of three faces the risk of losing her voice to minor throat surgery, an operation on her parathyroid gland. “The Chechens have always preferred a dagger-like Kalashnikov,” she says, reading from the text before her. “I have small kids and I want them to hear my own voice, not a robot,” Huttova said. Then she moved on to her next sentence: “We were attacked by a tyrannosaur’s baby dinosaurs.”
Connected to the brain
Matousek believes that in the future, patients will be able to use the app to record their voice at home using a specialized website to guide them through the process. And he hopes that one day it will go even further. “The ultimate vision is a miniature device connected to the brain, to the nerves linked to speech; then patients could control the device with their thoughts,” he said. This kind of advanced solution is a very long way off, said Repova.
“But look at cochlear implants — 40 years ago when they started, we had no idea how it would develop, how widely they would end up being used,” she said, referring to the inner-ear implants used to tackle severe deafness. “A happy end would be a device implanted in the throat that could talk with the patient’s own voice,” she told AFP. “It’s realistic: it may not come in a year or even in 10 years, but it’s realistic and we’re on the way.”–AFP