A newly developed artificial intelligence can mimic any human voice by analyzing a person’s speech patterns.
Scientific American reports that Montreal-based startup Lyrebird has developed a new type of A.I. that analyzes recordings of a person’s speech and the corresponding text transcripts in order to mimic their voice almost perfectly. According to the company, Lyrebird’s speech synthesis runs faster than existing speech synthesis methods, generating thousands of sentences per second, but ethical questions are already arising about the A.I.’s ability to copy any person’s voice.
Personal assistants such as Siri and Microsoft’s Cortana can generate neutral-sounding speech using pre-recorded audio files containing all of the spoken words that the device may need to communicate with the user. Lyrebird’s system, however, can learn the pronunciations of characters, phonemes, and words in a voice by analyzing hours of spoken audio. It then uses this data to create entirely new sentences, even adding new intonations and emotions to them.
Lyrebird’s A.I. relies on artificial neural networks, which use deep learning techniques to transform bits of sound into speech. A neural network takes in data and learns the patterns in that data by strengthening the connections between its layered artificial neuron units. Once the system has learned how to generate speech, it needs only a one-minute sample of a new person’s speech to mimic them.
Lyrebird co-founder Alexandre de Brébisson, a Ph.D. student at the Montreal Institute for Learning Algorithms Laboratory at the University of Montreal, said, “Different voices share a lot of information. After having learned several speakers’ voices, learning a whole new speaker’s voice is much faster. That’s why we don’t need so much data to learn a completely new voice. More data will still definitely help, yet one minute is enough to capture a lot of the voice ‘DNA.’”
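De Brébisson’s point about shared information can be illustrated with a deliberately tiny sketch. It is not Lyrebird’s method, which has not been published in this article; it assumes a toy world where every “voice” is a straight line with a slope common to all speakers and an offset unique to each one. All function names below are hypothetical. The shared slope is learned from several speakers with plenty of data, after which a new speaker can be captured from just two samples.

```python
# Toy illustration only: real voice models are deep neural networks, not line fits.
# All names here (fit_shared_slope, adapt_new_speaker, ...) are hypothetical.

def fit_shared_slope(speakers):
    """Estimate one slope shared by all speakers from their (x, y) samples.

    Each speaker's data is centered first, so per-speaker offsets cancel out.
    """
    num = den = 0.0
    for samples in speakers:
        mean_x = sum(x for x, _ in samples) / len(samples)
        mean_y = sum(y for _, y in samples) / len(samples)
        for x, y in samples:
            num += (x - mean_x) * (y - mean_y)
            den += (x - mean_x) ** 2
    return num / den


def adapt_new_speaker(shared_slope, few_samples):
    """Learn only the new speaker's offset, keeping the shared slope fixed."""
    residuals = [y - shared_slope * x for x, y in few_samples]
    return sum(residuals) / len(residuals)


def make_samples(slope, offset, xs):
    """Generate noise-free (x, y) pairs for one synthetic speaker."""
    return [(x, slope * x + offset) for x in xs]


TRUE_SLOPE = 2.0  # assumed "shared" structure common to every voice

# "Hours of audio": three speakers with 100 samples each.
training_speakers = [
    make_samples(TRUE_SLOPE, offset, range(100)) for offset in (-3.0, 0.5, 4.0)
]
shared_slope = fit_shared_slope(training_speakers)

# "One minute of audio": a new speaker with only two samples.
new_speaker_samples = make_samples(TRUE_SLOPE, 7.0, [1, 2])
new_offset = adapt_new_speaker(shared_slope, new_speaker_samples)

print(shared_slope, new_offset)
```

Because the expensive part (the shared slope) is already known, the two samples only have to pin down a single number for the new speaker. A real system would have far more speaker-specific parameters, which is consistent with de Brébisson’s remark that more data still helps.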
Lyrebird plans to sell the A.I. to companies wishing to use the technology for a number of applications, such as personal A.I. assistants, audiobook narration and speech synthesis for people with disabilities. Google-owned DeepMind released a similar product called WaveNet last year, but de Brébisson believes that Lyrebird’s A.I. is much more advanced.
“Lyrebird is significantly faster than WaveNet at generation time,” he says. “We can generate thousands of sentences in one second, which is crucial for real-time applications. Lyrebird also adds the possibility of copying a voice very fast and is language-agnostic.”