Google is developing a new deep learning system that can pick out a single voice from a crowd of people — here is how the technology works.
TechTimes reports that Google is working on a new deep learning system capable of singling out one person’s voice from a crowd. The system does this by analyzing people’s faces while they are talking. Researchers first trained the system to recognize the voice of a single individual speaking alone, which gave it a baseline signal to focus on. They then added virtual noise mimicking a crowd, all playing at the same time, to teach the system to separate a mixed audio track into its component parts so it could learn to differentiate between each sound.
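The training setup described above, clean speech overlaid with synthetic crowd noise, can be illustrated with a toy sketch. This is not Google’s code, and the article does not describe the model’s internals. The example assumes a common approach from audio source separation in general: overlay two known clean tracks, then compute the “ideal” per-frequency mask that a model would be trained to predict. The pure tones standing in for voices, the sample rate, and every function name here are hypothetical choices made for illustration only.

```python
import numpy as np

SAMPLE_RATE = 8000  # samples per second (assumed for this toy example)

def make_tone(freq_hz, seconds=1.0):
    """Stand-in for a clean recording of one speaker: a pure sine tone."""
    t = np.arange(int(SAMPLE_RATE * seconds)) / SAMPLE_RATE
    return np.sin(2 * np.pi * freq_hz * t)

def mix(*tracks):
    """Overlay clean tracks into a single-channel 'crowd' recording."""
    return np.sum(tracks, axis=0)

def ideal_ratio_mask(target, interference, eps=1e-8):
    """Per-frequency share of the magnitude that belongs to the target.

    This can only be computed when the clean sources are known, which is
    why synthetic mixtures are useful as training data.
    """
    target_mag = np.abs(np.fft.rfft(target))
    interference_mag = np.abs(np.fft.rfft(interference))
    return target_mag / (target_mag + interference_mag + eps)

def apply_mask(mixture, mask):
    """Scale the mixture's spectrum by the mask and return to the time domain."""
    spectrum = np.fft.rfft(mixture)
    return np.fft.irfft(spectrum * mask, n=len(mixture))

# Two hypothetical "speakers" and the mixed track a listener would hear.
speaker_a = make_tone(220)
speaker_b = make_tone(550)
mixture = mix(speaker_a, speaker_b)

# With both clean sources known, the mask isolates speaker A from the mix.
mask = ideal_ratio_mask(speaker_a, speaker_b)
recovered = apply_mask(mixture, mask)
print("max reconstruction error:", np.max(np.abs(recovered - speaker_a)))
```

In a real system the mask would not be computed from the answers. A trained network would have to estimate it from the mixed audio alone or, as in the research described here, from the audio together with the speaker’s face.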
In a video posted to YouTube, the deep learning system can be seen analyzing the speech of two comedians and differentiating between the two, even when their voices overlap.
A research paper titled “Looking to Listen at the Cocktail Party” details Google’s research. It is named after the “cocktail party effect,” in which people are able to focus on and isolate one person’s voice at a crowded party despite the noisy distractions around them. The researchers behind the project wrote in a blog post: “Our method works on ordinary videos with a single audio track, and all that is required from the user is to select the face of the person in the video they want to hear, or to have such a person be selected algorithmically based on context.”
While this technology may have a multitude of uses in Google’s products, such as improved voice detection in its smart speakers or better recognition of individual users during Google Hangouts chats, its security implications are also quite worrying. If this A.I. could be used to pick out a single voice speaking in a crowded street or at a loud meeting, many would be worried about the invasion of privacy. For the moment, though, the project is still in development, and how it will be used remains to be seen.