Voice Imitation: A new kind of identity theft

Password security failures and cyber identity theft have plagued financial and retail institutions over the last decade. Companies such as Target and JPMorgan Chase were hacked as a result of weak identity protection. As more people share content over the Internet and pay less attention to its security, people with malicious intent have more opportunities to steal identities and invade privacy.

Information security researchers at the University of Alabama at Birmingham found that both speaker-verification software and humans can easily be duped by a “morphed” voice, in which a hacker steals a victim’s voice and manipulates it through basic audio editing software. It is the latest form of content abuse, and it targets one of the most valuable pieces of a person’s identity. The research was presented Sept. 25 at the 20th European Symposium on Research in Computer Security in Vienna, Austria, and will also be published in the symposium’s proceedings.

“People should be aware of the vulnerability of systems that rely on automated speaker recognition,” said Maliheh Shirvanian, co-author of the report and a postdoctoral fellow of computer and information science at UAB. 

Shirvanian said people use their voices for almost everything, and anyone could record a short clip of an individual on the street, grab a snippet from a YouTube video or spam-call the individual to capture a short soundbite. To test how effectively voice morphing can trick people and machines, the research group created one machine-verification experiment and two human-verification experiments.

The machine-verification test ran two verification systems against two datasets. The sample recordings were split into 127 male and 53 female recordings, and each was grouped into three types: the original speaker’s recording, a different speaker’s recording, and a morphed recording of the original speaker. According to the researchers’ findings, on the first dataset both systems accepted about 98 percent of the morphed recordings as the original speaker, for both male and female voices. On the second dataset, the systems on average accepted approximately 80 percent of the morphed male voices and 60 percent of the morphed female voices as the original.
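The study does not publish the systems’ internal scoring, but the accept-or-reject logic of such a trial can be sketched in a few lines. In the toy Python below, an enrolled “voiceprint” is compared against the three trial types by cosine similarity; the embedding vectors, noise scales and threshold are hypothetical stand-ins, not the study’s actual models.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Enrollment: a stand-in voiceprint for the genuine speaker.
enrolled = rng.normal(size=128)

# The three trial types from the experiment: the original speaker again,
# an unrelated speaker, and a morph pulled toward the enrolled voice.
trials = {
    "original":  enrolled + rng.normal(scale=0.2, size=128),   # same voice, new session
    "different": rng.normal(size=128),                         # unrelated speaker
    "morphed":   0.8 * enrolled + 0.2 * rng.normal(size=128),  # attack nudged toward target
}

THRESHOLD = 0.5  # accept/reject cutoff, chosen for illustration only

for name, trial in trials.items():
    score = cosine(enrolled, trial)
    print(f"{name:9s} score={score:+.2f} -> {'ACCEPT' if score >= THRESHOLD else 'REJECT'}")
```

Because the morph is deliberately pulled toward the enrolled voiceprint, it scores almost as high as a genuine repeat visit, which is the failure mode the researchers measured.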

The two human-verification tests were called the “famous speaker study” and the “briefly familiar speaker study.” According to the group’s research, the first study asked 65 online participants to pick out Oprah Winfrey’s and Morgan Freeman’s voices from among the original, different-speaker and morphed recordings. Approximately 70 percent of participants rated the morphed clips “somehow similar” or “very similar” to the originals, and they then tended to say the clips were the original speaker.

In the second study, 32 online users took a similar test with two unfamiliar male and two unfamiliar female speakers. On average, a little over 50 percent of the participants mistook the edited audio for the original, and their answers were more definitive than those given for the celebrities. Shirvanian said the participants in the “briefly familiar speaker study” gave more concrete answers largely because they had little exposure to the voice and did not know the nuances of how the unfamiliar person speaks.

The tool used for the high-quality voice editing was Carnegie Mellon University’s Festvox voice imitation and manipulation software, which Shirvanian said was a crucial component in getting these results.

“I tried many of the audio editing and voice creator tools from many places,” she said. “First, it is not easy to create a particular voice by manually changing the characteristics of the audio. Second, even if the output sounds like the victim to a human user, still the machine might find the characteristic different from the model and reject it. The CMU Festvox was easy to obtain and to use in all these respects.”
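Festvox’s real pipeline is considerably more sophisticated, but the core idea of voice conversion that Shirvanian describes, reshaping an attacker’s speech toward a victim’s voice, can be caricatured as a learned mapping between the two speakers’ acoustic features. The sketch below fits a least-squares frame-to-frame map on synthetic data; actual converters model spectra and pitch with far richer machinery, and nothing here reflects Festvox’s own algorithms.

```python
import numpy as np

rng = np.random.default_rng(1)

# Parallel training data: the same sentences spoken by attacker and victim,
# represented as per-frame spectral feature vectors (synthetic here).
attacker_frames = rng.normal(size=(500, 24))
true_map = rng.normal(size=(24, 24))  # the "victim's voice," for simulation only
victim_frames = attacker_frames @ true_map + rng.normal(scale=0.05, size=(500, 24))

# Fit a linear frame-to-frame conversion by least squares. Real converters
# use GMMs or neural networks and also transform pitch, not just spectra.
W, *_ = np.linalg.lstsq(attacker_frames, victim_frames, rcond=None)

# "Morph" fresh attacker speech toward the victim's feature space.
new_attack = rng.normal(size=(100, 24))
converted = new_attack @ W

err = np.mean((new_attack @ true_map - converted) ** 2)
print(f"mean squared conversion error on held-out frames: {err:.5f}")
```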

Aside from the methods of manipulating a voice to deceive people or machines, the study also considers the social ramifications of losing this kind of identity. “We show that the consequences of imitating one’s voice can be grave,” the researchers reported in the study. “Since voice is regarded as a unique characteristic of a person, it forms the basis of the authentication of the person.”

Though the results of the tests were interesting, they are also preliminary. More research is needed, and Shirvanian said more institutions need to get on board with voice-recognition security.

Navid Shokouhi, a Ph.D. student in electrical engineering at the University of Texas at Dallas Center for Robust Speech Systems (CRSS), is aware of exactly what Shirvanian and her research team are concerned about. 

“Students and staff at CRSS work on a variety of topics related to speech and language technology,” said Shokouhi, who did not contribute to Shirvanian’s work. “Speaker recognition is one of the subjects we’ve worked on, and I personally have been involved in some problems that involve speaker recognition.”

One major problem in speaker recognition is that the enrollment recording, the first sample a person provides so the machine learns his or her voice, may not match how the person sounds when speaking to the machine later. Shokouhi said most of the work conducted at CRSS aims to build systems that better understand all of the components of speech, or “to solve this mismatch problem” between person and machine after that initial voice-recognition setup.
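A toy model makes the mismatch concrete: treat a later session as the enrollment voiceprint plus channel noise, and the genuine speaker’s similarity score sinks toward rejection as conditions diverge. As before, the embeddings, noise scales and threshold below are assumptions for illustration, not CRSS’s systems.

```python
import numpy as np

rng = np.random.default_rng(2)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

enrolled = rng.normal(size=512)  # hypothetical enrollment voiceprint
THRESHOLD = 0.5

# The same genuine speaker, recorded later under increasingly different
# conditions, modeled here as additive noise of growing scale
# (e.g., quiet room -> busy street -> bad phone line).
for noise in (0.2, 1.0, 2.5):
    later = enrolled + rng.normal(scale=noise, size=512)
    score = cosine(enrolled, later)
    verdict = "ACCEPT" if score >= THRESHOLD else "REJECT (a genuine user, locked out)"
    print(f"noise={noise:.1f} score={score:+.2f} -> {verdict}")
```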

Because of the work that CRSS and other institutions conduct, Shokouhi said, the work of Shirvanian’s team becomes that much harder, but all the more necessary.

“Recently a few groups have put together an organized ‘spoofing’ [or voice imitation] challenge that looks at the various ways speaker verification systems can break down,” Shokouhi said. “Their counter approach has been to detect spoofing attempts before feeding the recordings into the speaker verification system.”
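That two-stage arrangement can be sketched as a gate in front of the verifier: a spoofing detector screens each recording, and only audio it clears is scored against the enrolled voice. The scoring callables below are stand-ins for real models, not any actual anti-spoofing system.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class GatedVerifier:
    spoof_score: Callable[[bytes], float]   # higher = more likely synthetic
    verify_score: Callable[[bytes], float]  # higher = more likely the enrolled user
    spoof_cutoff: float = 0.5
    verify_cutoff: float = 0.5

    def decide(self, audio: bytes) -> str:
        # Stage 1: screen out suspected morphs before verification runs at all.
        if self.spoof_score(audio) >= self.spoof_cutoff:
            return "REJECT: suspected spoof"
        # Stage 2: ordinary speaker verification on audio that passed the screen.
        if self.verify_score(audio) >= self.verify_cutoff:
            return "ACCEPT"
        return "REJECT: voice does not match enrollment"

# Toy usage with stand-in scoring functions.
gate = GatedVerifier(spoof_score=lambda a: 0.9 if a.startswith(b"MORPH") else 0.1,
                     verify_score=lambda a: 0.8)
print(gate.decide(b"MORPH ..."))  # -> REJECT: suspected spoof
print(gate.decide(b"genuine"))    # -> ACCEPT
```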

The research under way at UAB, and at the few other places looking into voice security and “anti-spoofing” methods, needs to expand, Shirvanian said. Organizations are already vulnerable to other security breaches, and voice is the next area with that potential, even “though it seems there is no straightforward solution to this problem,” she said.