Voice cloning

Voice cloning is a deepfake deception where a cybercriminal uses AI to replicate, with high accuracy, the voice of someone the victim knows.

What is voice cloning?

Imagine that you answer a call from a familiar colleague or family member urgently asking for personal information or prompting a financial transaction. Everything appears real because the voice is unmistakably theirs. But what if it’s not? What if the person on the other end of the line is a complete stranger, one who is trying to deceive you? This is voice cloning – the sophisticated process of creating a digital replica of someone’s voice through artificial intelligence.

With the latest advancements in AI, voice cloning has become a powerful tool used for both innovative and malicious purposes. Used responsibly, it enables creative and accessibility-focused solutions such as customized digital assistants, voiceovers in different languages, and restored speech for people with medical conditions. When misused, however, it becomes a means for fraud and misinformation. The ability to create realistic voices quickly and effectively is ushering in an era in which hackers leverage social engineering tactics more than ever.

A recent study by McAfee showed that one in four people has experienced a voice cloning attack or knows someone who has, with 77% of victims losing money as a result. But attackers don’t resort to voice cloning only for pocket money. In one case, they used a young girl’s voice in a kidnapping scam, urging her mother to pay a $1 million ransom. Although the mother quickly managed to call her daughter and confirm that the voice on the phone was fake, the horrifying four minutes she experienced left a lasting mark.

How cybercriminals use generative AI for voice cloning scams

Deepfake technology is not new, but its widespread – and effective – application to voice cloning has advanced tremendously in the past few years, adding to the vast arsenal of cybercriminals. AI-driven deep learning models have made voice clones sound more real than ever, and neural network-powered tools like Google’s Tacotron and WaveNet, as well as Lyrebird, allow users to replicate any voice and use it to “read” text input. These models do not merely imitate a voice; they replicate its subtleties, intonations, and distinctive features with astonishing accuracy, often requiring only a brief audio sample – although the longer the sample, the more accurate the voice clone.

This technology has not gone unnoticed by hackers, who see it as a powerful aid in their deception tactics. Voice cloning allows them to craft more convincing threats by combining it with other tactics in the same attack, an approach known as a “multi-channel attack.” For example, they might call a victim to give them a heads-up about an upcoming email, so that the victim is less suspicious when the actual phishing email arrives. This boosts criminals’ success rates by making their victims trust them.

The potential of voice cloning technology for cyberattacks became evident when, in 2023, a journalist successfully accessed her bank account using a recording of her own cloned voice. Although the journalist’s experiment posed no personal risk, it served as an example of the potential consequences of the misuse of this technology.

However, voice cloning is no longer a potential threat but a very real one. Cybercriminals are already leveraging this technology to capitalize on human emotions and trust in cases like the fake kidnapping mentioned above. These attacks are so common that police departments are warning about their rapid increase. They also go beyond the personal sphere and are already affecting businesses worldwide. In a sophisticated scam in Hong Kong, a finance employee at a multinational corporation was deceived into transferring $25 million to hackers. The fraudsters leveraged deepfake technology to impersonate the company’s CFO and other employees during a fake video conference. Although the employee had initially been suspicious of the fraudsters’ email, he put aside his early doubts after the video call because the other attendees looked and sounded just like colleagues he recognised.

Voice cloning is also a powerful tool for social manipulation and disinformation. In January 2024, a voice message allegedly coming from President Joe Biden urged voters not to vote in the New Hampshire Presidential Primary Election. The perpetrators relied on AI to generate this message, and while it was quickly reported, it could have very easily turned into a social manipulation tool with the potential to change the election results. In a year where there will be elections in 77 countries, representing around half the world’s population and almost 60% of the global gross domestic product, it’s crucial to recognise the potential of voice cloning technology as a significant threat to the integrity of global democratic processes and find ways to solve this problem – before it’s too late.

Figure: World map highlighting the regions holding elections in 2024.

Voice Cloning-as-a-Service: A new cybercrime commodity

As we have already seen, the availability of voice cloning technology – including open-source apps – has opened new possibilities for cybercriminals, who are always looking for new methods of financial gain. They are now using platforms like ElevenLabs to enhance their cybercrime tactics, but this is just the beginning.

They have also seen the opportunity to create a very efficient and advanced business model known as voice cloning-as-a-service (VCaaS). In this model, cybercriminals offer subscription-based or fee-based voice cloning services on the dark web to any user who wants to perform an impersonation attack. It enables anyone to carry out such attacks without any technical skills, significantly lowering the entry threshold to become a cybercriminal.

This synergy between AI advancements and the professionalization of cybercrime is forcing security experts to redefine and adapt their security strategies to stay ahead of emerging threats. If you want to know more about how hackers are professionalizing and finding new ways to turn cyber threats into profitable business models, read the latest Cybercrime Trends Report 2024.

Can voice cloning be detected?

With the rise of voice cloning sparking concerns about its use by cybercriminals in attacks, a critical question surfaces: Can voice cloning be detected? While it can be a challenge, several methods and techniques attempt to address the issue:

Spectral analysis: By carefully analyzing the spectral properties of audio, experts can pinpoint irregularities not commonly present in authentic human speech. This analytical approach examines the unique patterns within the frequency spectrum to reveal potential anomalies in the voice file.
Machine learning models: Sophisticated machine learning algorithms are trained to distinguish between genuine human voices and synthetic or cloned voices based on a range of acoustic features.
Temporal feature analysis: An in-depth examination of temporal aspects, such as speech timing, rhythm, and intonation, provides insights into subtle inconsistencies that are challenging for voice cloning algorithms to replicate accurately. This method relies on how speech unfolds over time rather than on its frequency content alone.
Artifact detection: Voice cloning often leaves digital artifacts or imperfections in the audio signal. Detecting these anomalies can help identify instances of artificial voice generation and make authentication more reliable.
Biometric voice recognition: Biometric voice recognition systems identify characteristics unique to an individual’s voice. These systems can help detect alterations or synthetic modifications of voice recordings.
Human auditory perception: Trained listeners have a heightened sensitivity to subtle nuances, allowing them to notice discrepancies between cloned and authentic voices, particularly when familiar with the original voice.
Comparative analysis: By comparing a suspected cloned voice with a known genuine recording of the same individual, analysts can notice discrepancies in voice qualities. This method relies on the examination of distinct features within voice patterns.
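
To make the spectral analysis idea in the list above more concrete, here is a minimal sketch in Python using only the standard library. It computes spectral flatness, one simple property of the frequency spectrum, for two toy signals: a pure tone and random noise. The function names, the toy signals, and the choice of spectral flatness are illustrative assumptions, not the method of any specific detection tool. Real detectors work on recorded speech and combine many such features with trained models.

```python
import cmath
import math
import random


def power_spectrum(samples):
    """Power of DFT bins 1..N/2-1, computed with a naive (slow) DFT."""
    n = len(samples)
    spectrum = []
    for k in range(1, n // 2):
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_flatness(samples, eps=1e-12):
    """Geometric mean divided by arithmetic mean of the power spectrum.

    Values near 0 mean the energy sits in a few frequencies (tone-like);
    values closer to 1 mean it is spread evenly (noise-like).
    """
    power = [p + eps for p in power_spectrum(samples)]  # eps avoids log(0)
    log_mean = sum(math.log(p) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power))


def make_tone(n=256, cycles=8):
    """Toy signal (assumption): a sine wave with a whole number of cycles."""
    return [math.sin(2 * math.pi * cycles * t / n) for t in range(n)]


def make_noise(n=256, seed=0):
    """Toy signal (assumption): uniform random noise with a fixed seed."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    print("tone flatness :", spectral_flatness(make_tone()))
    print("noise flatness:", spectral_flatness(make_noise()))
```

In this toy setup, the tone scores very close to 0 and the noise scores far higher. An analyst would typically compute features like this over short frames of real audio and look for values that differ from those of the speaker’s genuine recordings, which is also the idea behind the comparative analysis described above.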

It’s important to note that however practical these methods may be, not everyone has the time or resources to implement them when the time comes. Imagine a loved one calls, and you’re nearly certain it’s their voice asking you for help. Would you wait for a professional to conduct an in-depth analysis of the voice’s authenticity? Would you wait to confirm it’s actually your loved one? Most people will answer “no,” which is why voice cloning attacks are so successful. They prey on people’s emotions, prompting swift action.

Companies are not immune to the threats of voice cloning either. Impersonation of executives or employees using cloned voices poses serious risks, potentially leading to unauthorized transactions or the disclosure of sensitive information. As these attacks target human emotions and prompt swift action, organisations must proactively adopt adequate countermeasures. Investing in strong authentication methods, employee education, and technological solutions is imperative to mitigate the risks associated with voice cloning in the corporate landscape.

How to protect against voice cloning

As technology’s capabilities advance, staying vigilant and keeping up with the latest threat trends is essential. Individuals and organisations should take a strategic approach to protect themselves from voice cloning, employing the following tactics:

Implement multi-factor authentication (MFA): Prioritize MFA for an extra layer of security. Voice instructions should be corroborated by another form of verification, preferably fingerprints, as they are one of the hardest biometrics to duplicate and are widely used on everyday devices. This approach significantly raises the bar for scammers attempting to access sensitive information.
Awareness and training: Regularly conduct employee training to update yourself and your team on the latest cyber threats. Staying informed is a proactive defence.
Establish protocols: Businesses should define clear protocols for financial transactions and sensitive data sharing. No instruction, even from a familiar voice, should bypass these protocols.
Verify independently: When faced with a suspicious phone request, even from a familiar voice, hang up and directly contact the person who supposedly called you. Some experts suggest creating a codeword and sharing it with your loved ones. This way, if they ever receive a suspicious call allegedly coming from you, they can ask the caller for the codeword and find out whether it is really you or someone trying to impersonate you.
Encrypt data: Data encryption does not directly protect you from voice cloning scams. However, employing strong encryption protocols for data both in transit and at rest adds a layer of protection that can limit the damage if you or your employees fall for a scam like voice cloning.
Limit the information you share: Mitigate risks by limiting public access to information that could facilitate voice cloning attacks. Avoid publishing phone numbers and email addresses online; this makes it harder for scammers to build and deliver an attack based on publicly available data.
Stay up to date: Just as AI clones voices, it can also detect anomalies. Companies are actively developing AI systems to spot cloned voices, so keep an eye on emerging technologies that detect voice cloning.
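
The “Establish protocols” and “Verify independently” recommendations above can be sketched as a simple rule in Python: a high-risk request goes ahead only after a callback to a known number and a correct codeword, no matter how convincing the voice sounded. Everything in this sketch is hypothetical, including the action names, the `Request` fields, and the codeword handling. It illustrates the principle under those assumptions and is not a production-ready control.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Hypothetical list of actions that must never be approved on voice alone.
HIGH_RISK_ACTIONS = {"wire_transfer", "share_credentials", "change_payee"}


@dataclass
class Request:
    action: str
    callback_confirmed: bool   # True once you called back a number you already had
    spoken_codeword: str = ""  # what the caller said when asked for the codeword


def hash_codeword(codeword: str) -> bytes:
    """Normalize and hash the codeword so it is not stored in plain text.

    A plain SHA-256 keeps the sketch short; a real system would use a
    salted, deliberately slow password hash instead.
    """
    return hashlib.sha256(codeword.strip().lower().encode("utf-8")).digest()


def codeword_matches(spoken: str, stored_hash: bytes) -> bool:
    """Compare in constant time to avoid leaking information via timing."""
    return hmac.compare_digest(hash_codeword(spoken), stored_hash)


def may_proceed(req: Request, stored_hash: bytes) -> bool:
    """Apply the protocol: a familiar voice alone is never sufficient."""
    if req.action not in HIGH_RISK_ACTIONS:
        return True
    if not req.callback_confirmed:
        return False
    return codeword_matches(req.spoken_codeword, stored_hash)


if __name__ == "__main__":
    stored = hash_codeword("Blue Heron")  # example codeword, agreed in person
    urgent_call = Request("wire_transfer", callback_confirmed=False,
                          spoken_codeword="blue heron")
    print(may_proceed(urgent_call, stored))  # callback missing, so not approved
```

The design choice here is that the checks are independent of how the request sounded: even a caller who knows the codeword is refused until the callback has happened, which mirrors the rule that no instruction should bypass the established protocol.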

How SoSafe can help you protect against voice cloning

Artificial intelligence is moving forward quickly, and hackers are keeping pace. Voice cloning is a key example of how cybercriminals can misuse the latest technology to create more convincing threats and increase the efficiency of their attacks.

In this context, knowing the strategies used by cybercriminals to manipulate our emotions and training your team to detect these threats is critical. But navigating such a complex threat landscape requires not only training but also a holistic human risk management approach.

SoSafe’s human risk management solution raises awareness beyond mere compliance. It offers up-to-date dynamic training modules with valuable insights and detection tips to avert cyber threats, but it also allows you to quantify and minimize the overall human risk in your organisation. With a holistic approach, we aim to cultivate a secure work environment where you and your employees integrate secure behaviours into your daily routines.