OpenAI’s new voice synthesizer can copy your voice from just 15 seconds of audio, a development that makes the generative AI race even more competitive.
Over the past year, OpenAI has moved steadily and rapidly to build out its ChatGPT chatbot and Sora AI video generator. Voice Engine, which can create synthetic voices from just 15 seconds of audio, now joins those earlier accomplishments, boosting the tech giant’s bank of artificial intelligence tools and helping it stay ahead of the competition.
In a blog post (via The Verge), OpenAI says it has been operating “a small-scale preview” of Voice Engine, which has been under development since late 2022. It’s already in use in the ChatGPT app’s Read Aloud function, which does exactly what its name implies—reads responses aloud to you.
Given a 15-second sample of someone’s voice, the model can read any text you want in an “emotive and realistic” manner. According to OpenAI, it could be used to assist non-verbal individuals, reach remote communities, translate podcasts into new languages, and improve education.
While Voice Engine isn’t currently available to everyone, you can check out the samples OpenAI has released. They sound very impressive, even if the voices retain a certain robotic, stiff edge.
The primary reason Voice Engine remains in limited preview is concern over misuse: OpenAI says more research is needed into how to prevent technologies like this from being used to spread false information or clone voices without authorization.
“We hope to start a dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities,” says OpenAI. “Based on these conversations and the results of these small-scale tests, we will make a more informed decision about whether and how to deploy this technology at scale.”
With major elections taking place in the US and the UK this year, the timing is concerning: as generative AI tools continue to improve, it’s becoming harder to determine which content—text, video, or audio—can be trusted.
Despite the excitement around this news, we can’t ignore the problems it could create for voice authentication systems, or for phone scams where you can’t be sure who’s calling or who’s left you a message—risks OpenAI itself has acknowledged. These are difficult problems to solve, but solutions must be found.