I’d say it’ll take just another 1-2 years for the quality to get high enough to be basically indistinguishable from real humans. GANs were first introduced in 2014, and since then we’ve gone from tiny black-and-white images of handwritten digits to being able to generate HD images of practically anything.
I’m not entirely sure when the research on AI-based TTS started, but I know it’s had a lot less attention and interest than image generation. Still, there have been a lot of improvements, and with the recent AI boom more people are interested in the topic. There’s certainly plenty of money to be made with this technology, as ElevenLabs itself has demonstrated.
While AI TTS is not quite at its peak yet, it’s already good enough to fool some people, as we’ve seen from the fake Mr. Beast and Joe Rogan audio clips. As many people have said, this is the worst that this technology is ever going to be, and it’ll only become more realistic from here.
small44@lemmy.world 1 year ago
Dubs suck and AI won’t make them any better. Both lack emotion in the voice.
max@feddit.nl 1 year ago
I tend to agree with you. Something that even subtitles sometimes miss is the subtle jokes and nuances of the source language. Human dubs often miss those too, and I doubt AI dubs will be any better, at least for the foreseeable future.