[deleted]

21 likes

Submitted 1 year ago by PumpkinDrama@reddthat.com to [deleted]

[deleted]


Comments


small44@lemmy.world 1 year ago
Dubs suck, and AI won’t make them any better. Both lack emotion in the voice.

- max@feddit.nl 1 year ago
  I tend to agree with you. Something that even subtitles sometimes miss is the subtle jokes or nuances in the source language. Human dubs often miss those, and I doubt AI dubs will be any better, at least for the foreseeable future.
  
Ragdoll_X@lemmy.world 1 year ago
I’d say just another 1-2 years is when the quality will be high enough to be basically indistinguishable from real humans. GANs were first introduced in 2014, and since then we’ve gone from tiny black and white images of hand-drawn digits to being able to generate HD images of practically anything.

I’m not entirely sure when the research on AI-based TTS started, but I know it’s had a lot less attention and interest than image generation. Still, there have been a lot of improvements, and with the recent AI boom more people are interested in the topic. There’s certainly plenty of money to be made with this technology, as demonstrated by ElevenLabs itself.

While AI TTS is not quite at its peak yet, it’s already good enough to fool some people as we’ve seen from the fake Mr. Beast and Joe Rogan audios, and as many people have said, this is the worst that this technology is going to be, and it’ll only become more realistic from here.

RightHandOfIkaros@lemmy.world 1 year ago
I think AI voicing could really be useful in dubbing the same character voice in other languages. So the character “sounds the same” no matter what language you view the anime in. This is good for creators that are particular about picking the cast of voices in their works.

Something else I think would be smart is for voice actors to pivot to selling “AI Voice Packs” that are tied directly to a character. Basically, the company can use the AI voice for a specific character, forever. And the actor receives a payment and potentially royalties or something. Each actor can train new data for a new “voice model” which they can market to anime studios. This can be useful for many scenarios, including but not limited to:

1. The VA cannot make it to recording sessions. If for some reason the VA cannot be in the recording studio, an AI voice model would be helpful.

2. The VA dies. This is especially a problem if it happens in the middle of production. An AI voice model never gets sick and never dies. Characters would not need to be retired or recast in the case of tragedy, and the VA’s legacy would continue to live on.

3. The VA gets old or their voice changes. Similar to point 2, as people age, their voice changes. Some older actors cannot portray characters as they used to, so even VAs reprising roles can end up sounding different. An AI voice model would not have this problem.

Perhaps the biggest problem with AI voice models currently is the inability to properly tune exactly how they sound. You can’t really specify the exact enunciation and emotion you want in a sentence, and you can’t really give the AI the timing it needs to say certain phrases while keeping a natural sound. So maybe once these issues are ironed out completely, and artifacting is reduced to an indiscernible level, using AI voices in place of live actors may become viable.

The biggest issue is getting past all the politicians that will try to shut it down. They don’t care about anyone’s job; they only care about their own political careers. What will get the technology banned forever is an AI voice clip of them saying something nefarious that is indiscernible from an actual recording, not 50k people losing their jobs.
- ryathal@sh.itjust.works 1 year ago
  The law is probably the easy part. Being able to add emotion and emphasis correctly is far out of the realm of possibility at the moment. It’s not something you can easily train a model on, because you need additional context beyond the words. “May the Force be with you” is a good example: it has about a dozen different meanings and emotions depending on the specific usage, but current models understand none of that.
  
  - RightHandOfIkaros@lemmy.world 1 year ago
    I think that at the current rate of development, we will probably begin to see tools where a voice recording is given as input, and the voice model is used as a sort of overlay to generate a new clip that follows the same delivery as the input recording. This means that a lead sound designer will be able to say the voice lines exactly as they hear them in their own head, and the output will keep that delivery in the character’s voice. This doesn’t sound like something that is too far away, and there can be text-based tools that follow this kind of development to mimic that. xVAsynth already has a lot of parameters that can be tweaked to tune these types of things, but it is a free tool developed by, I think, one person. It’s not a commercial tool developed by a company out to make money, so I would imagine the quality of commercial voice tools would be much higher.
    
    I think within perhaps the next 10 or 20 years the technology will either far surpass what I expect, or will be shut down by politicians.
    