RightHandOfIkaros@lemmy.world 2 years ago

I think AI voicing could be really useful for dubbing a character with the same voice in other languages, so the character “sounds the same” no matter which language you watch the anime in. This is good for creators who are particular about casting the voices in their works.

Something else I think would be smart is for voice actors to pivot to selling “AI Voice Packs” tied directly to a character. Basically, the company could use the AI voice for that specific character forever, and the actor would receive a payment and potentially royalties or something similar. Each actor could record new training data for a new “voice model” that they could market to anime studios. This could be useful in many scenarios, including but not limited to:

1. The VA cannot make it to recording sessions. If for some reason the VA cannot be in the recording studio, an AI voice model would be helpful.
2. The VA dies. This is especially a problem if it happens in the middle of production. An AI voice model never gets sick and never dies. Characters would not need to be retired or recast in the case of tragedy, and the VA’s legacy would live on.
3. The VA gets old or their voice changes. Similar to point 2, people’s voices change as they age. Some older actors can no longer portray characters the way they used to, so even VAs reprising their roles can end up sounding different. An AI voice model would not have this problem.

Perhaps the biggest problem with AI voice models currently is the inability to tune exactly how they sound. You can’t really specify the exact enunciation and emotion you want in a sentence, and you can’t really give the AI the timing it needs to say certain phrases while keeping a natural sound. So maybe once these issues are completely ironed out, and artifacting is reduced to an indiscernible level, using AI voices in place of live actors may become viable.

The biggest issue, though, is getting past all the politicians who will try to shut it down. They don’t care about anyone’s job; they only care about their own political careers. An AI voice clip of one of them saying something nefarious, indistinguishable from an actual recording, is what will get the technology banned forever, not the fact that 50k people lost their jobs.

ryathal@sh.itjust.works 2 years ago
The law is probably the easy part; being able to add emotion and emphasis correctly is far out of the realm of possibility at the moment. It’s not something you can easily train a model on, because you need additional context beyond the words. “May the Force be with you” is a good example: it has about a dozen different meanings and emotions depending on the specific usage, but current models understand none of that.

- RightHandOfIkaros@lemmy.world 2 years ago
  I think that, at the current rate of development, we will probably begin to see tools where a voice recording is given as input and the voice model is used as a sort of overlay to generate a new clip that follows the same delivery as the input recording. This means a lead sound designer would be able to say the voice lines exactly as they hear them in their own head, and the voice model would reproduce that delivery. This doesn’t seem too far away, and text-based tools could follow this kind of development to mimic it. xVASynth already has a lot of parameters that can be tweaked to tune these kinds of things, but it is a free tool developed by, I think, one person. It’s not a commercial tool developed by a company out to make money, so I would imagine commercial voice tools would be of much higher quality.
  
  I think within perhaps the next 10 or 20 years the technology will either far surpass what I expect, or will be shut down by politicians.
  
  - ryathal@sh.itjust.works 2 years ago
    Being able to copy an input is likely possible already or will be soon, but you still need a voice actor to provide the input.
    
  - Acamon@lemmy.world 2 years ago
    I think you’re right that (at least for good quality voice acting) we’ll need an input source, but then it can be adapted to sound like the desired voice. Which will be great for keeping characters sounding the same, or having one person ‘voicing’ a whole team of characters.
    
    But I think good voice acting is hard, and a lot of it is very subtle, so I don’t think it’ll be as easy as having “the sound designer” record all the voices, unless they are also a good actor. If I read out a Shakespeare monologue and then use AI to make it sound like Patrick Stewart or Christian Bale, that won’t make it sound very good, because I won’t emote and pace the text the way they would.
    
    But for simpler stuff (narrating a nonfiction book) or stuff where the quality doesn’t matter that much (lots of cheaper voice acting on shows and games doesn’t seem like it would be hard to replace with an AI) the tech will be amazing.
    
    But we’re so tuned into human communication and voice that I think a lot of it will be passable but underwhelmingly mediocre for a long time. Even Carrie Fisher’s lines in the last Star Wars movie sounded flat and fake, even though they were actually delivered by her, because they were used out of context, so the timing, emphasis, and pace all sounded off.
    