ryathal@sh.itjust.works 1 year ago
The law is probably the easy part; being able to properly add emotion and emphasis is far out of the realm of possibility at the moment. It's not something you can easily train a model on, because you need additional context beyond the words. "May the force be with you" is a good example: it has about a dozen different meanings and emotions depending on the specific usage, but current models understand none of that.
RightHandOfIkaros@lemmy.world 1 year ago
I think that at the current rate of development, we will probably begin to see tools that take a voice recording as input and use the voice model as a sort of overlay, generating a new clip where the model follows the same delivery as the input recording. That means a lead sound designer will be able to say the voice lines exactly the way they hear them in their own head, and the voice model will sound the same. This doesn't sound like something that is too far away, and text-based tools could follow the same kind of development to mimic that delivery. xVAsynth already has a lot of parameters that can be tweaked to tune these kinds of things, but it's a free tool developed by, I think, one person. It's not a commercial tool developed by a company out to make money, so I would imagine the quality of commercial voice tools would be much higher.
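Very roughly, the kind of pipeline I'm imagining looks something like this. The feature extraction uses real librosa calls; `VoiceModel` and its `synthesize()` method are made-up placeholders for whatever actually renders the target voice, not xVAsynth's real API:

```python
# Sketch of the "overlay" idea, assuming a prosody-transfer setup:
# extract the performance from a reference take, then condition a
# target-voice model on it. VoiceModel below is hypothetical.
import librosa

def extract_delivery(reference_path: str) -> dict:
    """Pull performance features (pitch contour, energy) from a reference take."""
    y, sr = librosa.load(reference_path, sr=None)
    # Fundamental-frequency contour: carries the intonation and emphasis.
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
    )
    # Frame-level energy: carries loudness dynamics and stress.
    energy = librosa.feature.rms(y=y)[0]
    return {"f0": f0, "voiced": voiced_flag, "energy": energy, "sr": sr}

# Hypothetical usage: the generated clip follows the designer's delivery.
# delivery = extract_delivery("lead_designer_take.wav")
# clip = VoiceModel("character_a").synthesize(
#     "May the force be with you", delivery=delivery
# )
```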
I think within perhaps the next 10 or 20 years the technology will either far surpass what I expect or be shut down by politicians.
ryathal@sh.itjust.works 1 year ago
Being able to copy an input is likely possible, or soon will be, but you still need a voice actor to provide the input.
Acamon@lemmy.world 1 year ago
I think you’re right that (at least for good quality voice acting) we’ll need an input source, but then it can be adapted to sound like the desired voice. Which will be great for keeping characters sounding the same, or having one person ‘voicing’ a whole team of characters.
But I think good voice acting is hard, and a lot of it is very subtle, so I don't think it'll be as easy as "the sound designer records all the voices," unless they are also a good actor. If I read out a Shakespeare monologue and then use AI to make it sound like Patrick Stewart or Christian Bale, it won't sound very good, because I won't emote and pace the text the way they would.
But for simpler stuff (narrating a nonfiction book), or stuff where the quality doesn't matter that much (lots of cheaper voice acting in shows and games doesn't seem like it would be hard to replace with an AI), the tech will be amazing.
But we're so tuned into human communication and voice that I think a lot of it will be passable but underwhelmingly mediocre for a long time. Even Carrie Fisher's lines in the last Star Wars movie sounded flat and fake, even though they were actually delivered by her, because they were used out of context, so the timing and emphasis and pacing all sounded off.