Microsoft’s VASA-1 can deepfake a person with one photo and one audio track

Submitted 1 year ago by remington@beehaw.org to technology@beehaw.org

https://arstechnica.com/information-technology/2024/04/microsofts-vasa-1-can-deepfake-a-person-with-one-photo-and-one-audio-track/


Comments


casmael@lemm.ee ⁨1⁩ ⁨year⁩ ago
Why would you develop this technology? I simply don’t understand. All involved should be sent to jail. What the fuck.

- Even_Adder@lemmy.dbzer0.com ⁨1⁩ ⁨year⁩ ago
  They worded the headline that way to scare you into that reaction. They’re only interested in telling you about the negative uses because that drives engagement.
  
  - BolexForSoup@kbin.social ⁨1⁩ ⁨year⁩ ago
    I understand some of you AI evangelist types look down on us Luddites who dare to ask questions, but you seriously can’t see any potential issue with this technology without some sort of restrictions in place?
    
  - CanadaPlus@lemmy.sdf.org ⁨1⁩ ⁨year⁩ ago
    Honestly that’s a good rule of thumb for all headlines at this point.
    
  - casmael@lemm.ee ⁨1⁩ ⁨year⁩ ago
    Good point good point
    
- some_guy@lemmy.sdf.org ⁨1⁩ ⁨year⁩ ago
  They mentioned one potential use that I thought has value and that I hadn’t considered. For video conferencing, this could transmit data without sending video and greatly reduce the amount of bandwidth needed by rendering people’s faces locally. I don’t think that outweighs the massive harms this technology will unleash. But at least there was some use that would be legit and beneficial.
  
  I’m someone who has a moral compass and I don’t like that scammers will abuse this shit so I hate it. But there’s no keeping it locked away. It’s here to stay. I hate the future / now.
  
  - flora_explora@beehaw.org ⁨1⁩ ⁨year⁩ ago
    Wouldn’t you then have to run the AI locally on a machine (which probably draws a lot of power and memory) or use it via the cloud (which depends on bandwidth just like a video call)? I don’t really see where this technology could actually be useful. Sure, it might work if it were only a minor computation, like taking a picture or video with any modern smartphone. But computing an entire face and voice seems much more complicated than that and not really feasible for the usual home device.
    
  - Lemjukes@lemm.ee ⁨1⁩ ⁨year⁩ ago
    Also I would argue sending the actual video of what is happening in front of the camera is kind of the entire point of having a video call. I don’t see any utility in having a simulated face to face interaction where neither of you is even looking at an actual image of the other person.
    
- henfredemars@infosec.pub ⁨1⁩ ⁨year⁩ ago
  You can’t simply not develop a technology. Progress is going to move forward. If they don’t do it, somebody else is going to figure out how. The tools are out there. The math works. Better for researchers to do it now and scare us into finding solutions than for criminals to develop it first.
  
- notfromhere@lemmy.ml ⁨1⁩ ⁨year⁩ ago
  Other than the obvious malicious uses of this technology, it could be great for multimedia, great for creative control for cast, great for virtual meetings to always look “your best” (as determined by each individual, e.g. clean-cut pristine, and/or preferred gender, and/or favorite anime, etc.). There are also use cases to hear letters spoken by a lost loved one, or replace the Three Stooges with politicians. Tons of “safe” use cases that I am looking forward to.
  
  - floofloof@lemmy.ca ⁨1⁩ ⁨year⁩ ago
    I’m not convinced any of these uses are actually beneficial.
    
  - henfredemars@infosec.pub ⁨1⁩ ⁨year⁩ ago
    This is a really positive take. I would love to create such an AI of myself in my likeness so that if one day I pass away before my wife, she could enjoy having that comfort. I imagine it speaking like: while I’m not your husband, here’s what I think he would’ve said.
    
    Deep faking myself so I don’t have to use my camera in meetings? I would pay for that feature.
    
- ultratiem@lemmy.ca ⁨1⁩ ⁨year⁩ ago
  Because bags of money. And MS is a hyper toxic entity that’s been siphoning the data of every Windows user for decades now. That company is basically IBM during WW2.
  
- BraveSirZaphod@kbin.social ⁨1⁩ ⁨year⁩ ago
  If something is possible, and this indeed is, someone is going to develop it regardless of how we feel about it, so it's important for non-malicious actors to make people aware of the potential negative impacts so we can start to develop ways to handle them before actively malicious actors start deploying it.
  
  Critical businesses and governments need to know that identity verification via video and voice is much less trustworthy than it used to be, and so if you're currently doing that, you need to mitigate these risks. There are tools, namely public-private key cryptography, that can be used to verify identity in a much tighter way, and we're probably going to need to start implementing them in more places.
  
- PM_ME_VINTAGE_30S@lemmy.sdf.org ⁨1⁩ ⁨year⁩ ago
  Would be great for me and others who have trouble with body language. I could deepfake a version of myself with neurotypical body language and offload the effort of “acting normal” to the AI for interviews and video calls. Genuinely I’m super pumped for this.
  
  - BolexForSoup@kbin.social ⁨1⁩ ⁨year⁩ ago
    Now that is interesting, I've never heard this consideration before.
    
- CanadaPlus@lemmy.sdf.org ⁨1⁩ ⁨year⁩ ago
  They’re also releasing a detector, for what it’s worth.
  
  Yeah, this one seems like it will have more negative applications than positive. Usually you’ll have a lot more content from someone you want to copy for non-deceptive reasons. It’s inevitable all video will be easily fake-able one day soon, but why hasten it?
  
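For anyone wondering what the public-private key verification mentioned in the replies above could look like, here is a minimal challenge-response sketch. It is only an illustration under stated assumptions: it uses Lamport one-time signatures because they can be written with nothing but Python's standard library, not because anyone in the thread proposed that scheme. The function names (`generate_keypair`, `sign`, `verify`) are hypothetical, and a real deployment would more likely use a vetted signature library and a scheme such as Ed25519.

```python
import hashlib
import secrets

BITS = 256  # one secret pair per bit of a SHA-256 digest


def _h(data: bytes) -> bytes:
    """SHA-256 helper."""
    return hashlib.sha256(data).digest()


def generate_keypair():
    """Return (private_key, public_key). Assumption: each key signs ONE message only."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(BITS)]
    # The public key is the hash of every secret, safe to publish.
    public = [(_h(zero), _h(one)) for zero, one in private]
    return private, public


def _digest_bits(message: bytes):
    """Turn the message digest into a list of 256 bits."""
    digest = int.from_bytes(_h(message), "big")
    return [(digest >> i) & 1 for i in range(BITS)]


def sign(private, message: bytes):
    """Reveal one secret per digest bit; this is the signature."""
    return [private[i][bit] for i, bit in enumerate(_digest_bits(message))]


def verify(public, message: bytes, signature) -> bool:
    """Check that each revealed secret hashes to the published value."""
    if len(signature) != BITS:
        return False
    bits = _digest_bits(message)
    return all(
        secrets.compare_digest(_h(part), public[i][bit])
        for i, (part, bit) in enumerate(zip(signature, bits))
    )


if __name__ == "__main__":
    # The person proving their identity holds the private key.
    private_key, public_key = generate_keypair()
    # The verifier sends a fresh random challenge so old responses cannot be replayed.
    challenge = secrets.token_bytes(16)
    response = sign(private_key, challenge)
    print(verify(public_key, challenge, response))
```

The point of the design is that the verifier trusts a key it already has on file rather than a face or a voice: a convincing video does not help an impostor produce a valid response to a fresh challenge.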
halm@leminal.space ⁨1⁩ ⁨year⁩ ago
What can possibly go wrong?

luciole@beehaw.org ⁨1⁩ ⁨year⁩ ago
The actual research page is so awkward. The TLDR at the top goes:

single portrait photo + speech audio = hyper-realistic talking face video

Then a little lower comes the big red warning:

We are exploring visual affective skill generation for virtual, interactive characters, NOT impersonating any person in the real world.

No siree! Big “not what it looks like” vibes.

perishthethought@lemm.ee ⁨1⁩ ⁨year⁩ ago
Someone help me out please. Who was the 90s sci-fi author who predicted actors would go away and all movies would be made using CGI/AI? She had characters in the book watching movies starring Humphrey Bogart and John Wayne as detectives solving crimes (and so on). She also predicted “ractors”, people who act in front of a camera so a computer can use their motion and expressions to animate a character on screen in real time.

My feeble brain, I swear… In any case, thanks to her, I knew this day was coming. Gonna be a wild ride though.

- notfromhere@lemmy.ml ⁨1⁩ ⁨year⁩ ago
  According to Le Chat,
  
  The author you’re thinking of is Neal Stephenson, and the book is “Snow Crash” published in 1992. In the book, he coined the term “ractors” for actors who perform in front of motion-capture cameras to create lifelike animations. He also predicted the use of CGI and AI in filmmaking to create movies with long-dead actors.
  
  - halm@leminal.space ⁨1⁩ ⁨year⁩ ago
    It just cited the wrong Neal Stephenson book, so not way off.
    
  - 14th_cylon@lemm.ee ⁨1⁩ ⁨year⁩ ago
    archive.is/ZTU90
    
    The Atlantic | Neal Stephenson’s Most Stunning Prediction
    
    The sci-fi legend coined the term metaverse. But he was most prescient about our AI age. By Matteo Wong
    
  - perishthethought@lemm.ee ⁨1⁩ ⁨year⁩ ago
    Oh snap, thanks - I was mixing up The Diamond Age with another book, yes. Ractors are from Stephenson, but I also had another author’s books in my head. See? Feeble mind. There’s still another woman author I need to track down and re-read here.
    
  - GammaGames@beehaw.org ⁨1⁩ ⁨year⁩ ago
    I asked Perplexity with “What is the scifi book from the 90s that had “ractors,” where a person would act in front of a camera and a computer would animate a CG model?” and got what other commenters are saying is the correct answer:
    
    The science fiction book from the 90s that featured “ractors,” where a person would act in front of a camera and a computer would animate a CG model, is not directly mentioned in the provided search results. However, based on the description of “ractors” and the context of computer animation and CG models, it seems you might be referring to “The Diamond Age” by Neal Stephenson, published in 1995. In this novel, the term “ractor” (short for “interactive actor”) is used to describe performers who participate in interactive theater through virtual reality environments, which could align with the concept of acting in front of a camera to animate a CG model. However, since this specific detail is not found in the search results, this answer is based on existing knowledge outside of the provided sources.
    
    Sources:
    [1] [PDF] TEAM LinG - The Swiss Bay theswissbay.ch/…/Thinking_Animation_Bridging_Gap_…
    [2] CG Historical Timeline – Computer Graphics and Computer Animation … pressbooks.pub/…/cg-historical-timeline/
    [3] Psychological Impact and Influence of Animation on Viewer’s Visual … www.ncbi.nlm.nih.gov/pmc/articles/PMC9453061/
    [4] 3D film - Wikipedia en.wikipedia.org/wiki/3D_film
    [5] Sci-fi interfaces | Stop watching sci-fi. Start using it. scifiinterfaces.com
    [6] Looking for SciFi novels where people are turned / copied into AIs - Reddit reddit.com/…/looking_for_scifi_novels_where_peopl…
    [7] Reviews: Terminator 2: Judgment Day - IMDb www.imdb.com/title/tt0103064/reviews/
    [8] Sci-fi room Challenge 2021 WIP - Personal Space Ration - CG Cookie cgcookie.com/…/11796-sci-fi-room-challenge-2021-w…
    
some_guy@lemmy.sdf.org ⁨1⁩ ⁨year⁩ ago
The eyes still have uncanny valley vibes, but that’s because I’m looking for it. If I wasn’t watching demo videos about generated video, I might not have noticed.

- davehtaylor@beehaw.org ⁨1⁩ ⁨year⁩ ago
  And that’s the problem. The average person isn’t looking for it, and will absolutely not see it. As long as it’s good enough, that’s all that matters. A plausible enough video of Joe Biden talking about rounding up Christians into internment camps that gets shared on Facebook, or something like that which panders to right-wing bigotry, is enough to get people going. Even real images and videos that are miscaptioned are enough, even when a link is there that disproves the caption.
  
  People seriously underestimate just how horrifying the possibilities are with this shit. And as high stakes as this election cycle is, given the state of politics in this country, the tendency for people to latch on to anything that affirms their preexisting ideals creates a fucking minefield.
  
  - Powderhorn@beehaw.org ⁨1⁩ ⁨year⁩ ago
    This is an education problem as much as, if not more so than, a tech problem. Before the GOP gutted critical thinking wherever they held a majority and two generations were able to grow up under those circumstances, a video of any current president rounding up Christians would have been roundly rejected as either satirical or disinformation by the vast majority of the population, owing to the absurdity of the idea.
    
    Once we got to the point of a not-insignificant minority of the population believing that the true power in the United States lies in the basement of a pizza shop with no basement …
    
thingsiplay@beehaw.org ⁨1⁩ ⁨year⁩ ago

Trained on YouTube clips

It could have been worse. Imagine it trained on TikTok clips.

p03locke@lemmy.dbzer0.com ⁨1⁩ ⁨year⁩ ago
Sigh, not this article again. No, they can’t “deepfake a person with one photo”. They can create a bad uncanny-valley 75% accurate version of one.

- thingsiplay@beehaw.org ⁨1⁩ ⁨year⁩ ago
  
  a bad uncanny-valley 75% accurate version of one
  
  Actually a perfect description of what a deepfake is.
  
  - DdCno1@beehaw.org ⁨1⁩ ⁨year⁩ ago
    I’ve seen far more convincing deepfakes, to the point I couldn’t tell until I was told. I’ve experimented with this myself. After a bit of trial and error, almost anyone can easily create shockingly convincing deepfakes. One interesting method is using 3D rendered characters with deepfake faces.
    
esaru@beehaw.org ⁨1⁩ ⁨year⁩ ago
I think this has an effect most people don’t think of: Media will just lose its value as a trusted source of information. We’ll just lose the ability to broadcast media, as anything could be faked. Humanity is back to “word of mouth”, I guess.

- westyvw@lemm.ee ⁨1⁩ ⁨year⁩ ago
  This milestone was reached a long time ago. For some reason, Uncle Bob’s Facebook post has been just as reliable a media source as any other for a lot of people already.
  
grrgyle@slrpnk.net ⁨1⁩ ⁨year⁩ ago
Omg stop what are you guys thinking

flango@lemmy.eco.br ⁨1⁩ ⁨year⁩ ago
Well, just watch the documentary “The Masked Scammer” and you’ll see how this can (and definitely will) go wrong. For a summary, there’s this article on Wikipedia: Gilbert Chikli.

phoenixz@lemmy.ca ⁨1⁩ ⁨year⁩ ago
Yeah Microsoft isn’t releasing this until we can use it responsibly.

We’ll never be able to guarantee that. There will always be people abusing this.

Though right now it’s in the hands of Microsoft and likely requires a shit tonne of hardware to run (I’d imagine a collection of specialized servers), this tech WILL come out eventually, and eventually, everyone will be able to run it.

I give it 5-10 years tops before anyone can just do this with anyone. Want to make a movie of trump or Hilary fucking a donkey? Done. Want to make a video of your 5 year old daughter in a gangbang? Done. The future is very bleak.

I’m honestly unsure if the internet was a good idea and I’m even less sure if humanity was a good idea.