Comment on Automation

OsrsNeedsF2P@lemmy.ml 2 years ago

While I believe that, it’s an issue with the training data, and not the hardest one to resolve.

dondelelcaro@lemmy.world 2 years ago
Maybe not the hardest, but still challenging. Unknown biases in training data are a challenge in any experimental design. Opaque ML frequently makes them more challenging to discover.

- nova_ad_vitum@lemmy.ca 2 years ago
  The unknown-biases issue has no real solution. In this same example, suppose that instead of something simple like snow in the background, the photographs of wolves had been taken with zoom lenses (since photographers don’t want to get near wild animals) while the dog photos were close-ups. If the ML was really just learning to recognize subtle photographic artifacts caused by the zoom lenses, that would be extremely difficult to detect, let alone prove.
  
  - dondelelcaro@lemmy.world 2 years ago
    Exactly.
    
    The general approach is to use interpretable models, where you can understand how the model works and what features it uses to discriminate, but that doesn’t work for all ML approaches (and even when it does, our understanding is incomplete).
    
Mirodir@discuss.tchncs.de 2 years ago
So is the example with the dogs/wolves and the example in the OP.

As for how hard it is to resolve: the dogs/wolves one might be quite difficult, but for the example in the OP, it wouldn’t be hard to feed in every image during training with a randomly chosen background, removing the model’s ability to draw any conclusions from the background.
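
A minimal sketch of that background-randomization idea, not anyone’s actual pipeline. It assumes each training image comes with a boolean foreground mask marking the subject, which the thread does not mention. The names `randomize_background`, `mask`, and `backgrounds` are hypothetical.

```python
import numpy as np


def randomize_background(image, mask, backgrounds, rng):
    """Composite the masked subject of `image` onto a randomly chosen background.

    image:       (H, W, 3) array holding the original photo
    mask:        (H, W) boolean array, True where the subject is (assumed to exist)
    backgrounds: sequence of (H, W, 3) arrays to draw replacement backgrounds from
    rng:         numpy random Generator, so the choice is reproducible
    """
    background = backgrounds[rng.integers(len(backgrounds))]
    # Keep subject pixels; take every other pixel from the random background.
    return np.where(mask[..., None], image, background)


# Toy data: a 4x4 gray "photo" whose subject occupies the central 2x2 square.
rng = np.random.default_rng(0)
image = np.full((4, 4, 3), 0.5)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
backgrounds = [np.zeros((4, 4, 3)), np.ones((4, 4, 3))]

augmented = randomize_background(image, mask, backgrounds, rng)
```

Calling this once per image per epoch means the same subject appears against different backgrounds over the course of training, so the background carries no consistent signal about the label. It only works as well as the masks do, and it does nothing about biases that live in the subject pixels or in the labels themselves.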

However, this would probably unearth the next issue: the human graders who were probably used to create the original training dataset have their own biases based on race, gender, appearance, etc. This doesn’t even necessarily mean that they were racist/sexist/etc., just that they struggle to detect certain emotions in certain groups of people. The model would then replicate those issues.

- Grandwolf319@sh.itjust.works 2 years ago
  I bet ML would also think people with glasses are smarter or some dumb thing like that.
  
merc@sh.itjust.works 2 years ago
Yes, “Bias Automation” is always an issue with the training data, and it’s always harder to resolve than anyone thinks.

StaticFalconar@lemmy.world 2 years ago
Old data adage. Garbage in, garbage out.

- Knock_Knock_Lemmy_In@lemmy.world 2 years ago
  Actually, in this case the data sounds pretty clean.
  