Comment on OpenEvidence Sounds Promising, but is it Reliable?
jarfil@beehaw.org 4 weeks ago
When OpenEvidence took the US Medical Licensing Exam recently, it was wrong 9% of the time
The passing score for USMLE is ~200 out of 300… how many “wrong times” is that?
liv@lemmy.nz 4 weeks ago
When we look at passing scores, is there any way to quantitatively grade them for magnitude?
Not all bad advice is created equal.
jarfil@beehaw.org 4 weeks ago
The grading is a mess. It mixes qualitative and quantitative scoring… plus statistical corrections “to make it fair”.
Anyway, there is a ~30% margin on the scores for passing, so chances are that a 9% error rate is better than that of the worst doctor who still “passed”.
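A rough back-of-the-envelope check of that margin, as a minimal sketch. It assumes the score maps linearly onto the share of correct answers, which the real grading (with its statistical corrections) does not do, so treat it as a sanity check only. `error_margin` is a hypothetical helper, not anything official.

```python
# Naive assumption: the 3-digit score is a linear fraction of the maximum.
# The real scale is statistically adjusted, so this is only a rough estimate.
PASSING_SCORE = 200   # approximate passing score quoted in this thread
MAX_SCORE = 300       # maximum score quoted in this thread
AI_ERROR_RATE = 0.09  # error rate reported for OpenEvidence


def error_margin(passing: float, maximum: float) -> float:
    """Fraction of the scale a candidate can miss and still pass (hypothetical helper)."""
    return 1 - passing / maximum


margin = error_margin(PASSING_SCORE, MAX_SCORE)
print(f"margin for passing: {margin:.0%}")
print(f"AI error rate:      {AI_ERROR_RATE:.0%}")
print("AI within margin:", AI_ERROR_RATE < margin)
```

Under that naive reading the margin comes out closer to a third than to 30%, and a 9% error rate sits well inside it.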
liv@lemmy.nz 4 weeks ago
I’d hope the bar for medical advice is higher than “better than the worst doctor”.
It will be interesting to see where liability lies with this one. In the example given, following the advice could permanently worsen patients’ conditions.
Given that the advice is proven to be wrong and goes against official medical guidance for doctors, that could potentially be material for a class action lawsuit.
jarfil@beehaw.org 4 weeks ago
It’s like in the joke: “What do you call someone who barely finished medical school? … Doctor.”
Every doctor is allowed to provide medical advice, even those who would do better to shut up. Liability works like what a friend was told after a botched operation, when she confronted her doctor: “Sue me, that’s what my insurance is for”.
I’d like to see the actual final assessment of an AI on these tests, but if it’s just “9% vs 15% error rate”, I’d take it.
My guess would be that the AI might not be great at all kinds of assessments, but having a panel of specialized AIs, like we now have multiple specialists cooperating, sounds like a reasonable idea. Having a transcript of such a meeting analyzed by a GP could be even better.