Comment on ChatGPT's o3 Model Found Remote Zeroday in Linux Kernel Code

shnizmuffin@lemmy.inbutts.lol 4 days ago

In 100 runs only 8 correctly identify the targeted vulnerability, the rest are false positives or claim that there are no vulnerabilities in the given code. … [The] signal to noise ratio is very low, and one has to sift through a lot of wrong reports to get a realistic one.

It was right 8% of the time when presented with the least amount of input needed to find a known bug. Then, when they opened it up to more of the codebase, its performance decreased.

I’m not going to use something that’s wrong at least 92% of the time. That’s insane. That’s like saying my Magic 8 Ball “could be used as a useful tool for helping to detect vulnerabilities.” The fucking rubber ducky on my desk has a more reliable clearance rate.
