@Kissaki In another thread, people are mocking AI because the free language models they are using are bad at drawing accurate maps. "AI can't even do geography". Their conclusion is that anything an AI says can't be trusted, and that AI is vastly inferior to human ability.
These same people haven't figured out the difference between using a language AI to draw a map and simply asking it a geography question.
knokelmaat@beehaw.org 2 days ago
I hate AI. Why?
However
I also took the time to read the original blog post, and it is a fascinating story.
The author starts out by using an existing vulnerability as a benchmark for ChatGPT testing. They describe how they took the code specific to the vulnerability and packaged it for ChatGPT, how they formatted the query and what their results were. In 100 runs, only 8 correctly identify the targeted vulnerability; the rest are false positives or claim that there are no vulnerabilities in the given code.
Then they take their test a step further and increase the amount of code shared with ChatGPT so that it also includes parts of the module that had nothing to do with the original vulnerability. As expected, this larger input decreases performance and also reduces the detection rate for the targeted vulnerability. However, in those 100 runs, another vulnerability was described that wasn’t a false positive: an actual new vulnerability that the author didn’t know about. Again, the signal-to-noise ratio is very low, and one has to sift through a lot of wrong reports to get a realistic one, but this proved that it could be used as a useful tool for helping to detect vulnerabilities.
I highly recommend reading the blog post.
As much as I like to be critical about AI, it doesn’t help if we put our heads in the sand and act as if it never does anything cool.
shnizmuffin@lemmy.inbutts.lol 2 days ago
It was right 8% of the time when presented with the least amount of input to find a known bug. Then, when they opened it up to more of the codebase, its performance decreased.
I’m not going to use something that’s wrong over 92% of the time. That’s insane. That’s like saying my Magic 8 Ball “could be used as a useful tool for helping to detect vulnerabilities.” The fucking rubber ducky on my desk has a more reliable clearance rate.
knokelmaat@beehaw.org 2 days ago
This is literally the very first experiment in this use case, done by a single person on a model that wasn’t specifically designed for this. The fact that it is able to formulate a correct response at all in this situation impresses me.
It would be easy to criticize this if it were the endpoint and this was being advertised as a tool for vulnerability research, but as discussed at the end of the post, this “quick little test” both showed promising initial results and had the fortunate byproduct of actually revealing a new vulnerability. By no means is it implied that it is now ready for use in this field.
The issue with hallucinations is one that, in my opinion, is never going to be totally fixed. That is why I hate the use of AI as a final arbiter of truth, which is sadly how a lot of people use it (“I’ll quickly ask ChatGPT”) and how companies advertise it. What it is good at, however, is coming up with plausible ideas, and in this case having an indication of things to check in code can be a great tool to discover new stuff, as is literally the case for this security researcher finding a new vulnerability after auditing the module themselves.
LukeZaz@beehaw.org 1 day ago
Interesting. I feel like the headline is still bad though. I get why they ran with it, at least — “ChatGPT finds kernel exploit” is more interesting and gets more clicks than “Monkey finally writes Shakespeare.”