This benchmark presents AI with a harder challenge than human devs normally face. It's supposed to be really hard, so it's not surprising that current models get 0%.
The point is that over time models will continue to improve and this benchmark will measure that improvement. A lot of current benchmarks have been saturated; once models are scoring near 100%, there's no point to them any more.
kibiz0r@midwest.social 1 week ago
It’s incredible how we went from everyone laughing at the YNGMI crypto bros to the entire economy being built on top of YNGMI AI bros.
u_tamtam@programming.dev 1 week ago
No it’s not, and part of that is the current legislative laissez-faire in the US that put its regulatory bodies on hiatus. Under normal circumstances, this stuff should have been under much more scrutiny and regulation. I’m not saying that the state should control what LLMs do or who has access to them, but regulators could very much tackle the deceptive marketing, environmental and societal impact, unsound financing, and abnormal market consolidation, and mitigate the overall financial risk.