This benchmark presents AI with a challenge greater than what human devs normally face. It's supposed to be really hard, so it's not surprising that current models get 0%.
The point is that models will keep improving over time, and this benchmark will measure that improvement. A lot of current benchmarks have been saturated: once models are scoring near 100%, there's no point to them any more.