Comment on Do LLM modelers maintain a list of manual corrections fed by humans?

brucethemoose@lemmy.world 3 days ago

Yes. Absolutely.

The meme in the research community is that current LLMs are literally trained on benchmarks and on common stuff people test in LM Arena, like the "how many r's in strawberry" question.

I'm not speaking speculatively: Meta literally got caught red-handed doing this. They ran a separate finetune just to look good on LM Arena. And some benchmarks like MMLU have errors in them that many LLMs still answer "correctly", i.e. they reproduce the flawed answer key.

source