I don’t deny that this kind of thing is useful for understanding the capabilities and limitations of LLMs, but I don’t agree that “the best match of a next phrase given his question, and not because it can actually consider the situation” is an accurate description of an LLM’s capabilities.
While they are dumb and unworldly, they can consider the situation: they evaluate a learned model of concepts in the world to decide whether the first word of the correct answer is more likely to be yes or no. They can solve unseen problems that require this kind of cognition.
But they are only book-learned, so they are kind of stupid about common-sense things like frying pans and ovens.
kromem@lemmy.world 7 months ago
Let’s try with Claude 3 Opus:
Bonus:
Sean Carroll may be a good physicist, but if he’s using an outdated model to make a point, his point doesn’t mean shit just because of his physics credentials.
boomzilla@programming.dev 7 months ago
Someone in the comments on the original Twitter thread showed Claude’s solution to the above “riddle”. It was just as sane as in your example: it correctly answered that the man and the goat can just row together to the other side, and correctly identified that there are no hidden restrictions like other items to take aboard. It nevertheless used an excessive amount of text (like myself).
Gemini: The man rows the goat across.
No work ethic there.