Comment on How many r are there in strawberry?
lvxferre@mander.xyz 3 days ago
Wrong maths, you say?
With that out of the way: you didn’t ask how many times the phoneme /ɹ/ appears in the spoken word, so by context you’re asking about the written word, and the letter ⟨r⟩. And the bot interpreted it as such; note that it answers
here, let me show you: s-t-r-a-w-b-e-r-r-y
instead of specifying the phonemes.
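For what it’s worth, the grapheme count is trivial to verify mechanically; here’s a minimal Python sketch of my own (not the bot’s output), purely to illustrate what “counting the letter ⟨r⟩” means here:

```python
# Count occurrences of the grapheme ⟨r⟩ in the written word.
word = "strawberry"
print(word.count("r"))  # 3 — the ⟨rr⟩ digram contributes two separate letters
```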
By the way, everything past the «are you counting the “rr” as a single r?» part is babble.
jarfil@beehaw.org 3 days ago
Those are all the smallest models, and you don’t seem to have reasoning mode, or external tooling, enabled?
LLM ≠ AI system
It’s been known for some time that LLMs do “vibe math”. Internally, they try to come up with an answer that “feels” right… which makes it pretty impressive for them to come anywhere close, within a ±10% error margin.
Ask people to tell you what the right answer could be, give them 1 second to answer… and see how many come that close to the right one.
A chatbot/AI system, on the other hand, will come up with some Python code to do the calculation, then run it. It can still go wrong, but it’s way less likely.
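Roughly, the pattern looks like this (a hypothetical sketch; the prompt, the numbers and the “vibe” guess are made up, just to show the difference between guessing and running the calculation):

```python
# Hypothetical prompt: "What is 487 × 963?"

# A "vibe math" answer: a guess that merely feels about right.
vibe_guess = 470_000

# A tool-using system instead emits and executes the exact calculation.
exact = 487 * 963
print(exact)       # 468981 — exact, not a ±10% approximation
print(vibe_guess)  # in the right ballpark, but wrong
```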
Not so sure about that. It treats r as a word, since it wasn’t specified as “r” or as a single letter. Then it interprets it as… whatever. Is it the letter, the phoneme, a font, the programming language R… since it wasn’t specified, it assumes “whatever, or a mix of them”.
It failed at detecting the ambiguity and communicating it spontaneously, but corrected once that became part of the conversation.
It’s like, in your examples… what do you mean by “by”? “3 by 6 = 36”… did you mean “multiply 3 by 6”? The test is nonsense… 🤷
lvxferre@mander.xyz 2 days ago
[sarcasm] Yeah, because if you randomly throw more bricks in a construction site, the bigger pile of debris will look more like a house, right. [/sarcasm]
Those are the chatbots available through DDG. I just found it amusing enough to share, given
Small note regarding “reasoning”: just like “hallucination” and anything they say about semantics, it’s a red herring that obfuscates what is really happening.
At the end of the day it’s simply weighting the next token based on the previous tokens + prompt, and optionally calling some external tool. It is not really reasoning; what it’s doing is not too different in spirit from Markov chains, except more complex.
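To illustrate the comparison (a toy sketch of my own, not how any actual model is implemented): a bigram Markov chain likewise just picks the next token weighted by what followed the previous token in its training text.

```python
import random
from collections import defaultdict

# Toy bigram Markov chain: the next token is sampled according to how
# often it followed the previous token in the (tiny) training text.
text = "the cat sat on the mat and the cat sat".split()

transitions = defaultdict(list)
for prev, nxt in zip(text, text[1:]):
    transitions[prev].append(nxt)

def next_token(prev: str) -> str:
    candidates = transitions.get(prev)
    if not candidates:                # dead end: no observed successor
        return random.choice(text)
    return random.choice(candidates)  # duplicates make this frequency-weighted

token = "the"
output = [token]
for _ in range(5):
    token = next_token(token)
    output.append(token)
print(" ".join(output))
```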
If large “language” models don’t count as “AI systems”, then what you shared in the OP does not either. You can’t eat your cake and have it too.
I.e. they’re unable to perform actual maths.
It doesn’t matter if the answer “feels” right (whatever this means). The answer is incorrect.
No, the fact they are unable to perform a simple logical procedure is not “impressive”. Especially not when they output the “approximation” as if it were the true value; note how none of the models outputted anything remotely similar to “the result is close to $number” or “the result is approximately $number”.
None of the prompts had a time limit. You’re making shit up.
Also: sure, humans brainfart all the time; that does not magically mean that those systems are smart or doing some 4D chess, as your OP implies.
I.e. it would need to use some external tool, since it’s unable to handle logic by itself, as exemplified by maths.
The output is clearly handling it as letters. It hyphenates the letters to highlight them, it mentions “digram” (i.e. a sequence of two graphemes), and so on. At no point does it refer to anything that could be understood as associated with sounds, with phonemes. And it claims there’s an ⟨r⟩ «in the middle of the “rr” combination».
There’s no context whatsoever to justify any of those interpretations.
If this were a human being, it would not be an assumption. An assumption is the sort of shit you make up from nowhere; here, context dictates reading “r” as “the letter ⟨r⟩”.
However since this is a bot it isn’t even assuming. Just like a boulder doesn’t “assume” you want it to roll down; it simply reacts to an external stimulus.
There’s no ambiguity in the initial prompt. And no, it did not correct what it said; the last reply is still babble: you don’t count ⟨rr⟩ in English as a single letter.
I’d rather not answer this one because, if I did, I’d be pissing on Beehaw’s core values.
jarfil@beehaw.org 1 day ago
I feel like you already did, and I won’t be responding in kind. Good day to you.