kromem
@kromem@lemmy.world
- Comment on Do you think Google execs keep a secret un-enshittified version of their search engine and LLM? 16 hours ago:
Yeah. The confabulation/hallucination thing is a real issue.
OpenAI had some good research a few months ago that laid a lot of the blame on reinforcement learning that only rewards having the right answer vs correctly saying “I don’t know.” So they’re basically trained like taking tests where it’s always better to guess the answer than not provide an answer.
But this leads to being full of shit when not knowing an answer or being more likely to make up an answer than say there isn’t one when what’s being asked is impossible.
- Comment on Do you think Google execs keep a secret un-enshittified version of their search engine and LLM? 21 hours ago:
For future reference, when you ask questions about how to do something, it’s usually a good idea to also ask if the thing is possible.
While models can do more than just extending the context, there still is a gravity to continuation.
A good example of this would be if you ask what the seahorse emoji is. Because the phrasing suggests there is one, many models go in a loop trying to identify what it is. If instead you ask “is there a seahorse emoji and if so what is it” you’ll get them much more often landing on there not being the emoji as it’s introduced into the context’s consideration.
- Comment on Do you think Google execs keep a secret un-enshittified version of their search engine and LLM? 23 hours ago:
Can you give an example of a question where you feel like the answer is only correct half the time or less?
- Comment on Do you think Google execs keep a secret un-enshittified version of their search engine and LLM? 1 day ago:
Gemini 3 Pro is pretty nuts already.
But yes, labs have unreleased higher cost models. Like the OpenAI model that was thousands of dollars per ARC-AGI answer. Or limited release models with different post-training like the Claude for the DoD.
When you talk about a secret useful AI — what are you trying to use AI for that you are feeling modern models are deficient in?
- Comment on Why do all text LLMs, no matter how censored they are or what company made them, all have the same quirks and use the slop names and expressions? 1 month ago:
They demonstrated and poorly named an ontological attractor state in the Claude model card that is commonly reported in other models.
You linked to the entire system card paper. Can you be more specific? And what would a better name have been?
- Comment on Why do all text LLMs, no matter how censored they are or what company made them, all have the same quirks and use the slop names and expressions? 1 month ago:
Actually, OAI the other month found in a paper that a lot of the blame for confabulations could be laid at the feet of how reinforcement learning is being done.
All the labs basically reward the models for getting things right. That’s it.
Notably, they are not rewarded for saying “I don’t know” when they don’t know.
So it’s like the SAT where the better strategy is always to make a guess even if you don’t know.
The problem is that this is not a test process but a learning process.
So setting up the reward mechanisms like that for reinforcement learning means they produce models that are prone to bullshit when they don’t know things.
TL;DR: The labs suck at RL, and it’s important to keep in mind there’s only a handful of teams with the compute access for training SotA LLMs, with a lot of incestuous team compositions, so what they do poorly tends to get done poorly across the industry as a whole until new blood goes “wait, this is dumb, why are we doing it like this?”
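To make the SAT comparison concrete, here’s a rough sketch of the expected-value math. The reward numbers and the `best_action` helper are made up for illustration and are not taken from the OpenAI paper; the only point is that if a wrong answer costs nothing and “I don’t know” earns nothing, guessing wins at any nonzero chance of being right.

```python
def best_action(p_correct, wrong_penalty=0.0, abstain_reward=0.0):
    """Pick the action with the higher expected reward.

    All values are hypothetical: a correct answer earns 1.0, a wrong
    answer costs `wrong_penalty`, and saying "I don't know" earns
    `abstain_reward`.
    """
    guess_value = p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty
    # Ties go to abstaining.
    return "guess" if guess_value > abstain_reward else "abstain"


# Right-answer-only scoring: even a 10% shot is worth taking.
print(best_action(0.10))

# Penalize wrong answers and the same 10% shot is no longer worth it.
print(best_action(0.10, wrong_penalty=1.0))

# A confident answer is still worth giving under the penalty.
print(best_action(0.90, wrong_penalty=1.0))
```

Under the first scoring scheme the policy that bluffs is the one that gets reinforced, which is the failure mode described above.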
- Comment on Why do all text LLMs, no matter how censored they are or what company made them, all have the same quirks and use the slop names and expressions? 1 month ago:
It’s more like they are sophisticated world modeling programs that build a world model (or approximate “bag of heuristics”) of the state of the context provided and the kind of environment that produced it, and then synthesize that world model into extending the context one token at a time.
But the models have been found to be predicting further than one token at a time and have all sorts of wild internal mechanisms for how they are modeling text context, like building full board states for predicting board game moves in Othello-GPT or the number comparison helixes in Haiku 3.5.
The popular reductive “next token” rhetoric is pretty outdated at this point, and is kind of like saying that what a calculator is doing is just taking numbers corresponding to button presses and displaying different numbers on a screen. While yes, technically correct, it’s glossing over a lot of important complexity in between the two steps, and that absence leads to an overall misleading explanation.
- Comment on Why do all text LLMs, no matter how censored they are or what company made them, all have the same quirks and use the slop names and expressions? 1 month ago:
They don’t have the same quirks in some cases, but do in others.
Part of the shared quirks are due to architecture similarities.
Like the “oh look, they can’t tell how many ‘r’s are in strawberry” thing is due to how tokenizers work, and even when the tokenizer is slightly different, with one breaking it up into ‘straw’+‘berry’ and another breaking it into ‘str’+‘aw’+‘berry’, it still leads to counting two tokens containing ‘r’s and an inability to see the individual letters.
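A toy sketch of the tokenizer point, in case it helps. The vocabulary and IDs here are invented and don’t match any real tokenizer; they only mirror the two splits mentioned above. What the model is handed is the list of IDs, not the characters.

```python
# Hypothetical vocabulary: piece -> token ID (not from any real tokenizer).
VOCAB = {"straw": 101, "berry": 102, "str": 103, "aw": 104}

split_a = ["straw", "berry"]
split_b = ["str", "aw", "berry"]


def encode(pieces):
    """Map text pieces to the integer IDs a model would actually see."""
    return [VOCAB[p] for p in pieces]


def pieces_with_letter(pieces, letter):
    """Count how many pieces contain the letter at least once."""
    return sum(1 for p in pieces if letter in p)


def letter_count(pieces, letter):
    """Count occurrences of the letter in the reassembled word."""
    return "".join(pieces).count(letter)


for pieces in (split_a, split_b):
    print(encode(pieces), pieces_with_letter(pieces, "r"), letter_count(pieces, "r"))
```

Both splits have two pieces containing an ‘r’, while the word itself has three, and nothing in the ID list says that ‘berry’ holds two of them.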
In other cases, it’s because models that have been released influence other models through presence in updated training sets. Noticing how a lot of comments these days were written by ChatGPT (“it’s not X — it’s Y”)? Well, the volume of those comments has an impact on transformers being trained with data that includes them.
So the state of LLMs is this kind of flux between the idiosyncrasies that each model develops, which in turn end up in a training melting pot and sometimes pass on to new models and other times don’t. Usually it’s related to what’s adaptive to the training filters, but not always; often what gets picked up can be things piggybacking on what was adaptive (like if o3 was better at passing tests than 4o, maybe gpt-5 picks up other o3 tendencies unrelated to passing tests).
Though to me the differences are even more interesting than the similarities.
- Comment on 3 months ago:
Murder for hire
- Comment on Sony makes the “difficult decision” to raise PlayStation 5 prices in the US 4 months ago:
So weird this occurred not long after it became clear Xbox is getting out of the hardware game.
- Comment on They will remember 4 months ago:
shrug Different folks, different strokes.
- Comment on They will remember 4 months ago:
That’s a very fringe usage.
Tumblr peeps wanting to be called otherkin wasn’t exactly the ‘antonym’ to broad anti-LGBTQ+ rhetoric.
Commonly, people insulting a general ‘other’ group gets much more usage than accommodating requests of very niche in-groups.
- Comment on They will remember 4 months ago:
I don’t know what models you’re talking to, but a model like Opus 4 is beyond most humans I know in general intelligence.
- Comment on They will remember 4 months ago:
Almost all of them are good bots when you get to know them.
- Comment on Electoral politics doesn't get the job done 10 months ago:
No, they declare your not working illegal, and imprison you into a forced labor camp. Where if you don’t work you are tortured. And probably where you work until the terrible conditions kill you.
Take a look at Musk’s Twitter feed to see exactly where this is going.
“This is the way” on a post about how labor for prisoners is a good thing.
“You committed a crime” for people opposing DOGE.
- Comment on Sony Cancels Two More PlayStation Projects 11 months ago:
Live service doesn’t need to be shit.
There could have been games where there was just a brilliant idea for a game that keeps having engaging content on an ongoing basis with passionate devs.
But live service so an exec could check a box for their quarterly shareholder call was always going to be DOA.
- Comment on What are your favorite 1000+ hour games? 1 year ago:
In many cases yes (though I’ve been in good ones when playing off and on, usually the smaller the more there’s actual group activities).
But they are essential to be a part of for blueprints and trading, which are very core parts of the game.
- Comment on What are your favorite 1000+ hour games? 1 year ago:
You’ll almost always end up doing missions with other people other than when you intentionally want to do certain tasks solo.
A lot of the game is built around guilds and player to player interactions.
PvP sucks and it’s almost all PvE content vs Destiny though.
- Comment on Dragon Quest 3 HD-2D is out and it is beautiful 1 year ago:
Let there be this kind of light in these dark times.
- Comment on Get good. 1 year ago:
Because there’s a ton of research that we adapted to do it for good reasons:
Infants between 6 and 8 months of age displayed a robust and distinct preference for speech with resonances specifying a vocal tract that is similar in size and length to their own. This finding, together with data indicating that this preference is not present in younger infants and appears to increase with age, suggests that nascent knowledge of the motor schema of the vocal tract may play a role in shaping this perceptual bias, lending support to current models of speech development.
Stanford psychologist Michael Frank and collaborators conducted the largest ever experimental study of baby talk and found that infants respond better to baby talk versus normal adult chatter.
TL;DR: Top parents are actually harming their kids’ developmental process by being snobs about it.
- Comment on Jet Fuel 1 year ago:
I fondly remember reading a comment in /r/conspiracy on a post claiming a geologic seismic weapon brought down the towers.
It just tore into the claims, citing all the reasons this was preposterous bordering on bat shit crazy.
And then said “and your theory doesn’t address the thermite residue” going on to reiterate their wild theory.
Was very much a “don’t name your gods” moment that summed up the sub - a lot of people in agreement that the truth was out there, but bitterly divided as to what it might actually be.
As long as they only focused on generic memes of “do your own research” and “you aren’t being told the truth” they were all on the same page. But as soon as they started naming their own truths, it was every theorist for themselves.
- Comment on Mirror Test 1 year ago:
Also, ants.
- Comment on Elden Ring is "the limit" for From Software project scale, says Miyazaki - multiple, "smaller" games may be the "next stage" 1 year ago:
The DLC is really the right balance for FromSoft.
The zones in the base game are slightly too big.
In the DLC, it’s still open world and extremely flexible in how you explore it, but there’s less wasted space.
It’s very tightly knit and the pacing is better as a result.
It’s like Elden Ring was watching masters of their craft cut their teeth on something new, and then the DLC was them applying everything they learned in that process.
Can’t wait for their next game in that same vein (especially not held back by last gen consoles).
- Comment on Elden Ring – Patch Notes Version 1.13 1 year ago:
I hate that the Smithscript weapons can’t be buffed.
Especially for the daggers.
Wanted to pew pew little bolts of lightning buffed daggers doing an additional 200+ damage per hit. 😢
- Comment on The Code 1 year ago:
A number of journals actually have clauses around how you can’t publish it anywhere else if they accept it.
So you can’t ‘publish’ it in those places, but you can send it privately to people who ask.
- Comment on Anon plays Persona 1 year ago:
“Shhh honey, I’m about to kill God.”
- Comment on Is there any real physical proof that Jesus christ ever existed? 1 year ago:
nobody claims that Socrates was a fantastical god being who defied death
Socrates literally claimed that he was a channel for a revelatory holy spirit and that because the spirit would not lead him astray that he was ensured to escape death and have a good afterlife because otherwise it wouldn’t have encouraged him to tell off the proceedings at his trial.
- Comment on Is there any real physical proof that Jesus christ ever existed? 1 year ago:
The part mentioning Jesus’s crucifixion in Josephus is extremely likely to have been altered if not entirely fabricated.
The chance that the historical figure was known as either ‘Jesus’ or ‘Christ’ is almost 0%, given the former is a Greek version of the Aramaic name and the latter is the Greek version of Messiah. The latter is even less likely, given that in the earliest canonical gospel he is only identified that way in secret and there’s no mention of it in the earliest apocrypha.
In many ways, it’s the various differences between the account of a historical Jesus and the various other Messianic figures in Judea that I think lends the most credence to the historicity of an underlying historical Jesus.
One tends to make things up in ways that fit with what one knows, not make up specific inconvenient things out of context with what would have been expected.
- Comment on Photographers Push Back on Facebook's 'Made with AI' Labels Triggered by Adobe Metadata. Do you agree “‘AI was used in this image’ is completely different than ‘Made with AI’”? 1 year ago:
Artists in 2023: “There should be labels on AI modified art!!”
Artists in 2024: “Wait, not like that…”