ChatGPT 'got absolutely wrecked' by Atari 2600 in beginner's chess match — OpenAI's newest model bamboozled by 1970s logic

153 likes

Submitted 5 days ago by misk@sopuli.xyz to technology@beehaw.org

https://www.tomshardware.com/tech-industry/artificial-intelligence/chatgpt-got-absolutely-wrecked-by-atari-2600-in-beginners-chess-match-openais-newest-model-bamboozled-by-1970s-logic


Comments

  • 30p87@feddit.org 5 days ago

    Anyone who believes that a generic word autocompleter would beat classic algorithms wherever possible probably belongs in a psychiatric ward.

    • realitista@lemm.ee 5 days ago

      There are a lot of people out there who think LLMs are somehow reasoning. Even reasoning models aren’t really doing it. It’s important to do demonstrations like this in the hopes that the general public will understand the limitations of this tech.

      • theangriestbird@beehaw.org 5 days ago

        It is important to do demonstrations like this in the hopes that the general public will understand the limitations of this tech.

        THIS is the thing. The general public’s perception of ChatGPT is basically whatever OpenAI’s marketing department tells them to believe, plus their single memory of that one time they tested out ChatGPT and it was pretty impressive. Right now, OpenAI is telling everyone that they are a few years away from Artificial General Intelligence. Tests like this one demonstrate how wrong OpenAI is in that assertion.

      • Photuris@lemmy.ml 5 days ago

        But the general public (myself included) doesn’t really understand how our own reasoning happens.

        Does anyone, really? i.e., am I merely a meat computer that takes in massive amounts of input over a lifetime, builds internal models of the world, tests said models through trial-and-error, and outputs novel combinations of data when said combinations are useful for me in a given context in said world?

        Is what I do when I “reason” really all that different from what an LLM does, fundamentally? Do I do more than language prediction when I “think”? And if so, what is it?

      • ByteSorcerer@beehaw.org 5 days ago

        I think the problem is that, while the model isn’t actually reasoning, it’s very good at convincing people it actually is.

        I see current LLMs kinda like an RPG character build with all ability points put into Charisma. It’s actually not that good at most tasks, but it’s so good at convincing people that they start to think it’s actually doing a great job.

    • jjjalljs@ttrpg.network 5 days ago

      I think I remember some DOGE goon asking online about using an LLM to parse JSON. Many people don’t understand things.
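For contrast, parsing JSON is a solved, deterministic task that standard libraries already do exactly. A minimal Python sketch using the built-in `json` module; the payload and its keys are invented for illustration:

```python
import json

# Hypothetical sample payload, invented for illustration only.
raw = '{"name": "example", "budget": 1250000, "tags": ["a", "b"]}'

# json.loads returns the exact parsed structure or raises
# json.JSONDecodeError; it never guesses at what malformed input meant.
data = json.loads(raw)
print(data["budget"])  # 1250000

try:
    json.loads('{"budget": 12,}')  # trailing comma: invalid JSON
except json.JSONDecodeError as err:
    print("rejected:", err.msg)
```

Malformed input is rejected with an error rather than silently "repaired", which is the property you want from a parser.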

      • Photuris@lemmy.ml 5 days ago

        Jesus Christ, software’s about to get far, far worse, innit?

    • MadMadBunny@lemmy.ca 5 days ago

      That’s too much critical thinking for most people.

  • Showroom7561@lemmy.ca 5 days ago

    In a quite unexpected turn of events, it is claimed that OpenAI’s ChatGPT “got absolutely wrecked on the beginner level” while playing Atari Chess.

    Who the hell thought this was “unexpected”?

    What’s next? ChatGPT vs. Microwave to see which can make instant oatmeal the fastest? 😂

    • valgarf@discuss.tchncs.de 5 days ago

      Considering how much heat the servers probably generate, ChatGPT might have a decent chance in that competition 😁

      • Showroom7561@lemmy.ca 5 days ago

        Air-fried oatmeal, FTW!

  • Michal@programming.dev 5 days ago

    A simple calculator will also beat it at math.

  • thefartographer@lemm.ee 5 days ago

    Atari game programmed to know chess moves: knight to B4

    ChatGPT: many Redditors have credited Chesster A. Pawnington with inventing the game when he chased the queen across the palace before crushing the king with a castle tower. Then he became the king and created his own queen by playing “The Twist” and “Let’s Twist Again” at the same time.

  • Wytch@lemmy.zip 5 days ago

    This article makes ChatGPT sound like a deranged blowhard, blaming everything but its own ineptitude for its failure.

    So yeah, that tracks.

  • Opinionhaver@feddit.uk 5 days ago

    Isn’t this kind of like ridiculing that same Atari for not being able to form coherent sentences? It’s not all that surprising that a system not designed to play chess loses to a system designed specifically for that purpose.

    • GammaGames@beehaw.org 5 days ago

      Pretty much, but the marketers are still trying to tell people it can totally do logic anyway. Hopefully the Apple paper opens some eyes.

      • mormund@feddit.org 5 days ago

        For anyone wondering what “the” Apple paper is: machinelearning.apple.com/…/illusion-of-thinking

  • cypherpunks@lemmy.ml 5 days ago

    This article buries the lede so much that many readers probably miss it completely: the important takeaway here, which is clearer in The Register’s version of the story, is that ChatGPT cannot actually play chess at all:

    “Despite being given a baseline board layout to identify pieces, ChatGPT confused rooks for bishops, missed pawn forks, and repeatedly lost track of where pieces were.”

    To actually use an LLM as a chess engine without the manual intervention this person had to provide, you would need to combine it with other software that automatically asks it for a different next move every time it suggests an invalid one. And even if you did that, it would still tend to lose, even to chess engines much older than Atari’s Video Chess.
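A minimal sketch of that wrapper loop, assuming some real rules engine supplies the set of legal moves; `ask_until_legal` and `suggest_move` are hypothetical names, and the canned moves below stand in for actual model output:

```python
from typing import Callable, Iterable, Optional


def ask_until_legal(
    suggest_move: Callable[[int], str],
    legal_moves: Iterable[str],
    max_attempts: int = 5,
) -> Optional[str]:
    """Re-prompt a move source until it proposes a legal move.

    suggest_move is a hypothetical stand-in for an LLM call; it gets the
    attempt number so a real wrapper could tell the model its last move
    was illegal. legal_moves is assumed to come from a real rules engine.
    Returns None if no legal move is suggested within max_attempts.
    """
    legal = set(legal_moves)
    for attempt in range(max_attempts):
        move = suggest_move(attempt).strip()
        if move in legal:
            return move
    return None


# Toy demo: a fake "model" suggests two illegal moves before a legal one.
canned = ["Bxh7", "Ra9", "e4"]
chosen = ask_until_legal(lambda i: canned[i], {"e4", "d4", "Nf3"})
print(chosen)  # e4
```

The loop only guarantees that a move is legal, not that it is any good, which is why such a setup would still tend to lose.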

    • MagicShel@lemmy.zip 5 days ago

      You probably could train an AI to play chess and win, but it wouldn’t be an LLM.

      In fact, let’s go see…

      • Stockfish: Open-source and regularly ranks at the top of computer chess tournaments. It uses advanced alpha-beta search and a neural network evaluation (NNUE).

      • Leela Chess Zero (Lc0): Inspired by DeepMind’s AlphaZero, it uses deep reinforcement learning and plays via a neural network with Monte Carlo tree search.

      • AlphaZero: Developed by DeepMind, it reached superhuman levels using reinforcement learning and defeated Stockfish in high-profile matches (though not under perfectly fair conditions).
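Stockfish's real search is far more elaborate, and it scores positions with its NNUE evaluation rather than fixed leaf values, but the core alpha-beta idea can be sketched on a toy game tree. This is plain textbook minimax with alpha-beta pruning, not Stockfish's code; the nested-list tree and its leaf scores are made up:

```python
import math
from typing import Union

# A game tree as nested lists: lists are positions with child positions,
# numbers are leaf evaluations from the maximizing player's point of view.
Tree = Union[int, float, list]


def alphabeta(node: Tree, alpha: float, beta: float, maximizing: bool) -> float:
    """Minimax value of node, skipping branches that cannot change the result."""
    if not isinstance(node, list):
        return node  # leaf: static evaluation
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # beta cutoff: the minimizer will avoid this line
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True))
        beta = min(beta, value)
        if alpha >= beta:
            break  # alpha cutoff: the maximizer already has something better
    return value


# Depth-3 toy tree, maximizer to move.
tree = [[[3, 5], [6, 9]], [[1, 2], [0, -1]]]
print(alphabeta(tree, -math.inf, math.inf, True))  # 5
```

On this tree the search never evaluates the leaves 9, 0, and -1, yet returns the same value plain minimax would.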

      Hmm. Neural networks and reinforcement learning. So non-LLM AI.

      you can play chess against something based on chatgpt, and if you’re any good at chess you can win

      You don’t even have to be good. You can just flat out lie to ChatGPT because fiction and fact are intertwined in language.

      “You can’t put me in check because your queen can only move 1d6 squares in a single turn.”

  • oce@jlai.lu 5 days ago

    A PE teacher got absolutely wrecked by a former Olympic sprinter at a sprint competition.

    • thefartographer@lemm.ee 5 days ago

      Change “PE teacher” to “stack of health magazines” and it’s a more accurate equivalence.
