• 0 Posts
  • 4 Comments
Joined 3 years ago
Cake day: June 30th, 2023


  • I’m confident enough about this that I’ve registered a prediction on Long Bets.

    “No LLM-based AI will surpass 70% on the ARC-AGI-3 leaderboard, with a cost of $1000 or less, before June 2028.” - https://longbets.org/973/

    I’m curious if you’d really disagree with the premise, and would you (or anyone here on Lemmy) be willing to put money down to challenge the bet? (Long Bets always donates any winnings to a registered non-profit of the winner’s choice, though it’s a $200 minimum).

    Are you saying that LLMs can currently reason? How do you explain their low score on ARC-AGI-3? Do you think Transformer LLM architectures will be capable of reasoning within the next two years without some new breakthrough? What mechanism in the architecture allows them to reason?


  • Companies are only shooting themselves in the foot in the long term if they stop hiring junior engineers. Most of that work is not being replaced; it’s being shifted to the senior engineers, who now have to babysit AIs that can’t actually do the job for any extended period of time. If you’re accepting AI code into a codebase without thorough review, then you’re also shooting yourself in the foot in the long term, because even the senior engineers won’t know the codebase after a while. If you’re doing thorough reviews in order to catch the AI bugs, well, then you’re probably better off coding it yourself correctly in the first place, unless you’ve already allowed your skills to atrophy.

    Do you really think AIs are reasoning when you ask them to troubleshoot technical issues? You may be lucky if the issue is already in their training data, but with anything even slightly novel, the AI is just going to bullshit an answer, and I guess you’re going to follow it blindly, since you don’t know enough to come up with an answer yourself.

    Besides all that, how is open source AI going to stop junior developers from losing their jobs?


  • The word “intelligence” is doing a lot of heavy lifting here. LLMs lack any mechanism for true logical reasoning, and they always will by nature. This is why they fail at simple questions like “the car wash test”. It’s also why agents are expensive: they just flail around in token-hungry “reasoning loops” until they happen to come across a correct solution. And it’s why Claude Opus 4.8 (High) only scores 1.5% on the ARC-AGI-3 benchmark at a cost of $10,000.

    This kind of panic is just part of the hype. Wake me up when real intelligence arrives.