• 🌸𝓯𝓵𝓸𝔀𝓮𝓻🌸@sh.itjust.works
    19 hours ago

    You don’t want to replace them, as that has legal issues. But an AI acting as a backseat driver, evaluating their decisions and checking what the consequences would be so it can report that to investors, would also be very useful.

    • shrugs@lemmy.world
      14 hours ago

      Don’t anthropomorphize AI!

      An AI doesn’t evaluate anything, and it doesn’t check for consequences. All an AI does is predict the next word.

      Do I take the car to the carwash or do I walk?

      If it’s only 300 m away, you should walk.

      Sure, now predict the future please *facepalm*

      • boonhet@sopuli.xyz
        13 hours ago

        The carwash thing applies to low-end and older models. Here’s Claude from the lowest to the highest model, ignoring the banned Fable.

        • replicat@lemmy.world
          11 hours ago

          They altered the training data to address this challenge. The underlying issue wasn’t solved in any way. Don’t be naive.

          • boonhet@sopuli.xyz
            9 hours ago

            It takes months to train a model, and there were already models that got it right back when the question was popular, as long as thinking was enabled.

            Also, if they were optimising for this question, why not update their lower-end model (Haiku) as well?

            The interesting question would be what percentage of humans get it wrong. Smaller than for LLMs for sure, but I somehow doubt it’s 0.

            • mabeledo@lemmy.world
              11 minutes ago

              Models aren’t retrained from zero. They can be fine-tuned, or the developers could even have added a routine to handle specific cases like this.

              For example, Claude used to have a routine that would call external tools embedded in the app to parse structured data and transform it. Not sure how it does it now.