• placebo@lemmy.zip
    17 hours ago

    Subjectively, it feels similar to models we used a year or two ago. Not that drastically different from what Anthropic and OpenAI offer today, but slightly worse. For instance, for complex coding tasks it offers basic solutions, while Claude often offers more options and details - as if it knows more.

    Objectively, there are the benchmarks. Mistral looks comparable to other open-weight models (as another user mentioned), but not as good as the closed models from Anthropic and OpenAI.

    • onnekas@sopuli.xyz
      14 hours ago

      Judging by the out-of-the-box chat results, I kind of agree with you that Mistral feels behind.

      However, there are some features that I really like and that make the experience even better than ChatGPT.

      1. Agents are pretty cool and, with some setup, produce very good results.
      2. Managing libraries for documents/context is better than in ChatGPT. Also, adding specific libraries to agents is nice.
      3. Scheduling tasks has just been added and I want to try that out.

      (I have never tried the paid version of any LLM chat so I can only compare free tiers)

    • TorstenTyp@feddit.nu
      16 hours ago

      I see. That’s about the time they all got so good that I stopped trying to keep up with the latest benchmarks. It works perfectly for my needs, so I definitely wouldn’t dismiss it for anyone wanting to switch to a European alternative.

      • placebo@lemmy.zip
        16 hours ago

        Sure, totally depends on your needs. But it’d be great if we had one of them frontier models in Europe.

        • Jiral@lemmy.world
          15 hours ago

          Mistral has recently shown a good trajectory of improvement. It already matters that there is a European mid-range open-weight model that can compete. (Frontier models need a lot more resources, so it is important to compare apples with apples.) This is good enough for many applications where data security and sovereignty are prime concerns. Of course, it would be good to have a frontier model. Let’s see how Large 4 will perform when we get there.