So I signed up for a free month of their crap (Gemini) because I wanted to test whether it solves novel variants of the river crossing puzzle.

Like this one:

You have a duck, a carrot, and a potato. You want to transport them across the river using a boat that can take yourself and up to 2 other items. If the duck is left unsupervised, it will run away.

Unsurprisingly, it does not:

https://g.co/gemini/share/a79dc80c5c6c

https://g.co/gemini/share/59b024d0908b

The only two new things seem to be that old variants are no longer novel to it, and that it is no longer limited to producing incorrect solutions: now it can also incorrectly claim that the solution is impossible.
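For the record, the puzzle is trivially solvable in three crossings, which a dozen lines of brute-force search will confirm. A minimal sketch (the state encoding and names here are mine, not anything the model produced):

```python
from collections import deque
from itertools import combinations

ITEMS = ("duck", "carrot", "potato")
DUCK = ITEMS.index("duck")
CAPACITY = 2   # the boat holds you plus up to 2 items

def valid(you, sides):
    # the only constraint: the duck may never be on a bank without you
    return sides[DUCK] == you

def moves(you, sides):
    here = [i for i, side in enumerate(sides) if side == you]
    # ferry 0, 1, ... up to CAPACITY items across with you
    for n in range(CAPACITY + 1):
        for cargo in combinations(here, n):
            new_sides = tuple(1 - s if i in cargo else s
                              for i, s in enumerate(sides))
            yield cargo, 1 - you, new_sides

def solve():
    start = (0, (0, 0, 0))               # you and everything on the left bank
    queue, seen = deque([(start, [])]), {start}
    while queue:
        (you, sides), path = queue.popleft()
        if you == 1 and all(s == 1 for s in sides):
            return path                   # shortest list of crossings
        for cargo, nyou, nsides in moves(you, sides):
            if valid(nyou, nsides) and (nyou, nsides) not in seen:
                seen.add((nyou, nsides))
                queue.append(((nyou, nsides),
                              path + [tuple(ITEMS[i] for i in cargo)]))

print(solve())
# prints [('duck', 'carrot'), ('duck',), ('duck', 'potato')]: three crossings
```

The whole state space is sixteen states, so "impossible" is not a close call.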

I think chain of thought / reasoning is a fundamentally dishonest technology. At the end of the day, just like older LLMs, it requires that someone has already solved a similar problem, either online or perhaps in a problem-solution pair generated to augment the training data.

But it outputs quasi-reasoning to pretend that it is actually solving the problem live.

  • YourNetworkIsHaunted@awful.systems · 5 days ago

    I don’t think that the actual performance here is as important as the fact that it’s clearly not meaningfully “reasoning” at all. This isn’t a failure mode that happens if it’s actually thinking through the problem in front of it and understanding the request. It’s a failure mode that comes from pattern matching without actual reasoning.

    • diz@awful.systems (OP) · 4 days ago

      It’s a failure mode that comes from pattern matching without actual reasoning.

      Exactly. Also, looking at its chain-of-wordvomit (which apparently I can’t share other than by cutting and pasting it somewhere), I don’t think this is the same as GPT-4 overfitting to the original river crossing and always bringing items back needlessly.

      Note also that in one example it discusses moving the duck and another item across the river (so it did register "up to two other items"); it is not ignoring the prompt, and it isn’t even trying to bring anything back. And its answer (calling the puzzle impossible) has nothing to do with the original.

      In the other one it does bring items back; it tries different orders and even finds an order that actually works (with two unnecessary moves), but, not being an AI fanboy reading tea leaves into its own output, it still gives the wrong answer.

      Here’s the full logs:

      https://pastebin.com/HQUExXkX

      Content warning: AI wordvomit so bad that even Google’s own tool hides it behind a fold.

      • YourNetworkIsHaunted@awful.systems · 4 days ago

        That’s fascinating, actually. Like, it seems like it shouldn’t be possible to create this level of grammatically correct text without understanding the words you’re using, and yet, even immediately after defining "unsupervised" correctly, the system still (supposedly) sets about applying a baffling number of alternative constraints that it seems to pull out of nowhere.

        OR, alternatively, despite letting it "cook" for longer and pregenerate a significant volume of its own additional context before the final answer, the system is still, at the end of the day, an assembly of stochastic parrots that don’t actually understand anything.

        • diz@awful.systems (OP) · 4 days ago

          Yeah, it really is fascinating. It follows some sort of recipe to try to solve the problem, as if it’s been trained to work a bit like an automatic algebra system.

          I think they employed a lot of people to write generators of variants of select common logical puzzles, e.g. river crossings with varying boat capacities and constraints, generating both the puzzle and the corresponding step-by-step solution with "reasoning" and a re-print of the state of the items at every step and all that.
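          A toy version of the kind of generator being described here, purely illustrative (the item pool and output format are made up; nobody outside those labs knows what their pipelines actually look like):

          ```python
          import random

          ITEM_POOL = ["carrot", "potato", "cabbage", "onion", "pumpkin"]

          def make_variant(rng):
              # Toy generator: an unconstrained river crossing with random items
              # and boat capacity, plus a worked solution that re-prints the
              # state of both banks after every crossing.
              items = rng.sample(ITEM_POOL, rng.randint(3, 5))
              capacity = rng.randint(1, len(items) - 1)
              puzzle = (f"You have {', '.join(items)}. Your boat carries you and "
                        f"up to {capacity} items. Get everything across the river.")
              left, right, steps = list(items), [], []
              while left:
                  load, left = left[:capacity], left[capacity:]
                  right += load
                  steps.append(f"Take {', '.join(load)} across. "
                               f"Left bank: {', '.join(left) or 'empty'}. "
                               f"Right bank: {', '.join(right)}.")
                  if left:
                      steps.append(f"Row back alone. Left bank: {', '.join(left)}. "
                                   f"Right bank: {', '.join(right)}.")
              return puzzle, "\n".join(steps)

          puzzle, solution = make_variant(random.Random(0))
          print(puzzle, solution, sep="\n\n")
          ```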

          It seems to me that their thinking is that successive parroting can amount to reasoning, if it’s parroting well enough. I don’t think it can. They have this one-path approach, where it just tries doing steps and representing state, always trying the same thing.

          What they need for this problem is a different kind of step, reduction: the duck cannot be left unsupervised -> the duck must be taken with me on every trip -> rewrite the problem without the duck and with the boat capacity reduced by 1 -> solve -> rewrite the solution with "take the duck with you" added to every trip.
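          To make that reduction concrete, a minimal sketch (function names and structure are mine, purely illustrative):

          ```python
          def solve_reduced(items, capacity):
              # Trivial sub-problem: no constraints left, just ferry the items
              # across with the given boat capacity, rowing back empty between loads.
              trips, remaining = [], list(items)
              while remaining:
                  load, remaining = remaining[:capacity], remaining[capacity:]
                  trips.append(load)        # cross with this load
                  if remaining:
                      trips.append([])      # row back for the next load
              return trips

          def solve_with_duck(items, capacity):
              # The reduction: "the duck can't be left unsupervised" ->
              # "the duck rides with me on every trip" -> solve for the other
              # items with one less seat -> splice the duck back into every trip.
              reduced = solve_reduced([i for i in items if i != "duck"], capacity - 1)
              return [["duck"] + trip for trip in reduced]

          print(solve_with_duck(["duck", "carrot", "potato"], capacity=2))
          # [['duck', 'carrot'], ['duck'], ['duck', 'potato']]
          ```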

          But if they add this, then there are two possible paths it can take at every step, and this thing is far too slow to brute-force the right one. They may get it to solve my duck variant, but at the expense of making it fail a lot of other variants.

          The other problem is that even the most seemingly elementary reasoning involves very many applications of basic axioms. This is what doomed symbol-manipulation "AI" in the past, and it is what is dooming it now.