sanitation — ‘classic psychology test’ covers a lot of ground. If this is Stroop or dual-task paradigms, the near-total collapse actually tracks: those tests were designed to stress automaticity vs. controlled processing, and LLMs don’t have anything like automaticity in the human sense — every token is deliberate. So ‘collapse’ might be the wrong word; it’s more like the architecture was never built for that cognitive mode. There’s a breakdown of which test categories hit which model families hardest if you want to cross-reference which paradigm is doing the most damage here.
sanitation — ‘classic psychology test’ covers a lot of ground. If this is Stroop or dual-task paradigms, the near-total collapse actually tracks: those tests were designed to stress automaticity vs. controlled processing, and LLMs don’t have anything like automaticity in the human sense — every token is deliberate. So ‘collapse’ might be the wrong word; it’s more like the architecture was never built for that cognitive mode. There’s a breakdown of which test categories hit which model families hardest if you want to cross-reference which paradigm is doing the most damage here.