4 Comments
Swen Werner

Autoregression is a generative method, not a linguistic description. Language sequences can be generated autoregressively, but language itself is structured by grammar, logic, and semantic constraints. These structures encode operations like comparison, implication, and hierarchy - features that exceed local token dependency. Transformer models simulate reasoning by statistically approximating such constraints, but this is not cognition, and it is not conceptual understanding. Recognizing language as structured logic is not a novel claim. What is novel - dangerously so - is your assertion that sequence generation alone constitutes comprehension. That is an error in understanding.

Takim Williams

Cool! I think an autoregressive model of cognition also explains the prevalence of contradictory beliefs better than traditional models do. An individual simply has the propensity to espouse different beliefs in different contexts...

Eric Borg

This is brilliant! Here’s another token of proof for your thesis. If there were a storage file in the brain containing my memory of “the A, B, Cs,” then in some sense they would be written down in the brain for my access. When reading from an actual sheet of paper, I have no problem reciting them backwards. Because I can’t recite them backwards from memory, however, it must be that my memory exists one step at a time by means of the token generation you propose.
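This intuition can be made concrete with a toy sketch (my own illustration, not from the post, assuming a simple next-token framing): a "memory" that stores only forward transitions between letters can recite the alphabet forwards but carries no information for going backwards.

```python
# Toy illustration: a forward-only next-token "memory" of the alphabet.
# It can generate A..Z one step at a time, but holds no backward transitions,
# so reciting from Z yields nothing further.
from string import ascii_uppercase

# Learn forward transitions only: A -> B, B -> C, ..., Y -> Z.
forward = {a: b for a, b in zip(ascii_uppercase, ascii_uppercase[1:])}

def recite(start, steps, table):
    """Generate tokens autoregressively from a transition table."""
    out = [start]
    for _ in range(steps):
        nxt = table.get(out[-1])
        if nxt is None:  # no transition stored in this direction
            break
        out.append(nxt)
    return "".join(out)

print(recite("A", 25, forward))  # full alphabet, forwards
print(recite("Z", 25, forward))  # just "Z": backward recall was never encoded
```

Reciting backwards from paper works because the page, unlike this table, is a random-access record.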

Ian Jobling

'When you form a new memory, you're not creating a record but altering the generative tendencies of your neural networks. This view of learning helps explain why practice and repetition work: they strengthen the parameters that generate certain responses, making them more likely to emerge when similar contexts arise in the future. It also explains why learning is rarely all-or-nothing—the parameters continue to adjust with each exposure, gradually refining the generated outputs rather than suddenly creating a perfect "file."'

This is an intriguing idea of how learning and memory work, but I can't understand how a memory could be formed solely by adjusting the parameters of a neural network. Say I remember doing a 250 lb. deadlift at the gym last week. It seems there has to be some storage of basic information about the gym, deadlifting, numbers, and weight; only on that basis could the network strengthen the association between 250 and deadlifting.
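The quoted passage's "adjusting parameters" view can be sketched minimally (a hypothetical toy of my own, with made-up names and numbers, not the author's model): repeated exposures nudge an associative weight upward, and "recall" is just generating the response with the strongest learned weight, not reading a stored record.

```python
# Toy sketch of memory as parameter adjustment: each exposure strengthens
# a (context, response) weight; recall reads out the strongest association.
from collections import defaultdict

weights = defaultdict(float)  # (context, response) -> associative strength

def expose(context, response, rate=0.5):
    """One exposure nudges the parameter toward 1.0 (practice effect)."""
    key = (context, response)
    weights[key] += rate * (1.0 - weights[key])

def recall(context, candidates):
    """Recall = generating whichever candidate has the strongest weight."""
    return max(candidates, key=lambda r: weights[(context, r)])

for _ in range(3):                # repetition gradually refines the weight
    expose("deadlift", "250 lb")
expose("deadlift", "135 lb")      # a single, weaker competing exposure

print(recall("deadlift", ["135 lb", "250 lb"]))  # prints "250 lb"
```

Note the sketch still presupposes the units "deadlift" and "250 lb" exist to be associated, which is exactly the commenter's question about where the basic representations come from.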
