One of the most profound insights revealed by large language models is that language is not just autoregressive—it is autogenerative (a term I made up; sue me).
Autoregression is a mechanism: it generates a sequence by producing each new item based on the ones that came before. This is how large language models like GPT work—they predict text one token at a time, using only the sequence so far. It’s a simple but powerful loop, and it’s likely fundamental to how human language itself unfolds. Many sequence types can be modeled autoregressively: the Fibonacci sequence, for example, is generated by summing the two previous values. Some time-series forecasting models generate future values from past observations by recursively feeding their own predictions back in as inputs. But in all these examples, the rule for generation is extrinsic to the observed sequence: it has to be defined outside the data. The system doesn’t contain within itself the logic of its own continuation; that logic must be supplied.
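To make the loop concrete, here is a minimal sketch (in Python, my construction rather than anything from the post) of the two flavors of autoregression just mentioned: the Fibonacci recurrence and a forecaster that feeds its own predictions back in. In both cases, notice that the update rule is written by hand, outside the data.

```python
def fibonacci(n):
    """Autoregressive generation with a hand-written rule:
    each new value is the sum of the two before it."""
    seq = [0, 1]
    while len(seq) < n:
        seq.append(seq[-1] + seq[-2])  # the rule lives here, not in the data
    return seq

def forecast(history, steps, predict_next):
    """Recursive forecasting: feed each prediction back in as input.
    `predict_next` is whatever model you supply -- the extrinsic rule."""
    seq = list(history)
    for _ in range(steps):
        seq.append(predict_next(seq))
    return seq[len(history):]

print(fibonacci(10))   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
print(forecast([1.0, 2.0, 3.0], 4, lambda s: 2 * s[-1] - s[-2]))  # [4.0, 5.0, 6.0, 7.0]
```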
Autogeneration, by contrast, is a property of the sequence itself: the instructions for its own continuation are embedded in its internal structure. In an autogenerative system, you don’t bring a rule to the data—the rule is latent in the data, waiting to be uncovered. Crucially, that embedded logic can be recovered by any learning mechanism capable of spotting the right patterns—autoregression is one such mechanism, but diffusion or other sequence-to-sequence methods could tap the same well.
That’s exactly what happens with LLMs. When a transformer is trained on a vast corpus, it isn’t handed a grammar or a semantic calculus; it simply optimizes next-token prediction. In doing so, it discovers the deep statistical regularities that already govern language. The model’s weights are an efficient compression of those regularities, not an externally imposed rulebook. So when an LLM continues a prompt, it isn’t applying some outside formula; it’s letting the corpus’s own autogenerative structure speak through next-token generation.
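As one concrete illustration (my example, using the Hugging Face transformers library and GPT-2, not something prescribed by this post), the generation loop itself is almost trivially simple: keep appending the model’s choice of next token. Everything interesting lives in the learned weights.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

# Start from a prompt and repeatedly append the model's most likely next token.
input_ids = tokenizer("Language is", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(20):
        logits = model(input_ids).logits        # (1, seq_len, vocab_size)
        next_id = logits[0, -1].argmax()        # greedy choice of the next token
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```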
This is what I mean by autogenerative: a system is autogenerative if—and only if—the function needed to generate its next state is recoverable from the internal statistical structure of the system itself, without requiring any external symbolic rule or supervisory signal.
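To see what “recoverable from the internal statistical structure” can mean in the simplest possible case, here is a toy sketch (again my construction, and vastly simpler than language): in contrast to the hand-written Fibonacci rule above, here the rule is read off the data itself by ordinary least squares.

```python
import numpy as np

# A sequence produced by some unknown rule; we only observe the values.
seq = np.array([0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89], dtype=float)

# Frame rule-recovery as regression: predict x[t] from (x[t-2], x[t-1]).
X = np.stack([seq[:-2], seq[1:-1]], axis=1)   # inputs: pairs of previous values
y = seq[2:]                                   # targets: the next value

coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coeffs)   # ~[1.0, 1.0] -> x[t] = x[t-2] + x[t-1]: the generating rule, recovered from the data alone
```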
As noted above, autoregression is just one way to recover and exploit such autogenerative structure. You could use other methods, like diffusion models, or even non-sequential sampling schemes. Although I believe language was ‘designed’ to be generated autoregressively, the point is not the technique. It’s that the structure is already there. The generative engine is inside the data, waiting to be tapped.
This means that language, in a very real sense, is speaking for itself. LLMs, and (I argue) our brains, are not imposing linguistic structure; they are instantiating it. We are channels for a self-unfolding system.
Right now, this newly minted category of sequence types has a membership of exactly one: natural language. But language likely didn’t appear in a computational vacuum. It may be just one visible expression of a deeper principle of cognition. Other domains—perception, memory, motor control—could also operate autogeneratively, unfolding from their own past states in order to generate the next ‘token’ of thought or behavior. The physical world is filled with deep, predictive structure—temporal, spatial, causal. That’s what physics is: the discovery that regularities in the present allow us to predict the future. That’s also how we navigate everyday life, physically, psychologically, socially. If cognition evolved to exploit that structure, then autogeneration may not be an exception—it may be the rule.
Language may be the clearest case—but it is likely not the only one.
Note: If you find this topic interesting, you might want to check out my interview on Theories of Everything with Curt Jaimungal.
A view from a different perch: the origin of language is the discovery that moral cooperation is generative within a shared context. Language expands the shared context and maintains the cooperation. This creates collective intelligence processing about the shared context. Language scales moral cooperation endlessly, is composable, and is antifragile to stressors of new language, new uses, and narrative control. Language is the cooperation protocol and contains the value exchanged of processed intelligence about the shared context. Stressors welcome. AO
I'm glad you're opening up to language not being unique! As you perhaps know, I think language is only unique by virtue of its complexity, which allows for memetic evolution and replication. This is indeed a big deal, but there is no fundamental feature of language that no other animal shares with us.
Oh, language IS motor control! It's a special case of motor control used to manipulate the external world (and, reflectively, the internal world in the case of talking or thinking to oneself).
Yes, given starting conditions and the laws of nature, everything unfolds.