Predicting the Demise of Predictive Coding
Why large language models point toward a better understanding of the brain
Predictive coding has become one of the most widely accepted frameworks for understanding how the brain works. According to this theory, the brain operates as a kind of hierarchical prediction engine. Higher cortical areas generate top-down expectations of what incoming sensory input should look like; lower levels compare these predictions with actual input and send the difference, the prediction error, back up the hierarchy, where it drives updating of the higher-level model. Perception, in this view, is an inference process: the brain constantly tests hypotheses about the world and corrects them based on surprise.
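To make the claim concrete, here is a deliberately tiny, one-level sketch of the guess-and-check loop the theory describes: a higher-level state issues a top-down prediction of the input, the lower level computes the error, and the error updates the higher-level state. The linear generative model, weights, and learning rate below are illustrative assumptions, not any specific published implementation.

```python
import numpy as np

# Minimal one-level caricature of the predictive-coding loop: a higher-level
# state generates a top-down prediction of the input, the lower level computes
# the prediction error, and that error updates the higher-level state.
# The linear model and constants are illustrative assumptions only.

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))   # top-down generative weights (higher level -> input level)
mu = np.zeros(4)              # higher-level hypothesis about the causes of the input
lr = 0.05                     # update rate

def predictive_coding_step(x, mu):
    prediction = W @ mu                 # top-down expectation of the sensory input
    error = x - prediction              # prediction error computed at the lower level
    mu = mu + lr * (W.T @ error)        # error is sent back up and drives updating
    return mu, error

x = rng.normal(size=8)                  # actual sensory input
for _ in range(100):
    mu, error = predictive_coding_step(x, mu)

# The error shrinks toward whatever part of the input the model cannot explain.
print(np.linalg.norm(error))
```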
The appeal of this model is clear. It offers a unified explanation for perception, learning, and attention. It aligns well with Bayesian inference. And it has found empirical support in experiments showing that neural activity tends to increase in response to unexpected stimuli. But despite its elegance, I believe predictive coding is built on a conceptual mistake—one that becomes clear when we consider what generative systems like large language models are actually doing.
The word ‘prediction’ is already somewhat abused and overused outside of the predictive-coding context. For example, we say that GPT “predicts the next word.” But what’s happening in these models is not prediction in the sense envisioned by predictive coding. GPT is trained to minimize next-token loss across a vast corpus of text. At each step, it learns to generate the next token in a sequence based on the ones that came before. But crucially, it is not predicting in the sense of modeling possible external outcomes. It does not guess what someone is likely to say and then check that guess. It does not forecast. Instead, it generates the next token directly—an output that flows from its internalized structure, constrained by the logic of the sequence itself.
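As a rough illustration of what "minimize next-token loss" means in practice, here is a toy autoregressive model trained with cross-entropy against whichever token actually came next. The architecture, sizes, and data are placeholders, not GPT's actual configuration; the point is only that nothing in the objective asks the model to state a guess and then verify it against an outcome.

```python
import torch
import torch.nn as nn

# Toy autoregressive language model: at every position it outputs a
# distribution over the next token, and the training target is simply the
# token that actually followed in the corpus. Sizes and architecture are
# placeholder assumptions, not GPT's real configuration.

vocab_size, d_model = 100, 32

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):              # tokens: (batch, seq)
        hidden, _ = self.rnn(self.embed(tokens))
        return self.head(hidden)            # next-token logits at every position

model = TinyLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, vocab_size, (4, 16))   # stand-in training batch
opt.zero_grad()
logits = model(tokens[:, :-1])
loss = loss_fn(logits.reshape(-1, vocab_size),   # compare each position's output
               tokens[:, 1:].reshape(-1))        # with the token that actually came next
loss.backward()
opt.step()
```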
What it learns through this process is not what is most likely to happen. It learns how language works—the structure of linguistic unfolding. Its job is not to match some external future, but to produce the next appropriate step in a coherent trajectory. This is what gives it the ability to generate novel and meaningful language. Crucially, what makes this ability useful—and what the model implicitly optimizes for—is the functional role of language in human life: to coordinate behavior across individuals by sharing plans, giving instructions, and communicating knowledge. This is better referred to as ‘optimized generation’ than ‘prediction.’
Still, there is something subtly and remarkably like prediction happening here. In the act of generating the next token, the model must account for the trajectory it is on—and that means implicitly representing where that trajectory might go. In this sense, language models do model the future, but only insofar as the structure of the present already constrains it. This is the miracle of autoregression: by learning to continue a sequence coherently, the model encodes the likely futures embedded in its own generative flow. But this isn’t prediction in any classical sense. It isn’t about guessing an outcome. It’s about preserving coherence consistent with the structure observed across the learned corpus. The future isn’t predicted, but it is represented implicitly in the generative process.
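One way to see how likely futures can live inside a step-by-step generative rule is to sample continuations from a toy model and watch a distribution over futures appear, even though nothing in the model ever computes a forecast. The corpus and the count-based bigram rule below are invented purely for illustration.

```python
from collections import Counter, defaultdict
import random

# Toy count-based bigram generator: it only ever learns "how sequences tend
# to continue," yet sampling it forward repeatedly reveals a distribution
# over possible futures. The corpus is made up for illustration.

corpus = "the cat sat on the mat the cat ran to the door".split()
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def continue_from(word, steps=3):
    out = [word]
    for _ in range(steps):
        options = counts[out[-1]]
        if not options:
            break
        # Generate the next step directly from learned structure; no separate
        # forecast is computed or checked against an outcome.
        out.append(random.choices(list(options), weights=options.values())[0])
    return " ".join(out)

# The "future" is never represented explicitly, but it shows up in the spread
# of sampled continuations.
print([continue_from("the") for _ in range(5)])
```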
And this is why optimized generation offers a powerful alternative to the predictive coding view of the brain. Because once we understand how a system like GPT can produce complex, adaptive output—not by comparing expectations to reality, but by continually generating the next step based on context and learned structure—it becomes plausible to ask whether the brain might operate similarly.
Imagine the brain not as a predictor, but as a dynamical generator. At every moment, it produces an internal state—be it perceptual, motor, or cognitive—conditioned on the trajectory it is currently on. This generative process reflects what has happened so far, what the body is doing, what the environment affords, and what utilities are being optimized. The “expectation” is not represented as a separate hypothetical model. It is embedded in the current flow of neural activity.
So what happens when the world doesn’t cooperate—when input deviates sharply from the system’s generative path?
The system adjusts. It recalibrates. We observe heightened activity. In predictive coding, this is taken as evidence of prediction error: the brain guessed wrong. But under the generative view, it’s something else entirely. It’s a perturbation of an unfolding process. The brain was moving forward in time along a trajectory shaped by prior states. Now it needs to redirect. That redirection has a metabolic cost. It produces a signal. It may even look like surprise. But it’s not the result of failed forecasting. It’s the result of trajectory disruption.
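The contrast can be made concrete with a toy recurrent generator in the spirit of the dynamical picture above: it has no prediction unit and stores no expectation, it only produces its next state from its current state and input, yet an input that deviates from the ongoing trajectory produces a burst of state change that looks like a surprise signal. All weights, inputs, and the use of state-change magnitude as a proxy for heightened activity are assumptions made for illustration, not a model of real neural circuitry.

```python
import numpy as np

# Toy recurrent generator with no prediction and no stored expectation: it
# just produces the next state from the current state and input. A deviating
# input at t = 40 perturbs the trajectory, and the amount of state change
# spikes, resembling a "surprise" response.

rng = np.random.default_rng(1)
W = rng.normal(scale=0.1, size=(16, 16))   # recurrent weights
V = rng.normal(scale=0.5, size=(16, 4))    # input weights
state = np.zeros(16)

def step(state, inp):
    return np.tanh(W @ state + V @ inp)    # generate the next state; nothing is checked

familiar = rng.normal(size=4)              # input the trajectory has settled around
deviant = familiar + 3.0 * rng.normal(size=4)

activity = []
for t in range(60):
    inp = deviant if t == 40 else familiar
    new_state = step(state, inp)
    activity.append(np.linalg.norm(new_state - state))  # proxy for the cost of redirecting
    state = new_state

# Near zero once the trajectory settles, then a spike at the perturbation.
print(round(activity[39], 4), round(activity[40], 4))
```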
This reframing thus accounts for the empirical findings that predictive coding was meant to explain—such as increased neural response to unexpected stimuli—but it removes the need for a two-stage architecture of guessing and checking. There is no need for the brain to represent predictions. There is no need for it to encode what it believes the world will do next. There is only the current state, and the need to generate the next one.
This view is not only more parsimonious. It may be more biologically plausible. Neural systems are fast, context-sensitive, and deeply embodied. They are not symbolic inference machines. They do not have time to simulate alternate futures and choose among them. What they need to do—what we observe them doing—is continuously generate states that support adaptive behavior. Sometimes those states match what the world delivers. Sometimes they don’t. But in either case, the system’s job is not to predict. It’s to keep going and to do so usefully.
This is what LLMs show us—not because they are predictive, but because they aren’t. They reveal that rich, structured, intelligent-seeming behavior can emerge from a system that simply generates what fits, without modeling an external future. Their success gives us a working example of what a generative cognitive architecture might look like—one that is internally directed, dynamically updated, and fully capable of adapting to new input without needing to guess what’s coming.
If the brain works more like this—if it is a generative system, not a predictive one—then we need to revise our theoretical models. We need to ask what the system is optimizing for, and how that optimization drives its generative flow. We need to stop imagining the brain as a forecaster and start understanding it as a moment-to-moment constructor of coherent and useful internal states.
Coherent how? Useful for what?
I predict that these will be the new questions for neuroscience.
This is the first time I've read a nonrepresentational account of cognition that made sense to me.
In research on expert decision-making, experts often report that they don't feel like they are making decisions at all. This is because they pattern-match to a prototypical situation (not a particular situation, but an amalgamation of similar situations). These prototypes are likely constructed in the moment and carry with them cues to look for, goals to pursue, expectations to watch for, and actions to take.
This account strikes me as consistent with that view. The prototype serves as the non-representational constraint. And it makes sense why experts wouldn't feel like they were making a decision.
Consider a proposed view of intelligence processing that fits well in your framework. Humans in coordination are doing this, human brains are doing this, humans creating language for communication are doing this, and LLMs are doing this. All the same: binary rejection selection. There is a traveling context in a conversation, a book, a video, and so on. The traveling context is moving forward, and the job of intelligence is improving the traveling context with intention. Humans and LLMs in chat with each other are both adding context with language and trying to decipher context from added language. Binary rejection selection means looking at the context for a large chunk of context to abandon in order to get closer to the intention in the context. It's like rapidly throwing stuff out of a pile that is obviously not what you are seeking. The intention is not estimated or forecast or predicted. The path to the intention is not architected; it emerges from each not-chosen path leaving the opportunity space. This is so efficient that we can speak by doing only this, if we are willing to compromise how good our rejections are in exchange for faster processing. We are looking for words with the greatest chance of successfully rejecting context that is unhelpful for our intention, leaving only helpful context.