10 Comments
Jared Peterson

This is the first time I've read a nonrepresentational account of cognition that made sense to me.

In research on expert decision-making, experts often feel as though they aren't making decisions. This is because they pattern-match to a prototypical situation (not a particular situation, but some sort of amalgamation of similar situations). These prototypes are likely constructed in the moment and carry with them cues to look for, goals to pursue, expectations to watch out for, and actions to take.

This account strikes me as consistent with that view. The prototype serves as the non-representational constraint, and it explains why experts wouldn't feel like they were making a decision.

Elan Barenholtz, Ph.D.

Yes! The “prototypes” aren't representations of the world that need to be read off and evaluated (which would take time). Instead, they are constraints in the dynamic generative process that act more like attractor states in a dynamical system. That’s likely why experts don’t feel like they’re “deciding” so much as continuing a trajectory already underway. I appreciate you making that connection.
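
To make the attractor picture concrete, here is a minimal numerical sketch (a toy double-well system, purely illustrative and not a neural model): the state simply follows the local flow set up by the landscape and settles into one "prototype," with no step in the loop that enumerates or compares options.

```python
# A toy double-well system: its two minima stand in for two "prototype"
# attractor states. The state just follows the local gradient; nothing in
# the loop enumerates or compares options, yet the trajectory settles
# into one of the prototypes.
def grad_V(x):
    # V(x) = (x**2 - 1)**2 has minima (attractors) at x = -1 and x = +1
    return 4 * x * (x**2 - 1)

x = 0.2                      # the current context sets the initial condition
for _ in range(500):
    x -= 0.01 * grad_V(x)    # just continue the trajectory already underway

print(round(x, 3))           # ~1.0: the system has settled into a prototype
```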

chris j handel

Consider a proposed view of intelligence processing that fits well in your framework. Humans in coordination are doing this, human brains are doing this, humans creating language for communication are doing this, and LLMs are doing this. It is all the same: binary rejection selection.

There is a traveling context in a conversation, a book, a video, and so on. The traveling context is moving forward, and the job of intelligence is improving the traveling context with intention. Humans and LLMs in chat with each other are both adding context with language and trying to decipher context with added language. Binary rejection selection is looking at the context for a large chunk of context to abandon in order to get closer to the intention in the context. It's like rapidly throwing stuff out of a pile that is obviously not what you are seeking.

The intention is not estimated or forecast or predicted. The path to the intention is not architected; it is emergent from each not-chosen path leaving the opportunity space. This is so efficient that we can speak by doing only this, if we are willing to compromise how good our rejections are in exchange for faster processing. We are looking for words with the greatest chance of successfully rejecting context that is unhelpful for our intention, leaving only helpful context.
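
For concreteness, a toy sketch of this kind of rejection loop (the scoring function below is just word overlap with an "intention" string, a made-up stand-in): the loop never predicts the best continuation, it only keeps discarding the chunk of candidates that is obviously unhelpful until something is left.

```python
# Toy "binary rejection selection": repeatedly discard the worst-matching
# half of the candidate pool and let the answer be whatever survives.
def mismatch(candidate: str, intention: str) -> int:
    # Higher = more obviously unhelpful (fewer words shared with the intention).
    return -len(set(candidate.lower().split()) & set(intention.lower().split()))

def reject_until_one(candidates: list[str], intention: str) -> str:
    pool = list(candidates)
    while len(pool) > 1:
        pool.sort(key=lambda c: mismatch(c, intention))
        pool = pool[: max(1, len(pool) // 2)]   # throw out the worst half
    return pool[0]

print(reject_until_one(
    ["the weather is nice", "attractors shape the trajectory",
     "buy milk and eggs", "generation is shaped by context"],
    intention="how context shapes generation",
))
```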

chris j handel

And I forgot to mention: this is also the scientific method we use for the discovery of reality. We reject narratives that do not stand up to reality, which is the intention of science.

chris j handel

And species evolution is also binary rejection selection.

rif a saurous

Fascinating and perplexing.

First thought. Consider a game in which an external process (the "environment") generates text, and an agent's job, at each time step, is to "predict" the next token: it gets 1 point every time it's correct and 0 every time it's wrong. Aren't LLMs (pre)trained to solve exactly this problem? The fact that we happen to *use* them to generate trajectories in practice is a surprising, and perhaps not yet well-understood, consequence of the fact that being sufficiently good at this prediction game (when combined with post-training) yields broadly useful conversationalists. But that doesn't change the fact that LLMs are (pre)trained to predict?
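
For concreteness, a minimal sketch of that game next to the pretraining objective (the toy bigram "model" and tiny corpus are stand-ins, nothing like a real LLM): the 0/1 game scores only the argmax guess, while pretraining scores the whole predicted distribution via cross-entropy.

```python
import math

# The "environment" emits tokens; the agent outputs a distribution over the
# next token. The 0/1 game rewards the single best guess; the pretraining
# loss (cross-entropy) penalizes the probability assigned to what occurred.
corpus = "the cat sat on the mat the cat ate".split()

def bigram_probs(prev: str) -> dict[str, float]:
    nexts = [corpus[i + 1] for i in range(len(corpus) - 1) if corpus[i] == prev]
    return {w: nexts.count(w) / len(nexts) for w in set(nexts)}

score, loss, n = 0, 0.0, 0
for prev, actual in zip(corpus, corpus[1:]):
    probs = bigram_probs(prev)
    guess = max(probs, key=probs.get)           # the 0/1 game: predict the argmax
    score += int(guess == actual)               # 1 point if exactly right
    loss += -math.log(probs.get(actual, 1e-9))  # pretraining-style cross-entropy
    n += 1

print(f"game score: {score}/{n}, mean cross-entropy: {loss / n:.3f}")
```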

Second thought. I am much more open to your critique of predictive coding as a (full) explanation of biological cognition. We don't fully know how the brain is "trained" (or designed), but we seem to know enough to say that it's not trained to predict the next token the way an LLM is. And I find your key point, that the brain may well be a moment-to-moment constructor of coherence rather than a forecaster, intriguing and worthy of further study, even though I think LLMs *are* predictors.

Sophia

In that sense, optimised generation can still be understood as a form of prediction - it's just prediction over actions rather than outcomes. So it's not prediction in the classical sense of forecasting an external event and then verifying its occurrence (or checking for accuracy) but the modelling of potential future states in order to select actions that realise a preferred probability distribution over those states.

Generative systems (brains or LLMs) still implicitly 'predict' by encoding trajectories of internal states to minimise [uncertainty or entropy or surprise or next-token loss] in future steps or actions. As you note, this isn't the same as a two-step 'guess-and-check' predictive process. But it does still involve anticipatory modelling, insofar as the system must evaluate or generate potential future states conditional on different courses of action. Prediction over actions isn't about which outcome is 'most likely to happen', but about 'which trajectory is most likely to produce or bring about states aligned with implicit objectives or preferences.'
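
For concreteness, a minimal sketch of this kind of prediction over actions (the transition model and the preferred distribution below are made up for illustration): each action induces a distribution over future states, and the action selected is the one whose induced distribution best matches the preferred one, not the one whose outcome is most likely to happen.

```python
import math

# Hypothetical numbers throughout: a preferred distribution over outcomes and
# a per-action outcome distribution. The agent picks the action whose induced
# distribution is closest (smallest KL divergence) to its preferences.
preferred = {"warm": 0.8, "cold": 0.1, "wet": 0.1}

transition = {            # P(next state | action), purely illustrative
    "stay_inside": {"warm": 0.7, "cold": 0.1, "wet": 0.2},
    "go_outside":  {"warm": 0.2, "cold": 0.3, "wet": 0.5},
}

def kl(p: dict[str, float], q: dict[str, float]) -> float:
    return sum(p[s] * math.log(p[s] / q[s]) for s in p if p[s] > 0)

def choose(actions: dict[str, dict[str, float]]) -> str:
    # "Which trajectory is most likely to bring about preferred states?"
    return min(actions, key=lambda a: kl(preferred, actions[a]))

print(choose(transition))  # -> "stay_inside" under these made-up numbers
```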

Are generative models and predictive coding necessarily mutually exclusive?

Elan Barenholtz, Ph.D.

I agree that if we stretch the term 'prediction' broadly enough so that it includes any form of forward-looking inference or conditional trajectory shaping—and the field does use it in this more general sense—then yes, even the generative model I’m proposing can be said to “predict” in some sense. But at that point, we’re really using prediction as a stand-in for inference or expectation, and I think that muddies the waters.

But my disagreement is not just about terminology; it’s with the core hypothesis of predictive coding as it’s typically formulated: that the brain explicitly generates top-down forecasts of incoming sensory input and encodes only the mismatch (prediction error). This is a two-stage process that assumes a representational commitment to a specific external outcome, which is then checked for accuracy.

What I’m proposing is not a softer version of that. I’m suggesting a different architecture altogether: one of optimized generation, in which the system continuously produces the next internal state that best maintains coherence, utility, or adaptive viability given the current context. In this view, “expectation” is not a hypothetical forecast to be verified—it’s embedded in the trajectory itself.
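
To make the architectural contrast concrete, here is a toy numeric caricature (not a claim about either theory's actual equations): the first loop forecasts the incoming signal and updates only on the mismatch; the second never forecasts the input at all, it just rolls its own state forward under the constraint of the current context.

```python
# Both loops are caricatures with made-up numbers, purely illustrative.
world = [0.0, 0.1, 0.3, 0.6, 1.0]          # incoming sensory signal (made up)

# (a) Predictive coding style: forecast -> compare -> encode the error.
estimate = 0.0
for sample in world:
    expected = estimate                     # explicit forecast of the next input
    error = sample - expected               # prediction error
    estimate += 0.5 * error                 # only the mismatch drives the update

# (b) Optimized generation style: produce the next internal state from the
#     last one, shaped by context, with no forecast-and-check stage anywhere.
context = 0.8                               # stands in for task and constraints
state = 0.0
trajectory = []
for _ in world:
    state = 0.7 * state + 0.3 * context     # next state = f(previous state, context)
    trajectory.append(round(state, 3))

print(estimate, trajectory)
```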

So yes, people often use “prediction” loosely to describe this kind of behavior. But I think that obscures the real theoretical choice: Do we think the brain explicitly represents the world’s next move or is it simply generating its own next move, continuously shaped by context and constraints?

I’m arguing for the latter.

Deric Bownds

I don't think your model is compatible with a large literature (for an overview, see Max Bennett's "A Brief History of Intelligence") showing brain-activity correlates of animals choosing between previously experienced scenarios, stored as 'priors', likely using the vast storage capacity of cortical columns.

Elan Barenholtz, Ph.D.

Thanks for the comment. However, the physiological evidence you cite is, in my opinion, consistent with—if not directly supportive of—the autoregressive framework.

Take hippocampal replay and deliberative processes in animals. These don’t resemble the instant retrieval of stored options from memory. Instead, what we observe are temporally unfolding sequences—possible trajectories being generated, and presumably evaluated, one step at a time. That generative unfolding is the key point: if the brain were simply scanning pre-stored scenarios, we might expect more parallel or static comparisons. But what we actually see looks like sequential simulation—each state conditioned on the last.

This is exactly what an autoregressive system does. It generates outputs token by token, step by step, based on prior context. In that light, replay and planning aren’t exceptions to the model—they’re evidence for it.
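
A minimal sketch of that point (the transition table below is made up; this is not a model of hippocampal data): each state is generated conditional on the last, so the "replayed" route unfolds in time rather than being looked up whole.

```python
import random

# A toy autoregressive rollout over a made-up maze: the trajectory is built
# one step at a time, each state conditioned on the previous one, rather than
# retrieved as a complete, pre-stored scenario.
transitions = {
    "start": ["hallway"],
    "hallway": ["left_turn", "right_turn"],
    "left_turn": ["reward"],
    "right_turn": ["dead_end"],
}

def rollout(state: str = "start", max_steps: int = 5) -> list[str]:
    trajectory = [state]
    for _ in range(max_steps):
        options = transitions.get(trajectory[-1])
        if not options:
            break                                  # terminal state reached
        trajectory.append(random.choice(options))  # next state conditioned on the last
    return trajectory

print(rollout())  # e.g. ['start', 'hallway', 'left_turn', 'reward']
```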

On this view, priors aren’t stored as discrete entities waiting to be pulled from memory. They’re embedded in the weights and dynamics of the system itself. The brain doesn’t retrieve; it reconstructs, and the fact that it takes time to do so is a critical clue.
