Beyond Binding
Unified Consciousness as Recursive Global Computation
You bite into an apple. The visual redness, tactile smoothness, gustatory sweetness, and conceptual knowledge of the apple’s utility are not experienced as separate elements that happen to co-occur in time. Instead, they are differentiated aspects of a unified whole. This unity manifests behaviorally in our ability to produce multimodal reports (e.g., telling someone about the properties of the apple) or in non-linguistic behavior (e.g., selecting the apple at the store based on multiple properties). Such multimodal unity characterizes virtually all conscious behavior, from navigating social situations by combining facial expressions, vocal tones, and contextual knowledge, to solving problems by integrating visual-spatial reasoning with symbolic manipulation and logical inference.
The question of how disparate sensory information is integrated into a unified perceptual whole is called the ‘binding problem’, and it operates at both functional and phenomenological levels. Functionally, the question is how distributed neural processes coordinate to produce these coherent, integrated behaviors and reports. Neuroimaging studies suggest that the brain is organized into segregated areas—visual cortex for sight, auditory cortex for sound, somatosensory cortex for touch—with each area further subdivided into specialized regions that process specific aspects of information within their domain. Meanwhile, single-cell recordings show that individual neurons exhibit remarkable feature selectivity, responding preferentially to specific orientations, colors, faces, or even abstract concepts like particular people or places. In both cases, the more we discover about the brain's specialized organization, the more puzzling it becomes how these distributed, selective processes combine into the coherent behaviors we observe.
Phenomenologically, the puzzle is equally deep. When I experience that red, smooth apple, the redness doesn't feel like it's been 'bound' to the smoothness. Instead, they appear as aspects of a single, unified object from the outset. Yet these different qualities maintain their distinctive phenomenal character. The visual redness feels fundamentally different from the tactile smoothness, which feels different from the sweet taste and the conceptual knowledge that this is an apple. There's no sense of assembly or construction; instead, these distinct phenomenal properties somehow participate in a single, coherent conscious state. The mystery is not just that different brain processes get coordinated, but that subjectively distinct qualitative experiences, each with its own unique 'feel', seamlessly belong to one unified conscious moment. This phenomenological unity preserves distinctiveness while creating coherence, presenting consciousness as simultaneously one and many.
A related but distinct challenge involves the 'binding' evident in our conceptual understanding. From a set of discrete observations, we can and do form coherent frameworks that generalize well beyond the individual atomic facts. If we observe that birds have wings, beaks, and lay eggs, and that robins, sparrows, and eagles all share these features, we don't just store separate facts about each species. Instead, we form the coherent concept of 'bird' that lets us immediately predict that a newly encountered cardinal will likely have wings and lay eggs, even without direct observation. But the generalization goes much deeper. We understand that 'things with wings can fly,' that 'egg-laying suggests certain reproductive patterns,' and that 'beaks indicate particular feeding strategies.' We can reason about evolutionary relationships, predict ecological roles, and understand why penguins are still birds despite not flying. From scattered observations about individual creatures, we construct rich, interconnected knowledge systems that support novel inferences, counterfactual reasoning, and analogical thinking, all without explicitly learning every possible connection. This reveals the same unity puzzle at the conceptual level: how do discrete facts get woven into globally coherent understanding systems that generate systematic knowledge far beyond their constituent elements?
Traditional cognitive science has approached both perceptual and conceptual binding by assuming that the mind encodes discrete representations of facts about the world that must then be 'bound' together through additional mechanisms. For perceptual binding, features like color, shape, and motion were thought to be processed separately and stored as distinct representations, requiring binding mechanisms, such as temporal synchrony or convergence in a centralized zone, to unite them into unified object representations. For conceptual binding, knowledge was viewed as discrete symbolic facts stored in memory that required syntactic operations to achieve systematic understanding. A "language of thought" was proposed where atomic symbols (BIRD, WINGS, FLY) could be manipulated through rules to generate systematic knowledge.
An alternative approach has long existed in neural networks, which process information in a fundamentally distributed manner rather than storing discrete chunks. These systems are built to instantiate an input-output function based on learned examples, rather than encode discrete pieces of information that must then be manipulated by separate syntactic operations. Instead of separate representations that need pasting together, these systems naturally integrate information across many interconnected units during processing itself. This distributed approach suggested a way to bypass the classical model's "pasting together" layer entirely; if information is processed globally from the start, no separate binding mechanisms are needed.
However, distributed neural networks faced significant criticism. While they didn't represent information as discrete symbolic chunks, they were still trained on discrete pieces of information: individual input-output pairs or specific examples. Critics argued that this training regime limited them to pattern matching and statistical association, falling short of the systematic, compositional understanding that seemed to require explicit symbolic structure.
For decades, the computational theory of mind, which viewed cognition as requiring syntactic operations over symbolic representations, remained the dominant theoretical framework. The idea that true intelligence demanded explicit rules manipulating discrete symbols seemed unassailable. Neural networks were relegated to simple pattern-recognition tasks, and their limitations became starkly apparent during the AI winters of the 1980s and 1990s, when enthusiasm for connectionist approaches waned as these systems delivered little practical utility and certainly not full compositionality. Symbolic AI maintained its dominance, with "real" cognition thought to require the kind of rule-based symbolic processing that classical approaches pursued.
Then came a breakthrough that changed everything: large language models demonstrated something that seemed impossible. These systems are trained to perform next-token prediction from long sequences of text, adjusting their weights through this process so that they converge on highly abstract and distributed representations of the inherent structure of the corpus on which they are trained. Through this training, they develop the ability to capture statistical regularities and compositional relationships that span multiple levels of linguistic organization. The result has been systems of remarkable capability, engaging in sophisticated reasoning, generating coherent long-form text, solving complex problems across diverse domains, exhibiting apparent creativity, and demonstrating the very compositional understanding that critics claimed was impossible without symbolic structure.
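To make the training setup concrete, here is a minimal sketch of next-token prediction in PyTorch. Everything in it is an illustrative toy: the dimensions are arbitrary, and a single recurrent layer stands in for the stacked attention blocks of a real transformer. The point is only the objective: predict token t+1 from tokens 1..t, with every weight adjusted by the resulting gradient.

```python
# Minimal sketch of next-token prediction (illustrative toy, not any real
# LLM's architecture): a single recurrent layer stands in for stacked
# attention blocks; only the training objective matters here.
import torch
import torch.nn as nn

vocab_size, d_model, seq_len = 1000, 64, 32

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):               # tokens: (batch, seq)
        h, _ = self.rnn(self.embed(tokens))  # contextual hidden states
        return self.head(h)                  # logits: (batch, seq, vocab)

model = TinyLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
tokens = torch.randint(0, vocab_size, (8, seq_len))   # stand-in corpus batch

logits = model(tokens[:, :-1])               # predict each next token
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))
loss.backward()                              # gradients flow to every weight
opt.step()
```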
In other words, these systems achieve infinite, coherent productivity. While trained on a large but ultimately limited corpus of text, LLMs can generate novel combinations, understand concepts they've never explicitly encountered, and apply principles to entirely new contexts. They go far beyond mere recombination of training examples to exhibit the kind of open-ended, systematic understanding that compositionality requires. An LLM trained on existing literature can write in entirely new styles, solve novel problems by combining principles in unprecedented ways, and engage with hypothetical scenarios that never appeared in its training data. This demonstrates that the distributed weights have captured not just surface patterns but the deep compositional structure that allows infinite generative potential from finite input, exactly what Fodor argued was impossible without explicit symbolic rules.
Crucially, LLMs are a kind of neural network. There are no discrete facts encapsulated as informational atoms. The network doesn't store "birds have wings" as a retrievable fact in some memory location. Instead, this knowledge emerges from the distributed pattern of weights across millions or billions of parameters, with the entire network weighing in on each output token. When generating text about birds, every parameter contributes to the computation, not just those that might seem "bird-related." The knowledge that "cardinals have wings" isn't stored anywhere as a discrete representation but emerges from the global computational process where the complete network participates in generating each token based on the full context.
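One way to make the "no retrievable fact" point concrete: in a densely connected network, the gradient of any single output logit is generically nonzero for every parameter tensor. A sketch reusing the toy TinyLM above (an illustration of the principle, not a measurement on a real LLM):

```python
# Sketch: no single weight "stores" a fact. Backpropagate from one arbitrary
# output logit of the TinyLM above and see that (generically) every
# parameter tensor registers a nonzero gradient, i.e. participates in it.
model.zero_grad()                            # clear gradients from the step above
logits = model(tokens[:1, :-1])              # one sequence through the network
logits[0, -1, 42].backward()                 # a single output logit

touched = sum(int(p.grad is not None and bool(p.grad.abs().sum() > 0))
              for p in model.parameters())
total = sum(1 for _ in model.parameters())
print(f"{touched}/{total} parameter tensors shape this one logit")
```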
This represents a fundamental departure from classical approaches. Rather than binding together discrete symbolic representations, LLMs achieve systematic understanding through unified generative computation where meaning emerges from distributed patterns of activation across the entire system. The coherence and systematicity arise not from explicit binding mechanisms but from the recursive, globally integrated nature of the computation itself. The training process shapes the network's weights so that when given any context, the global computation naturally produces outputs that respect the systematic relationships inherent in the training data, without requiring separate mechanisms to bind discrete symbolic elements together.
The implications are profound: LLMs have proven the critics fundamentally wrong. Distributed processing, not symbolic manipulation, turns out to be the key to achieving global coherence and systematic understanding. The very capabilities that seemed to demand explicit syntactic rules—compositionality, infinite productivity, systematic relationships—emerge naturally from recursive global computation over distributed representations. Decades of theoretical arguments about the necessity of symbolic binding have been swept aside by systems that achieve superior performance through unified generative processes.
And it's not just about machines. The parallels between neural networks and biological cognition run deeper than simple metaphor. Both involve massively parallel processing across distributed units (after all, they are called 'neural networks' for a reason), both show how complex behaviors can emerge from relatively simple computational principles operating at scale, and both demonstrate that sophisticated cognitive capabilities need not require centralized control or explicit symbolic manipulation. Beyond these architectural similarities, human cognition appears fundamentally grounded in statistical learning and generalization from experience. Infants acquire language by learning statistical regularities in sound and word sequences, showing sensitivity to transition probabilities consistent with context-driven updating. Psycholinguistic evidence from garden path sentences ("The horse raced past the barn fell") reveals how human parsing operates through incremental, context-dependent generation rather than retrieving pre-stored grammatical rules. Priming effects demonstrate how recent linguistic context influences subsequent processing, much like how previous tokens in an autoregressive sequence shape the generation of following tokens. Even basic linguistic regularities like Zipf's law and structural universals have been found to emerge naturally from autoregressive generation, suggesting they reflect the computational mechanism itself.
The evidence points toward a striking possibility: human cognition may fundamentally depend on the same kind of autoregressive processes that drive LLMs. Like these artificial systems, human cognitive processing may operate through distributed, sequential, statistical generation, where each moment builds upon and is conditioned by what came before.
Which raises a tantalizing question: If the conceptual binding that stumped researchers for decades can be solved through distributed computation, might it also serve as a path to understanding perceptual and phenomenological binding in conscious experience? This would represent a radical departure from the classical perspective on brain organization. The presumed modularity of brain regions based on fMRI studies and single-cell recordings may have been fundamentally misunderstood. Rather than discrete modules processing specific types of information that must then be bound together, the brain might function more like an autoregressive system, a unified computational network where different regions contribute their specialized processing patterns to a global generative process.
To understand this reinterpretation, consider the 'ungrounded' nature of words in LLM architecture. A word token has no fixed meaning independent of the generative process; its significance emerges entirely from the role it plays in context. The word "bank" does radically different computational work in "I went to the bank to deposit money" versus "I sat on the river bank to fish." There's no discrete representation of "bank" stored somewhere in the network that gets retrieved and bound to other concepts. Instead, the token influences the global computation in ways that depend entirely on the surrounding context, with meaning emerging from the pattern of activation across the entire system in that specific generative moment.
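A toy self-attention step makes this concrete. The sketch below uses random, untrained embeddings and no learned projections, so it demonstrates only the mechanism: the same "bank" vector enters both sentences, but once each token is re-expressed as a context-weighted mixture of its neighbors, the two "bank" representations diverge.

```python
# Toy self-attention step: the same "bank" vector enters both sentences,
# but its contextual representation diverges once neighbors are mixed in.
# Embeddings are random and untrained; only the mechanism is demonstrated.
import torch

torch.manual_seed(0)
d = 16
vocab = {w: torch.randn(d) for w in
         ["i", "went", "to", "the", "bank", "deposit", "money",
          "sat", "on", "river", "fish"]}

def contextualize(words):
    x = torch.stack([vocab[w] for w in words])            # (seq, d) vectors
    weights = torch.softmax(x @ x.T / d ** 0.5, dim=-1)   # pairwise relevance
    return weights @ x                                    # contextual mixtures

c1 = contextualize("i went to the bank deposit money".split())
c2 = contextualize("i sat on the river bank fish".split())
bank1, bank2 = c1[4], c2[5]                 # "bank" in each sentence

# Identical input vector, different contextual outputs:
print(torch.cosine_similarity(bank1, bank2, dim=0))   # noticeably below 1.0
```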
Similarly, under this reinterpretation, a neuron's selectivity wouldn't represent discrete facts about the world but would reflect its specialized contribution to global computation. Of course, there is strong correspondence between events in the world and certain neural activity; edge-detecting neurons reliably fire when edges are present in the visual field, just as the word "edge" reliably appears in contexts involving boundaries and borders. But this correspondence doesn't make the neural activation a discrete representational symbol any more than the word "edge" is a discrete symbol that "means edge" independent of context.
Consider how an edge-detecting neuron's firing contributes to global visual computation. When edges are present in the visual field, this neuron's activation provides a specific pattern that influences the entire network's generation of unified visual experience. But its "meaning" emerges from its role in the broader computational process, not from encoding "there is an edge here" as a retrievable fact. The neuron might contribute to recognizing a face (where edges define facial features), navigating space (where edges indicate surfaces and obstacles), or reading text (where edges form letters). Its contribution to the global computation depends entirely on the broader context, just as the word "edge" contributes differently to "the edge of the table," "cutting edge technology," or "on edge with anxiety."
Crucially, while there are low-level visual properties that can reliably drive edge detection, the 'meaning' of these activations is ultimately highly contextual. As we know from research on human vision, top-down processes can strongly influence what happens to those low-level properties; expectations, attention, and context can modulate how edge information gets processed and what it contributes to ongoing visual experience. This indicates that even basic sensory neurons play their role based on the global computational context rather than having discrete atomic meaning. An edge detector's firing doesn't simply encode "edge present" but provides input whose significance depends entirely on what the global system is currently computing, whether recognizing objects, planning movements, or integrating with memory and expectation.
Even the famous "Jennifer Aniston neuron" illustrates this principle. This cell fires reliably when Jennifer Aniston appears in various contexts, showing strong correspondence between a specific person and neural activity. But this doesn't mean the neuron "represents Jennifer Aniston" as a discrete fact. Instead, its firing contributes to the global computational processes that might generate recognition ("that's Jennifer Aniston"), recall ("she was in Friends"), emotional response ("I liked that show"), or social inference ("she's an actress"), all depending on the broader context of ongoing thought and experience.
The key insight is that correspondence doesn't imply discrete representation. Neural selectivity, like word usage, shows reliable patterns that correspond to world features while serving entirely context-dependent roles in global computation. The "meaning" emerges from the role in the larger generative process, not from representing something discrete outside the system.
In this view, the apparent modularity and selectivity we observe reflect the specialized contributions different brain areas make to unified computation, rather than separate processing modules requiring binding mechanisms. Just as every parameter in an LLM contributes to language generation even while some regions show apparent specialization for certain patterns, so too brain regions might show functional specialization while participating in globally integrated computation.
This reinterpretation suggests a solution to both functional and phenomenological binding. The key insight lies in understanding the recursive nature of autoregressive computation and how it naturally implements attention-like mechanisms. In an autoregressive system, all disparate computations contribute in a weighted manner to each output, which immediately serves as context for the next iteration. This is inherently global; everything in the system can potentially influence the current generation based on its relevance to the context.
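The loop structure is simple enough to state directly. A minimal greedy-decoding sketch (assuming any next-token predictor with the interface of the TinyLM above) shows the recursion: the full history conditions each output, and each output immediately becomes part of the history.

```python
# Greedy autoregressive loop: each output is appended to the context that
# conditions the next step. `model` is any next-token predictor with the
# interface of the TinyLM sketched earlier.
import torch

def generate(model, context, steps):
    tokens = list(context)
    for _ in range(steps):
        inp = torch.tensor(tokens).unsqueeze(0)  # the full history as input
        logits = model(inp)[0, -1]               # every weight weighs in
        tokens.append(int(logits.argmax()))      # output -> new context
    return tokens

# e.g. generate(model, [1, 5, 7], steps=10)
```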
Consider how this works in practice. When I'm deciding whether to select an apple at the grocery store, the visual assessment of its color, the tactile evaluation of its firmness, the aromatic information, and memories of previous fruit experiences all contribute weighted inputs to the current "iteration" of my decision-making process. The nature of autoregression means that the output of this computational process becomes the context for the next moment's processing. When I pick up the apple and feel its texture, this new tactile information combines with my previous visual assessment (now part of the context) to generate an updated evaluation, which then becomes context for interpreting its aroma, and so on.
Of course, the case of human cognition is much more complex than that of LLMs. In the latter, there is a (literally) one-dimensional output/input sequence consisting of discrete tokens processed sequentially. The brain's "output" of each recursive iteration is far more complex and multimodal, including motor behaviors, internal linguistic processing, visual imagery, emotional responses, and countless other dimensions of activity. This output can be understood as a kind of multimodal context that serves as input to the next iteration of global processing. Visual imagery can prompt linguistic processing, which can trigger motor preparation, which feeds back as proprioceptive information into the ongoing computation. A motor movement generates sensory feedback that becomes part of the context for subsequent processing. Internal speech creates "auditory" patterns that influence visual attention and motor planning. Rather than separate streams requiring coordination, these represent different dimensions of the unified output that naturally serves as rich, contextual input for the next recursive cycle.
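To be explicit about what is and isn't being claimed, the following is a deliberately toy rendering of that loop, not a model of the brain. All channel names and weight matrices are hypothetical; the sketch exhibits only the structure in which every stream contributes to one global update whose unified output re-enters as the next cycle's context.

```python
# A deliberately toy rendering of the multimodal loop, not a brain model.
# Channel names and weight matrices are hypothetical; only the recursive
# structure is exhibited: all streams feed one global update, and the
# unified output re-enters as the next cycle's context.
import torch

torch.manual_seed(0)
d = 32
W = {m: torch.randn(d, d) * 0.1
     for m in ["vision", "touch", "smell", "memory", "motor"]}
state = torch.zeros(d)                        # the evolving global context

def cycle(state, inputs):
    contribution = sum(W[m] @ x for m, x in inputs.items())
    return torch.tanh(state + contribution)   # unified output = next context

for _ in range(10):
    inputs = {m: torch.randn(d) for m in W}   # stand-in sensory/motor streams
    state = cycle(state, inputs)
```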
This recursive structure naturally implements attention without requiring separate attentional mechanisms. Just as self-attention in transformers dynamically weights different parts of the input sequence, the recursive global computation of consciousness dynamically weights different information streams based on their relevance to the current context. The apple's visual appearance might dominate early in the selection process, but tactile feedback becomes more heavily weighted once I pick it up, not because of a separate attentional controller, but because the recursive context makes tactile information more relevant to the ongoing generative process.
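A worked toy example of such controller-free reweighting, under the same caveats as above (random vectors, hypothetical modality channels): relevance scores are computed from the current state itself, so changing the state, as when tactile input floods in after picking up the apple, shifts the softmax weights toward touch with no separate attention module anywhere.

```python
# Controller-free reweighting (caveats: random vectors, hypothetical
# channels). Relevance is read off the current state itself, so a state
# change shifts the softmax weights with no separate attention module.
import torch

torch.manual_seed(1)
d = 32
keys = {m: torch.randn(d) for m in ["vision", "touch", "smell"]}

def modality_weights(state):
    scores = torch.stack([state @ k for k in keys.values()]) / d ** 0.5
    return torch.softmax(scores, dim=0)          # weights sum to 1

before = modality_weights(torch.randn(d))                 # browsing the shelf
after = modality_weights(torch.randn(d) + keys["touch"])  # apple now in hand
print(dict(zip(keys, before.tolist())))
print(dict(zip(keys, after.tolist())))           # the "touch" weight jumps
```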
Functionally, this explains how unified behaviors emerge without binding mechanisms. The decision to select the apple isn't assembled from separate visual, tactile, and memory components. It's the natural output of a recursive global computation where all these information streams contribute their weighted patterns to each iteration of the decision-making process.
Phenomenologically, this approach explains why conscious experience feels unified from the start rather than assembled. The redness, smoothness, and sweetness of the apple represent different weighted contributions to the unified generative process of each moment of conscious experience. They feel distinct because they involve different patterns of contribution to the global computation, but they're unified because they participate in the same recursive generative process that produces each moment of conscious experience as its natural output.
Crucially, we are not proposing a solution to the hard problem of consciousness: why there is subjective experience at all. Instead, we propose that conscious phenomenology tracks the nature of computation itself. The phenomenal unity of consciousness corresponds to the unity of the underlying computational process. A unified global computation, like that found in distributed networks, provides a much better candidate for explaining the unity of consciousness than discrete local representational units that must be bound together through some higher-level ‘binding’ process. The unified "feel" of conscious experience reflects the unified nature of recursive global computation.
The autoregressive framework transforms the binding problem from an intractable mystery into empirical questions about recursive global computation: How do distributed neural processes achieve global coherence? How does the recursive feedback structure influence the weighting of different information streams? How does the multidimensional output of each iteration serve as context for subsequent processing? Rather than seeking separate binding mechanisms, we can investigate the computational principles that naturally produce both functional integration and phenomenological unity. Neural oscillations, which have been proposed as a binding mechanism, might be reinterpreted as a basis for orchestrating global autoregressive computation. Similarly, large-scale coordination patterns like those observed in the default mode network might reflect the temporal dynamics of this recursive multimodal computation.
Of course, the role of these specific mechanisms in any such global computation remains highly speculative and will require much further investigation. However, the lens of distributed autoregression offers a fundamentally new perspective for considering the potential function of these and other unexplained properties of the brain. More broadly, this account may shed light on the nature of consciousness itself and how to reason about it. The unified character of consciousness, its temporal flow, the preservation of distinctiveness within unity, and the recursive nature of self-awareness all correspond naturally to the intrinsic properties of recursive global computation. This doesn't solve the hard problem of why there is experience at all, but it suggests that a key to understanding consciousness may lie in understanding the computational processes that give rise to it.


