The Unspoken Word
What people without inner monologues reveal about the true nature of language—and thought itself.
The autogenerative/autoregressive theory of cognition I have been making the case for argues that language should not be thought of as a simple tool for communication, nor as a medium for encoding pre-existing thoughts—not in any direct or naive sense. Rather, LLMs show us that language is a generative engine that drives itself, each word conditioned on the last, unfolding thought recursively through the internal logic of language alone.
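To make the mechanics concrete, here is a minimal sketch of what “each word conditioned on the last” means in practice. The transition table below is invented purely for illustration; a real model conditions on the entire preceding context with learned weights, but the self-driving loop has the same shape.

```python
# A toy autoregressive generator: each word is chosen conditioned only on
# what came before, so the sequence "drives itself". The probabilities
# here are hand-written stand-ins, not learned from data.
import random

random.seed(3)

# Hypothetical next-word distributions (a tiny bigram model).
next_word = {
    "<start>":     {"the": 1.0},
    "the":         {"word": 0.5, "thought": 0.5},
    "word":        {"conditions": 1.0},
    "thought":     {"unfolds": 1.0},
    "conditions":  {"the": 1.0},
    "unfolds":     {"recursively": 1.0},
    "recursively": {"<end>": 1.0},
}

def generate(max_len=10):
    seq, current = [], "<start>"
    for _ in range(max_len):
        options = next_word[current]
        current = random.choices(list(options), weights=list(options.values()))[0]
        if current == "<end>":
            break
        seq.append(current)
    return " ".join(seq)

print(generate())  # a short sequence generated word by word from its own output
```

Nothing outside the loop tells it where to go next; the structure of the transitions alone carries the sequence forward.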
This generative capacity not only drives linguistic communication—it underlies the modes of thought behind our distinctly human abilities: abstraction, long-term planning, conceptual identity, and moral reasoning. And ultimately, it interacts with other cognitive systems—perception, action, emotion—in ways we have only begun to think about.
In other words, language is a self-contained informational system that is fundamental to what it means to be human.
When people encounter this theory, they often ask: If that’s true—if language is the engine of thought—what about people who don’t have an inner monologue? Can they not think?
And they’re right to ask. Because this isn’t some fringe anomaly. Recent estimates suggest that as many as 15% of people report never experiencing internal speech at all, a condition now referred to as “anendophasia”. Many others experience it only occasionally. And yet all the evidence suggests that these individuals are perfectly capable of reasoning, planning, reading, and communicating—precisely the kinds of functions I would argue depend on language.
So, RIP my theory?
Not quite. Instead, I believe that this apparent contradiction points us directly to a critical insight that actually deepens the theoretical framework.
To think linguistically does not mean to think phonologically. To think does not mean to experience.
The inner voice—the running commentary we associate with thought—is just one possible surface rendering of a deeper process. What actually drives linguistic thought is a silent, amodal system of relational prediction.
We see this most clearly in large language models. These systems don’t store words as sounds or concepts. Each word—or more precisely, each token—is represented as a vector in a high-dimensional space. These vectors contain no sound, no gesture, and no fixed meaning. They are defined entirely by where they sit relative to other vectors in that space. Meaning arises not from the token itself, but from its place in a predictive structure. Or, as they say in real estate: location, location, location. The vector for “tree” is near “forest” and “leaf,” not because of any definition inherent in its representation, but because of how those relations drive the generative/predictive engine.
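For readers who want to see the geometry rather than take it on faith, here is a minimal sketch with made-up four-dimensional vectors (real embeddings have hundreds or thousands of dimensions and are learned from prediction, never assigned by hand). The only point is that “meaning” reduces to relative position in the space.

```python
# Toy embeddings: no sound, no definition, just coordinates. The numbers
# are invented so that "tree" lands near "forest" and "leaf" and far from
# an unrelated word.
import numpy as np

embeddings = {
    "tree":    np.array([0.9, 0.8, 0.1, 0.0]),
    "forest":  np.array([0.8, 0.9, 0.2, 0.1]),
    "leaf":    np.array([0.7, 0.6, 0.1, 0.0]),
    "invoice": np.array([0.0, 0.1, 0.9, 0.8]),
}

def cosine(a, b):
    """Similarity as the angle between vectors: closeness in the space."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

for word in ("forest", "leaf", "invoice"):
    print(f"tree vs {word}: {cosine(embeddings['tree'], embeddings[word]):.2f}")
# The only "meaning" a vector carries is its position relative to the others.
```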
In other words, language within these systems is amodal: the model has no commitment to sound or text or gesture. It simply operates within a space of structured relationships. A token only ever gets rendered as speech or text because of a “codec”—a final translation layer that converts abstract internal representations into a usable output format.
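A toy sketch of the codec idea, with entirely hypothetical renderers: the generative core only selects the next token in its relational space; separate output layers decide whether that selection surfaces as text, speech, or sign.

```python
# The amodal core picks a token id; independent "codecs" render it.
# All names and values here are illustrative stand-ins.
import numpy as np

vocab = ["the", "tree", "falls", "quietly"]

def next_token_id(logits: np.ndarray) -> int:
    # The generative core: a choice in predictive space, with no modality.
    return int(np.argmax(logits))

def render_as_text(token_id: int) -> str:
    return vocab[token_id]                          # text codec

def render_as_speech(token_id: int) -> str:
    return f"<waveform for '{vocab[token_id]}'>"    # stand-in for a TTS codec

def render_as_sign(token_id: int) -> str:
    return f"<gesture for '{vocab[token_id]}'>"     # stand-in for a signing codec

logits = np.array([0.1, 2.3, 0.4, 0.2])             # pretend model output
tok = next_token_id(logits)
print(render_as_text(tok), render_as_speech(tok), render_as_sign(tok))
# One internal choice, three different surface renderings.
```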
And, I would argue, the same principle appears to hold for us.
The clearest evidence for the amodal nature of language comes from sign language. Native signers—especially those who are congenitally deaf—rely entirely on visual-manual channels to communicate. And yet, they activate the same core brain regions associated with spoken language in hearing individuals. When these regions are damaged, signers experience the same kinds of aphasias: fluent but meaningless signing, or difficulty retrieving common signs—virtually identical to spoken-language breakdowns. The expressive medium changes. The structure doesn’t. This is amodality made visible: different codecs, same generative core.
We see the same principle at work in Broca’s aphasia, where speech production is disrupted but the underlying linguistic structure remains intact. Patients often know exactly what they want to say, but the output fails to surface fluently. The generator is running. The codec is impaired.
Even in everyday life, the tip-of-the-tongue experience gives us a glimpse of this dissociation. You know the concept. You can feel its shape, its associations. But the wordform won’t come. The internal structure is active but the final expressive layer stalls.
And then there are individuals with anendophasia—those who report no inner monologue at all. Despite this, they function perfectly well: they plan, reason, read, and communicate. What they lack isn’t language, but a particular rendering of it. The predictive structure is in place. It’s simply not being piped through the internal audio channel most of us mistake for thought itself.
All of this is increasingly supported by neural evidence. Recent studies show a striking geometric overlap between artificial language models and human brain activity. Activation patterns in regions like the inferior frontal gyrus mirror the vector relationships found in GPT-style embeddings: words that are close in predictive space are also close in cortical representation—not because of meaning or sound, but because of their role in structured prediction. The brain, like the model, appears to operate in a high-dimensional relational space. Thought unfolds through the geometry of what comes next.
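For the curious, the “geometric overlap” reported in such studies is typically measured with something like representational similarity analysis: compute the pairwise distance structure among model embeddings, compute the same structure among neural responses to the same words, and correlate the two. The sketch below substitutes simulated data for real brain recordings, purely to show the logic of the comparison.

```python
# Representational similarity analysis (RSA) in miniature, on simulated data.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_words, model_dim, brain_dim = 20, 16, 50

# Stand-in model embeddings and simulated "cortical" responses that share
# their geometry plus noise (a placeholder for real recordings).
model_vecs = rng.normal(size=(n_words, model_dim))
projection = rng.normal(size=(model_dim, brain_dim))
brain_vecs = model_vecs @ projection + rng.normal(scale=2.0, size=(n_words, brain_dim))

model_rdm = pdist(model_vecs, metric="correlation")  # pairwise dissimilarities
brain_rdm = pdist(brain_vecs, metric="correlation")

rho, _ = spearmanr(model_rdm, brain_rdm)
print(f"second-order (RSA) correlation: rho = {rho:.2f}")
# Words close in the model's predictive space tend to be close in the
# simulated neural space as well: shared geometry, not shared sound.
```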
This is why I believe that language is not sound, not speech, not even expression. It is structure. It is the recursive unfolding of prediction itself. And when we understand this, what seemed like a challenge—people thinking without an inner voice—actually points to something deeper: that language is running even when nothing is heard. That its true nature lies not in sounds or gestures but in the pure language of language.
This insight points to something deeper about the nature of phenomenal awareness itself. I don’t think the Cartesian theater is an illusion. But I do think we may have misunderstood its role. The vivid voice (or images) in the head may not be how thought happens. That phenomenal awareness may be a separate process whereby these thoughts get rendered for awareness, rehearsal, or action. The real work of cognition may happen before these subjective experiences, within the unspoken or unseen structure of predictive generation.
The implications of this are significant. It is as though Chomsky was right about innate language. Just not about spoken language. Meaning, your argument is that "language" is a manifestation of an inherently biological process. Not "culture" but brain structure. Which ought to mean that it is universal. It also suggests that we ought to be looking for the "unifying theory of language"—one that captures what is true of what you might call tokenized thought. It is like the "natural" flow of experience is quantized into thought tokens. Which incidentally might explain either language or our experience of it. Flow broken into discrete thought events. Brilliant!
Do some people have constant inner narration, like a never-ending episode of the Wonder Years? That would drive me insane. I am so glad my inner voiceover only kicks in once in a while.
I agree that language is not just a tool for communicating pre-existing thoughts. It's like the form and content conundrum, where it doesn't make much sense to think of the two as originating separately, even if they are conceptually separable once language has taken off. Thinking about the origin of language is hair-rippingly mind-boggling for this reason. But even so, I would say that given language, there are times when thought corrects language and vice versa. Sometimes we say things we don't mean—just a thoughtless slip-up, like saying "Tuesday" instead of "Monday", for instance. I'm not seeing how structure alone can solve the problem of deciding which day of the week was really meant. And other such instances.
I agree about representation if representation amounts to pointing at particulars in the world (that much is clear from the relational nature of definitions, which I believe might be better thought of as "representing" Platonic Forms) but if natural language is nothing but structure, what causes it to change over time? What breathes life into this structure to make it move? What can it predict if it is not driven by desire, motivated by its search for new meaning?
Another way of putting it, if it were possible to leave a Shakespeare-era LLM (such as they are today) alone to do its own thing over time without our corrections, interactions, and interventions, what would it be like today? The same as when it started? Or evolved in some way? If the latter, would we still understand it? Curious to hear your thoughts.