8 Comments
Daniel Van Zant

Excited for the book. There is a mystery I have been pondering about language, LLMs, and autoregression. I would be very interested in any ideas you have on it. It seems that there are two distinct "levels" of language in the animal kingdom.

1. One entity encodes some information about the external world as a signal, and another entity is able to decode that signal, despite the fact that the two entities are not directly physically linked. This exists in many forms of life at many levels of complexity, all the way from dolphins to bacteria, and it is fairly easy to understand how it could evolve through random mutation and subsequent selection.

2. The second form is autoregressive, arbitrary symbolic representation. Essentially, you can "disconnect" your language from the real world. You can define new symbols like "Love" or "a 30-foot-tall man" that don't exist in the real world, yet another entity can still decode and understand them. You can do this because of the autoregressive nature of language (hence the fact that you could have never seen "a 30-foot-tall man" explicitly defined anywhere, yet you can still know what I mean; see the sketch below). For this second form, it seems that potentially only humans (and now LLMs?) are capable of it.
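
As a minimal sketch of that autoregressive point, here is a toy generator in Python (every word and transition below is invented purely for illustration; real language models learn these relations from a corpus rather than from a hand-written table). A phrase like "a 30-foot-tall man" can come out of it without ever having been stored as a whole, purely from local symbol-to-symbol relations:

```python
# A toy autoregressive generator. The transition table is a crude,
# hand-written stand-in for statistics a model would learn from text.
import random

# Which words tend to follow which (invented for illustration).
transitions = {
    "a": ["30-foot-tall", "tall"],
    "30-foot-tall": ["man"],
    "tall": ["man"],
    "man": ["<end>"],
}

def generate(start, max_len=8):
    sequence = [start]
    while len(sequence) < max_len:
        options = transitions.get(sequence[-1])
        if not options:
            break
        nxt = random.choice(options)  # each step conditions only on prior output
        if nxt == "<end>":
            break
        sequence.append(nxt)
    return " ".join(sequence)

print(generate("a"))  # may print "a 30-foot-tall man", never stored as a whole
```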

Here then is my question. It is clear to me how life can evolve level 1, but it is not at all clear to me how life can go from level 1 to level 2. What would the "missing link" between these two even be? What set of random mutations would allow an entity to go from even a very complex level 1 to a very simplistic level 2?

Elan Barenholtz, Ph.D.

Excellent question, and one at the heart of the deep mystery of the origin of language, a mystery I think is made much deeper by our revised understanding of language as having this autogenerative structure. As you say, it is easy enough to imagine non-linguistic creatures slowly stumbling upon basic associations between symbols and the objects and events they symbolize in the world. But the syntax of language, and especially the structure revealed by LLMs, suggests that language really is not about these symbol-symbolized correlations at all. Instead, it is an internally defined, unfathomably complex structure in which the symbols play their causal role in generation based on their myriad, context-dependent relations to other symbols, and this generative code is written into the structure of the corpus itself. This seems to be an obscenely sophisticated system that the smartest team of modern humans couldn't possibly devise top-down. How, then, could it possibly have emerged bottom-up?

The closest thing in nature, perhaps, is the genetic code, which also contains within itself the 'instructions' for self-replication. But DNA actually isn't as impressive! A) It had a lot more time to develop, and B) the 'code' for self-replication isn't really contained within its nucleotide sequence; it depends on a complex network of cellular machinery and environmental interactions to bring its instructions to fruition. Language, on the other hand, is this maximally self-contained system. HOW??? If it weren't unscientific, I would say that language bears an even greater mark of "design" than some see in the complex machinery of life!

ekkolápto

On the “disconnect” comment: the first thing that came to mind was UI, and the transition from skeuomorphism into the more common flat and abstract interfaces we see today.

I tend to view design as a type of language, and it’s interesting to see how users have adapted and learned the language of device interfaces to the point where we can have quite abstract symbols as logos for applications.

On the music side: you could almost say the modern trap and electronic drum machine hi-hat, kick, and snare are abstractions of traditional drum sets. If you switch around the context and expectation of the song (Aphex Twin does this a bit), some hi-hats can actually feel like snares and vice versa.

Zinbiel

It's a truly fascinating development. I probably would have guessed that human language was enough to encode a world model without any additional grounding, but seeing it actually demonstrated in my lifetime is remarkable.

Lindsey

I wonder what this says about our ability to describe elements of the human experience. When experiences are given language that’s attributed to objects (feeling blue, sharp pain, etc.), is it because we lack language for things that aren’t directly observable? We may be able to gather that people are experiencing a feeling or sensation, but based more on cues. Could that be because language is a reflection of our historical perceptions, while feelings are subjective?

Elan Barenholtz, Ph.D.

Interesting observation, and very reminiscent of Wittgenstein's claims about private language. We don't really have words for the purely subjective; for example, we can't describe a smell at all except in relation to other smells ("citrusy") or other modalities ("strong", "sharp"). This definitely seems like an important clue about the origins of language as being based on the communication of SHARED sensory experiences. But of course this raises the question of how non-sensory 'syntactic-only' words like 'and' or 'the' got there, not to mention the generative syntax, once again raising the question: who the heck 'made' language?

Lindsey

Could syntactic-only words then reveal something about how we think? From what I’ve read, certain areas of the cortex expanded as language developed. If our brain structures adapted to facilitate language use, could it be that the expansion follows the same patterns the brain already had? A reflection of circuits that already existed for other cognitive functions, expanded and repurposed. And then would that sort of answer why LLMs work so well: because language itself is a mirror of our thought processes?

Elan Barenholtz, Ph.D.

This sounds right. Here is an additional excerpt from the chapter that aligns with your suggestion:

"One possibility is that language emerged by co-opting pre-existing computational structures in the brain. These structures, which may be present in other species or represent ancient evolutionary adaptations, could have provided the raw computational machinery needed to generate complex, patterned activity. In humans, these ancient systems may have been refined and expanded upon, eventually giving rise to the sophisticated linguistic capabilities we now possess. This perspective would mean that language did not arise de novo as an entirely novel faculty but may have evolved by engaging neural processes that were already in place. For example, the capacity for autoregressive processing may have originally evolved for purposes other than language, such as motor planning or visual processing. Over time, this capacity may have been repurposed and elaborated upon to support the sequential generation of language."

It's hand-wavy for now, but there are testable predictions in there, such as the presence of autoregressive mechanisms in earlier systems and species.
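
As a rough sketch of what that domain-generality might look like (not from the chapter; all symbols and transition tables here are invented for illustration), the same autoregressive loop can drive a motor sequence or a word sequence, depending only on the table it is handed:

```python
# One autoregressive loop with nothing language-specific in it; the
# domain lives entirely in the transition table passed to it.
# All symbols and tables below are invented for illustration.
import random

def autoregress(transitions, start, max_steps=8):
    """Generate a sequence in which each step depends only on prior context."""
    seq = [start]
    for _ in range(max_steps):
        options = transitions.get(seq[-1])
        if not options:
            break
        seq.append(random.choice(options))
    return seq

# Hypothetical motor plan: sequential primitives for reaching and grasping.
motor_plan = {"orient": ["reach"], "reach": ["open-hand"],
              "open-hand": ["grasp"], "grasp": ["lift"]}

# Hypothetical linguistic fragment: the same machinery over different symbols.
word_chain = {"the": ["tall"], "tall": ["man"], "man": ["walks"]}

print(autoregress(motor_plan, "orient"))  # ['orient', 'reach', 'open-hand', 'grasp', 'lift']
print(autoregress(word_chain, "the"))     # ['the', 'tall', 'man', 'walks']
```

Nothing in the loop knows about language; swapping the table swaps the domain, which is the sense in which a mechanism evolved for motor planning could later be repurposed for the sequential generation of words.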
