Discussion about this post

Daniel Van Zant:

Excited for the book. There is a mystery I have been pondering about language, LLMs, and autoregression, and I would be very interested in any ideas you have on it. It seems there are two distinct "levels" of language in the animal kingdom.

1. One entity encodes some information about the external world as a signal, and another entity is able to decode that signal even though the two are not directly physically linked. This exists in many forms of life at many levels of complexity, all the way from bacteria to dolphins, and it is fairly easy to see how it could evolve from random mutations and then be selected for.

2. The second form is autoregressive, arbitrary symbolic representation. Essentially, you can "disconnect" your language from the real world: you can define new symbols like "Love" or "A 30 foot tall man" that don't exist in the real world, yet another entity can still decode and understand them. You can do this because of the autoregressive nature of language, which is why you can know what I mean by "A 30 foot tall man" even though you have never seen it explicitly defined anywhere (see the toy sketch below). It seems that only humans (and now LLMs?) are capable of this second form.
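
A toy sketch of what I mean, purely for illustration and nothing like how real LLMs are actually built: a tiny bigram next-word model. The full phrase "a 30 foot tall man" never appears in its corpus, yet chaining next-word probabilities still assigns that phrase nonzero probability, because each adjacent pair of words has been seen somewhere.

```python
from collections import defaultdict

# Tiny corpus: the phrase "a 30 foot tall man" never appears as a whole,
# but every adjacent word pair in it occurs somewhere below.
corpus = [
    "a 30 foot tall tree stood in the field",
    "the tall man walked home",
    "a man walked home",
]

# Count bigrams: how often each word is followed by each next word.
counts = defaultdict(lambda: defaultdict(int))
for sentence in corpus:
    words = sentence.split()
    for w1, w2 in zip(words, words[1:]):
        counts[w1][w2] += 1

def next_word_prob(w1, w2):
    # P(next word = w2 | current word = w1) estimated from the counts.
    total = sum(counts[w1].values())
    return counts[w1][w2] / total if total else 0.0

# Autoregressive scoring: multiply the conditional next-word probabilities.
phrase = "a 30 foot tall man".split()
prob = 1.0
for w1, w2 in zip(phrase, phrase[1:]):
    prob *= next_word_prob(w1, w2)

print(prob)  # 0.25: nonzero, even though the full phrase was never seen
```

The point is only that composing one step at a time lets the model reach combinations it has never observed, which is the (very crude) analogue of understanding "a 30 foot tall man" without ever having seen one defined.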

Here, then, is my question. It is clear to me how life can evolve level 1, but it is not at all clear to me how life can get from level 1 to level 2. What would the "missing link" between the two even be? What set of random mutations would take an entity from even a very complex level 1 to even a very simple level 2?

Zinbiel:

It's a truly fascinating development. I probably would have guessed that human language was enough to encode a world model without any additional grounding, but seeing it actually demonstrated in my lifetime is remarkable.
