Thank you, Elan and Tommy! I have mostly praise, but also some critique, as you, Elan, encouraged.
I'm very happy to discover someone expressing the similarities between brains and LLMs as clearly as you do. This is very much in line with my views. Crucially, there seems to be a widespread misconception of LLMs as non-recursive, with a myopic focus on single-token generation. Even Hofstadter, the “king of recursion”, made this slip in a YouTube interview. If you zoom in closely enough on any recursive system, of course you are not going to find recursion! But this is not the level at which LLMs are impressive. I found your thoughts about short-term memory especially clear!
I do think this is presented a bit too much as a ‘new’ model of cognition. To me, as a computational functionalist, it is not a radically new idea. As I see it, it is very much in line with what Dennett, and even more so Hofstadter, have been saying for many decades. Of course, newer predictive models and the demonstrated “proof” that LLMs provide change the playing field a great deal, making the ideas far more tractable. I do think your focus on comparing with LLMs is very constructive and enlightening, though!
My largest issue, though, is your use of the word ‘real’. I'm an illusionist about consciousness, but as Dennett and others state clearly, illusionism does not question the realness of consciousness. The illusion is merely the sense of being an observer, a mental subject, that experiences consciousness as a mental object (or scene). Introspection isn't what it naively seems to be.
As such, I'm an illusionist about short- and long-term memory, language, and so forth, but I still think they sure as hell are real! Weights and residual activations DO store facts. They DO “record” in a sense, a very real sense, even though it might be nothing like recording with a camera.
This parallels every child's realisation that video is ‘just’ still pictures in succession. Isn't video real? Notably, video is encoded in bit streams that are not even recognisable as video or pictures if you don't have the codec. Still perfectly real.
Anyway. A great listen, and I think your take and emphasis is really needed. I really wish I could work on this myself. Keep it going!
Thanks for the thoughtful comments. Yes, the recursive aspect of LLMs is absolutely fundamental to this framework. The core insight is that these processes are designed to unfold and that, as such, information is encoded specifically with stepwise generation in mind.

As for the 'realness' of memory, I admit my phrasing is deliberately provocative, but my point is not that we don't end up with experiences of recalling information in sequence; it's that the machinery for doing so is fundamentally, and profoundly, different from the classic storage/retrieval model. Memories of sequences (including, in my view, episodic events) emerge as a product of the autoregressive process. In other words, the 'memory' is really just the potential to generate the specific sequence during active generation. The sequences are not actually there, in the brain, at a fundamental level.

I would compare this with genes. The idea that they 'code' for certain traits is not accurate. They have properties that, in the right environment, lead to a cascade of processes (enzymes, mRNA, tRNA, amino acids, etc.) that ultimately give rise to certain proteins and eventually traits. An omniscient scientist could never 'decode' the genetic code as indicating eye color without the broader developmental environment in which the genes end up playing their role. Similarly, here, what is encoded in the weights does not have its meaning without the autoregressive milieu in which it gets to 'express' itself.
I lay out some of these same arguments in my Memory isn't Real posts on Substack. I have more to say about the STM conception than I've written about so far, but I suspect we largely agree on that bit. I'm working on some computational modeling of that right now. It would be great to chat about that, consciousness, and, I can tell, lots of other stuff, maybe in another Substack stream(?).
Thanks again for taking the time to consider my somewhat hyperbolic claims!
I believe I do understand your points decently! Let's see if you agree. I hope my English, generated with an NN configured with a mix of very high and very low temperature settings (ADHD), will be clear enough.
I'm sure many people do, intuitively, think of memories as movies stored on film or on a hard drive. However, most people have an almost as misconceived idea of how movies are stored on a hard drive. I think the actual mechanics are probably more akin to human memories than analog film is. Digital movies do contain images, but these images are stored as instructions for how to *generate* them, via Fourier-related transforms and other mathematical functions, from very sparse data (I suspect you know much more about this than me). There is nothing straightforwardly 'image-like' about the bit sequence in the image file. The file truly is a set of instructions to compute and generate from. But this only holds for the keyframes in the movie, which are only a small portion. Most frames are generated from the previous frames, following stored data values that instruct the transformation. This is not unlike how a human memory or an LLM memory is retrieved and expressed. To my understanding, playback is not even a computationally deterministic process with all codecs, which invites further comparison with recursive generation.
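The keyframe-plus-delta idea can be sketched as a toy (purely illustrative, nothing like a real codec; the "pixels" here are just small integer lists):

```python
def decode(keyframe, deltas):
    """Reconstruct a frame sequence from one stored keyframe and a list
    of per-frame deltas. Only the keyframe and the deltas are stored;
    every other frame must be generated from the previous one."""
    frames = [keyframe]
    for delta in deltas:
        prev = frames[-1]
        # Each new frame is computed from the frame before it.
        frames.append([p + d for p, d in zip(prev, delta)])
    return frames

# A 4-"pixel" keyframe followed by two delta frames:
video = decode([10, 10, 10, 10], [[1, 0, 0, -1], [0, 2, 2, 0]])
# video[2] was never stored anywhere; it exists only after decoding runs.
```

The third frame, `[11, 12, 12, 9]`, appears nowhere in the stored data; like the remembered sequence in the autoregressive picture, it comes into being only by running the generative process forward.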
I am totally on board with the DNA analogy, and this is an example I use myself. DNA is a pattern that can replicate only in a certain environment. The DNA and the environment select for each other; they are symbiotic. Sure, one can make the case that DNA bears the instructions for the synthesis of proteins, and by extension life itself, but this misses half the story. One can also make a strong case that proteins bear the instructions (by virtue of their structure) for building DNA. "DNA builds proteins and cells to replicate itself" is not an obviously better description than "proteins and cells exploit DNA to replicate themselves". In any case, I think the DNA analogy is valid and instructive, but I think it largely applies to movie files too. Yes, repeated replays of movies look identical to us. I could counter that identical twins often look identical to me, but admittedly, as soon as they say their names they have displayed evidence of differentiation through diverging generative processes.
I may be hallucinating my own memories or misunderstanding Dennett and Hofstadter, but even though they don't use the word 'autoregressive', I feel like they (especially Hofstadter) are describing very much the same thing you are. As soon as LLMs came around I applied what they say, and have taken for granted that they would agree that LLMs are, in this respect, similar to human brains. But I could be wrong, of course.
I have been surprised not to have seen anyone before you push this, but I figured that is just because I'm not following the field very closely. It seems from the way that you and Tommy talk that I am wrong, and your approach is more novel than I thought. In any case, it really doesn't matter: I think it is very good, and very important, that you are saying these things. I wish I had been pushing it more myself (since I really am looking to change gears, and this is really interesting stuff), and I think I will have more time to do so in the future. In any case, I don't think I could do it nearly as elaborately and clearly as you can, and I'm very glad that you are doing this. As I understand it, we very much agree, except on the word choices.
I understand that you are being provocative with "not real", but I think it is ultimately an unwise choice to phrase it like this. It may catch some people's attention, but many are just going to discard it. The term "illusionism" about consciousness is already consistently misunderstood. Even though Dennett was very clear that consciousness is 'real', people still think he was saying consciousness is not real. If you want to be provocative, I suggest you phrase it more like "memory is not what you think it is" or "memory is an illusion". Ultimately I think that "not real" is simply false. But this is just my opinion; obviously it is up to you what phrasing you want to use :)
I'm not sure if you are offering to chat about it with me specifically. If that's what you mean, I really would love to do that! If not, keep up what you're doing!
PS. Introspection doesn't tell me what's really going on in me, but in any case this is what I'm going to generate now: I think there is probably some envy on my side that you are expressing these things in ways I had wished to have done already, and that is expressing itself as me saying it is not so novel. This is my turning-forty crisis playing out; I feel like I got into the wrong field (medicine). But on the other hand, it was in my studies that I was introduced to Dennett, and my experience with patients perhaps means that I understand AI and philosophy of mind better than if I had not become a physician. I hope I have not offended you!
Edit:
A few extra points. Sorry for very long comment.
-I agree that this autoregressive generative process is crucial also for episodic memory. I think of episodic memory simply as specific arrangements of semantic memory (or of 'concepts'; perhaps semantic memory is the wrong term). Likewise, imagination is made of precisely the same thing, except that we don't label it memory and say things like "this happened" (these features are, of course, themselves generative processes). I'm sure I've read about memory and imagination being similar somewhere, but I can't remember where.
-I think that for high-functioning humans, when they report that, approximately five minutes ago, they were imagining what a banana looks like, there is usually some truth to that, even though they cannot directly "access" that memory in the way it may seem to them. However, for dreams, I am not so certain that reports of dreams must correspond to an actual dreamt experience in the way it seems when one "remembers" the dream. What do you think about this?
-If we had a perfect scan of a human, and infinite computational power, we would not be able to make a reliable statement about what the person believes or knows about A without simulating the appropriate interrogation. We could not find out whether they would be loyal to Y without running a simulation where their loyalty is tested. We cannot predict the NN output without computing what the NN would compute. So, if society needs to find out something from a person and can only do so through torture, then no technology can reliably replace that torture without simulating the torture and thereby bringing the torture into existence. Of course, the possibility of switching off that simulation without anyone being left with a trauma might be viewed as more ethical than traditional torture.
Will aim to respond in more detail in a few days (very tight schedule the next few days), but I want to share some initial thoughts, as your critique is excellent and has been very helpful in clarifying some things for me. (I am sure you are an excellent physician, but you definitely have the right stuff to be an academic. But who needs the academy? We've got Substack!)
So there is a subtle but critical point that I think may help clarify the specific claims I am making. You are absolutely right that a video stored on a hard drive is not recognizable as such on its own and requires being translated into sequential images. But in that case, while the information must be translated, it is, in effect, all there, holistically, in its pre-translated form. We can think of this as a kind of structural isomorphism between different versions of the same information. The storage-retrieval model of the brain would similarly see information stored in a very different form: e.g., in the case of a memorized text, instructions for what words to say rather than the words per se. And this could be a distributed representation.
What I am arguing is that nothing like this actually exists in autoregressive systems (and, by extension, the brain), because the only information that is stored is how to generate the next token autoregressively. So any sequence is not structurally 'in there' at all, even under transformation. The only way the sequence can emerge is through the dynamic autoregressive process. No translation can take the encoding to the expression. You have to run it. It is a true dynamical system. As such, its properties are manifest only in its unfolding AND (this is another critical point I haven't clarified well) it is dependent on the input, with completely different behaviors depending on the 'initial conditions', in this case the prompt.
So this is wholly different from a static structure that can be translated into some other static form, like a video. And what goes for LLMs, I claim, goes for language and memory, etc.
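A minimal sketch may make the distinction concrete (a hypothetical toy model, with a hand-written next-token table standing in for trained weights): the only thing "stored" is a rule for producing the next token, no sequence exists anywhere as data, and different prompts ('initial conditions') unfold into different sequences.

```python
# Toy next-token rule: the sole stored object. No full sequence is
# present here, even under transformation; sequences only emerge by
# running generation forward from a prompt.
NEXT = {
    "the": "cat", "cat": "sat", "sat": "on", "on": "the_mat",
    "a": "dog", "dog": "barked",
}

def generate(prompt, steps):
    """Unfold a sequence autoregressively: each token is computed
    from the previous one, starting from the prompt."""
    seq = [prompt]
    for _ in range(steps):
        nxt = NEXT.get(seq[-1])
        if nxt is None:  # no rule for this token: generation halts
            break
        seq.append(nxt)
    return seq

print(generate("the", 4))  # ['the', 'cat', 'sat', 'on', 'the_mat']
print(generate("a", 4))    # ['a', 'dog', 'barked']
```

Nothing resembling `['the', 'cat', 'sat', ...]` sits in `NEXT` as a structure you could translate out; you have to run the process, and a different initial condition (`"a"`) yields a wholly different trajectory.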
Hope to address other points soon. Thanks again for the extremely thoughtful feedback!
Toulmin's framework for argumentation (claim/grounds/warrant/backing, etc.) should apply to personal decision making as a form of self-argument, where the autoregression operates over these higher-level conceptual tokens.
Okay, let's say that it is generated autoregressively. What, in your view, maintains any form of fidelity to the actual original experience being remembered? What keeps the autoregression from heading off in some aleatory direction?
I would say the same thing that allows LLMs to maintain a high degree of fidelity to a specific text they've learned. It's still statistical generation, but the distribution is fairly narrow around the most 'likely' output, constraining it. This is related to the 'temperature' conversation in the recording.
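The narrowing effect of temperature can be shown directly with the standard softmax formula (the logit values below are made up for illustration):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to a probability distribution. Low temperature
    sharpens the distribution around the highest-logit token, so
    sampling becomes nearly deterministic; high temperature flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 1.0]  # hypothetical 'learned' preferences for 3 tokens
sharp = softmax_with_temperature(logits, 0.5)  # low T: top token dominates
flat = softmax_with_temperature(logits, 5.0)   # high T: nearly uniform
```

With these numbers, the low-temperature distribution puts almost all mass on the first token (around 0.98), which is the sense in which a narrow distribution "constrains" generation to a specific learned sequence, while the high-temperature one spreads mass across all three.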
Hi, Elan. The question that came to my mind was how you explain things like Pribram's ability to elicit very clear, almost movie-like recollections when stimulating localized regions of the brain; it seems they were repeatable as well. How does that fit into your conceptualization?
Hey Russ! Despite my provocative title, I don't deny that memories of sequential events are encoded in some form. I am just arguing that the way they are encoded is as the ABILITY to generate the sequence autoregressively, rather than the sequence being encoded in some static, holistic form. In the case of highly vivid memory stimulation (or, for that matter, just regular ol' remembering something), what is happening, in my view, is that the generative sequence is elicited and then runs autoregressively. It's a somewhat subtle point about how information is encoded and produced when 'remembered'; it's about generation, not retrieval.