ChatGPT answers metaphysical questions :)

Any topics primarily focused on metaphysics can be discussed here, in a generally casual way, where conversations may take unexpected turns.

Re: ChatGPT answers metaphysical questions :)

Post by Federica »

Cleric wrote: Thu Nov 21, 2024 9:05 am Knowing Grant Sanderson from interviews (he has two podcasts with Lex, for example), I would say he is a very down-to-earth thinker, exceptionally clear and concise, with an enviable pedagogical approach, always striving to build intuitive understanding, yet staying firmly within the sensory-intellectual field. From some questions that Lex asked him, it was clear that he has no interest in speculating beyond mathematics and physics (like in this and this chapter). With that in mind, I'm pretty sure he means nothing mystical about the emergent behavior. I think he would agree that in this case it is a synonym of 'surprising' or 'unexpected', something which we did not code specifically for, but nevertheless proceeds from the statistics.

That makes a lot of sense on the basis of the chapters you have linked. At the same time, when describing how Attention works in this video he says something that perplexes me (6:42 to 7:44):

"Compressing things a bit, let's write that query vector as q, and then anytime you see me put a matrix next to an arrow like this one, it's meant to represent that multiplying this matrix by the vector at the arrow's start gives you the vector at the arrow's end. In this case, you multiply this matrix by all of the embeddings in the context, producing one query vector for each token.
The entries of this matrix are parameters of the model, which means the true behavior is learned from data, and in practice, what this matrix does in a particular attention head is challenging to parse. But for our sake, imagining an example that we might hope that it would learn, we'll suppose that this query matrix maps the embeddings of nouns to certain directions in this smaller query space that somehow encodes the notion of looking for adjectives in preceding positions. As to what it does to other embeddings, who knows? Maybe it simultaneously tries to accomplish some other goal with those".


In fact, I am not getting all the details of the explanation, so this note may have no relevance. Sorry if "goal" obviously points to something else that has nothing to do with, for example, what Levin would search for in a computational system like this: some form of basal algorithmic cognition.
"On Earth the soul has a past, in the Cosmos it has a future. The seer must unite past and future into a true perception of the now." Dennis Klocek

Re: ChatGPT answers metaphysical questions :)

Post by Cleric »

Federica wrote: Thu Nov 21, 2024 6:29 pm That makes a lot of sense on the basis of the chapters you have linked. At the same time, when describing how Attention works in this video he says something that perplexes me (6:42 to 7:44):

"Compressing things a bit, let's write that query vector as q, and then anytime you see me put a matrix next to an arrow like this one, it's meant to represent that multiplying this matrix by the vector at the arrow's start gives you the vector at the arrow's end. In this case, you multiply this matrix by all of the embeddings in the context, producing one query vector for each token.
The entries of this matrix are parameters of the model, which means the true behavior is learned from data, and in practice, what this matrix does in a particular attention head is challenging to parse. But for our sake, imagining an example that we might hope that it would learn, we'll suppose that this query matrix maps the embeddings of nouns to certain directions in this smaller query space that somehow encodes the notion of looking for adjectives in preceding positions. As to what it does to other embeddings, who knows? Maybe it simultaneously tries to accomplish some other goal with those".


In fact, I am not getting all the details of the explanation, so this note may have no relevance. Sorry if "goal" obviously points to something else that has nothing to do with, for example, what Levin would search for in a computational system like this: some form of basal algorithmic cognition.
Right, there's nothing cognitive in this 'goal'. It can be helpful to go through the first two videos in the playlist. From there we can gain pretty good intuition about how a very simple network can recognize handwritten numbers. Grant describes how we might imagine that the first layer detects edges, for example, while the second layer detects their patterns. This gives us some intuitive sense that every layer somehow integrates the numbers from the previous one. However, as he reveals at the end of the second video, in practice these two layers don't do anything comparable to what we may call "edge detection" or "shape detection".

In some cases it may turn out that layers indeed extract certain regularities that make sense to us humans, but this doesn't have to be the case. We should remember that training always starts with random weights (knobs in random positions). Then, as we pass the training data through, we determine how the knobs need to be nudged in order to move the initially random result closer to what it should be according to the pre-labeled training data. So nowhere in this process do we say what each layer should be doing. It's all about tweaking the parameters until the input data produces the expected result.
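
To make the knob-tweaking picture concrete, here is a minimal sketch in Python (numpy only; a toy linear model rather than a real network, with the sizes and learning rate chosen arbitrarily for illustration):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))             # toy inputs
y = X @ np.array([2.0, -1.0, 0.5])        # pre-labeled targets (the 'training data')

w = rng.normal(size=3)                    # the knobs start in random positions
lr = 0.05
for _ in range(500):
    pred = X @ w                          # what the current knob settings produce
    grad = 2 * X.T @ (pred - y) / len(y)  # direction in which each knob should be nudged
    w -= lr * grad                        # nudge the knobs toward the expected result

Nowhere in the loop is any knob assigned a role; the nudging only cares about shrinking the mismatch with the pre-labeled targets.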

It is similar with the language models. One head of the attention layer cross-references each token with every other. It is conceptually helpful to imagine something humanly recognizable, such as an attention head that highlights whether a noun is coupled with an adjective. And it might indeed be the case that something like that really happens after training. However, no one has specifically sought it. It's just how the layer turned out after the optimization (training). These are simply numbers (the weights, knob positions) that work and satisfy the training data - nothing more. Each head of the attention layers may turn out to be sensitive to certain regularities and patterns. This is to be expected. We may then say, "The 'goal' of this attention head is to extract such and such regularities from the embeddings."
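
For those who want to see the mechanics, here is a rough numpy sketch of what one head computes, along the lines of the query matrix in the video quoted above (the sizes are toy assumptions, and a real head also uses value vectors and far more dimensions):

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n_tokens, d_embed, d_head = 5, 8, 4            # toy sizes

E = rng.normal(size=(n_tokens, d_embed))       # one embedding vector per token
W_Q = rng.normal(size=(d_embed, d_head))       # learned query matrix (random stand-in here)
W_K = rng.normal(size=(d_embed, d_head))       # learned key matrix (random stand-in here)

Q = E @ W_Q                                    # one query vector per token
K = E @ W_K                                    # one key vector per token
scores = Q @ K.T / np.sqrt(d_head)             # how strongly each token relates to every other
weights = softmax(scores, axis=-1)             # each row sums to 1

Whether the resulting weights end up resembling something like noun-adjective coupling is only something we can inspect after training; nothing in the arithmetic asks for it.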

Imagine that you are making a voice transformation device that works similarly to a neural network but passes the sound through a network of audio filters that are initially randomly tuned. You then begin to tweak the knobs until eventually your voice sounds the way you want. Then you look at one of the already tuned filters and see that it cuts the mid-ranges. You can say, "The goal of this filter is to cut out the mid-frequencies." Of course, we can't speak of a cognitive goal here. We didn't decide in advance what that filter should be doing; we simply tweaked the knobs of all the filters until our voice sounded the way we wanted. What the particular filter is doing (its 'goal') is simply the result of the state it ended up in after the tweaking. It's pretty much exactly the same with the tuning and 'goals' of the layers of an LLM. In other words, the 'goal' is simply a synonym for what we can recognize that element is doing.

From this also proceeds the great difficulty in analyzing what the models are doing internally. We would very much like to point at a certain layer and say "this detects the nouns" or something like that. But it turns out things very rarely work in such a convenient way for us. Instead, the data is transformed in quite obscure ways until, at the end, it converges to something. We can get some additional sense of why this should be the case when we consider that most of the tokens are actually not complete words (you can see the list of the tokens that GPT uses here). It's obvious that for a great number of these tokens it doesn't make sense to ask "Is it a noun?" because they are not even words. So there's no reason why some layer, after tuning, would turn out to have the 'goal' of detecting nouns. There may be something like this, but everything is much more hybridized and obscured. It simply works because we have tweaked the knobs in the direction of working. No one has incentivized the layers to assume certain roles and goals. For example, hypothetical noun detection could work in extremely unintuitive ways, by being distributed across different layers. In that sense, one layer does something that doesn't make much sense to us humans, but when the effects of another layer are added later, the cumulative effect gives the answer. This is an artificial example, but it goes to show that there's nothing in the training process or the general network design that demands layers have well-defined functions which, on top of that, make sense to us humans.

Re: ChatGPT answers metaphysical questions :)

Post by Federica »

AshvinP wrote: Thu Nov 21, 2024 1:56 pm It may be useful to ask in these situations where the meaning is ambiguous, if we were using 'collective unconscious' (for example) as a symbol for phenomenological realities, how would we understand its meaning? I know that I have often come across this same situation where I am unsure if a thinker means something in one way or another, and it's easy to assume they are flowing along with abstract thinking habits (usually they are). Lately, unless a person is explicitly weaving together metaphysical theories, I try to give the benefit of the doubt and seek the ways in which their concepts can be understood as phenomenological descriptions. If more people gave that same benefit when approaching spiritual science in the archive or on this forum, a lot of misunderstandings and skepticism would be cleared up.


Ashvin,

I have listened again, following your advice to actively search for the possible ways in which JP's thoughts could be the expression of a fully phenomenological perspective. Here is what I've come to think as a result.

“The objective mapping of the symbolic world” that JP ascribes to LLMs is a purely quantitative mapping. What is weighed, measured, and then mimicked are: the frequencies of occurrence of words and constellations of words in the vicinity of other words and constellations of words; and the quantitative relations, or laws, that can be extracted from those patterns of frequencies. These laws are then used as guidelines to compose quantitatively lawful word-sequences. And when we read those sequences, they sound familiar to us; we can’t help the feeling that there is cognition behind them. In other words, these outputs are elaborated within the boundaries of the purely sensory-quantitative plane. Obviously there is cognition behind the LLMs, but there’s no cognition behind their outputs, although cognitive activity typically gives rise to constellations of words showing patterns comparable to those outputs.

With this in mind, let's go back to the video. Whoever says that LLMs objectively map cognition acts exactly like the kind of natural scientist who says: “When we measure natural phenomena we have no theory, we just do observations and measurements, and what we get is an objective mapping of reality. We simply hold ourselves to pure phenomena.”
What that natural scientist does for the sensory world, JP does for the symbolic world. He says, quote-unquote, LLMs-have-mapped-out-the-symbolic-world, objectively. The natural scientist who claims to stick to an objective mapping of natural phenomena actually operates under the more or less unseen guidance of a theoretical outlook, rather than a truly phenomenological approach, in their inquiry. In the same sense, JP says “It’s indisputable, it’s not a matter of opinion”, that is, it’s pure phenomenology. But, in truth, his contentions about what LLMs do are elaborated within the context of a certain outlook. Though he sees it as indisputable evidence - a phenomenology - he has an unseen theory of what a symbol is, and I would confirm that he explains what a symbol is in the same way the concept has been defined on this forum - like an ideal point of balance. That theory is what informs his views on LLMs, and makes him equate their output to an objective mapping of ideas which is “far better than any mapping we humans ever created”.

To his credit I would say that, without Steiner's or similar guidance, it is very tricky to stick to phenomenology while exploring human cognition. It's more than easy to end up equating concept and word because, when we go from the mental pictures of all the single flowers we may think of, to the collective concept of flower (another mental picture), we feel we must give up the specific sensory features in each of the singular mental pictures. From there, it's only a small gesture to get sucked into the symbolic power of the word, and to consider the word-symbol “flower” as the perfect candidate to recognize as that collective concept of flower, free from the particular flower features such as color, scent, etcetera. In fact, the word is only the ex-pression of the concept in sensory symbols. As Steiner says, words can do no more than draw our attention to the fact that we have concepts.

Within such a (mis)conception, we understand why JP says: “the symbolic world is the weighing of ideas”. For our part, we know that, from a phenomenological perspective, the concept is the weighing of ideas, the point of balance: we appeal to our life experience of witch encounters, and we stabilize that hyperdimensional meaning into a manageable, conceptual point of balance that we can handle in our limited thinking flow rate, which has to proceed slowly, from one mental picture to the next. JP calls this point of balance “the center”, but what he fails to see is that the overdimensioned (for our cognition) witch-idea is scaled down into the concept of witch, not the word-symbol “witch”.

Saying that the symbolic world is the weighing of ideas, and that LLMs do that, would be like admiring a beautiful portrait painted in realistic style and stating that the inks and the beautifully nuanced visual effects obtained by means of inks and brushes are the weighing of human life and spirit. We miss one crucial step when we think like that: the painter!
Similarly, when we say that the LLM weighs ideas, we miss a crucial step - the LLM ideators in the background - and take the outputs instead as an objective mapping of meaning.
While the portrait surely evokes a specific meaning for us - it makes us think of the real person portrayed, of their life and spirit, imbued by the painter's way - we are very aware that the ink doesn't encapsulate that life. It only gives us a perception somewhat comparable to the one we would experience when seeing the real person. In the same way, the output of LLMs gives us a familiar symbolic experience somewhat comparable to the one we could have if we had come to the linguistic output through our own cognitive capacity or through reading a poem by William Shakespeare. But it's a dead end; there's no poet or painter or portrayed person behind that sensory layer.


PS: I have an additional idea, to highlight the same point from another angle (I would prefer if Cleric could confirm this one, though). In our discussions, we have often distinguished the purely sensory aspect of language - the shape of the letter characters; the sounds articulated in the spoken words - from the layer of meaning conveyed by these combinations of sensory cues, once the particular language has been learned. Now, we can notice that, in an LLM, these two layers are actually the same layer. Though it would probably require even more computational power, instead of guessing the next word we could imagine an LLM that guesses the next 'digit' in the sentence - a space or a letter. Using the word scale in LLMs is only a convenience. The fact that we do use that convenience may trick us into investing the outputs with more meaning than they can bear, since we have learned the language of reference. But the LLM does not know that language. It could just as well do the same job based on characters (though with different principles implemented in the algorithms). Therefore, meaning is to be sought one level up, in the background, in the conception that has resulted in the LLM technology, not in the LLM outputs.
"On Earth the soul has a past, in the Cosmos it has a future. The seer must unite past and future into a true perception of the now." Dennis Klocek

Re: ChatGPT answers metaphysical questions :)

Post by Cleric »

Federica wrote: Fri Nov 22, 2024 1:52 pm PS: I have an additional idea, to highlight the same point from another angle (I would prefer if Cleric could confirm this one, though). In our discussions, we have often distinguished the purely sensory aspect of language - the shape of the letter characters; the sounds articulated in the spoken words - from the layer of meaning conveyed by these combinations of sensory cues, once the particular language has been learned. Now, we can notice that, in an LLM, these two layers are actually the same layer. Though it would probably require even more computational power, instead of guessing the next word we could imagine an LLM that guesses the next 'digit' in the sentence - a space or a letter. Using the word scale in LLMs is only a convenience. The fact that we do use that convenience may trick us into investing the outputs with more meaning than they can bear, since we have learned the language of reference. But the LLM does not know that language. It could just as well do the same job based on characters (though with different principles implemented in the algorithms). Therefore, meaning is to be sought one level up, in the background, in the conception that has resulted in the LLM technology, not in the LLM outputs.
GPT can actually work on the character level. As a matter of fact, the only reason to work with larger chunks (tokens) is that it requires less computation. For example, if we look up the tokens we can see that there's a whole token for " photographic". This whole token can be added to the end of the list in one go. If the dialog had to be constructed character by character, it would take 13 passes for this single word. But on a technical level the model can also work with discrete characters. As a matter of fact, most of the tokens are words or fragments of words in English. Other languages and alphabets have far fewer tokens, and those that they have are shorter and rarely whole words. This explains why ChatGPT works much more slowly when we ask questions in other languages. It is precisely because of what was explained above with " photographic". Words in other languages are assembled from single letters or short fragments, so the model simply needs many more passes to produce the same length of text as in English. This additionally shows that the layers can't be thought of as operating over words. Nor are there separate layers, some processing single letters and others whole words. The matrices operate on embedding vectors; they don't know whether those correspond to tokens consisting of whole words or of single letters. This only shows how convoluted the statistics embedded in the parameters are, and why it is so difficult to comprehend anything about them by analyzing them.
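
If someone wants to check this, OpenAI's tiktoken library exposes these token lists. A small sketch (the exact splits depend on which encoding is loaded, so the counts are illustrative, not guaranteed):

import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("r50k_base")        # a GPT-2/GPT-3 era encoding

for text in [" photographic", " фотографический"]:
    ids = enc.encode(text)
    # English words tend to map to one or two tokens; other alphabets fragment into many
    print(repr(text), len(text), "characters ->", len(ids), "tokens:", ids)

print(enc.decode(enc.encode(" photographic")))  # converting numbers back to text happens outside the model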

So it's very important to realize that inside the LLM there's not even a distinction between letter and word! The model doesn't have access to the token list. The tokenization of the input happens outside the model. The first thing the model receives is a list of numbers (the token numbers) [ 23445, 1123, 999, ... ]. It is completely unknown whether these numbers correspond to a single-letter token, a syllable, or a whole word. It's even completely irrelevant whether these numbers correspond to anything at all. When the model is trained, it is only known from the outside that, for example, the next token should be 100 (as per the training text). Then all the billions of parameters are tweaked so that the output gets closer to that number (of course, this needs to be done simultaneously for the vast amount of training text).
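
As a very loose sketch of that "tweak toward token 100" step (numpy, with a mean-pooling stand-in for the actual transformer layers; the vocabulary size and widths are arbitrary assumptions):

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
vocab_size, d = 50257, 16                      # toy embedding width; GPT-2-sized vocabulary

context = np.array([23445, 1123, 999])         # the model only ever sees token numbers
target_next = 100                              # the training text says the next token is 100

E = rng.normal(size=(vocab_size, d)) * 0.01    # embedding table, random before training
W = rng.normal(size=(d, vocab_size)) * 0.01    # output projection, random before training

h = E[context].mean(axis=0)                    # stand-in for everything the layers would do
probs = softmax(h @ W)                         # probability assigned to each possible next token number
loss = -np.log(probs[target_next])             # training nudges all parameters to shrink this

The loss compares numbers with numbers; nothing in this loop refers to letters or words.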

We should clearly grasp that neither during the training process nor during normal execution does the model see anything other than lists of numbers. This makes it very clear why ChatGPT is quite incapable of counting characters. For example:

Q: How many "r"s in "raspberry"?
A: The word "raspberry" has two "r"s—one at the beginning and one near the end.

The reason is that this character information is not present in any way inside the model. In the cases where it gets the count right, it's rather because of some roundabout relations (perhaps it has been trained on a text that actually states the count of a given character in some word).
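
One can see why by looking at what the model is actually given (again with tiktoken, assuming the same encoding as above; the exact split may differ):

import tiktoken

enc = tiktoken.get_encoding("r50k_base")
ids = enc.encode("raspberry")
print(ids)                      # a few opaque integers - no trace of individual letters
print("raspberry".count("r"))   # 3: trivial at the character level, which the model never sees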

So we can't really speak of meaning inside the model. The calculation doesn't even 'know' that the numbers it processes (both during training and execution) are supposed to map to language letters that humans understand. It's nothing more than a function that has been optimized to give the correct numerical output for the training numerical input. It is, once again, outside the LLM flow that the resulting token numbers are converted to characters and printed on screen. So the model simply captures the regularities in the input lists.

Re: ChatGPT answers metaphysical questions :)

Post by AshvinP »

Federica wrote: Fri Nov 22, 2024 1:52 pm Ashvin,

I have listened again, following your advice to actively search for the possible ways in which JP's thoughts could be the expression of a fully phenomenological perspective. Here is what I've come to think as a result.

“The objective mapping of the symbolic world” that JP ascribes to LLMs is a purely quantitative mapping. What is weighed, measured, and then mimicked are: the frequencies of occurrence of words and constellations of words in the vicinity of other words and constellations of words; and the quantitative relations, or laws, that can be extracted from those patterns of frequencies. These laws are then used as guidelines to compose quantitatively lawful word-sequences. And when we read those sequences, they sound familiar to us; we can’t help the feeling that there is cognition behind them. In other words, these outputs are elaborated within the boundaries of the purely sensory-quantitative plane. Obviously there is cognition behind the LLMs, but there’s no cognition behind their outputs, although cognitive activity typically gives rise to constellations of words showing patterns comparable to those outputs.

With this in mind, let's go back to the video. Whoever says that LLMs objectively map cognition acts exactly like the kind of natural scientist who says: “When we measure natural phenomena we have no theory, we just do observations and measurements, and what we get is an objective mapping of reality. We simply hold ourselves to pure phenomena.”
What that natural scientist does for the sensory world, JP does for the symbolic world. He says, quote-unquote, LLMs-have-mapped-out-the-symbolic-world, objectively. The natural scientist who claims to stick to an objective mapping of natural phenomena actually operates under the more or less unseen guidance of a theoretical outlook, rather than a truly phenomenological approach, in their inquiry. In the same sense, JP says “It’s indisputable, it’s not a matter of opinion”, that is, it’s pure phenomenology. But, in truth, his contentions about what LLMs do are elaborated within the context of a certain outlook. Though he sees it as indisputable evidence - a phenomenology - he has an unseen theory of what a symbol is, and I would confirm that he explains what a symbol is in the same way the concept has been defined on this forum - like an ideal point of balance. That theory is what informs his views on LLMs, and makes him equate their output to an objective mapping of ideas which is “far better than any mapping we humans ever created”.

To his credit I would say that, without Steiner's or similar guidance, it is very tricky to stick to phenomenology while exploring human cognition. It's more than easy to end up equating concept and word because, when we go from the mental pictures of all the single flowers we may think of, to the collective concept of flower (another mental picture), we feel we must give up the specific sensory features in each of the singular mental pictures. From there, it's only a small gesture to get sucked into the symbolic power of the word, and to consider the word-symbol “flower” as the perfect candidate to recognize as that collective concept of flower, free from the particular flower features such as color, scent, etcetera. In fact, the word is only the ex-pression of the concept in sensory symbols. As Steiner says, words can do no more than draw our attention to the fact that we have concepts.

Within such a (mis)conception, we understand why JP says: “the symbolic world is the weighing of ideas”. For our part, we know that, from a phenomenological perspective, the concept is the weighing of ideas, the point of balance: we appeal to our life experience of witch encounters, and we stabilize that hyperdimensional meaning into a manageable, conceptual point of balance that we can handle in our limited thinking flow rate, which has to proceed slowly, from one mental picture to the next. JP calls this point of balance “the center”, but what he fails to see is that the overdimensioned (for our cognition) witch-idea is scaled down into the concept of witch, not the word-symbol “witch”.

Saying that the symbolic world is the weighing of ideas, and that LLMs do that, would be like admiring a beautiful portrait painted in realistic style and stating that the inks and the beautifully nuanced visual effects obtained by means of inks and brushes are the weighing of human life and spirit. We miss one crucial step when we think like that: the painter!
Similarly, when we say that the LLM weighs ideas, we miss a crucial step - the LLM ideators in the background - and take the outputs instead as an objective mapping of meaning.
While the portrait surely evokes a specific meaning for us - it makes us think of the real person portrayed, of their life and spirit, imbued by the painter's way - we are very aware that the ink doesn't encapsulate that life. It only gives us a perception somewhat comparable to the one we would experience when seeing the real person. In the same way, the output of LLMs gives us a familiar symbolic experience somewhat comparable to the one we could have if we had come to the linguistic output through our own cognitive capacity or through reading a poem by William Shakespeare. But it's a dead end; there's no poet or painter or portrayed person behind that sensory layer.


PS: I have an additional idea, to highlight the same point from another angle (I would prefer if Cleric could confirm this one, though). In our discussions, we have often distinguished the purely sensory aspect of language - the shape of the letter characters; the sounds articulated in the spoken words - from the layer of meaning conveyed by these combinations of sensory cues, once the particular language has been learned. Now, we can notice that, in an LLM, these two layers are actually the same layer. Though it would probably require even more computational power, instead of guessing the next word we could imagine an LLM that guesses the next 'digit' in the sentence - a space or a letter. Using the word scale in LLMs is only a convenience. The fact that we do use that convenience may trick us into investing the outputs with more meaning than they can bear, since we have learned the language of reference. But the LLM does not know that language. It could just as well do the same job based on characters (though with different principles implemented in the algorithms). Therefore, meaning is to be sought one level up, in the background, in the conception that has resulted in the LLM technology, not in the LLM outputs.

Federica, I would point your attention to this post/comment by Cleric.

The bold part again contains much implicit baggage. All our science and technology can be compared to a function optimization process. Every earthly activity is affected by many variables. We produce technology, tools, yet we modify the World state always in one and the same way - through the intuitive modification of our L-movements and observing the perceptual feedback of the imploding memory picture of the Cosmos. I think the recent advances in machine learning (which is a fancy term precisely for a function optimization problem) clearly show that. More and more scientists are becoming skeptical that there are intellectually graspable mathematical laws of Nature at its foundation (which would imply the old Keplerian thinking - that God animates the World content based on intellectual mathematical thinking). Recent successes in protein folding, complex gravitational simulations, etc., show that we can pretty decently mimic these systems by taking a function with millions of coefficients and tweaking them until the function fits the training data. With this, no one is under the illusion that this is how reality works - we simply mimic its appearances (the parts that we can quantify). However, on the positive side, this may help us understand that our traditional physics thinking is not that different after all. It all boils down to realizing that the higher order minds do not move their Ls according to computations in the way we manifest them in our stepwise cognitive sequences. This of course doesn't mean that these L-movements do not exist in certain lawful relations. It's precisely the latter that implode into the Cosmic memory tableau which seems structured and lawful. Morph the Ls - that is, steer our intuitive intents differently - and the imploding picture of the World state morphs too.

I believe JP fits into this category of "no one who is under the illusion that this is how reality works". As we can see from this and many other discussions, JP leans toward psycho-spiritual meaning as the foundation of phenomenal reality and its dynamics, including our own spiritual activity. He realizes we are only 'mapping' or 'modeling' the appearances of this psycho-spiritual meaning through LLMs, which are designed to mimic certain aspects of our linguistic cognition. In other words, he doesn't equate the dynamics of those appearances to the higher order L-movements of the contextual minds, but treats all scientific models as symbols for a mysterious and perhaps ineffable reality. If anything, the latter part is his major blind spot right now, which is quite common for anyone who goes down the path of Christian idealism but lacks the PoF-style insights of where 'reality-itself' and phenomenal appearances overlap in our real-time thinking. He simply doesn't realize to what extent we can intimately know the higher-order L-movements and establish relationships with them just like we establish relationships with other human souls (this is why he often says the highest meaning comes from taking on the responsibility of family and children, for example).

At a more practical level, I think if you were correct about JP's more rigid metaphysical understanding of LLM and its relation to human cognition, we would see him advocating for more and more sophisticated AI tech as a means of understanding our own psycho-spiritual nature, similar to Levin. But that's exactly the opposite of what he is doing. Instead he is sounding the warning on AI much like a spiritual scientist would, on how we need much deeper wisdom to understand what's at stake with these emerging technologies and how they can constrain our creative responsibility within the meaningful flow of existence. It would be very interesting to hear him comment on Levin and his research, because I suspect that he would voice many of the same concerns we have voiced here, of course in a less spiritually scientific manner and more on the foundation of scaled images drawn from traditional religious narratives (although he does an admirable job of connecting these with the latest scientific research in neuroscience, psychology, evolutionary theory, etc. as well).
"They only can acquire the sacred power of self-intuition, who within themselves can interpret and understand the symbol... those only, who feel in their own spirits the same instinct, which impels the chrysalis of the horned fly to leave room in the involucrum for antennae yet to come."

Re: ChatGPT answers metaphysical questions :)

Post by AshvinP »

I also want to share this lecture by JP, which gives a broad overview of his intuitions, ideas, and his general way of thinking through them. I think we can easily discern the resonance with PoF-style phenomenology. One can use the time-stamps to skip around - particularly the sections on microcosm/macrocosm, what matters vs. matter, what dispels pain, stories and narratives as the lens through which we perceive the World (in a very literal sense), the Spirit that you walk with. That last section particularly highlights his phenomenological psycho-spiritual approach which he says is not a "secondary overlay" on reality but is primary.

(we can move this to a different thread since it's not directly speaking to AI/LLM)


"They only can acquire the sacred power of self-intuition, who within themselves can interpret and understand the symbol... those only, who feel in their own spirits the same instinct, which impels the chrysalis of the horned fly to leave room in the involucrum for antennae yet to come."

Re: ChatGPT answers metaphysical questions :)

Post by Federica »

Cleric wrote: Fri Nov 22, 2024 1:47 pm Right, there's nothing cognitive in this 'goal'. It can be helpful to go through the first two videos in the playlist. From there we can gain pretty good intuition about how a very simple network can recognize handwritten numbers. Grant describes how we might imagine that the first layer detects edges, for example, while the second layer detects their patterns. This gives us some intuitive sense that every layer somehow integrates the numbers from the previous one. However, as he reveals at the end of the second video, in practice these two layers don't do anything comparable to what we may call "edge detection" or "shape detection".

In some cases it may turn out that layers indeed extract certain regularities that make sense to us humans, but this doesn't have to be the case. We should remember that training always starts with random weights (knobs in random positions). Then, as we pass the training data through, we determine how the knobs need to be nudged in order to move the initially random result closer to what it should be according to the pre-labeled training data. So nowhere in this process do we say what each layer should be doing. It's all about tweaking the parameters until the input data produces the expected result.

It is similar with the language models. One head of the attention layer cross-references each token with every other. It is conceptually helpful to imagine something humanly recognizable, such as an attention head that highlights whether a noun is coupled with an adjective. And it might indeed be the case that something like that really happens after training. However, no one has specifically sought it. It's just how the layer turned out after the optimization (training). These are simply numbers (the weights, knob positions) that work and satisfy the training data - nothing more. Each head of the attention layers may turn out to be sensitive to certain regularities and patterns. This is to be expected. We may then say, "The 'goal' of this attention head is to extract such and such regularities from the embeddings."

Imagine that you are making a voice transformation device that works similarly to a neural network but passes the sound through a network of audio filters that are initially randomly tuned. You then begin to tweak the knobs until eventually your voice sounds the way you want. Then you look at one of the already tuned filters and see that it cuts the mid-ranges. You can say, "The goal of this filter is to cut out the mid-frequencies." Of course, we can't speak of a cognitive goal here. We didn't decide in advance what that filter should be doing; we simply tweaked the knobs of all the filters until our voice sounded the way we wanted. What the particular filter is doing (its 'goal') is simply the result of the state it ended up in after the tweaking. It's pretty much exactly the same with the tuning and 'goals' of the layers of an LLM. In other words, the 'goal' is simply a synonym for what we can recognize that element is doing.

From this also proceeds the great difficulty in analyzing what the models are doing internally. We would very much like to point at a certain layer and say "this detects the nouns" or something like that. But it turns out things very rarely work in such a convenient way for us. Instead, the data is transformed in quite obscure ways until, at the end, it converges to something. We can get some additional sense of why this should be the case when we consider that most of the tokens are actually not complete words (you can see the list of the tokens that GPT uses here). It's obvious that for a great number of these tokens it doesn't make sense to ask "Is it a noun?" because they are not even words. So there's no reason why some layer, after tuning, would turn out to have the 'goal' of detecting nouns. There may be something like this, but everything is much more hybridized and obscured. It simply works because we have tweaked the knobs in the direction of working. No one has incentivized the layers to assume certain roles and goals. For example, hypothetical noun detection could work in extremely unintuitive ways, by being distributed across different layers. In that sense, one layer does something that doesn't make much sense to us humans, but when the effects of another layer are added later, the cumulative effect gives the answer. This is an artificial example, but it goes to show that there's nothing in the training process or the general network design that demands layers have well-defined functions which, on top of that, make sense to us humans.

Thanks, Cleric, I've watched the videos. They actually answer a question I had: how it works when I send in handwritten tax forms and the figures are automatically and accurately read :)
I see how 'goal' is used to signify a type of action, characterized by some common direction, performed by a layer. This also clarifies what GS said in the LLM intro - that the knobs can number in the hundreds of billions, and that "no human ever deliberately sets those parameters". I see it must be somewhat similar to the neural network example, only with many more parameters: these are not set manually, but there is still some equivalent of a thoughtfully constructed "cost function" to drive the backpropagation.
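
If I understand the videos correctly, that cost function and the backpropagation step look roughly like this in code (a toy Python sketch with random stand-in data instead of real digit images; all sizes and the learning rate are arbitrary assumptions):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 784))                 # stand-in for 28x28 pixel images, flattened
labels = rng.integers(0, 10, size=64)          # stand-in digit labels

W1 = rng.normal(size=(784, 32)) * 0.01         # first layer of knobs, randomly initialized
W2 = rng.normal(size=(32, 10)) * 0.01          # second layer of knobs

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

for _ in range(200):
    h = np.maximum(0, X @ W1)                  # hidden layer activations (ReLU)
    p = softmax(h @ W2)                        # predicted digit probabilities
    cost = -np.mean(np.log(p[np.arange(len(labels)), labels]))  # the hand-chosen cost function

    # Backpropagation: gradient of the cost with respect to every knob
    dz2 = p.copy()
    dz2[np.arange(len(labels)), labels] -= 1
    dz2 /= len(labels)
    dW2 = h.T @ dz2
    dh = dz2 @ W2.T
    dh[h <= 0] = 0
    dW1 = X.T @ dh

    W1 -= 0.1 * dW1                            # nudge every knob a little downhill on the cost
    W2 -= 0.1 * dW2

The cost function is chosen by a human; the hundreds of billions of knobs in an LLM are then adjusted by the same kind of downhill nudging, only at an enormously larger scale.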
"On Earth the soul has a past, in the Cosmos it has a future. The seer must unite past and future into a true perception of the now." Dennis Klocek

Re: ChatGPT answers metaphysical questions :)

Post by Federica »

Cleric wrote: Fri Nov 22, 2024 4:34 pm GPT can actually work on the character level. As a matter of fact, the only reason to work with larger chunks (tokens) is that it requires less computation. For example, if we look up the tokens we can see that there's a whole token for " photographic". This whole token can be added to the end of the list in one go. If the dialog had to be constructed character by character, it would take 13 passes for this single word. But on a technical level the model can also work with discrete characters. As a matter of fact, most of the tokens are words or fragments of words in English. Other languages and alphabets have far fewer tokens, and those that they have are shorter and rarely whole words. This explains why ChatGPT works much more slowly when we ask questions in other languages. It is precisely because of what was explained above with " photographic". Words in other languages are assembled from single letters or short fragments, so the model simply needs many more passes to produce the same length of text as in English. This additionally shows that the layers can't be thought of as operating over words. Nor are there separate layers, some processing single letters and others whole words. The matrices operate on embedding vectors; they don't know whether those correspond to tokens consisting of whole words or of single letters. This only shows how convoluted the statistics embedded in the parameters are, and why it is so difficult to comprehend anything about them by analyzing them.

So it's very important to realize that inside the LLM there's not even a distinction between letter and word! The model doesn't have access to the token list. The tokenization of the input happens outside the model. The first thing the model receives is a list of numbers (the token numbers) [ 23445, 1123, 999, ... ]. It is completely unknown whether these numbers correspond to a single-letter token, a syllable, or a whole word. It's even completely irrelevant whether these numbers correspond to anything at all. When the model is trained, it is only known from the outside that, for example, the next token should be 100 (as per the training text). Then all the billions of parameters are tweaked so that the output gets closer to that number (of course, this needs to be done simultaneously for the vast amount of training text).

We should clearly grasp that neither during the training process nor during normal execution does the model see anything other than lists of numbers. This makes it very clear why ChatGPT is quite incapable of counting characters. For example:

Q: How many "r"s in "raspberry"?
A: The word "raspberry" has two "r"s—one at the beginning and one near the end.

The reason is that this character information is not present in any way inside the model. In the cases where it gets the count right, it's rather because of some roundabout relations (perhaps it has been trained on a text that actually states the count of a given character in some word).

So we can't really speak of meaning inside the model. The calculation doesn't even 'know' that the numbers it processes (both during training and execution) are supposed to map to language letters that humans understand. It's nothing more than a function that has been optimized to give the correct numerical output for the training numerical input. It is, once again, outside the LLM flow that the resulting token numbers are converted to characters and printed on screen. So the model simply captures the regularities in the input lists.
Yes, great. Thanks again!
"On Earth the soul has a past, in the Cosmos it has a future. The seer must unite past and future into a true perception of the now." Dennis Klocek

Re: ChatGPT answers metaphysical questions :)

Post by Federica »

AshvinP wrote: Fri Nov 22, 2024 5:00 pm Federica, I would point your attention to this post/comment by Cleric.

The bold part again contains much implicit baggage. All our science and technology can be compared to a function optimization process. Every earthly activity is affected by many variables. We produce technology, tools, yet we modify the World state always in one and the same way - through the intuitive modification of our L-movements and observing the perceptual feedback of the imploding memory picture of the Cosmos. I think the recent advances in machine learning (which is a fancy term precisely for a function optimization problem) clearly show that. More and more scientists are becoming skeptical that there are intellectually graspable mathematical laws of Nature at its foundation (which would imply the old Keplerian thinking - that God animates the World content based on intellectual mathematical thinking). Recent successes in protein folding, complex gravitational simulations, etc., show that we can pretty decently mimic these systems by taking a function with millions of coefficients and tweaking them until the function fits the training data. With this, no one is under the illusion that this is how reality works - we simply mimic its appearances (the parts that we can quantify). However, on the positive side, this may help us understand that our traditional physics thinking is not that different after all. It all boils down to realizing that the higher order minds do not move their Ls according to computations in the way we manifest them in our stepwise cognitive sequences. This of course doesn't mean that these L-movements do not exist in certain lawful relations. It's precisely the latter that implode into the Cosmic memory tableau which seems structured and lawful. Morph the Ls - that is, steer our intuitive intents differently - and the imploding picture of the World state morphs too.

I believe JP fits into this category of "no one who is under the illusion that this is how reality works". As we can see from this and many other discussions, JP leans toward psycho-spiritual meaning as the foundation of phenomenal reality and its dynamics, including our own spiritual activity. He realizes we are only 'mapping' or 'modeling' the appearances of this psycho-spiritual meaning through LLMs, which are designed to mimic certain aspects of our linguistic cognition. In other words, he doesn't equate the dynamics of those appearances to the higher order L-movements of the contextual minds, but treats all scientific models as symbols for a mysterious and perhaps ineffable reality. If anything, the latter part is his major blind spot right now, which is quite common for anyone who goes down the path of Christian idealism but lacks the PoF-style insights of where 'reality-itself' and phenomenal appearances overlap in our real-time thinking. He simply doesn't realize to what extent we can intimately know the higher-order L-movements and establish relationships with them just like we establish relationships with other human souls (this is why he often says the highest meaning comes from taking on the responsibility of family and children, for example).

At a more practical level, I think if you were correct about JP's more rigid metaphysical understanding of LLM and its relation to human cognition, we would see him advocating for more and more sophisticated AI tech as a means of understanding our own psycho-spiritual nature, similar to Levin. But that's exactly the opposite of what he is doing. Instead he is sounding the warning on AI much like a spiritual scientist would, on how we need much deeper wisdom to understand what's at stake with these emerging technologies and how they can constrain our creative responsibility within the meaningful flow of existence. It would be very interesting to hear him comment on Levin and his research, because I suspect that he would voice many of the same concerns we have voiced here, of course in a less spiritually scientific manner and more on the foundation of scaled images drawn from traditional religious narratives (although he does an admirable job of connecting these with the latest scientific research in neuroscience, psychology, evolutionary theory, etc. as well).



As said, I have not been following him and don't have a general impression, as you do. I didn't expect you to agree with me, but I am a bit surprised that you give him a pass when he says that his contentions are not a matter of opinion - that they are irrefutable. (Sorry to insist but, moreover, what he calls irrefutable is an inaccurate illustration of symbol - inaccurate not according to me, as said.) In general, you don't seem to appreciate such postures very much, do you?
if you were correct about JP's more rigid metaphysical understanding of LLM and its relation to human cognition, we would see him advocating for more and more sophisticated AI tech as a means of understanding our own psycho-spiritual nature

Perhaps my impressions based on that conversation alone are misguided. I will watch the new video you've shared. Still, specifically on LLMs, I have googled "Jordan Peterson LLMs" and picked the first result. Maybe it's bad luck, but here he sounds enthusiastic about LLMs. In his telling, LLMs have finally come to prove postmodernism wrong, because the meaning postmodernists don't want to recognize is irrefutably demonstrated by the LLMs:


…like the whole map, essentially, right? And they do that statistically, they do that mathematically. So what that means now is that if these models are programmed honestly, and trained honestly - a very difficult thing to manage - then we can use statistics to evaluate the implicit structure of meaning, and that's what literary critics have been doing forever! When Harris argued with me he said: “Well, that's just your interpretation of the biblical texts - which turned him instantly into a postmodernist right? - there's an infinite number of interpretations of any text and there's no canonical order between them”. That's the postmodernist claim: the lack of meta-narrative, let's say, which means there's no union, no comprehensibility. It means everything fragments, ultimately. Which is what they wanted, so they could dance in the ruins and pursue their own short-term gratification with power as their hypothetical guardian and guide. Terrible. These interpretations aren't arbitrary! They're not arbitrary, they're coded into the language! Without that coding, their language would not be comprehensible.

Without that coding, their language would not be comprehensible?? :shock:


The problem with this view is the same again: LLMs mimic word sequences, not ideas. He says: "The meaning of a word is coded in its relationship to other words". Then comes another pyramid, where semantics sit at the top, above the imaginative world - a world that "the LLM will soon be able to model". Well, it is clear we have multiple issues here. I recognize there is an eagerness to understand the structures of reality. But these thoughts are problematic.
Do you agree with what he says in this video, in particular the quote above? If you say yes, well, I will be somewhat surprised.
"On Earth the soul has a past, in the Cosmos it has a future. The seer must unite past and future into a true perception of the now." Dennis Klocek

Re: ChatGPT answers metaphysical questions :)

Post by AshvinP »

Federica wrote: I didn't expect you to agree with me, but I am a bit surprised that you give him a pass when he says that his contentions are not a matter of opinion - that they are irrefutable. (Sorry to insist but, moreover, what he calls irrefutable is an inaccurate illustration of symbol - inaccurate not according to me, as said.)

The idea of "inaccurate illustration of symbol" is very internally dissonant to me. It conflates the first-order content level (the realm of 'facts'), where "accurate vs. inaccurate" applies, with the second-order imaginative level of illustrating what the facts could symbolically mean. There can be better or worse illustrations in various contexts, misleading illustrations when not accompanied by enough context (though a lot of this also depends on the recipient thinker's habits), and so on, but I wouldn't call an illustration inaccurate, and I would try not to let its perceived quality deflect my thinking too far from the spirit in which it is offered.

I also don't think the experiential contentions he outlines are a matter of opinion.

Federica wrote: Fri Nov 22, 2024 9:51 pm Perhaps my impressions based on that conversation alone are misguided. I will watch the new video you've shared. Still, specifically on LLMs, I have googled "Jordan Peterson LLMs" and picked the first result. Maybe it's bad luck, but here he sounds enthusiastic about LLMs. In his telling, LLMs have finally come to prove postmodernism wrong, because the meaning postmodernists don't want to recognize is irrefutably demonstrated by the LLMs:


…like the whole map, essentially, right? And they do that statistically, they do that mathematically. So what that means now is that if these models are programmed honestly, and trained honestly - a very difficult thing to manage - then we can use statistics to evaluate the implicit structure of meaning, and that's what literary critics have been doing forever! When Harris argued with me he said: “Well, that's just your interpretation of the biblical texts - which turned him instantly into a postmodernist right? - there's an infinite number of interpretations of any text and there's no canonical order between them”. That's the postmodernist claim: the lack of meta-narrative, let's say, which means there's no union, no comprehensibility. It means everything fragments, ultimately. Which is what they wanted, so they could dance in the ruins and pursue their own short-term gratification with power as their hypothetical guardian and guide. Terrible. These interpretations aren't arbitrary! They're not arbitrary, they're coded into the language! Without that coding, their language would not be comprehensible.

Without that coding, their language would not be comprehensible?? :shock:


The problem with this view is the same again: LLMs mimic word sequences, not ideas. He says: "The meaning of a word is coded in its relationship to other words". Then comes another pyramid, where semantics sit at the top, above the imaginative world - a world that "the LLM will soon be able to model". Well, it is clear we have multiple issues here. I recognize there is an eagerness to understand the structures of reality. But these thoughts are problematic.
Do you agree with what he says in this video, in particular the quote above? If you say yes, well, I will be somewhat surprised.

For me, what JP says in that quote seems self-evident, provided of course we take words like 'coding', 'network', 'stacked discs', and so forth as symbols for non-computational spiritual processes, as we normally do on this forum. I mean, what else could the words and sequences of words be reflecting back except the 'implicit structure of meaning' at some scale of inner activity? I find what he says in this clip to be another way of speaking about the (potentially) concentrically aligned spheres of inner activity (intellectual/linguistic, imaginal, and beyond), which he describes as 'isomorphic' (which is also the term he used in the interview with Hoffman to question the latter's dashboard illusionism, where the 'noumenal' network of CAs is considered entirely orthogonal to our ordinary cognitive activity). We may speak of these as the self-similar temporal rhythms across all scales, which are also spatialized at our intellectual scale. Even our prosaic word sequences preserve these isomorphic narrative patterns to some extent, although usually in hardly recognizable form. Something like an LLM helps bring that implicit narrative form more into focus by training on vast numbers of sequences.

And I think we have all spoken about this relationship in respect to LLMs in various ways and at various times:

Cleric: What I have found of value is to contemplate how our human knowledge dispersed through the Internet (on which the GPT model is trained) has been compressed into different categories...Chatting with GPT may provide an interesting experience for some people. This can happen only if we're willing to learn something about ourselves... If we approach GPT with willingness to learn something about the way we tick, we'll soon have the strange feeling how in the language model have been summarized the main channels in which human cognition flows. This shouldn't be confused with explanation how our cognition works. It's only an abstract categorization of the main patterns in which present humanity's thinking flows... In a way GPT can stimulate us to feel certain shame when we see how superficially we spend our lives in the linguistic labyrinth. This might inspire us to seek what our true human worth is about.

Federica: They do illustrate and reveal, in their makeup, the quality of certain human cognitive patterns...

I see JP using GPT/AI in the same mirror-like way for ordinary cognitive pathways, and, most importantly, his aim is to inspire listeners in a direction away from postmodern power narratives, where our identity fragments more and more into horizontally competing "interpretations" of reality based on unexamined soul factors, toward our more integrated archetypal nature, where we are swimming in the shared moral intuitions that structure reality and naturally lose interest in the power games.

Generally speaking, through our modern scientific thinking, we are finding ways of conducting more of the implicit cognitive structure, mediated by our intellectual symbols, through the bodily will into our technologies as consciousness grows in resonance with the etheric spectrum (just as we see with Levin's research). It's only a matter of how conscious we can become of the fact that this is happening, and JP is more conscious of it than many other current intellectual thinkers. For example, we can notice the alignment of this clip on dreams (imaginal space) with spiritual scientific understanding, i.e. how the former overlaps with and modulates our intellectual-artistic thinking space:





Clearly JP's intuition of these things is not as fleshed out and refined as ours, nor does he suspect the isomorphically nested scales can be cognitively experienced beyond our nebulous intuition of their existence. He doesn't suspect they can come 'into focus' at our cognitive horizon in the same way as our inner voice is currently in focus. That's why the intuition remains rather nebulous. Yet, beyond that, I see no reason to be surprised at his comments on LLM which, for me, are entirely in keeping with his overall spiritual outlook which discerns continuity between the archetypal moral/value spheres of activity (symbolic world) and the perceptual flow of daily experience which we commonly associate with a 'material world' (the objective realm of facts, as he usually puts it).

I will add that it's slightly possible he is overestimating how much a technology like an LLM can explain 'how cognition works', since, like most people, he is tempted to conceive of higher-order scales as similar in many ways to our familiar intellectual-linguistic movements. I am not sure about that, though, and I think other discussions have highlighted how he is wary of reducing the Spirit to our standard conceptions and rational movements. Fundamentally, I think he is safeguarded more than others from the reductive intellectual tendency through his explicit allegiance to emulating the Christ impulse across the layers of thinking-feeling-willing.

PS - did you notice what he says around 5:50 min in that clip? "there's nothing arbitrary about that, there's no 'the meaning is only in the text' - that's the ultimate claim of the disembodied, rational, prideful intellect... 'it's all in the words', like no, no no..." :)
"They only can acquire the sacred power of self-intuition, who within themselves can interpret and understand the symbol... those only, who feel in their own spirits the same instinct, which impels the chrysalis of the horned fly to leave room in the involucrum for antennae yet to come."