Federica wrote: ↑Sat Nov 09, 2024 9:59 pm
I don't think the bold is what the experiment entails. There is no f(f(x)). If the whole sorting algorithm is f(x) - that is, the whole process that leads from say 3,1,2,5,4 to 1,2,3,4,5, then that's the end of the experiment. There is no iteration. Then Levin just goes home, and the next day he throws the dice again, to get a new initial configuration, say 4,1,3,2,5, and runs the same f(x) again, on this new "x". There is no f(f(x)).

Here's a concrete example of what is meant by the recursion f(f(x)). Let's take a very simple sorting technique (I'm not sure, but I guess it would be a form of bubble sort). To make it more algebraic, we can concatenate the numbers into a single number, i.e. if we use your example, we start with x = 31254.
We define f(x) such that it traverses pairs of adjacent digits from left to right and swaps the first pair where the left digit is greater than the right. I'll bracket the pair that will be swapped by the function.
f([31]254) → 13254
f(1[32]54) → 12354
f(123[54]) → 12345
f(12345) → 12345
...
We can consider that the number is sorted when f(x) = x (that is, there's nothing more to swap).
Now we can of course define a function that handles the whole process, as you suggest. Call it sort(x). In human language, we can define it as "repeatedly apply f(x) until f(x) = x, and then return x".
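The two definitions above can be sketched in a few lines of Python (a minimal illustration of the single-swap pass and the iterate-to-fixpoint wrapper, not Levin's actual code):

```python
def f(x: str) -> str:
    """One pass: swap the first adjacent pair where the left digit > right digit."""
    digits = list(x)
    for i in range(len(digits) - 1):
        if digits[i] > digits[i + 1]:
            digits[i], digits[i + 1] = digits[i + 1], digits[i]
            break  # only the first offending pair gets swapped
    return "".join(digits)

def sort(x: str) -> str:
    """Iterate f until it reaches a fixpoint: f(x) == x."""
    while f(x) != x:
        x = f(x)
    return x

# Reproduces the trajectory above:
# f("31254") -> "13254" -> "12354" -> "12345" (fixpoint)
print(sort("31254"))  # → 12345
```

Note that sort(x) is nothing but repeated application of f, i.e. exactly the f(f(f(...f(x)))) nesting under discussion.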
No sorting algorithm, however, can sort a list in one atomic step. Of course, not all algorithms iterate in such an obvious way as above but nevertheless, they always need to transform the list through many repeating steps.
ML's paper investigates precisely this path that the number takes from its original state to the sorted state. It is about these intermediary iterations that he says that the result of an iteration may look more unsorted than the previous one, yet still lead to the final sorted target. It is this step that is dubbed 'delayed gratification'.

A lot can be said here, but it suffices to say that 'less sorted' and 'more sorted' depend on what kind of metric we use, and that metric is entirely coupled with the sorting algorithm itself. My guess is that ML approaches the problem through the lens of something like bubble sort, where it is very easy to say what is more or less sorted. A metric can be seen simply as how many steps away we are from the final result.

When, however, the different algotypes are mixed, when some cells are frozen, etc., we actually build a new hybrid sorting algorithm. If this algorithm works at all, then with each iteration it gets one step closer to the final sorted result. And here's the critical thing: this step may look less sorted as measured by the Bubble metric! This is where the simple logical error lies. We judge the iterations with Bubble logic and say "Aha! Here the experiment steps into a number that is less sorted (yet only as measured by our Bubble metric!). Thus it somehow goes around the barrier; it is willing to temporarily enter a less sorted state, as if having the insight that it will later make it up."

Yet according to the hybrid algorithm that we have built, and its own metric, there's no such 'going around a barrier'. Every step of the hybrid algorithm (assuming a working algorithm) gets us one step closer to the final result. Nothing more, nothing less. This is the hybrid metric that we should consider. According to this metric there's no going around, but going step by step straight toward the target.
"In the eyes" of this metric, every step is "more sorted" because it is one step closer to the final result. The hybrid algorithm doesn't 'know' or 'care' that its next step may seem less sorted 'in the eyes' of another metric (such as the Bubble's).
Of course, one may argue that this is precisely the point - that the hybrid algorithm represents a higher-order morphic space that follows the principle of least action from within its perspective, which from another perspective may seem like delayed gratification. This is probably what ML would respond in the face of the above.
Yet this is precisely where the whole insight about 'no privileged scale of causation' crumbles. In the above, we have a single plane of causation - the iron necessity of the algorithmic steps. From such a view, the different scales (in our case they are like synonyms of metrics) are nothing but analytical lenses through which we assess whether the system is following a geodesic (the path of least action, or the straightest path toward the end result of computation) in some specific metric. This however has no causal significance whatsoever.
Think of it in the following way.
Picture a simple mechanical computing device: a board down which marbles roll through an arrangement of parts. Such a device should take much of the 'magic' of modern computers away. Now imagine that, with a large enough board and enough parts, you can mechanically implement even some of the sorting algorithms. You can imagine doing that without any understanding, as a child would, arranging the parts simply by looks. Of course, the vast majority of arrangements won't do anything useful, but let's assume that you are lucky and manage to put the parts in configurations that do sorting. It's even easier to imagine the handicaps here: for example, you can glue some of the parts in place (i.e. frozen/dead cells). But anyway. The main thing is that the movements of the marbles are very clearly understandable. They depend entirely on very basic intuition about 'rolling down', 'inclination of a surface', etc. Now ask yourself: at what point in your experiments (remember, you may not even know that the arrangements can be interpreted as doing something meaningful, like sorting) would you be justified to say "Aha! Now a different plane of causation is intervening in my experiment! The intuition of rolling marbles is not enough to comprehend what I see."?