You appear to be confusing the UI over the LLM with the actual working of the LL...

kgeist · 2025-09-23T21:19:19 1758662359

I have an idea of how they work. At the end of the day, it's all just numbers. You say you put "one message", but in reality it's N input numbers corresponding to the tokens (where N can very from 1 to any size). Basically N variables. An LLM is one large function which takes N values as input and produces a new output value. So you can view an LLM as y = f(a, b, c, ...)

What I don't understand is how the line is drawn between "past" and "present". Why are variables a,b,c viewed as "present" and not "past"? How is past defined? Why can't you view token generation as f(past1, past2, past3, present)?

As I said, it does look "Markovian" if the number of "pasts" is fixed, but again, what exactly is not Markovian then? Infinite inputs? Everything in the real world is finite.

fluoridation · 2025-09-23T21:30:53 1758663053

Let's keep it simple and say that the procedure of an LLM is

  state: list[n];
  while (true){
      new_token = llm(state);
      state.pop_front();
      state.push_back(new_token);
  }

Any given time that llm() is called, it is blind to previous invocations. The state may contain evidence of them, or it may not. It's a free-form data structure that is sometimes filled with user input and sometimes contains previous llm() outputs, but llm() doesn't care.

kgeist · 2025-09-23T21:45:47 1758663947

So any pure function is Markovian then? They too only depend on their input and are blind to previous invocations. I’m just trying to understand the practical purpose of calling something a Markov chain beyond small sizes, because if we allow the state to be of any size, I’m not sure what the usefulness is: anything can be called a Markov chain if you concatenate all input data of any size into one blob and interpret it as "one state."

fluoridation · 2025-09-23T22:07:25 1758665245

>So any pure function is Markovian then?

I guess it would be more accurate to say that at its core a Markovian process is based around a pure function. Usually there's some kind of feedback loop around the function that allows the process to evolve in some way and continue producing output.

>I’m pretty sure anything can be modeled using pure functions.

We're not talking about theoretical models of black boxes, though, we're talking about known mechanisms. A recursive function doesn't become iterative just because it could be thought of as iterative (because eventually the CPU does loop over some sequence of instructions). We still say it's recursive.

>I’m not sure what the usefulness is

I suppose part of the usefulness is to de-mystify LLMs. If you can understand what Markov chains do, the thought "oh. An LLMs is just a much more complex Markov chain" can help reasoning about them.

drdeca · 2025-09-23T23:24:12 1758669852

“just”?

If you include the entire past history as part of the state, then any stochastic process becomes a Markov process.

Any computer program (without network stuff or inputs for a time (assume your computer has a true RNG module)) you could run is a Markov process. Just say that your state space is the space of possible ways the memory of the computer can be.

Saying that the LLM is “just” a Markov process seems to be doing something a bit weird with that “just”.

fluoridation · 2025-09-23T23:49:43 1758671383

>Any computer program (without network stuff or inputs for a time (assume your computer has a true RNG module)) you could run is a Markov process. Just say that your state space is the space of possible ways the memory of the computer can be.

No. The entire computer is a Markov process in that situation. An individual process may still remember its past states and therefore not be Markovian. See the example I gave about allowing or blocking transactions.

drdeca · 2025-09-24T01:18:37 1758676717

By “memory” I was including whatever is on disk, and the values of all the CPU registers, and all the CPU flags, etc.

So, yeah, the state of the computer (regarded as an abstract machine, not viewed at the level of the physical hardware)

Oh, you mean because I said a particular program, and other things on the computer could interfere with the program. Ok, point. I was imagining a computer that just had a single program running on it (with no separate OS) I guess? But yeah, good point, I certainly didn’t make my idea precise enough (in my own mind or in my words).

fluoridation · 2025-09-24T02:15:01 1758680101

Yes, I understood what you meant and my answer assumes that the contents of all memory -- volatile and persistent -- constitute the state of the machine.

It doesn't matter whether the program is the only thing the computer is doing or whether there are other things running on the same machine. The program considered as a machine unto itself is a conceptually different thing from the underlying hardware. The Markov property is a way to reason about how a process interacts with its state. "I could design a Markov process that's semantically equivalent to a non-Markov process, by just including more state" is a pointless statement. The whole point of the discrimination is to reason about the thing you're studying.