Because a function clearly defines the scope of the state within it, whereas a section of code within a long function does not. Therefore a function can be reasoned about in isolation, which lowers cognitive load.
I don't agree. If there are side effects that may be relevant, a section of code within a long function executes in a clearly defined state: everything above it has already happened, and nothing below it happens until it finishes. The same code in a separate function could be called from anywhere. Even without side effects, if the function is called from more than one place, you have to think about all of its callers before you change its semantics, and until you look, you don't know whether there is more than one caller. So the inline section can be reasoned about with much lower cognitive load. This may be why larger subroutines correlate with lower bug rates, at least in the small number of published empirical studies.
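To make the contrast concrete, here's a minimal Python sketch (all names and numbers are made up for illustration). The inline block runs in exactly one context; the extracted helper's behavior is a contract with every call site:

```python
def process_order_inline(order):
    # Inline block: executes in one known state. Everything above
    # this point has already happened, nothing below has happened yet,
    # so you can reason about it in place.
    total = sum(item["price"] * item["qty"] for item in order["items"])
    if order.get("coupon"):
        total *= 0.9  # changing this rule affects exactly one context
    return total

def apply_pricing(items, coupon):
    # Extracted subroutine: the same logic, but now its semantics are
    # a contract with every caller. Changing the discount rule means
    # first finding and auditing all call sites.
    total = sum(item["price"] * item["qty"] for item in items)
    if coupon:
        total *= 0.9
    return total

# Two call sites: a semantic change to apply_pricing affects both.
def checkout(order):
    return apply_pricing(order["items"], order.get("coupon"))

def quote(cart):
    return apply_pricing(cart["items"], None)
```

Nothing in the inline version forces you to enumerate callers before editing it; the extracted version does.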
The advantage of small subroutines is not that they're more logically tractable. They're less logically tractable! The advantage is that they are more flexible, because the set of previously defined subroutines forms a language you can use to write new code.
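A small sketch of that "language" effect, with hypothetical helpers: once `normalize` and `tokenize` exist, new code reads as a sentence built from them rather than as raw string manipulation.

```python
# Previously defined subroutines act as a small vocabulary.
def normalize(text):
    return text.strip().lower()

def tokenize(text):
    return text.split()

# New code is written in the extended language, one level up.
def count_words(text):
    return len(tokenize(normalize(text)))
```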
Factoring into subroutines is not completely without its advantages for intellectual tractability. You can write tests for a subroutine, which give you some assurance of what it does and how it can break. And, in the absence of global state (a huge caveat), you know the subroutine depends only on its arguments, while a block in the middle of a long subroutine may have many local variables in scope that it doesn't use. Often the caller of the new subroutine is also more readable, because you can see the code before the call and the code after it on the same screen: code written in the language extended with the new subroutine can be higher level.
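Both of those advantages can be shown in a few lines. In this sketch (an invented example), the global-free subroutine's signature is the complete list of what its result depends on, and a handful of assertions pin down its behavior:

```python
def clamp(value, low, high):
    # No globals, no closure state: the result depends only on
    # these three arguments, so the signature tells you everything
    # the function can observe.
    return max(low, min(high, value))

# Tests give some assurance of what it does and how it can break.
assert clamp(5, 0, 10) == 5    # in range: unchanged
assert clamp(-3, 0, 10) == 0   # below range: pinned to low
assert clamp(42, 0, 10) == 10  # above range: pinned to high
```

A block buried in a long function can't be tested this way without either extracting it or exercising the whole function around it.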
You can write long functions in a bad way, don't get me wrong. I'm just saying that the rule treating length itself as an anti-pattern has no inherent validity.