> We want to abdicate responsibility of this kind of memory management to the interpreter, just like we want to abdicate responsibility for handling processor cache levels. If you can articulate how all the data you use is laid out in memory all the time, you are majorly micro-managing the runtime.
Unfortunately, in practice, it's not possible to be oblivious to the layout of data in memory if we want to write fast code. The CPU/memory speed disparity graph [1] shows the new reality for programmers: the slowest part of a program is bringing data from RAM into the CPU registers. Fortunately, modern CPUs have very fast caches that help amortize this cost, and it's the programmer's responsibility to organize data to take advantage of that fast hardware; the compiler cannot do it. That's why two functions with the same algorithmic complexity, computing the same result, can differ in performance by an order of magnitude [2]. The famed sufficiently smart compiler that could do those transformations does not yet exist, as far as I know.
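The point about access patterns can be sketched concretely. This is an illustrative toy (not the code behind [2]): two sums over the same N x N matrix stored row-major in a flat `Vec`. Both are O(N^2) and compute the same total, but one walks memory sequentially while the other strides N elements per step, which defeats the cache for large N.

```rust
const N: usize = 1024;

// Cache-friendly: inner loop touches consecutive addresses.
fn sum_rows(m: &[f64]) -> f64 {
    let mut total = 0.0;
    for i in 0..N {
        for j in 0..N {
            total += m[i * N + j];
        }
    }
    total
}

// Cache-hostile: inner loop jumps N * 8 bytes on every step,
// so each access is likely a fresh cache line (or a miss).
fn sum_cols(m: &[f64]) -> f64 {
    let mut total = 0.0;
    for j in 0..N {
        for i in 0..N {
            total += m[i * N + j];
        }
    }
    total
}

fn main() {
    let m = vec![1.0; N * N];
    // Same result, same complexity, very different memory behavior.
    assert_eq!(sum_rows(&m), sum_cols(&m));
    println!("total = {}", sum_rows(&m));
}
```

Benchmark the two with a large N and a release build to see the gap; the compiler will not reorder the loops for you.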
If the slowest part of your program is waiting for data to get from RAM into your CPU registers, then you have one of the most blazingly fast programs ever written. Grats! The vast majority of software has much, much, much lower-hanging fruit than that: things like O(n^2) algorithms where an O(n) one exists, N+1 SELECT issues, missing database indexes, missed easy caching wins, repeated work, threads blocking each other, loading more data than necessary, throwing and then suppressing exceptions everywhere, bad netcode, doing work serially that could be parallelized, etc. In that software, fixing these issues will be 10x easier and result in 10x larger speedups than worrying about organizing your data to get loaded from RAM into CPU caches more quickly. That makes this advice counterproductive for most programmers to hear.
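A hypothetical example of that lower-hanging fruit, in the same spirit: detecting duplicates in a slice. The naive version rescans the remainder of the slice for every element (O(n^2)); the `HashSet` version makes one pass (O(n)). For large inputs, this kind of fix dwarfs any cache-layout tuning.

```rust
use std::collections::HashSet;

// O(n^2): for each element, linearly scan the rest of the slice.
fn has_duplicates_quadratic(xs: &[i32]) -> bool {
    for (i, a) in xs.iter().enumerate() {
        if xs[i + 1..].contains(a) {
            return true;
        }
    }
    false
}

// O(n): one pass; `insert` returns false if the value was already seen.
fn has_duplicates_linear(xs: &[i32]) -> bool {
    let mut seen = HashSet::new();
    !xs.iter().all(|x| seen.insert(x))
}

fn main() {
    assert!(has_duplicates_quadratic(&[3, 1, 4, 1, 5]));
    assert!(has_duplicates_linear(&[3, 1, 4, 1, 5]));
    assert!(!has_duplicates_linear(&[1, 2, 3]));
}
```

Both versions are a few lines; the difference is picking the right data structure, not hand-placing anything in memory.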
[1] https://gameprogrammingpatterns.com/images/data-locality-cha...

[2] https://play.rust-lang.org/?version=stable&mode=release&edit...