> This reads like a hand-wavy reference to some sort of programmable systolic array thing.
Indeed, systolic arrays are a big inspiration. So are the Xputer, the Reduceron, computational RAM, parts of IBM's TrueNorth chips, and, in parts, current AI and GPU hardware.
> Ok fine; but for stateful calculations that need to load and/or store information per computational element, it's not the win one might think.
The whole point of my idea is to mitigate this.
Exactly this problem is the famous von Neumann bottleneck, and it's holding back much of the progress we could otherwise make in computer hardware.
Imho the whole misery we're in is due to the fact that almost all "basic" stuff is built in languages that are married to the "C-Machine". The core of the problem: the "C-Machine" can't be built efficiently on hardware that doesn't simulate a von Neumann computer. So we're stuck.
The only way out of this "local optimum", as I call it, would be to change the basic programming model: from imperative to some mix of declarative data-flow with FP elements. Then "rebuild the world" on top of that… (Simple as that, isn't it? :-D).
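Roughly what I mean, as a toy Haskell sketch (the function name is purely illustrative): the imperative version mutates an accumulator while jumping around a loop; the data-flow version is just a composition of pure stages, so a compiler would be free to map each stage onto its own hardware unit.

```haskell
-- Imperative pseudocode: acc = 0; for x in xs: if x > 0: acc += x*x
--
-- The same thing as declarative data-flow: no mutable accumulator,
-- no addressable memory to jump around in, just composed pure stages.
sumOfSquaresOfPositives :: [Int] -> Int
sumOfSquaresOfPositives = sum . map (^ 2) . filter (> 0)

main :: IO ()
main = print (sumOfSquaresOfPositives [-2, 1, 3, -4, 5])  -- prints 35
```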
One can't fix the von Neumann bottleneck as long as it's possible (or actually required!) to jump around freely in memory. So we need to change that! Pure FP languages could pose a way around that basic problem: in FP languages one can "contain" all state manipulation at the language level in isolated parts of the program. Everything between these parts is pure data-flow. So you could have explicit "machines" that manipulate state and talk to the outside world, strongly separated from the parts that do only pure transformations on data. Those "machines" would be the "processing units" (with their attached scratch-pad memory) that I'm talking about in the other long sibling comment.
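To illustrate the separation (a toy Haskell sketch; the names and the use of the standard State monad are just my illustration of the idea, not a real system):

```haskell
import Control.Monad.State

-- The scratch-pad memory attached to one "processing unit":
-- here just a single accumulator cell.
type ScratchPad = Int

-- The "machine": the only place in the program where state is
-- manipulated. Everything it consumes arrives as pure data-flow.
accumulate :: Int -> State ScratchPad ()
accumulate x = modify (+ x)

-- Pure transformation stage; no state, no memory access pattern
-- beyond streaming the data through.
preprocess :: [Int] -> [Int]
preprocess = map (* 2) . filter even

-- Wire the pure data-flow into the stateful machine and run it
-- on its (initially zeroed) scratch-pad.
runMachine :: [Int] -> Int
runMachine xs = execState (mapM_ accumulate (preprocess xs)) 0

main :: IO ()
main = print (runMachine [1 .. 10])  -- prints 60
```

The point of the separation: the compiler can see statically which parts touch the scratch-pad and which parts are pure, so only the "machine" needs load/store hardware at all.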