We are in the 640KB of RAM stage of AI accelerators. Current top models are measured in hundreds of billions of parameters; future models will have many trillions. We've only had GPU/AI processors large enough to run GPT-4o or Llama 3.1 405B for a few years. It is silly to think we have already maxed out this approach.
Look at what Cerebras is doing with wafer-scale AI chips that have 900,000 cores and can do 125 petaFLOPS of FP16 compute. The most powerful Nvidia chip does about 2.25 petaFLOPS. At worst, we've hit a local maximum.
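For a rough sense of that gap, here's the back-of-the-envelope math, taking the two peak FP16 numbers quoted above at face value:

```python
# Rough ratio between the quoted peak FP16 figures (both taken at face value).
cerebras_pflops = 125.0   # wafer-scale chip, as quoted above
nvidia_pflops = 2.25      # most powerful single Nvidia chip, as quoted above

print(f"{cerebras_pflops / nvidia_pflops:.0f}x")  # ~56x
```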
You could say that RAM has had diminishing returns. My laptop has 26,214x the RAM of my original PC. It is certainly way more capable, but is it 26,214x more capable? After all, diminishing returns are still returns.
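The 26,214x figure falls straight out of the arithmetic if you assume a 16 GB laptop against the original PC's 640 KB (the laptop size is my assumption, implied by the multiple):

```python
# Where 26,214x comes from, assuming a 16 GB laptop vs. the original PC's 640 KB.
original_pc_kb = 640
laptop_kb = 16 * 1024 * 1024   # 16 GB expressed in KB

print(f"{laptop_kb / original_pc_kb:,.0f}x")  # 26,214x
```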
I still think that Anthropic's latest mini model outperforming their best model from 12 months ago suggests we are far from hitting meaningfully diminishing returns.