We need software and hardware to cooperate on this. Specifically, threads from different security contexts shouldn't get assigned to the same core. If we guarantee this, the fences, flushes, and other clearing of shared state can be limited to kernel calls and process lifetime events. That keeps all the benefits of caching and speculative execution available to the code actually doing the heavy lifting, without worrying about side-channel leaks.
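To make the "never share a core across security contexts" idea concrete, here is a rough sketch using plain CPU affinity on Linux. The names `partition_cores` and `pin` and the context labels are made up for illustration. It also assumes each CPU id is its own physical core; with SMT you would have to group sibling hyperthreads first. It only shows the partitioning rule and is not a real mitigation.

```python
import os


def partition_cores(cores, contexts):
    """Assign each security context a disjoint set of cores (round-robin).

    Hypothetical helper: 'cores' are CPU ids assumed to be distinct
    physical cores, 'contexts' are arbitrary security labels.
    """
    cores = sorted(cores)
    contexts = list(contexts)
    if len(contexts) > len(cores):
        # Without at least one core per context, isolation is impossible.
        raise ValueError("more security contexts than cores")
    plan = {ctx: set() for ctx in contexts}
    for i, core in enumerate(cores):
        plan[contexts[i % len(contexts)]].add(core)
    return plan


def pin(pid, core_set):
    """Restrict a process to its context's cores (Linux only; pid 0 = self)."""
    os.sched_setaffinity(pid, core_set)


# Example plan for a 4-core machine with two made-up contexts.
plan = partition_cores(range(4), ["untrusted", "trusted"])
```

A worker for the "untrusted" context would then call `pin(0, plan["untrusted"])` at startup. Affinity alone is only advisory isolation, so the kernel and hardware would still need to enforce the rule, which is the cooperation being asked for here.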
I get you, but devs struggle to configure nginx to serve their overflowing cauldrons of 3rd party npm modules of witches' incantations. Getting them to securely design and develop security-labelled, cgroup-based micro (nano?) compute services for inferencing text of various security levels is beyond even 95% of coders. I'd posit that it would be a herculean effort even for the top 1% of devs.
It's not a "just" if the fix cripples performance; it's a tradeoff. The mitigation is forced to hurt everything everywhere because the processor alone has no mechanism to determine when it is actually required and when it is not. It is 2025 and security is part of our world; we need to bake it right into how we think about processor/software interaction instead of attempting to bolt it on after the fact. We learned that lesson for internet-facing software decades ago. It's about time we learned it here as well.