Ironically, a high performance, general purpose architecture without speculative execution might require a deep reinvestment in SMT. Instead of trying to speculatively make one thread fast to mask IO stalls, run a large pool of threads that can stall frequently but still keep the execution units and memory channels busy.
To avoid reintroducing these spectre like bugs, you'd have to conservatively design the per-thread execution to avoid those covert channels. Not only synchronously enforcing all logical ISA guarantees for paging and other exception states, but also using more heavy-handed tagging methods to partition TLB, cache, etc. for separate protection domains.
> Instead of trying to speculatively make one thread fast to mask IO stalls, run a large pool of threads that can stall frequently but still keep the execution units and memory channels busy.
Isn't something like that done for GPUs? They have the advantage of having a massive number of threads to execute. For CPUs, the number of runnable threads tends to be lower.
To avoid reintroducing these spectre like bugs, you'd have to conservatively design the per-thread execution to avoid those covert channels. Not only synchronously enforcing all logical ISA guarantees for paging and other exception states, but also using more heavy-handed tagging methods to partition TLB, cache, etc. for separate protection domains.