Amdahl's law always resonated with me for this reason: your overall speedup is bounded by the portion of the total work that can actually be accelerated.
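(For the record, the standard formulation: if a fraction p of the total work can be accelerated by a factor s, the overall speedup is

    S(p, s) = 1 / ((1 - p) + p / s)

which is capped at 1 / (1 - p) no matter how large s gets. Accelerate half the workload infinitely fast and you still only get 2x overall.)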
From that perspective, speedup is less about the hyper-efficient design of a brand-new thing and more about recognizing large portions of existing workloads that are amenable to acceleration.
More successful accelerators have been born of the latter approach (superscalar/hyperthreading, GPUs once OpenGL/DirectX dominated, fixed-function video decode hardware) than the former (Itanium/VLIW)*.
Or in simpler form, never start a value proposition with "First, rewrite all your software..."
* ML is an odd duck, as it's somewhat co-evolving with its own accelerators?