I think treating the CPU as mainly the ALU is myopic.
The job of the CPU is to get data into the right pipeline at the right time. Waiting for a cache miss means it's busy doing its job. Thus, CPU busy is a reasonable metric the way it is currently defined and measured. (After all, the memory controller is part of the CPU these days.)
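To make that concrete, here is a rough sketch (my own illustration, not a rigorous benchmark) that compares wall-clock time with the CPU time the OS charges to the process. The helper names `measure`, `walk`, and `demo` are made up for this example. It sums the same list twice, once in order and once in a shuffled order that should be much less cache-friendly, and then sleeps for comparison. I'm assuming CPython and a list large enough to spill out of cache; interpreter overhead will blunt the effect compared to C.

```python
import random
import time


def measure(work):
    """Run work() and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = work()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return result, wall_used, cpu_used


def walk(data, order):
    """Sum data[i] for every i in order; the order decides the access pattern."""
    total = 0
    for i in order:
        total += data[i]
    return total


def demo(n=2_000_000, seed=0):
    """Time a sequential walk, a shuffled walk, and a sleep (hypothetical sizes)."""
    data = list(range(n))
    sequential = list(range(n))
    shuffled = sequential[:]
    random.Random(seed).shuffle(shuffled)  # same indices, cache-hostile order

    return {
        "sequential": measure(lambda: walk(data, sequential)),
        "shuffled": measure(lambda: walk(data, shuffled)),
        # Sleeping is real idleness: the OS deschedules the process.
        "sleep": measure(lambda: time.sleep(0.2)),
    }


if __name__ == "__main__":
    for name, (_, wall, cpu) in demo().items():
        print(f"{name:>10}: wall {wall:.3f}s  cpu {cpu:.3f}s")
```

On the machines I'd expect this to run on, the shuffled walk should take longer than the sequential one, yet its CPU time should still track its wall time, because stalls on memory are charged as busy time. Only the sleep should show CPU time far below wall time. Exact numbers will vary with hardware and interpreter, so I'm not quoting any.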
It is also reasonable to want to know about cache misses, so that you can do something about them if you decide that's possible. Do you think that because this information is not always valuable, maybe even rarely valuable (when you average over all programmers worldwide, most of whom do web stuff), it is never valuable, for anyone, ever?