By "sane malloc" do you mean one that gives you "cleared"/zeroed memory? I think it's a rarity and I think programs kinda assume that malloc takes, I dunno, 300-1000 cycles at worst when allocating many megabytes - whereas zeroing such buffers takes much more.
Or did I misunderstand your point about "malloc sanity"?
The section responsible for Heartbleed was never allocating more than 64 kilobytes, which can probably be cleared in 1000 cycles on most modern architectures.
As someone else pointed out, OpenBSD's malloc() implementation could have supplied a cleared memory area with no discernible performance impact (in fact, I think LibreSSL already does).
Technically yes (although, by default, no), but it's more efficient than that would imply. By default, I think only small chunks are overwritten, so OpenSSL's meagre 64 KB of Heartbleed payload would have been filled with useless junk, whereas multi-megabyte malloc() calls in e.g. an RDBMS would have been unaffected.
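To make the "only small chunks" idea concrete, here is a rough sketch in C. It is not OpenBSD's actual code: the wrapper name `junk_malloc`, the helper `is_all_junk`, the 64 KB threshold and the fill byte are all made up for illustration. The real allocator does this internally and its cutoff and fill pattern may well differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical values chosen for illustration, not OpenBSD's constants. */
#define JUNK_THRESHOLD ((size_t)64 * 1024)
#define JUNK_BYTE 0xdb

/* Allocate size bytes. Small blocks are overwritten with a junk pattern so
 * stale heap contents cannot leak; large blocks are returned untouched so
 * multi-megabyte allocations do not pay for the fill. */
void *junk_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL && size <= JUNK_THRESHOLD)
        memset(p, JUNK_BYTE, size);
    return p;
}

/* Return 1 if every byte in buf[0..n) equals JUNK_BYTE, otherwise 0. */
int is_all_junk(const unsigned char *buf, size_t n)
{
    assert(buf != NULL);
    for (size_t i = 0; i < n; i++) {
        if (buf[i] != JUNK_BYTE)
            return 0;
    }
    return 1;
}
```

With something like this, a 64 KB buffer of the kind Heartbleed read from would come back holding only the fill byte, while a database asking for hundreds of megabytes would skip the memset entirely.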
There are some other protection mechanisms included, too; there's a more in-depth presentation here: