This also sounds like a case of not throwing enough adversarial data at the system. Code coverage alone isn't enough, and neither is establishing KPIs; you have to establish how the system behaves under failure: does it freeze or shut down gracefully, does it persist to disk, what happens when the disk is yanked out of the system, and so on.
Very few software shops that I am aware of do anything like this.
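
To make that concrete, here is a rough sketch of what a fault-injection test could look like in Python. The names `save_atomically` and `survives_disk_failure` are made up for illustration, and patching `os.fsync` to raise `EIO` is only a stand-in for a real disk failure, not an accurate simulation of one. The idea is just to check that a failed write leaves the previous state intact instead of corrupting it.

```python
import errno
import os
import tempfile
from unittest import mock


def save_atomically(path, data):
    """Write data to path via a temp file; return True on success, False on I/O failure."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # the call the test below forces to fail
        os.replace(tmp_path, path)  # swap in the new file only after a clean write
        return True
    except OSError:
        # Degrade gracefully: keep the old file, clean up the partial one.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        return False


def survives_disk_failure():
    """Inject an I/O error mid-write and report what is left on disk."""
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "state.bin")
        assert save_atomically(target, b"old")
        # Assumption: an EIO from fsync approximates the disk going away.
        with mock.patch("os.fsync", side_effect=OSError(errno.EIO, "injected I/O error")):
            ok = save_atomically(target, b"new")
        with open(target, "rb") as f:
            contents = f.read()
        leftovers = [name for name in os.listdir(workdir) if name != "state.bin"]
        return ok, contents, leftovers
```

Calling `survives_disk_failure()` returns whether the second write reported success, what the file contains afterwards, and any stray temp files. A real test rig would go further (pulling the actual device, killing the process mid-write), but even a mock-level check like this catches code that silently loses or corrupts data on an I/O error.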