I think the idea is OK, but the defaults are a poor fit for many people.
The idea is: "I've restarted this thing so many times that it's clearly not going to start properly, so we need a bigger restart." That can escalate all the way up to restarting the whole VM. Erlang ships with heart to restart the VM automatically, but not a lot of people use that either.
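To make the escalation rule concrete, here's a toy Python model of a supervisor's restart-intensity check. It's a sketch, not OTP's actual code, and the class name is made up. The rule it models: if more than `intensity` restarts happen within the last `period` seconds, the supervisor gives up and the failure escalates to its parent. The defaults of 1 restart per 5 seconds match the documented defaults for OTP supervisors.

```python
from collections import deque


class RestartIntensity:
    """Toy model of a supervisor's restart-intensity check (hypothetical name).

    Escalate when more than `intensity` restarts occur within a sliding
    window of `period` seconds. Defaults match OTP's supervisor defaults.
    """

    def __init__(self, intensity=1, period=5.0):
        self.intensity = intensity
        self.period = period
        self.restarts = deque()  # timestamps of recent restarts, oldest first

    def record_restart(self, now):
        """Record a restart at time `now`; return True if the supervisor must escalate."""
        self.restarts.append(now)
        # Drop restarts that have fallen out of the window.
        while self.restarts and now - self.restarts[0] > self.period:
            self.restarts.popleft()
        return len(self.restarts) > self.intensity


if __name__ == "__main__":
    # Two crashes 3 s apart, even from unrelated children, trip the defaults.
    sup = RestartIntensity()
    print(sup.record_restart(0.0))  # False: 1 restart in the window
    print(sup.record_restart(3.0))  # True: 2 restarts within 5 s
```

That's why it's easy to trigger by accident: the window doesn't care whether the crashes are related. An instafail after a bad push and two independent hiccups look the same to it.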
This works OK if all the software running on a node is tightly related and the normal startup time is longer than the escalation threshold. Then it can catch things like a bad code push that makes everything fail instantly. But that doesn't match my environment very well, and it's easy to trigger by accident.
Another tricky thing is that the "let it crash" mantra really needs to be moderated. A crash in a request handler often should be caught so the requester gets an appropriate response, and it may need to be caught so that other, independent requests that are already queued can still be processed.
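Here's a rough Python sketch of catching at the request boundary (in Erlang you'd do the same with try/catch inside the handler). `handle` and `serve` are hypothetical names, and the broad `except` is deliberate: each request is isolated, so one bad request gets an error response instead of taking the queue down with it.

```python
from collections import deque


def handle(request):
    """Hypothetical handler that crashes on malformed input."""
    if "user_id" not in request:
        raise KeyError("user_id")
    return {"status": 200, "body": f"hello {request['user_id']}"}


def serve(pending):
    """Drain a deque of independent requests, producing one response each.

    A crash in one handler becomes an error response for that requester,
    and the remaining queued requests are still processed.
    """
    responses = []
    while pending:
        request = pending.popleft()
        try:
            responses.append(handle(request))
        except Exception as exc:  # broad on purpose: isolate each request
            responses.append({"status": 500, "body": f"error: {exc!r}"})
    return responses


if __name__ == "__main__":
    queued = deque([{"user_id": 1}, {}, {"user_id": 2}])
    for resp in serve(queued):
        print(resp["status"], resp["body"])
```

Genuinely broken state (a corrupted cache, a dead connection) is still a reasonable thing to let crash. The catch is only for failures that are scoped to a single request.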