
The error message could indicate defective hardware. If we did our job right, that's about all it would ever indicate. There are few sane actions one could take in an exception handler to remediate the problem. I'll agree, though, that a better overall system design could detect this problem and rotate the defective nodes out.

But in this case, for all we know, it could be a software defect from which there is no recovery. IMO it's bad to overdesign recovery mechanisms that just end up masking design errors. Careful staged rollouts of changes are a good way to limit the impact of a new regression.
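A staged rollout can be sketched roughly like this: widen the share of traffic on the new build step by step and stop at the first stage where errors rise noticeably above the old build's baseline. This is a minimal sketch under assumptions, not any particular deployment system. `staged_rollout`, the `error_rate_at` measurement hook, the stage percentages, and the tolerance are all hypothetical, and the numbers are made up for illustration.

```python
from typing import Callable, Mapping, Sequence, Tuple


def staged_rollout(
    stages: Sequence[int],
    error_rate_at: Callable[[int], float],
    baseline: float,
    tolerance: float = 0.01,
) -> Tuple[int, bool]:
    """Widen a rollout stage by stage, halting at the first regression.

    stages        -- percentages of traffic, e.g. [1, 10, 50, 100]
    error_rate_at -- hypothetical hook returning the observed error rate
                     while the new build serves `pct` percent of traffic
    baseline      -- error rate of the currently deployed build
    Returns (percent of traffic exposed, whether the rollout completed).
    """
    exposed = 0
    for pct in stages:
        exposed = pct
        observed = error_rate_at(pct)
        if observed > baseline + tolerance:
            # Regression detected: stop here and roll back. Only `pct`
            # percent of traffic ever saw the bad build.
            return exposed, False
    return exposed, True


# Simulated measurements (made-up numbers for illustration only).
healthy: Mapping[int, float] = {1: 0.002, 10: 0.004, 50: 0.004, 100: 0.004}
regressed: Mapping[int, float] = {1: 0.002, 10: 0.08}

print(staged_rollout([1, 10, 50, 100], healthy.__getitem__, baseline=0.003))    # -> (100, True)
print(staged_rollout([1, 10, 50, 100], regressed.__getitem__, baseline=0.003))  # -> (10, False)
```

The point is that the rollout doesn't try to recover from the regression; it just caps the blast radius at whatever stage exposed it.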




Hardware failure is one thing, but in my experience it's just devs not taking the time to properly handle errors and exceptions. If something fails, restart the process and record the error in a log for the operators instead of showing it to the customer.
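Roughly, "restart and log, don't show the customer" might look like the sketch below. It is a simplification under assumptions: the "restart" is modeled as an in-process retry with exponential backoff (real process restarts are usually left to an external supervisor such as systemd or Kubernetes), and `run_supervised`, `handle_request`, and the generic error message are hypothetical names chosen for illustration.

```python
import logging
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("supervisor")


def run_supervised(task: Callable[[], T], max_restarts: int = 3,
                   backoff_s: float = 0.0) -> Optional[T]:
    """Run task; on an exception, log it and retry up to max_restarts times.

    Returns the task's result, or None if every attempt failed.
    """
    attempts = max_restarts + 1
    for attempt in range(attempts):
        try:
            return task()
        except Exception:
            # Full traceback goes to the operator log, never to the customer.
            logger.exception("task failed (attempt %d of %d)", attempt + 1, attempts)
            if attempt < max_restarts:
                time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
    return None


def handle_request(task: Callable[[], str]) -> str:
    """Customer-facing wrapper: never leaks exception details."""
    result = run_supervised(task, max_restarts=2)
    return result if result is not None else "Something went wrong, please try again."


# Usage: a task that hits a transient fault twice, then succeeds.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient fault")
    return "ok"


print(handle_request(flaky))               # -> ok (after two logged failures)
print(handle_request(lambda: str(1 / 0)))  # -> Something went wrong, please try again.
```

Capping the retries matters. If the failure is a deterministic software defect, no number of restarts will fix it, and an unbounded loop would only hide the bug behind log noise.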



