
Once you build a large distributed system, it has to tolerate routine node failures.

Once you have a fault tolerant system, and some idea of failure rates, you can then ask "is it worth having X units of less reliable hardware or Y units of more reliable hardware? How many will still be running after a year?"

You then find out that buying a few more cheap nodes compensates completely for the lower reliability.
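To make that comparison concrete, here is a rough sketch of the arithmetic. The node counts and annual failure rates are made-up numbers for illustration, and it assumes failures are independent and that nothing gets replaced during the year, which a real fleet would not match exactly.

```python
from math import comb


def expected_survivors(nodes: int, annual_failure_rate: float) -> float:
    """Expected number of nodes still running after one year."""
    return nodes * (1.0 - annual_failure_rate)


def prob_at_least(nodes: int, annual_failure_rate: float, needed: int) -> float:
    """Probability that at least `needed` nodes survive the year.

    Binomial model: each node survives independently with the same probability.
    """
    p = 1.0 - annual_failure_rate
    return sum(
        comb(nodes, k) * p**k * (1.0 - p) ** (nodes - k)
        for k in range(needed, nodes + 1)
    )


# Hypothetical fleets: 110 cheap nodes at 8% annual failure
# versus 100 pricier nodes at 2% annual failure.
cheap = expected_survivors(110, 0.08)
premium = expected_survivors(100, 0.02)
print(f"cheap fleet:   {cheap:.1f} expected survivors")
print(f"premium fleet: {premium:.1f} expected survivors")

# Chance each fleet still has at least 95 working nodes after a year.
print(f"cheap   P(>=95): {prob_at_least(110, 0.08, 95):.3f}")
print(f"premium P(>=95): {prob_at_least(100, 0.02, 95):.3f}")
```

With these assumed rates, ten extra cheap nodes leave more expected survivors than the premium fleet. Whether that holds in practice depends on the real failure rates and the price gap.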



One more step, and it will conjoin with Erlang's "Let it crash" approach.


This thread seems overly complex, given that an engineer tasked with designing RAID storage to replace an unplanned haberdashery would probably start by looking up what the requested acronym stands for.



