
Absolutely, yes.

It can also misbehave without any hardware bug, purely due to glitching. The incidence rate has to be quite low, or it would be considered a HW bug, but it's never zero. Run code for enough hours on enough machines, collecting stack traces or core dumps on crashes, and you will notice a low base rate of failures that make absolutely no sense (e.g. a null pointer dereference right after a successful non-null pointer check two instructions above it in the disassembly).

You will also notice that many machines in a big fleet that log such errors do so exactly once and never again, but on some machines the errors recur several times, and those machines have a noticeably elevated failure rate even though they're running the exact same code as everyone else. This too is normal. Due to manufacturing variation in the CPU, RAM, or whatever, these machines are much glitchier than the baseline. Once you've identified such a machine, you will want to replace it before it causes persistent data corruption rather than just transient crashes or glitches.
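As a rough illustration of spotting the repeat offenders, here is a minimal sketch. It assumes you already have a list with one hostname per unexplained crash; the function name `flag_glitchy_hosts`, the `min_repeats` threshold of 3, and the host names are all made up for the example, and a real fleet would want to normalize by uptime or load rather than use a raw count.

```python
from collections import Counter


def flag_glitchy_hosts(crash_hosts, min_repeats=3):
    """Return hosts with at least `min_repeats` unexplained crashes.

    `crash_hosts` is an iterable with one hostname per crash report.
    The result is sorted by crash count (highest first), then by name.
    The threshold is an arbitrary assumption, not a recommended value.
    """
    counts = Counter(crash_hosts)
    return sorted(
        (host for host, n in counts.items() if n >= min_repeats),
        key=lambda host: (-counts[host], host),
    )


# Hypothetical crash log: most hosts show up once, one keeps coming back.
crashes = ["host-a", "host-b", "host-c", "host-b", "host-d", "host-b", "host-e"]
print(flag_glitchy_hosts(crashes))  # ['host-b']
```

Hosts that appear once fall under the baseline and are ignored; anything that keeps reappearing is a candidate for replacement.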


