Hacker News

Yeah, I have a war story...

I was working on mobile robot research at JPL back in the 1990s. We had a robot with an arm attached. It worked fine except that every now and then the whole system would crash hard with a totally corrupted heap and stack, just random data everywhere. So no chance of a backtrace. The really weird thing was that this only happened when the arm was moving. We also had the exact same system running under a different operating system and we never had any problems there, so we were 100% sure it was not a compiler error.

It was a compiler error.

It took us a year to figure out what was going on. It turned out that the compiler had a bug where it would emit code that would pop the stack pointer and then pull a value out of the now unprotected stack frame. On the non-embedded system this did not cause any problems, but on the embedded system (running VxWorks) hardware interrupts used the same stack as the process that was running when the interrupt hit. So if we happened to get an interrupt just after the stack pointer was popped but before the unprotected value was grabbed, that value would get stomped on by the interrupt handler. Then when the interrupt handler returned, the process would resume, grab the now-random value, and chaos ensued.
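
For anyone who wants to see the race spelled out, here is a toy Python simulation of the idea. It is only a sketch: it is not the actual compiler output or real VxWorks behavior, and the `Machine` class, the function names, and the 0xDEAD/0xBEEF junk values are all made up for illustration. It models a downward-growing stack shared by the task and the interrupt handler, and compares reading a value after releasing its frame with reading it before.

```python
STACK_SIZE = 16


class Machine:
    """Hypothetical toy machine with one stack shared by task and interrupts."""

    def __init__(self):
        self.mem = [0] * STACK_SIZE
        # Stack grows downward; sp is the index of the lowest live slot
        # (STACK_SIZE when the stack is empty).
        self.sp = STACK_SIZE

    def push(self, value):
        self.sp -= 1
        self.mem[self.sp] = value

    def interrupt(self):
        # The handler borrows the interrupted task's stack, writes its own
        # data below sp, then restores sp before returning.
        saved_sp = self.sp
        for junk in (0xDEAD, 0xBEEF):
            self.push(junk)
        self.sp = saved_sp


def buggy_return(m, interrupt_in_window):
    m.push(42)                # local value lives in the current frame
    m.sp += 1                 # BUG: frame is released first ...
    if interrupt_in_window:
        m.interrupt()         # ... an interrupt may land in this window ...
    return m.mem[m.sp - 1]    # ... and only then is the value read


def correct_return(m, interrupt_in_window):
    m.push(42)
    value = m.mem[m.sp]       # read while the slot is still protected by sp
    if interrupt_in_window:
        m.interrupt()         # handler writes below sp, leaving the slot alone
    m.sp += 1                 # release the frame last
    return value


if __name__ == "__main__":
    print(buggy_return(Machine(), interrupt_in_window=False))
    print(hex(buggy_return(Machine(), interrupt_in_window=True)))
    print(correct_return(Machine(), interrupt_in_window=True))
```

Without an interrupt in the window, the buggy ordering still returns the right value, which matches why the bug only showed up occasionally and only when interrupts were frequent (such as when the arm was moving).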




How many novel depressions were created as a result of high velocity impacts after making that discovery? I think I'd be seeing red...


Actually, I remember being thrilled to have finally figured it out. We had been beating our heads against the wall (metaphorically) for a year, and I remember looking at the screen at the disassembly sequence and thinking, Oh my God, I think I've found it! It felt like making a major scientific discovery. (To be fair, I was only able to do this after others laid the groundwork for me by finding ways to reliably reproduce the problem. But I'm the one who spent hours single-stepping through assembly code before finally realizing what was happening.)

I also remember reporting the problem to one of the authors of the compiler (I think it was David Kranz) so he could fix it in the next version, and him telling me that there wasn't going to be a next version because the funding for the project had been cut. There was no GitHub in those days so the whole thing just faded into the mists of time, which is a real shame because the system really kicked ass.

The whole history of the project can be found here:

https://paulgraham.com/thist.html



