
Parent comment is onto something. You sound traumatized.



I was a couple years out of school, and felt a lot of responsibility. It wasn't really a big deal, but it didn't feel that way at the time.


What a great story nevertheless.


It helped me empathize with young engineers dealing with their first high responsibility bug.

About 4 years ago, I was managing a guy right out of school who pushed a minor bug that broke real-time risk calculations for a major multinational financial institution in the middle of the European trading day, prior to the NYC open, and people were yelling over email that they were trading blind. Someone had committed an important change right after his, so a simple rollback was highly sub-optimal.

Remembering how I felt years ago, I reassured the new guy that people were yelling over email because it was important, not because they were mad at him. I told him that I thought he was the most familiar with his change and the most capable person to fix it, and that he should do his best to calm down and focus, but he should let me know if he needed help, and I would do my best to calm folks down. I told him he would probably remember that mistake the rest of his life, but nobody else was going to remember it a week later. He had the bug fix in production in under an hour.

He sent me an email from home that night worried that he had let the team down, and I reiterated that he was going to be the only one who remembered the mistake longer than a week. The post-mortem follow-up was just to reiterate to authors and reviewers the importance of corner-case tests, and nobody brought it up later.

I really only remember it because my manager sent me an email that night praising how well I handled the new guy's first big production bug.


"I reassured the new guy that people were yelling over email because it was important, not because they were mad at him"

This is outstanding advice, and very well put. I shall be borrowing it, thank you!


For someone who sort of justified a humiliating response to a mistake a few comments above, this seems really well done! Congratulations.


I have to say you tell good stories about how wonderful you are.


That's a fair criticism. Deep down, I usually have a pretty high opinion of my abilities. I think I'm pretty good at hiding it in person, but I'm less good at hiding it in my writing. I feel happiest and most excited to write when I'm thinking about some of my happiest memories. I try to also be open about the mistakes I've made. I've generally been much more lucky than skilled.

I've definitely written more than one bug where post-mortem estimates were over $10,000 in losses.

August 20, 2013, I finished a code change (in Hong Kong) to Goldman's global algorithmic trading system and sent it out to a colleague in Europe to review. A friend of mine was a machine learning person in our Tokyo office and was in town for work, so a bunch of us had dinner and a small number of drinks. I stopped by the office on my way home to check if the change had been approved. It had, and I hesitated a bit to put it in production, because I had had a couple of drinks and it was late at night. However, I rationalized that I had written all of the code while awake and without a drop of alcohol, and pushed the change into production.

I woke up the next morning to read news [1] that Goldman had lost up to 100 million dollars in an automated trading problem within 1-2 hours after I pushed my change. I couldn't see how my change could possibly have caused that error, but was still a bit panicked until I reassured myself that my cell phone would have been called once a minute until I woke up if I had made a change that caused a loss of that magnitude.

I went into the office and saw that a chat window I had open with a friend in the NY office showed "presence unknown". An email sent to them bounced. So, I walked over to the derivatives (Flow) Strats desk, sat down in an empty chair next to one of my friends, and just quietly said "... so " and the name of my friend in NY. My friend on the Flow desk's eyes got wide and he said "how did you know?". I actually didn't know until the Flow Strat's reaction confirmed my guess.

My friend in New York was actually very careful, but he had been working under time pressure late at night and pushed a bug into production. He'd been more responsible than I had the night before. I got really lucky, and he got really unlucky. He's actually a really solid engineer. He caught plenty of very subtle bugs in other people's code, at least once when he hadn't been asked to review the code.

After August 20, 2013, if at all possible, I push changes into production before noon, and not on Fridays.

If memory serves, the "maybe $100 million" ended up being around $28 million.

And that's the time that I could have easily caused a $28 million loss.

There was also a time I misplaced a paren and had a bad actor noticed, they could have used 60 million customer computers in a DDoS UDP traffic amplification attack. My test cases weren't matching my hand-worked-out examples, but I eventually just gave up and assumed my code was correct and put incorrect values in the message authentication code test vectors. Never roll your own crypto, especially if your test vectors aren't coming out as you expect. That was 2004.

[1] https://www.cnbc.com/id/100976404


It's your fault, you were the manager, the yelling should not even have leaked to the developer, it should have stopped with you.

Edit: if you are managing someone who is new in a job, it's your job to make sure they don't push bugs to break important stuff in the first place.


I don't think we can assign blame in a complicated situation based on a two-paragraph retelling. Please have a bit more empathy.


The "yelling" was coming to the team email list, asking for ETAs and progress updates for when real-time risk would be back up. Roughly 4 people at the time knew the bug could be traced to the new guy's commit, and none of those people were doing the "yelling". And it was Goldman, so the "yelling" was kept very professional (no swearing, strictly enforced). But there were literally tens of billions of dollars that needed to be dynamically hedged, and that wasn't possible without real-time risk. The European markets were open, and markets in the Americas were going to be open within a couple of hours. Trading and management were making sure that everyone on the team email list understood that this was drop-everything important, perhaps using all caps.

Yes, I and the person who reviewed the change bear more responsibility than the new developer. Also, I say "new guy", but the person who had interned with us the summer after "the new guy" had already joined full time at that point, so "the new guy" had been working full time with the team for at least 9 months at that point. I also remember the room where it happened, which wasn't the first room we were in, so maybe he had been with us full time more like 18 months. In any case, it was the first time he was trying to keep the weight of billions of dollars out of his head and calmly but quickly fix a bug.


> It's your fault, you were the manager, the yelling should not even have leaked to the developer, it should have stopped with you.

That's impractical in organizations with flatter structures and general purpose communication channels.

What would you suggest, kicking everyone off of internal IRC/Slack/mailing lists/etc.?


Did you consider the fact that you probably know nothing about the dynamics of their workplace, the structure of their management/leadership, etc. before assigning blame?



