Ask HN: Weirdest hack that you ever saw in production?
330 points by jxub on March 24, 2018 | 278 comments



I worked at a place that had a large, distributed terminal network running on something like OSF or DEC Unix.

It was 2001 and the thing was on life support while PCs were being rolled out. I was helping to rack my new database servers, which were next to this big lab table/shelf combo with like 16 terminals on it. When pulling a cable I banged my head on the table, then this big book fell on my hand.

About 10 minutes later, a bunch of graybeards come around the corner yelling “WTF are you doing!”

It turned out that the dictionary that hit my hand had been perched across two keyboards, holding down the “enter” keys of two terminals. For reasons unknown, those terminals had to keep hitting the enter key in order for the logins and print jobs of about 40,000 people to work.


I've got one of those from the "Only on Windows" series. One of my colleagues uses Windows and tends to remote in from home. Unfortunately, when a cleaner cleans the office after hours, she sometimes accidentally hits the CAPS key and he can't log in remotely anymore. His solution was to rip out the CAPS key and cover the hole with duct tape.


I do this, too. The first thing I do when I get a new keyboard is remove the caps lock key. I've never used it in my 20 years of PC usage and I don't get why it's still there.

Back in my CounterStrike gaming days I would also remove the Windows key because it would crash the game when accidentally pressed.


I work about 30/70 on Mac and Windows. I have the caps lock key mapped to ctrl on Windows and cmd on Mac since they’re roughly equivalent. It makes retraining my muscle memory when I switch OSes far less necessary. No more hitting ctrl-c to copy on Mac and instead sending a SIGINT to the terminal!


On Windows, CapsLock is my Push-To-Talk key, so I remap it to F9 or something "useless" via KeyTweak.

On Linux, it's my meta/compose/whatever key, so I can write äöü on UsIntl via caps-aou ( https://github.com/winks/dotfiles/blob/master/.us-intl-germa... )


DON’T remove caps lock!!! Use it as an application switcher!

Some of my most used shortcuts:

Caps + C = Chrome

Caps + S = Spotify

Caps + A = Atom

Caps + T = Terminal

+1 for removing windows key if gaming


On Windows, Win+1 launches the first app in the taskbar, Win+2 launches the second, etc.


That's a really good idea. Most of my application switching needs are covered by the built-in Windows "Win+1 opens first icon on the taskbar, Win+2 opens the second...", but this could cover the lesser used ones like Outlook.


Ahh the good old CounterStrike quick exit! If you were lucky you could alt-tab back in, but you'd have a big mouse arrow where your reticle used to be :D

I ended up coding a utility (in VS6!) to disable the windows key when you launched the game.


Map escape key to it. Especially if you use vim.


I map ALT to CTRL and CapsLock to ALT, so I don't have to use my pinky to hit CTRL.



It's used by the Japanese IME on Windows: Ctrl+CapsLock - hiragana input, Alt+CapsLock - katakana input, Shift+CapsLack - alphanumeric input.


I like your typo in the last capslock, it accurately describes my capslock key ;)

Though, that actually is interesting, and you made me read up a bit on Japanese.


My wife would kill me. She uses it to enter uppercase first letters. Just won’t (or can’t) work the shift key.


Dude, that Windows key always messed me up, especially because my computer was a load of crap and it took hella long to minimize/maximize the window again; usually I had to reboot the machine.


I've mapped my caps key to fullwidth text, initially for quick-response memeing but it turns out it's actually quite helpful for increasing expressiveness on various chat protocols.


That's actually pretty cool - how do you accomplish that? (And on which OS?)


Not a hack, but once we couldn't figure out why a printer dropped off the network.

Turns out the cleaners were thorough enough to clean deep down behind a desk and turned off a surge protector that a 5-port network switch was plugged into.


I remapped my caps key to control on every machine I use. I find it much more ergonomic than the standard left-control location.


I use AutoHotKey to remap it to launch TotalCommander.


Does AutoHotKey function on the login screen?


Nope -- I remap Caps Lock to control, and if I'm not careful I can wind up logged in with it stuck in caps mode, and no way to get lowercase characters without locking the screen, turning off all-caps mode, and logging in again. Luckily, this is hard to do :-)


I use AHK to remap CapsLock to double-click to reduce the strain on my mouse hand, but I can access the original CapsLock function via shift-CapsLock. Try it -- maybe that will help you.


That's a fantastic idea! Unfortunately, I use JetBrains IDEs, so I have lots of ctrl-shift keychords I couldn't live without.


WTF? How come I worked on RDP clients and stuff for around three years and did not know about this?


I've worked at a place with a very similar hack before, but it was for a bunch of Windows servers. A process was trying to scan various Office files, and even though no actual Word/Excel/etc. app was running, warning dialogs could suddenly appear and block the scan process. The problem was "solved" by a frustrated ops guy who wrote a service that scanned for dialog boxes and closed them.


The first time I heard of this problem, there was "Buzof", maybe nearly 20 years ago? I just googled and it still exists: http://www.basta.com/Buzof


Dictionary Attack 1.0


Similar but the inverse of this: I was fixing a critical server that usually came up in a minute or less. 5 minutes later I get concerned, walk into the server room, and it’s still trying to boot. I scratch my head and see a USB keyboard attached... with a screwdriver sitting on the space bar.

Yup, I IRQ-DOSS’ed myself.


Seriously, that is pretty impressive. Unix police found you.


/dev/random entropy issues?


I have heard a similar story about a system that existed at my company before I joined. It raised and dispatched jobs to engineers, and unless the space bar was held down on the master terminal, it stopped sending jobs out.


I worked for a chain of medical clinics in the early 2000s in Florida.

Everything was going digital. We had a “remote” office across the street that didn’t have internet access and it was really expensive to have lines installed.

I don’t remember what the device was called but it was some sort of satellite dish. Our network admin had ordered them and installed them on the roof tops of both buildings and we could provide access via this link.

It was pretty rad to me. But then the intermittent bugs started creeping in. The remote site used some Windows 2000 dumb terminals and throughout the day all of them would disconnect at the same time - every day.

It was very random and they’d automatically reconnect after a few seconds to a few minutes - always.

Well, I was the new guy / intern grad. And so after a few weeks of debugging different things inside and moving the dishes around to make sure they were exactly pointed at each other my boss rolls into work with a beach chair and an umbrella.

I looked at him and asked if he was taking off early and hitting the beach and he goes “no you’re debugging”.

We set up the chair and umbrella on the roof. I sat up there with a walkie-talkie and the remote site had the other one. My job was to sit there until they radioed, and see if anything was obviously interrupting the connection.

Then a semi truck stopped at the red light.


Ah, good old laser link. We had one of those. Great connection, as long as there was any. It came to an end when the developers started a revolution: we went home and said we'd only come back when there was a proper connection. By the end of the week the company gave in, and one month later we had fibre. Those were the days :D.


Aahahaha, I work for a company that does P2P internet links over RF. When I got to "all of them would disconnect at the same time - every day," semi truck was my first thought!


Sounds like a point-to-point directional wifi radio. They're far from a hack and have been used with great success at distances upwards of 2 miles with 1Gbps throughput, even by small ISPs.


Yeah I can’t remember what it was. Someone said laser link above, I feel like it was microwave - I do remember being concerned we’d cook birds.


Yeah, it was probably microwave.


Faster communication and cooked birds to eat, what's the downside?


Well, it was in the early 2000s so I doubt wireless would have given any reasonable bandwidth. Sounds more like https://en.m.wikipedia.org/wiki/Free-space_optical_communica... at least that's what we had those days.


So if I understand correctly, the semi was tall enough that it would interrupt the signal between the two dishes on the roof?


Yeah, the other building was a small one-story office at a lower elevation. The main building was a converted strip mall, so it had a bit more height with the extra space from the drop ceilings.

We ended up putting the dishes on longer poles. :)


So what was the fix? Boosting the height?


Longer poles. It started out on I think 18”. Pretty much enough to clear the edge of the roof of the main building. We ended up putting it on a ten foot three pole stand.

That blew off during a hurricane the same year :)


The story keeps getting better.


this was an awesome way to start my Sunday, thanks!


If you ever need a pick-me-up at my past self's expense, let me know ;) That place was full of stories!


This isn't really production, but I was porting AOSP to an Android TV I owned. I wanted to use the latest version of Android, but I had some closed-source graphics composition binary blobs to interface with, and they were for an older Android API. Naturally, this meant writing a wrapper from the new API back to the old API.

For some reason, every so often (sporadically) I'd get a segfault inside the closed-source binary blob. To get things "working", in my wrapper I captured the stack before calling the occasionally segfaulting function, and set up a segfault handler that would simply restore the stack to its state prior to the crash.

Unfortunately, after restoring the stack, a subsequent call to the closed-source function would hang. I did some preliminary reverse engineering of the binary blobs and found that it was segfaulting whilst having retained a mutex. So I did, ah... the obvious thing(?) and just grabbed the raw memory address of the mutex, and released it myself when a segfault was encountered.

Surprisingly this all "worked". In the end I had a TV that thought it was a 65" phone, lock screen and all. Umm, yay!

Here's the code:

https://gist.github.com/Benjamin-Dobell/bb13f6169aaa48625453...


Wow. Thanks for sharing your story.

The way you casually explained this really threw me off. I remember trying to do some AOSP hacking on some older phones and always giving up, for different reasons:

Locked Bootloaders

Huge sources to download

Confusing device configuration (all those xml files were and still are magic to me)

The farthest I ever got was to compile a kernel and flash it onto a phone. I was so happy for so little. The only difference was the change to the build version / name.

I remember that back in the pre-Ice Cream Sandwich era, graphics drivers were the most often quoted reason for older devices getting "stuck" on older versions of Android. It never occurred to me that you could write a graphics wrapper.


Right after WinXP came out, I was a sys/win admin at a manufacturing co. Their ERP system was an old VB app that sat on a share drive and everyone opened the same .exe from their respective workstations.

The app required a ton of scheduled database and ERP tasks (it used a legacy flat-file db), so the vendor wrapped them all up in a secondary executable that was effectively a non-headless (headful?) daemon (this was expensive, niche industry software btw). The first instance of the application that opened would also trigger the daemon to open too, on whichever PC it was executed on (it was supposed to be opened on the server 1st). It was provided by the vendor this way, as part of the COTS application.

As a result of this daemon hack, every couple of days (after the application crashed on the server, as it did frequently) I would run around the building to dozens and dozens of workstations until I found the user’s workstation that had been the first to run the ERP after the server process crashed, and thus had the daemon running on it. Then I would kill it, and sprint back to the server closet to reopen the daemon before any other users would run the ERP and grab the daemon (later I would just RDP in, after we got off NT).

It was awesome.
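The vendor's "first instance hosts the daemon" behaviour might have worked something like the sketch below. This is a hypothetical reconstruction, not the vendor's code: it swaps in a POSIX flock() lock file (the original ran on Windows), and the function name and lock path are invented. Because the OS drops the lock when its holder dies, the next client to start after a crash wins it, which is how the daemon ended up on a random workstation.

```python
import fcntl


def try_host_daemon(lock_path):
    """Try to become the machine that hosts the background daemon.

    Returns the open lock file on success (keep it open while hosting),
    or None if another process already holds the lock.
    """
    lock_file = open(lock_path, "a+")
    try:
        # Non-blocking exclusive lock: fails immediately if someone else holds it.
        fcntl.flock(lock_file.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lock_file.close()
        return None
    # The OS releases a flock() lock when its holder exits or crashes, so
    # after the server's copy dies, whichever client starts next gets here.
    return lock_file
```

A saner design would run the daemon as a proper service on the server and have clients never try to host it, so there is no race to lose.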


Oh wow. Yeah, that sort of thing seems to have been so common in ERP software.

My first professional programming job was working on a bespoke ERP and industrial process control suite (written in PowerBuilder). The program had a huge number of sub programs (dynamically loaded modules, each an MDI window accessed using a “program name”, something like a SAP transaction code).

We had a number of background services that would have to run, however writing Windows services in PowerBuilder was anything but easy. And we were reluctant to use anything else - the whole benefit of using a 4GL was a well integrated ORM and report generating functionality.

So we’d implement our background services as regular modules (with their little MDI window) within the main thick client app. Clients would have a number of workstations dedicated to running a single one of these processes. Nothing headless, each outputting its status or logs to the connected display. If the power, network or database ever dropped, each of these machines would have to be restarted and have its allocated sub program reopened.

For example, the despatch label printing program would monitor a database queue table for new rows, bring up a report associated with the specified despatch note, print the report to the label printer, then delete the row.

It seems so hackish but it worked incredibly well. Our clients were all food or paper manufacturers, running 24/7. Operations were rarely disrupted. Having a single screen per function to monitor for status changes was something operators were accustomed to.

This was over a decade ago, but I’ve never worked with a more productive team since. The constraints of the system let us focus on solving business problems. I can’t imagine writing anything of this scale in a modern environment. I’d love to see 4GLs like this make a comeback. The first class GUI, ORM, report generation were a huge productivity boost. And the simple programming language (with a very simple object model) put the focus on problem solving and not API acrobatics.

Simpler times.
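The despatch label service described above boils down to polling a queue table. A minimal sketch, assuming a sqlite3 connection; the table name label_queue, its columns, and the print_label callback are hypothetical stand-ins for the PowerBuilder report and the label printer.

```python
import time


def process_label_queue(conn, print_label):
    """Handle every pending row once; return how many labels were printed."""
    rows = conn.execute(
        "SELECT id, despatch_note FROM label_queue ORDER BY id"
    ).fetchall()
    for row_id, despatch_note in rows:
        print_label(despatch_note)  # render the report and send it to the printer
        # Delete only after printing succeeds, so a crash mid-print leaves
        # the row in the table to be retried on the next poll.
        with conn:
            conn.execute("DELETE FROM label_queue WHERE id = ?", (row_id,))
    return len(rows)


def run_forever(conn, print_label, poll_seconds=5.0):
    """Poll the queue table forever, like the dedicated workstation did."""
    while True:
        process_label_queue(conn, print_label)
        time.sleep(poll_seconds)
```

Deleting after printing means a crash between the print and the delete reprints one label, which is usually a better failure than silently losing it.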


No kidding. Everything got clear queueing and back-pressure for free!


Back at CS school, that's the definition I always hoped to hear of "race condition".


I looked up ERP software in google images. Holy poor UX, Batman! There's a great market opportunity there for something that doesn't spew out everything at once onscreen.


From the outside it definitely looks this way. And some enterprise software (Oracle, PeopleSoft) definitely has some real UX gaps. However... in a professional, power-user environment, high information density is a massive plus. Being able to see as much information as possible, with as few clicks as possible, and as few round-trips to the server as possible, is very desirable.

The less information you have per screen/interaction, the more you lock users into a specific way of doing things. Business software users tend to optimise for what works for them. The worst UX experiments I’ve conducted in enterprise software involved low information density screens, showing users just what they told me they needed to see. User expectations in this space are so nuanced that it's better to favour more information and a steep learning curve over easier-to-use but less flexible software.


In my startup days, we were working on a proof of concept with a really big bank. Because of their security rules, we couldn't have direct access to their systems - so if we wanted to do something remotely, we would have to start a webex, they would join and share their screen, and give us remote control.

This worked great, except if we wanted to work over the weekend, since if we left the screen alone for more than a few minutes, the screen lock would kick in and we'd lose the session.

Our solution? We purchased a small fan with an oscillation mode, and tied a mouse to it. We then had the fan drag the mouse ever so slightly back and forth whenever we wanted to step away from the remote session. Kept it going for weeks.


I use a PowerShell script like this when I don't want the computer to lock:

  # Send a keystroke every 60 seconds so the idle timer never fires
  # (note: it types a "j" into whatever window has focus)
  $ws = New-Object -COM wscript.shell;
  while($true){ $ws.SendKeys("j"); Sleep 60;}

I've used it for demos so the computer doesn't lock before the demo starts; it's short and easy to remember. Also, on Windows, if you spam SendKeys("{Left}"), everything you type comes out backwards, and when you hit the Windows key it freezes the computer in an interesting way. Pretty fun.


I used to use autohotkey to send an F13 key press every couple minutes to avoid lockout on a work computer where I wasn't allowed to increase the timeout


Wait, are function keys above F12 still addressable?


It’s Windows, so backwards compatibility mandates F13-F24 exist.


Certainly, my keyboard still has those buttons and I still use them


Which keyboard is this? I'm a keyboard-only user and would love to have an extra row of keys for shortcuts.


Can't speak for the parent, but Apple USB keyboards have F13-F19.


bash/X11 equivalent:

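  # Nudge the pointer 1px and back every $1 seconds so X sees input activity
  function mouse_around {
    while true; do
      sleep "$1"
      xdotool mousemove_relative 1 1
      # "--" ends option parsing so -1 is read as a number, not a flag
      xdotool mousemove_relative -- -1 -1
    done
  }
  
  mouse_around 60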


Mandatory XKCD: https://xkcd.com/196/


The script is overkill, just put some plastic under your mouse.


Try this: http://www.zhornsoftware.co.uk/caffeine/

Use it all the time when working from home/moving around and I don’t want my laptop to lock every 15 minutes.


There's also a hardware solution: https://www.amazon.com/CRU-Inc-30200-0100-0013-CRU-DataPort-...

Or a powered turntable for your mouse, if you can't plug HID devices into your machine! https://www.amazon.com/Liberty-Mouse-Mover/dp/B079P592K8/


This seems to be a very common problem, as evinced by the many solutions below. A friend of mine asked me to solve the problem for him because he was remotely accessing a database and the company-issued laptop, VPN client, remote server and database server all had aggressive timeouts. Getting a cup of coffee meant logging in to everything again using long and complex company-issued passwords.

We used one of the common Raspberry Pi Human Interface Device (HID) Python packages to send a harmless keyboard or mouse event once every 5 minutes.
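For anyone wanting to reproduce this without a package: on a Pi already configured as a USB HID keyboard gadget (assumed here to show up as /dev/hidg0), you can write raw 8-byte boot-protocol keyboard reports yourself. This is a sketch, not the package the commenter used; it taps Left Shift, which shouldn't type anything into the focused window.

```python
import time

LEFT_SHIFT = 0x02  # bit 1 of the modifier byte is Left Shift


def shift_tap_reports():
    """Boot-protocol keyboard reports for one Left Shift tap.

    Each report is 8 bytes: [modifiers, reserved, key1, ..., key6].
    """
    press = bytes([LEFT_SHIFT, 0, 0, 0, 0, 0, 0, 0])
    release = bytes(8)  # all zeros: no modifiers, no keys held
    return [press, release]


def keep_awake(dev, interval_seconds=300, count=None):
    """Write a Shift tap to `dev`, then wait; repeat `count` times (None = forever)."""
    sent = 0
    while count is None or sent < count:
        for report in shift_tap_reports():
            dev.write(report)
            dev.flush()
        sent += 1
        time.sleep(interval_seconds)


# On the Pi (assumes the HID gadget is already set up):
# with open("/dev/hidg0", "wb") as dev:
#     keep_awake(dev)
```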


On a Windows machine you can start a PowerPoint presentation, then minimise it. This stops the screen lock from starting.


I’ve used a blank no-audio video in Windows Media Player to achieve the same.


I've seen people strapping a watch on the back of the mouse to achieve the same effect, the constant ticking makes it like the mouse is slightly moved, preventing a screen lock.


Similar situation, except with a personal massager, a wireless mouse, and a large bowl.


There's also this [1] I found when I had to opt out of a crazy work-enforced screen lock timer. (like 3m)

[1] https://archive.codeplex.com/?p=mousejiggler


As someone who enforces screen lock timers for compliance, if I ever found someone was using any of these hacks, my solution would probably involve writing a script to automatically lock their screen every five minutes regardless of activity until they agreed to knock it off. :P



I actually find people's risk intuition spectacularly bad. People seem to suspect their IT department will catch them and report them for things that their IT department doesn't care about, for example. Or expect that the IT department and/or supervisors may be watching their webcam, which is creepy, and really not something ordinary workplaces do.


Ha - amused to read this! Just built that a week ago for my partner. I used an Arduino connected to the USB as HID. The approx. 10 line script is moving the mouse back and forth every couple of seconds. Works like a charm: keeps the screen unlocked and her activity indicator busy...


I did something similar in my last job: We used a custom terminal services client to log in to our data center, and the servers would kick you off after something like five minutes of inactivity. I ended up modifying the ts client to send a shift keypress every four minutes.


I think an optical mouse placed on the reflective side of a CD achieves the same "jumping around" - effect.


In the mid-nineties I was hired to improve the performance of (and eventually rewrite) a custom in-house search engine. I dipped into the software and there were some quick wins, but I couldn’t get the damn thing to reply quicker than 100 ms. In desperation I just grepped for the number 100 and sure enough I found a 100 ms sleep in the routines handling the connections. Turned out the author had made a mess of his socket handling and by trial and error had found out he could get the thing to work reliably only by waiting for a while.
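We can't know what that author's socket bug was, but a classic way to end up "needing" a sleep is assuming a single recv() call returns a whole message; sleeping just gives the rest of the bytes time to arrive. The usual fix is to loop until you have everything. A sketch, assuming a hypothetical framing of a 4-byte big-endian length prefix:

```python
import socket


def recv_exact(sock, n):
    """Read exactly n bytes, looping because recv() may return fewer than asked."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError(f"peer closed with {remaining} bytes still expected")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_message(sock):
    """Read one message: a 4-byte big-endian length, then that many payload bytes."""
    length = int.from_bytes(recv_exact(sock, 4), "big")
    return recv_exact(sock, length)
```

No sleep is needed: recv() blocks until at least some data arrives, and the loop keeps going until the whole message is in.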


An SAP consultant once told me that his preferred technique of averting long-winded and pointless discussions about irrelevant details was to insert random delays into his code. That way, instead of discussing irrelevant details, people would get upset about the performance. He would then sigh, dramatically, and say he would see what he could do, remove a few lines, spend the rest of the day reading the news, and - importantly - billing the customer.


I'm pretty sure I've seen this before, is it one of the BOFH stories?



there's also the memory-optimization variant of this, from the land of game dev. see "the programming anti-hero": https://www.gamasutra.com/view/feature/132500/dirty_coding_t...

most of the other game dev dirty tricks on gamasutra are good value -- the "(s)elf-exploitation" one from Jonathan Garrett, Insomniac Games is particularly awfully clever:

https://www.gamasutra.com/view/feature/194772/dirty_game_dev...


I am too lazy to check right now. ;-)

Honestly, this story was told to me by an SAP consultant. I do not think he was an avid reader of BOFH. I might be wrong though.

Either way, I think this approach to avoiding bikeshedding has been invented independently numerous times. ;-)


I once found a "sleep(1000);" in the middle of some hairy critical code -- something existentially important like payment processing. There was no obvious reason for it, it had been added without explanation, and the author was long gone from the company.

I didn't have the guts to remove it.


Having an important endpoint delay every response by 1 second (assuming the 1000 are ms) is a relatively common and easy way to delay brute force attacks. E-mail servers call this tarpitting.
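A minimal sketch of a fixed-delay tarpit wrapped around a handler, with an injectable sleep function so it can be tested without waiting. The Tarpit class is a made-up name. Note that a fixed delay only slows a client sending one request at a time; parallel requests get around it, so it's usually paired with per-account or per-IP rate limiting.

```python
import functools
import time


class Tarpit:
    """Delay every response from a sensitive handler by a fixed amount."""

    def __init__(self, delay_seconds=1.0, sleep=time.sleep):
        self.delay_seconds = delay_seconds
        self.sleep = sleep  # injectable so tests don't actually wait

    def wrap(self, handler):
        @functools.wraps(handler)
        def delayed(*args, **kwargs):
            result = handler(*args, **kwargs)
            # Pause before answering, success or failure, so a client sending
            # one request at a time gets at most ~1/delay_seconds answers per second.
            self.sleep(self.delay_seconds)
            return result
        return delayed
```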


I can vividly imagine finding this kind of bug on a random hunch and briefly feeling great relief before flying into an apoplectic rage haha


As a consultant I got a job from a major public company to fix a new touchscreen based in-car dashboard they had built. It was a web app running on a cheap android tablet full screen. The thing worked well, they said, except that it would get stuck in demo mode, and you couldn’t switch out of it. They’d paid an overseas contractor a significant sum to build this and eventually fired them when they got stuck at this point.

Upon opening the code I discovered the entire program was a carefully constructed slide show with hundreds of jpgs in a jQuery carousel and some magic click areas coded in to jump the user between slides. Other than this code to jump to specific slides, there was no code at all. Even the text on screen was in the images.

I should note that their git repo consisted of about a hundred folders whose names were dates, and one folder named “current.” That was actually my first warning of just what I was getting in to.


"Upon opening the code I discovered the entire program was a carefully constructed slide show with hundreds of jpgs in a jQuery carousel and some magic click areas coded in to jump the user between slides."

Based on my narrow understanding of "standard issue practices" in car dashboard UI workflows, this was a common pattern at least at one major German automobile company. Static views and transition rules between them.

I was a bit involved in dashboard software a decade ago and was really surprised to learn this.
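"Static views and transition rules between them" is a small table-driven state machine. A minimal sketch, with invented view names and hotspot rectangles given as (left, top, right, bottom) in screen pixels:

```python
# Each view is a static image; its hotspots map a screen rectangle to the next view.
SLIDES = {
    "home": [((0, 0, 200, 100), "media"), ((200, 0, 400, 100), "climate")],
    "media": [((0, 300, 100, 400), "home")],
    "climate": [((0, 300, 100, 400), "home")],
}


def next_slide(current, x, y, slides=SLIDES):
    """Return the view to show after a tap at (x, y); stay put if no hotspot is hit."""
    for (left, top, right, bottom), target in slides[current]:
        if left <= x < right and top <= y < bottom:
            return target
    return current
```

The only state is which view is on screen, which is why this works for demos and falls over as soon as a screen needs live data.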


I worked on a very similar application (likely the same platform) and grew increasingly concerned while reading your post that I was the one who built this.

Phew -- this was not me.


In 1997, I worked with FedEx to build an integrated order management system for the e-commerce company that I ran with my dad. Orders would come into my Perl-based order management system and the pickers would use the web interface to print a packing slip. A bash script would generate a Postscript barcode and then my Perl would generate a packing slip in LaTeX that included that barcode. The LaTeX-generated PS file was sent over a private T1 from the datacenter to the warehouse printer. The order would get picked and put in a crate with the packing slip. The barcode was then scanned at a FedEx shipping station by our shipping guys. That would trigger a script on the FedEx machine (written in Visual Basic, I think) that would make a call to PostgreSQL over Windows ODBC to pull the shipping address and shipping method. As soon as the workstation populated this info, it would spit out a FedEx shipping label and the VB script would then trigger an INSERT back into Postgres with the tracking number. This triggered another Perl job to mark the order as "shipped" and would send an email to the customer with the tracking number.

TLDR: we had real-time order tracking with full shipping and billing integration in a tiny mail order bicycle parts business in 1997.


You should have sold books, and expanded to bicycles later.


I could hear the Looney Tunes factory music in my head just reading this.


Talk about "bespoke".


Webhost circa 2002. Lots of carrots from Microsoft for us to go big on ASP.NET hosting. Fat boss made a deal which involved us rewriting our customer interface in ASP.NET from the existing ColdFusion morass. His eyes popping at our estimates of how long this would take, he came up with a solution: rename our *.cfm files to .aspx, and map IIS to pass .aspx files to the CF server. Job done.


Genius. Now if only we could convince the kids of today that this is the solution to rewriting everything in the latest fad framework.


Nice and efficient. There’s an elegance to this solution that’s hard to grasp unless you’ve been in a similar situation before.


I love the novelty and frivolity


Years ago, in the 70's, we came across a bug where some program would skip every other input line. When asked to fix it, the responsible programmer went away and within a few minutes reported back that she fixed it. When we told her it was still broken she referred us to the updated documentation, which now said "the input should be double spaced". The said program was used this way for years after.
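One plausible (purely hypothetical) cause of a skip-every-other-line bug is reading the input twice per loop iteration. The sketch shows why double-spacing the input "fixes" it: the extra read swallows the blank line instead of a real record.

```python
def read_records_buggy(lines):
    """Hypothetical bug: the loop advances once, then the body advances again."""
    records = []
    it = iter(lines)
    for line in it:
        records.append(line.strip())
        next(it, None)  # e.g. a leftover extra read; silently discards a line
    return records


def read_records(lines):
    """The fix: read each line exactly once."""
    return [line.strip() for line in lines]
```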


In a consumer app, I would say Snapchat's early camera hack on Android takes the cake.

To be brief, their app ran the Android native camera app in the background and took a screenshot of the resulting feed for the image, bypassing actual integration with Android's camera apps. Having worked on an Android smartphone from the ground up, I can understand their reluctance to commit dev time to having to support so many Android versions and other variations on all the devices out there, but still a lazy weird hack.

https://android.gadgethacks.com/how-to/fyi-why-androids-snap...

https://www.reddit.com/r/GooglePixel/comments/64xqv0/snapcha...


I thought it was also about delay, because the main camera API in some cases imposes a considerable delay (1 second or more), but screenshots are almost instant.


Actually used the same "hack" years ago. Made an app for a friend where one could import photos and drag logos on top of the photo.

Taking a screenshot was way easier, since I didn't have to spend time figuring out how to use the bitmap API and its edge cases. Large pictures on low-end devices especially caused crashes.


This is hilarious as Snap Inc. bills itself as a "Camera Company"


This isn't a comment about Snapchat so much as about how shitty the camera API is on Android. Yes, it is faster and more reliable to take a screenshot of the camera app.


Aren't they still doing that? At least I have the impression.


First job out of university and I had to fix a terrible crash that happened to our prod application every few hours, running on Windows NT boxes. After lots of debugging and asking all the “senior” devs, no one knew a solution since it was all super old code. What I did notice, though, was that the apps I was debugging didn’t crash until I stopped debugging them.

Turns out that it was a memory issue and every time I minimized and maximized the app, part of the memory got cleaned up. So as a temporary fix, I just wrote a script to auto minimize/maximize the apps on all boxes until we found the memory leak.

Note: we never found the leak.


I once worked maintenance on a large C++ program used in production by a lot of customers. It was odd in several ways, but the feature that stands out in my mind was the numerous classes that were not defined anywhere in the source code or libraries.

If that sounds unlikely to you, it sounded unlikely to me, too. I wasted a lot of time trying to figure out where they were defined. I couldn't ask the original author of the code; he had moved on.

Eventually, I found them, sort of. They were being defined by a sed script that ran during the build process. It read the sources before they got to the compiler, constructed class definitions on the fly, and injected them into the code before it was fed to the compiler. So the definitions were right there in the code that the compiler saw; they just weren't anywhere in the code that humans could see.

Why was it done that way? I have no idea.


Honestly, that sounds pretty normal, or at least ok, to me. Auto-generation of code is one of my primary daily tools, and I think it's just right for whole categories of problems. I currently generate a good third of my compiled sources, and also generate a rudimentary TypeScript library out of my source code to be used by my colleagues.

That being said, I usually work in Scala which provides language-based tools for that, so it definitely helps with avoiding the "dark magic" sentiment you may have had.

But why was it done that way? It reduces boilerplate, copy-paste errors and code duplication, in ways sometimes not made possible by inheritance or composition.


Did you consider the point about code being defined only at build-time, not being available for inspection by the developer? That sets it apart from most auto-generating code scenarios I've seen.


Hey, I'm a common lisp programmer. I'm down with automated code generation. But in Lisp or in Scala, as you say, it's in-language, so you can see what's going on by reading the code. This was different.


I have a simple rule about adding "magic" like that: if you can't make it immediately obvious to the readers of the code just what the magic is going to do, don't do it. AOP Java led me to that rule because of too many obscure annotations that did insane things to help one developer avoid some minor nuisance.


I disagree that you shouldn't do it. The way you do it is to add a comment

// This %thing% was generated from %template% using data from %data%.

If your language is too restrictive to add sensible tools, you should write the tools yourself. As with any code, write it in a way such that other people will be able to understand it.

There is no magic, it's just code. I wish people would stop using that word.
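As a minimal sketch of that advice, here is a hypothetical Python generator that renders C++-style class definitions from a small data table and stamps the provenance comment at the top. Unlike the sed script in the original story, its output is meant to be written to a real file that editors, grep and cscope can see. The template, the `int` field type, and the file names are all assumptions for illustration.

```python
from __future__ import annotations

from string import Template

# Hypothetical template for one generated class.
CLASS_TEMPLATE = Template("class $name {\npublic:\n$fields};\n")


def generate_classes(specs: dict[str, list[str]], template_name: str, data_name: str) -> str:
    """Render one class per entry in `specs`, prefixed by a provenance comment."""
    header = (
        f"// This file was generated from {template_name} using data from {data_name}.\n"
        "// Do not edit by hand: change the template or the data and rebuild.\n\n"
    )
    classes = []
    for name, fields in specs.items():
        # Every field is an int here purely to keep the sketch short.
        field_lines = "".join(f"    int {field};\n" for field in fields)
        classes.append(CLASS_TEMPLATE.substitute(name=name, fields=field_lines))
    return header + "\n".join(classes)


if __name__ == "__main__":
    print(generate_classes({"Point": ["x", "y"]}, "classes.tmpl", "classes.json"))
```

Writing the result to an ordinary header (say, a hypothetical `build/generated_classes.h`) and including it normally is what keeps the definitions findable by humans and tools.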


The original comment says "they just weren't anywhere in the code that humans could see."

Where would you add the comment in a situation like that one?

I don't have anything against generated code, but it should be visible and, as you said, it should be crystal clear where it comes from.


I think in this instance it's also a matter of tools. Visual Studio/Visual Assist would have been able to find the class definition, and that's where the comment would go. For cscope it's a matter of configuration, which should be auto-generated by the build tool.

If the class is being used, then the definition has got to be somewhere, hopefully in a header. It's simply a file that is included somewhere. And it will be human readable. There is no magic.


The class definitions were not in any file. That was the entire point. They didn't exist at all except as transient build artifacts.


I agree with that rule.

Sometimes you have to get the job done in a "less than ideal" way. But a lot of documentation/comments should be left to justify and more importantly explain how it works.


If I had to guess, it was a hamfisted way of doing JS-style objects in C++. They could just slap properties onto objects, etc, and the build process will determine what each class needs. Clever, but terrible.


For what it's worth, it was before JS existed.


What’s your opinion of languages that can do stuff like this, dynamically at runtime?

And it’s even better: you can define, augment, redefine or remove classes at runtime.


The lingua franca of theatrical lighting control is DMX, a physical-layer protocol designed for custom cabling. Light boards emit an array of 512 values in the range [0,255], and dimmers, or the lighting instruments themselves, interpret these values as parameters like intensity. For various reasons it's useful to carry this signal over an IP network, and proprietary standards for doing so have proliferated.

Light boards these days are just computers with some domain-specific IO. Tired of our ancient ETC Expression console, my colleagues and I wanted to start using ETC's new Nomad control software on our laptops. Our venue's dimmers only understood ETCNet2, while Nomad could only speak the newer ETCNet3 (and a few other open standards we couldn't use). Attempting a software upgrade on the dimmers themselves seemed incredibly risky. To bring Nomad's output to DMX would have required an additional $500 hardware purchase on top of the already-not-cheap software license.

On the message boards, I discovered a strange fact. The ETC-branded DMX<->Net2 interfaces we owned were actually white-label manufactured by a company called Pathport. Pathport boxes spoke a much wider array of protocols using the same hardware. These things handled firmware updates by flashing themselves with whatever was served to them over BOOTP. Pathport firmware images were free to download straight from the manufacturer.

Net3->Net2 was too much to ask for, but they could do ArtNet (an open standard) to DMX. Nomad could also emit ArtNet. So I flashed and configured one node to operate as ArtNet -> DMX, and plugged it into another node configured for DMX -> Net2.
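For reference, the ArtNet side of that chain is simple enough to sketch. Below is a minimal Python packet builder that assumes the publicly documented Art-Net ArtDmx layout (protocol version 14, UDP port 6454); it is not ETC's, Pathport's, or Nomad's code, and it ignores node discovery (ArtPoll) entirely.

```python
import socket
import struct

ARTNET_PORT = 6454      # UDP port assigned to Art-Net
OP_DMX = 0x5000         # ArtDmx opcode
PROTOCOL_VERSION = 14


def build_artdmx(universe: int, levels: bytes, sequence: int = 0) -> bytes:
    """Pack up to 512 channel levels (0-255) into an ArtDmx packet."""
    if not 0 <= universe < 1 << 15:
        raise ValueError("Art-Net Port-Address is 15 bits")
    data = bytes(levels)  # raises ValueError if any level is outside 0-255
    if not 1 <= len(data) <= 512:
        raise ValueError("a DMX universe holds 1 to 512 channels")
    if len(data) % 2:
        data += b"\x00"  # ArtDmx requires an even data length
    return (
        b"Art-Net\x00"
        + struct.pack("<H", OP_DMX)                          # opcode is little-endian
        + struct.pack(">H", PROTOCOL_VERSION)                # version is big-endian
        + struct.pack("BB", sequence & 0xFF, 0)              # sequence (0 = off), physical port
        + struct.pack("BB", universe & 0xFF, universe >> 8)  # SubUni, Net
        + struct.pack(">H", len(data))                       # data length is big-endian
        + data
    )


def send_levels(host: str, universe: int, levels: bytes) -> None:
    """Send one ArtDmx packet to a node at `host`."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(build_artdmx(universe, levels), (host, ARTNET_PORT))
```

For example, `send_levels("192.168.1.50", 0, bytes([255] * 512))` (a made-up node address) would drive every channel of universe 0 to full on a node listening for Art-Net.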

So now, locked in a closet, there is a very strange loop of switch -> hacked ETC box -> normal ETC box -> switch which seems bizarrely redundant, but actually makes the world go 'round. And I could run lights and sound from any network drop in the building.


Wow, this brings back memories of running the theater in my school. Definitely a different situation, but I like to think we did a good job given what we had.

We didn't really have a budget, just some hand-me-down equipment that came from above sometimes. I and others on my team put together so many hacks to make things work. One memorable time, our light board had broken, but we still needed to run shows.

We didn't have enough time to wait for shipping on a real USB->DMX adapter, nor the budget for a new board, so I created a hacked-together DMX adapter from a serial-to-USB adapter and a NAND gate (I put schematics together here, if anyone's interested: https://github.com/magmastonealex/DMXAdapters).

It worked remarkably well for a hack and, paired with software like QLC+, had more features than our old light board! It was still in use for controlling special-effect lighting when I left, though thankfully not for main lighting and day-to-day use.
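Part of why a serial adapter can stand in for a DMX interface is that DMX512 is, at the wire level, asynchronous serial at 250 kbaud with 8 data bits and 2 stop bits, preceded by a line-low "break" and a short "mark after break" (MAB). The sketch below only works through the frame-timing arithmetic; the break and MAB defaults are assumptions (commonly cited DMX512-A transmitter minimums), and nothing here describes the NAND-gate circuit in the linked repository.

```python
BAUD = 250_000
BITS_PER_SLOT = 11  # 1 start bit + 8 data bits + 2 stop bits


def dmx_frame_time_us(channels: int = 512, break_us: float = 92.0, mab_us: float = 12.0) -> float:
    """Time on the wire for one DMX512 packet: break, MAB, start code, channels."""
    if not 1 <= channels <= 512:
        raise ValueError("DMX512 carries 1 to 512 channel slots")
    slot_us = BITS_PER_SLOT * 1_000_000 / BAUD  # 44 µs per slot at 250 kbaud
    slots = channels + 1  # the extra slot is the start code (0x00 for dimmer data)
    return break_us + mab_us + slots * slot_us


def max_refresh_hz(channels: int = 512, **timing: float) -> float:
    """Upper bound on packets per second, ignoring idle time between packets."""
    return 1_000_000 / dmx_frame_time_us(channels, **timing)
```

With these assumed defaults a full 512-channel universe takes 22,676 µs per packet, so roughly 44 refreshes per second at best.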


The Expression may be outdated, but LDs still cling to it.

This also reminded me of how when HES stopped supporting the DP2000 in Hog, DP2000 owners just swapped it over to ArtNet mode.


On a Sony PSX (original PlayStation) port of a PC title that I worked on, we needed a physics thread to run at a predictable, consistent rate regardless of what the rest of the game was doing. Sony Japan said pre-emptive multitasking wasn't possible.

I found a way to hook the vertical blank interrupt (shades of old Atari 8-bit programming) and push all the registers onto the stack, creating a setjmp/longjmp-ish way to call our physics thread at a consistent 30Hz. (OK, 29.97, but close enough.)
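The 29.97 figure is consistent with NTSC timing: vertical blanks arrive at 60000/1001 (about 59.94) per second, so servicing physics on every second one gives 30000/1001 Hz. Assuming that divide-by-two scheme, the Python below only simulates the scheduling; the register save and context switch inside the real interrupt hook can't be reproduced here, and the class and parameter names are illustrative.

```python
from fractions import Fraction

# NTSC vertical blanks arrive at 60000/1001 per second (about 59.94 Hz).
NTSC_VBLANK_HZ = Fraction(60000, 1001)


class VBlankScheduler:
    """Run `step` on every `divisor`-th vertical-blank interrupt (simulated)."""

    def __init__(self, step, divisor: int = 2):
        self.step = step
        self.divisor = divisor
        self.vblanks = 0

    def on_vblank(self) -> None:
        # Assumed: the real hook did the equivalent of this after saving all
        # registers; here it is just a method call.
        self.vblanks += 1
        if self.vblanks % self.divisor == 0:
            self.step()


def physics_rate_hz(divisor: int = 2) -> Fraction:
    """Physics steps per second when stepping on every `divisor`-th vblank."""
    return NTSC_VBLANK_HZ / divisor
```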


Do you have any more interesting stories of working on game development for the PSX?


I only worked on the one title (NASCAR Racing) and so my war stories are limited, but I'll give you what I recall.

Original dev boards were 3 full length ISA bus (IIRC) boards and were a PITA to get installed, all the IRQ conflicts resolved, etc. Later dev environment was a "blue PSX" (basically a production PSX with blue plastic that could run non-copy protected discs). I think the ISA boards had more memory than the production boxes; I'm not sure if the blue had extra RAM or not.

We were always RAM constrained (may have been less of an issue for a ground-up game, but we were porting a PC title), and we wanted to use a common codebase with the PC title, so we had a LOT of complex C macros to bridge between the PC world and PSX world. (As just one example, we could have used filenames on the PSX, but there was no reason to waste the RAM, so I wrote macros to turn PC-file-based accesses into PSX-sector-byte accesses. I also wrote the macros such that they'd break the PC compile/runtime [depending on the macro] to prevent the PC teams from writing code that would only work on their platform. It wasn't hugely popular with some of the "old-timers", who viewed the consoles as a distraction.)
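The arithmetic behind turning a PC-style (file, offset) access into a PSX sector-and-byte access is simple once each file's starting sector on the disc is known. Here is a Python sketch, assuming 2048-byte data sectors (CD-ROM Mode 1 / Mode 2 Form 1 user data), contiguously stored files, and a hypothetical table of start sectors; in C, macros like the ones described could collapse this to constants so no filename strings need to live in RAM.

```python
from __future__ import annotations

SECTOR_SIZE = 2048  # user-data bytes per CD-ROM Mode 1 / Mode 2 Form 1 sector

# Hypothetical table produced when the disc image is laid out: file -> first sector.
FILE_START_SECTOR = {
    "TRACK01.DAT": 1200,
    "CARS.DAT": 1850,
}


def to_sector_byte(filename: str, offset: int) -> tuple[int, int]:
    """Map a byte offset within a file to (absolute sector, byte within sector)."""
    start = FILE_START_SECTOR[filename]
    return start + offset // SECTOR_SIZE, offset % SECTOR_SIZE
```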

Compiler was gcc; we used Emacs as our editor (me and the other main programmer were MIT alums), and in order to get a better Emacs experience we installed OS/2 Warp as our desktop OS (so we could get subshell compilation working, which didn't work, or didn't work well, on a DOS boot [this was 1995 and prior to NT-based flavors of Windows]). Debugging was primarily via printf or small graphical blocks in the corner of the screen.

Documentation was fairly terrible and Sony CA had to escalate many clarification questions to Japan. Docs would say things like, “It’s critical to never fail/forget that initializing this system must happen strictly before the lack of initialization of that system.” It sometimes felt like the Ed Asner water-in-nuclear-reactor sketch.

Sony QC to approve the golden master was very strict. We shipped with over a dozen tracks, and it seemed like they drove every square inch of them and complained about graphics glitches in many places that were far enough off the racing line that we never noticed (or never cared).

In terms of graphics "flair", the PC title had a flat-colored track, which wasn't as appealing as the PSX titles of the day (Ridge Racer and the like). We didn't have a huge art budget for the title, but we created an artificial racing "line" of darker track, which we placed by repurposing the position and acceleration data used by the PC AI drivers' algorithm. Track where the AI cars were accelerating (including laterally) was darker than where they were just driving, which in turn was darker than where they rarely drove.

Because the PC title was heavily focused on realism (which means it's not as easily accessible or "fun" for the casual gamer), I created an "arcade physics" mode where the car would slide and rotate more, had higher absolute cornering and braking ability, but the same forward acceleration. I also added "double click to burnout/do donuts" in normal mode as both a fun way to screw around but also a way to more easily exit a tight pit box. This had the unfortunate effect of giving much better acceleration from a standing start. So, when it found its way into the PC multiplayer title, standing start races became a sea of tire smoke and cars running into players who hadn't learned that burnouts gave faster acceleration. (We properly modeled the horsepower as a function of RPM. Burnouts raised the RPM. My hack didn't model the tire slip under acceleration, so burnouts brought the car up into the power band and you would walk away from a car who was accelerating from a lower RPM.)

We had another team working on a Sega Saturn version at the same time; that title never shipped, in small part because of the technical hurdles of getting the title to run on the platform, but also because the limited commercial success of the Saturn was becoming obvious during development.

Other memories were working with some of the most talented programmers and artists I'd worked with up to that point in my career (both on my immediate team and elsewhere in the company), meeting Ken and Roberta Williams (Sierra bought us), and going to racing school to get a better hands-on feel for auto racing. Fun times and I sometimes wonder how my career would have gone differently if I'd stayed in games. (I left because each successive merger or acquisition by non-gamers made the company worse and worse to work for. Sierra and the Williams were great; subsequent MBA-types were each progressively worse, including substantial securities/accounting fraud so I was glad to get out when I did.)

Random tidbit: it was a single-player game. If you pressed a button on the P2 controller during boot, you got a simple Tron-style light cycles game, embedded as a small Easter egg.


Interesting story, I played this game way back when. Thanks for sharing!


Funny you mention NASCAR Racing. Was this the 1994 MS-DOS title? How were threads even done in an OS like DOS? I'm guessing this was something DOS/4GW gave or along those lines.


Yes. This one: https://en.wikipedia.org/wiki/NASCAR_Racing

I was tech lead on the PSX title and contributed to the PC NASCAR Racing 2 and Grand Prix Legends title. Many of the core programmers from Papyrus went on to form iRacing.

I seem to recall that the DOS titles used DOS/4GW. We ran the physics and the joystick read (timing a capacitor's charge through a variable resistor in the controller) together (and maybe the sound synthesis as well).
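That joystick read is essentially an RC timing measurement: the game triggers a one-shot timer and counts how long until the axis bit on the game port (I/O port 0x201) drops. Assuming the component values usually quoted for the original IBM game adapter (a 0.01 µF capacitor, a 2.2 kΩ series resistor, a 0-100 kΩ stick potentiometer, and a timer that trips at 2/3 of the supply voltage), the duration is about 1.1·R·C, since ln 3 ≈ 1.1. The Python below only does that arithmetic in both directions; real DOS code counted polling-loop iterations, so its raw numbers also depended on CPU speed.

```python
C_FARADS = 0.01e-6        # assumed timing capacitor
R_SERIES_OHMS = 2_200     # assumed fixed resistor in series with the stick
R_POT_MAX_OHMS = 100_000  # assumed full-scale potentiometer value
TRIP_FACTOR = 1.1         # ln(3) ≈ 1.0986: time to charge to 2/3 Vcc, in units of RC


def charge_time_us(pot_ohms: float) -> float:
    """One-shot duration in microseconds for a given potentiometer setting."""
    return TRIP_FACTOR * (R_SERIES_OHMS + pot_ohms) * C_FARADS * 1e6


def axis_from_time(t_us: float) -> float:
    """Invert a measured duration into a 0.0-1.0 axis position, clamped."""
    pot_ohms = t_us / (TRIP_FACTOR * C_FARADS * 1e6) - R_SERIES_OHMS
    return min(max(pot_ohms / R_POT_MAX_OHMS, 0.0), 1.0)
```

Under these assumptions a fully deflected axis takes about 1.1 ms to read, a long busy-wait for a mid-90s game loop.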

(We had hacks to detect running under Win95 and then walk the app, touching each page periodically to keep Windows from paging us out.)
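"Walking the app, touching each page" amounts to reading one byte from every page of the working set so the memory manager keeps seeing it as recently used. A Python sketch of just the walk, assuming a buffer we own and the platform page size from `mmap.PAGESIZE` (4096 bytes on x86); the original would have been native code run periodically, and this is not that code.

```python
import mmap

PAGE_SIZE = mmap.PAGESIZE  # 4096 on x86


def touch_pages(buf: bytearray, page_size: int = PAGE_SIZE) -> int:
    """Read one byte per page of `buf`; returns how many reads were made."""
    reads = 0
    sink = 0
    for offset in range(0, len(buf), page_size):
        sink ^= buf[offset]  # the read itself is what marks the page as accessed
        reads += 1
    if buf:
        # buf may not start on a page boundary, so also touch the final byte
        # in case it sits on a trailing page the stride above skipped.
        sink ^= buf[-1]
        reads += 1
    return reads
```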


I think your solution is better than running a background thread.


This isn’t even top 20 in this thread, but here’s mine: maybe 17 years ago, we upgraded our department server from a big old Sun 4/690 running sunos to a shiny new Ultra80 running Solaris.

Among the many functions this server had was hosting a bunch of black-and-white X terminals. Probably only a few people here ever used those (although more than on most other online forums!), but basically the idea is that they plug into the network, tftp down the image for the X server at power-on, boot, and allow you to run X client apps on the server: an '80s thin-client implementation. So we upgrade the server and things are working pretty well, especially for such a major upgrade.

My boss/mentor at the time is truly brilliant, so we really had most everything thought of. The only thing that was off was that all of a sudden the xterms all stopped booting. We couldn't figure it out. Network sniffers didn't show anything useful--we were just baffled.

On a whim, we decided to take the tftp server out of inetd control and truss it (Solaris equivalent of strace). The first time? Worked perfectly--our test xterm booted just fine. Eventually we figured out that the new server was so fast that the speed of the tftp transfer was triggering a problem on the Ethernet card firmware of the xterms and by using truss, it slowed down the transfer and bypassed the bug.

Solution: In inetd.conf, we just spawned in.tftpd with "truss -o /dev/null". Never saw the issue again.



Second real job I ever had, in the IT division of an investment bank, all the devs (about 15?) had color X terminals, which all booted off of one shared development server which was a mid-1990s era HP9000 box, and it supported the load and worked well. Everyone else had either a PC or a dumb vt100 type of terminal.


Early '90s. Big Fortune 500 website running on a single Pentium 90 desktop PC. We had to remove the computer's case to allow for more cooling and put a consumer-grade household fan next to it. Otherwise, it constantly overheated and would reboot.

So real data center, racks, etc. But this cheap ass, caseless, P90 on a shelf with a household fan blowing on it, while making millions of dollars.

I was mystified why there was no budget to use a real 1U server. The internet was pretty new at this time, but it was driving revenue.

Also, side info: this caseless P90 still exists, sitting in my friend's cubicle, naked and caseless. Pure glory. It's a hero. The tech stack was the NCSA web server, C, and Ingres, plus daily updates via a 1.44MB floppy disk.


"Don't delete this comment or the production server will crash." Tried it, did as advertised. Apparently the website went through a proxy that reflected over the code, using that comment as a hook to inject some sort of functionality.


I've personally set this kind of thing up. Inherited an old PHP site for a webdev contract, and a couple weeks into development (before any of my code had made it to prod), the server starts hanging randomly, or spitting out seemingly random errors on every request.

I was told in no uncertain terms by the client that if I didn't fix the server hangs within 48 hours, I'd lose the contract. This was in a million-plus-LOC custom WordPress nightmare.

I wound up writing a script that ran on a little EC2 micro instance and would ping the homepage every 60s looking for the HTML comment `<!-- if you delete this comment, the server will reboot forever -->`; if the request timed out or the text wasn't found, it would hit their hosting API and reboot the server the site was running on.

I deployed the "fix", finished the contract without incident, and subsequently fired the client.
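That "fix" is a classic external watchdog. A minimal Python sketch under these assumptions: the marker text is the one quoted, the check runs every 60 seconds, and `reboot_server()` is a hypothetical stand-in for whatever the hosting provider's API call was. In practice you would also want a grace period after each reboot so a slow boot doesn't trigger another one.

```python
from __future__ import annotations

import http.client
import time
import urllib.request

MARKER = "<!-- if you delete this comment, the server will reboot forever -->"
CHECK_INTERVAL_S = 60


def needs_reboot(html: str | None) -> bool:
    """No response, or a response missing the marker, means the site is wedged."""
    return html is None or MARKER not in html


def fetch(url: str, timeout_s: float = 30.0) -> str | None:
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException):
        # Timeouts, refused connections and HTTP error statuses all land here.
        return None


def reboot_server() -> None:
    """Hypothetical: the real script called the hosting provider's reboot API."""
    raise NotImplementedError


def watchdog(url: str) -> None:
    while True:
        if needs_reboot(fetch(url)):
            reboot_server()
        time.sleep(CHECK_INTERVAL_S)
```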


The Atlantic has a magic PAGE_COMPLETED comment that we used when I was there to tell the CDN whether to cache a page or not. I imagine that’s common.


Using the Google Sheets API to store session history and metadata for a nightly backfill job instead of, you know, a database. The program broke after the creator left and no one could figure out how to bring it back up. The engineer assigned to fix it pulled their hair out looking for the database creds, local SQLite3 records, anything that would initialize the backfill. Finally realized it wasn't just printing out metadata to a Google Sheet but actually relying on that as a persistence layer. Root cause of the breakage was automatically adding every Hadoop counter from the job as its own column in the Sheet, which eventually exceeded the dimension limits.


Using Google Sheets instead of a proper database gives me nightmares.


I did this once. A client wanted a website that mimicked the functionality of a complicated spreadsheet that he had created to calculate quotes for customers and didn't have enough money to pay to have all the logic rewritten in a webserver (not to mention continually updated).

I imported the spreadsheet into google sheets, gave him access, and had the webserver paste the values in the spreadsheet via the google sheets api and read them back out.


I created a whole project for this!

https://github.com/franciscop/drive-db

Since you can also hook a Google Form to a spreadsheet, you can do surprisingly advanced things over there.


Not to detract from this project, but this is a good opportunity to mention Apps Script.

You can do all sorts of interesting things between a form, spreadsheet, and other services including your own. Nobody seems to use it, but internally we do all sorts of gloriously hacky workflows with it.

You can easily script forms/sheets/calendar/Gmail together to create pretty much anything you need.

I use it to send daily email reports of data fed into a spreadsheet.


Ah sure, no problem; feel free to open a PR mentioning Apps Script as well if you'd like.

From a quick overview, it seems like if you need serious work with spreadsheets/GDocs then Apps Script is a good choice. However, drive-db is more like a (very) quick way of putting a spreadsheet into your Node.js backend as an array/db. I purposely didn't even allow editing, since that'd require API keys from users and defeat the quick part of it.


I did this once, the DOM isn't too bad to store your program source code :P https://github.com/winks/brainclick


The funny thing about this was that the truly incompetent wouldn't have been able to do something like that.

Sounds more like some BS requirement from a clueless middle manager or a boss and a dose of malicious compliance.


Not sure if this is still the case, but back when I was at Apple the program that triggered when you pressed a button on an Apple remote pointed at a Macintosh was just a giant AppleScript file that, at the top level, was a giant `if...else if...` statement to try and determine which application had the foreground so that the appropriate action (e.g. next track for iTunes, next slide for Keynote, next chapter marker for QuickTime, etc.) could be triggered.
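A table-driven version of that dispatch is easy to sketch. This is a hypothetical Python illustration of the structure described, not Apple's code: look up the frontmost application's name and do nothing for apps with no mapping.

```python
from __future__ import annotations

# Hypothetical mapping from frontmost application to the "next" button's meaning.
NEXT_BUTTON_ACTIONS = {
    "iTunes": "next track",
    "Keynote": "next slide",
    "QuickTime Player": "next chapter",
}


def next_button_action(frontmost_app: str) -> str | None:
    """Return the action for the frontmost app, or None if it has no mapping."""
    return NEXT_BUTTON_ACTIONS.get(frontmost_app)
```

Adding support for another app then means adding a table entry rather than another `else if` branch.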


Interesting! The only .scpt files I can find in /System are Automator actions or in Script Utility itself, and of those the only one that seems relevant is Library/Automator/Initiate Remote Broadcast.action/Contents/Resources/Scripts/main.scpt, which seems like something else. Hopefully that means they fixed it (if someone with an apple remote wants to run opensnoop and double-check that would be cool!)


I think I remember this from the days of Front Row. It’s long gone now.


So that‘s why that remote was always so unreliable.


Years ago, I was playing Prince of Persia: The Sands of Time a lot. As the game was quite hard, I died often, and every death resulted in a long loading time. After hours of gameplay I noticed that every load showed the exact same animation and took about the same time. I browsed the game folder and found a video file with that exact animation. I replaced it with a 1-second video file and, guess what, it worked. I never felt more like a hacker.


Heh, nice. Reminds me of 13-year-old me hacking a copy of a shareware CGA strip poker game on a 5.25" floppy disk. What'd I do? The revealing images of the digital strippers were numbered 0.bmp up to 5.bmp. This was in DOS, before we had Windows. I renamed the files so their numbers ran backwards (5 to 0, 4 to 1, and so on). Instant reverse strip poker and a viscerally satisfied teen :)


I try to do that as much as I can with every game I can get away with it on. Nothing worse than 2 minutes of animated logos every time you start the game, so sometimes you can delete them and sometimes you can replace them with shorter clips and really speed up the game's load time.


The hold music at my first job was an old Windows machine blasting music in our server room, with a phone on speaker... Ever wonder why hold music sounds so fuzzy?


Reminds me how the Russian Buzzer UVB-76 works. Hams listening to the station determined it was literally just a microphone in front of some tone generator, because occasionally you'd hear conversations between soldiers in the background.


Which I guess means you had to be super silent when doing physical hardware maintenance?


"I just heard a bunch of swearing when I was on hold!"


Here's a late version of Encarta.

https://goo.gl/6uX4Qu

Do you see that plain-looking dropdown menu with the rounded orange highlights? That is Internet Explorer. Just this one menu. It's an in-process instance of Trident, IE's old HTML rendering engine. So that little window is the equivalent of something like Chromium Embedded. I don't know why that menu is an instance of IE's HTML renderer. Someone wanted to style it with CSS, I think. So they embedded IE. That flyout to the right is probably another Trident window. In order to meet accessibility requirements, I had to grab the running instance of the root IE COM interface and route keyboard events into it. With raw C++ COM. There were other hooks going in the opposite direction so the menu / browser window could tell the app about clicks.


That is just insane. Separate rendering engines for each tab?


I can't remember how the flyouts worked. They might have shared a window, or they might not have been HTML windows (but I think they were). What I know for sure is that the one main dropdown was IE.


Glass/gradients was a baaaad trend in visual styles. Very plastic, cheap looking.


Year 2006: we had a very high-traffic website running on 1 MySQL server and 1 web server (PHP). Maybe the terms "high availability" and "resilience" hadn't been coined yet; that's why we were comfortable with having one server per function. The web server had two Ethernet cards, one with a private IP and one with a publicly accessible IP. After a while, the platform started to crash, and I would be called by my loyal users before Pingdom alerts reached me; then I would call the datacenter technicians to press the restart button on the web server. Obviously it was a lengthy recovery process, with a lot of humans involved.

After a while, I discovered that the issue was the web server's Ethernet card that was attached to the internal network and used to connect to the MySQL server. When that card stopped working, the platform would crash. On the other hand, it was also possible to connect to MySQL through the other Ethernet card via the public IP. That would reduce the platform's performance, since all the bandwidth of that card (100 Mbps!) was already eaten by HTTP traffic, but at least it would keep it running.

I ended up writing a script on my home computer that checked whether the platform was up. Once the faulty Ethernet card failed, it would connect over FTP, change the PHP configuration to use the other Ethernet interface to reach the MySQL server, and send an e-mail to the datacenter technicians asking them to press the restart button.

This script successfully did its job for 3 months, until I eventually replaced the faulty Ethernet card and fixed the issue.

Isn't it "Invent and Simplify" like Jeff Bezos says?
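The failover step (rewrite the PHP config to point at the MySQL server's public address, then upload it) can be sketched as a small text substitution. Assumptions, all hypothetical: the config holds a single line of the form `$db_host = '10.0.0.2';`, and the two addresses below are placeholders (203.0.113.10 is from a documentation-only range). Uploading the result could use the standard library's `ftplib.FTP.storbinary`; that part and the health check are left out.

```python
import re

PRIVATE_DB_HOST = "10.0.0.2"     # hypothetical internal address of the MySQL server
PUBLIC_DB_HOST = "203.0.113.10"  # hypothetical public address (documentation range)

# Matches e.g.  $db_host = '10.0.0.2';  and captures everything but the address.
DB_HOST_LINE = re.compile(r"(\$db_host\s*=\s*')[^']*(';)")


def switch_db_host(php_config: str, new_host: str) -> str:
    """Point the single $db_host assignment in a PHP config at new_host."""
    new_config, count = DB_HOST_LINE.subn(
        lambda m: m.group(1) + new_host + m.group(2), php_config
    )
    if count != 1:
        raise ValueError(f"expected exactly one $db_host assignment, found {count}")
    return new_config
```

Using a function as the replacement avoids surprises if the new host string ever contains backslashes or group references.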


The other day a former colleague pinged me with a screenshot from one of our secondary RADIUS servers, asking if he could remove my former user account from a bit of Perl code (we used Radiator).

That ‘if’ block exempted me, the CEO and the CMO from traffic limits (which at the time would forcibly disconnect you) and make sure we had 24/7 access (I had set it up during testing because they kept calling us to remove the blocks, and one night I couldn’t log in either).

We found out during that exchange that another former colleague had left a cron script downloading Dilbert and User Friendly comics that had filled up the hard disk since 2008 (the machine had nearly 12 years of uptime).


Hm, 10 years at 200KB/day (about average for a Dilbert strip) would come out to 730000KB or 713MB. That seems rather quaint compared to the ~50GB git repo we have :)


I didn't have access to the machine, but I gather the cron job grabbed more stuff :)


Finance needed to do end-of-year stuff a couple of days past end-of-year. The system couldn't handle this, bad things would happen and data would change once end-of-year passes.

Solution? A bash script that does:

   while true; do
       date -s "$(date +%Y)-12-31 16:00:00"; sleep 1   # pin clock to 4pm Dec 31; needs root, start on Dec 31
   done



Why the loop, vs setting the date once?


NTP would resync it, and obviously after X hours it would no longer be end of year even if you set the date in the past.

It needed to be end of year day for 2 or 3 days.


yeah I get that but `while(true)` seems excessive. Why not just do it once and disable NTP?


because time moves forwards, and if you set it to 4pm then in 1.5 hours it will be 5.30pm, and the end-of-year stuff will kick in?

Three lines of bash seemed simpler. It's a hack, yes, and there are better ways. But really who gives a damn.


I would guess the system was probably running ntpd or some other time sync service that they were unwilling or unable to turn off.


Big C codebase. To be more precise, they said it's C++, but as far as I could see, it was C compiled with g++.

Some code read XML data. Instead of choosing one of the available XML parsers, the author decided to write another one. Instead of using C++ features, atoi() was used. For empty strings, atoi() got NULL and segfaulted. Signal 11 was handled and suppressed in order to avoid crashes. Of course, the code had other segmentation faults too, which could not be discovered this way. :)


You mean, instead of fixing _just_ the atoi() crash, that developer fixed all crashes with his patch? Quite the clever bastard!


Was this a telco? This sounds surprisingly familiar.


I installed 65 cash registers all over Ireland. Each one had only an RS232 serial port. I had to read and aggregate their daily reads between 5am and 10am (only time these outlets were not running).

It was not possible to read this particular cash register when it was in operation mode OR in OFF mode. Also, we did not have access to GSM SIMs and there was no Wi-Fi in the stores.

SO:

We installed 56K modems and plugged them into the regular PSTN lines.

BUT:

That would interfere with customers calling to order out.

SO:

We plugged them into analog plug timers and only had the modems switch on between 5AM and 10AM.

AND:

The store owners kept switching Off the tills. So we had to disable the off position for all the tills.

The backend was a VB6 app running a 56K modem that read each till in turn and then processed all the results.

Ran for 11 years with not much bother.


CBE?


Nah. EPOS wasn’t our main business - or even competence! Got dragged into it because we could do the backend...


Reminds me of this brilliant The Daily WTF submission. [0]

[0] https://thedailywtf.com/articles/ITAPPMONROBOT



5F3759DF a.k.a. fast inverse square root [1] in graphics programming.

[1] - https://en.wikipedia.org/wiki/Fast_inverse_square_root
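For readers who haven't seen it, here is the trick transcribed into Python as a sketch. `struct` is used to reinterpret a 32-bit float's bits as an integer and back (the C original used pointer casts); the shift-and-subtract with 0x5F3759DF gives a rough first guess, and one Newton-Raphson step refines it. Python does that last step in double precision, so results can differ very slightly from the float32 C version.

```python
import struct


def fast_inv_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for positive x using the 0x5F3759DF trick."""
    # Reinterpret the float32 bit pattern of x as an unsigned 32-bit integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # The bit pattern is roughly a scaled log2(x); halving it and subtracting
    # from the magic constant approximates the bits of 1/sqrt(x).
    i = 0x5F3759DF - (i >> 1)
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson iteration for f(y) = 1/y**2 - x.
    return y * (1.5 - 0.5 * x * y * y)
```

After the single Newton step the relative error stays below about 0.2%, which was good enough for lighting normals in a 1999 game engine.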


Descendants of this still exist in many codebases, especially libm's. I've found it in my own codebase as well. It's surprisingly maintainable.


I'd love to know who left those comments for Quake III Arena in your referenced Wikipedia article. I had a good laugh.


>I'd love to know who left those comments

The legend himself, John Carmack.

Fast inverse square root is really the perfect example of black magic in programming.


I always thought of this bit of code as a great example of applied numerical-methods techniques, rather than “Black Magic”. The magic constant is derivable from standard methods, and one can even choose to optimize other measures of error.

http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf


Isn't it what black magic is all about?

Unorthodox technique that you can explain if you try hard enough (in a sense, everything that reliably works can be explained, and someone had to come up with it in the first place), used by people who don't really understand it.

What do you think is a better example?


I would consider “black magic” to be something that works reliably due to some specific and idiosyncratic property of the environment that it operates within. Basically, something that is exceptionally tightly coupled. I think the novel FPGA solutions that genetic algorithms can create fall into this category; they often didn’t work on different boards, or even when the same board was plugged into a different power supply because the solution was overfit.

“A Story About Magic” is black magic in action. http://catb.org/jargon/html/magic-story.html

“The Story of Mel” is not black magic even though no one else understood his program. http://www.catb.org/jargon/html/story-of-mel.html


Yes, it is derivable, but it takes a certain amount of (reckless) genius to put it all together.

In the world where FizzBuzz is hard, applying the Newton Method is a rare thing.


I work on a .NET application that runs multiple HTTP download requests at the same time. We recently added support for client-side certificates to authenticate to a customer-controlled server.

When Windows is configured in a very high security manner, the user needs to manually give our application permission to use the certificate once during the lifetime of our process.

We hit a bug in .NET where, if we start multiple HTTP requests at the same time that use the same certificate, and the user needs to approve our use of the certificate, the user will get multiple request dialogues.

The fix is a very convoluted lock statement, because if the user says no, the other HTTP requests that would be started at the same time need to be aborted.

What makes the lock statement more complicated is that we essentially need to lock right before the HTTP request starts, but then unlock when we are reading the stream. This means that the first time we use a client-side certificate, we have to disable multi-threading until we know that the client-side certificate is approved by the user.
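The core of that fix is a "prompt once, everyone else waits for the answer" gate. Here is the pattern as a Python threading sketch, not the actual .NET code: `prompt_user` is a hypothetical stand-in for whatever triggers the Windows certificate-consent dialog, and the real implementation had the harder job of inferring approval from the first HTTP request instead of asking directly.

```python
import threading
from typing import Callable


class ApprovalGate:
    """Prompt the user at most once; concurrent callers wait for that one answer."""

    def __init__(self, prompt_user: Callable[[], bool]):
        self._prompt_user = prompt_user
        self._lock = threading.Lock()
        self._decision = None  # None until the user answers, then True or False

    def approved(self) -> bool:
        if self._decision is not None:  # fast path once the user has answered
            return self._decision
        with self._lock:
            # Re-check: another thread may have prompted while we were waiting.
            if self._decision is None:
                self._decision = self._prompt_user()
            return self._decision
```

Each request would call `gate.approved()` before starting and abort itself on `False`, so a single "no" cancels every queued request instead of producing a pile of dialogs.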


An external vendor delivered a new static marketing site written in PHP. Info sec team wouldn't let us install mod_php on the publicly facing servers and the vendor needed more time/money than the budget and timeline allowed to change it. A coworker stood up a local server and wrote a script to periodically crawl it and push changes out to the publicly facing apache servers. It might still be there for all I know.


That's actually not so bad IMO; it's fairly common to build a site in WordPress just for the CMS, then periodically convert it to static HTML and rehost it straight from CDN edge servers.


Definitely not as crazy as some of the other posts here, but it felt like a hacky work around for something that probably wasn't really a problem.


We did something a lot like this recently when marketing demanded a WordPress instance.


I used to support a warehouse management system (RedPrairie) that our company had customized per business rules. At some point, a bug was introduced which locked up a very important table and brought multiple warehouses to a stand-still. The decision makers weren't interested in fixing the bug, so after months of waking up at 2am to kill these locks, my coworker and I wrote a script which monitored for locks on this table from SQL with a certain pattern and killed any lock that persisted for longer than N seconds, then sent an email to anyone and everyone. This really messed with the integrity of the data in the system, but the decision makers loved the decrease in downtime and it stayed in place for a year before the bug was finally fixed.
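The monitoring half of that script boils down to remembering when each lock was first seen and flagging the ones older than N seconds. A Python sketch of just that bookkeeping, under stated assumptions: lock observations come from some periodic query against the database's lock views (not shown; the RedPrairie-specific details are unknown), session IDs and the table name are hypothetical, and the kill and email steps are left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LockObservation:
    session_id: int  # hypothetical: whatever ID the database reports for the lock holder
    table: str


class StaleLockTracker:
    """Flag sessions that have held a lock on one table for longer than max_age_s."""

    def __init__(self, table: str, max_age_s: float):
        self.table = table
        self.max_age_s = max_age_s
        self.first_seen = {}  # session_id -> time the lock was first observed

    def update(self, observed: list[LockObservation], now: float) -> list[int]:
        current = {o.session_id for o in observed if o.table == self.table}
        # Forget sessions whose locks have since been released.
        self.first_seen = {sid: t for sid, t in self.first_seen.items() if sid in current}
        for sid in current:
            self.first_seen.setdefault(sid, now)
        return sorted(sid for sid, t in self.first_seen.items() if now - t > self.max_age_s)
```

Driving `update()` from a polling loop with `time.monotonic()` as `now` keeps the ages correct even if the wall clock is adjusted.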


Wow that name brings back memories. Worked for a company years ago that moved to RedPrairie from an older system and was not happy about the "upgrade". This was around the same time they changed their interface for customer service to enter orders and customer information (to BlueMartini- same company that owns RP maybe?), which was also painful (old-IE-only, bug-ridden, etc)


Mid-90s, a very large cell phone company at the time, working on a car phone (the head units had their own microcontrollers).... Discovered that the interrupt service routine was calling the reset vector instead of returning from the keypad interrupt. The original author apparently didn't understand how it was supposed to work, and a simple RTS would eventually overflow the stack....so on every keypad interrupt the whole code flow started over from reset. Worked surprisingly well.


Circa 2006, I saw some crazy ass undetectable worm Windows shit on a production Oracle database server that was insanely connected directly to the internet for a couple of years. Winternals, Symantec and Microsoft folks couldn't find a forensic smoking gun for what was there, but it fit the profile of an advanced persistent threat (APT) that was aggressive enough to be detectable by NIDS, yet not by HIDS, not even by a HIDS scan from a clean boot. The only solution would've been to reimage the machine, but of course that wasn't allowed, so it was just locked down brutally and left to do whatever.


I had a piece of legacy, proprietary software that communicated to another machine by wvdial over an SSH TTY (?!). It inexplicably stopped working when migrated to a newer machine, but.... worked just fine while running under strace. It appeared to be some kind of timing/race condition brought on by the new machine being faster to bring up a connection. strace slowed it down JUST enough to work.

So naturally, it now runs strace >/dev/null in production, probably to this day.


CTO had to make a deadline for the next morning, so he decided to override the safety system of an Autonomous Guided Vehicle of a few tons. Just bridged over it electrically.

Not only this, but he failed to inform the onsite crew that was going to put it in production.

The AGV went somewhere it wasn't supposed to go, and the first employee pushed the red safety stop. The AGV kept on moving, and thank god there was another person at the electronics panel to make it stop.

I was outraged about this, said that someone could have died, and that it could have meant jail time for the CTO. They said I was overreacting, and carried on with their latest project, which involved an AGV carrying people in an amusement park. I quit after that.


Now I am curious! What kind of deadline requires you to override the safety system of an Autonomous Guided Vehicle?


Cashflow. The customer paid part of the price at the "delivery" milestone. This meant the thing had to drive a bit on-site.

The problem was that it didn't want to move because the safety system was wrongly interfaced. So the solution was quick.


I read this as "The CTO was late for a meeting, and needed to travel across campus really fast"


Gas boiler in an old building with a safety mechanism that would trip and latch/lockout sometimes in high wind conditions, requiring a manual power cycle to restore heat.

Solution: Arduino and a relay in-line to cut the power for 1 minute every 120 minutes.

Planned to add a DS1820 to only cut power when required, but never got around to it.


A website that was initially written in VB6 on the back end (I know) had transitioned to C# for new development. So new functionality would be written in C#, but all the old stuff had to still work. On a page built with the new framework, the bulk of the page was going to be C#, but the "chrome" around it- the header, footer, page nav- needed to be generated with the old VB code. It turned out to be nearly impossible to begin in VB6 and call out to C# to generate the page contents, so the solution was to

1. Generate the page chrome
2. Post that HTML into the database
3. Call the new framework
4. Pull the page chrome out of the database
5. Render the page

The worst piece of the VB6 code I ever had the misfortune of working on was an implementation of a questionnaire, which would guide you through several pages of questions and keep track of what the state of the questionnaire was. A particular sequence of actions could crash it consistently, but the code was so incredibly stateful that it was impossible to work out what was going on. To add insult to injury, it didn't happen when debugging, and the VB6 debugger left a lot to be desired anyway.


After more than a year at a small company the lead developer decided to take some vacation. He revealed to me that at one point an important customer requested a daily summary feature for their account. I was horrified to find out that since there were some tricky bits he didn't actually implement the feature. Instead he signed on every night right at 9pm and hyped up the summary. I refused to take over and helped implement the actual automatic summary feature.


> I refused to take over and helped implement the actual automatic summary feature.

What do you mean you refused to take over? You refused to continue “hyping” the undeveloped feature?


"Hyping" was an auto correct typo, I didn't notice until it was too late. I meant to type "typing". My coworker was typing up a fake automated email every night as close to 9pm as possible. I didn't want to do that, since it is invasive to my schedule and would take place on my own time. Instead, we solved the problems and implemented the actual feature.


Oh boy.

I had just joined a “subscription based social site” as a backend developer. I think it was PHP 4 and MySQL 3 at the time.

Someone had realized that it would be great to have database “migrations” in code instead of using phpMyAdmin to modify tables.

Thumbs up, forward thinker!

So there was ONE BIG migrations.php file that had our DDL in it, with if statements around each statement to check whether it had already been run.

The statements didn’t have a version number or anything super simple and trivial. It was a combination of all the worst ways you could figure out if a column had been added to a table in MySQL.

When code was deployed, someone would visit http://example.com/migrations.php real quick to migrate the site.

Oh man, classic.
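For contrast, the "version number" approach mentioned above can be sketched in a few lines. This uses Python's sqlite3 and SQLite's `PRAGMA user_version` as the stored version purely for illustration (the original stack was PHP and MySQL, where a one-row version table plays the same role); the table and column names are made up.

```python
import sqlite3

# Ordered DDL steps; a step's position in the list is the schema version it produces.
# Table and column names are hypothetical.
MIGRATIONS = [
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)",
    "ALTER TABLE users ADD COLUMN subscribed_until TEXT",
    "CREATE INDEX idx_users_email ON users (email)",
]


def migrate(conn: sqlite3.Connection) -> int:
    """Apply every migration newer than the stored version; return the new version.

    Expects a connection opened with isolation_level=None, so the explicit
    BEGIN/COMMIT below control the transactions.
    """
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, ddl in enumerate(MIGRATIONS[current:], start=current + 1):
        conn.execute("BEGIN")
        try:
            conn.execute(ddl)
            # PRAGMA values can't be bound as parameters; version is a trusted int.
            conn.execute(f"PRAGMA user_version = {version}")
        except sqlite3.Error:
            conn.execute("ROLLBACK")
            raise
        conn.execute("COMMIT")
    return conn.execute("PRAGMA user_version").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", isolation_level=None)
    print(migrate(conn))  # 3
    print(migrate(conn))  # still 3: nothing left to apply
```

A deploy script can call `migrate()` on every release, which is idempotent by construction instead of relying on someone visiting a URL.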


Not claiming "weirdest" by reading the other comments - but this is one off the top of my head.

Application A is multi-threaded, huge, hardly maintained. It also constantly crashes. Would've probably taken a few ace C++ programmers quite a while to iron them all out.

Solution? Write a custom watch-dog application that restarts it once it crashed. (This was on Windows, and a while ago. Being a *nix guy myself, I've no clue if something -available- like daemontools could've been used back then).

Let's say I didn't know whether to facepalm or stand in awe.
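Today this is usually a job for a service manager (systemd's `Restart=on-failure`, daemontools, or a Windows service's recovery actions). A minimal Python sketch of what such a hand-rolled watchdog does; the restart cap is an added assumption so a crash loop eventually gives up, and `supervise` is a hypothetical name.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=5, backoff_s=1.0):
    """Run `cmd`, restarting it whenever it exits with a non-zero status.

    Stops on a clean exit (status 0) or after `max_restarts` restarts,
    and returns the number of restarts performed.
    """
    restarts = 0
    while True:
        result = subprocess.run(cmd)
        if result.returncode == 0 or restarts >= max_restarts:
            return restarts
        restarts += 1
        time.sleep(backoff_s)  # brief pause so a tight crash loop doesn't spin


if __name__ == "__main__":
    crashy = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise(crashy, max_restarts=3, backoff_s=0))  # 3
```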


Sounds ridiculous until you realise they probably weren't given the proper time to ever fix it. What's better at that point? Restart it until it doesn't crash at a significant performance penalty, or just leave it broken?


My dad put a rubber band on an IBM mainframe printer in '75 to make it work until IBM showed up to fix it. They came in and didn't get it to work. So my dad, who needed to print millions of water bills on time, put the rubber band back in and ran his job.


Glad to see confirmation of using a rubber band to fix an IBM mainframe. Been there done that, but the story is so loony I wondered if anyone would believe me. http://laughtonelectronics.com/oldsite/comm_mfg/commercial_i...


Love that you've updated your page with a reference to this discussion already :-)


I was delighted at being able to cite an independent source saying such a thing was possible!

In the 2016 HN thread "Strange bug workarounds" I posted a much gnarlier problem (and oddball solution): https://news.ycombinator.com/item?id=12485921


Read it yesterday as I was reading your previous posts :-)

(I often check the posting history when someone posts something interesting.)


My dad had a similar thing with his car. An old jalopy, I don't remember exactly what was wrong, something slipped out of place in relation to the engine, miles away from any support.

He happened to have clamps in the back, put a clamp on it and a few rubber bands for good measure. When he finally got to the garage they looked at it, laughed, then suggested that he keep it that way, given that the parts for the car were rare and the fix was reasonably expensive. I believe that he ended up driving on that for a couple of years until the car broke down from an unrelated part failure.


This reminds me of a car hack I once had to do:

https://www.reddit.com/r/Cartalk/comments/10cwhb/til_how_to_...


Email validation by inserting a record into the database and rolling back to a savepoint if the insert succeeded. Salesforce does not publish its official validation code/regex, and the architects pushed for matching the platform's behavior perfectly. The project was done in 2017, btw.
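The trick is to let the database's own validation run and then undo the write. A minimal sketch of the savepoint pattern using Python's sqlite3; a crude SQLite CHECK constraint stands in for Salesforce's unpublished rule, and the `contacts` table, `make_db`, and `is_accepted` are hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # isolation_level=None: we issue SAVEPOINT/ROLLBACK ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    # Stand-in validation rule: something@something.something
    conn.execute(
        "CREATE TABLE contacts (email TEXT NOT NULL CHECK (email LIKE '_%@_%._%'))"
    )
    return conn


def is_accepted(conn, email):
    """Try inserting `email`, report whether the database accepted it, then undo it."""
    conn.execute("SAVEPOINT probe")
    try:
        conn.execute("INSERT INTO contacts (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        # Undo the trial insert (if any) and close the savepoint either way.
        conn.execute("ROLLBACK TO probe")
        conn.execute("RELEASE probe")


if __name__ == "__main__":
    conn = make_db()
    print(is_accepted(conn, "a@b.co"), is_accepted(conn, "not-an-email"))  # True False
```

Because every probe is rolled back, the table never keeps the test rows; the cost is a real database round trip per validation.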


Just recently I had to put together a web-based UI for editing vector graphics (clothing tags) and export them to pdf for printing. In order to avoid writing and maintaining two separate codebases, I re-used the client side code as my rendering code.

So I ended up with a server serving the editor frontend, as well as api endpoints that would use puppeteer and headless chrome to make a request (to itself) to load the frontend, import a given label file, render it, take a screenshot and save it to a file, then reply to the api request with the contents. So kind of a recursive api.

I had so much fun with this but I still can't believe I had to write my own svg editor and bake my own pdf generation and merging code to get the job done. It should not require two languages (node and python) and a headless browser to get the job done.


I've literally just helped someone do something similar in their own product. If you run into issues with puppeteer screenshots having blank or missing pages (for high-res images), just forgo that for the canvas's base64 export. It'll save you a world of hurt!


Ha. My team implemented something very similar. We were writing a browser-based animation editor, and we needed to render a 'film strip' of frames. But we were targeting tablets, and loading a new canvas filled with heavyweight vector assets for each frame took forever. Since the animations were auto-saved to the server, it proved quicker just to spin up the app in a headless browser on the server, render all the frames, and have the client download them!


wut. why did you do this? why didn't you just handle all of it in the frontend? jspdf makes PDFs in browser.


I would have loved to, but one of the sticking points was that the label file had to be serialized, saved, and later interpolated with data from a bunch of different records, which is why I used svg over canvas, wrote my own editor, etc. jsPDF seems great for imperatively generating a pdf, but not for editing/loading template files. Plus, since the pdf generation had to happen server side, I would have needed a headless browser anyway.

It's super weird, and I've been over it from every direction but I don't know how I could have done it differently.


>Plus, since the pdf generation had to happen server side

this is exactly my question: why did the pdf generation need to be server side?


1. Client creates a custom label and we store it in s3
2. They select x inventory items + click print
3. We retrieve and send the label file + inventory data to a service that interpolates the data into each template, renders to pdf, merges them, and forwards it to our printing service.

We certainly could do the work on the client, but they're making this request in a context where neither the label nor the libraries need to be loaded; in fact, in our server-side implementation, it could be done with a single api call. Doing it all client side would strongly couple the client to the technology.

Also, we re-used this api in two applications, one of which never loaded the editor (along with its dependencies).

TL;DR: a weird hack was better than violating separation of concerns.


>forwards it to our printing service.

is the printing service a real physical service?

anyway i'm doing basically exactly this same thing except all client side (i send the serialized "label file" and data to the client) but printing is being done using the user's printer


Yep, the printing service is a third-party api (PrintNode) that routes the request to one of any number of desktop clients.

The reason we did it this way is 1. we have a web app so we can't print directly to their printer without a print dialog, and 2. we want to print to potentially a different device than the user is on.


I worked at an international oil and gas company that put a high price on security in the early 2000's.

I was hired because the regular firewall/security sysadmins resisted installing tooling the director wanted that would allow them to be effectively 'monitored' doing their work on the firewalls distributed worldwide. In particular the director wanted to use Tripwire to alert when files were changed on the firewalls. He had tried to push this through 3 times before and each time it was rebuffed/scrapped one way or another.

As I went through the testing phase I took careful note of the security issues I found. When all was done I had 2 big holes I could not easily close. The first one was that from the management server (a simple Java app) you could click file/open and using the explorer window you could 'run' explorer.exe thus opening the Windows shell/GUI (as well as run command.com, notepad.exe). I closed all these with file permissions and other settings.

The final one was much harder though. You could, using the same file/open explorer window, open the log files of the GUI (with a notepad like functionality), and alter and save the logs again without notification (non-repudiation violation). The user account had to be able to write to that log for the entries to be created.

My solution was to create a long-running script in the background that would cat and empty all of the log entries every 2 seconds to another file location further limited by tight permissions only to the script account.

I had this deployed in production for over a year, and to my surprise it never stopped running or (what I thought more likely) overflowed and/or locked up the system.

My worst kludge...
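A minimal Python sketch of that drain step, assuming plain files; the file names and `drain_log` are hypothetical, and the tight permissions on the destination would be set at the OS level, outside this code. It also shows why it is a kludge: there is an unavoidable window between reading and truncating.

```python
import os
import tempfile
import time


def drain_log(src_path, dst_path):
    """Append the contents of src_path to dst_path, then empty src_path.

    Returns the number of bytes moved. Anything written to src_path between
    the read and the truncate is lost.
    """
    with open(src_path, "r+b") as src:
        data = src.read()
        if data:
            with open(dst_path, "ab") as dst:
                dst.write(data)
            src.seek(0)
            src.truncate()
    return len(data)


def run_forever(src_path, dst_path, interval_s=2.0):
    # The comment describes repeating the drain every 2 seconds in the background.
    while True:
        drain_log(src_path, dst_path)
        time.sleep(interval_s)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "gui.log")
    dst = os.path.join(workdir, "protected.log")
    with open(src, "wb") as f:
        f.write(b"login ok\n")
    print(drain_log(src, dst))
```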


The entire billing system at my first job was written as a console exe by a (brilliant) guy in a one-day session, working almost entirely from live debugging breakpoints (against the production database). The result was less than beautiful, but there were rarely any problems with it. Needless to say, any time a billing inquiry came up it was his problem (which he was okay with, since he owned the place).


Haha, literally writing a multicurrency accounting system from scratch at the moment. Same deal: my company, my rules. Trying to be a bit forward-looking though[0], and already tonnes of features you can't buy. Live third party market platform scraping, machine translation, github integration, beginnings of live mainland Chinese bank API integration, etc.

[0] https://github.com/globalcitizen/ifex-protocol/


A function called dig(key) that iterated over a map to find the value. It even had a comment above it that complained about the slowness of the function, and noted that a C version was available to improve the performance. Facepalm.
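The language isn't named, so here is the difference in Python terms, with a dict standing in for the map; `dig_linear` mimics the described function, and both names are illustrative.

```python
def dig_linear(mapping, key, default=None):
    # What the described dig() did: scan every entry until the key matches, O(n).
    for k, v in mapping.items():
        if k == key:
            return v
    return default


def dig(mapping, key, default=None):
    # Use the map as a map: a hash lookup, O(1) on average.
    return mapping.get(key, default)


if __name__ == "__main__":
    prices = {"apple": 3, "pear": 5}
    print(dig_linear(prices, "pear"), dig(prices, "pear"))  # 5 5
```

No C port needed: the lookup the map already provides avoids the scan entirely.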


Couldn't they just....use the map as a map? They probably didn't know how to use it.


Running a 3rd-party program called PTFB (press the fckn button) as a background process on a farm of 20 prod web servers, to automatically press "OK" on a dialog window that would pop up as a result of an unhandled exception in one of our hacky conversion processes that the company didn't allow us to fix.


The one I removed a ways back that really irked me was a poor man's backup circuit: Equipment room had two electrical circuits. A consumer grade desk UPS was plugged into one, and a homemade male to male power cord went from the output of the UPS to a wall outlet for the other circuit. So if the power went out, the UPS started feeding power into the other circuit.


Memory management under PL/I.

It was a highly optimised sorting algorithm for client names, and used everywhere. It occasionally dropped a name, and I was asked to fix it.

PL/I allows you to free part of allocated memory. In this case, two names would be pulled out of the array, that part of the array would be freed, and turned into a new array instance. Pair got sorted and reinserted. Repeat until array becomes array of single arrays, then bitshift edge of array allocations to recreate array in place.


Had a server that was overheating badly, and the numerous fans just weren't working. Needed to keep it running to transfer all the data to a newer machine. Went to Walmart and bought flexible dryer ducting and duct tape. Vented the AC directly into the case. Got everything transferred without having to pull the backup (which is slow). Never did figure out the fan thing, but something must have gone wrong on the motherboard, because all the fans were good.


I had a similar set up for a computer I overclocked to play games as a kid.


Production system that would chain-ssh through various bastion hosts to get around asinine firewall systems, eventually telnetting in as root to a non-standard port to run a script and dump data out of the files on disk of a MySQL instance backing a virtualization system. Billed based on how many of the output files that instance appeared in ("hourly billing!"), and who was the "owner" of that instance.


A long time ago (early 1990's, IIRC), I came across a Novell Netware LAN in which none of the Mac OS clients could print to a then-very-costly laser printer shared on the network if a particular Windows 3.0 client elsewhere on the LAN was powered off.

It turns out that when the costly printer was purchased, there wasn't a Mac OS driver available for it, so the people who installed the LAN created their own homemade print queue management system, relying on a batch file running permanently on that one Windows client. The batch file would take .eps files, saved to a shared folder by the Mac clients, and send them to the network printer using Windows printer drivers. Whenever that Windows client froze, was reset, or was powered off, printing to that one printer would stop working for all the Macs at the same time.


That’s how most RIPs work.


The batch file was running on a Windows client -- i.e., on someone else's desktop machine, and the user of that machine was apparently unaware.


Once we had a problem with Microsoft's LDAP library not handling referrals correctly on a Big Corp AD forest with a domain at each site. A real headache, as we were already late for our deadline...

Backstory: we produced custom software that used Windows Embedded's LDAP library to handle the LDAP part (the Winldap32 library with the Winldap.h headers). The machine running our software didn't join the domain, so it only authenticated users with the ldap_bind function.

If I recall correctly, we found the ldap32 library referral problem when we used AdInsight (by Mark Russinovich) and saw the library was poking all around the place (the other forest DCs) and never completed any of the requests. I think we confirmed with Wireshark.

The hack was in 2 parts:

1) We made a DLL that offered the same ldap_* functions as those we used in our software. The library then redirected the LDAP calls to a python script that used a native ASN1/LDAP implementation which relied on nothing but pure python code.

2) Then we wrote an injector that injected the DLL into our software at startup and replaced the Winldap32 functions with our DLL's functions.

We were then able to bypass the MSFT LDAP library problem, and I think we came pretty close to our initial deadline in the end. Apart from the (very small) added latency on LDAP calls, everything was fine.


A company I worked for in the early 2000's produced a reality tv series which was shot on an island about 1.5km out from the city shore. It needed a decent (but rather temporary) Internet connection and our company headquarters happened to be near by, so my colleagues set up a directional WiFi antenna pointed from the building's rooftop to the island. The total distance of the line-of-sight WiFi connection was about 2.5km and apparently it worked fine when the weather was okay.


Now you need to match your experience with the post above about the van stopping in the link's line of sight and tell us:
- what WiFi you had, and
- what the bandwidth was.

We need to solve the riddle if it was wifi or laser link.


Really can't remember many details, but I suppose at the time all WiFi equipment in Europe was based on 802.11b (max 11 Mbit/s).


2.5km seems ridiculously far for WiFi to be effective.

Does 802.11b support this magnitude of distance?


Qemu emulating Ultrix at AWS to run software from the early 90s for dialing out to weather stations. No source code, so could not be ported. Modems were local on site and accessed via TCP tunnels over an OpenVPN tunnel from AWS.


Not the weirdest, but weird nonetheless: https://github.com/ckeditor/ckeditor-dev/blob/major/core/too...

Interestingly enough as recently as in 2016 none of the browsers' devtools were prepared for such a usage and would hang if you set a breakpoint inside any of the "tried" functions.


Here's the weirdest hack that no one thought of as a weird hack:

I worked in a vehicle assembly plant from the mid to late 2000s, and our real-time monitoring was performed with a commercial off the shelf system called Cimplicity. Cimplicity was an excellent systems integrator... for the 1990s.

Cimplicity's main problem was that it wasn't enterprise scale; updating the content of screen objects was laborious.

Someone's solution was to add a database call to every object on the screen - some screens had about 100 objects.

So on each and every screen startup, on each and every computer, about 100 database calls were made.

---

I guess that wasn't too hacky, but when absolutely everything is half hacked together like that, it's hard to tell what is a hack and what is normal.


Using a Remote Access Object to proxy calls between 32-bit and 64-bit code. Instead of rewriting any of the original ASP code, the lead architect made a DLL that would provide .NET 2.0 functionality to the ASP code. This was in 2014. The DLL was 32-bit and predictably ran out of memory every 30 hours. We created a scheduled task to restart the service every 24 hours, what a band-aid.


Tons of asp (and asp.net) sites are restarted on a regular basis to prevent crashes. The source of most of these problems are never known.


Actually, IIS automatically restarts the worker process once a day by default, meaning most developers are unaware of memory/resource leaks that don't end up causing a crash within 24 hours. I got called in to look at quite a few "random" issues that turned out to be resource leaks that only showed up when the site was under enough stress to hit the brick wall within the 24-hour recycle period.


The default IIS app pool elapsed time based recycle interval is 1740min, or 29hrs. 29 being the smallest prime over 24. Seriously.


Yup, and for good reason. It means your site won’t be down at the same time two days in a row.
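The arithmetic behind that: 1740 minutes is 29 hours, which is 5 hours more than a whole day, so each recycle lands 5 hours later in the day than the previous one. Because 29 and 24 share no common factor, the recycle time visits every hour of the day before repeating. A quick Python check (the function name is illustrative):

```python
def recycle_hours(interval_hours, count, start_hour=0):
    """Hour of day (0-23) of each successive recycle, starting at start_hour."""
    return [(start_hour + k * interval_hours) % 24 for k in range(count)]


if __name__ == "__main__":
    print(recycle_hours(29, 6))  # [0, 5, 10, 15, 20, 1]
    print(recycle_hours(24, 3))  # [0, 0, 0]: a 24 h interval always hits the same hour
    # With 29 h, 24 consecutive recycles cover all 24 hours exactly once.
    print(sorted(recycle_hours(29, 24)) == list(range(24)))  # True
```

What matters is that the interval is coprime with 24; 29 is simply the first prime above 24, and a non-prime such as 25 would also cycle through every hour.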


Maybe not the weirdest, but:

I was building the client library (wrapper) for an old external third-party service we were running, for our new system (a rewrite).

By the way, this old third-party service supposedly has a REST interface, which was added later when REST was getting popular. I needed to get all ABCs but couldn't figure out how, because the docs had no `GET /abc` for ABC, only `GET /abc/<id>`. So I just checked how it was done in the legacy code.

Then I found a for loop from 0 to 100 that was doing 100 `GET /abc/<index>` requests. It worked because the ids were incremental and there were fewer than 50 ABC records. Unfortunately there was no caching, and no `break;` after a `404`. :(
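A less wasteful version of that loop, sketched in Python under the same assumption that ids are roughly sequential: stop after several consecutive 404s rather than at a hard-coded 100, so a single gap from a deleted record doesn't end the scan early. `get_abc` is a hypothetical callable that returns None on a 404; the HTTP client and URL are left out.

```python
from typing import Callable, List, Optional


def fetch_all(
    get_abc: Callable[[int], Optional[dict]],
    start: int = 0,
    max_consecutive_misses: int = 5,
) -> List[dict]:
    """Fetch records by increasing id until `max_consecutive_misses` 404s in a row."""
    records = []
    misses = 0
    record_id = start
    while misses < max_consecutive_misses:
        record = get_abc(record_id)
        if record is None:  # None stands for the service's 404
            misses += 1
        else:
            misses = 0
            records.append(record)
        record_id += 1
    return records
```

A listing endpoint, if the service ever grows one, would still be preferable to probing ids.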


I did some work for a major Australian Federal government department. They had a very secure environment and couldn’t (well, wouldn’t) allow webservices or anything else of that nature between themselves and other government departments. However, they did allow for email.

Luckily the ITSM system I had could take incoming and outgoing email. They had a stored procedure "hook" for email. I reimplemented the SLA functionality by inserting the relevant data into the database and calculating the SLAs via the procedure.


Email: the original asynchronous work queue!


And it's comparatively one of the simplest and most reliable transports for business data exchange.

In my first professional job, we used PKI-encrypted email as a transport for ebXML messages (orders, despatch notifications, etc.) between our customers' ERP systems and their suppliers. And in my most recent role, we received retail transaction data and returned analytics reports via email.

The great thing about protocols like this is that virtually all enterprise systems support them in one form or another. And it's always been easier to get clients to set up data exchange via automated emails than via SOAP or REST APIs.


I had a fairly large client that literally had an entire hacked-together "CMS" they refused to move away from because it was a memorial to their late lead dev.


This was probably not considered a hack when it was implemented but at some point in the early 2000s someone deployed two binary programs on an old Compaq rack server running RedHat Linux of some version.

I think they're called lpr and lpd. But I've tried to find their source and it's not in any major printer packages. By running strings and hexdump on them I think my co-worker finally managed to trace it to a package of software made for a firewall. And embedded in there were these two programs for printouts.

So that became widely used in a major government branch that will go unnamed.

Fast forward to 2013 and it's my job to upgrade it. Replace its hardware and its OS.

Luckily the two binaries could be run from RHEL 6 but it was all on faith. I had no way to replace them or upgrade them because I had no idea what they did, what protocol they spoke.

So they're running to this day. I'm sure someone here who is more versed in printing might tell me that they implement some standard protocol that can be replaced by CUPS or something well known. I'd be thankful but tbh they work so they won't be replaced in my time with these systems.



This is an amazing troll. Bravo!


Not an actual bug, but I found this a couple of days ago, and it seems like an elegant hack that allows `/` in directory names, file names, etc.

Typing `mkdir foo:bar` in my macOS's terminal leads to a directory, that Finder shows as `foo/bar`.


Classic Mac OS (<10) used : as a directory separator, so it's probably related to that.


Good luck adding that directory to your PATH :)


Such directories are non-portable and can’t be used in $PATH on any Unix, nor any other path-like mechanism like $DYLD_LIBRARY_PATH.


This wasn't production, exactly, but it was - in my own humble opinion - an awesome hack, so I'm going to post it.

Was the lead software and computer engineer on a new robot. We had decided to use compact PCI, which was at that time (late 1990s) a brand new form factor, so a lot of specialized cards weren't available for it yet. But that was OK, because the manufacturers were selling bridge cards that adapted smaller form factor cards to the compact PCI standard.

One of the cards we needed was a motor driver card. The particular card that was available to us used an LM629, a very old, widely used, digital servo controller chip. Now, whichever guy or girl designed the LM629 was an anal bastard. The chip had memory-mapped read-only and write-only registers, which was fine. But if you ever read or wrote anything out of order, or tried to read a write-only register or write to a read-only register, the chip would drop into an error state. So the device driver had to be right on the nose, the chip wasn't going to cut you any slack.

Because the cPCI standard was so new, I ended up writing the device drivers myself. Which was OK, I had just graduated with a CS degree, knew metal-level C pretty well, things should have been fine.

Except I couldn't get this motor driver card to work. Every time I tried to command the motor, the chip would generate an error. I stripped the code down more and more and more, to the point that I was only sending a single 8 bit write and then a single 8 bit read, and still the chip generated an error.

After two weeks of banging my head against it I was at the end of my rope. The manufacturer of the LM629 card took pity on us and let us bundle the robot up and bring it to their site, which was in Minneapolis. They gave us a small lab, where we proceeded to bang our heads for another three days with no progress. Eventually he took more pity on us and assigned us his digital logic expert for the afternoon.

Dude rolled in the most badass digital logic analyzer setup I had ever seen at that point. The thing took up a full-sized equipment rack. He hooked up to our board and asked me to issue an 8-bit write. That looked fine. Then an 8-bit read. He raised his eyebrows at that one - asked me if I was sure my code was correct, as he had seen a 16 bit read. The top byte of which was mapped to chip memory space we weren't supposed to be tickling.

I told him I was 100% positive that the code was not asking for a 16 bit read.

Eventually we ended up on the phone with Intel, which made the bridge card chip. They told us that their chip had a known bug; it couldn't translate an 8 bit read on the cPCI bus to an 8 bit read on the daughter card bus. Instead, it issued a 16 bit read and threw the most significant byte away. "This is documented behavior," they said. And sure enough it was, in a footnote in 10 point font on page 52 of the manual.

That left us in a quandary, since there didn't seem to be any way to fix things. Then we had a brainstorm. The fix was to cut the address lines on the cPCI side of the daughter card, shift them all one line to the left, and tie the least significant address pin of the bridge chip to the most significant address pin of the LM629 address decoder logic. That way any attempt to access an odd address got mapped into nullspace, memory space somewhere way above what the chip actually had, and the decoder logic just rejected the access request. The chip would never see it. Then we rewrote the code to make the new addresses line up with the chip's newly re-jiggered address space.

Worked like a charm. We treated ourselves to a high-end steak house that night and flew home. As far as I know the robot worked in that fashion for close to a decade, until they retired it.
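A toy Python model of that rewiring, to make the mapping concrete. The sizes (a 4-bit decoder address space, 4 real byte registers) are invented for illustration and are not the LM629's actual register map: bridge A1 and up drive the chip's A0 and up, and bridge A0 drives the decoder's top bit, so every odd bridge address decodes to an address the chip doesn't have.

```python
DECODER_BITS = 4  # hypothetical width of the daughter card's address decoder
CHIP_BYTES = 4    # hypothetical number of byte addresses the chip really answers


def rewired(bridge_addr):
    """Chip-side address produced by the modified wiring for a bridge-side byte address."""
    shifted = bridge_addr >> 1                     # bridge A1.. -> chip A0..
    top = (bridge_addr & 1) << (DECODER_BITS - 1)  # bridge A0 -> decoder MSB
    return (shifted | top) & ((1 << DECODER_BITS) - 1)


def decoder_accepts(chip_addr):
    return chip_addr < CHIP_BYTES


def sixteen_bit_read(bridge_addr):
    """The bridge's widened read touches two consecutive byte addresses."""
    return [(a, decoder_accepts(rewired(a))) for a in (bridge_addr, bridge_addr + 1)]


if __name__ == "__main__":
    # The driver uses even bridge address 2*k for chip register k.
    print(sixteen_bit_read(2 * 3))  # [(6, True), (7, False)]
```

The extra byte of every widened read lands in rejected space, which is why the chip never saw an out-of-order access.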


I know an ISP in Poland that used (or still uses) 169.254.0.0/16 for its whole client network. This way, any time a client doesn't get an IP address from DHCP (for example, because the DHCP server is down), it still ends up with a working IP address after a slightly longer period of time :)


At Amazon it was common to just restart the servers every 100k requests to deal with unsolved memory leaks.


I worked on games for the original PlayStation, and one of the requirements for release was that your title would be able to run for 48 hours without crashing (the soak test). A big issue we ran into as developers was that we only had 2 MB of memory, and unlike with cartridges, you had to load all your data into RAM. Since we dynamically allocated memory, you would quickly run into memory fragmentation issues. If I have 2 MB of memory, I could have 1.5 MB free but be unable to allocate a 750 KB block, because it could be laid out as 500 KB free, a small block of used memory, another 500 KB, another small used block, and then the rest of memory.

If you end up in this position, there aren't a huge number of good options, and several engineers started thinking about how to carefully allocate memory so that we wouldn't end up there. Instead, I found a way to fix it with a scorched-earth policy. I could soft reboot the PlayStation, which resets it but avoids the Sony logo at startup. I would write the current player data into a special save game, then do the soft reboot, which would relaunch the game. The first thing the game would do is look for the special save and use it to skip the various menus and the front end and just drop you into the game. A simple soft reboot per level increased our load times by a second or two, but safely reset our memory.

Years later, after talking with other programmers, it turned out that many people had discovered this trick and used it to get through the soak test.

As a side note, memory fragmentation was a huge problem back then, one that is largely invisible in most software engineering today. To give you an idea of how complex the solutions were, check out how Naughty Dog solved this on Crash Bandicoot: https://news.ycombinator.com/item?id=9737156
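
The fragmentation arithmetic from the PlayStation story is easier to see with a toy model. This is a minimal Python sketch, assuming an allocator that needs one contiguous free block per request. The block sizes are hypothetical, chosen only to match the 2 MB total and 1.5 MB free figures, and `reset_heap` stands in for the soft-reboot trick:

```python
from typing import List, Tuple

# Each block is (size_in_kb, is_used); hypothetical snapshot of a 2 MB heap.
Heap = List[Tuple[int, bool]]

heap: Heap = [
    (500, False),  # free
    (16, True),    # small used block
    (500, False),  # free
    (16, True),    # small used block
    (480, True),   # long-lived game data (assumed)
    (536, False),  # free tail
]

def total_free(h: Heap) -> int:
    """Sum of all free blocks, i.e. what a naive 'free memory' counter reports."""
    return sum(size for size, used in h if not used)

def largest_free(h: Heap) -> int:
    """Size of the biggest single contiguous free block."""
    return max((size for size, used in h if not used), default=0)

def can_allocate(h: Heap, request_kb: int) -> bool:
    """A contiguous allocation succeeds only if one free block is big enough."""
    return largest_free(h) >= request_kb

def reset_heap(total_kb: int = 2048) -> Heap:
    """Scorched earth: after a reboot the whole heap is one free block again."""
    return [(total_kb, False)]

print(total_free(heap), largest_free(heap))  # 1536 536
print(can_allocate(heap, 750))               # False
print(can_allocate(reset_heap(), 750))       # True
```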


We were based in Brazil, moving some applications to RHEL servers located in the US. My junior sysadmin, logged in as root, accidentally removed some libraries from /lib (I don't remember exactly which), and we would have been locked out of the servers as soon as we closed the SSH shell we already had open. No scp, no FTP, no file transfer of any kind was possible (because of the missing libraries on the destination server). We would have had to call the admins at the remote site to clean up the mess, but I decided to try something: I base64-encoded the correct binaries on a local server (same distro) and copy/pasted them through the terminal to the remote server. Luckily, I had a statically compiled busybox on the remote server, so base64 tooling was available... and then our server was accepting connections again, and the other admins never knew it had happened.
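
The copy/paste transfer only needs a text-safe encoding that survives a terminal, plus a way to check the result. Here is a minimal Python sketch of the local (encoding) side, with hypothetical helper names. On the remote side the decode would be done with something like busybox's `base64 -d`, and the checksum compared with `sha256sum`:

```python
import base64
import hashlib

def encode_for_paste(data: bytes) -> str:
    """Base64-encode binary data as 76-character lines that paste cleanly."""
    return base64.encodebytes(data).decode("ascii")

def decode_pasted(text: str) -> bytes:
    """Reverse of encode_for_paste; embedded newlines are ignored."""
    return base64.decodebytes(text.encode("ascii"))

def checksum(data: bytes) -> str:
    """SHA-256 digest to confirm the pasted file arrived intact."""
    return hashlib.sha256(data).hexdigest()
```

Verifying the checksum before overwriting anything under /lib matters here, because a single mangled line in the paste would produce a broken library.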


In a private cloud EHC (EMC Hybrid Cloud) implementation, deployed hosts needed a hostname with a domain, like hostname01.example.com, but that version of EHC did not support it (as stupid as it sounds). So the developer put a script called /root/hostmame.sh (yes, with the typo) into all the Linux templates. It was run by cron every minute, and all it did was write hostname01.example.com into the /etc/sysconfig/network file (RHEL/CentOS/Oracle Linux) and invoke the hostname command with hostname01.example.com as its argument.

Every minute, on every host in the private cloud...

... but I have come across many strange problems in various EMC products, so that does not surprise me much.
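
The original was a shell script run from cron (a crontab schedule of `* * * * *`). Here is a rough Python re-creation of what it is described as doing. The function names are hypothetical, and only the file-editing step is pure enough to test:

```python
import re
import subprocess

FQDN = "hostname01.example.com"  # example name from the comment
HOSTNAME_LINE = re.compile(r"^HOSTNAME=.*$", re.MULTILINE)

def set_hostname_line(network_file: str, fqdn: str) -> str:
    """Return /etc/sysconfig/network contents with HOSTNAME= set to fqdn.

    Replaces an existing HOSTNAME= line, or appends one if none exists.
    """
    line = f"HOSTNAME={fqdn}"
    if HOSTNAME_LINE.search(network_file):
        # A function replacement avoids backslash escapes in the new value.
        return HOSTNAME_LINE.sub(lambda _: line, network_file)
    sep = "" if not network_file or network_file.endswith("\n") else "\n"
    return network_file + sep + line + "\n"

def enforce_hostname(fqdn: str, path: str = "/etc/sysconfig/network") -> None:
    """Rewrite the config file, then set the running hostname (needs root)."""
    with open(path) as f:
        content = f.read()
    with open(path, "w") as f:
        f.write(set_hostname_line(content, fqdn))
    subprocess.run(["hostname", fqdn], check=True)
```

Because the edit is idempotent, running it every minute is wasteful but harmless, which is probably why nobody noticed.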


WordPress being used in a regulated software project.


The X Window System (in use in a production system, post-2015)


Is there a complete compatibility layer between X programs and Wayland now? I know quite a lot of people who still use X Windows. It's not that unusual or weird.


Didn't Ubuntu just switch back to Xorg from Wayland?

https://www.omgubuntu.co.uk/2018/01/xorg-will-default-displa...


Making a wrapper function that locked/serialized access to printf. There was a crash that would happen occasionally in a server, always inside printf while it was outputting logs and debug data. While you'd think it was a bug in how we called printf, the underlying issue was that the app wasn't linked against the multithreaded C runtime on Windows, so the printf we were calling was not reentrant (not safe to call from several threads at once).
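
The proper fix is linking against the multithreaded runtime, but the workaround pattern itself is simple: route every output call through one lock so only one thread is ever inside it. The original was C. This is a Python analogue of the same idea, and `locked_print` is a hypothetical name:

```python
import threading

# One process-wide lock shared by every caller of locked_print.
_print_lock = threading.Lock()

def locked_print(*args, **kwargs) -> None:
    """Serialize output: at most one thread is inside print() at any time."""
    with _print_lock:
        print(*args, **kwargs)
```

In C the wrapper would hold a mutex around a `vprintf` call in the same way. It hides the symptom, but any other non-thread-safe runtime function is still unprotected.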



