That fear is good: it means you know you're taking a risk - I'd worry if you had to touch production and didn't feel it.
If you only have one or two production systems/databases, and you're having to do this more than a couple of times each year, then you could certainly benefit from improving your tooling.
But if you've inherited a large number of production systems, each with its own local database instance, then implementing new tooling will be a long process that you might not be able to get Management to agree to.
When I have to touch a production database, I usually use the following steps:
0. Follow your organisation's Change Processes (i.e. if you need to put in a change request and get approval then do that)
1. Snapshot the server itself (VMs are great for this)
2. Backup the database (just be careful not to leave that backup anywhere open, and remember to delete it when you no longer need it) - see the sketch after this list
3. Write the SELECT version of your SQL first. If you need to delete a record from the userSessions table with an ID of 123, then start by writing:
    SELECT * FROM userSessions WHERE id=123;
4. If that shows you just the rows you'd expect, then convert it to the DELETE form of the SQL, making sure that you don't change any part of the WHERE clause:
    DELETE FROM userSessions WHERE id=123;
5. Get someone else, who also understands the database, to check your SQL
6. Run your SQL
7. Do all your sanity checks to make sure that production is working as expected.
8. If it does go wrong, then you have more ammunition to use when trying to convince Management to spend some resources on improving your tooling for these sorts of changes.
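As a minimal sketch of step 2, assuming a PostgreSQL database with pg_dump available on the PATH (the host, database name and backup directory below are hypothetical placeholders, not anything from a real system):

    # Hypothetical sketch of step 2, assuming PostgreSQL and pg_dump.
    # For another DBMS, swap in the equivalent dump tool (e.g. mysqldump).
    import subprocess
    from datetime import datetime
    from pathlib import Path

    backup_dir = Path("/secure/backups")          # somewhere that is NOT world-readable
    backup_dir.mkdir(parents=True, exist_ok=True)

    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    backup_file = backup_dir / f"proddb-pre-change-{stamp}.sql"

    # Take a logical dump before touching anything; fail loudly if it doesn't work.
    subprocess.run(
        ["pg_dump", "--host=db-prod-01", "--dbname=proddb", f"--file={backup_file}"],
        check=True,
    )
    print(f"Backup written to {backup_file} - delete it once the change is verified")

A plain shell one-liner does the same job; the point is to have a restorable copy, and to know where it is, before you run anything destructive.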
I’ve seen teams handle this exactly the same way - snapshots, SELECT first, second pair of eyes, manual checks.
Out of curiosity:
at what point did this stop feeling “rare but scary” and start feeling like “this is happening often enough that the process itself becomes the bottleneck”?
Was it:
– number of production systems
– frequency of fixes
– or cross-team coordination cost?
I’m trying to understand where that line actually is in practice.
My personal opinion is that if a team couldn't write something before AI then they should be very careful about writing it with the help of AI.
For example, a team that couldn't write a new encrypted messaging app on their own gets an AI to write them one. How do they check that the code is actually secure? Writing encryption code is very hard to get correct - in fact most humans can't get it right - and if you don't understand the intricacies of cryptography, you'll never pick up the mistakes the AI makes.
Or the Atari ST! I have one at home with 1 MB of RAM in it and it still flies. It boots up in just a few seconds, which is faster than any of my modern PCs.
It always amazes me how many SaaS solutions don't implement rate limiting, tell us not to worry about it when we ask how hard we can hit their APIs, and then complain that we're hitting their API too hard.
My favourite response from one of our suppliers who hadn't implemented rate limiting was "Please stop, you're making our database cry" :D
You can get light spreader kits for the F91-W/A158W that replace the bit of plastic behind the LCD with one that spreads the light from the LED far more evenly than the stock part.
I recently fitted one on my F91-W and it certainly makes a difference, but it's not going to make the light brighter like some of the other LED mods people have done.
From my point of view, the power of automation for recurring tasks has less to do with time saved and more to do with making sure that the task will get done, and be done the same way, every time.
Bonus tip: log the outputs of automated tasks when they run, but only send out notifications on errors. That way you don't train staff to ignore the task's notifications just because they see one every time the job runs; instead a notification from it is rare, so they know they need to investigate.
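A minimal sketch of that pattern, assuming a cron-style Python wrapper around some scheduled task (the task command, log path and notify hook below are hypothetical placeholders):

    # Hypothetical wrapper for a scheduled task: always log the output,
    # but only send a notification when the task fails.
    import subprocess
    from datetime import datetime
    from pathlib import Path

    LOG_FILE = Path("/var/log/nightly-task.log")   # placeholder path
    TASK = ["/usr/local/bin/nightly-task.sh"]      # placeholder command


    def notify(subject: str, body: str) -> None:
        # Stub - wire this up to email, Slack, a pager, etc.
        print(f"NOTIFY: {subject}\n{body}")


    result = subprocess.run(TASK, capture_output=True, text=True)

    # Log every run, success or failure, so there's always a record to go back to.
    with LOG_FILE.open("a") as log:
        log.write(f"--- {datetime.now().isoformat()} exit={result.returncode} ---\n")
        log.write(result.stdout)
        log.write(result.stderr)

    # Only notify on failure - notifications stay rare, so they stay meaningful.
    if result.returncode != 0:
        notify(
            subject="nightly-task failed",
            body=f"Exit code {result.returncode}; see {LOG_FILE} for details.",
        )

The design choice is that every run is recorded, but only a failure generates a notification, so a notification is always worth reading.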
Strongly agree. Automation is the ideal outcome whenever possible.
What I keep running into is the gray area between "can’t be automated yet" and "shouldn’t be automated". Things like reviews, checks, approvals, or manual verifications.
The notification fatigue point is especially real. If everything notifies, nothing gets attention.
Do you usually treat non-automatable tasks as exceptions, or do you still rely on routines / trust for those?
As a team we use Kanban, so everything being worked on gets a ticket and we walk the Kanban board every morning. If a task is waiting for someone to review it, or is blocked until something else happens, it gets highlighted to the whole team each morning.
Walking the board feels a bit awkward and slow at first, but after a few weeks you find that it takes very little time. It certainly works well for us.
That is the second year of a PhD - in the first year you're distracted by your literature survey of the area you're interested in. The second year is where you're trying to dig out a little niche that you can work in and expand what's known there. The third year is where you're supposed to be writing up, but in reality you're probably still working on building up enough new knowledge in the area to actually write up.
It's not uncommon to feel the way you are during the second and third years - my advice is to recognise how you're feeling and then work out how to push forward (which is what you've already started to do).
My advice for how you can compete with large research projects full of postdocs is: don't try to. You're not in competition with them; you are doing your own research. It might be in a tiny niche area, but it's your area and it's new knowledge.