> The entire thing is stitched together by spreadsheets that are parsed by Python, dropped into S3, parsed by Lambdas into more S3, the S3 files are picked up by MongoDB, then MongoDB records are passed by another Lambda into S3, the S3 files are pulled into Snowflake via Snowpipe, the new Snowflake data is pivoted by a Javascript stored procedure into a relational format... and that's how you edit someone's database access. That whole process is to upload like a 2KB CSV to a database that has people's database roles in it.
Sometimes it's hard to distinguish resume-driven development from iterative-StackOverflow-driven development.
Everything I look at these days looks like this. And most of the time it doesn't even solve the initial problem statement but everyone is too naive to even realise that.
The worst thing I've seen is a stack that parses out a file and loads it into a DB. So someone sends us a file via an expensive SFTP+S3 thing in AWS. That is then picked up by some scheduled task using a proprietary in-house scheduler process running inside Kubernetes. This proceeds to download the file to the local pod. Then it makes tens of thousands of API calls to match up data, which cranks the CPU up on a huge database server. This breaks all the other jobs running. Then it writes another file out to S3, consuming 17GB of RAM in the process. Another process picks that up, batches it, and inserts it into the DB with no transactional stuff around it.
The original process this replaced was a copy into a temporary table and then a bit of transaction-wrapped SQL that took about 20 seconds to import + run. They improved that to 7 hours and reduced the success rate from 100% to about 80%.
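For illustration, a minimal sketch of that "copy into a temporary table, then transaction-wrapped SQL" shape, using Python's built-in sqlite3 module. The `items` table, the `staging` table, and the `import_csv` helper are hypothetical names made up for this sketch; the original system's schema and database engine are not described above.

```python
import csv
import io
import sqlite3


def import_csv(conn, csv_text):
    """Copy CSV rows into a temp staging table, then merge them atomically."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Hypothetical staging table; created outside the transaction block.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS staging (id INTEGER, name TEXT)")
    with conn:  # commits on success, rolls back if anything below raises
        conn.execute("DELETE FROM staging")
        conn.executemany(
            "INSERT INTO staging (id, name) VALUES (:id, :name)", rows
        )
        # Single set-based merge into the (hypothetical) target table.
        conn.execute("INSERT INTO items (id, name) SELECT id, name FROM staging")
    return len(rows)
```

If any row violates a constraint on the target table, the whole import rolls back, so the target never ends up half-loaded.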
I am currently working with a US government system for downloading public scientific data. You select some data you want to download and add it to a shopping cart. Check out, and select 'create database'. This generates your own copy of an Oracle database, with your own credentials and hostname and db name. Connect to that and construct a query against a table that has some metadata about studies you're interested in. Using the identifiers from that table, join with a LIKE against another table for s3:// URLs. (There are no primary keys and the other table's column is not exactly the same; you need to use a LIKE. This is all documented.)
Those s3 URLs point to a CSV which contains another identifier, which you use to download manifests, which contain links to a web page created on the fly, which contains links to the s3 files to download. By the time you've done all this, your access has likely expired and you must start over from scratch.
But, going the other way, I worked for over a decade on Goldman Sachs's SecDB system. It's a quirky steampunk alternative future that branched from our light cone around 1995. There's a globally distributed eventually consistent NoSQL database tightly integrated with a data-flow gradually-typed scripting language (and a very 1990s-feeling 16-color IDE). I'm sure in the late 1990s/early 2000s (before globally distributed NoSQL was popular and before gradual/dynamic typing had a resurgence) it was more like discovered alien technology than steampunk alternative future. (Also, with source code being executed from globally distributed immutable database snapshots, deployment is much nicer than anything else I've used to date. After release testing, set a database key to point to the latest snapshot, and you're deployed.)
There's a service that watches the transaction log of your regional replica so that you can make long-poll HTTP requests that return when any change matching your filter is committed. (Edit: usually the HTTP result handler is used to invalidate specific memoized results in the data flow graph, letting lazy re-evaluation re-fetch the database records as needed.)
It makes a lot of sense for a financial risk system, where you end up calculating millions of slight variations on a scenario. The data flow model with aggressive memoization makes this sort of thing much cheaper.
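To make the memoize-then-invalidate idea concrete, here is a toy sketch in Python. It is not SecDB's actual API or language; `Node`, `get`, `invalidate`, and the `market` dict standing in for a database record are all invented for illustration.

```python
class Node:
    """A lazily evaluated, memoized value in a tiny data-flow graph."""

    def __init__(self, compute, *inputs):
        self.compute = compute
        self.inputs = inputs
        self.dependents = []
        self.cached = False
        self.value = None
        self.evaluations = 0  # counts real recomputations, for demonstration
        for node in inputs:
            node.dependents.append(self)

    def get(self):
        # Recompute only when there is no memoized result.
        if not self.cached:
            self.value = self.compute(*(n.get() for n in self.inputs))
            self.cached = True
            self.evaluations += 1
        return self.value

    def invalidate(self):
        """Drop the memoized result here and in everything downstream."""
        if self.cached:
            self.cached = False
            for node in self.dependents:
                node.invalidate()


market = {"price": 100}  # stands in for a database record

price = Node(lambda: market["price"])
shocked = Node(lambda p: p + 5, price)         # one scenario variation
pnl = Node(lambda s: (s - 100) * 10, shocked)  # 10 units bought at 100

pnl.get()            # computes price, then shocked, then pnl
pnl.get()            # served from the memo, nothing is recomputed
market["price"] = 110
price.invalidate()   # what a change-notification handler would do
pnl.get()            # lazily re-reads the record and recomputes
```

With millions of scenario nodes sharing the same upstream inputs, only the nodes downstream of a changed record are ever recomputed, and only when someone asks for them.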
However, I saw plenty of systems written where you'd attempt to write your request to the next key matching some regex (and retry with the next key if it already existed), where your request would contain some parameters and the database key and/or filesystem path where results should be written.
Under-experience with databases easily results in rewriting a database using a message queue/bus. Under-experience with message queues/busses easily results in rewriting a message queue/bus using a database.
Message queueing triggers a host of psychological needs. Synchronous jobs rarely need monitoring and management features, but move the same work to a queue, and everybody loses their minds.
I've seen so many people spend weeks if not months "working" to avoid doing a trivial database migration. Database fear is overwhelmingly powerful in a lot of people it seems.
> Database fear is overwhelmingly powerful in a lot of people it seems.
Databases are still fairly poorly documented when it comes to administrative work.
There is an incredible amount of tutorials, books and courses on how to write SQL queries and stuff... But there is almost zero content on how to properly administer a database.
I mean, from novice admin to DBA-level capabilities.
I said all this before and I'm ready to write this again: I think there's a good market space for DBA-style courses.
This does not honestly seem true for the main sql databases. For everything else, yes, but if people were actually learning databases instead of hiding from them most of the uptake there wouldn’t exist anyway.
I think I've seen enough complexity created by engineering teams given total autonomy, with hands-off leadership, that I'd prefer a much more constrained approach. There should still be autonomy, of course, but proposals for new tech, languages and paradigms should only be considered with due diligence.
The most unpleasant codebases I've dealt with are ones that have suffered from a lack of strong leadership, and they are almost uniquely microservice setups that pull in everything but the kitchen sink, usually because it's just trendy to use it. Monoliths can get pretty damn ugly too but at least it's contained in one single codebase.
The reason for all the middle processes was that teams couldn't agree on how to structure their data, and the first app would sometimes dump literal nonsense, so the Kafka Connect process's job was to clean the data and drop any of the nonsense pumped into it. Pretty sure there was a gnarly log aggregation layer in the middle somewhere too IIRC.
Just two examples from my prior gig (fashion e-commerce).
#1 Our hottest dataset (db of current products) stored in DynamoDB. Core dependency for all our code. Easily fits in < 1 GB of RAM. OMG, just make a hashmap. Over a year, I managed to persuade the team to start the transition from DynamoDB to Redis.
#2 Tiny (vs micro) service that munged some URLs. Blocker for an important campaign. Prior team of 4 churned for a year, was no closer to delivery. Spring, ORMs, CI/CD pipelines, the works. I spent a week unraveling the requirements (repeated facepalm). A second week banging out a trivial nodejs thing. (My team preferred nodejs, which was their prerogative.) Really trivial. I felt so bad for the biz dev people who'd been dying to get this functionality for so long.
You can actually make a very comfortable career of Senior and Staff by learning to identify this kind of work/system and proposing ways to simplify it. These kinds of systems, as the author pointed out, are incredibly expensive and inefficient, but look readable on an architecture diagram.
As opposed to all those people that make similarly comfortable careers in middle and upper mgmt by identifying simple systems and complicating them beyond recognition?
Hah, yes. I will say that while I understand the general disdain here, as I grew more senior in my career I realized the world takes all types. There are "doers" who will rush to an end goal that's highly prioritized and then there are "optimizers" who come fix that mess up into a durable, cost-effective system. Some people are gifted enough in knowledge and have the right business priority to do both at the same time, but usually they're required at different times.
Anecdotally, optimization tasks (in this brain) are multitudes easier than innovation tasks. I spend a lot of time thinking about how to do things differently whereas optimization utilizes many lessons I've learned over and over again with well-trodden patterns. That's to say, I'm grateful for the doers :)
These roles aren't opposed at all, they greatly benefit each other :-)
Bringing what used to be the privilege of upper management (wasting massive amounts of resources while getting paid handsomely) down to software developers.
It's that trickle-down effect people talked about, right?
Hmm looking at your git statistics it appears you have only pushed 60 commits this month with 12,000 lines of code changed - while Jimmy over here has pushed 200 commits this month with 200 lines of code changed.
If you do not improve next month we are going to have to let you go, we just can't carry a 0.3x employee such as yourself.
> You can actually make a very comfortable career of Senior and Staff by learning to identify this kind of work/system and proposing ways to simplify it.
I've typically worked in SRE and platform engineering work and that's where I've gotten exposed to these kinds of Rube Goldberg machines. Make a short list of them when you find them and then use them as a hit list during "cost cutting". Most people don't want to touch these systems because they look big and expansive and generally "work". They're just very poorly optimized.
Dare I say, any time I see a function as a service my brain immediately drifts to inspecting the cost implications of said process.
Most large companies. There is a stark difference between the distinguished engineers and the tier below them in terms of asking people to stop doing things badly.
Tried it. Nope. You can get people to acknowledge it but because it's not a fun project or doesn't involve an upsell you can bill the clients for, it'll go in a product backlog for a decade or two.
I don't care any more. I'm just there to tell people what's shit and then laugh when it explodes in their face.
The easy part is choosing a better end-state; anyone can do that, and for any of these Rube Goldberg machines at a large-ish company, several people likely have.
What makes someone a staff+ is finding a path to iteratively evolving towards that end-state without breaking anything along the way and while having progress to show off at each step.
I've used that video to explain to business people. It's watchable, and communicates important ideas of what a poopshow this can easily become, without having to talk about real partners/teams close to home as problems.
At my workplace (the one in the post), whenever one of the good engineers asks about how something works and it's one of these spaghetti-balls, we chorus "It's the design of our backend, okay?"
I see stuff like this every day. It is a natural consequence of people who only “develop” by gluing things together. God help them if they’d actually have to write some core function themselves.
You don't give enough credit to organization chart and project driven engineering.
When developing anything:
1) you don't get to touch anyone else's code. And another department's code? Something another manager's team manages ... that amounts to treason. Never for any reason. MAYBE if they've totally abandoned it and you absolutely need it (but only during unpaid overtime)
2) you don't get to spend ANY time on anything outside of the current project or JIRA ticket. Any time at all. So really, NOT optimizing anything is faster and cheaper. Just look at all the spreadsheets made!
I’ve had enough calls with the “Senior/Technical Lead Azure Cloud Engineers” telling them exactly what they need to do that me and them really really don’t get along.
I don’t do any of that shit and even I can muddy my way through it, but these people cannot. The real kicker of it is how much these people make.
And you know how there are those people who, every time you need to work with them, they answer a teams call and then “need to get to my computer, give me 5” and their status is perpetually set to away? I don’t want to RTO at all, but dealing with this team almost makes me think I’m wrong about that.
>It is a natural consequence of people who only “develop” by gluing things together. God help them if they’d actually have to write some core function themselves.
That's on the industry for not training and gating well. It would be nice to have glue/plumber positions so expectations are not out of line too.
They're not usually software engineers. They're tool users not tool makers.
So they'll cobble things together to accomplish the task, using only available tools and never anything custom that would do the task much more cleanly, because they understand data, not software. They're not computer scientists or programmers, they're just users. And we all know what that means.
Agreed. I've been "the backend engineer who works with the data engineers" for several years now and I've seen their general trend of re-inventing the wheel the hard way a number of times.
I've spent the majority of my career building better tools for data-related tasks, then winning over my users by showing off performance and productivity gains.
I stepped into a Data Engineering Lead role in 2019. Stepped out of it in 2021. My team was the first in the org to really approach data engineering and we were all software engineers. I'm told that the systems we built have largely been replaced by Rube Goldberg machines pieced together by the folks who came after us.
Those replacement systems aren't even working, they're failing to deliver on the same simple data pipelines that we had working by the start of 2020. They're cobbled together using a million little AWS pieces and Docker and k8s... I'm glad that I left that role when I did, we were being pushed by a new-hire with a fancy Data Engineering VP title to do all sorts of asinine things. I went and looked just now and I see that he's Senior VP at a different company, he started there this summer. Onward and upward!
And I thought my unholy xmllint -xpath (bad stuff, lots of slashes) ${1} |sed -r -e s/this/that/ -e s/alsothis/alsothat/ -e /ohyeahthistoo/somethingelse/ | grep something | while read AA; do stuff then echo ${COUNTRY},${SIGN}$(perl -e "printf('%.2f', ${VAR}/1000000)"),${ENTRYDATE}; done|sort
was as bad as things get. I need to get my horror code game up. I mean, not only is the code awful, its very purpose is horrifying (XML to CSV with some transformations, bit of math, all without being able to use any external sources due to security, only what's in a baseline RHEL7 (soon 8, yay!) ).
I promise I'll rewrite it in Python at some point.
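For what it's worth, a standard-library-only sketch of what such a rewrite could look like. The real XML layout, XPath, and substitutions are not shown above, so the `entry`, `country`, `amount`, and `date` element names here are invented, the sed and grep steps are left out, and the `SIGN` handling is replaced by simply formatting the signed amount. It also assumes a Python 3.6+ interpreter is available on the host, which the comment above does not confirm.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_csv(xml_text):
    """Turn <entry> elements into sorted CSV rows: country, amount in millions, date."""
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root.iterfind("./entry"):  # hypothetical element names
        country = entry.findtext("country", default="").strip()
        raw_amount = int(entry.findtext("amount", default="0"))
        entry_date = entry.findtext("date", default="").strip()
        # Same math as the perl one-liner: divide by 1,000,000, two decimals.
        rows.append((country, f"{raw_amount / 1_000_000:.2f}", entry_date))
    rows.sort()  # stands in for the trailing `| sort`
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```

Using the csv module instead of string concatenation also takes care of quoting if a field ever contains a comma.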
I don’t work with large databases so pardon my ignorance. Is there typically a “unit test” bucket you run it on or do you just put in test entries on a production bucket?
Normally you'd fire up a separate environment, mock the process and see if it produces the expected results. By the time you put 'test entries in a production bucket' there are so many lines crossed that it likely won't end well even if the tests do pass.
We tend to only test what is being tested. So, most DB calls are mocked in our unit tests. For stored procs or other tests that need to be run on a DB, we use a test DB that is setup to mirror production.
I'd bet there are a hundred different answers to your question though. This is the way we handle it.
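As a rough illustration of "most DB calls are mocked in our unit tests", here is a small sketch using Python's unittest.mock. The `active_user_names` function and the `fetch_users` method are hypothetical stand-ins, not anything from the codebases discussed above.

```python
from unittest import mock


def active_user_names(db):
    """Return sorted names of active users; `db` is any object with fetch_users()."""
    return sorted(u["name"] for u in db.fetch_users() if u["active"])


def test_active_user_names():
    # The database is replaced by a mock, so only the filtering logic is tested.
    db = mock.Mock()
    db.fetch_users.return_value = [
        {"name": "bob", "active": True},
        {"name": "al", "active": False},
        {"name": "amy", "active": True},
    ]
    assert active_user_names(db) == ["amy", "bob"]
    db.fetch_users.assert_called_once_with()
```

Anything that depends on real database behavior (stored procs, constraints, query plans) still needs the separate test DB described above; a mock only proves the calling code does what you expect with the data it is handed.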
You have no idea how unbelievably annoying it is to work in a company that doesn't have a well-defined architecture. Every "buzz word" service should be easily justified.
This is why I hate recruiters, I can't even tell you how many times I've had a recruiter call me saying they are looking for service XYZ. The same concept rephrased in my resume. I have to rewrite my resume just to satisfy these people? No thanks.
I had a recruiter pull that in my most recent job search. Had to stick "C#/..." in front of everything because they didn't understand that ASP.NET, WPF, WCF, WinForms and several other C#-specific tech had anything to do with .NET.
I think it's "iterative-StackOverflow-driven development" most of the time, and that actually causes the increased popularity of those resume keywords.
I feel like I've never seen anything even close to this bad in the twelve years I've worked as a software engineer. I really want to believe this pipeline as described is satire. Yet, somehow, it does not quite seem that way. This scares me. But it also somehow explains why some companies have so much more engineering staff than I can possibly explain by looking at their output.