I am sorry but as an application developer, I think this is all wrong. I'll than...

generalk · on Sept 3, 2021

I find this response surprising, as I fully agreed with TFA.

I've had an Ops team that had a similar attitude, and they did a lot to help me become a good developer. Part of that was requiring that I come to them with identified problems. "Hey I'm getting this error, can you take a look at a stack trace in a language you've never used and tell me what's wrong?" would have gotten me booed/laughed out of the office, and for good reason.

It's not at all unreasonable to expect the developer to come around instead with "hey my application can't write to this NFS mount like I expected. It's running as $user, the permissions look right but I'm still getting permission denied. Any thoughts?" (A real situation I ran into, turns out SELinux had further permissions I was unaware of, and my Ops lead Chip was happy to show me what was what.)

Yeah, we're all on the same team, and that cuts both ways -- Ops should ensure Dev has what it needs, and Dev should make some actual effort to understand the landscape their production applications run in. Which seemed to me to be the entire point of TFA.

xorcist · on Sept 3, 2021

I've always tried to encourage the following format for all professional questions of that sort:

"a) I do exactly this, b) expected this outcome, c) but got this instead"

Short and to the point, it's remarkable how much easier it makes things for everyone. I think I got it off usenet at some time.

jsperx · on Sept 4, 2021

I work in healthcare IT and many times have heard an “I’m just a nurse” dodge, and I respond in just this fashion — “Good! Then you have just the skills needed. In fact you don’t even have to diagnose the problem, just document what you did, what happened, and how that differs from what you expected.” It works pretty well.

michaelcampbell · on Sept 3, 2021

This closely mirrors what I expect (and have in the past made this exact template for) in an error/bug report.

* What did you see?

* What did you expect to see?

* What sequence of events led you to see what you saw?

* any other info; (OS, time of day, location, anything that causes you to see something different doing the same thing)

pbecotte · on Sept 4, 2021

The author of the article had a great point. Developers often DO treat me as an IT person. The number of times I've taught a developer what an SSH key is or how to install a python virtualenv on the dev box blows my mind.

On the other hand, everything else he says is wrong. Devops isn't "ops gets out of the way and lets developers do crap and nobody owns it when it breaks" its... dev and ops WORKING TOGETHER. If you don't know what their app is doing YOU ARE NOT DOING YOUR JOB. (On the same note, if they don't know anything about how the system is deployed or the like, they're also not doing their job). The rest of the article is literally about complaints about that. The point that you can't monitor an app you don't understand, and developers don't want to push out crappy code, is the whole reasoning behind devops. The people no longer being in their own silo, but instead working together as a more holistic team to own the thing from end to end. If you stay in your silo but add automation, you are 100% going to have pain. There is another pattern they could try, thats the platform model. In that case ops can stay in their little silo and present a platform with apis that developers can build on. Its what you're talking about here, and that can ALSO work. But its a different model. The old style of ops they would do as they were told, while trying to restrict anyone from changing anything. As a platform team, now they're delivering a product. They should be talking to users, and judging throughput, and iterating quickly, and being customer focused... basically, acting exactly as the developers are supposed to be. I am a strong beliver that the platform team should have product owners and customer metrics the same as the developers- heck, if they like QA, they can come up with a QA process. But yeah, I've seen a lot of low effort finger pointing in orgs that pretend to do devops from inside their functional silos. That point, the one in the title, is a great one.

emeraldd · on Sept 3, 2021

> Part of that was requiring that I come to them with identified problems. "Hey I'm getting this error, can you take a look at a stack trace in a language you've never used and tell me what's wrong?" would have gotten me booed/laughed out of the office, and for good reason.

This a thousand times over ... If you can train your users to do this any customer relationship will be better off!

igetspam · on Sept 3, 2021

You, sir or madame, are a good job. I like working with people like you. I want to help but some things just don't fall into my wheelhouse buy when they do, we're on it. This is how teamwork should be defined.

indigodaddy · on Sept 3, 2021

Thank you for being one of the minority that do this!

greedo · on Sept 3, 2021

I think if your take away is that the author is an asshole, you might want to reflect on specifically why you feel that way. In my experience as a sysadmin, in a large company that's been trying to become a user of "cool" IT in the last decade, the article is spot on.

I think for point 1, he's trying to say that application developers aren't doing their role as both dev and QA. I've witnessed the same issue where an DBA had trouble installing Maxscale on two identical servers. He was convinced that there must be something different between the two servers despite them being created from the same template, and only differing in IP/hostname. He had done no research, opened no tickets with the vendor, but instead wasted 30 minutes of my time arguing that it's not his fault. And this is common with many of the developers I've worked with in the last decade.

For #3, I don't own the application you develop. We provide you with a platform that YOUR application runs on, based on requirements you provide. If you don't do an adequate job of providing accurate requirements, that's on you, no my team.

And #4, developers don't abstract all those things away, they often fundamentally don't understand how they work at all, so they ignore them. This ignorance has damning consequences when they make blind assumptions about how things work.

EastSmith · on Sept 3, 2021

Talking in terms of mine and yours means we are not on the same side. And this is the problem.

If there is a problem with the deploy let's meet, fix the issue and most importantly learn from the problem, and document the incident for future reference.

And them move on without fingerpointing.

CodeMage · on Sept 3, 2021

Just because I have my responsibilities and you have yours, it doesn't mean we're not on the same side.

I've come to dread cute management phrases like "everyone should pull on the rope". I agree with the sentiment, but software development is not as simple as pulling on a rope. There are lots of moving parts and lots of things to specialize in. And I say this as a generalist dev, not as an ops engineer.

I agree with TFA completely. I was interviewing for a job recently, and one of the questions I would ask when the interviewer signaled it was time for me to ask questions was "how do you handle QA?" On some occasions, this got me weird looks, because "QA" seems to be an antiquated concept.

In a similar vein, my stint at Amazon taught me that one of the questions to ask my interviewers is to tell me about their on-call rotation. Is there any? How often are you on call and for how long? Who gets paged first?

Yeah, we're all on the same side, but there needs to be some structure and order. Otherwise, you end up with something like this:

"Twenty-seven people were got out of bed in quick succession and they got another fifty-three out of bed, because if there is one thing a man wants to know when he's woken up in a panic at 4:00 A.M., it's that he's not alone."

-- from "Good Omens", by Sir Terry Pratchett and Neil Gaiman

hakre · on Sept 4, 2021

> If there is a problem with the deploy let's meet, fix the issue and most importantly learn from the problem, and document the incident for future reference.

Not being on the same side should not be the problem, it's the solution if you divide the work as each side has to treat things differently while working together.

Operations takes care to restore the operational side. The faster the better.

Developers need or normally want to tackle the issue differently and most likely after operations reports updates they can decide whether the workaround is acceptable or development wants to maintain further on.

this normally works very well this way as operations does not have the time to maintain the application but the developers have.

greedo · on Sept 3, 2021

I used "mine/yours" to denote where the responsibility lies. In a small org you can have the entire IT team troubleshoot an issue. In a large org, that's unfeasible.

I'm willing to help troubleshoot and provide guidance based on my experience, assuming the application developer has performed their due diligence. I have no insight into what their application is expected to do, or its failure modes. I have no input into the coding methods, the test harnesses, the deployment process. But when that shit breaks because the dev doesn't understand the difference between `rm -rf ./*` and `rm -rf /` that's his problem.

Now of course this is an org problem, not a team problem. As in parenting, setting boundaries and responsibilities is the key to success. Too many leaders in IT simply think that "DevOps" will be cheaper and faster and leave it at that.

_lqaf · on Sept 3, 2021

I currently manage an infra ops team. I was a developer for about 10 years.

I agree with point 1, nearly completely. A lot of developers could take a lot more responsibility for understanding the environments their applications operate in, but I get it.

Point 3, at least in my shop, you're just wrong. I don't know anything about what you're writing. I probably don't even know what problem it is supposed to solve. You are mistaking the highway road crew for mechanics.

Point 4, in my shop, we provide a lot of documentation and guidelines for this sort of thing. Developers are responsible for knowing if their stuff is going to fall outside of those, and come to us to work something out. Again with the road metaphor, if you drive a semi into a single car garage, you're the idiot, not the person who built the garage.

On some of this, I'm taking a hard line. I do, in fact, end up doing a lot of troubleshooting with developers. But most of my team does not write code. If you want more senior ops folks who also have a coding background, come on over! There aren't that many of us who are any good, and I would love to hire more.

Aeolun · on Sept 3, 2021

> Point 3, at least in my shop, you're just wrong. I don't know anything about what you're writing. I probably don't even know what problem it is supposed to solve. You are mistaking the highway road crew for mechanics.

I don’t follow this. Developers are responsible for learning what kind of environment their application runs in, but ops is not responsible for having some clue about what they’re running? That cuts both ways, and it’ll help everyone out.

> Developers are responsible for knowing if their stuff is going to fall outside of those, and come to us to work something out.

I find this attitude fairly common amongst ops people. They just build something that is totally inappropriate for actual usage, and then dump the responsibility for figuring that out on the developers.

philbo · on Sept 3, 2021

> Developers are responsible for learning what kind of environment their application runs in, but ops is not responsible for having some clue about what they’re running?

I don't think it's as cut-and-dried as your question frames it, but I do think there are fundamental differences between the two positions that justify some of the tension there.

The problem is the difference between domain knowledge and general systems knowledge. The former varies wildly from org to org, team to team or even within individual teams. The latter is more consistent across wider applications and over longer timeframes.

Developers usually need a lot of domain knwoledge to do their job, which can leave less space for systems stuff. But the systems stuff they do learn tends to be more widely applicable.

Ops folk often service many teams where the domain knowledge differs between them. The best of them might be able to internalise all of those differences but it's a big ask. And there's rarely any crossover.

This difference is also why developers tend to have a slower ramp-up time than ops engineers do on joining a new team. It's just the nature of the work.

I say all this as someone from the developer side of the fence. I'm fortunate to have some years in the bank now that the systems stuff comes more easily. The domain stuff remains really hard.

BurritoAlPastor · on Sept 3, 2021

Developers have more responsibility than ops for knowing their apps, for the simple reason that each developer owns a small number of apps, but ops owns the infrastructure for all the apps.

jameshart · on Sept 3, 2021

Why has your organization built a one-size-fits-all ops organization if it doesn't have a one-size-fits-all dev organization? Sounds like a failure of ops organization to recognize that the needs of the email hosting guys are different from the website team or the billing team. Maybe you should build a set of smaller, more focused ops teams focused on meeting the needs of those different groups?

kazen44 · on Sept 3, 2021

Smaller, more focused ops teams already exists, but are not bound by application boundaries but by system boundaries. (mostly, storage, compute and networking). The reason is because each of these is a completely different environment on its own.

cogman10 · on Sept 3, 2021

I completely agree. Far too many devs are clueless about how their apps perform or interact with the ecosystem. That tunnel vision has a LOT of negative consequences on infrastructure.

kcb · on Sept 3, 2021

I don't follow. Why compare a developer to the entire ops organization?

tadpole172 · on Sept 3, 2021

Because the ops org doesnt concentrate on just the one application. They have broad knowledge of the entire stack and therefore don't have as deep of an understanding on any single piece.

kcb · on Sept 3, 2021

The dev org also doesn't concentrate on just one application. I've not seen this situation where every Ops personnel is assigned to the entire stack. Each Ops employee or team in a larger organization is generally responsible for a subset of the environments.

cogman10 · on Sept 3, 2021

> Point 4, in my shop, we provide a lot of documentation and guidelines for this sort of thing. Developers are responsible for knowing if their stuff is going to fall outside of those, and come to us to work something out. Again with the road metaphor, if you drive a semi into a single car garage, you're the idiot, not the person who built the garage.

With the road metaphor, one issue I've seen is ops will create a rope bridge and get mad when devs need to drive a car over it. "You shouldn't do that! You idiot! Just walk over the bridge like we expect!"

Example: We have about 500 different applications in our company and the ops team maintains a single rabbit cluster for all apps (and everyone is supposed to use that one cluster). If an app gets too chatty on that cluster "Oh you idiot, why are you so chatty! You just sunk the organization!" Which, in turn, discourages the usage of rabbit (maybe that's the intention?)

> But most of my team does not write code.

I actually prefer this ( :D ), our ops team was a bunch of converted devs that decided the best way to do things was making a giant ops framework for all devs to follow. That ended up costing WAY more money than if they'd just used tools that were available. They fetishized trying to make everything "just one line!" which ended up breaking anytime you had a slightly different need (trying to take control right up to managing how version bumps happen).

Overly trying to force a single method of implementation has a lot of negative consequences. I prefer instead to have guidebooks and examples with the freedom to be an idiot and walk off the beaten path when needed.

kazen44 · on Sept 3, 2021

> With the road metaphor, one issue I've seen is ops will create a rope bridge and get mad when devs need to drive a car over it. "You shouldn't do that! You idiot! Just walk over the bridge like we expect!"

Well, the main problem with the "bridge mismatch" is usually that resources required for an environment are not free. Its usually the opposite, most infrastructure is rather expensive, and running multiple systems side by side because multiple developers require slightly different versions of the same thing tends to explode cost.

kcb · on Sept 3, 2021

It pains me. Just add this magic line to your pipeline and everything will "Just Work (tm)"

jameshart · on Sept 3, 2021

> You are mistaking the highway road crew for mechanics

The highway road crew know what a car is, though, right? They know that the road needs to be clear and flat and drained of water, and the markings need to be clear, so that cars can drive on it.

When the devs come to you complaining about flat tires, you can't turn round and say 'this is a mechanic issue, I don't know how tires are meant to work. They go on the bottom, right?' - you're meant to help check for rusty nails or bits of metal in the road that are causing all these flats.

'Oh, I didn't realize that was something that could cause trouble for cars'

Well then you're a pretty crappy highway maintenance guy.

pjmlp · on Sept 4, 2021

Yeah, just don't expect any sort of understanding for jammed doors, bad clutch, or overheated motor, which is the kind of complains OPs were referring to.

kcb · on Sept 3, 2021

> Point 3, at least in my shop, you're just wrong. I don't know anything about what you're writing. I probably don't even know what problem it is supposed to solve. You are mistaking the highway road crew for mechanics.

How? Honest Question. If you know nothing of the application how are you able to offer any input into the infrastructure it runs on.

Plasmoid · on Sept 3, 2021

Because an ops team will have between dozens and hundreds of apps to support. You do a survey of needs and build out something that gets to the most common use cases.

You try to respond to what people need and add things when there is enough demand. But I can't know what your business goals are, what your uptime metrics are, or who your users are.

At some point, your app becomes a black box that takes in requests, accesses DB/storage, and emits logs/metrics. I just don't have the brain space to be intimately familiar with each service.

jameshart · on Sept 4, 2021

"I can't know what your business goals are, what your uptime metrics are, or who your users are"

Do you... work for the same company? Draw your paycheck from the same revenue stream?

If you're just a black box provider of undifferentiated compute/storage why the heck are we paying you? We can buy that from a dozen cloud PaaS providers.

nucleardog · on Sept 4, 2021

We have 60+ projects out there using four major languages and god knows how many frameworks supported by a few dozen developers, and... two ops staff.

I do my best, but I'm never going to be intimately familiar with your product on a technical or business level in the way that you are when you spend 20-40 hours a week on it. If you want that level of service, you're gonna need another couple million a year in ops staffing budget.

You're paying ops because someone needs to know how to fit all the lego AWS gives you together, understand what's inside the lego pieces to debug issues when things go wrong, be accountable for ensuring best practices are implemented as far as security, backups, etc, optimize spend, and figure out how to architect all this stuff to make sense on AWS.

We could get some of that with some of the more managed services like Heroku, but at our scale the premium we'd pay is waaaaaay more than two ops salaries.

Plasmoid · on Sept 4, 2021

That only works for a small company. Once you start having multiple products then I can't reasonably be intimately familiar with your particular service. If you want a white glove level of support then you're going to need to grow the operations teams substantially.

eropple · on Sept 4, 2021

It sounds like what you want are team-embedded SREs, not a generalized ops team answerable, somehow, to everybody for their own applications.

pidminusone · on Sept 4, 2021

As an Ops, I sort of gave up telling developers they should know how computers work.

The more they need me, the bigger my paycheck gets. My TC has increase 4x last five years and I do not consider myself some infra guru by any means.

civilized · on Sept 3, 2021

If the author is an asshole, you certainly also are one by the same standard.

Developers are not just "users", they're fellow software professionals who can reasonably be expected to work harder on troubleshooting than reporting "it works on my machine but not in the test environment :(" without even reading the error message or including it in the report.

As a general rule, when you have most of the control or knowledge of a technical process and you want someone else to help you with it, you need to give that other person as much transparency and info as possible. Because they don't control the process and will have to slowly, laboriously ask you questions, or ask you to do things, rather than just probing the system themselves.

They're taking time out of their day to work in a relatively inefficient and frustrating mode just to help you out, so jeez, have some respect and try to make their jobs a little easier.

If you don't and prefer to wear this entitled attitude, fine, but you're just as much an asshole as he is.

seniorThrowaway · on Sept 3, 2021

my favorite response ever to "it works fine on my laptop/dev machine" is "let's connect the prod load balancer to your workstation and get you a pager, problem solved!"

igetspam · on Sept 3, 2021

I believe your assessment of the agreement is flawed. Application developers are not our users. You're our tenants. We provide highly available housing for your projects. We keep the lights on, we keep walls standing and we make sure the roof doesn't leak. We also provide APIs for you to interact with. When those things fail, we are responsible. When your code doesn't run in the test environment where everyone else's does, that's not our job. I'll help you but at my convenience because I have other things to do. Of your app fails in the middle of the night, that's your responsibility. If it's an infra problem, then it's on me. We don't ask you to tune the network or balance the cluster or ask you why the daemon sets are failing, right? If this was a shared responsibility, you'd be helping with the core too but I can almost guarantee that's not happening. (Some of my eng peers do but the vast majority think or it as a black box.)

czep · on Sept 3, 2021

> Application developers are your users.

No. Equating internal teams with paying customers is the very attitude that is causing these problems. Encouraging teams to think about their "internal customers" leads those customers to become entitled. We work together in the same company, our relationship is not the same as with actual external paying customers. I can't tell a paying customer that they're being unreasonable or lazy or unrealistic. We absolutely should be able to have that conversation with other internal teams when appropriate.

The post is describing the situation that has evolved as a result of QA being phased out. Telling Ops to suck up that extra work because "Dev are your users" is exactly why the post was written.

gravypod · on Sept 3, 2021

At most large companies things are organized in such a way that internal teams are your "paying users". Some internal teams at some companies even say "If you want X feature and Y support you need to request $$$ funding and N people for our team".

LambdaComplex · on Sept 3, 2021

...Isn't that how Sears went bankrupt?

gravypod · on Sept 3, 2021

I don't know much about Sears. I've mostly worked as a Software Engineer and know other Software Engineers.

_y5hn · on Sept 4, 2021

That was a passing fad 15 years ago. It didn't work out well when actual expenses got passed around.

waylandsmithers · on Sept 3, 2021

On point 3: > I likely don't even know what "working" means in the context of your app.

I think both sides can do more to reach into the domain of the other. I get it- we don't want to deal with blinking lights and they don't want to deal a missing semicolon breaking everything.

Honestly I think "that's not my problem" is one of the worst attitudes you can have as part of an organization with common goals.

nucleardog · on Sept 4, 2021

Not being able to say "That's not my job." just leads to dysfunctional organizations, burnout, and resourcing issues.

Different job roles exist for a reason. In the case of developers and ops, developers develop, ops manages ... well, usually literally everything else, and there should be _some_ overlap in the middle.

If you can't say "no, I can't do your job for you", then that area of shared responsibility just gets wider and wider and, if one side _is_ doing that (as developers usually won't step into the ops end too deep), it shifts further and further to one side. In the case of a typical organization where you have maybe an ops person per dozen or two dozen developers, that person very quickly becomes a bottleneck. That person gets burned out. You need to hire a bunch of expensive ops people to do work cheaper developers could be doing.

Literally watched the lack of role definition and a bunch of ops people that, by virtue of almost everything being their job, won't say "not my job" do this at a company I'm leaving. A couple months back I was literally writing code in one of our apps because the team that owned the project "didn't know elasticsearch and didn't have time to figure it out".

time0ut · on Sept 3, 2021

It sounds like someone who is frustrated because the process or culture in their organization has lead to point 3. I tend to involve ops before I write a single line of code and definitely before deploying to a stage environment. Over the course of a project, they help me write the runbook, create dashboards, and alerts. After all, we are all on the hook when things go sideways at 3AM. I want them to know as much as possible about how things work.

generalk · on Sept 3, 2021

This is the way.

My previous company had a HUGE problem with Devs cowboying off and doing whatever and dumping it on the Ops team at the last minute.

One of the biggest (but for damn sure not the last) issues was a dev who designed and built an entire new product around a MongoDB database, which wasn't something we had in production, and something he didn't mention during the months of development and demos to stakeholders. Week before the launch date he hits up our Ops folks to get production set up.

Ops was calm and collected about the whole thing. "We don't have MongoDB in production. Are you volunteering to learn how to correctly install it, write monitors for alerting, be paged with issues, figure out backups and how to ensure our data stays safe, secure, and available? You're not? Then get the [redacted] out and rewrite your app. Yes it will affect the ship date, and yes it's your fault."

I'd love to say we used that opportunity to shore up our processes involving kicking off new applications and including Ops folks in from day one, but that took years more.

time0ut · on Sept 3, 2021

Something similar happened at my company like 5 years ago.

A developer was tasked with adding a major new feature to one of our older monoliths. He added MongoDB as a dependency. The application already had a well managed Oracle database. Nothing about the feature required MongoDB.

When it came time to go to production, the DBA and ops teams responded similarly to how you did. I wish I could say sanity prevailed, but the business mumbled something about contractually obligated release dates and forced it through to production. Pretty sure it is still there rotting away.

I've worked mostly on the app side of things and this sort of thing just makes me shake my head.

random_kris · on Sept 3, 2021

well at the end of the day you managed to ship it? Did it cause any big problems down the line? It seems the biggest problem is that it is rotting away somewhere, which to me means that it is working without need to do much care on it.

If they listened to your DBA/ops guys no value would be gettig shipped ;)

time0ut · on Sept 3, 2021

I don't know of any big problems other than the unnecessary cost. I agree meeting the needs of the company is king, but it was just a lot of unnecessary complexity because a dev wanted to put MongoDB on their resume. Could have been avoided by talking to the rest of the team early on. Of course, they would not have liked the answer of just creating a new table in boring old Oracle.

Aeolun · on Sept 3, 2021

To be fair, when forced to choose between Oracle and MongoDB I’d also have a serious dilemma.

KronisLV · on Sept 4, 2021

> I agree meeting the needs of the company is king, but it was just a lot of unnecessary complexity because a dev wanted to put MongoDB on their resume.

Counterpoint: the dev is doing this to remain employable, so that they can ensure higher success in the future for themselves.

Their goals simply don't align those of ops and are at best parallel with those of the company as a whole - of course it's to be expected that they'll attempt to prioritize their own when there's a lack of governance and oversight within the company.

It's something that i've noticed more and more, yet is something that noone really talks about - people wanting to use bleeding edge technologies just because they're at the top of their hype curve: wanting to implement microservices when they're just maintaining monoliths and there's no need for them.

Personally, i'm an advocate of both microservices (or at the very least modular monoliths), containers and many of the new technologies, with the exception that i've initially tried all of those out in personal projects in the evenings and weekends. Yet what is the person who doesn't code outside of work supposed to do to remain employable? Would you expect a doctor to practice new types of surgery in their own time? Actually, why don't companies fund a week every few months for their developers to upskill themselves? Just a bit of time that's treated like a vacation, but during which they're expected to hack together prototypes etc.? Clearly most companies out there don't do greenfield or pilot projects, so something like this could help.

I don't think i have any good answers for this, but it definitely deserves more consideration!

oblio · on Sept 3, 2021

> Ops was calm and collected about the whole thing. "We don't have MongoDB in production. Are you volunteering to learn how to correctly install it, write monitors for alerting, be paged with issues, figure out backups and how to ensure our data stays safe, secure, and available? You're not? Then get the [redacted] out and rewrite your app. Yes it will affect the ship date, and yes it's your fault."

Love the shoot-down!

davidgerard · on Sept 3, 2021

Ops here. The threat of 3am phone calls does wonders, in my experience.

If it turns out it was product owner pressure, the product owner gets a call too. Possibly first.

Aeolun · on Sept 3, 2021

So, you could have delayed the app by the same amount but now have a mongo environment for production as well?

Seems a bit of a waste to rewrite the app instead.

Not that I would recommend Mongo anywhere, production or dev, but it would apply for any other technology for which this happened.

generalk · on Sept 3, 2021

  > So, you could have delayed the app by the same amount 
  > but now have a mongo environment for production as well?

No, we couldn't have. Not just because we didn't want MongoDB, which at the time was notorious for data loss, but because our ops team didn't have the capacity at that point in their schedule or team size to handle it. Maybe had we discussed at the beginning of the project plans could have been made or altered, but we didn't and so they couldn't.

  > Seems a bit of a waste to rewrite the app instead.

The responsible dev took the time necessary to rewrite the data layer to better reflect the needs of the application.

Is what I wish had happened. Instead the developer jammed the huge JSON blobs into a column on an MSSQL table and changed a few lines. lolsob.

jimbokun · on Sept 3, 2021

> Instead the developer jammed the huge JSON blobs into a column on an MSSQL table and changed a few lines.

Sounds like quickest way to deliver value to the customer. As described, was far too late in the process to worry about deploying with a clean, extensible architecture.

A reasonable amount of technical debt in order to ship in the timeframe available.

rswail · on Sept 9, 2021

Depends on your definition of "reasonable". You now can't leverage your DBA's skills to optimize queries, because you're using the RDBMS as a key/value store.

You're misusing a tool because you didn't do the correct application design in the first place.

NoSQL has it's place, mostly in the trash. Lazy key/value stores (which is all that NoSQL is) throw away all the benefits of relational logic for a glorified combination of a file system and grep.

That's not "delivering value to a customer", that's delivering crap.

Standard "Agile" response. It was only "far too late in the process" due to a complete lack of process, oversight, product quality ownership and capabilities.

If nothing else, that developer should be "counselled" as should the PO, the Scrum Master and anyone else involved that allowed the situation to occur.

And the ongoing capex and opex for the additional unbudgeted support should be pushed back on the PO as a requirement to fix.

kazen44 · on Sept 3, 2021

except that shipping something with semi-broken infrastructure leads to losses down the line.

What if your mongodb database drops its data and now you have production impact? Are those losses calculated while making these decisions during development.

Aeolun · on Sept 3, 2021

> because our ops team didn't have the capacity at that point in their schedule or team size to handle it

Lol, I get your point, but that was also true for the dev organisation. Hence what you ended up with.

I doubt the needs of the application included a rewrite in MSSQL.

jodrellblank · on Sept 3, 2021

Pet hate: they're not operations' logs, they're developer logs. Developers write the code to create log messages on the principle "more is better". Logs are another example of the systemic hoarding problem with people and computers.

They're a ratchet pattern, adding more is easy but once they exist it's very difficult to find someone with the authority to authorise removing them and the willingness to stick their neck out and declare that they aren't required and the willingness to spend time on low-importance maintenance. As a consequence logs build up until something gives and they become high importance urgent failure. The middle bit where they "aren't important" but they still waste storage space and networking bandwidth and processing power (and money) and when there is something to debug they waste people's time because the important details are needle-in-haystack among tons of low-value filler, all gets ignored.

At the limit, it isn't sustainable to print the complete internal state of a system at every clock cycle. It "should" be possible to do a lot better troubleshooting_power-to-log_weight ratio than "print every state change which feels important at the time in whatever semi-English message format is convenient", shouldn't it?

_y5hn · on Sept 4, 2021

It speaks volumes Ops are even bothering with monitoring those "logs".. But people get ousted for demanding better!

sgarland · on Sept 3, 2021

> And yes, as developers, networking or physical drive space are things that we tend to abstract away.

Why is it Ops job to guess at your application requirements? You have the best understanding of what setting LOG_LEVEL=DEBUG is going to do to disk requirements.

rswail · on Sept 9, 2021

If you need to run with anything more granular than INFO, then it's not ready for production.

And logs are not tracing or metrics or canaries or any of the other required operational monitoring capabilities that an application requires.

jimbokun · on Sept 3, 2021

In theory, but pragmatically it's irresponsible to assume without running in staging or on a subset of production resources to monitor and see what actually happens.

_y5hn · on Sept 4, 2021

It's fully reasonable to solve the halting problem of your black box application.

protomyth · on Sept 3, 2021

QA teams sure do buffer a lot of crap. They also cost a bunch and slow down time to release.

If your QA team is slowing down releases then that is the developer's fault not the QA team. Frankly, this move fast, don't do proper QA is irresponsible and a danger to users.

marcosdumay · on Sept 3, 2021

They add latency, there's no way around it. Even if there are no software problems and their verification is instantaneous, QA by itself adds an extra hand-off to a team with an independent task queue.

exdsq · on Sept 3, 2021

If your QA team is a "thing" that gets features at the end of a sprint and churns out bugs or releases you're doing it wrong. They should be involved on a feature by feature basis working alongside the developer with QA time incorporated into every task. All unit/integration/system tests should be automated during the cycle so there is no "hand-off to QA". There should be less latency because you have a test expert speeding up implementing tests or being a force multiplier to developers by acting as an internal consultant who can advise on bits where needed.

icedchai · on Sept 3, 2021

Seriously, I haven't worked at a company with a QA team in almost 10 years. Do these actually exist anymore? It would certainly be nice to have.

Aeolun · on Sept 3, 2021

They do! They’re really good at their job but definitely slows down releases.

Then again, the entire point is to release after all the bugs are fixed, not to get all the bugs into production as quickly as possible :)

protomyth · on Sept 3, 2021

I guess it depends how you count a release. I think these fast moving teams spend more time in production debugging than the QA team adds. Shipping it should not be the final determination of release time.

I wish more companies valued QA teams, then maybe I wouldn't get so many notices of security breaches and need to keep checks on my credit.

mateo411 · on Sept 3, 2021

Security breaches are the responsibility of the InfoSec team. The QA team usually won't have the skillset to find security issues.

icedchai · on Sept 3, 2021

Or maybe you still would. Are most QA folks actively looking for security issues?

protomyth · on Sept 3, 2021

Some of the bonehead stuff will be caught by QA, but there are folks on some QA teams that get security. Sadly, developers talk down about QA so much that the people we need on QA teams are not going to go there.

Aeolun · on Sept 3, 2021

Not really. QA is functional. We have a product security team doing pentests on new and updated applications.

_AzMoo · on Sept 3, 2021

We have a fantastic QA team, and they test everything that goes to prod. Definitely slows things down (by about 1/3) but our user experience is significantly improved because of it. IMO a good QA/test team is critical to delivering an excellent user experience.

pjmlp · on Sept 4, 2021

Yes they certainly do, am I am thankful they are part of our team.

On our teams, we are all T shaped, so whoever is low on tickets might temporarily jump into the QA team as well.

emmelaich · on Sept 3, 2021

I totally agree with TFA; except it was ever thus. (And agile has helped reduce the problem if anything)

As an ops person I've had to explain the devs own architecture to them; they didn't know how it sent mail -- nothing to do with SMTP; they just hadn't shared the knowledge among themselves of the db/java app interaction.

I once had a developer tell me ridiculous things like "my java app can't write to java.tmpdir". They couldn't even tell me what file they were trying to write. I had to dive into apache docs and send it to them. I turned out to be a bug in an apache project code, nothing to do with tmpdir writeability.

The lack of basic responsibility and ownership was appalling.

_y5hn · on Sept 4, 2021

1. This is why DevOps means working together, and not siloing different components of the same solution.

2. QA is still needed sometimes. The tedious tasks need to be automated, but that never gets prioritized.

3. Traditional Ops only cover non-functional and operational requirements. What is agreed and documented with Ops?

4. Ops are rarely invited in early, if ever.

The problem is the siloing, aka not doing DevOps.

tucosan · on Sept 3, 2021

Wow. Starting your argument with an ad hominem attack qualifies you as one of those people I will never ever want to work with.

jameshart · on Sept 3, 2021

This isn't an ad hominem.

Ad hominem is where you aim to refute the argument someone is making by impugning the character of the person making it. Obviously, it doesn't follow that because someone is a bad person, that what they say is wrong.

"Don't believe this guy, he's an asshole" would be an ad hominem argument.

But this is not an ad hominem argument being made here. They are, instead, making the logical claim that, based on the attitudes described, the person comes across as an asshole.

They aren't then saying that that invalidates their arguments - they are taking the author's arguments at their face, and inferring the person's character from them.

Which seems to be a mode of argument you're comfortable with, since you've just done the same to the person you replied to.

runnerup · on Sept 4, 2021

I really appreciate this in-depth explanation.

znpy · on Sept 4, 2021

> 4. And yes, as developers, networking or physical drive space are things that we tend to abstract away

Seen from the other side, this is an insane approach.

If you cannot even estimate how much resources over time your code is going to use are you really a professional?

Really, I'm not even talking about a precise estimate, just a ballpark estimate.

mdekkers · on Sept 3, 2021

> wait a minute, you're going to blow up our logs.

   You really need to have that pointed out to you?

0n34n7 · on Sept 3, 2021

Agreed. Good application code often contains edge case handling, build time checks, unit tests and defensive flows that handle the unexpected so that users don't wake you up at night. Why can Ops not do the same? Why can Dockerfiles / Orchestrators / CI / playbooks not also implement sanity checks on deployments?

"Ooops... deployment failed. While deploying your artifact we found the following:

- Nothing is listening on the nominated port

- Your deployment is utilizing 100% CPU while idling

- We detected an abnormal volume of write operations to the mount

Please fix these issues and re-trigger the pipeline at your earliest convenience.

Regards, Ops."

clipradiowallet · on Sept 3, 2021

> - Nothing is listening on the nominated port

Now that just shouldn't happen... ie, we(ops) aren't going to deploy something that doesn't come with healthcheck(s). The healthcheck never passing(port isn't listening) is going to stop the deployment from ever completing. Ops job is to push back on developers if they try to hand us something like this to build a pipeline for. In my company, to hand Ops the name of a repo and say "build a pipeline"...there are a lot of requirements, and the biggest one is a list of SLAs. That list of SLAs is how we build monitoring for your application, and one of those should always be a list of port(s) and protocol(s) that are exposed; we build monitors against those.

jameshart · on Sept 4, 2021

I've been struggling a little in this thread trying to understand what these 'ops' teams as described are doing.

Being a human kubernetes seems to be the crux of it.

In my company, to hand 'ops' (kubernetes) the name of a repo and say 'build a pipeline', it's basically a matter of committing a gitlab-ci.yaml file.

clipradiowallet · on Sept 4, 2021

It varies with the company you are at... but building pipelines is just one of the ops tasks typically. Ie, in your gitlab-ci.yaml example, ops would have given you the template (assuming you were going to be the one committing it) that your pipeline had to follow - so it was uniform with the thousands of other pipelines in the company. It's unusual at most of the shops I've been at for devs to ever build their own deployment pipelines. That might be OK at a smaller shop, but once you have 100+ deployments of any type, the devs have lacked ability to keep it all uniform at scale.

A better way to put it... most of my colleagues in an Ops roles already did development for 10-15 years, and moved on to developing the tools to deploy other people's products. Additionally, kubernetes isn't everywhere - they also build pipelines that produce AMI's, GCP images, and write the terraform/cloudformation/HEAT/etc to deploy those things. If you wonder "who automated blue/green deployments?", that's your ops team.

Also, in your example of "human kubernetes" - ops builds those clusters, and monitors those node pools. If you half less than a dozen clusters, or less than 100 nodes among the pools - you might not even have an ops team.

jameshart · on Sept 4, 2021

What’s the value of uniform pipelines across thousands of projects? You’ve now got thousands of teams who don’t really understand how their stuff is deployed because someone handed that to them on a platter, and everyone is subjected to the uniform constraints and complexities of the shared solution whether they need to be or not…

What seem like efficiencies can rapidly become barriers.

It’s like any case where two systems share a requirement - you can factor it out into a shared library or you can duplicate the code; in the case where the common requirement is only ‘coincidental’ not ‘instrumental’, you are better off duplicating the code so that the two systems can evolve independently and not take a coupling to a shared dependency.

The same applies to infrastructure. Sure, you’ve got a dozen clusters, and it seems efficient to have one team set up and operate all of them - but are you sure the efficiency of one big team is better than twelve much smaller teams, closer to their dev orgs, who each run one cluster more tightly suited to that org’s needs?

How far down can you push that decentralization?

With smaller and smaller units of cloud compute and storage being available as services, the answer is increasingly ‘all the way to each individual application’.

jensensbutton · on Sept 3, 2021

> Why can Ops not do the same? Why can Dockerfiles / Orchestrators / CI / playbooks not also implement sanity checks on deployments?

All of those things were written by developers.

rswail · on Sept 9, 2021

Or you adopt SRE and the necessary guard rails, guidance, non functional requirements and gates so that PO/PMs cannot overrule them.

seniorThrowaway · on Sept 3, 2021

"Oh those are normal errors" - Every developer I've ever worked with

jasonlotito · on Sept 3, 2021

I am sorry but as someone who has been on both sides of this, I think this is all wrong. And I thank god both my app developers and operations people aren't assholes like you.

Hey, that's a pretty shitty way to start off a comment, don't you think? With a personal attack?

1. Yes. But it's not operations problem if you are whining that your PS5 game isn't running on the XBox. There is personal responsibility in this, too, and it's not operations job to hold your hand and explain how to do your job. If you aren't reaching out to operations to make requests, they aren't going to know what to do. Your entire comment shows that you think they are subservient to you, rather than you actually being an honest user. Tell them what you want, and work with them to get it.

2. QA teams do not slow down time to better quality releases. They do slow down time to half-baked or buggy releases. Regardless, the number of app developers to operations people is generally a very bad imbalance. I promise you, the good ones are working with the people that reach out to them.

3. Maybe if you invited the operations people earlier, they'd have some ownership in the product. But usually they release it without operations even knowing, and suddenly there is something in production that is half-working. They had no hand in it. They literally did not work on the project, so they can't know.

4. You can't abstract away things if you don't know how they work or account for them. Again, inviting operations people to earlier discussion is incredibly easy. You know what projects you are working on, they tend to not because there are far fewer of them than there are application developers. So, it's on you to reach out to them to get input. Yes, they have to make themselves available, but you have to invite. And guess what? When you do that, you get a wealth of information and makes the product better.

Your comment feels like someone who is used to expecting perfection from others while accepting their own mediocrity.

Wow... ending a comment with an insult is rather shitty, too. Why did you decide to go the route of writing a comment that starts of shitty and ends up that way?

Personally, I did it to hold up a mirror to you.

cf499 · on Sept 3, 2021

"Maybe if the infrastructure people were involved in development discussion earlier, they'd be able to raise their hands and say: wait a minute, you're going to blow up our logs."

"Maybe if you invited the operations people earlier, they'd have some ownership in the product."

Awww... You like each other but none of you dare to make the first move :D

ozim · on Sept 3, 2021

Whole thread reads like bunch of guys shouting at each other "but but ... I know better!".

IMO this is main topic of the thread and of the article.

There are groups of people who instead of spending time to figure out how to work together and understand what other side has to say, they just throw shit over the fence.

Maybe some could start by reading points at least couple of times and try to understand instead of trying to write personal experiences as fast as they can in reply to other comment that hurts their ego.

_y5hn · on Sept 4, 2021

The gap is a leadership problem, someone is accountable for all the "shipping". It's a made by divide and conquer strategies (more "abstraction" at higher level).

But keep blaming the war on peons.

ozim · on Sept 5, 2021

It does not work in healthy companies.

Maybe it works if you assume all employees want to do bare minimum and don't get blame for what was not delivered.

What I see most of the time is that people want to deliver, people want to be valued by their work.

Of course I am cynical as the next guy from me in terms of "getting on high horse" but there is a lot of people who want to do their job and want to do it well.

Playing divide and conquer, playing non-existing scare deadlines is going to work once or twice and any smart employee will leave after that kind of crap. Other option is you are going to get smart employees who cannot afford to leave, but because of that crap they will just stop giving any fuck.

I sign up in reality into "self fulfilling prophecy employee", when you treat your employees or other people with expectation that they are thieves - in the end they will steal from you.

If you treat your employees as if they suck - they will suck.

Of course there are bad apples but if one goes the road that everyone want's to rob him, he will get robbed.

happymellon · on Sept 3, 2021

Ops/Infra teams don't usually start software projects and not aware that there is a project for them to offer their help with.

My experience has been that they can be very accomodating and supportive if you do talk to tham.

Aeolun · on Sept 3, 2021

Yeah, all those devs are just sitting there at their desks clacking away on their novels.

There is always a project for them to offer help with, since the business will not suffer devs to be idle.

jasonlotito · on Sept 7, 2021

So instead of devs inviting ops people to the meeting they want ops people at, you are saying ops people should go up to devs while they are working and offer to help, even when they aren't needed?

No, that's not smart.

_y5hn · on Sept 4, 2021

Help offered, but still radio silence.

Anyways, Ops have their own responsibilities and none of the core ones are centered on devs.

jasonlotito · on Sept 7, 2021

> Awww... You like each other but none of you dare to make the first move :D

Only if you ignore reality. If you hold a meeting and don't tell me about it, how can I attend? Both sides want the ops people involved. Maybe the one having the meeting should invite them.

izacus · on Sept 3, 2021

> And I thank god both my app developers and operations people aren't assholes like you.

Uhh... can we chill with the personal insults a bit?

d--b · on Sept 3, 2021

Cause the whole article reads like "developers are whiny assholes who don't know shit about computers". And yes, it starts with an attack and ends with an attack too.

1. It's not operations problem for sure, but I certainly don't bash people for not knowing things I am the expert of.

2. Fine

3. The OP's saying he doesn't want to know!

4. Well, writing applications is sitting atop a stack of technologies more and more abstract. A developer not knowing what happens in an IP packet is the same as an infrastructure guy not knowing what happens in an NP junction.

sgarland · on Sept 3, 2021

> A developer not knowing what happens in an IP packet

I don't care if devs understand IP packets, TCP congestion control algorithms, or anything similarly low-level. If they do, that's awesome, but it's not expected. I do expect them to have a basic understanding of expected latencies for intra-DC vs. internet, why running Flask in production isn't a good idea, and if they're really sharp, an inkling of how Kubernetes networking works.

rswail · on Sept 9, 2021

I do. Why should devs not understand the environment in which they are developing? If they don't understand the concepts then they're not developers.

Why do we call these people "software engineers" if they're not engineers. There is endless documentation, there is all the resources needed. Allowing devs not to know what they're doing is a complete failure of anything approaching professionalism.

arwineap · on Sept 3, 2021

I think I understand your sentiment, but what's wrong with flask??

kazen44 · on Sept 3, 2021

i assume the poster means running flask in production without something like nginx in front to serve as the webserver.

the flask build in webserver is not production grade software in my opinion.

Cyphus · on Sept 3, 2021

It is also the opinion of the people who wrote the built-in webserver. If you try to run it in production mode, it'll emits this warning on startup:

> WARNING: This is a development server. Do not use it in a production deployment. > Use a production WSGI server instead.

I don't expect junior devs to have a sense for what is production-grade and what is not, but if they try to ship software that explicitly warns against being used in production, you've got a real liability on your hands.

civilized · on Sept 3, 2021

> Cause the whole article reads like "developers are whiny assholes who don't know shit about computers". And yes, it starts with an attack and ends with an attack too.

There is no attack in the text. There is a complaint that issues presented to operations often lack the basic level of detail and due diligence that they should have. You are free to disagree with the author's expected level of due diligence on issues; I think you'd be wrong to, but you can. However, it isn't an attack.

You perceive a non-attack as an attack, and respond with an explicit attack and name-calling. That actually makes you the aggressor.

Hmm, who is the asshole here?

burnished · on Sept 3, 2021

It read more like "these developers are asking poorly formed, difficult to answer questions", and frankly reminded me of a LOT of r/CodingHelp problems I've seen lately. Aside from that the author seems to repeatedly have empathy and admiration for developers but thinks that there is a systematic disfunction. There is definitely a little "old man shouts at clouds" too, but at least to me this article read as a legitimate discussion of some pain points, certainly not a hit piece.

Aeolun · on Sept 3, 2021

Hmm, it sounds like the opposite to me. I find it really hard to read because of the constant ‘devs are stupid’ comments.

There is a legitimate point buried there, but I just kept seeing red reading it.

_y5hn · on Sept 4, 2021

There is an organizational gap. It's mostly everywhere, and why only DevOps can bridge that gap. It's caused by missing technical leadership.

So anything said before that happens, and it'll happen for you partly especially at first, is only going to enrage.

derefr · on Sept 4, 2021

> 3. By your own affirmation you treat applications as black boxes that should be deployed using a runbook that should just work. This is ridiculous. Application's ownership is shared between everyone who works on it.

Maybe you're imagining the wrong scale of organization, here. An ops department isn't usually "embedded" into a development team (especially when there's more than one development team!); it's essentially a company-internal Platform-as-a-Service provider that the development team deploys their app to. From that PaaS's perspective, the app is an opaque workload. It's not ops' job to fix your app, any more than it's Heroku's job to fix your app.

> Maybe if the infrastructure people were involved in development discussion earlier, they'd be able to raise their hands and say: wait a minute, you're going to blow up our logs.

"Infrastructure engineer" and "operations" are entirely distinct roles. (Maybe not at a startup with five people, but once you get to even 30-or-so, there's a clear delineation.)

Infrastructure engineers are fundamentally software engineers, who happen to know a lot about infrastructure, distributed systems, networking, etc. They know about the operational constraints of software. And as such, they usually get put in charge of release management for the software—i.e. get put in the critical path for changes—because they have an eye for what changes to the software might break the deployment.

Ops people, meanwhile, aren't anything like "in the loop" of your software engineering process. Their day-to-day is spent managing servers and various well-known software systems running thereupon (e.g. Kubernetes, Nginx, RabbitMQ, etc.) They get handed opaque components (those well-known systems, and also your app), glue them together, automate "around" those components using runbooks, document how to get things back into working order when they crash, etc.

In small companies doing "DevOps", there are no real "ops people." There are only infrastructure engineers doing ops.

When a company becomes large enough, there is a transition point where managing the servers and all the standardized stuff running on them gets too distracting to your infra engineers, and they find it hard to help with the app, because they're too busy fighting fires and doing maintenance to the operational infrastructure. At that point, you hire actual "pure" ops people, and build an actual "pure" ops department, to take that load off the infra engineers' plate, so they can get back to their true comparative advantage, of guiding the app in an infrastructure-conformant direction.

But that separation necessarily means that you now have people managing your servers who aren't engineers. They're technicians.

-----

A labored metaphor, for your enjoyment:

Your ops department is like the service center for a motorpool. The people working there are automotive technicians. They are not automotive engineers. They can't make you a car, or change the components of your badly-designed car so that they're better-designed, or tell you what your weird prototype car means with its weird nonstandard error messages.

They can do standardized probes, get industry-standard error messages out, and do things about them. They can swap out broken components for newer releases of the same components. They can replace consumables. And they can notice if something is weird in a statistical sense (i.e. if some of the weird proprietary metrics the car keeps are not within historical reference range), and point that fact out to your automotive engineer.

But you've got to have those automotive engineers, on staff, in the development team, to deal with that information.