I think people in this industry make using complicated, powerful paradigms part of their identity. They don’t feel like they’re important unless they’re reaching for N-tier architecture or exotic databases or lambdas or whatever else it is.
Most apps I’ve worked on could have been a monolith on postgres but they never ever are as soon as I’m not the sole engineer.
Architecture is a function of the number of people in the system. How do you manage 100 people in a monolith? 250? What if one group gets into a broken state, but another group needs to release in order to escape a broken state of its own?
Architecture is often solving a human problem. That said, too many teams break out way too early.
I’ve heard this and at some large engineering orgs, I’ve even occasionally seen it applied sensibly, but IME irrational hype driven architecture dominates:
15 microservices at companies with 5 engineers. Event sourcing where it made no sense. Apache Spark used on a project where all the data and processing fit on a laptop. Orchestration layers; backends for frontends; queues everywhere; the list is endless.
We agree there. Many teams went way too far in the other direction. There is a happy medium of one service per 1-3 teams with a few dedicated outliers such as third party connectors, and using libraries judiciously within larger codebases.
Just today I had to explain to a two-man team that using five Git repos and four different PaaS products for what is basically a single web control — not a whole site — is a bit nuts.
They mumbled something about “architecture” before returning to tinker on their Rube Goldberg machine… which is four months late and missing half the required features.
There just isn’t any common sense in engineering and everyone applauds monstrously complex solutions.
The one lone wolf suggesting “maaaaybe this just needs to be a single project in a single repo deploying to a vanilla web site” gets suspicious looks from managers that have already approved the project opex budget.
Often the way they attempt to manage 100 people is to split the monolith into a distributed monolith. Now you have all the same problems plus some new ones, but hey, we're "managing" the human problem.
And considering they somehow muddle along, with one person sometimes breaking everything for the other 99, and all the other problems, I think they could very well muddle along with a monolith. With 100 or however many programmers.
Yes, the distributed system with well-thought-out splits into services would be an improvement. But it's clearly not a necessity. So it remains that some places, at least, use it for some other reason - fad, cargo-culting, whatever.
Architecture should be solving the human or other problems, definitely. But how often it does... I guess each with their own experience.
Now, how do you explain to that executive that they will not get that feature in 3 months but rather in 5 years if the company survives that long with work screeching to a halt?
You have turned 20 teams working on 20 different focuses into 1 team. The focuses are interconnected but 90% independent. One team is working on billing. Another team is working on admin. Another team is working on a feature that has blocked five deals this quarter, totaling $800,000 in potential contracts. Another team is working on imports from external systems. Another team is working on exports to CSV and Looker and other platforms.
Yet another team is working on a feature that is only connected because they are the same user base but otherwise has no relation. Another team is directly tied to all of the same data, but could be flying on their own with a reasonable set of CRUD APIs.
These all get mashed into the same codebase early on because everyone is going as fast as possible with 8 developers two funding rounds ago.
I am not excusing systems that are a microservice to a developer, or worse, but these patterns evolved because there was a need.
> Now, how do you explain to that executive that they will not get that feature in 3 months but rather in 5 years
I have no idea, but I don't worry about it because I haven't been persuaded that will happen.
> You have turned 20 teams working on 20 different focuses into 1 team. The focuses are interconnected but 90% independent.
I would explain that what was originally presented to me as a monolith was later described to me as something else:
a family of interrelated services. Then I would say that's a different problem, invoke Conway's Law, and say that they can stay as 20 different teams. I would also say that doesn't necessarily mean 20 different network servers and 40 different tiers, which in my experience is how "micro-services" are typically envisioned.
> these patterns evolved because there was a need
I'm also not persuaded there was a [single] need rather than a network of interrelated needs, just as I'm not persuaded anyone here (including me) has a complete understanding of what all those needs were.
Note that I am usually on the other side of this argument, but mostly due to nuance.
In my world, monoliths are usually interrelated services that are in the same codebase and have poor boundary protection because they were started with teams that were later split along arbitrary boundaries. However, it's poorly factored because everyone has been rushing for so long that splitting it out is a giant cluster headache, and nobody can quite figure out where the bounded context is because it truly is different for each team.
So, nobody knows what all the needs are, because there's enough work for 100 people and 15 product managers, and only a handful of people in the organization have a mental model of the entire system because they were an early employee, engineer or otherwise.
So, can we agree on these architectural principles, except in edge cases:
1. A team must be in control of its own destiny. Team A's releases must be independent of team B's releases, and any interconnected development must be independently releasable (feature flags are one example, but other patterns exist). Otherwise, you get into release management hell.
2. Any communication between teams must be done by an API. That can be an HTTP API. That can be a library. That can be a stored procedure. But there must be a documented interface such that changes between teams are made obvious.
From there, I think there are options. You can have multiple teams that each contribute a library to a monolith that releases on every library change. You can have microservices. You can have WAR files in Java. You can have a monorepo where each team owns a directory. There are many options, some of which are distributed. However, without those two architectural principles, all development comes to a halt after 30-40 developers come on board. A rough sketch of the two principles follows below.
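The module, class, and flag names here are all hypothetical; the point is just that team B touches team A's billing code only through a small documented interface, and unfinished work ships dark behind a flag so each team releases on its own schedule.

```python
# Hypothetical sketch of the two principles: a documented interface between teams,
# and new behaviour gated behind a feature flag so releases stay independent.
from dataclasses import dataclass

@dataclass
class Invoice:
    customer_id: str
    amount_cents: int

class BillingApi:
    """The only surface team B is allowed to call; changes here are reviewed by both teams."""

    def create_invoice(self, customer_id: str, amount_cents: int) -> Invoice:
        return Invoice(customer_id=customer_id, amount_cents=amount_cents)

def flag_enabled(name: str) -> bool:
    # Stand-in for whatever flag service you use (LaunchDarkly, Unleash, a DB table...).
    return False

def checkout(billing: BillingApi, customer_id: str, amount_cents: int) -> Invoice:
    if flag_enabled("new-proration-logic"):
        # Team A's in-progress work is merged to main but stays dark until the flag flips,
        # so team B can keep releasing on its own schedule.
        raise NotImplementedError
    return billing.create_invoice(customer_id, amount_cents)

if __name__ == "__main__":
    print(checkout(BillingApi(), "cust-1", 4200))
```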
Microservices are used often because nobody managed to write a good set of books and blogs about how to keep 100 to 1000 developers humming along without the tests taking 4 hours to run and needing release managers to control the chaos. I don't dispute there are other ways to work, but the microservices crowd did the work to document good working patterns that keep humans in mind.
and it comes back to my original point: architecture is a function of the number of people in the system.
"So, can we agree on these architectural principals, except in edge cases:
1. A team must be in control of its own destiny...
2. Any communication between teams must be done by... an interface such that changes between teams are made obvious."
Sure. You'll get no argument from me on these points...
"it comes back to my original point: The architecture is the function of number of people in the system."
...or on this one.
I will grant that a division of code along the same lines as the division of labor is both sensible and inevitable. I will also grant that 100 or 250 or 2500 or more people are sometimes needed for a firm to achieve its objectives. Will you grant that sometimes, they aren't? That sometimes, the tail wags the dog and the staff and its culture determine the architecture rather than the reverse? That sometimes, adding more people to a slow project just makes it slower? I ask these questions because in my world, micro-services have typically been narrowly defined as network servers in Java, Python, or Rust, each interacting with a database (sometimes, the same database) through an ORM, and a rigid adherence to this orthodoxy has padded resource budgets both in terms of compute and people and has sapped performance both in terms of compute and people.
It depends. Let's take a SaaS engineering department, for example.
If your customer base is tripling year over year, driven by market demand, you can end up with feature requests that would take decades even with an engineering team ten times the size. Those often come from sales, on the back of failed deals where the product didn't yet meet the client's need.
If the goal is to keep the lights on and meet current customer need, you need a fraction of the total engineering team. However, we're on the message board of a venture capital site, so we can assume hypergrowth, as is the goal of a startup.
So, then, I'd argue that in growth scenarios, these people are required. That doesn't mean that they are being used the most efficiently, of course. I think this would be a main point of our disagreement.
> That sometimes, the tail wags the dog and the staff and its culture determine the architecture rather than the reverse?
I agree. And some of that is the ZIRP culture as well. And I think we agree on the rest as well. However, I think I am sensing a separate point where I don't know if we agree or disagree.
Our field has not created the tools to scale from 10 to 100 or 100 to 250 well. The best tools that have been created to date have taken microservices as part of the orthodoxy. I don't think this is the only way to do it - Robert Martin has a good article here from a decade ago: https://blog.cleancoder.com/uncle-bob/2014/09/19/MicroServic...
However, everyone escaped the Java ecosystem (because of Oracle and because of Spring, more than the language itself IMO), and solutions such as Rails plugins never developed the rest of the ecosystem around them the way AWS did with microservices.
And don't get me wrong - I'm currently living in nanoservice hell. We agree more than we disagree. However, I think we are looking at different constraints.
Were I a director of engineering at a seed-funded company that was starting to feel the pain of a monolith, I'd take one engineer and create a plugin architecture that enforces APIs, and build a pseudo-schema enforced by peer review and linting - performance exceptions must go through views (or stored procedures for creates and updates). It's painful to rename a table, but much less painful than moving it to another microservice. Then, I'd keep things in a monorepo as long as possible, at least until 100 people, with the rule that anything merged to main must be behind a feature flag first and any database migrations must be independent of code changes.
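Concretely, the plugin-plus-enforced-API piece of that plan might look something like this sketch; the module names, view names, and the lint convention are invented for illustration. Each team registers through a tiny core interface, and cross-team reads go through published views rather than raw tables, so renaming a table stays a one-team change.

```python
# Hypothetical sketch: a core app that loads team "plugins" through a narrow interface,
# with cross-team data access allowed only via published views (the "pseudo-schema").
from typing import Callable, Dict

# A lint rule (or code review) rejects SQL that touches another team's tables directly;
# only these published views are fair game for other teams.
SHARED_VIEWS = {
    "billing": "billing_invoice_summary_v1",
    "admin": "admin_user_roles_v1",
}

PLUGINS: Dict[str, Callable[[], None]] = {}

def plugin(name: str):
    """Decorator each team uses to register its entry point with the core app."""
    def register(fn: Callable[[], None]):
        PLUGINS[name] = fn
        return fn
    return register

@plugin("reporting")
def reporting_startup() -> None:
    # The reporting team may read billing data, but only via the published view.
    query = f"SELECT customer_id, total_cents FROM {SHARED_VIEWS['billing']}"
    print(f"would run: {query}")

if __name__ == "__main__":
    for name, start in PLUGINS.items():
        start()
```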
But I take for granted that growing projects will usually need more people and more quickly than the architecture can easily accommodate, and I think we disagree there.
I'm having a hard time following you. All I'm saying is, I believe that all other things being equal, a more simple architecture with fewer tiers, layers, network servers, and moving parts will tend to require fewer people than a less simple architecture with more tiers, layers, network servers, and moving parts. If you're saying that isn't true in a hyper-growth startup then I guess I'll have to take your word for it as I've never worked in a hyper-growth startup (only in glacial-growth non-startups).
It took me a bit to realize the author is selling me something. I guess good job there sir.
I’ve built a bunch of distributed architectures. In every case I did, we would have been better served with a monolith architecture and a single relational DB like Postgres. In fact I’ve only worked on one system that had the kind of scale that would justify the additional complexity of a distributed architecture. Ironically that system was a monolith with Postgres.
> In fact I’ve only worked on one system that had the kind of scale that would justify the additional complexity of a distributed architecture. Ironically that system was a monolith with Postgres.
This...doesn't seem to support your case at all? Maybe if you'd turned all those distributed architectures into monoliths you would've then thought the distributed architecture was justified (since you have a 1 of 1 case where that was the case).
I'm guessing the truth is somewhere in the middle, but unfortunately it's not very useful to the reader to say "well, some systems are better distributed, some systems are better as monolith". The interesting question is which is which.
> This...doesn't seem to support your case at all?
Hm ok well I am not sure what you mean but it's the internet so... <shrug>
What I am saying here is that you would be shocked at how far you can get with a simpler architecture. Distributed systems have massive trade offs and are the kind of thing you shouldn't do unless you are FORCED to.
> justify the additional complexity of a distributed architecture
Was your system built in a vacuum? I have to go back about 10 years for a system where we could choose for it not to be distributed (i.e. succeed or fail based on how well it handled messaging between partners)
So something goes wrong, and you need to back out an update to one of your microservices. But that back-out attempt goes wrong. Or happens after real-world actions have been based on that update you need to back out. Or the problem that caused a backout was transient, everything turns out to be fine, but now your backout is making its way across the microservices. Backout the backout? What if that goes wrong? The "or"s never end.
Just use a centralized relational database, use transactions, and be done with it. People not understanding what can go wrong, and how RDB transactions can deal with a vast subset of those problems -- that's the 21st-century version of not knowing how to safely use memory in C.
Yes, of course, centralized RDBs with transactions are sometimes the wrong answer, due to scale, or genuinely non-atomic update requirements, or transactions spanning multiple existing systems. But I have the sense that they are often rejected for nonsensical reasons, or not even considered at all.
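For contrast, the happy path with a plain centralized database is just this (a minimal sketch using psycopg 3 and a made-up accounts table): both writes commit together or not at all, and there is no backout-of-the-backout to reason about.

```python
import psycopg  # assumes psycopg 3; table and column names are made up

def transfer(dsn: str, src: int, dst: int, amount_cents: int) -> None:
    with psycopg.connect(dsn) as conn:
        # Everything inside this block commits atomically or rolls back on any exception.
        with conn.transaction():
            conn.execute(
                "UPDATE accounts SET balance_cents = balance_cents - %s WHERE id = %s",
                (amount_cents, src),
            )
            conn.execute(
                "UPDATE accounts SET balance_cents = balance_cents + %s WHERE id = %s",
                (amount_cents, dst),
            )
    # No compensating "backout" step is needed: a crash mid-way leaves no partial update.
```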
I mostly agree. But I work at a place where scale precludes that, and as a result relational concepts are sneered at. It turns out that pulling atomicity concerns into a pile of Java code leads to consistency problems…
That is remarkably dumb. Don't use an RDB if it doesn't fit your requirements. But "relational concepts" covers a lot of valuable ground, and rejecting them out of hand is a combination of ignorant and dumb.
Listen kids: Codd's 1970 paper on the relational model of data was absolutely revolutionary. Data processing involved low-level navigation of records. Separation of logical concerns from physical concerns (like 80-column cards) was regarded as crazy. Codd's message was this: a set-oriented data model, manipulated by a high-level, non-procedural language was the right way to process data. And yes, a lot of practical details need to be worked out, but it's still right.
And then followed decades of research into the implementation of relational database systems, which gave us working systems that fully realized that set of ideas. It was an incredibly bold research program, which delivered, and gave us incredibly valuable technology such as:
- SQL (love it or hate it, it's a mostly non-procedural high-level language that does a great job of data processing).
- Query optimization.
- Transactions, with levels of isolation and many different implementation approaches, most of which have proven practical at one time or another. Also, early work on distributed algorithms, e.g. two-phase commit.
Ignore these "relational concepts", and you will be reinventing it, badly, and at great expense.
I feel like basic 1:n join concepts have gotten lumped in with trivia about crazy left inner joins with subselects that coalesce and do string ltrimming, all using vendor extensions.
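To be concrete about the bread-and-butter part, this is the kind of 1:n join I mean, on an invented customers/orders schema, standard SQL only, nothing vendor-specific:

```python
import psycopg  # assumes psycopg 3; customers/orders schema is hypothetical

def orders_per_customer(dsn: str):
    with psycopg.connect(dsn) as conn:
        # Plain 1:n join: one customer row, many order rows, aggregated per customer.
        rows = conn.execute(
            """
            SELECT c.name, COUNT(o.id) AS order_count
            FROM customers c
            LEFT JOIN orders o ON o.customer_id = c.id
            GROUP BY c.name
            ORDER BY order_count DESC
            """
        ).fetchall()
    return rows
```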
If I write those articles, I admit I will have used ChatGPT to help me do the research. After a lot of background work, here's a snippet from what it wrote insofar as the Apple II relates to ORMs:
"So yes—tongue firmly in cheek—the Apple II is responsible for the Vietnam of Computer Science. It set a generation on an imperative path, which led to a decade of trying to bend SQL to our will with ORMs, with mixed success and plenty of scars."
Likewise, here's a snippet from what it wrote about the gender imbalance.
Let’s be clear: no single piece of hardware "caused" the gender gap in tech. But the Apple II symbolizes a turning point — when computing left institutions and entered homes, and when existing cultural biases about gender were quietly baked into the code of who got to belong. It's not about the machine itself — it’s about what we did with it.
> In the beginning (that is, the 90’s), developers created the three-tier application. [...] Of course, application architecture has evolved greatly since the 90's. [...] This complexity has created a new problem for application developers: how to coordinate operations in a distributed backend? For example: How to atomically perform a set of operations in multiple services, so that all happen or none do?
This doesn't seem like a correct description of events. Distributed systems existed in the 90s and there was e.g. Microsoft Transaction Server [0] which was intended to do exactly this. It's not a new problem.
And the article concludes:
> This manages the complexity of a distributed world, bringing the complexity of a microservice RPC call or third-party API call closer to that of a regular function call.
Ah, just like DCOM [1] then, just like in the 90s.
I only ever played with DCOM and Transaction Server, and never in production, but I do wonder what about that tech stack made it so absolutely unworkable, and such a technological dead-end? Did anyone ever manage to make it work?
What I remember is that there were social reasons, market reasons, and technical reasons that MTS didn't pan out. First, Microsoft was out-of-fashion in startup culture. Second, the exploding internet boom had little demand for distributed transactions. Third, COM was a proprietary technology that relied on C++ at a time when developers were flocking to easier memory-managed languages like Java, which was or at least was perceived to be more "open." I'm sure there were other reasons, but that's what looms in my mind.
When they get to what their implementation is, I'm not even sure what it is. Like, how is the following different from a library such as acts_as_state_machine (https://github.com/aasm/aasm)? Are they auto-running the retries in a background job which they handle (like a "serverless" background job)?
“Implementing orchestration in a library connected to a database means you can eliminate the orchestration tier, pushing its functionality into the application tier (the library instruments your program) and the database tier (your workflow state is persisted to Postgres).“
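My best guess at what "a library connected to a database" means in practice is something like this toy sketch (definitely not their actual internals; psycopg 3 and the bookkeeping table are my own invention): a decorator records each completed step in Postgres, so re-running the workflow after a crash skips steps that already finished.

```python
# Toy sketch, not the library's real internals: persist each completed step's output in
# Postgres so a re-run of the same workflow resumes from the last completed step.
import json

DDL = """
CREATE TABLE IF NOT EXISTS workflow_steps (
    workflow_id TEXT NOT NULL,
    step_name   TEXT NOT NULL,
    output      JSONB,
    PRIMARY KEY (workflow_id, step_name)
)
"""

def ensure_table(conn) -> None:
    # Run once at startup; `conn` is a psycopg 3 connection.
    conn.execute(DDL)
    conn.commit()

def durable_step(conn, workflow_id: str):
    """Decorator factory: wraps a step so its result is checkpointed in Postgres."""
    def wrap(fn):
        def run(*args, **kwargs):
            row = conn.execute(
                "SELECT output FROM workflow_steps WHERE workflow_id = %s AND step_name = %s",
                (workflow_id, fn.__name__),
            ).fetchone()
            if row is not None:
                return row[0]  # step already ran before a crash/retry; reuse its output
            result = fn(*args, **kwargs)
            conn.execute(
                "INSERT INTO workflow_steps (workflow_id, step_name, output) VALUES (%s, %s, %s::jsonb)",
                (workflow_id, fn.__name__, json.dumps(result)),
            )
            conn.commit()
            return result
        return run
    return wrap
```

If that's roughly it, then the difference from aasm would be that the "state machine" is the program itself, checkpointed step by step, rather than a status column you manage by hand. But that's my reading, not their documentation.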
"In the beginning (that is, the 90’s), developers created the three-tier application. Per Martin Fowler, these tiers were the data source tier, managing persistent data, the domain tier, implementing the application’s primary business logic, and the presentation tier, handling the interaction between the user and the software. The motivation for this separation is as relevant today as it was then: to improve modularity and allow different components of the system to be developed relatively independently."
Immediately, I see problems. Martin Fowler's "Patterns of Enterprise Application Architecture" was first published in 2002, a year that I think most people will agree was not in "the 90's." Also, was that the motivation? Are we sure? Who had that motivation? Were there any other motivations at play?
Well, Martin's book came out after we were doing these patterns in the 90s. My teams had that motivation - data people worked with logic people; logic people worked with UI teams. Separation of concerns and division of labour are, generally, good ideas.
ETA: one of the groups that was motivated was MS: use SQL Server + SP ; then COM in the Logic layer and then ASP in the UI.
Yes. I was happy when Fowler came out because we could all start using the same terminologies for the same things, and work from common concepts when solving the same problem.
(It didn't work out that way, though. It seemed like most people used Fowler as some kind of bible or ending point, when it should have been a starting point/ source of inspiration. Somehow it seemed to turn people's brains off, making them dumber and less insightful about the systems they were building.)
This is a good perspective - the books that were written in the early 2000s were documenting a lot of the practices that had evolved through the 90s, and giving them a nomenclature. There was a period from roughly 1995-2005 where it felt like we were evolving a real discipline of software engineering, with patterns of how to build things and a language to communicate with each other.
"ETA: one of the groups that was motivated was MS: use SQL Server + SP ; then COM in the Logic layer and then ASP in the UI."
I remember SQL Server and its Stored Procedures having a brief period of popularity because Visual Basic had been popular, in no small part due to non-university-trained people starting their careers by graduating from MS Excel VBA macros to VB UIs in enterprise client-server apps, with MSSQL SPs providing the logic layer. As the Dot-com boom took off, ASP made it possible to leverage that experience outside of the enterprise, building internet applications, which was where the money and excitement were. I remember MS pushing COM, but never getting very far with it as it quickly lost ground to Java. Java was exploding in popularity and was a rich and fertile valley where the "kingdom of the application layer" could be built, re-using paving stones from the "3-tier and n-tier" era that somewhat preceded it but was somewhat coincident with it.
I remember Microsoft being a huge marketing proponent of the 3-tier architecture in the late 90's, particularly after the release of ASP. The model was promoted everywhere - MSDN, books, blogs, conferences. At this point COM was out of the picture and ASP served as both front-end (serving HTML) and back-end (handling server responses).
If the claim is that Martin reported on, summarized, and put a name to a pattern that had been in use for some time, I would grant that, though 2002 seems very late to the party to me. I vividly remember "3-tier architecture" and "n-tier architecture" being quite current concepts by no later than 1999. Three years is a long time in tech as in life, and 2002 felt to me like a different era: post "Dot-com bubble", post EJB-excesses, post "GoF patterns", post-9/11, post Y2K, post "Bush v. Gore". By 2002, the number of tiers in your architecture was boring old news. "REST", "SOA", and "AJAX" were hot topics then, just as they would give way to "Big Data", "NoSQL", "microservices", and so on.
The reason this is important to me is because it raises within me the questions, "What was the '3-tier architecture' a reaction to?" and "Why was it so important circa 1997-1999 that there be 3 or more tiers?" I think the answer to the first question is, "The '3-tier architecture' was a reaction to the 'client-server (2-tier) architecture'." I think the answer to the second question is, "At least one of the reasons it was so important circa 1997-1999 to replace client-server architectures with 3-tier or n-tier architectures is that, for sociological and demographic reasons, there was an important new cohort of developers who wanted to use general-purpose programming languages like Visual Basic and Java rather than the SQL of the previous generation: young men who first learned BASIC when they were adolescent boys during the home computer revolution of the late 1970s and early 1980s." To the extent that's true, then it casts some doubt on the proposition that an "application tier" outside of database was based on merit. It raises the possibility that the motivation was less technological and more psychological than is usually acknowledged: as an attempt to hold the database and its alien programming model (SQL) at arm's length by people who started out with BASIC and never strayed very far from its familiar imperative model, eventually hiding it behind an ORM layer.
To the extent that's true, then it also casts some doubt on the claim in the article that "The motivation for this separation is as relevant today as it was then: to improve modularity and allow different components of the system to be developed relatively independently." I'm sure some people had that motivation and that's laudable, but it's not the whole story. There were other factors as well, some of them sociological and demographic. But, demographics change. The "Gen-Xers" who were 10 in 1980 and 27 in 1997 are at 55 now approaching retirement and are being replaced by subsequent generations who aren't hidebound by a formative experience that occurred decades before they were born.
Tying this back to the DBOS article, in general I liked it and consider it interesting technology. I just want to push back gently on a familiar tone I perceive, which tends to present whatever product or technology is being offered as somehow "standard", "accepted", "optimal", and "the natural and logical current end state of a tidy process of innovation which relegates earlier technologies as historical but no longer relevant, if it mentions them at all." The world and technology isn't that tidy, and a lot of old ideas are still relevant.
Workflows/orchestration/reconciliation-loops are basically table stakes for any service that is solving significant problems for customers. You might think you don't need this, but when you start needing to run async jobs in response to customer requests, you will always eventually implement one of the above solutions.
IMO the next big improvement in this space is improving the authoring experience. In short, when it comes to workflows, we are basically still writing assembly code.
Writing workflows today is done in either a totally separate language (StepFunctions), function-level annotations (Temporal, DBOS, etc), or event/reconciliation loops that read state from the DB/queue. In all cases, devs must manually determine when state should be written back to the persistence layer. This adds a level of complexity most devs aren't used to and shouldn't have to reason about.
Personally, I think the ideal here is writing code in any structure the language supports, and having the language runtime automatically persist program state at appropriate times. The runtime should understand when persistence is needed (i.e. which API calls are idempotent and for how long) and commit the intermediate state accordingly.
There seems to be a lot of negativity about this opinion, but I heartily agree with you.
Anytime you’re dealing with large systems that have a multitude of external integrations you’re going to need some kind of orchestration.
Anytime you perform a write operation, you cannot safely and idempotently do another IO operation in the same process without risking a failure of the entire process that cannot simply be retried.
Most people, when faced with this problem, will reach for some kind of queuing abstraction. The message fails, and you retry it automatically later. If you're a masochist you'll let it go to a dead letter queue and deal with it manually later.
Sagas are one way to orchestrate this kind of system design. Routing slips are another, with the benefit of no central orchestrator: state is just carried in the routing slip. Both are adequate, but in the end you'll end up with a lot of infrastructure and architecture to make it work.
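For anyone unfamiliar, a saga in its most stripped-down form looks something like this sketch (the step and compensation functions are hypothetical): every completed step registers an undo action, and a failure walks them back in reverse. Making each of those steps and undos durable and retryable is exactly where the infrastructure piles up.

```python
# Bare-bones saga sketch with hypothetical steps: each forward step registers its
# compensating action; on failure, completed steps are undone in reverse order.
from typing import Callable, List

def reserve_inventory(order_id: str) -> None: ...
def release_inventory(order_id: str) -> None: ...
def charge_card(order_id: str) -> None: ...
def refund_card(order_id: str) -> None: ...
def ship(order_id: str) -> None: ...

def place_order_saga(order_id: str) -> None:
    compensations: List[Callable[[], None]] = []
    try:
        reserve_inventory(order_id)
        compensations.append(lambda: release_inventory(order_id))

        charge_card(order_id)
        compensations.append(lambda: refund_card(order_id))

        ship(order_id)  # final step; no compensation registered here
    except Exception:
        # Walk back whatever completed, newest first. In a real system each of these
        # undos must itself be durable and retryable.
        for undo in reversed(compensations):
            undo()
        raise
```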
Systems like Temporal take a lot of that pain away, allowing developers to focus on writing business code and not infrastructural and architectural code.
So I am fully in on this new pattern for the horrible integrations I'm forced to deal with. Web services that are badly written RPC claiming to be REST, or poorly designed SOAP services. REST services that make me do a GET request for the object I just created, because REST purists don't return objects on creation, only location headers. Flaky web services that are routed over two VPNs because that's the way the infrastructure team decided to manage it. The worst case I ever had to deal with was having to process XML instructions over email. And not as an attachment, I mean XML as text in the body of the email. Some people are just cruel.
Give someone a greenfield and I’d agree, simplicity rules. But when you’re playing in someone else’s dirty sandpit, you’re always designing for the worst case failure.
And for the readers that are still wondering why this matters, I recommend this video from 7 years ago called “six little lines of fail”.
bought a ten-year-old company, a division of a public company, for some millions of dollars.
got an overly complex architecture of over 30 microservices, and over USD 20k in monthly cloud fees.
rewrote the thing into a monolith in 6 months. reduced the development team by half, server costs by 80-90%, latency by over 60%.
newer is not better. each microservice must be born from a real necessity backed by usage stats, server stats, and cost analysis. not by default, following tutorials.
It’s telling that you revised both application architecture and org structure to be simpler and more efficient.
Microservices are sometimes a reflection of the org; the separation of concerns is about ensuring everyone knows who’s working on what, and enforcing that in the tech.
(Not defending that, it’s often inefficient and can be a straitjacket that constrains the product and org)
I've seen the opposite: single monolithic codebase, where the different bits of functionality eventually end up tightly coupled, so it's actually pretty difficult to extract bits into a separate service later, so a different type of architecture isn't possible even if you wanted to split it up.
Why do that? Well, when a big Excel file is uploaded to import a bunch of data, or when some reports are generated, or when a crapton of emails is being sent, or when batch processes are sending data over to another system, both the API and the UI become slow for everyone. Scaling it vertically would be the first thought; for a plethora of reasons, that doesn't work. There are bottlenecks in the DB connection pool, bottlenecks in the HTTP request processing, bottlenecks all over the place that can be resolved (for example, replacing HikariCP with DBCP2, oddly enough), but each fix takes a bunch of time and it's anyone's guess whether something will break.

Updating the dependencies of the monolith is also a mess: something like bumping the runtime version leads to all sorts of things breaking, sometimes at compile time, other times at runtime (which leads to out-of-date packages being kept around). Definitionally, a big ball of mud.
Can you just "build better software" or "write code without bugs"? Well, yes, but no.
I've seen plenty of cases of microservices also becoming a chatty mess, but what strikes me as odd is that people don't attempt to go for something like the following:
* keep the business functionality, whatever that may be, in one central service as much as possible
* instead of chopping up the domain model, extract functionality that pertains to specific types of mechanisms or workloads (e.g. batch processing, file uploads or processing, data import etc.) into separate services, even if some of it might still use the main DB
Yet, usually it's either a mess due to shoving too many things into one codebase, or a mess due to creating too much complexity by attempting to have a service for every small set of entities in your project ("user service", "order service", ...).
> single monolithic codebase, where the different bits of functionality eventually end up tightly coupled, so it's actually pretty difficult to extract bits into a separate service later, so a different type of architecture isn't possible even if you wanted to split it up.
In my experience, this is the time to refactor the monolith, not try to introduce microservices.
> grug wonder why big brain take hardest problem, factoring system correctly, and introduce network call too
Document upload, sending data to the DB, and rendering the UI are some of the primary functions of SharePoint, all done within the context of the same ASP.NET worker process without slowing the UI down for everyone.
It's inherently async & multithreaded, of course.
What you're describing sounds like a single threaded sync solution, which we'd all agree will cause UI lag and/or timeout. But it doesn't have to be that way with a monolith.
> But it doesn't have to be that way with a monolith.
Tell that to some old Java Spring app running on JDK 8 (though I've also seen worse). It would be cool if I didn't see most of the software out there breaking in interesting ways, but it's also nice when you at least can either limit the fallout or scale specific parts of the system to lessen the impact of whatever doesn't behave too well, until it can be addressed properly (sometimes never).
Whether that's a modular monolith (same codebase, just modules enabled or disabled during startup based on feature flags), microservices, anything really, isn't even that relevant - as long as it's not the type of system that I've also seen plenty of, "singleton apps", that can only ever have one instance running and cannot be scaled (e.g. if you store sessions or any other long lived state in RAM, if you rely on data being present on a file system that's not mounted over the network and can't be shared with other instances etc.).
Your suggestion aligns well with how Ruby on Rails tends to handle this. All of the stuff in your list of workloads would be considered “jobs” and they get enqueued asynchronously and run at some later time. The jobs run in another process, and can even be (often are) on another server so it’s not bogging down the main app, and they can communicate their success or failure via the main database.
I run the tech org for an insurance company. We've got a team of about 30 folks working on a system that manages hundreds of thousands of policies. Apart from the APIs we call (a couple of them internal, belonging to our Analytics team), it's one big monolith, and I don't see that changing anytime soon. At our scale, a monolith is more stable and performant, easier to understand, easier to deploy, easier to test, easier to modify, and easier to fix when something goes wrong.
I haven't noticed the same trend or evolution of application tiers; perhaps we live in different echo chambers. Teams using microservices need to evaluate whether it's still a good fit considering the inherent overhead it brings. Applying a bandaid solution on top of it, if it isn't a good fit, only makes the problem worse.
I think the term "microservice" is useless here. It doesn't matter if you run your backend logic in a monolith or in some complex microservice architecture, because both will depend on external runtime dependencies. Especially the smallest of startups will rely heavily on external APIs to connect their monolith (which is, in enterprise terms, probably a single microservice) to external services such as Stripe, to some product analytics tool, to a CRM, to OpenAI, and so on.
I don't know anything about this post's solution, but if it delivers on the idea to not having to worry that much about failed calls to 3rd parties (or even my own database!), I'd like it a lot. Why would you call that a bandaid solution?
Following the Getting Started[0] section it seems like DBOS requires the configuration of a Postgres-compatible database[1] (NOTE: DBOS currently only supports Postgres-compatible databases.). Then, after decorating your application functions as workflow steps[2], you'll basically run those workflows by spawning a bunch of worker threads[3] next to your application process.
Isn't that a bit... unoptimized? The orchestrator domain doesn't seem to be demanding on compute, so why aren't they making proper use of asyncio here in the first place? And why aren't they outsourcing their runtime to an independent process?
EDIT:
So "To manage this complexity, we believe that any good solution to the orchestration problem should combine the orchestration and application tiers." (from the article) means that your application runtime will also become the orchestrator for its own workflow steps. Is that a good solution?
EDIT2:
Are they effectively just shifting any uptime responsibility (delivery guarantees included) to the application process?
The point is that your application already has uptime responsibilities, so why not build the orchestration right into it instead of adding another service that will have its own uptime responsibilities?
Well, my application servers are usually designed stateless to provide sub-second responses, whereas orchestration workflows can take up to hours. I usually scale my workers differently than my REST APIs, as their failure scenarios look quite different: an unresponsive orchestration engine might just delay its jobs (inconsistent, outdated data), whereas an unavailable API won't provide any responses at all (no data).
How'd that work in a microservice architecture anyway? Does each service have some part of the orchestration logic defined? Or will I end up writing a separate orchestration engine as one service anyway? Wouldn't that then contradict the promise of the article?
It is a library you import to annotate your code. Most APIs do have a service layer that needs some form of orchestration. This library makes the service layer automatically orchestrated.
Your workers too can import the library and have embedded orchestration.
In-process durable execution gives you freedom for a lot longer before you have to bring in Airflow or Temporal. You just need Postgres (which you likely already have).
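Roughly, usage looks like the sketch below, going by the decorator style in DBOS's Python docs; the exact names and configuration are version-dependent, so treat it as illustrative rather than definitive.

```python
# Hedged sketch of the decorator style DBOS documents for Python; exact names and
# configuration may differ by version, so treat this as illustrative only.
from dbos import DBOS

DBOS()  # connection settings (a Postgres URL) come from its configuration; see their docs

@DBOS.step()
def call_partner_api(order_id: str) -> str:
    # A step's completion is checkpointed in Postgres, so a crash after this point
    # should not re-run it when the workflow is recovered.
    return f"partner-ref-{order_id}"

@DBOS.step()
def write_ledger_entry(ref: str) -> None:
    print(f"ledger entry for {ref}")

@DBOS.workflow()
def fulfill_order(order_id: str) -> None:
    ref = call_partner_api(order_id)
    write_ledger_entry(ref)

if __name__ == "__main__":
    DBOS.launch()
    fulfill_order("42")
```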
> By persisting execution state to a database, a lightweight library can fulfill the primary goal of an orchestration system: guaranteeing code executes correctly despite failures. If a program fails, the library can look up its state in Postgres to figure out what step to take next, retrying transient issues and recovering interrupted executions from their last completed step.
program =
email all customers
failure =
throttled by mailchimp
Your program would gracefully handle this though, because the workflow would fail and it would retry.
This requires you to write your program in a way that does not trigger mailchimp's throttling -- a problem that would happen no matter how you write your app and a problem you have to deal with no matter how you run your app.
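One way to structure it (hypothetical helpers, not any particular library's API) is one recorded step per customer plus backoff on the throttled call, so a mid-run throttle never re-emails people who already got the message:

```python
# Hypothetical sketch: record one step per customer so a throttled run can be resumed
# without re-emailing anyone, and back off when the provider throttles.
import time

def already_sent(customer_id: str) -> bool:
    ...  # e.g. SELECT 1 FROM sent_emails WHERE customer_id = %s

def mark_sent(customer_id: str) -> None:
    ...  # e.g. INSERT INTO sent_emails (customer_id) VALUES (%s)

class Throttled(Exception):
    pass

def send_via_mailchimp(customer_id: str) -> None:
    ...  # raises Throttled when the provider returns a rate-limit error

def email_all_customers(customer_ids: list[str]) -> None:
    for customer_id in customer_ids:
        if already_sent(customer_id):
            continue  # idempotent: a retried run skips completed sends
        for attempt in range(5):
            try:
                send_via_mailchimp(customer_id)
                mark_sent(customer_id)
                break
            except Throttled:
                time.sleep(2 ** attempt)  # back off instead of hammering the API
        else:
            raise RuntimeError(f"still throttled after retries for {customer_id}")
```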
> At the technical and organizational scale of modern enterprises, the complexity of orchestrating distributed systems is unavoidable.
*citation needed
We continue to make things much more complex than they need to be. Even better when NON "enterprise" applications also buy into the insane complexity because they feel like they have to (but they have nowhere near the resources to manage that complexity).
That would be difficult (impossible?) to prove. But if the claim is not true, a much easier thing would be to show an example of a large enterprise which did not introduce distributed processing. Is there one? I've not heard of that. Even basics like auth eventually require SSO if you want to preserve your sanity, and that's a distributed system.