
What's the value of a monorepo if developers only ever check out a small subset of it? Wouldn't multiple repos allow greater scale without any practical reduction in utility?

For example, all the localisation files could live in a separate project (if we accept the need to commit them at all). Some tools would be needed to deal with the inevitable problem that developer working sets would not align with project boundaries, but that seems like an easier job than making git scale while maintaining response times.




One of the benefits of monorepo is refactoring. You can just apply a renaming command across all of the files in the solution and all the related names are properly updated.

Not that easy to get this to work on multiple repos.
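
At its crudest, the "one command over all of the files" version is just a textual replace over the single checkout. A minimal sketch (identifier names are hypothetical, and a real IDE/LSP rename is smarter about semantics):

    # Purely textual rename across every matching file in one checkout.
    import pathlib
    import re

    ROOT = pathlib.Path(".")              # root of the monorepo checkout
    OLD, NEW = "ShopService", "ShoppeService"

    for path in ROOT.rglob("*.py"):       # pick the extensions you care about
        text = path.read_text()
        if OLD in text:
            path.write_text(re.sub(rf"\b{OLD}\b", NEW, text))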


I think it's important to understand how often the OP's company actually benefits from this: is it worth the trouble they've gone through over the years? I've checked the rest of their engineering posts, and none of them explicitly talk about the benefits of using a monorepo.

Monorepos, like basically all solutions, solve some problems and introduce new ones you didn't have before. Which trade-offs are worthwhile depends on the individual case.


All developer activities related to code (refactoring, editing, building, etc.) happen in a developer's workspace. A workspace can be composed of multiple git repo checkouts. An infrequent activity like renaming a lot of files can be done with minor inconvenience even if the files are spread across the workspace in different repos.

Only the code that is closely related – read/modified/built together frequently – should live in the same repo. If two pieces of code that don't have much to do with each other (that is, they are not read/modified/built by a developer in a single developer workflow frequently) live in the same repo, then they are just being a burden to the overall development lifecycle of devs who work on those disjoint sets of code.

Unrelated code in the same repo is a distraction for the developers who check out that code: it costs storage space, IOPS, CPU cycles and network bandwidth to lug that code around, load and index it in IDEs, track changes, and build and discard dependency graphs in build/dependency systems. Then, to deal with these issues, more complexity is incurred. Instead, it is better to optimise for the common case and deal with the complexity only in the rare cases.


How would you do refactoring over a monorepo if you have a sparse checkout?


CI/CD builds the entire project, so if you make a breaking change in a library the build will fail.


Sure, and then what? Considering that on big projects CI/CD may take up to an hour (on one of my projects it took 4 hours), that feedback loop would be great.


> on big projects CI/CD may take up to an hour

No, it may not. Perhaps occasionally it does. That is a bug that you must fix - a pipeline that takes even 30 minutes is horrifically slow.


One of the nice things about using Bazel is caching builds, avoiding rebuilding parts of the monorepo that are completely unaffected by someone's changes.
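
As a rough illustration (the cache path is made up), pointing Bazel at a disk or remote cache means a build after someone else's unrelated change is mostly cache hits:

    # Run a Bazel build with a local disk cache so targets untouched by
    # the current change are served from cache rather than rebuilt.
    import subprocess

    subprocess.run(
        ["bazel", "build", "//...", "--disk_cache=/var/cache/bazel"],
        check=True,
    )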


Sure, how did we ever manage to rename something without monorepos? Oh wait, maybe that's what this "versioning" thing is all about.


Right, it's "I have to send 5 PRs to 5 different repos, get them all cross merged, and then at the end it's wrong anyways so I have to start all over".

Multirepo management is extremely frustrating compared to "it's all in the same folder".


How many scenarios are there where the rename both matters (beyond taste and philosophy) and crosses interface boundaries?

Surely if it is an advantage to rename once in a ginormous, single code base there must also be leaky abstractions, poorly defined interfaces, god objects, etc present at the same time?

Whenever I find I need to rename anything across domains, it's a matter of updating the "core" repository and then just pulling the newest version.


A monorepo does not necessarily mean synced deployment, and even if it did, each deployment of a single component is usually racy with itself (as long as you're deploying to at least two nodes).

Which means that you've got to make independent, backwards-compatible changes anyway, and that for anything remotely complex you are better off with separate branches (and PRs/MRs).

Monorepos mostly have a benefit for trivial changes across the whole codebase (e.g. we've decided to rename our "Shop" to "Shoppe"), where with multiple repos it doesn't really take much to explain the change, but it is a lot of tedious work to get all the PRs up and such.


I think that works when you have large enough systems. I do not believe that "microservice" is the right size for repo splits.

Sometimes you have to ship a feature. Shipping that requires changing 3 parts of your app. A lot of times that _entire_ set of changes is less than 100 lines of code.

Having a full vision of what is being accomplished across your system in one go is very helpful for reviewing code! It justifies the need for changes, makes it easier to propose alternatives, and makes the go/no-go operation much more straightforward.

At a smaller scale, you often see the idea of splitting frontend and backend into separate repos. Of course you can ship an API and then ship the changes to the frontend. But for a lot of trivial stuff, having both in the same repo lets you actually see API usage in practice.

I think this is much more applicable for companies under 100 people though. When you get super large you're going to put into place a superstructure that will cause these splits anyways.


TBH, I am not a fan of the frontend/backend split either: ideally, you'd still be splitting per component/service, so the frontend and backend for a single thing could live in the same place. You get the benefit of seeing the API in use with each PR, without the other costs of a monorepo.

Most projects start out as monoliths (which is good) and splitting up on this axis is unfortunately very hard/costly.


This is why I'm a fan of tools like Bazel, where you can still get most of the tooling benefits from a single repo, but get testing speed benefits (and, if you roll that way, the design benefits from the separation) of a multirepo setup.

Unfortunately it's hard for me to recommend Bazel; it's such an uphill climb to get things working within that system.


What I took away from TFA is that monorepo management at this scale is “extremely frustrating” too.

ISTM that the complexity of managing any repo will be bounded by the size of that repo; a monorepo, being unbounded in size, will, in time, become arbitrarily complex to manage.

While a multirepo might occasionally require developers to apply changes to more than one repo at a time, I’ve never found this to be much more than a minor inconvenience; one that could be solved readily with simple tooling, if we had ever felt that the “problem” was even worth solving.
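
For what it's worth, the "simple tooling" can be as small as a loop over the checkouts in one workspace. A rough sketch (repo list, branch name and commit message are hypothetical):

    # Apply an already-prepared change as the same branch across several
    # repos checked out side by side, then push each branch for review.
    import subprocess

    REPOS = ["frontend", "backend", "shared-protos"]
    BRANCH = "rename-shop-to-shoppe"

    def git(repo, *args):
        subprocess.run(["git", "-C", repo, *args], check=True)

    for repo in REPOS:
        git(repo, "checkout", "-b", BRANCH)
        # ... edit files in `repo` here ...
        git(repo, "commit", "-am", "Rename Shop to Shoppe")
        git(repo, "push", "-u", "origin", BRANCH)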


At my $dailyjob we (kinda unfortunately) went with tons of repositories and libraries upon libraries, and the only sane way for me to make changes across multiple repos is to combine them into a single build locally. In .NET it's not that complex: remove a NuGet dependency from your project, add a reference to the locally checked-out repository, and make sure you're using the proper tags. It's mundane and happens to be frustrating, but I can make it work.


The process you're describing looks like some trial and error PRs...

Multirepo also allows you to roll out that change incrementally instead of big banging all the time.


Well for trivial changes it's even worse, cuz instead of "change 3 files across this boundary" it's "send 2 sets of changes to different places, babysit them until merging, then send a third PR to the integration point to use the updated version and get that merged".

Meanwhile reviewers don't have context about changes, so it's easier to get lost in the weeds.

It's not always this, of course. But I think that way too many tools are based on "repo" being the largest element, so things like cross-repo review are just miserable.


But in a monorepo you can almost never do the change in a single commit, as it would cause incompatibilities during gradual deployment.


Canva engineer here: we do compatibility checking of interservice contracts (Proto) to ensure that gradual deploys are always safe and can always be safely rolled back.


Google does such a thing in its monorepo quite routinely.


It takes X*N work to merge a change that costs X across N repos. 1 repo just takes X.

Then there's version management. Do all your repos use the same versioning scheme? "They should", but in the real world, they sometimes don't. Whereas if you only have 1 repo, you are guaranteed 1 versioning scheme, and 1 version for everything.

How do you know which version of what correlates to what else? With N repos, do you maintain a DAG which maps every version of every repo to every other repo, so that when you revert a change in one repo you can go back in history and revert all the other repos to their versions from the same time? Most people do not, so reverting a change leads to regressions. With a monorepo, there is only one version of everything and everything is in lock-step with everything else, so you can either revert a single change or do an entire rollback of everything, with one command.

How do you deploy changes? If each repo has an independent deployment process (if your repos even have a deployment process that isn't just waiting for Phil to do something from his laptop), are you going to deploy each one at a time, or all at once? What if one of them fails? How do you find out when they've all passed and deployed successfully? Pull up 5 different CI results in your browser every couple hours, and when one fails, go ask that team to fix something? If you only have 1 repo, there is 1 deploy process (though different jobs) and merging triggers exactly what needs to happen in exactly the right order.

The reason people use multirepos is they don't want to build a fully automated CI/CD pipeline. They don't want to add tests and quality gates, they don't want to set up a deployment system that can handle all the code. They just want to keep their own snowflake repo of code and deal with everything via manual toil. But at scale (not "Google scale", but just "We have 6 different teams working on one product" scale) it becomes incredibly wasteful to have all these divergent processes and not enough automation. Multirepo wastes time, adds complexity, and introduces errors.


> What's the value of a monorepo if developers only ever check out a small subset of it?

There are a couple of different ways of looking at this:

If the subsets are overlapping then the monorepo has had great value. Let's say you've got modules A, B, C and D. Dev 1 is interested in A and B, Dev 2 is interested in B and C, etc. In a multirepo world you have to draw a line somewhere and if someone has concerns overlapping that line then they're going to have to play the 'updating two projects with versioning' game.

The other way of looking at it is "data model" vs "presentation". Too often with git we confuse the two. Sparse checkout is a way of presenting the relevant subset to each user. It is nice to be able to consider that separately from whether we want to store all that data together.
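
For concreteness, with a recent git the "present only the relevant subset" workflow looks roughly like this (the repo URL and directory names are made up):

    # Clone a monorepo but only materialise a couple of directories in
    # the working tree, using partial clone plus sparse checkout.
    import subprocess

    def run(*args):
        subprocess.run(list(args), check=True)

    run("git", "clone", "--filter=blob:none", "--sparse",
        "git@example.com:org/monorepo.git")
    run("git", "-C", "monorepo", "sparse-checkout", "set",
        "services/payments", "libs/i18n")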


> For example, all the localisation files could live in a separate project (if we accept the need to commit them at all).

That's the wrong way to split files: it's as if you said let's split a monorepo so all the .sh files are in one repo, all the Makefiles are in another, all the .py files in yet another...

What you want is to split into "natural" repositories instead. Having 50 or 150 localisation files in an otherwise 40-file repo is not a big deal for anyone. Of course, how the split happens would have an outsized influence on the ergonomics.

Also note that localisation files are tightly linked to source code (in the way they use them, which is similar to the GNU gettext model, though they do use XLIFF): you put English strings in the source code, and when you change them (reword, fix typos, or outright change them), every translation needs its English source updated and any translation that may now be stale marked as such. In short, they are managing their translations as source code (even if translators would be using translation tools, akin to IDEs, for the actual work).


If you can check out your monorepo as if it were multiple repos, but also check it out as a full monorepo when you want to, that seems to me to offer more utility than splitting into multiple repos, after which you can never check it out as a monorepo.


In a world where submodules worked (side note: we use PlasticSCM, which has xlinks [0] that are substantially better than submodules, but Plastic has its own set of problems), you could have each "subrepo" as an independent repo, and then have a monorepo composed entirely of submodules.

If submodules worked.

[0] https://www.plasticscm.com/documentation/xlinks/plastic-scm-...
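
A sketch of that umbrella-repo idea, assuming submodules did work well (URLs and names are hypothetical):

    # Build a "monorepo" that is nothing but submodule references to
    # independently hosted repos (run inside a fresh git repo).
    import subprocess

    def git(*args):
        subprocess.run(["git", *args], check=True)

    for name in ["frontend", "backend", "shared-protos"]:
        git("submodule", "add", f"git@example.com:org/{name}.git", name)
    git("commit", "-m", "Compose umbrella repo from subrepos")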


> without any practical reduction in utility?

There is a massive loss in dependency management if you move to multiple repos.

Do polyrepo build systems exist that give you the same capabilities as bazel? Particularly with regard to querying your dependency graph.
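
As a point of reference, the kind of graph query meant here, in Bazel terms (target labels are hypothetical):

    # Ask Bazel which targets in the whole tree depend on one library,
    # i.e. the blast radius of a change to that library.
    import subprocess

    result = subprocess.run(
        ["bazel", "query", "rdeps(//..., //libs/auth:client)"],
        check=True, capture_output=True, text=True,
    )
    print(result.stdout)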


Atomic linearisable updates to your code.


The point is that developers can check out bigger subsets if they need to.



