"Ask HN: Is there any way to detect websites that are SEO-optimized on Google?"
Unfortunately, those seem to make it onto HN as well. This page is a perfect example. This is how the page starts:
Everything you need to know about monorepos,
and the tools to build them.
Understanding Monorepos
Monorepos are hot right now, especially among Web
developers. We created this resource to help developers
understand what monorepos are, what benefits they can
bring, and the tools available to make monorepo
development delightful.
There are many great monorepo tools, built by great
teams, with different philosophies...
I got so tired at this point that I stopped reading.
In my mind, I see the job description on Fiverr "Fast SEO writer wanted! Please write a 3000 word page about monorepos. Make sure to mention 'monorepos' and related terms like 'web developers', 'tools', 'development' etc frequently."
What a terrible take. As someone with an interest in monorepos, who's currently working with a company adopting Nx, I found the page interesting and compelling. I spent a good 30-60 minutes following links and investigating deeper.
You didn't even read past the fourth paragraph. You should be ashamed for derailing what could have been a productive discussion about an interesting topic with your shallow dismissal.
You must be primed to detect this; I did not find it bothersome at all. I actually like the page. I always wondered if a monorepo would be best for us, and now I have some extra arguments to say yes if the discussion arises.
I think... it has been. OTOH, if you scroll to the end, it's a collaboration by the developers or "community outreach" of those toolchains:
"The tools we'll focus on are: Bazel (by Google), Gradle Build Tool (by Gradle, Inc), Lage (by Microsoft), Lerna, Nx (by Nrwl), Rush (by Microsoft), and Turborepo (by Vercel). We chose these tools because of their usage or recognition in the Web development community."
You can scroll directly to the bottom and see the final comparison table.
Here's a screenshot for your convenience [1]
I agree with you that I prefer to get straight to the point, but this pet-peeve tangent doesn't seem to be a productive discussion of the actual merits of the tooling.
A monorepo is a way to morph a dependency management problem into a source control problem within your organization. Currently, FOSS tools solve neither of them.
Agree - the site focuses a lot on build, but ignores SCM tooling. At a certain size, git no longer works well as a pure monorepo without submodules, and these mega-companies have teams of people optimizing code-time vs. build-time checkouts of these monorepos to handle subsections.
FWIW I like that there isn't much gatekeeping in the industry and think it is a genuinely good thing, not only because my "hobby" (application security) relies on it :)
Also, with an unsolved problem you get paid to make whatever crazy attempt you want, which is absolutely a perk.
What does one thing have to do with the other? Monorepos are famously used at giant tech companies. Clearly they are being introduced by tech management in those cases, not a couple of people in their garage that don’t know what they are doing.
Could you elaborate? I use a monorepo at work and, if anything, dealing with 3rd party dependencies is easier because you don't have to coordinate upgrading versions across teams. For 1st party stuff in the repo we don't need to version libraries at all; if it all builds and passes all the tests, everything is good. The whole point is to use the whole tree from a consistent snapshot as a release, so you never worry about using a new first party library with an old first party binary.
If you're able to release everything as a consistent snapshot, it probably is not a monorepo. Instead it is just a normal repo containing a single big project.
Here's an example: let's say you've got a single product with a backend server, database, web frontend, and an iOS app. How would you release all those projects as an atomic unit?
If there's a new field in this release, the database schema needs to be changed on the servers before the backend is released. The backend needs to change before the frontends do. You have no control over the deployment speed of some of those components, so releasing them all at the same time is impossible.
Similar issues would happen if you update a 3rd party library and software using the new vs. old versions of the library is incompatible.
So the value of this monorepo isn't that you could cut a release for all of the components at once. It is that everyone doing development has a shared view of the current state of the system.
Yeah, that's what I meant by "morphing a dependency management problem into a source control problem". With a monorepo, dependency management is way easier! But, sometimes:
- git on a large repo is 100% pain, 0% fun. hg is slightly better, but not much.
- No versions means no prebuilt libraries, which translates to "you need a great build cache to keep build times reasonable".
- "Passes all the tests, everything is good": if only we could run all the tests on such changes.
- People hate coordinating on imported/pinned third-party dependency versions; sometimes you need tools for large-scale automated changes to make progress, but :(
- Similarly, not all places make all their code accessible to all engineers.
i.e. the source control problem is, sometimes, the harder one.
When I talk about monorepos in our company, I always try to make the distinction between JS monorepo tooling (say nx, turbo, or more low-level pnpm/npm/yarn workspaces) and real (?) monorepo tooling (say Bazel). Whereas the latter focuses more on dealing with a wide variety of source code types and artifacts, the former deals exclusively with NPM packages (which may sometimes include other stuff like Go/Rust). Does this distinction even make sense?
I don't know what it is, but it feels like the JS tooling community so often pretends that the rest of the world does not exist in their marketing. I find myself having to dig into docs or the GitHub repo before I figure out what language or ecosystem I'm even reading about.
somehow, the authors of this website neglect to even mention Nix. maybe that has something to do with the fact that this is a marketing page for the tool they named Nx (seriously?).
Yes, I think the JS ecosystem (which I am certainly part of) does sometimes ignore established terminology and solutions from other ecosystems. Although I must say that the JS ecosystem really has amazing tooling in certain areas (say prettier and eslint, which I'm missing in the Java world, for instance).
I was actually about to mention Nix in my post as well. Being a casual NixOS user myself I wonder if there is any kind of monorepo tooling based on Nix? Without ever having used Bazel myself I always thought of it as Nix-like.
Yes, there is! We (https://tvl.fyi) have been building Nix monorepo tooling for a while. You can see the current state of our repo at cs.tvl.fyi (+ reviews at cl.tvl.fyi and dynamic CI on tvl.fyi/builds).
We use josh[0] to let people clone "just in time" repos with the tooling needed for our setup[1]. We've also started a consultancy (tvl.su) that helps companies move onto this setup, and have customers going for it already.
The reasons we've not been making a lot of noise about this are that we have other large projects (like Tvix[2]) taking up time, and also that the integration with customers moving to this setup lets us more confidently figure out what parts we need to smooth out for "non-TVL" use-cases.
As for using Nix in a Bazel-like way: the common approach with Nix is to wrap language-specific build systems. This makes it possible for projects written in any language to be wrapped in Nix and integrated into a Nix-based monorepo (something that makes it distinctly more powerful than other solutions).
However, there's nothing in principle preventing Nix from dropping down a layer to the project level itself, and we've implemented (and use) this for Go[3] and Common Lisp[4].
The title should reflect what the website itself uses: "Monorepo explained". It certainly doesn't cover "everything you need to know about monorepos", and it glosses over the disadvantages and the things you need to watch out for.
The most important one being that you need to have an org/team structure that is set up to support it. You cannot say that it will make the org more efficient as organizations are not all the same. In order to push monorepos, the decision makers ought to know what those caveats and tradeoffs are, or they're going to be in for a sad time.
The site does do a good job of going over the tooling around it. This might be a matter of perception, but it seems the tooling is getting better, though it's not yet very mature. I see a few instances of "write your own" where the tooling is lacking, which is not a great way to go about things and, once again, makes assumptions about the nature of the orgs.
Something very important not covered by the article:
Is the tool going to help me detect when I accidentally bypass the declared dependencies?
For example, in a basic monorepo it's very easy to accidentally rely on the file layout on disk: require'ing a dependency that isn't in your package.json accidentally succeeds because it was hoisted as a dependency of a different package, and cp'ing files from `../some-other-project` should not be allowed but is possible. All of these invalidate some optimizations that monorepo tools want to make.
At scale with many contributors, it's HARD to teach and remember and apply all these rules, and so the monorepo tool really should help you detect and fix them (basically: fail the build if you mess up).
The article doesn't really make it clear which tools will do that for you. Pretty sure that Bazel does, Nx probably does, and lerna and turborepo don't.
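For the package.json half of this, here's a rough way to approximate the check in a JS workspace (just a sketch of mine; the packages/*/ layout and the choice of depcheck are assumptions, not something the article or these tools prescribe):
# depcheck's "Missing dependencies" output lists packages that are
# imported but not declared in that package's own package.json, i.e.
# ones that only resolve because hoisting put them in a parent
# node_modules. The packages/*/ layout is illustrative.
$ for pkg in packages/*/; do echo "== $pkg"; (cd "$pkg" && npx depcheck); done
Running that in CI and treating any "Missing dependencies" output as an error at least gets you the "fail the build if you mess up" part.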
In our monorepo at work, we have a few hundred devs working in there daily just fine. Linters check that no relative paths are allowed (so you can't rely on directory structure) and no absolute paths either. If you want to load a file in the tree, you must use a “blessed” constant or function to get the base path of your current code or some other code.
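If you want to approximate the import side of that rule with off-the-shelf tooling, here's a sketch (the specific ESLint rule and path below are illustrative, not our actual setup): forbid "../" imports so modules can't reach across package boundaries via the directory layout.
# Ban relative parent imports with ESLint's built-in no-restricted-imports;
# rule choice and the packages/ path are placeholders.
$ npx eslint --rule '{"no-restricted-imports": ["error", {"patterns": ["../*"]}]}' packages/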
TBF, if you have centralized dependencies or your dependency on another module affects your dependencies, you are probably doing it wrong. APIs between parts should be well defined and not require the entire dependency runtime to be loaded to interact with it.
I'm building a pretty big service that has four user-facing websites and even more backends (HTTP servers, highly bespoke job queues to run ML workloads, etc.)
This was an absolute nightmare to try managing in separate repos. I've finally settled on two monorepos: a Yarn/TypeScript/React frontend monorepo, and a Rust/Docker backend monorepo.
Does anyone have any advice on these? I sort of stumbled into this pattern on my own and haven't optimized any of it yet.
For Rust, I'm curious if folks have used Bazel for true monorepo build optimization. I don't want to rebuild the world on every push to master.
Likewise for the frontend, is there any way to not trigger Netlify builds for all projects if only one project (or its dependencies) change?
If the (web) API surface between your BE and FE is based on a schema (a.k.a. typed API, like with OpenAPIv3 or GraphQL) then I'd put them in a mono repo. This way you can recompile the FE automatically if the schema changed (usually an FE client lib is generated from the API schema). This helps discovering errors at compile time.
If your API is not schema-based, you have no way of knowing something broke without FE/UI testing.
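As a concrete sketch of the regeneration step (the generator, spec path, and output package below are placeholders of mine, not something the schema approach mandates):
# Regenerate the FE client from the backend's OpenAPI spec, then let the
# FE typecheck catch breaking API changes at compile time. Paths and the
# typescript-fetch generator are illustrative choices.
$ npx @openapitools/openapi-generator-cli generate -i services/backend/openapi.yaml -g typescript-fetch -o packages/api-client
In a monorepo the regenerated client lives right next to the FE code, so a breaking schema change fails the FE build in the same change that introduced it.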
Bazel should be smart enough to build only what changed. Is it possible that your CI doesn't cache previous runs? With Bazel I successfully used Google Cloud Build to achieve that, by storing the bazel-* folders to Google Cloud Storage as the last step of every build and downloading them as the first step.
The target bucket I use has a very short object lifecycle setting so I don't even have to clean up old artifacts manually.
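Roughly, the two steps can look like this. This is a sketch using Bazel's --disk_cache pointed at a plain directory (bucket name and paths are placeholders), rather than the exact bazel-* folder copy described above:
# Restore the cache saved by the previous build (first step).
$ gsutil -m rsync -r gs://my-bazel-cache/disk-cache /tmp/bazel-disk-cache || true
# Build with a local disk cache; unchanged targets become cache hits.
$ bazel build --disk_cache=/tmp/bazel-disk-cache //...
# Persist the cache for the next run (last step).
$ gsutil -m rsync -r /tmp/bazel-disk-cache gs://my-bazel-cache/disk-cache
Bazel also has a --remote_cache flag that points at an HTTP/gRPC cache service directly, which avoids the copy steps entirely.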
I'm using GitHub to run builds. I'll have to investigate your setup, because that sounds perfect. I don't know if GitHub can do that.
What do you do if you need an artifact that gets garbage collected? Manually force a rebuild of that SHA? Have things on continuous deploy and update regularly? I may need better CI/CD practices.
To be honest I never optimized my setup down to the single-artifact level. The way I set this up was that in a Google Cloud Storage bucket I have a subfolder for each build whose name is monotonically increasing (i.e. by including the time in the folder name, like 20220225_23_49_50/bazel-* folders). That way I can copy the latest build to the Cloud Build VM and still retain history. The object lifecycle settings I use keep artifacts around for 1 month, and I've never had the need to find something outside that time window.
There could be smarter ways to do this tbh, like having time&date_<Sha of commit>, but I haven't had the need for any of that yet.
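For completeness, here's a sketch of the retention piece (bucket name, cache path, and the 30-day policy are placeholders): a timestamped prefix plus a bucket lifecycle rule is what makes the cleanup automatic.
# Upload this build's cache under a timestamped prefix (same naming idea
# as above).
$ STAMP=$(date +%Y%m%d_%H_%M_%S)
$ gsutil -m cp -r /tmp/bazel-disk-cache "gs://my-bazel-cache/${STAMP}/"
# One-time setup: delete objects older than 30 days so old builds expire.
$ echo '{"rule": [{"action": {"type": "Delete"}, "condition": {"age": 30}}]}' > lifecycle.json
$ gsutil lifecycle set lifecycle.json gs://my-bazel-cache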
Here's a way you can do this with git. This trick relies on `git merge --allow-unrelated-histories`.
Assuming you have repos `foo` and `bar` and want to move them to the new repo `mono`.
$ ls
foo
bar
# Prepare for import: we want to move all files into a new subdir `foo` so
# we don't get conflicts later. This uses Zsh's extended globs. See
# https://stackoverflow.com/questions/670460/move-all-files-except-one for
# bash syntax.
$ cd foo
$ setopt extended_glob
$ mkdir foo
$ mv ^foo foo
$ git add .
$ git commit -m "Prepare foo for import"
# Follow those "move to subdir" steps for `bar` as well.
# Now make the final monorepo
$ cd ..
$ mkdir mono
$ cd mono
$ git init
$ touch README.md
$ git add README.md
$ git commit -m "Initial commit in mono"
$ git remote add foo ../foo
$ git fetch foo
$ git remote add bar ../bar
$ git fetch bar
# Substitute `main` for `master` or whatever branch you want to import.
$ git merge --allow-unrelated-histories -m "Import foo" foo/main
$ git merge --allow-unrelated-histories -m "Import bar" bar/main
# Inspect the final history:
$ git log --oneline --graph
* 8aa67e5 (HEAD -> main) Import bar
|\
| * eec0abd (bar/main) Prepare bar for import
| * 9741d6d More stuff in bar
| * 634ba3d Initial commit bar
* 43be6e9 Import foo
|\
| * d4805a0 (foo/main) Prepare foo for import
| * 4d2ca10 More stuff in foo
| * 72072a1 Initial commit foo
* bfcb339 Initial commit in mono
Do you think this will speed things up? I tried the above suggestion and it's already been running for four hours to merge two repos into one (3 years' worth of git history).
There are several ways to do this. Having extensively experimented with all of them I can say that the best are josh[0] (if you need external history continuity) and git subtree[1] (if you just need the commits to remain valid within your repository).
Thank you, josh looks interesting; I will need to look into this. At first read it looks like the end result is not a brand-new Git repo that combines/merges a bunch of repos. I am not sure if a proxy is going to work well with GitLab CI.
If you can define exactly what you mean by "keeping history" (i.e. which operations do you want to support, and in what context?) I might be able to tell you how to do it :)
I'm curious about that as well. Maybe it'd be possible to start a repo with a single empty commit, rebase everything on that in a separate branch for each of the git repos, and then merge them all into the master branch? Although some file renaming may be in order, otherwise everything ends up in the same folder.
I will have to look into this. I always understood that this won't generate a new repo but somehow combines the other repos. The idea is to merge the existing repos into a monorepo and then archive the old repos. I don't think that's possible when using subtrees.
Subtree merges a whole repo into the subdirectory of another repo. You can git blame yourself back to the original repo. Unlike submodules, there's nothing in the file tree which signifies there is something special about this directory (it searches commit messages to get that metadata). From the monorepo POV, archiving is just never doing another pull. Using submodules is a nightmare.
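For reference, a minimal sketch of the subtree route using the foo/bar example from upthread (branch names assumed to be main):
# Start the monorepo with one commit so subtree has something to graft onto.
$ mkdir mono && cd mono
$ git init
$ git commit --allow-empty -m "Initial commit in mono"
# Pull each repo into its own subdirectory; its commits stay part of
# mono's history. No "move everything into a subdir" prep commit is
# needed, since --prefix handles that. Add --squash if you'd rather
# import a single commit.
$ git subtree add --prefix=foo ../foo main
$ git subtree add --prefix=bar ../bar main
From the monorepo's point of view, archiving the old repos is then just never running `git subtree pull` against them again.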
Great information.
I've been building a monorepo of my own for a system that consists of RESTful and event-driven microservices. They are defined via OpenAPI and AsyncAPI respectively.
Does anyone know a tool to generate documentation for each type of service and put it together into one cohesive set of docs?
This website is amazing. It's doing a much-needed job on the internet. I feel there should also be a trunkbaseddevelopment.tools and a pairprogramming.tools :)
"Ask HN: Is there any way to detect websites that are SEO-optimized on Google?"
Unfortunately, those seem to even get into HN. This page is a perfect example. This is how the page starts:
I got so tired at this point that I stopped reading.In my mind, I see the job description on Fiverr "Fast SEO writer wanted! Please write a 3000 word page about monorepos. Make sure to mention 'monorepos' and related terms like 'web developers', 'tools', 'development' etc frequently."