
Git submodules aren't bad in the sense of being buggy; they do what the documentation says they do.

I think they're difficult to use because they break my mental model of how I expect a repository to work. They create a new level of abstraction for figuring out how everything is related and what commands you need to keep things in sync (as opposed to the normal pull/branch/push flow). It's a whole new layer to the way your VCS works that the consumer needs to understand.

The two alternatives are:

1. Have a bunch of repositories with an agreed-upon file structure, a la ./projects/repo_1, ./projects/repo_2. You have a master repo with a readme instructing people on how to set it up. In theory, the disadvantage is that it puts more work on the end user to set things up manually, but the advantage is a simpler understanding of how everything fits together.

2. A mono repo. If you absolutely want all of the files to be linked together, why not just put them in the same repo rather than splitting everything out across many repos? You lose a little flexibility in being able to mix and match branches, but nothing a good cherry-pick can't fix when needed.

Either of these strategies solves the same problem submodules are usually used to solve, without creating a more burdensome mental model, in my opinion. So the question becomes: why use submodules and add more to understand when there are simpler patterns available?
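To make the extra ceremony concrete, here's a minimal sketch with throwaway local repos (all names and paths are invented for illustration): a plain clone of a repo that uses a submodule leaves the submodule directory empty until you run an extra command that the normal clone/pull/push flow doesn't have.

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"

# stand-ins for a library repo and an app repo that submodules it
git init -q lib
git -C lib -c user.name=t -c user.email=t@t commit -q --allow-empty -m init
git init -q app
git -C app -c user.name=t -c user.email=t@t commit -q --allow-empty -m init
git -C app -c protocol.file.allow=always submodule add "$tmp/lib" vendor/lib
git -C app -c user.name=t -c user.email=t@t commit -q -m "add submodule"

# a plain clone is NOT enough: vendor/lib comes back as an empty directory
git clone -q app checkout
ls -A checkout/vendor/lib        # prints nothing

# the extra step a consumer has to know about
git -C checkout -c protocol.file.allow=always submodule update --init
```

(`protocol.file.allow=always` is only needed because the example uses local paths in place of real remotes; with https URLs it's just `git submodule add` / `git submodule update --init`.)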



You completely missed the problem that submodules are actually supposed to solve, though. Using them for either of those cases would almost certainly be the wrong choice.

What they're really for is vendoring someone else's code into yours. They're still not great even at that, but sometimes they're the best option.
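For what that looks like in practice: after something like `git submodule add https://example.com/upstream/libfoo.git vendor/libfoo` (URL and paths hypothetical), git records the mapping in a `.gitmodules` file at the top of your repo, roughly:

```ini
[submodule "vendor/libfoo"]
	path = vendor/libfoo
	url = https://example.com/upstream/libfoo.git
```

The actual commit being vendored isn't in this file; it's stored as a pointer in the superproject's tree.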


Interesting. When you say that's the problem they're intended to solve, do you have a link to that as their intended use case?

I.e., is that "a" use case or "the" use case? I've never seen submodules used for that, only for internal dependency management, so if there's content about "what they're really for," I'd love to read more.


When I worked in games we did exactly this, but with Perforce. All of our libraries were in a vendor tree, all in source. We slapped our own build over the top of them and checked in the build artifacts (Perforce). If we needed an update, we updated their code and maybe our build script.

It'd be nice to use submodules for this but I gave up years ago.

The other big use is when you have your own libraries and you'd like to share them across projects. My friend does game jams and has his own simple engine; he versions the engine, adds capabilities to it, and uses it across game projects.


Check the first paragraph on this page: https://git-scm.com/book/en/v2/Git-Tools-Submodules


This ↑. Also, subtree is an interesting and relevant tool here too.


> vendoring someone else's code into yours

Vendoring usually implies some kind of snapshot copying of third-party code: a repo you depend on by value. That's actually what subtrees solve. If you buy that metaphor, then submodules, in contrast, express a dependency by reference.
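The by-value vs. by-reference distinction shows up directly in what git stores. A rough sketch with throwaway local repos (names invented; the plain `cp` stands in for what `git subtree` automates with history): a vendored copy is ordinary tracked files, while a submodule is recorded as a single commit id, visible as a mode-160000 "gitlink" entry.

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"

# a stand-in third-party library
git init -q lib
echo 'hello' > lib/util.sh
git -C lib add util.sh
git -C lib -c user.name=t -c user.email=t@t commit -q -m "lib code"

git init -q app
git -C app -c user.name=t -c user.email=t@t commit -q --allow-empty -m init

# by value: copy the files into your own tree
mkdir -p app/vendor-copy && cp lib/util.sh app/vendor-copy/

# by reference: a submodule records just a pointer to lib's commit
git -C app -c protocol.file.allow=always submodule add "$tmp/lib" vendor-ref
git -C app add vendor-copy
git -C app -c user.name=t -c user.email=t@t commit -q -m "both styles"

# vendor-copy is a normal tree (040000); vendor-ref is a gitlink (160000)
git -C app ls-tree HEAD
```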

tl;dr anyone vendoring with submodules is prolly doing it wrong


"Have a bunch of repositories with an agreed-upon file structure, a la ./projects/repo_1, ./projects/repo_2. You have a master repo with a readme instructing people on how to set it up. In theory, the disadvantage is that it puts more work on the end user to set things up manually, but the advantage is a simpler understanding of how everything fits together."

This is what I do. I have something like 17 code repos organized this way, plus lots of testing repos, plus an extra "hub" repo. (Credit to a friend for calling this repo "hub": short, to the point, requires no explanation.) The hub repo is a bunch of scripts and makefiles that configure everything and even clone the rest of the repos for me. It also has special grep and find scripts that run on all of the repos as their target. The hub repo just needs one env var to tell it where the root of all the repos is. Note that in the file system the hub repo is under the root and a sibling of the code repos, not their parent.

Each code or test repo has an "externs" subdir populated only with softlinks to the other repos on which it depends. The scripts configure this by default, but it is also straightforward to configure by hand if you want to do something non-typical. For example, if you want to have multiple versions of a repo checked out on, say, different branches/commits, you can do that and name each directory with a suffix of the branch/commit. Then the client repos can just point at the one they want. You can have all kinds of different configurations set up at any time. Doing this makes it straightforward to know what you depend on just by looking at the softlinks. There is no confusion at any time.
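A minimal sketch of the externs idea (all repo names invented): the dependency edges live in the filesystem as softlinks, so listing the externs dir tells you exactly what a repo depends on, and repointing a dependency is just re-linking.

```shell
set -e
root=$(mktemp -d)    # stand-in for the directory the env var points at

# sibling checkouts, including a second copy of repo_a on another branch
mkdir -p "$root/repo_a" "$root/repo_a-featurebranch" "$root/repo_b"

# repo_b's deps are just softlinks; point at whichever checkout you want
mkdir -p "$root/repo_b/externs"
ln -s "$root/repo_a-featurebranch" "$root/repo_b/externs/repo_a"

ls -l "$root/repo_b/externs"   # the dependency list, readable at a glance
```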

There are ways of configuring the system that do not even need all of the repos, which is ideal. Using the hub repo makefile I can clone the whole system with one make target (after cloning hub), build the whole system with one target, and test the whole system with one target. It is a testament to how well it works that I don't even know exactly how many repos I have. In short, it works great.


There's actually a third alternative, called Git X-Modules (https://gitmodules.com). It's a tool to relieve the pain submodules cause, as described in my comment above :-) In short, it moves all synchronization to the server side. So you can combine repositories together in any way you like and still work with a multi-module repository as if it were a regular one: no special commands, no specific course of actions, etc.


Maybe it's more helpful to think of submodules as a convention in .git for managing the commit ids of external repos that your main repo's code depends on, with some assumptions (i.e., each in its own subdirectory) and porcelain that may or may not match your workflow with respect to how that external code is integrated. It can get tedious if you have to deal with submodules of submodules and so on, but so would other ways of tracking the ids of transitive deps.
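That framing can be made concrete with throwaway local repos (names and paths invented): the superproject stores exactly one commit id per submodule, and "updating a dep" just means moving that id and committing the move.

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"

git init -q lib
git -C lib -c user.name=t -c user.email=t@t commit -q --allow-empty -m v1
git init -q app
git -C app -c user.name=t -c user.email=t@t commit -q --allow-empty -m init

# pin lib at its current commit
git -C app -c protocol.file.allow=always submodule add "$tmp/lib" vendor/lib
git -C app -c user.name=t -c user.email=t@t commit -q -m "pin lib"
old=$(git -C app rev-parse HEAD:vendor/lib)

# upstream moves on
git -C lib -c user.name=t -c user.email=t@t commit -q --allow-empty -m v2

# moving the pin: update the submodule's checkout, then record the new id
git -C app/vendor/lib fetch -q origin
git -C app/vendor/lib checkout -q FETCH_HEAD
git -C app add vendor/lib
git -C app -c user.name=t -c user.email=t@t commit -q -m "bump lib"
new=$(git -C app rev-parse HEAD:vendor/lib)
echo "$old -> $new"
```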



