I think SQLite is fantastic and Richard is obviously a genius. But I always found his obsession with single binary monoliths odd.
As you mentioned it goes against the Unix philosophy of do one thing and do it well. To me it's obviously cleaner to divide a system into components that can later be swapped or modified independently.
I have no idea why Richard focuses on such things but:
I'm old enough to remember when developers spent time making sure they could plow all of their build assets into a single binary distributable. It often had kind of a zest to it, and when you dealt with software that had directories full of stuff, it looked both "corporate" and "sloppy".
I've never quite gotten over the feeling that the piles of dynamically linked libraries haven't helped things. I know objectively that there's a notion that you can update libraries instead of applications, but it feels like it makes updating applications fragile and you inevitably end up in some kind of dependency hell. "But breaking insecure applications is a feature!" I mean, okay, I guess. But I still need something to work and don't want to spend all day fixing broken layers of dependent stuff. If I have to do that, I may as well just update a single binary.
Go seems to have come back around to this style of thinking, and in a sense container images and JAR files are often trying to replicate what it's like to just download a binary, chmod +x it, and execute it.
Directories full of stuff is similar to websites with URL paths like "/site.php?page_id=18231238". Or even better when subdomains get involved and it looks like "secure3.action.domain.com/admin.php?page=123424". It technically works but is a bit ugly.
Also another web analogy might be dynamic linking being similar to microservices. People want to build and ship smaller components that can be swapped out independently. It works but does seem to make updating and testing fragile. You can test heavily at the boundaries but there's still kind of an "air gap" between the main app and the lib/microservice. If you want to be really sure there's no breakage, you have to test the whole thing, at which point you might as well just ship a monolith.
>Directories full of stuff is similar to websites with URL paths like "/site.php?page_id=18231238". Or even better when subdomains get involved and it looks like "secure3.action.domain.com/admin.php?page=123424". It technically works but is a bit ugly.
OOC, why does this stand out for you? Just to explain my curiosity, I've worked on Mac since I was a kid starting with System 6 and then going to OS X when it came out, so Apple's "your program is all in that file" just kind of made sense to me and it was really convenient to just drag a file to the trash and the app is _mostly_ gone, minus a few .plist and other config files in ~/Library.
But I _like_ the old forums and sites that still show stuff like page_id=N; for the boards and forums I go to, it's very useful for jumping around long topics, or you can play with it when you're shitposting.
Plus most modern browsers truncate or hide the full URL anyways; I dislike this feature personally, but at least Safari's concise tabs are a good balance for someone like me.
Fair enough, for message boards it's fine. I think I was mostly just thinking about old/sloppy WordPress sites where you might click on "about us" and it takes you to ?page_id=1234. Feels like a lack of attention to detail compared to /about-us. Similarly, a binary surrounded by a bunch of folders and dlls feels like a lack of attention to detail (and thus kind of "corporate" as the previous poster mentioned).
Dynamic linking is the bane of backwards compatibility.
Now everything is containers, AppImages, Flatpaks, Docker images and so on, and all they do is pack all the libraries a binary may need, in a more wasteful and inefficient format than static linking.
In that sense, we truly have the worst of both worlds.
The situation on windows is fascinating: everyone links these libraries dynamically, and yet, there are about two hundred of them on my system, every application using its own uniquely outdated version.
In my practical experience the set of things that can go wrong if you link apps dynamically is much larger than the problems that arise when they are statically linked.
For one, it is more complicated to keep track of which of the many shared libraries on a typical system are used by which application. It is common for the same library to occur multiple times in different versions, built by different people/organizations and residing in different directories.
Quick, without looking: which TLS library do your network-exposed subsystems use, which directories are they in, and where did you install them from? When you do go to look: did you find what you expected?
Have a look at all the other shared libraries on your system. Do you know which binaries use them? Do you know which versions of which libraries work with which binaries? Do you trust the information your package manager has about version requirements? Does it even have that information?
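To be fair, the answers can be dug out, roughly like this on a typical Linux box (the binary and library paths are just examples, and the package query differs per distro):

    # what a given binary will pull in at load time
    ldd /usr/sbin/nginx

    # which installed package owns a given shared library
    dpkg -S /usr/lib/x86_64-linux-gnu/libssl.so.3    # Debian/Ubuntu
    rpm -qf /usr/lib64/libssl.so.3                   # Fedora/RHEL

But my point is that hardly anyone actually does this for everything they run.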
Then there's the problem of what happens when you upgrade. The servers you run might have a rigorous battery of tests. But now you are running them with libraries they were not tested against. Sure, most of the time it'll work. But you don't know that. And you have no way of knowing that without downloading, building and running the tests. Or have someone else do that.
I've been in the situation where someone inadvertently updated a library in production and everything came crashing down. Not only did it take down the site, but it took a while to figure out what happened - both because the person who did it wasn't aware of what they'd done, and because the problem didn't manifest itself in a way that made the root cause obvious.
The clearest risk with statically linked binaries is that they are not updated when there is, for instance, a security problem. But in practice I find that easier to deal with, since I know what I'm running, and for anything important, I'm usually aware of what version it is or when I last checked for updates/problems.
> For one, it is more complicated to keep track of which of the many shared libraries on a typical system are used by which application. It is common for the same library to occur multiple times in different versions, built by different people/organizations and residing in different directories.
That's not common at all, man. I strongly recommend you don't do that.
> Quick, without looking: which TLS library do your network exposed subsystems use, which directories are they in and where did you install them from.
Openssl 3.x.y. It's /usr/lib64/openssl.so or similar. They are installed from my distro's repository.
> When you do go to look: did you find what you expected?
Yes. Openssl 3.1.1-r2. The OpenSSL binaries are actually named /usr/lib64/libssl.so and /usr/lib64/libcrypto.so. Upstream version is 3.1.2. There have been two low priority CVEs since 3.1.1 (never change openssl...) and my distro has backported the fixes for both of them into 3.1.1-r2.
> Do you know which versions of which libraries work with which binaries?
What do you mean "which versions of which libraries"? There's only one version of each library. If the package manager needs to keep an old version of a library around, it gives a loud warning about it so I can either fix the problem or ignore it at my own peril.
Those two .so files (libssl.so and libcrypto.so) are used by postfix, dovecot, and nginx. They are also linked by opendkim, spamassassin and cyrus-sasl, but those don't have open ports on the internet, so they don't really count. OpenSSH can optionally link to openssl; as it happens, my openssh does not link against a crypto library, openssl or otherwise. It just uses openssh's built-in crypto schemes.
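In case it matters, verifying that list is quick (the library path is from my box, and lsof needs root to see other users' processes):

    # which running processes have libssl mapped right now
    sudo lsof /usr/lib64/libssl.so.3

    # and, per daemon, what its binary links against
    ldd /usr/sbin/dovecot | grep -E 'libssl|libcrypto'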
> Do you trust the information your package manager has about version requirements?
Yes.
> Does it even have that information?
... wat..? Of course it does?
> I've been in the situation where someone inadvertently updated a library in production and everything came crashing down. Not only did it take down the site, but it took a while to figure out what happened - both because the person who did it wasn't aware of what they'd done, and because the problem didn't manifest itself in a way that made the root cause obvious.
I've been in the situation where a security guard at my last job inadvertently discharged his service revolver into a Windows machine, and it crashed. That doesn't mean I stopped using Windows. (I mean, I did stop using Windows...)
That's genuinely just not a problem that I've had. Not since 2004 and all the C++ programs on my computer broke because I force upgraded from GCC-3.3 to GCC-3.4 and the ABI changed. Or that time in 2009 where I installed a 0.x version of Pulseaudio on my gaming machine. Or that time I replaced OpenSSL with LibreSSL on my personal computer. If your server takes a shit because somebody was fucking around doing stupid shit on prod, and you do root cause analysis and come up with a reason that it broke other than, "employee was fucking around and doing stupid shit on prod" and the recommendation is something other than "don't fuck around and do stupid shit on prod" I don't know what to tell you. Dynamic linking isn't going to stop a sufficiently determined idiot from bringing down your server. Neither will static linking.
> What do you mean "which versions of which libraries"?
If you upgrade a shared library to fix a problem, how do you know that the application has been tested against the fixed version?
And no, your package manager won't know.
Congratulations on a) not having multiple installs of shared libraries on your system and b) knowing which version you have. Knowing this isn't very common.
> If you upgrade a shared library to fix a problem, how do you know that the application has been tested against the fixed version?
Distros like Debian solve that problem by not upgrading. The only things deemed worthy of "fixing" are security issues, and they are fixed by backporting the fix (only) to the existing shared library. Thus no APIs (of any sort - even unofficial ones like screen scraping) are upgraded or changed, so no testing is necessary.
And thus:
> And no, your package manager won't know.
It doesn't have to know, because the package manager can assume all releases for Debian stable are backward compatible with all the packages in that release.
A lot of the noise you see on HN comes from people using distros on their desktops. To them a distro is a collection of pre-packaged software with all the latest shinies, which they upgrade regularly. But Linux's desktop usage is 3%, whereas its server usage is claimed to be over 95% (which eclipses Windows desktop share). Consequently distros are largely shaped not by the noisy desktop users, but by the requirements of sysadmins. They need a platform that is guaranteed both stable and secure for years. To keep it stable, they must solve the problem you describe, and for the most part they have.
If you're linking to OpenSSL, it's scary to have that upgraded from under you. Maybe it got better in the 3 series, but I seem to recall pretty much all the important 1.0.1? releases were something you'd need to mitigate a big vulnerability, but would also have API changes that would break your application if you were trying to do newish things. Sometimes justified, but still a pita.
Somehow this makes me think of games Back In The Day where you could simply replace your crosshair by editing a bitmap file, versus now where everything's so much more locked-down behind proprietary container formats and baked-in checksums, etc.
Monolithic builds are great if you have no control over the deployed environment (i.e. desktop apps sans OS-supplied libs). They're worse if you do control the environment and how the upgrade paths get followed.
Doesn't it seem that more and more people are just given access to some managed environment they have little control over anyway?
I feel like sometimes the dependency on dynamically linked stuff is akin to "well transistor radios are great if you don't care about soldering on fresh vacuum tubes like a real radio person would."
A dynamically linked library need only have one image of itself in memory.
If you are running a process that, for example, forks 128 copies of itself, do you want each of those processes to have its own separate copy of every library it uses in memory?
That's probably the biggest benefit. But it also speeds up load time if your executable doesn't have to load a huge memory image when it starts up, but can link to an already in-memory image of its libraries.
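You can see the sharing directly on Linux; something like this works (assuming a few nginx workers are running and using libcrypto as the example; the exact field layout of smaps varies a little by kernel):

    # (as root) every nginx process maps the same libcrypto file; the read-only
    # pages behind those mappings are shared between them, not copied
    for pid in $(pgrep nginx); do
        grep libcrypto "/proc/$pid/maps" | head -n 1
    done

    # smaps splits one of those mappings into shared vs private pages
    grep -A 12 libcrypto "/proc/$(pgrep -o nginx)/smaps" | grep -E 'Shared|Pss'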
The only real downside is exporting your executable into another environment where the various dynamic library versions might cause a problem. For that we have Docker these days. Just ship the entire package.
> If you are running a process that, for example, forks 128 copies of itself, do you want each of those processes to have its own separate copy of every library it uses in memory?
> it also speeds up load time if your executable doesn't have to load a huge memory image when it starts up
I'm not sure about Windows and Mac, but Linux uses "demand paging" and only loads the used pages of the executable as needed. It doesn't load the entire executable on startup.
You'd love NixOS. It gives you the flexibility of dynamic libraries with per-app isolation and full dependency bundling, and it's less janky than snap or flatpak.
Slight tangent: it bugs me when people say "it goes against the Unix philosophy" as though The Unix Philosophy were some kind of religious text. Not everything should be a pluggable Unix executable, and Fossil making non-Unixy choices doesn't reflect poorly on it. They just chose a different philosophy.
I’m so thankful that Rust is helping popularize the solo exe that “just works”.
I don’t care if a program uses DLLs or not. But my rule is “ship your fucking dependencies”. Python is the worst offender at making it god damned impossible to build and run a fucking program. I swear Docker and friends only exist because merely executing a modern program is so complicated and fragile it requires a full system image.
> I’m so thankful that Rust is helping popularize the solo exe that “just works”.
Wasn't it Go that did that? I mean, not only was Go doing that before Rust, but even currently there's maybe 100 Go-employed developers churning out code for every 1 Rust-employed developer.
Either way “Rust is helping” is true. And given that Go is a managed language it never really factored into the shared library debate to begin with, whereas Rust forces the issue.
Maybe, but it's misleading. Using the assertion that "$FOO made $BAR popular" when $FOO contributed 1% of that effort and $BAZ contributed the other 99% is enough to make most people consider the original statement inaccurate.
> And given that Go is a managed language it never really factored into the shared library debate to begin with, whereas Rust forces the issue.
How so? Rust allows both shared and static compilation, so it's actually the opposite - Rust specifically doesn't force the use of single-binaries.
I'm struggling to interpret what it is you are saying: Go specifically forces the use of static linkage, whereas in Rust it's optional, is it not?
I am under the belief that in Rust you can opt-out of static linkage, while I know that in Go you cannot.
Are you saying that Rust doesn't allow opt-out of static linkage?
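For reference, the knobs I have in mind look roughly like this (the flags are real, but exact behaviour depends on target and toolchain version):

    # Rust links crates statically by default, but libstd can be linked dynamically
    rustc -C prefer-dynamic main.rs

    # Rust can also go fully static (no glibc dependency) via the musl target
    cargo build --release --target x86_64-unknown-linux-musl

    # Go: with cgo disabled the result is a static binary on Linux
    CGO_ENABLED=0 go build -o mytool .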
> Using the assertion that "$FOO made $BAR popular"
Thankfully that’s not what I said! This sub-thread is very silly.
FWIW Rust is exceptionally bad at dynamic/shared libraries. There’s a kajillion Rust CLI tools and approximately all of them are single file executables. It’s great.
I have lots of experience with Rust, the Rust community, and a smorgasbord of “rewrite it in Rust” tools. I personally have zero experience with Go, its community, and afaik Go tools. I’m sure I’ve used something written in Go without realizing it. YMMV.
Ehhh. You can compile a single exe with C or C++. I’ve personally come across far more Rust tools than Go. But I don’t really touch anything web related. YMMV.
The choice is actually between dealing with complexity and shifting responsibility for that to someone else. The tools themselves (e.g. virtual environments) can be used for both. Either people responsible for packaging (authors, distribution maintainers, etc.) have some vague or precise understanding of how their code is used, on which systems, what are its dependencies (not mere names and versions, but functional blocks and their relative importance), when they might not be available, and which releases break which compatibility options, or they say “it builds for me with default settings, everything else is not my problem”.
> Either people responsible for packaging have some vague or precise understanding of how their code is used, on which systems, what are its dependencies
But with python it’s a total mess. I’ve been using automatic1111 lately to generate stable diffusion images. The tool maintains multiple multi-hundred line script files for each OS which try to guess the correct version of all the dependencies to download and install. What a mess! And why is the job of figuring out the right version of pytorch the job of an end user program? I don’t know if PyTorch is uniquely bad at this, but all this work is the job of a package manager with well designed packages.
It should be as easy as “cargo run” to run the program, no matter how many or how few dependencies there are. No matter what operating system I’m using. Even npm does a better job of this than python.
A lot of the problems with Python packages come from the fact that a lot of Python programs are not just Python. You have a significant amount of C++, Cython, and binaries (like Intel MKL) when it comes to scientific Python and machine learning. All of these tools have different build processes than pip, so if you want to ship with them you end up bringing the whole barn with you. A lot of these problems were fixed with Python wheels, which pack the binaries in the package.
Personally, I haven't run into a problem with Python packaging recently. I was running https://github.com/zyddnys/manga-image-translator (very cool project btw) and I didn't run into any issues getting it to work locally on a Windows machine with an Nvidia GPU.
Then the author of that script is the one who deals with said complexity in that specific manner, either because of upstream's inability to provide releases for every combination of operating system and hardware, or because some people are strictly focused on hard problems in their part of the implementation, or something else.
A package manager with “well designed” packages still can't define what the programs do, or invent program logic and behavior. Someone has to choose just the same, and can make good or bad decisions. For example, nothing prohibits a calculator application that depends on a full compile-and-build system for a certain language (at run time), or on the Electron framework. In fact, it's totally possible to have such example programs. However, we can't automatically deduce whether packaging that for a different system is going to be problematic, or what the better alternatives are.
> A package manager with “well designed” packages still can't define what they do, invent program logic and behavior.
The solution to this is easy and widespread. Just ship scripts with the package which allow it to compile and configure itself for the host system. Apt, npm, homebrew and cargo all allow packages to do this when necessary.
A well designed PyTorch package (in a well designed package manager) could contain a stub that, when installed, looks at the host system and selects and locally installs the correct version of the PyTorch binary based on its environment and configuration.
This should be the job of the PyTorch package. Not the job of every single downstream consumer of PyTorch to handle independently.
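Concretely, the stub I'm imagining would do something like this at install time (a sketch only; the index URLs are the ones PyTorch's own install page hands out, and the GPU probe is deliberately naive):

    # pick a wheel index based on whether an NVIDIA GPU is visible on this host
    if command -v nvidia-smi >/dev/null 2>&1; then
        pip install torch --index-url https://download.pytorch.org/whl/cu121
    else
        pip install torch --index-url https://download.pytorch.org/whl/cpu
    fi

That logic belongs in one place, inside the package, instead of being copy-pasted into every app's launcher script.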
> Just ship scripts with the package which allow it to compile and configure itself for the host system.
Eek. That sounds awful to me. It is exceptionally complex, fragile, and error prone. The easy solution is to SHIP YOUR FUCKING DEPENDENCIES.
I’m a Windows man. Which means I don’t really use an OS-level package manager. What I expect is a zip file that I can extract and double-click an exe. To be clear, I’m talking about running a program as an end user.
Compiling and packaging a program is a different and intrinsically more complex story. That said, I 1000% believe that build systems should exclusively use toolchains that are part of the monorepo. Build systems should never use any system-installed tools. This is more complex to set up, but quite delightful and reliable once you have it.
I remember having to modify one of those dependency scripts to get it running at all on my laptop.
In the end I had more luck with Easy Diffusion. Not sure why, but it also generated better images with the same models out of the box.
The only way I know to manage Python dependencies is to use Bazel as the build system and implement a custom set of rules that download and build all Python dependencies. The downloads go into a git repo. All magically missing libs must be added to the repo and to Bazel. And finally you might have a way to... tar the output into a docker container... sigh
> it goes against the Unix philosophy of do one thing and do it well
For me, Perl shows just how restricted that viewpoint was.
After I learned Perl, I stopped caring about tr, and sed, and many of the other "one thing well" command-line tools. And I've no desire to swap out and modify the 's//' component of perl.
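e.g. the sort of one-liners that replaced them for me (file.txt is just a placeholder):

    # instead of: sed 's/foo/bar/g' file.txt
    perl -pe 's/foo/bar/g' file.txt

    # instead of: tr a-z A-Z < file.txt
    perl -pe 'tr/a-z/A-Z/' file.txt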
Perl does "one thing" - interpret Perl programs - even though it also replaces many things.
I know 'rmdir' exists. It does one thing, well - remove an empty directory. It's been around since the first release of Unix.
However, I use "rm -rf" because it's easier to use a more powerful tool which handles empty directory removal as a special case.
You can also change your viewpoint and say that Fossil does do one thing well: it's a distributed project control system. That's a new category I just made up, to highlight just how subjective "one thing" is.
I like `rmdir` because I don't have to check if a directory that I think is empty is actually empty with `ls -la` before removing it. This happens a lot with moving stuff out of directories (sometimes to different destinations).
    % man cc
    ...
    DESCRIPTION
        clang is a C, C++, and Objective-C compiler which encompasses
        preprocessing, parsing, optimization, code generation, assembly, and
        linking.

    % man gcc
    ...
    NAME
        gcc - GNU project C and C++ compiler
    ...
> To me it's obviously cleaner to divide a system into components that can later be swapped or modified independently.
But where do you draw the line? What's "one thing"? The size and complexity of CLI tools doing one thing varies by orders of magnitude; some of those programs are much larger than this Fossil "monolith". Should a component have more functionality if separation means 10 times slower performance? What if it has hundreds of such features? What if separating those features means a hundredfold increase in complexity for setting up the software, as it now has distributed dependencies? Should you have a separate audio player when a video player could already do the job out of necessity? Should a terminal support scrolling if you can already get that via tmux?
The Unix philosophy is bad for judging individual programs.
Unix’s philosophy is more of what you’d call ‘guidelines’, and is not universally applicable — not all problems can be decomposed nicely, and IPC just gives you a badly debuggable hodgepodge of added accidental complexity. It’s good for trivial tools like ls, cat, etc, but something more complex is likely better off as a monolith.
> As you mentioned it goes against the Unix philosophy of do one thing and do it well. To me it's obviously cleaner to divide a system into components that can later be swapped or modified independently.
Anecdote: when i first met Richard in 2011, after having contributed to Fossil since 2008, i asked him why he chose to implement fossil as a monolithic app instead of as a library. His answer was, "because I wanted it working next week instead of next month." It was a matter of expedience and rewriting it now would be a major undertaking for little benefit.
Reimplementing fossil as a library is a years-long undertaking (literally) and is not something we're interested in doing directly within the Fossil project, but is something i maintain as a semi-third-party effort, along with a handful of other Fossil contributors, over at <https://fossil.wanderinghorse.net/r/libfossil>.
> Imagine editing a spreadsheet like `cat foo.xls | select-cell B3 | replace '=B2+1' > foo.xls`.
It would be even more cumbersome than that. After that command you'd have to restore foo.xls from a backup, and then do the edit again, this time remembering that the "> foo.xls" executes before the pipe executes. :-)
I wonder if anyone has written something to make pipes like that work? E.g., write two programs, "replace" and "with" that could be used like this:
replace foo.xls | ... | with foo.xls
What "replace [file]" would do is set a write lock on the file, copy the file to stdout, then release the lock.
What "with [file]" would do is copy stdin to the file, after obtaining a write lock on the file. I think most shells would start the components of a pipe in order so "replace" should be able to set its lock before "with" tries to get a lock, but to be safe "with" could be written to buffer incoming data and only start checking the lock after it has received a significant amount or seen an EOF on stdin. Or "replace" and "with" could coordinate using some out-of-band method.
I think the "Unix philosophy" is best applied to problems that indeed can be de-composed into clear discrete steps. In fact, that's the metric I use when I write command line utilities: does this make sense in a pipe?
There are a lot of things where this isn't very practical. For instance, imagine building a web server that consists of a couple of dozen discrete utilities that are then cobbled together using pipes. Or even implementing the core feature set of Git in this manner. Would it be practical? Would it be better if Git was an enormous shellscript that connected all of these "things" into an application? What does that give you? And what would be the cost?
How would you do SQLite (the CLI application) as a bunch of discrete commands?
The UNIX philosophy of minimal cmdline tools that do one thing right is fine and the Go-style 'monolithic exe' without runtime dependencies except the OS is also fine (and both philosophies actually don't need to collide).
The problem is all the software that depends on tons of dependencies that are brought in dynamically. Just look at Jekyll vs Hugo. I have Jekyll break regularly when something (seemingly) unrelated changes on my machine, but Hugo is just rock solid.
Or another much more annoying example: Linux executables linking with glibc and then not running on systems that don't have that particular version of glibc installed.
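When that bites, it helps to check which glibc symbol versions a binary actually demands before shipping it anywhere (./mytool is a placeholder; needs binutils):

    # highest glibc symbol version the binary requires; it won't run on anything older
    objdump -T ./mytool | grep -o 'GLIBC_[0-9.]*' | sort -Vu | tail -n 1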
> To me it's obviously cleaner to divide a system into components that can later be swapped or modified independently.
Why is "cleaner" the only thing that matters? Why not "functional/featureful"? It's open source so it can be modified, but I'm not sure why ability to swap matters.
Exceptional things rarely happen without some outlier conviction involved. Most things happening due to outlier convictions are just that, follies that lead nowhere. But when the stars align and there's both great ability involved and a genuine gap to fill that would have remained undiscovered without the outlier conviction, something like SQLite happens.