In my hobby project, I started always passing an allocator argument to every function or object that requires allocation (inspired by Zig), and I love it so far. Often I can just pass a bump-pointer allocator or a stack-based allocator and not care about deallocation of individual objects. I also wrote a simple unit-testing framework to test out-of-memory conditions, because that's easy to do when you're in control of allocators. Basically, I inject an allocator that counts how many allocations are made when a unit test runs, and then the unit tests are rerun with OOM injected at each known allocation point in turn. A lot of bugs and crashes happen when an OOM is encountered, because such paths are rarely tested. The idea of my pet project is a very resilient HTTP server with request-scoped allocations and recovery from OOM without crashes.
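Roughly, a minimal sketch of that scheme in C (the project's actual language and API aren't given here, so the Allocator interface, TestAllocator, and build_request names below are hypothetical): a first pass counts allocations, then the test is rerun once per allocation index with a failure injected at that point.

    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical allocator interface: every function that allocates takes one of these. */
    typedef struct Allocator {
        void *(*alloc)(struct Allocator *self, size_t size);
        void  (*free_)(struct Allocator *self, void *ptr);
    } Allocator;

    /* Test allocator: counts allocations and can be told to fail at a given index. */
    typedef struct {
        Allocator base;
        size_t count;       /* allocations seen so far              */
        size_t fail_index;  /* fail when count == fail_index        */
        int    inject_oom;  /* 0 = just count, 1 = inject a failure */
    } TestAllocator;

    static void *test_alloc(Allocator *self, size_t size) {
        TestAllocator *t = (TestAllocator *)self;
        size_t i = t->count++;
        if (t->inject_oom && i == t->fail_index)
            return NULL;                 /* simulated out-of-memory */
        return malloc(size);
    }

    static void test_free(Allocator *self, void *ptr) {
        (void)self;
        free(ptr);
    }

    static TestAllocator make_test_allocator(int inject_oom, size_t fail_index) {
        TestAllocator t = {{test_alloc, test_free}, 0, fail_index, inject_oom};
        return t;
    }

    /* Example unit under test: must report failure, not crash, when allocation fails. */
    static int build_request(Allocator *a) {
        char *buf = a->alloc(a, 64);
        if (!buf) return -1;             /* OOM handled gracefully */
        a->free_(a, buf);
        return 0;
    }

    int main(void) {
        /* Pass 1: count how many allocations the test performs. */
        TestAllocator counter = make_test_allocator(0, 0);
        build_request(&counter.base);
        size_t total = counter.count;

        /* Pass 2..N: rerun the test, failing each allocation point in turn. */
        for (size_t i = 0; i < total; i++) {
            TestAllocator oom = make_test_allocator(1, i);
            if (build_request(&oom.base) != -1)
                printf("allocation %zu: OOM not reported\n", i);
        }
        return 0;
    }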
Adding that on many Linux distributions, when the system faces OOM, a killer daemon steps in and might kill your service even if you were handling the situation properly.
Interestingly (confusingly), Linux's OOM killer is invoked for a different notion of OOM than a null return from malloc / bad_alloc exception. On a 64-bit machine, the latter will pretty much only ever happen if you set a vsize ulimit or you pass an absurd size into malloc. The OOM killer is the only response when you actually run out of memory.
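A small program that illustrates the difference, assuming a 64-bit Linux box; the 8 GiB figure is arbitrary, and the outcome depends on /proc/sys/vm/overcommit_memory and how much RAM and swap you have:

    /* Run at your own risk: touching all of this memory may get the process
     * OOM-killed rather than produce a malloc failure. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        size_t sz = (size_t)8 << 30;      /* 8 GiB: adjust to exceed your free RAM */
        char *p = malloc(sz);
        if (!p) {                         /* the "application level" notion of OOM */
            puts("malloc returned NULL");
            return 1;
        }
        puts("malloc succeeded; pages are not committed yet");
        memset(p, 1, sz);                 /* committing pages is what can summon the OOM killer */
        puts("survived touching every page");
        free(p);
        return 0;
    }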
If you want to avoid your program triggering the OOM killer all on its own, you need to set a vsize limit such that you'll get an application-level error before actually exhausting memory. Even that isn't completely foolproof (obviously anyone with a shell can allocate a large amount of RAM), but in practice -- if your program is the only significant thing on the system -- you can get it to be very reliable this way.
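For example, something along these lines using setrlimit(RLIMIT_AS, ...); the 1 GiB cap is a made-up number you'd replace with a limit sized below the machine's real memory:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/resource.h>

    int main(void) {
        /* Cap the address space at 1 GiB (an arbitrary stand-in for a real budget). */
        struct rlimit lim = { .rlim_cur = 1UL << 30, .rlim_max = 1UL << 30 };
        if (setrlimit(RLIMIT_AS, &lim) != 0) {
            perror("setrlimit");
            return 1;
        }

        /* An allocation beyond the cap now fails at the application level
           instead of being overcommitted and possibly OOM-killed later. */
        void *p = malloc((size_t)2 << 30);   /* 2 GiB, over the cap */
        printf("2 GiB malloc %s\n", p ? "succeeded" : "failed (ENOMEM)");
        free(p);
        return 0;
    }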
Add in some cgroup settings and you should be able to keep your program from being OOM killed at all, though that step is a bit more complex.
I wonder if it is possible to avoid OOM by making sure that all allocations are done from a named memory-mapped file (on disk, not shm). That way it is, in principle, always possible to swap to disk and never overcommit.
I guess in practice the kernel might be in such dire straits that it is not even able to swap to disk and might need to kill indiscriminately.
That would have to be all allocations in all processes (and in the kernel and drivers).
In extreme circumstances, the OOM killer can decide to kill your process even if it barely uses any memory (a simple way to get there is by fork-bombing copies of such processes).
You would also need to prevent overcommit of disk; you'd typically mmap to a sparse file, and then you've got the same problem of overcommit on disk as you did in memory.
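If you wanted to experiment with the idea anyway, the sparse-file part is avoidable by reserving the blocks up front before mapping. A rough sketch (file name and sizes invented; it doesn't address whether the kernel can always write dirty pages back under pressure, per the comments above):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        size_t sz = (size_t)1 << 30;                    /* 1 GiB heap backed by a real file */
        int fd = open("heap.bin", O_RDWR | O_CREAT, 0600);
        if (fd < 0) { perror("open"); return 1; }

        /* posix_fallocate actually reserves the blocks on disk (unlike ftruncate,
           which creates a sparse file), so the disk isn't overcommitted either. */
        if (posix_fallocate(fd, 0, (off_t)sz) != 0) {
            fprintf(stderr, "could not reserve disk space\n");
            return 1;
        }

        void *heap = mmap(NULL, sz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (heap == MAP_FAILED) { perror("mmap"); return 1; }

        /* Hand `heap` to a custom allocator; dirty pages can be written back
           to heap.bin instead of competing for anonymous memory and swap. */
        munmap(heap, sz);
        close(fd);
        unlink("heap.bin");
        return 0;
    }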
If you're going to do drastic things, you can configure Linux's memory overcommit behavior, although strictly avoiding overcommit usually results in trouble from software not written with that in mind.
The idea is that the server must have a known allocation budget, similar to Java's max heap size. There's a tree of allocators: e.g. a temporarily created arena allocator needs initial memory for its arena, so it grabs it from the root allocator. And the root allocator ultimately must be fixed-size and deterministic. Sure, if there are other processes on the system allocating without concern for other apps, then the OOM killer can still kill the server. But if there's no such process, I think it should be pretty stable.
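Something like this, very roughly (all names and sizes below are invented for illustration, not the poster's actual code): a fixed-budget root allocator, request-scoped arenas carved out of it, and failures reported upward instead of crashing.

    #include <stddef.h>
    #include <stdio.h>

    /* Root allocator: a fixed, statically sized budget. Nothing in the server
       allocates outside of it, so total memory use is deterministic. */
    #define ROOT_BUDGET (64 * 1024)

    typedef struct {
        _Alignas(16) unsigned char buf[ROOT_BUDGET];
        size_t used;
    } RootAllocator;

    static void *root_alloc(RootAllocator *r, size_t size) {
        size = (size + 15) & ~(size_t)15;        /* keep 16-byte granularity */
        if (size > ROOT_BUDGET - r->used)
            return NULL;                         /* budget exhausted: report, don't crash */
        void *p = r->buf + r->used;
        r->used += size;
        return p;
    }

    /* Request-scoped arena: grabs one block from the root up front, then hands
       out pieces of it. (Alignment handling and recycling of the block when the
       request finishes are omitted in this sketch.) */
    typedef struct {
        unsigned char *block;
        size_t cap, used;
    } Arena;

    static int arena_init(Arena *a, RootAllocator *root, size_t cap) {
        a->block = root_alloc(root, cap);
        a->cap = cap;
        a->used = 0;
        return a->block ? 0 : -1;
    }

    static void *arena_alloc(Arena *a, size_t size) {
        if (size > a->cap - a->used) return NULL;
        void *p = a->block + a->used;
        a->used += size;
        return p;
    }

    int main(void) {
        RootAllocator root = { .used = 0 };
        Arena request;
        if (arena_init(&request, &root, 16 * 1024) != 0) {
            fputs("over budget: shed the request instead of dying\n", stderr);
            return 1;
        }
        char *line = arena_alloc(&request, 256);   /* per-request scratch memory */
        (void)line;
        return 0;
    }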
Oh wow, that is a really interesting test solution. That would be an interesting thing to add to all Zig tests (I know they already have the testing allocator and good Valgrind support, but I don't think that tests/simulates OOM).
I love things like these that take existing tests and expand them to test further things in already-covered flows. We have done similar things at my work, where we test expansion of data models against old models to check that we cover upgrade scenarios.
This is a clear sign of a badly designed language. You should never see a fixed-size (less than page size) allocation fail, simply because there's nothing you can reasonably do if it does fail. Either you should crash or it should block until it is possible again.
(Where "crash" means crashing a worker process or something similarly limited to less than the entire system. See Erlang for the logical extension of this.)
I realize this implies Windows and Java are badly designed and my answer to that is "yes".
So if my browser tries to allocate memory for a tab and the allocation fails it should just crash or block instead of handling the failure gracefully by not creating a new tab and telling me the system is running low on memory?
As long as the syscalls don't themselves fail, it's no problem until you run out of address space, at which point you should crash, because you probably need to allocate in order to handle any error reasonably.
I've been using this helper: https://github.com/judofyr/zini/blob/ea91f645b7dc061adcedc91.... It starts by making the first allocation fail, then the second, then the third, and so on. As long as the code returns OutOfMemory (without leaking memory), everything is fine.