In my hobby project, I started always passing an allocator argument to every function or object that requires allocation (inspired by Zig), and I love it so far. Often I can just pass a bump-pointer allocator or a stack-based allocator and not care about deallocation of individual objects. I also wrote a simple unit-testing framework to test out-of-memory conditions, because that's easy to do when you're in control of allocators. Basically, I inject an allocator that counts how many allocations are made when a unit test runs, and then the unit tests are rerun with OOM injected at each known allocation point in turn. A lot of bugs and crashes happen when an OOM is encountered, because such paths are rarely tested. The idea of my pet project is a very resilient HTTP server with request-scoped allocations and recovery from OOM without crashes.
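Roughly, a minimal sketch of that scheme in C (the project's actual language and API aren't given here, so the Allocator interface, TestAllocator, and build_request names below are hypothetical): a first pass counts allocations, then the test is rerun once per allocation index with a failure injected at that point.

    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical allocator interface: every function that allocates takes one of these. */
    typedef struct Allocator {
        void *(*alloc)(struct Allocator *self, size_t size);
        void  (*free_)(struct Allocator *self, void *ptr);
    } Allocator;

    /* Test allocator: counts allocations and can be told to fail at a given index. */
    typedef struct {
        Allocator base;
        size_t count;       /* allocations seen so far              */
        size_t fail_index;  /* fail when count == fail_index        */
        int    inject_oom;  /* 0 = just count, 1 = inject a failure */
    } TestAllocator;

    static void *test_alloc(Allocator *self, size_t size) {
        TestAllocator *t = (TestAllocator *)self;
        size_t i = t->count++;
        if (t->inject_oom && i == t->fail_index)
            return NULL;                 /* simulated out-of-memory */
        return malloc(size);
    }

    static void test_free(Allocator *self, void *ptr) {
        (void)self;
        free(ptr);
    }

    static TestAllocator make_test_allocator(int inject_oom, size_t fail_index) {
        TestAllocator t = {{test_alloc, test_free}, 0, fail_index, inject_oom};
        return t;
    }

    /* Example unit under test: must report failure, not crash, when allocation fails. */
    static int build_request(Allocator *a) {
        char *buf = a->alloc(a, 64);
        if (!buf) return -1;             /* OOM handled gracefully */
        a->free_(a, buf);
        return 0;
    }

    int main(void) {
        /* Pass 1: count how many allocations the test performs. */
        TestAllocator counter = make_test_allocator(0, 0);
        build_request(&counter.base);
        size_t total = counter.count;

        /* Pass 2..N: rerun the test, failing each allocation point in turn. */
        for (size_t i = 0; i < total; i++) {
            TestAllocator oom = make_test_allocator(1, i);
            if (build_request(&oom.base) != -1)
                printf("allocation %zu: OOM not reported\n", i);
        }
        return 0;
    }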
Adding that on many Linux distributions, when the system faces OOM, a killer daemon steps in and might kill your service even if you were handling the situation properly.
Interestingly (confusingly), Linux's OOM killer is invoked for a different notion of OOM than a null return from malloc / bad_alloc exception. On a 64-bit machine, the latter will pretty much only ever happen if you set a vsize ulimit or you pass an absurd size into malloc. The OOM killer is the only response when you actually run out of memory.
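A small program that illustrates the difference, assuming a 64-bit Linux box; the 8 GiB figure is arbitrary, and the outcome depends on /proc/sys/vm/overcommit_memory and how much RAM and swap you have:

    /* Run at your own risk: touching all of this memory may get the process
     * OOM-killed rather than produce a malloc failure. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        size_t sz = (size_t)8 << 30;      /* 8 GiB: adjust to exceed your free RAM */
        char *p = malloc(sz);
        if (!p) {                         /* the "application level" notion of OOM */
            puts("malloc returned NULL");
            return 1;
        }
        puts("malloc succeeded; pages are not committed yet");
        memset(p, 1, sz);                 /* committing pages is what can summon the OOM killer */
        puts("survived touching every page");
        free(p);
        return 0;
    }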
If you want to avoid your program triggering the OOM killer all on its own, you need to set a vsize limit such that you'll get an application-level error before actually exhausting memory. Even that isn't completely foolproof (obviously anyone with a shell can allocate a large amount of RAM), but in practice -- if your program is the only significant thing on the system -- you can get it to be very reliable this way.
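For example, something along these lines using setrlimit(RLIMIT_AS, ...); the 1 GiB cap is a made-up number you'd replace with a limit sized below the machine's real memory:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/resource.h>

    int main(void) {
        /* Cap the address space at 1 GiB (an arbitrary stand-in for a real budget). */
        struct rlimit lim = { .rlim_cur = 1UL << 30, .rlim_max = 1UL << 30 };
        if (setrlimit(RLIMIT_AS, &lim) != 0) {
            perror("setrlimit");
            return 1;
        }

        /* An allocation beyond the cap now fails at the application level
           instead of being overcommitted and possibly OOM-killed later. */
        void *p = malloc((size_t)2 << 30);   /* 2 GiB, over the cap */
        printf("2 GiB malloc %s\n", p ? "succeeded" : "failed (ENOMEM)");
        free(p);
        return 0;
    }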
Add in some cgroup settings and you should be able to keep your program from being OOM killed at all, though that step is a bit more complex.
I wonder if it is possible to avoid OOM by making sure that all allocations are done from a named memory-mapped file (on disk, not shm). That way it is, in principle, always possible to swap to disk and never overcommit.
I guess in practice the kernel might be in such dire straits that it is not even able to swap to disk and might need to kill indiscriminately.
That would have to be all allocations in all processes (and in the kernel and drivers).
In extreme circumstances, the OOM killer can decide to kill your process even if it barely uses any memory (a simple way to get there is by fork-bombing copies of such processes).
You would also need to prevent overcommit of disk; you'd typically mmap to a sparse file, and then you've got the same problem of overcommit on disk as you did in memory.
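If you wanted to experiment with the idea anyway, the sparse-file part is avoidable by reserving the blocks up front before mapping. A rough sketch (file name and sizes invented; it doesn't address whether the kernel can always write dirty pages back under pressure, per the comments above):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        size_t sz = (size_t)1 << 30;                    /* 1 GiB heap backed by a real file */
        int fd = open("heap.bin", O_RDWR | O_CREAT, 0600);
        if (fd < 0) { perror("open"); return 1; }

        /* posix_fallocate actually reserves the blocks on disk (unlike ftruncate,
           which creates a sparse file), so the disk isn't overcommitted either. */
        if (posix_fallocate(fd, 0, (off_t)sz) != 0) {
            fprintf(stderr, "could not reserve disk space\n");
            return 1;
        }

        void *heap = mmap(NULL, sz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (heap == MAP_FAILED) { perror("mmap"); return 1; }

        /* Hand `heap` to a custom allocator; dirty pages can be written back
           to heap.bin instead of competing for anonymous memory and swap. */
        munmap(heap, sz);
        close(fd);
        unlink("heap.bin");
        return 0;
    }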
If you're going to do drastic things, you can configure Linux's memory overcommit behavior, although strictly avoiding overcommit usually results in trouble from software not written with that in mind.
The idea is that the server must have a known allocation budget, similar to Java's max heap size. There's a tree of allocators: e.g. a temporarily created arena allocator needs initial memory for its arena, so it grabs it from the root allocator. And the root allocator ultimately must be fixed-size and deterministic. Sure, if there are other processes on the system allocating without concern for other apps, then the OOM killer can still kill the server. But if there's no such process, I think it should be pretty stable.
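Something like this, very roughly (all names and sizes below are invented for illustration, not the poster's actual code): a fixed-budget root allocator, request-scoped arenas carved out of it, and failures reported upward instead of crashing.

    #include <stddef.h>
    #include <stdio.h>

    /* Root allocator: a fixed, statically sized budget. Nothing in the server
       allocates outside of it, so total memory use is deterministic. */
    #define ROOT_BUDGET (64 * 1024)

    typedef struct {
        _Alignas(16) unsigned char buf[ROOT_BUDGET];
        size_t used;
    } RootAllocator;

    static void *root_alloc(RootAllocator *r, size_t size) {
        size = (size + 15) & ~(size_t)15;        /* keep 16-byte granularity */
        if (size > ROOT_BUDGET - r->used)
            return NULL;                         /* budget exhausted: report, don't crash */
        void *p = r->buf + r->used;
        r->used += size;
        return p;
    }

    /* Request-scoped arena: grabs one block from the root up front, then hands
       out pieces of it. (Alignment handling and recycling of the block when the
       request finishes are omitted in this sketch.) */
    typedef struct {
        unsigned char *block;
        size_t cap, used;
    } Arena;

    static int arena_init(Arena *a, RootAllocator *root, size_t cap) {
        a->block = root_alloc(root, cap);
        a->cap = cap;
        a->used = 0;
        return a->block ? 0 : -1;
    }

    static void *arena_alloc(Arena *a, size_t size) {
        if (size > a->cap - a->used) return NULL;
        void *p = a->block + a->used;
        a->used += size;
        return p;
    }

    int main(void) {
        RootAllocator root = { .used = 0 };
        Arena request;
        if (arena_init(&request, &root, 16 * 1024) != 0) {
            fputs("over budget: shed the request instead of dying\n", stderr);
            return 1;
        }
        char *line = arena_alloc(&request, 256);   /* per-request scratch memory */
        (void)line;
        return 0;
    }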
Oh wow, that is a really interesting test solution. That would be an interesting thing to add to all Zig tests (I know they already have the testing allocator and good Valgrind support, but I don't think that tests/simulates OOM).
I love things like these that take existing tests and expand them to test further things in already-covered flows. We have done similar things at my work, where we test expansion of data models against old models to check that we cover upgrade scenarios.
This is a clear sign of a badly designed language. You should never see a fixed-size (less than page size) allocation fail, simply because there's nothing you can reasonably do if it does fail. Either you should crash or it should block until it is possible again.
(Where "crash" means crashing a worker process or something similarly limited to less than the entire system. See Erlang for the logical extension of this.)
I realize this implies Windows and Java are badly designed and my answer to that is "yes".
So if my browser tries to allocate memory for a tab and the allocation fails it should just crash or block instead of handling the failure gracefully by not creating a new tab and telling me the system is running low on memory?
As long as the syscalls don't themselves fail, it's no problem until you run out of address space, at which point you should crash, because you probably need to allocate in order to handle any error reasonably.
I've been using this helper: https://github.com/judofyr/zini/blob/ea91f645b7dc061adcedc91.... It starts by making the first allocation fail, then the second, then the third, and so on. As long as the code returns OutOfMemory (without leaking memory), everything is fine.