Finding memory leaks in Postgres C code (enterprisedb.com)
106 points by lichtenberger on March 29, 2024 | 19 comments


You don't need to wait until the program's exit. Valgrind has a GDB server that can be used to check for leaks during the program's runtime:

https://valgrind.org/docs/manual/mc-manual.html#mc-manual.mo...
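
For example (a sketch; the monitor commands are documented in the manual linked above, and ./your_program is a placeholder):

    # start the program under Valgrind's embedded gdbserver
    valgrind --vgdb=yes --vgdb-error=0 ./your_program

    # from another terminal, attach gdb and run a leak check mid-execution
    gdb ./your_program
    (gdb) target remote | vgdb
    (gdb) monitor leak_check full reachable any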


Thank you! I could have been more explicit that I was aware of this, but I was ideally trying to avoid manual code changes and/or manual gdb intervention if possible.


Valgrind has client requests, which you can use to teach it about custom mempools. This should make it possible to avoid the problems this author was having.

See manual chapters 4.7 and 4.8:

https://valgrind.org/docs/manual/mc-manual.html#mc-manual.cl...
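
Roughly, for a region/arena allocator it looks like this (a sketch around a toy bump allocator; the client-request macros are the real ones from memcheck.h):

    #include <stdlib.h>
    #include <valgrind/memcheck.h>

    /* toy bump allocator standing in for a real region allocator */
    typedef struct { char *base, *next; } Pool;

    Pool *pool_create(size_t size)
    {
        Pool *pool = malloc(sizeof(Pool));
        pool->base = pool->next = malloc(size);
        /* teach Memcheck about the pool: redzone 0, not zeroed */
        VALGRIND_CREATE_MEMPOOL(pool, 0, 0);
        return pool;
    }

    void *pool_alloc(Pool *pool, size_t size)
    {
        void *p = pool->next;
        pool->next += size;
        /* this chunk is now tracked like an individual malloc() */
        VALGRIND_MEMPOOL_ALLOC(pool, p, size);
        return p;
    }

    void pool_free(Pool *pool, void *p)
    {
        /* chunks never released this way show up in the leak report */
        VALGRIND_MEMPOOL_FREE(pool, p);
    }

Postgres actually has this wired up already: building with USE_VALGRIND defined makes the memory-context code emit these client requests.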


The author's definition of a leak is somewhat unusual. If memory is allocated and eventually freed, it's not really a leak in the strict sense. That's why the author is having so much trouble with typical tools like Valgrind or LeakSanitizer: the definition of a leak is different!

I would approach this problem by using regular profiling. Collect a few memory profiles and see whether there's any suspiciously large chunk of memory not yet freed.
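
For example, with Valgrind's massif tool (a sketch; ./your_program and the output file name are placeholders):

    valgrind --tool=massif ./your_program
    ms_print massif.out.<pid>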


Claiming that memory that's freed eventually isn't actually leaked may be factually true, but it isn't usefully true in all contexts.

A more vague but more useful definition of a memory leak is that if the memory consumption has a net increase over time, and that increase causes one or more problems, it's a leak.

Leaking a few kB or even MB in a short-lived client program isn't necessarily a leak in a practical sense.

Not freeing a few kB in a long-lived server process is a leak in practice if the process is going to crash or suffer degraded performance before the memory is eventually freed.


Firstly, I'd say a leak is a leak. A leak is not defined as "bad" - it's defined as allocated memory not being freed. Whether the leak is "bad" or not depends on the context, but that doesn't change the definition.

>> A more vague but more useful definition of a memory leak is that if the memory consumption has a net increase over time, and that increase causes one or more problems, it's a leak.

No. It's a problem. Caused by a leak. Your definition falls over for the case where a program is performing its task correctly, but on insufficient hardware (not enough RAM) to complete it.


It's just a leak from an internal allocator


After a cursory inspection of the memleak program source and Brendan Gregg's blog post, it's still not clear to me what it is tracking to be able to decide that there's a leak here. It would seem that palloc causes an mmap call for the initial allocation (or to enlarge the region?), but why would an individual allocation within the region be treated as a leak? I can see how eBPF can allow tracking mmap usage for custom allocators where Valgrind or ASan might not (I'm not sure about that), but reporting it as a leak would imply that the whole region is missing a deallocation call? How would it recognize via eBPF tracing that 4 kB within the region remain without a pointer?

PS. My assumption here is that palloc is allocating within a region, which is how OP describes it. Oddly, it doesn't take a MemoryContext as an argument, so that must be in some global or thread-local var, or palloc is a macro and takes the context from a lexical variable (uh)?
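
As far as I can tell from the Postgres source, it's the former: palloc draws from the global CurrentMemoryContext. Roughly:

    #include "postgres.h"
    #include "utils/memutils.h"

    void example(void)
    {
        /* palloc() takes no context argument because it allocates from
         * the global CurrentMemoryContext; callers switch contexts */
        MemoryContext ctx = AllocSetContextCreate(CurrentMemoryContext,
                                                  "example",
                                                  ALLOCSET_DEFAULT_SIZES);
        MemoryContext old = MemoryContextSwitchTo(ctx);

        char *buf = palloc(4096);   /* comes from ctx */
        (void) buf;                 /* no pfree() needed if ctx is deleted */

        MemoryContextSwitchTo(old);
        MemoryContextDelete(ctx);   /* frees everything allocated in ctx */
    }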


This is pretty interesting. I'm not sure if many commenters have abstracted this approach the way I would have, but it sure is a handy trick to couple virtual memory related system calls (brk, mmap) to stack dumps and aggregation thereof, and it would not have been so easy to do in, say, 2009.


I was definitely doing this kind of thing back in 2009, and I swear also in 2003, by just wrapping the few standard library memory allocation functions and adding a backtrace loop to store information about the caller. I'm not sure where the difficulty is in implementing it?
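
Something like this sketch, built as a shared library and loaded with LD_PRELOAD (a real tool also needs a matching free() wrapper and a table keyed by pointer; re-entrancy is the fiddly part, since backtrace() and fprintf() can themselves allocate):

    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <execinfo.h>
    #include <stdio.h>
    #include <stdlib.h>

    static void *(*real_malloc)(size_t);
    static __thread int in_hook;    /* guard against recursion */

    void *malloc(size_t size)
    {
        if (!real_malloc)
            real_malloc = (void *(*)(size_t)) dlsym(RTLD_NEXT, "malloc");

        void *p = real_malloc(size);

        if (!in_hook) {
            in_hook = 1;
            void *stack[16];
            int depth = backtrace(stack, 16);
            /* a real tool would record (p, size, stack) here and erase
             * the entry in the free() wrapper instead of printing */
            fprintf(stderr, "malloc(%zu) = %p, %d frames\n", size, p, depth);
            in_hook = 0;
        }
        return p;
    }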


This requires no wrapping: you can instrument with perf using the syscall rather than interposing on a library. You might say "same difference," but it is less invasive, and can be enabled or disabled on the fly.
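
For example (a sketch; <pid> is a placeholder, and the tracepoint names can be listed with `perf list`):

    # sample mmap/brk syscalls with call stacks for 30 seconds,
    # then aggregate the stacks in the report
    perf record -e syscalls:sys_enter_mmap -e syscalls:sys_enter_brk \
        -g -p <pid> -- sleep 30
    perf report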


jemalloc also has some handy leak / memory profiling abilities: https://github.com/jemalloc/jemalloc/wiki/Use-Case%3A-Heap-P...
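
For example (a sketch; this assumes jemalloc was built with --enable-prof, and ./your_program is a placeholder):

    # dump a final heap profile at exit and print a leak report
    MALLOC_CONF="prof:true,prof_final:true,prof_leak:true" ./your_program
    jeprof --show_bytes ./your_program jeprof.*.heap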


With LSan, you can use __lsan_do_recoverable_leak_check to get leak reports during run time.
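
A sketch (compile with -fsanitize=address or -fsanitize=leak):

    #include <stdio.h>
    #include <sanitizer/lsan_interface.h>

    static void leak_checkpoint(void)
    {
        /* reports leaks found so far; unlike __lsan_do_leak_check(),
         * the process keeps running afterwards */
        if (__lsan_do_recoverable_leak_check())
            fprintf(stderr, "LSan found leaks at this checkpoint\n");
    }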


That wouldn't help find this particular resource leakage, since the memory is still referenced somewhere by the app, so LSan wouldn't regard the memory as leaked.


Valgrind will show these as "still reachable".


Thank you! I wasn't aware of this.


Would using Rust have prevented this?


Rust does not claim to prevent memory leaks.


It is true that Rust considers memory leaks safe. But Rust's type system still helps compared to C. The two toy examples in the article wouldn't have happened in idiomatic Rust, though the same can be said about C++, or any other language preferring RAII over explicit free calls.



