Object pooling was very common in Java last century. Today, it survives only for the most heavyweight objects: threads, database connections, web browsers, that sort of thing.
What killed it off was generational collection. Compacting generational collectors make allocation very cheap, and they make maintenance of short-lived objects very cheap. As a result, it's reliably cheaper to allocate an object and throw it away than it is to attempt to reuse it.
So it's surprising and interesting to read that the author of this article clearly thinks there are still benefits to object pooling. Since Alex Petrov is a very sharp guy, I have to take him at his word. But it's a shame he didn't include benchmarks comparing it to idiomatic use of a modern GC.
It's very much alive in the Java low-latency space, where even young GCs are a problem. Allocating temps/short-lived objects also thrashes your CPU caches, and generally there are other reasons to avoid any GC in this space.
Which is, IMO, part of the reason why Java isn't great for that. A serious pro can make it go like a racehorse, but beginners and even long-time users may fall into traps. If a language has fewer things to be mindful of, its users can be less mindful with fewer dire consequences.
Of course, this totally depends on who might be touching the codebase...
Any language used to produce high-performance software requires more than just beginner knowledge; each one just has its own specific things to keep in mind, on top of the general cross-language concepts.
Possibly true, in the sense that any GCed language suffers in high-performance or soft real-time apps (e.g. games, audio software). However, other constraints might influence the choice of technology: existing experience among the available team members, vendor-mandated technology choices (e.g. Android apps, ignoring the NDK), or cases where speed-to-market (and therefore development speed) or, debatably, program correctness matters more.
You'd be surprised. With an allocator and collector that are aware of real-time constraints, GC can actually be a pretty huge advantage for achieving low latency.
GC is essentially never an advantage for low latency, but it is not incompatible with it either. Things like Metronome can give you extremely well-defined latencies.
It's fairly moot for hard real-time programs though, as those typically completely eschew dynamic allocation (malloc can have unpredictable time too).
> GC is essentially never an advantage for low latency
I can't really agree with that statement. One way to get to lower latency is to avoid locks and rely on lock-free algorithms.
Many of those are much easier to implement if you can rely on a GC, because the GC solves the problem of objects that are still referenced by some thread but no longer reachable from the lock-free data structure. There are ways around this, e.g. RCU or hazard pointers, but mostly it's easier with a GC.
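For a concrete (if simplified) illustration, here's a Treiber-style lock-free stack sketched in Java; the names are illustrative, but the point stands: pop() can unlink a node that another thread loaded a moment earlier, and the GC simply won't reclaim it until no thread can reach it, which is exactly the reclamation problem hazard pointers or RCU exist to solve in non-GC languages.

    // Minimal Treiber-style lock-free stack. The GC handles node reclamation:
    // a node popped by one thread may still be read by another thread that
    // loaded it just before the CAS, and that's safe because nothing is freed
    // until no thread can reach it. Without GC you'd need hazard pointers,
    // RCU, or epoch-based reclamation to get the same guarantee.
    import java.util.concurrent.atomic.AtomicReference;

    public class TreiberStack<T> {
        private static final class Node<T> {
            final T value;
            Node<T> next;
            Node(T value) { this.value = value; }
        }

        private final AtomicReference<Node<T>> head = new AtomicReference<>();

        public void push(T value) {
            Node<T> node = new Node<>(value);
            Node<T> current;
            do {
                current = head.get();
                node.next = current;
            } while (!head.compareAndSet(current, node));
        }

        public T pop() {
            Node<T> current;
            Node<T> next;
            do {
                current = head.get();
                if (current == null) return null;  // empty stack
                next = current.next;
            } while (!head.compareAndSet(current, next));
            return current.value;
        }
    }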
Do you have an example? I'm not super familiar with lock-free structures, since when I've worked on low-latency things there has been a need to quantify the worst-case timing which rules out most of the lock-free options.
It might make it easier, no? I'm working on a perf-sensitive program now. It's written in C (mainly for performance). It's spending about 25% of CPU time in free/malloc. Yikes.
This happened because it has an event dispatcher where each event carries a bunch of associated name/value pairs. Even though most of the names are fixed ("SourceIP", "SourceProfile", "SessionUuid", etc.), the event system ends up strdup'ing all of them, every time. With GC we could simply ignore this: all the constant string names would just end up in a high gen, and the dynamic stuff would get cleaned in gen0, no additional code. (As-is, I'm looking at a fairly heavy rewrite, affecting thousands of call sites.)
So what's the reason for strdup'ing vs. having const names that never get freed? Also, sounds like you could use an int/enum to represent the key and provide string conversion util functions. Anyway, spending 25% in malloc/free is just poor code, but you already know that. This really isn't about GC :).
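To sketch the int/enum idea in Java, since that's the thread's context (the key names are the ones from the parent comment, the rest is invented):

    // Sketch of the int/enum-key idea: fixed event keys become enum constants,
    // so the hot path passes small ints around and only converts to a string
    // when a human-readable name is actually needed (logging, serialization).
    public enum EventKey {
        SOURCE_IP("SourceIP"),
        SOURCE_PROFILE("SourceProfile"),
        SESSION_UUID("SessionUuid");

        private final String wireName;

        EventKey(String wireName) {
            this.wireName = wireName;
        }

        // String conversion utility; no copies, no per-event allocation.
        public String wireName() {
            return wireName;
        }
    }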
Gen0 or young GC still involves a safepoint, a few trips to the kernel scheduler, trashes the CPU instruction and data caches, possibly causes promotions to the tenured space (with knock-on effects later), etc. It's no panacea when those tens/hundreds of millis are important.
Because not all of the strings are const; some are created dynamically. Third parties add to these event names at runtime, so we don't know them ahead of time. An int-to-string registry would work at runtime, except for the dynamic names.
I was just pointing out that GC can "help", by reducing complexity and enabling a team that otherwise might get mired in details to deliver something OK.
In a latency sensitive system, you want to minimize how much time you spend allocating and deallocating memory during performance critical moments. GC gives you a great way to leave those operations as trivial as possible (increment a pointer to allocate, noop to deallocate) during performance critical moments, and clean up/organize the memory later when outside the time critical window.
Similarly, it makes it easier to amortise costs across multiple allocations/deallocations.
GC does have a bad rep in the hard real-time world, because in the worst case scenario, a poorly timed GC creates all kinds of trouble, which is why I mentioned that it helps if the allocator/deallocator is aware of hard real-time commits.
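To make the shape of that concrete, a purely illustrative Java sketch (the event loop is invented, and System.gc() is only a hint; a real system would instead size the young generation so no collection can trigger inside the window):

    // Allocate freely during the latency-critical window (pointer-bump
    // allocation, no explicit frees), then invite the collector to run in the
    // idle gap between windows. System.gc() is merely a hint to the JVM.
    import java.util.ArrayList;
    import java.util.List;

    public class CriticalWindowLoop {
        public static void main(String[] args) throws InterruptedException {
            for (int frame = 0; frame < 5; frame++) {
                processCriticalWindow();  // allocations are cheap, frees are no-ops
                System.gc();              // cleanup happens outside the window
                Thread.sleep(100);        // idle gap until the next window
            }
        }

        private static void processCriticalWindow() {
            List<double[]> scratch = new ArrayList<>();
            for (int i = 0; i < 10_000; i++) {
                scratch.add(new double[16]);  // short-lived temporaries, never freed by hand
            }
        }
    }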
> In a latency sensitive system, you want to minimize how much time you spend allocating and deallocating memory during performance critical moments. GC gives you a great way to leave those operations as trivial as possible (increment a pointer to allocate, noop to deallocate) during performance critical moments, and clean up/organize the memory later when outside the time critical window.
This only works if you enter a critical section with sufficient free heap. You could have just malloc()ed that space ahead of time if you weren't using a GC, so I don't see an improvement, just a convenience.
> Similarly, it makes it easier to amortise costs across multiple allocations/deallocations.
Amortizing costs is often the opposite of what you want to do to minimize latency; with hard real-time you care more about the worst-case than the average-case, and amortizing only helps the average-case (often at the expense of the worst-case)
> GC does have a bad rep in the hard real-time world, because in the worst case scenario, a poorly timed GC creates all kinds of trouble, which is why I mentioned that it helps if the allocator/deallocator is aware of hard real-time commits.
Yes, and GC can be made fully compatible with hard real-time systems; any incremental GC can be made fixed-cost with very little effort. It's somewhat moot since most hard real-time systems also want to never run out of heap, and the easiest way to do that is to never heap allocate after initialization, so most hard real-time systems don't use malloc() either.
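In that style the code ends up looking something like the following sketch (field names and sizes are invented): everything is claimed at startup, and the deadline-sensitive loop only reuses it.

    // "No allocation after init": buffers are sized and allocated once at
    // startup, so neither malloc/new nor a collector has any work to do while
    // deadlines are active.
    public class PreallocatedPipeline {
        private static final int MAX_SAMPLES = 4096;

        private final double[] input = new double[MAX_SAMPLES];   // claimed at init
        private final double[] output = new double[MAX_SAMPLES];  // claimed at init

        // Hot path: no allocation, just work on the pre-sized buffers.
        void processBlock(int count) {
            for (int i = 0; i < count; i++) {
                output[i] = input[i] * 0.5;
            }
        }

        public static void main(String[] args) {
            PreallocatedPipeline p = new PreallocatedPipeline();
            p.input[0] = 2.0;
            p.processBlock(1);
            System.out.println(p.output[0]);  // 1.0
        }
    }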
I made a game for Android phones a few years back that would instantiate, at peak, a few thousand agents that would roam around an artificial environment, devouring renewable resources, before (maybe) reproducing and eventually dying off. So these agent instances were being continually created and destroyed, and the GC lag absolutely killed performance and made the game totally unplayable. It was unplayable even on a Galaxy S2 - a reasonably high-performance phone at the time. Once I implemented object pooling to handle the creation and destruction of these agents, the game ran very smoothly on an S2, and I could even run the smaller simulations on an old HTC Magic I had lying around. It made an enormous difference.
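To illustrate the general technique (a minimal sketch, not the actual game code; the Agent fields are invented): dead agents go onto a free list and get reset when reused, so steady-state play allocates nothing.

    // Minimal object pool of the kind described above. Agents are recycled
    // through a free list instead of being allocated and garbage-collected
    // on every spawn/death.
    import java.util.ArrayDeque;

    public class AgentPool {
        public static final class Agent {
            float x, y, energy;
            boolean alive;

            Agent reset(float x, float y) {
                this.x = x;
                this.y = y;
                this.energy = 100f;
                this.alive = true;
                return this;
            }
        }

        private final ArrayDeque<Agent> free = new ArrayDeque<>();

        public AgentPool(int capacity) {
            for (int i = 0; i < capacity; i++) {
                free.push(new Agent());  // pre-allocate the whole population up front
            }
        }

        // Reuse a pooled agent if one is available, otherwise grow the pool.
        public Agent obtain(float x, float y) {
            Agent a = free.isEmpty() ? new Agent() : free.pop();
            return a.reset(x, y);
        }

        // Return a dead agent to the pool instead of leaving it for the GC.
        public void release(Agent a) {
            a.alive = false;
            free.push(a);
        }
    }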
On the CLR, I've found that in tight processing loops (say, 50K msg/sec) even a few tiny (~32-byte) allocations are measurable. F# didn't lift lambdas (it'd generate a new "function pointer" object each time even though it didn't need to), and just rewriting to force evaluation into a static var was a gain.
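A rough Java analogue of that fix (names invented; javac/invokedynamic usually caches non-capturing lambdas on its own, so this mostly matters when the compiler doesn't do it for you, as in the F# case above): hoist the lambda into a static final field so the hot loop reuses one instance.

    // One lambda instance for the lifetime of the class; the hot loop never
    // allocates a fresh "function pointer" object per message.
    import java.util.function.LongUnaryOperator;

    public class MessageScaler {
        private static final LongUnaryOperator SCALE = v -> v * 3;

        static long process(long[] messages) {
            long sum = 0;
            for (long m : messages) {
                sum += SCALE.applyAsLong(m);
            }
            return sum;
        }

        public static void main(String[] args) {
            System.out.println(process(new long[] {1, 2, 3}));  // 18
        }
    }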
Well, there are still cases where object pooling might make sense. Some games on mobile devices still use it, and I could also see the need for a very busy server or a very large Hadoop job.