Object pooling was very common in Java last century. Today, it survives only for the most heavyweight objects: threads, database connections, web browsers, that sort of thing.
What killed it off was generational collection. Compacting generational collectors make allocation very cheap, and they make maintenance of short-lived objects very cheap. As a result, it's reliably cheaper to allocate an object and throw it away than it is to attempt to reuse it.
So it's surprising and interesting to read that the author of this article clearly thinks there are still benefits to object pooling. Since Alex Petrov is a very sharp guy, I have to take him at his word. But it's a shame he didn't include benchmarks comparing it to idiomatic use of a modern GC.
It's very much alive in the Java low-latency space, where even young GCs are a problem. Allocating temps/short-lived objects also thrashes your CPU caches, and generally there are other reasons to avoid any GC in this space.
Which is, IMO, part of the reason why Java isn't great for that. A serious pro can make it go like a racehorse, but beginners and even long-time users may fall into traps. If a language has fewer things to be mindful of, its users can be less mindful with fewer dire consequences.
Of course, this totally depends on who might be touching the codebase...
Any language used to produce high-performance software requires more than just beginner knowledge; each one just has its own specific things to keep in mind, on top of the general cross-language concepts.
Possibly true, in the sense that any GCed language suffers in high-performance or soft real-time apps (e.g. games, audio software). However, other constraints might influence the choice of technology: existing experience among the available team members, vendor-mandated technology choices (e.g. Android apps, ignoring the NDK), or cases where speed-to-market (and therefore development speed) or, debatably, program correctness matters more.
You'd be surprised. With an allocator and collector that are aware of real-time constraints, GC can actually be a pretty huge advantage for achieving low latency.
GC is essentially never an advantage for low latency, but it is not incompatible with it either. Things like Metronome can give you extremely well-defined latencies.
It's fairly moot for hard real-time programs though, as those typically completely eschew dynamic allocation (malloc can have unpredictable time too).
> GC is essentially never an advantage for low latency
I can't really agree with that statement. One way to get to lower latency is to avoid locks and rely on lock-free algorithms.
Many of those are much easier to implement if you can rely on a GC, because the GC solves the problem of objects that are still referenced by some thread but no longer reachable from the lock-free data structure. There are ways around this, e.g. RCU or hazard pointers, but mostly it's easier with a GC.
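For a concrete (if simplified) illustration, here's a Treiber-style lock-free stack sketched in Java; the names are illustrative, but the point stands: pop() can unlink a node that another thread loaded a moment earlier, and the GC simply won't reclaim it until no thread can reach it, which is exactly the reclamation problem hazard pointers or RCU exist to solve in non-GC languages.

    // Minimal Treiber-style lock-free stack. The GC handles node reclamation:
    // a node popped by one thread may still be read by another thread that
    // loaded it just before the CAS, and that's safe because nothing is freed
    // until no thread can reach it. Without GC you'd need hazard pointers,
    // RCU, or epoch-based reclamation to get the same guarantee.
    import java.util.concurrent.atomic.AtomicReference;

    public class TreiberStack<T> {
        private static final class Node<T> {
            final T value;
            Node<T> next;
            Node(T value) { this.value = value; }
        }

        private final AtomicReference<Node<T>> head = new AtomicReference<>();

        public void push(T value) {
            Node<T> node = new Node<>(value);
            Node<T> current;
            do {
                current = head.get();
                node.next = current;
            } while (!head.compareAndSet(current, node));
        }

        public T pop() {
            Node<T> current;
            Node<T> next;
            do {
                current = head.get();
                if (current == null) return null;  // empty stack
                next = current.next;
            } while (!head.compareAndSet(current, next));
            return current.value;
        }
    }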
Do you have an example? I'm not super familiar with lock-free structures, since when I've worked on low-latency things there has been a need to quantify the worst-case timing which rules out most of the lock-free options.
It might make it easier, no? I'm working on a perf-sensitive program now. It's written in C (mainly for performance). It's spending about 25% of CPU time in free/malloc. Yikes.
This happened because it has an event dispatcher where each event carries a bunch of associated name/value pairs. Even though most of the names are fixed ("SourceIP", "SourceProfile", "SessionUuid", etc.), the event system ends up strdup'ing all of them, every time. With GC we could simply ignore this: all the constant string names would just end up in a high gen, and the dynamic stuff would get cleaned in gen0, no additional code. (As-is, I'm looking at a fairly heavy rewrite, affecting thousands of call sites.)
So what's the reason for strdup'ing vs. having const names that never get freed? Also, sounds like you could use an int/enum to represent the key and provide string conversion util functions. Anyway, spending 25% in malloc/free is just poor code, but you already know that. This really isn't about GC :).
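To sketch the int/enum idea in Java, since that's the thread's context (the key names are the ones from the parent comment, the rest is invented):

    // Sketch of the int/enum-key idea: fixed event keys become enum constants,
    // so the hot path passes small ints around and only converts to a string
    // when a human-readable name is actually needed (logging, serialization).
    public enum EventKey {
        SOURCE_IP("SourceIP"),
        SOURCE_PROFILE("SourceProfile"),
        SESSION_UUID("SessionUuid");

        private final String wireName;

        EventKey(String wireName) {
            this.wireName = wireName;
        }

        // String conversion utility; no copies, no per-event allocation.
        public String wireName() {
            return wireName;
        }
    }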
Gen0 or young GC still involves a safepoint, a few trips to the kernel scheduler, trashes the CPU instruction and data caches, possibly causes promotions to the tenured space (with knock-on effects later), etc. It's no panacea when those tens/hundreds of millis are important.
Because not all of the strings are const; some are created dynamically. Third parties add to these event names at runtime, so we don't know them ahead of time. An int-to-string registry would work at runtime, except for the dynamic names.
I was just pointing out that GC can "help", by reducing complexity and enabling a team that otherwise might get mired in details to deliver something OK.
In a latency sensitive system, you want to minimize how much time you spend allocating and deallocating memory during performance critical moments. GC gives you a great way to leave those operations as trivial as possible (increment a pointer to allocate, noop to deallocate) during performance critical moments, and clean up/organize the memory later when outside the time critical window.
Similarly, it makes it easier to amortise costs across multiple allocations/deallocations.
GC does have a bad rep in the hard real-time world, because in the worst case scenario, a poorly timed GC creates all kinds of trouble, which is why I mentioned that it helps if the allocator/deallocator is aware of hard real-time commits.
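To make the shape of that concrete, a purely illustrative Java sketch (the event loop is invented, and System.gc() is only a hint; a real system would instead size the young generation so no collection can trigger inside the window):

    // Allocate freely during the latency-critical window (pointer-bump
    // allocation, no explicit frees), then invite the collector to run in the
    // idle gap between windows. System.gc() is merely a hint to the JVM.
    import java.util.ArrayList;
    import java.util.List;

    public class CriticalWindowLoop {
        public static void main(String[] args) throws InterruptedException {
            for (int frame = 0; frame < 5; frame++) {
                processCriticalWindow();  // allocations are cheap, frees are no-ops
                System.gc();              // cleanup happens outside the window
                Thread.sleep(100);        // idle gap until the next window
            }
        }

        private static void processCriticalWindow() {
            List<double[]> scratch = new ArrayList<>();
            for (int i = 0; i < 10_000; i++) {
                scratch.add(new double[16]);  // short-lived temporaries, never freed by hand
            }
        }
    }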
> In a latency sensitive system, you want to minimize how much time you spend allocating and deallocating memory during performance critical moments. GC gives you a great way to leave those operations as trivial as possible (increment a pointer to allocate, noop to deallocate) during performance critical moments, and clean up/organize the memory later when outside the time critical window.
This only works if you enter a critical section with sufficient free heap. You could have just malloc()ed that space ahead of time if you weren't using a GC, so I don't see an improvement, just a convenience.
> Similarly, it makes it easier to amortise costs across multiple allocations/deallocations.
Amortizing costs is often the opposite of what you want to do to minimize latency; with hard real-time you care more about the worst-case than the average-case, and amortizing only helps the average-case (often at the expense of the worst-case)
> GC does have a bad rep in the hard real-time world, because in the worst case scenario, a poorly timed GC creates all kinds of trouble, which is why I mentioned that it helps if the allocator/deallocator is aware of hard real-time commits.
Yes, and GC can be made fully compatible with hard real-time systems; any incremental GC can be made fixed-cost with very little effort. It's somewhat moot since most hard real-time systems also want to never run out of heap, and the easiest way to do that is to never heap allocate after initialization, so most hard real-time systems don't use malloc() either.
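In that style the code ends up looking something like the following sketch (field names and sizes are invented): everything is claimed at startup, and the deadline-sensitive loop only reuses it.

    // "No allocation after init": buffers are sized and allocated once at
    // startup, so neither malloc/new nor a collector has any work to do while
    // deadlines are active.
    public class PreallocatedPipeline {
        private static final int MAX_SAMPLES = 4096;

        private final double[] input = new double[MAX_SAMPLES];   // claimed at init
        private final double[] output = new double[MAX_SAMPLES];  // claimed at init

        // Hot path: no allocation, just work on the pre-sized buffers.
        void processBlock(int count) {
            for (int i = 0; i < count; i++) {
                output[i] = input[i] * 0.5;
            }
        }

        public static void main(String[] args) {
            PreallocatedPipeline p = new PreallocatedPipeline();
            p.input[0] = 2.0;
            p.processBlock(1);
            System.out.println(p.output[0]);  // 1.0
        }
    }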
I made a game for Android phones a few years back that would instantiate, at peak, a few thousand agents that would roam around an artificial environment, devouring renewable resources, before (maybe) reproducing and eventually dying off. So these agent instances were being continually created and destroyed, and the GC lag absolutely killed performance and made the game totally unplayable. It was unplayable even on a Galaxy S2 - a reasonably high-performance phone at the time. Once I implemented object pooling to handle the creation and destruction of these agents, the game ran very smoothly on an S2, and I could even run the smaller simulations on an old HTC Magic I had lying around. It made an enormous difference.
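To illustrate the general technique (a minimal sketch, not the actual game code; the Agent fields are invented): dead agents go onto a free list and get reset when reused, so steady-state play allocates nothing.

    // Minimal object pool of the kind described above. Agents are recycled
    // through a free list instead of being allocated and garbage-collected
    // on every spawn/death.
    import java.util.ArrayDeque;

    public class AgentPool {
        public static final class Agent {
            float x, y, energy;
            boolean alive;

            Agent reset(float x, float y) {
                this.x = x;
                this.y = y;
                this.energy = 100f;
                this.alive = true;
                return this;
            }
        }

        private final ArrayDeque<Agent> free = new ArrayDeque<>();

        public AgentPool(int capacity) {
            for (int i = 0; i < capacity; i++) {
                free.push(new Agent());  // pre-allocate the whole population up front
            }
        }

        // Reuse a pooled agent if one is available, otherwise grow the pool.
        public Agent obtain(float x, float y) {
            Agent a = free.isEmpty() ? new Agent() : free.pop();
            return a.reset(x, y);
        }

        // Return a dead agent to the pool instead of leaving it for the GC.
        public void release(Agent a) {
            a.alive = false;
            free.push(a);
        }
    }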
On the CLR, I've found that in tight processing loops (say, 50K msg/sec) even a few tiny (~32-byte) allocations are measurable. F# didn't lift lambdas (it'd generate a new "function pointer" object each time even though it didn't need to), and just rewriting to force evaluation into a static var was a gain.
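A rough Java analogue of that fix (names invented; javac/invokedynamic usually caches non-capturing lambdas on its own, so this mostly matters when the compiler doesn't do it for you, as in the F# case above): hoist the lambda into a static final field so the hot loop reuses one instance.

    // One lambda instance for the lifetime of the class; the hot loop never
    // allocates a fresh "function pointer" object per message.
    import java.util.function.LongUnaryOperator;

    public class MessageScaler {
        private static final LongUnaryOperator SCALE = v -> v * 3;

        static long process(long[] messages) {
            long sum = 0;
            for (long m : messages) {
                sum += SCALE.applyAsLong(m);
            }
            return sum;
        }

        public static void main(String[] args) {
            System.out.println(process(new long[] {1, 2, 3}));  // 18
        }
    }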
Well, there are still cases where object pooling might make sense. Some games on mobile devices still use it, and I could also see the need for a very busy server or a very large Hadoop job.