Sure, but other forms of memory management are costly, too. Even if you allocate everything from the OS upfront and then pool objects, you still have to spend some CPU managing the pool [1]. Working with bounded memory necessarily requires spending at least some CPU on memory management; the alternative to barriers is not zero CPU spent on memory management.
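To make that concrete, here's a minimal sketch of a hypothetical object pool (illustrative code, not from any comment or paper in this thread): even with every buffer allocated upfront, each acquire/release still spends CPU on bookkeeping.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

// Hypothetical object pool: all memory is allocated upfront, yet
// every acquire/release still does bookkeeping work on the CPU.
public class Pool<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>();

    public Pool(Supplier<T> factory, int size) {
        for (int i = 0; i < size; i++) free.push(factory.get()); // upfront allocation
    }

    public T acquire() {
        T obj = free.poll();      // CPU: deque pop + exhaustion check
        if (obj == null) throw new IllegalStateException("pool exhausted");
        return obj;
    }

    public void release(T obj) {
        free.push(obj);           // CPU: deque push (and the caller must reset the object's state)
    }

    public static void main(String[] args) {
        Pool<byte[]> pool = new Pool<>(() -> new byte[1024], 4);
        byte[] buf = pool.acquire();
        pool.release(buf);
    }
}
```

And this is the single-threaded case; a pool shared across threads would add locking or CAS loops on top, which is more CPU still.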
> The Parallel GC is still useful sometimes!
Certainly for batch-processing programs.
BTW, the paper you linked is already at least somewhat out of date, as it's from 2021. The implementation of the GCs in the JDK changes very quickly. The newest GC in the JDK (and one that may be appropriate for a very large portion of programs) didn't even exist back then, and even G1 has changed a lot since. (Many performance evaluations of HotSpot implementation details may be out of date after two years.)
[1]: The cheapest scheme is arenas, which are similar in some ways to moving-tracing collectors, especially in how they can trade RAM for CPU, but they can have other kinds of costs.
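For illustration, a bump-pointer arena can be sketched like this (hypothetical code, not a real library): allocation is a single pointer bump, and the only "free" is an O(1) reset of the whole region, which is exactly the RAM-for-CPU trade.

```java
// Hypothetical bump-pointer arena: allocation is a pointer bump, and the
// only "free" is resetting the whole region at once. Dead objects keep
// occupying RAM until the reset -- trading memory for very cheap allocation.
public class BumpArena {
    private final byte[] memory;
    private int top = 0;

    public BumpArena(int capacity) { memory = new byte[capacity]; }

    // Reserves `size` bytes and returns their offset into the arena.
    public int allocate(int size) {
        if (top + size > memory.length) throw new OutOfMemoryError("arena full");
        int offset = top;
        top += size;              // the entire cost of an allocation
        return offset;
    }

    public void reset() { top = 0; } // frees everything in O(1)

    public static void main(String[] args) {
        BumpArena arena = new BumpArena(64);
        int a = arena.allocate(16); // offset 0
        int b = arena.allocate(16); // offset 16
        System.out.println(a + " " + b); // prints "0 16"
        arena.reset();
    }
}
```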
The difference with manual memory management or a parallel GC is that concurrent GCs create a performance penalty on every read and write (modulo what the JIT can elide). That penalty is absolutely measurable even with the most recent GCs. If you look at the assembly produced for the same code running under ZGC and Parallel, you'll see that reads translate to considerably more CPU instructions in the former. Just this week at work we were looking at a bug (in our code) on Java 25 that was exposed by the new G1 late barrier expansion.
Different applications will see different overall performance changes (positive or negative) with different GCs. I agree with you that most applications (especially realistic multi-threaded ones, representative of the kind of work people do on the JVM) benefit from the amazing GC technology the JVM brings. It is absolutely not the case, however, that the only negative impact is on memory footprint.
> The difference with manual memory management or parallel GC is that concurrent GCs create a performance penalty on every reads and writes
Not on every read and write, but it could be on every load and store of a reference (i.e. reading a reference from the heap to a register or writing a reference from a register to the heap). But what difference does it make where exactly the cost is? What matters is how much CPU is spent on memory management (directly or indirectly) in total and how much latency memory management can add. You are right that the low-latency collectors do use up more CPU overall than a parallel STW collector, but so does manual memory management (unless you use arenas well).
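Conceptually, a load barrier turns that plain reference load into a load plus a check. A simplified sketch (the names `needsFixup` and `slowPathFixup` are made up, and real collectors like ZGC do this in JIT-generated machine code, not Java):

```java
// Simplified sketch of a GC load barrier -- not HotSpot's real code.
// Under a concurrent collector, loading a reference from the heap
// conceptually gains a check-and-fixup step on the hot path.
public class LoadBarrierSketch {
    static class Node { Node next; }

    // Stand-in for the collector's staleness test (e.g. ZGC checks
    // metadata bits in its colored pointers).
    static boolean needsFixup(Node ref) { return false; }

    // Stand-in for the slow path (e.g. following a forwarding pointer
    // to the object's new location after relocation).
    static Node slowPathFixup(Node ref) { return ref; }

    static Node loadNext(Node obj) {
        Node ref = obj.next;          // the original load
        if (needsFixup(ref)) {        // extra work on every reference load
            ref = slowPathFixup(ref); // remap the reference
            obj.next = ref;           // "self-heal" the field for later loads
        }
        return ref;
    }

    public static void main(String[] args) {
        Node a = new Node();
        a.next = new Node();
        System.out.println(loadNext(a) == a.next); // prints "true"
    }
}
```

The fast path is usually just a test-and-branch, but it sits on every reference load, which is why read-heavy single-threaded code can notice it.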
There are also load and store barriers which add work when accessing objects from the heap. In many cases, adding work in the parallel path is good if it allows you to avoid single-threaded sections, but not in all cases. Single-threaded programs with a lot of reads can be pretty significantly impacted by barriers.
https://rodrigo-bruno.github.io/mentoring/77998-Carlos-Gonca...
The Parallel GC is still useful sometimes!