Now I'm no Java expert, far from it, so I'd appreciate any answers to this. I'm interacting with a bunch of CLIs that are either written in Java or otherwise run on the JVM (Clojure mostly). How much of the startup time for these things can be attributed to the GC? The article mentions that short-running programs (almost all the CLIs I use) could use Epsilon since the heap is cleared on exit anyway. But I'm wondering how much time a typical program actually spends on, what I guess is, initializing the GC?
Application startup time is usually dominated by class loading, JIT compilation and the application's own initialization code. GC overhead should be fairly small unless the heap is badly sized.
You might be able to shave off a few milliseconds by tuning the GC, but the lion's share is somewhere else.
You could try OpenJ9 for CLI tools; it claims to offer faster startup times out of the box compared to OpenJDK. Tweaking the JIT behavior (number of compiler threads, compilation thresholds for the tiers, etc.) or using CDS can help too.
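For example (these are standard HotSpot flags; `mycli.jar` is just a placeholder for whatever tool you're launching):

```shell
# Stop JIT compilation at tier 1 (C1 only): faster startup, lower peak throughput
java -XX:TieredStopAtLevel=1 -jar mycli.jar

# Reduce the number of JIT compiler threads (minimum is 2 with tiered compilation)
java -XX:CICompilerCount=2 -jar mycli.jar
```

Whether either helps depends on how much of your startup is actually JIT work, so it's worth measuring before and after.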
If you have access to the source code, you can try to recompile it with GraalVM Native Image and make it, well, a native executable. That will cut load times considerably. But if you have a dozen of these tools, each of them will have its own runtime embedded, so you'll waste disk space.
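A rough sketch of the build step, assuming you have GraalVM with the `native-image` tool installed (`mycli.jar` and the output name are placeholders):

```shell
# Compile the jar ahead of time into a standalone native binary
native-image -jar mycli.jar mycli

# The result starts in milliseconds, with no JVM warm-up
./mycli
```

Note that reflection-heavy code may need extra configuration files before `native-image` can build it.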
If you have assigned the JVM too little memory for a short-running task, GC might take a significant amount of time, but if the amount of memory is set correctly, the initial GC setup is a fraction of the time spent JITing.
Although you might gain some performance, I guess this "GC" is going to be used mostly by people running financial (or other low-latency) analysis/streaming code, where it has been common for years to tune the JVM to never even attempt a GC, to avoid latency.
The code in these cases is written to reuse most memory, and when the unreclaimable part grows too big, that cluster node stops taking requests and is then restarted.
There's some precedent for chopping out the GC for short-lived applications. DMD, the (self-hosting) D compiler, does this, effectively using a push-only stack for the heap, never doing anything akin to freeing memory. [0]
In modern GCs, allocation is already about as fast as it can be (pointer-bump allocation), so I imagine the only win in chopping out the GC is that you don't need to initialize it (otherwise it's roughly equivalent to simply terminating before the GC ever needs to be invoked).
Perhaps the DMD example isn't quite the same, though, as it's possible its GC has slower allocation than pointer-bump.
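For what it's worth, on HotSpot (JDK 11+) Epsilon has to be unlocked as an experimental option; `mycli.jar` is a placeholder:

```shell
# Run with the no-op Epsilon collector: allocation is pointer-bump,
# nothing is ever reclaimed, and the JVM shuts down if the heap fills up
java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -jar mycli.jar
```

That makes it easy to measure the question at hand: time the same CLI with and without the flag.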
The published AppCDS + Clojure results show a much smaller speedup and require a higher degree of customization in the build — something like 1.5s -> 0.5s for AppCDS+AOT vs 1.5s -> 0.005s for Graal. And you can just use the clj or Leiningen native-image plugins/templates. The minuses of Graal include some compatibility snags and its being an Oracle product.
One interesting thing for you may be the Class-Data Sharing feature, which keeps already-parsed class data across restarts and can reuse it with other JVM instances running the same code. It also allows the JVM to share this data in memory with other JVMs running on the same host, so in some use cases it can both speed up startup and save memory.
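A sketch of how the application-level variant (AppCDS, with JDK 13+ dynamic dumping) is driven; `app.jsa` and `mycli.jar` are placeholder names:

```shell
# First run: record the loaded classes into a shared archive on exit
java -XX:ArchiveClassesAtExit=app.jsa -jar mycli.jar

# Later runs: map the pre-parsed class metadata from the archive
java -XX:SharedArchiveFile=app.jsa -jar mycli.jar
```

The archive is memory-mapped, which is also what lets concurrent JVMs on the same host share those pages.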
I was working with Clojure a lot some years back, and I'm sure there's been a lot of progress in the ecosystem since then. However, what I learned then was that Clojure had an inherent startup overhead, because it had to get the language itself ready. The reason you don't see this with, for example, Scala, which also runs on the JVM, is that Clojure is very dynamic as far as JVM languages go. Compare it to Java, for which the JVM was designed. These dynamic qualities come in part at the cost of startup time.
I was especially frustrated by this, because I had spent some time writing a Clojure program that had to be able to cold-start fast. Decompiling the program's JAR and pruning out unnecessary classes gave a considerable speed-up, but it was not enough.
There's a use-case in twelve-factor apps where GC pauses would be unacceptable but high availability would allow downtime of an individual stateless app instance. So instead of spending any time GCing, just eat memory, throw it all away and start over fresh as necessary. With various tricks, an instance can be swapped quickly (start a new instance just before killing the old one)... you'd probably want some sort of user-space "OOM killer" to handle it. ulimits set lower than the JVM's own limits would work too, but wouldn't give fast restarts without some magic.
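A crude sketch of that restart loop, assuming Epsilon plus a hard heap cap (`service.jar` and the 2g limit are made-up placeholders):

```shell
# Each instance allocates freely with no GC; once the capped heap is
# exhausted the JVM exits with OutOfMemoryError and a fresh one starts
while true; do
  java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -Xmx2g -jar service.jar
  echo "instance exited (likely OOM), restarting..."
done
```

In practice you'd want the load balancer to drain the instance before the heap runs out, rather than letting requests fail mid-flight.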
You might be replying to the wrong comment, or I'm not making my question clear enough. I'm wondering how much of the startup time the GC currently takes, and if using Epsilon will make startup faster.