Setting aside GC, Nailgun (JDK <= 8?) and Drip already solved the short-running-VM problem. This is often how CLI tools like JRuby, Ant, Maven, sbt, etc. are sped up.
The post misleads readers into thinking that the JVM runs the GC before exit. It does not.
When I was writing the Epsilon JEP, I meant that it might be futile to have a hundreds-of-ms-long GC cycle when the program exits very soon anyway, and the heap would be abandoned wholesale. The important bit of trivia is that GC might be invoked long before 'the whole memory' is exhausted. There are several reasons to do this: learning the application profile to size up generations or collection triggers, minimizing the startup footprint, etc. A GC cycle can then be seen as an upfront cost that pays off in the future. With an extremely short-lived job, that future never comes.
Contrived example:
$ cat AL.java
import java.util.*;

public class AL {
    public static void main(String... args) throws Throwable {
        List<Object> l = new ArrayList<>();
        for (int c = 0; c < 100_000_000; c++) {
            l.add(new Object());
        }
        System.out.println(l.size());
    }
}
$ javac AL.java
Ooof, 12.5 seconds to run, and about 2 CPU-minutes taken with Parallel:
$ time jdk11.0.5/bin/java -XX:+UnlockExperimentalVMOptions -Xms3g -Xmx3g -XX:+UseParallelGC -Xlog:gc AL
[0.015s][info][gc] Using Parallel
[0.988s][info][gc] GC(0) Pause Young (Allocation Failure) 768M->469M(2944M) 550.699ms
...
[12.281s][info][gc] GC(3) Pause Full (Ergonomics) 1795M->1615M(2944M) 7660.045ms
100000000
real 0m12.464s
user 1m53.618s
sys 0m1.087s
Much better with G1, but we still took 11 cycles that accrued enough pauses to affect the end-to-end timing. Plus GC threads took some of our precious CPU.
$ time jdk11.0.5/bin/java -XX:+UnlockExperimentalVMOptions -Xms3g -Xmx3g -XX:+UseG1GC -Xlog:gc AL
[0.031s][info][gc] Using G1
[0.452s][info][gc] GC(0) Pause Young (Normal) (G1 Evacuation Pause) 316M->314M(3072M) 124.119ms
...
[2.518s][info][gc] GC(11) Pause Young (Normal) (G1 Evacuation Pause) 2321M->2324M(3072M) 79.496ms
100000000
real 0m2.953s
user 0m16.880s
sys 0m0.872s
Now Epsilon: whoosh, 1.5s end-to-end, and less than 1s of user time, which is probably just the single running Java thread itself, plus some OS memory management on the allocation path.
$ time jdk11.0.5/bin/java -XX:+UnlockExperimentalVMOptions -Xms3g -Xmx3g -XX:+UseEpsilonGC -Xlog:gc AL
[0.004s][info][gc] Using Epsilon
...
[1.387s][info][gc] Heap: 3072M reserved, 3072M (100.00%) committed, 2731M (88.93%) used
real 0m1.480s
user 0m0.830s
sys 0m0.699s
You might think fully concurrent GCs would solve this, and they partially do, by avoiding large pauses. But they still eat CPU. For example, while Shenandoah is close to Epsilon, doing the whole thing in about 1.7s of wall-clock time, it still takes quite significant CPU time. That benefit is only there because the machine has spare CPUs to offload the work to.
$ time jdk11-shenandoah/bin/java -XX:+UnlockExperimentalVMOptions -Xms3g -Xmx3g -XX:+UseShenandoahGC -Xlog:gc AL
[0.009s][info][gc] Using Shenandoah
...
[0.913s][info][gc] Trigger: Learning 3 of 5. Free (1651M) is below initial threshold (2150M)
[0.913s][info][gc] GC(2) Concurrent reset 1265M->1267M(3072M) 0.689ms
[0.914s][info][gc] GC(2) Pause Init Mark 0.111ms
[1.276s][info][gc] GC(2) Concurrent marking 1267M->1925M(3072M) 361.985ms
[1.306s][info][gc] GC(2) Pause Final Mark 0.465ms
[1.306s][info][gc] GC(2) Concurrent cleanup 1924M->1748M(3072M) 0.171ms
real 0m1.761s
user 0m5.688s
sys 0m0.633s
Perhaps there are objects that depend on the finalizer callback for correctness. I have seen people use finalizers to do things like close file handles, and if close is never called, there is presumably no guarantee the data is persisted.
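To make that concrete, here is a minimal sketch of that anti-pattern (`LazyLog` is a made-up class, for illustration): the close, and therefore the flush of buffered data, happens only in `finalize()`, which a GC that never runs will never trigger.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical example of the anti-pattern: cleanup deferred to finalize().
// Under Epsilon the GC never runs, so finalize() is never called, the
// buffer is never flushed, and the data may never reach disk before exit.
public class FinalizerClose {
    static class LazyLog {
        private final OutputStream out;

        LazyLog(String path) throws IOException {
            out = new BufferedOutputStream(new FileOutputStream(path));
        }

        void write(String line) throws IOException {
            out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
        }

        @Override
        protected void finalize() throws Throwable {
            out.close(); // flushes the buffer -- but only if the GC ever runs
        }
    }

    public static void main(String[] args) throws IOException {
        LazyLog log = new LazyLog("lazy.log");
        log.write("hello");
        // No explicit close: persistence silently depends on GC behavior.
    }
}
```

With any real collector this is merely fragile; with Epsilon it is guaranteed broken, since `finalize()` can never be invoked.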
It's not an issue? It's one of the cases where it does make sense to use Epsilon, as the heap is reclaimed anyway on program exit.
From the post:
> There is a strong temptation to use Epsilon on deployed programs, rather than to confine it to performance tuning work. As a rule, the Java team discourages this use, with two exceptions. Short-running programs, like all programs, invoke the garbage collector at the end of their run. However, as JEP 318 explains, “accepting the garbage collection cycle to futilely clean up the heap is a waste of time, because the heap would be freed on exit anyway.”
Memory might need to be cleaned up if the program was being run embedded in something else (it's not unheard of to embed JVMs inside e.g. C++ applications, and it's very common in scripting languages to do this).
Additionally, global destructors, while not guaranteed, can be very helpful if you let them run rather than just exiting and letting the system clean up file descriptors. For example, a clean disconnect from a database is often faster overall (on the database side, e.g. freeing up a connection slot) than a dirty "client hasn't phoned in for a while/received unexpected FIN" disconnect via hard exit.
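That clean disconnect doesn't need the GC at all, though: a JVM shutdown hook runs deterministically on normal exit, regardless of which collector (if any) is in use. A minimal sketch, where `Db` is a made-up stand-in for a real database client:

```java
// Sketch: deterministic cleanup on normal exit via a shutdown hook,
// independent of GC behavior. "Db" is a hypothetical stand-in for a
// real database client.
public class CleanExit {
    static class Db {
        void disconnect() { System.out.println("clean disconnect"); }
    }

    public static void main(String[] args) {
        Db db = new Db();
        // Hook runs on normal exit and on SIGTERM, but not on SIGKILL.
        Runtime.getRuntime().addShutdownHook(new Thread(db::disconnect));
        System.out.println("work done");
    }
}
```

The hook fires as the JVM shuts down, so the disconnect happens exactly once per run rather than whenever (if ever) a finalizer gets around to it.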
> Memory might need to be cleaned up if the program was being run embedded in something else
Just unmap the heap pages. Don't run the GC!
> global destructors, while not guaranteed, can be very helpful if you let them run
If you want them to run on exit then you want Runtime.runFinalizersOnExit, not the GC. Finalizers are non-deterministic, asynchronous, and would take an indefinite number of GC cycles to run them for all objects.
I think the concern is that those resources might be external, and not cleaning them up correctly leaves them in an inconsistent state. Not saying this is best practice, but I've seen it done.
Finalisers are not guaranteed to be called by the GC in theory, and in practice they run asynchronously even when they are going to be called, so they aren't likely to have run if you GC and then exit.
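This is easy to demonstrate: even a System.gc() right before exit gives no guarantee that pending finalizers have actually run, because finalization is queued and executed asynchronously on a separate finalizer thread. A minimal sketch:

```java
// Demonstrates that finalization is asynchronous: after System.gc(),
// the finalizer for the unreachable object may still be sitting in the
// finalization queue, unexecuted, when we reach the println.
public class FinalizeRace {
    static volatile boolean finalized = false;

    static class Tracked {
        @Override
        protected void finalize() {
            finalized = true;
        }
    }

    public static void main(String[] args) {
        new Tracked();   // immediately unreachable
        System.gc();     // requests a collection; finalization is still async
        // Often still false here: the finalizer thread may not have run yet.
        System.out.println("finalized yet? " + finalized);
    }
}
```

Whether the flag is set by the time of the println is timing-dependent, which is exactly the point: a GC-then-exit sequence gives no ordering guarantee for finalizers.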
I agree with you that it's not reliable. I just remember the Rust community going through the same kerfuffle not too long ago, over their Drop trait not being guaranteed to run.