The Go runtime is really starting to look sexy. 20 ms GC pauses on 200+ GB heaps!
I remember a thread discussing a pauseless GC on the dev mailing-list, where Gil Tene, of Azul C4 fame, commented that having a clear policy on safepoints from the beginning was paramount to building a good GC. It looks like the community is strongly biased towards doing the fundamental things well, and towards attracting the right people.
You have to be very careful about these sorts of GC statistics. Things are often not quite what they seem and they depend a lot on the type of app you run.
The first thing to be aware of is that with modern collectors (I have no idea how modern Go's new collector is though), GC pause time depends on how much live data there is in the young generation. So you can easily have enormous heaps with very low pause times if all your objects die young and hardly ever make it into the old generations, because then you never really need to collect the rest of the heap at all.
Of course, outside of synthetic benchmarks, many apps don't show such pleasant behaviour, and GC algorithms often face difficult tradeoffs that only the developers can really judge. For instance, do you want low pause times, or less CPU used by the collector (higher throughput)? That turns out to be a fundamental tradeoff, and the right answer usually depends on whether your app is a user-facing server (needs low pause times) or a batch job (better to pause for long periods but complete faster). No runtime can know that, which is why the JVM has tons of tuning knobs. Left to its own devices, you can theoretically get away with tweaking a single knob: the target pause time (in G1). Set it lower and the collector's CPU usage goes up, but it will try to pause for less time; set it higher and the collector gets more efficient.
Or you can just buy Zing and get rid of GC pauses entirely.
So a stat like "20ms GC pauses on 200GB heaps" doesn't mean much by itself. You can get very low pause times with huge heaps out of the JVM as well.
The Go GC is not generational; those 20 ms are for a full GC. Certainly the details of the application's heap usage are going to affect GC times, but this is not the time for just a nursery collection.
OK, the tradeoff they're making that I missed is that it's not a compacting collector. So eventually your heap can fragment to the point where allocation gets expensive or impossible. Unusual design choice.
Unlike Java, Go has first-class value types, and memory layout can be controlled by developers. That leads to far fewer objects on the heap and more compact layouts, both of which mean far less fragmentation. As you can see here, Go apps use considerably less memory than Java:
https://benchmarksgame.alioth.debian.org/u64q/go.html
Unfortunately it's impossible to reliably measure the memory usage of Java that way because the JVM will happily prefer to keep allocating memory from the OS rather than garbage collect. It makes a kind of sense: GC has a CPU cost that gets lower the more memory is given to the heap, so if you have spare memory lying around, may as well deploy it to make things run faster.
Of course that isn't always what you want (e.g. desktop apps) ... sometimes you'd rather spend the CPU and minimise the heap size. The latest Java versions on some platforms will keep track of total free system RAM and if some other program is allocating memory quickly, it'll GC harder to reduce its own usage and give back memory to the OS.
In the benchmarks game I suspect there aren't any other programs running at the same time, so Java will go ahead and use all the RAM it can get. Measuring it therefore won't give reasonable results as the heap will be full of garbage.
Value types don't have much to do with fragmentation; if anything, they make it worse, because embedding a value type into a larger container type means larger allocations that are harder to satisfy once fragmentation gets serious. But ultimately a similar amount of data is going to end up in the heap no matter what. Yes, you can save some pointers and some object headers, so it'll be a bit less, but not so much that it solves fragmentation.
You can't really compare total memory usage of a JIT to total memory usage of an AOT compiler that way if what you're trying to show is that value types reduce memory usage.
Also, I suspect that the fact that JVMs use a generational GC (and a compacting GC) blows everything else out of the water when it comes to fragmentation. There's no way a best-fit malloc implementation can possibly beat bump allocation in the nursery for low fragmentation.
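A toy bump allocator in Go (the `arena` type is mine, purely to illustrate the mechanism) shows why nursery allocation can't fragment: allocation is a bounds check plus an offset increment, so live objects are always packed contiguously, and a copying collector resets the whole region at once.

```go
package main

import "fmt"

// arena is a toy bump allocator, which is how the nursery of a
// copying collector hands out memory: objects are packed back to
// back with no free-list and no holes between them.
type arena struct {
	buf []byte
	off int
}

// alloc carves n bytes off the front of the remaining space.
func (a *arena) alloc(n int) []byte {
	if a.off+n > len(a.buf) {
		return nil // full: a real collector would evacuate survivors now
	}
	p := a.buf[a.off : a.off+n]
	a.off += n
	return p
}

// reset models a minor collection: survivors get copied elsewhere and
// the whole region is reused from the start, erasing any fragmentation.
func (a *arena) reset() { a.off = 0 }

func main() {
	a := &arena{buf: make([]byte, 1024)}
	a.alloc(100)
	a.alloc(200)
	fmt.Println(a.off) // 300: allocations packed back to back
	a.reset()
	fmt.Println(a.off) // 0: region fully reusable again
}
```

A best-fit malloc, by contrast, has to search free lists and inevitably leaves gaps it can't fill, which is the contrast being drawn above.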
Those default memory use measurements are just a way to check if a particular 100 line toy benchmark program has been written to exploit time / space trade-off.
That is nice. But I hope you understand that such a solution is not preferred in every situation. For example, for performance reasons, it can be better to code in the native language.
And on top of that we get BLAS and LAPACK bindings from https://github.com/gonum