To me, the most interesting change is the performance improvement due to the new register-based calling convention. Your CPU-bound programs will magically get 5% faster when compiled with 1.17:
> Go 1.17 implements a new way of passing function arguments and results using registers instead of the stack. Benchmarks for a representative set of Go packages and programs show performance improvements of about 5%, and a typical reduction in binary size of about 2%. This is currently enabled for Linux, macOS, and Windows on the 64-bit x86 architecture (the linux/amd64, darwin/amd64, and windows/amd64 ports).
I love how they're doing it in such an iterative fashion: even assembly functions don't have to be rewritten. Then again, I guess doing it progressively like this is the only feasible way to avoid reworking all the low-level assembly routines in one fell swoop.
Wow, this update is awesome: my GoAWK interpreter (https://github.com/benhoyt/goawk) runs a simple CPU-bound AWK program 38% faster when compiled with Go 1.17 (compared to 1.16).
$ time goawk_go1.16 'BEGIN { for (i=0; i<100000000; i++) s += i; print(s) }'
4999999950000000
real 0m10.158s ...
$ time goawk_go1.17 'BEGIN { for (i=0; i<100000000; i++) s += i; print(s) }'
4999999950000000
real 0m6.268s ...
I wonder why it's so much better than their advertised 5% perf improvement? Here's a quick CPU profile: https://i.imgur.com/csJyOYq.png ... I don't get too much out of it at a glance, just seems like everything's a bunch faster.
Hi, I'm one of the people who worked on it, and the guy who did the initial estimate back in early 2017. 5% is the geomean of a lot of benchmarks; a whole lot fall in the 4-8% range, a few do worse because the new ABI creates new patterns of register use that don't fit well with the current register allocator, and the fix was larger than we wanted to risk. (See https://github.com/golang/go/issues/46216 )
The benefits come primarily from avoiding extra work spilling arguments to/from the stack on function calls. If you are making lots and lots of function calls, particularly to small functions that can't be inlined, there could certainly be much bigger improvements.
How much speed you gain depends a lot on the structure of the code being benchmarked. Idiomatic Go code tends to do most of its computation in local loops without many function calls, so the optimization helps less there.
An interpreter, on the other hand, often makes a function call for every single instruction it executes. That means a lot of function calls inside loops, sometimes one per operation, which benefits massively from this optimization.
Look at the disassembly and observe how your function calls have far fewer push/pop operations going on, and how the function prologues/epilogues are smaller.