To me, the most interesting change is the performance improvement due to the new register-based calling convention. Your CPU-bound programs will magically get 5% faster when compiled with 1.17:
> Go 1.17 implements a new way of passing function arguments and results using registers instead of the stack. Benchmarks for a representative set of Go packages and programs show performance improvements of about 5%, and a typical reduction in binary size of about 2%. This is currently enabled for Linux, macOS, and Windows on the 64-bit x86 architecture (the linux/amd64, darwin/amd64, and windows/amd64 ports).
I love how they're doing it in such an iterative fashion: even assembly functions don't have to be rewritten. Then again, I guess doing it progressively like this is the only feasible way to avoid reworking all the low-level assembly routines in one fell swoop.
Wow, this update is awesome: my GoAWK interpreter (https://github.com/benhoyt/goawk) runs a simple CPU-bound AWK program 38% faster when compiled with Go 1.17 (compared to 1.16).
$ time goawk_go1.16 'BEGIN { for (i=0; i<100000000; i++) s += i; print(s) }'
4999999950000000
real 0m10.158s ...
$ time goawk_go1.17 'BEGIN { for (i=0; i<100000000; i++) s += i; print(s) }'
4999999950000000
real 0m6.268s ...
I wonder why it's so much better than their advertised 5% perf improvement? Here's a quick CPU profile: https://i.imgur.com/csJyOYq.png ... I don't get too much out of it at a glance, just seems like everything's a bunch faster.
Hi, I'm one of the people who worked on it, and the guy who did the initial estimate back in early 2017. 5% is the geomean of a lot of benchmarks; a whole lot fall in the 4-8% range, a few do worse because the new ABI creates new patterns of register use that don't fit well with the current register allocator, and the fix was larger than we wanted to risk. (See https://github.com/golang/go/issues/46216 )
The benefits come primarily from avoiding extra work spilling arguments to/from the stack on function calls. If you are making lots and lots of function calls, particularly to small functions that can't be inlined, there could certainly be much bigger improvements.
How much speed you gain depends a lot on the structure of the code being benchmarked. Idiomatic Go code tends to do most of its computation in local loops without many function calls, so the optimization helps less there.
An interpreter, on the other hand, often makes a function call for every single instruction it executes. That means a lot of function calls inside loops, sometimes one per operation, which benefits massively from this optimization.
Look at the disassembly and observe how your function calls have far fewer push/pop operations going on, and how the function prologues/epilogues are smaller.