... and the good news is that this wasn't all that could be squeezed out of Go.
When will Go have a completely precise garbage collector? In 2013 it seems like poor form to have automatic memory management that isn't asymptotically safe-for-space.
This one is not exactly a "buffer" as in those two links. It uses a custom struct type instead of []byte, to avoid converting between the actual data and []byte (marshaling) or type assertions (converting from/to interface{}). Check https://github.com/songgao/bufferManager.go/blob/master/buff... ; inside Data there can be anything. The internal implementation is closer to `bufferpool` (the 2nd one), though, since they both use a channel as the pool.
As to performance in Go 1.1, I did try it with this implementation but haven't pushed the results to GitHub. The speedup is similar. You could simply clone the repo and run `run_test.sh` yourself :-)
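For the curious, here is a minimal sketch of that channel-as-pool idea. The Data payload and all names below are illustrative, not the package's actual API:

package main

// Data stands in for whatever typed payload you want to pool,
// avoiding []byte marshaling and interface{} type assertions.
type Data struct {
    Payload [4096]byte
    N       int
}

// Pool uses a buffered channel as the free list.
type Pool struct {
    free chan *Data
}

func NewPool(size int) *Pool {
    return &Pool{free: make(chan *Data, size)}
}

// Get returns a pooled *Data, or allocates a fresh one when the pool is empty.
func (p *Pool) Get() *Data {
    select {
    case d := <-p.free:
        return d
    default:
        return new(Data)
    }
}

// Put returns a *Data to the pool; if the pool is full, the value is
// dropped and the GC reclaims it.
func (p *Pool) Put(d *Data) {
    select {
    case p.free <- d:
    default:
    }
}

func main() {
    pool := NewPool(64)
    d := pool.Get()
    d.N = copy(d.Payload[:], "hello")
    pool.Put(d)
}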
Apologies for being lazy and just thinking "ooh, buffer, like the other packages with the name buffer" without digging into the sources, and thanks for answering anyway :)
They're better everywhere except complexity. The two main things that add to the complexity in Go's case are:
1) Existing compilers. Neither GCC nor LLVM precisely tracks object pointers through code generation.
2) Existing FFIs. If you expose an FFI that lets object pointers leak to C code, with nothing to anchor them as roots, then you lock yourself into a conservative GC (see the sketch below).
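A hedged sketch of what "anchoring" means in practice. All names here are hypothetical, and note that later cgo pointer-passing rules forbid C retaining Go pointers at all; this only illustrates the idea:

package main

/*
// Hypothetical C side that keeps a raw copy of a pointer it was handed.
static void *stash;
static void c_keep(void *p) { stash = p; }
*/
import "C"

import "unsafe"

// anchors acts as a GC-visible root set: while an object is referenced
// here, even a precise collector knows it is live, so the raw copy held
// by C never dangles. Without such an anchor, only a conservative scan
// could ever find the reference.
var anchors []*int

func handToC(x *int) {
    anchors = append(anchors, x) // anchor as a root on the Go side
    C.c_keep(unsafe.Pointer(x))  // C's copy is invisible to the GC
}

func main() {
    v := 42
    handToC(&v)
}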
I'm pretty sure that the existing conservative GC in cgo doesn't even scan the C part of the heap, so if you try to use unanchored pointers to Go objects, you will crash. More details here, in the comments: https://code.google.com/p/go/source/browse/misc/cgo/gmp/gmp....
I'm not sure how gccgo does it. There may be some limitations there based on the gcc internals, but I don't see why that would prevent cgo from developing a precise collector.
It wasn't obvious to me, looking at the source of the actual GC implementation, whether it avoids the C stack frames between Go frames. If it does, then they don't have much to worry about.
I'm not sure about stack frames, but I do remember a lot of discussion about golang using a single contiguous area of virtual memory for its heap. Since C wouldn't be using that same address range, there would be no potential issues where someone would put a Go object on the C heap for a long time and have that result in implicit GC pinning. C stack frames may be an issue (I haven't checked either) but only for upcalls from C, which seem rare.
It doesn't take much commitment... set aside a weekend & you'll learn pretty much all the stuff you really need to know. There are some corner cases of course, but they're pretty dang rare.
Nice. I'd also like to see benchmarks with GOMAXPROCS > 1, so we can see whether the overhead it brings with it (scheduling, inter-process communication, context switches, ...) has been reduced. Something along the lines of the sketch below would do.
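A rough sketch of such a comparison; the workload and iteration counts are arbitrary placeholders:

package main

import (
    "fmt"
    "runtime"
    "sync"
    "time"
)

// work burns some CPU so the scheduler has something to juggle.
func work(wg *sync.WaitGroup) {
    defer wg.Done()
    s := 0
    for i := 0; i < 10000000; i++ {
        s += i
    }
    _ = s
}

func main() {
    for _, n := range []int{1, 2, 4} {
        runtime.GOMAXPROCS(n)
        start := time.Now()
        var wg sync.WaitGroup
        for i := 0; i < 8; i++ {
            wg.Add(1)
            go work(&wg)
        }
        wg.Wait()
        fmt.Printf("GOMAXPROCS=%d: %v\n", n, time.Since(start))
    }
}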
It's impressive how much performance can be gained by "simple" (uhh, really not simple at all) compiler optimizations. Is this the language's maximum performance? I don't think so.
Look at JavaScript (V8) vs Ruby/Python performance. Somehow almost-equivalent languages are now playing in different leagues; the only difference is the time & money spent on optimizations.
This is why younger generations should learn about compiler design, and that language != implementation.
Sure the language semantics may outlaw certain types of optimizations (e.g. aliasing), but in the end nothing forbids having compilers, interpreters or JIT implementations for a given language.
The current generation tends to think that strong typing, dynamic typing, or a GC-enabled language requires a VM.
This is what I think is positive about the Go designers' decision to use a native compiler as the canonical implementation: it shows these youngsters that there are other ways to implement languages.
Some language features do preclude some optimizations. For example, once you have dynamic typing you'll typically need a butt load of engineering effort to generate the fastest possible numerical computation.
Now that people seem to want to do everything from crypto to image processing in Javascript, projects like JägerMonkey, v8 (method JITs) and asm.js have cropped up that show this is still an unsolved problem in many respects. One could easily make the case that Javascript is, in fact, becoming less and less suitable for the web.
> Some language features do preclude some optimizations
True, that is why I mentioned the aliasing problem, but it doesn't forbid any particular kind of implementation.
> Now that people seem to want to do everything from crypto to image processing in Javascript, projects like JägerMonkey, v8 (method JITs) and asm.js have cropped up that show this is still an unsolved problem in many respects.
The funny thing is that advanced JIT engines like HotSpot were actually developed for dynamic languages (Self).
The good thing about JIT research for JavaScript is that it helps advance the state of compilers for dynamic languages, even for AOT compilation scenarios, I would say.
I think JS is actually a radically simpler language than either Python or Ruby. Lua is a much closer comparison, and it's certainly got a nice interpreter and JIT available.
Note that Go is a language where it is well known how to optimize large parts of the runtime. The language design is sane, especially for execution speed.
Ruby and JS are very expressive languages, but their semantics make a lot of optimizations nontrivial to carry out. In essence you need to figure out type information at runtime and then run another JIT pass over the code to make it fast. This is way more complex than what a simple Go compiler has to do.
Note that even a rather unoptimizing Go compiler produces code way faster than anything V8 or a Ruby interpreter has ever produced. This is what you can expect once your language design has few trouble spots w.r.t. optimization.
It helps that Go is a relatively new language and the easier optimizations are getting taken care of. I doubt we will see this much of a performance gain from another 1.* release. There are definitely significant improvements to come, but I don't believe it will be this substantial.
They are still working through the low-hanging fruit. Brad Fitzpatrick mentioned that his optimization method was to run the profiler and pick the thing at the top to work on.
Isn't that the standard way to decide what to optimize? There's often not much gain to be had if you are optimizing things that are not the bottleneck.
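For reference, the usual way to get such a profile is the standard library's runtime/pprof; a minimal sketch (the flag name is just a common convention):

package main

import (
    "flag"
    "log"
    "os"
    "runtime/pprof"
)

var cpuprofile = flag.String("cpuprofile", "", "write CPU profile to this file")

func main() {
    flag.Parse()
    if *cpuprofile != "" {
        f, err := os.Create(*cpuprofile)
        if err != nil {
            log.Fatal(err)
        }
        if err := pprof.StartCPUProfile(f); err != nil {
            log.Fatal(err)
        }
        defer pprof.StopCPUProfile()
    }
    // ... the workload you want to profile ...
    // Afterwards, inspect the hot spots with: go tool pprof <binary> <profile>
}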
Rob Pike said that he believed they could get similar performance enhancements in the next release. He was speaking at the Google I/O fireside chat on Go.
It's a regression in that benchmark, but it's possible that improvements elsewhere will still make your program faster overall. For instance, gob is mostly used in network servers, and the scheduler improvements may be good enough to overshadow the gob regression.
We're always looking for more real-world performance measurements. If you have any programs whose performance has regressed between 1.0 and 1.1, we'd like to hear about it.
CloudFlare uses gob a lot in a latency-sensitive way, so I figured I would run my own test with a small program like this. The 'Thing' struct is meant to mirror the sort of items we send using gob.
package main

import (
    "bytes"
    "encoding/gob"
    "fmt"
    "time"
)

type Thing struct {
    B []byte
    S string
    I int
}

func main() {
    var n bytes.Buffer
    e := gob.NewEncoder(&n)
    t := Thing{[]byte(p), p, 42}
    count := 10000000
    mark := time.Now()
    for i := 0; i < count; i++ {
        if err := e.Encode(t); err != nil {
            panic(err)
        }
        n.Truncate(0) // discard the encoded bytes so the buffer doesn't grow
        t.I += 1      // vary the struct slightly between iterations
    }
    fmt.Printf("%f\n", time.Since(mark).Seconds())
}

var p = `` // Cut for space but is a 15,344 byte web page as a string
> Doing 10 runs of each and averaging the reported duration
I wish people would stop doing that. Your program does the same work in every benchmarking run; any variance in the runtime comes from factors you're probably not interested in measuring, like kernel scheduling nuances or which CPUs your benchmark happened to run on. You should take the min of the microbenchmark durations, not the average.
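A rough sketch of that advice in Go; the workload is a hypothetical placeholder:

package main

import (
    "fmt"
    "time"
)

// timeOnce runs fn once and returns the elapsed wall time.
func timeOnce(fn func()) time.Duration {
    start := time.Now()
    fn()
    return time.Since(start)
}

func main() {
    const runs = 10
    best := time.Duration(1<<63 - 1) // max Duration as the initial "min"
    for i := 0; i < runs; i++ {
        d := timeOnce(func() {
            time.Sleep(10 * time.Millisecond) // placeholder for the real workload
        })
        if d < best {
            best = d
        }
    }
    // The minimum is the run least disturbed by scheduling, CPU
    // migration, and other noise external to the code under test.
    fmt.Println("min:", best)
}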
Just install them in different paths. You then have three options:
1. Use absolute paths for the Go tools: $ /home/paul/go-tip/bin/go run myfile.go
2. Set GOROOT (and your PATH) accordingly each time, e.g. via a script
3. Rename the folder. For example, on my Windows PC I have GOROOT pointed at D:/dev/go. In the dev folder I have Go installations in go/ (currently 1.1), go1.0.3/ (guess the version), and go-tip/. If I want to use another version, I just rename its folder to go/
Go 1.1 comes with some impressive performance optimizations, but there are more to be done, and some great ideas are already being discussed, like a preemptive scheduler.
Just see what performance-related changes were already made since the Go 1.1 release last week:
• SSE-powered string/bytes compare: http://code.google.com/p/go/source/detail?r=b2f1f8cb2fcb7025...
• fewer allocations in the JSON package: http://code.google.com/p/go/source/detail?r=00d69aa6619e77d8...
• optimizations to malloc(): http://code.google.com/p/go/source/detail?r=0fe374e887455a57... and http://code.google.com/p/go/source/detail?r=931a7362e30c2139...
• optimized aeshash: http://code.google.com/p/go/source/detail?r=80c8a9f81e4816e0...
• faster x86 memmove: http://code.google.com/p/go/source/detail?r=4cb93e2900d0c748...
• fewer allocations in the buffered I/O package: http://code.google.com/p/go/source/detail?r=bce231eb0fdd4bbe...
• optimizations in the flate compression package: http://code.google.com/p/go/source/detail?r=bd653e485f1d9b5c...
• optimizations to the net/http package: http://code.google.com/p/go/source/detail?r=647f336edfe88a31... and http://code.google.com/p/go/source/detail?r=d1d76fc0ab6a33f1...
• integrated network poller for BSD: http://code.google.com/p/go/source/detail?r=9d60132d77847262...