... and the good news is that this wasn't all that could be squeezed out of Go.
When will Go have a completely precise garbage collector? In 2013 it seems like poor form to have automatic memory management that isn't asymptotically safe-for-space.
This one is not exactly a "buffer" as in those two links. It uses a custom struct type instead of []byte, to avoid converting between the actual data and []byte (marshaling) or type assertions (converting from/to interface{}). Check https://github.com/songgao/bufferManager.go/blob/master/buff... ; inside Data there can be anything. The internal implementation is closer to `bufferpool` (the 2nd one), though, since they both use a channel as the pool.
As to performance in Go 1.1, I did try it with this implementation but haven't pushed the results to GitHub. The speedup is similar. You could simply clone the repo and run `run_test.sh` yourself :-)
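For the curious, here is a minimal sketch of that channel-as-pool idea. The Data payload and all names below are illustrative, not the package's actual API:

package main

// Data stands in for whatever typed payload you want to pool,
// avoiding []byte marshaling and interface{} type assertions.
type Data struct {
    Payload [4096]byte
    N       int
}

// Pool uses a buffered channel as the free list.
type Pool struct {
    free chan *Data
}

func NewPool(size int) *Pool {
    return &Pool{free: make(chan *Data, size)}
}

// Get returns a pooled *Data, or allocates a fresh one when the pool is empty.
func (p *Pool) Get() *Data {
    select {
    case d := <-p.free:
        return d
    default:
        return new(Data)
    }
}

// Put returns a *Data to the pool; if the pool is full, the value is
// dropped and the GC reclaims it.
func (p *Pool) Put(d *Data) {
    select {
    case p.free <- d:
    default:
    }
}

func main() {
    pool := NewPool(64)
    d := pool.Get()
    d.N = copy(d.Payload[:], "hello")
    pool.Put(d)
}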
Apologies for being lazy and just thinking "ooh, buffer, like the other packages with the name buffer" without digging into the sources, and thanks for answering anyway :)
They're better everywhere except complexity. The two main things that add to the complexity in Go's case are:
1) Existing compilers. Neither GCC nor LLVM precisely tracks object pointers through code generation.
2) Existing FFIs. If you expose an FFI that lets object pointers leak to C code, with nothing to anchor them as roots, then you lock yourself into a conservative GC (see the sketch below).
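A hedged sketch of what "anchoring" means in practice. All names here are hypothetical, and note that later cgo pointer-passing rules forbid C retaining Go pointers at all; this only illustrates the idea:

package main

/*
// Hypothetical C side that keeps a raw copy of a pointer it was handed.
static void *stash;
static void c_keep(void *p) { stash = p; }
*/
import "C"

import "unsafe"

// anchors acts as a GC-visible root set: while an object is referenced
// here, even a precise collector knows it is live, so the raw copy held
// by C never dangles. Without such an anchor, only a conservative scan
// could ever find the reference.
var anchors []*int

func handToC(x *int) {
    anchors = append(anchors, x) // anchor as a root on the Go side
    C.c_keep(unsafe.Pointer(x))  // C's copy is invisible to the GC
}

func main() {
    v := 42
    handToC(&v)
}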
I'm pretty sure that the existing conservative GC in cgo doesn't even scan the C part of the heap, so if you try to use unanchored pointers to Go objects, you will crash. More details here, in the comments: https://code.google.com/p/go/source/browse/misc/cgo/gmp/gmp....
I'm not sure how gccgo does it. There may be some limitations there based on the gcc internals, but I don't see why that would prevent cgo from developing a precise collector.
It wasn't obvious to me, looking at the source of the actual GC implementation, whether it avoids the C stack frames between Go frames. If it does, then they don't have much to worry about.
I'm not sure about stack frames, but I do remember a lot of discussion about golang using a single contiguous area of virtual memory for its heap. Since C wouldn't be using that same address range, there would be no potential issues where someone would put a Go object on the C heap for a long time and have that result in implicit GC pinning. C stack frames may be an issue (I haven't checked either) but only for upcalls from C, which seem rare.
It doesn't take much commitment... set aside a weekend & you'll learn pretty much all the stuff you really need to know. There are some corner cases of course, but they're pretty dang rare.
Nice. I'd also like to see benchmarks with GOMAXPROCS > 1, so we can see whether the overhead it brings with it (scheduling, inter-process communication, context switches, ...) has been reduced. Something along the lines of the sketch below would do.
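A rough sketch of such a comparison; the workload and iteration counts are arbitrary placeholders:

package main

import (
    "fmt"
    "runtime"
    "sync"
    "time"
)

// work burns some CPU so the scheduler has something to juggle.
func work(wg *sync.WaitGroup) {
    defer wg.Done()
    s := 0
    for i := 0; i < 10000000; i++ {
        s += i
    }
    _ = s
}

func main() {
    for _, n := range []int{1, 2, 4} {
        runtime.GOMAXPROCS(n)
        start := time.Now()
        var wg sync.WaitGroup
        for i := 0; i < 8; i++ {
            wg.Add(1)
            go work(&wg)
        }
        wg.Wait()
        fmt.Printf("GOMAXPROCS=%d: %v\n", n, time.Since(start))
    }
}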
It's impressive how much performance can be gained by "simple" (uhh, really not simple at all) compiler optimizations. Is this the language's maximum performance? I don't think so.
Look at JavaScript (V8) vs Ruby/Python performance. Somehow almost-equivalent languages are now playing in different leagues; the only difference is the time & money spent on optimizations.
This is why younger generations should learn about compiler design, and that language != implementation.
Sure the language semantics may outlaw certain types of optimizations (e.g. aliasing), but in the end nothing forbids having compilers, interpreters or JIT implementations for a given language.
The current generation tends to think that strong typing, dynamic typing, or a GC-enabled language requires a VM.
This is what I think is positive about the Go designers' decision to use a native compiler as the canonical implementation: it shows these youngsters that there are other ways to implement languages.
Some language features do preclude some optimizations. For example, once you have dynamic typing you'll typically need a butt load of engineering effort to generate the fastest possible numerical computation.
Now that people seem to want to do everything from crypto to image processing in Javascript, projects like JägerMonkey, v8 (method JITs) and asm.js have cropped up that show this is still an unsolved problem in many respects. One could easily make the case that Javascript is, in fact, becoming less and less suitable for the web.
> Some language features do preclude some optimizations
True, that is why I mentioned the aliasing problem, but it doesn't forbid any particular kind of implementation.
> Now that people seem to want to do everything from crypto to image processing in Javascript, projects like JägerMonkey, v8 (method JITs) and asm.js have cropped up that show this is still an unsolved problem in many respects.
The funny thing is that advanced JIT engines like HotSpot were actually developed for dynamic languages (Self).
The good thing about JIT research for JavaScript is that it helps advance the state of compilers for dynamic languages, even for AOT compilation scenarios, I would say.
I think JS is actually a radically simpler language than either Python or Ruby. Lua is a much closer comparison, and it's certainly got a nice interpreter and JIT available.
Note that Go is a language where it is well known how to optimize large parts of the runtime. The language design is sane, especially for execution speed.
Ruby and JS are very expressive languages, but their semantics make a lot of optimizations nontrivial to carry out. In essence you need to figure out type information at runtime and then run another JIT pass over the code to make it fast. This is way more complex than what a simple Go compiler has to do.
Note that even a rather unoptimizing Go compiler produces code way faster than anything V8 or a Ruby interpreter has ever produced. This is what you can expect once your language design has few trouble spots w.r.t. optimization.
It helps that Go is a relatively new language and the easier optimizations are getting taken care of. I doubt we will see this much of a performance gain from another 1.* release. There are definitely significant improvements to come, but I don't believe it will be this substantial.
They are still working through the low-hanging fruit. Brad Fitzpatrick mentioned that his optimization method was to run the profiler and pick the thing at the top to work on.
Isn't that the standard way to decide what to optimize? There's often not much gain to be had if you are optimizing things that are not the bottleneck.
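For reference, the usual way to get such a profile is the standard library's runtime/pprof; a minimal sketch (the flag name is just a common convention):

package main

import (
    "flag"
    "log"
    "os"
    "runtime/pprof"
)

var cpuprofile = flag.String("cpuprofile", "", "write CPU profile to this file")

func main() {
    flag.Parse()
    if *cpuprofile != "" {
        f, err := os.Create(*cpuprofile)
        if err != nil {
            log.Fatal(err)
        }
        if err := pprof.StartCPUProfile(f); err != nil {
            log.Fatal(err)
        }
        defer pprof.StopCPUProfile()
    }
    // ... the workload you want to profile ...
    // Afterwards, inspect the hot spots with: go tool pprof <binary> <profile>
}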
Rob Pike said that he believed they could get similar performance enhancements in the next release. He was speaking at the Google I/O fireside chat on Go.
It's a regression in that benchmark, but it's possible that improvements elsewhere will still make your program faster overall. For instance, gob is mostly used in network servers, and the scheduler improvements may be good enough to overshadow the gob regression.
We're always looking for more real-world performance measurements. If you have any programs whose performance has regressed between 1.0 and 1.1, we'd like to hear about it.
CloudFlare uses gob a lot in a latency-sensitive way, so I figured I would run my own test with a small program like this. The 'Thing' struct is meant to mirror the sort of items we send using gob.
package main

import (
    "bytes"
    "encoding/gob"
    "fmt"
    "time"
)

type Thing struct {
    B []byte
    S string
    I int
}

func main() {
    var n bytes.Buffer
    e := gob.NewEncoder(&n)
    t := Thing{[]byte(p), p, 42}
    count := 10000000
    mark := time.Now()
    for i := 0; i < count; i++ {
        if err := e.Encode(t); err != nil {
            panic(err)
        }
        n.Truncate(0) // discard the encoded bytes so the buffer doesn't grow
        t.I += 1      // vary the struct slightly between iterations
    }
    fmt.Printf("%f\n", time.Since(mark).Seconds())
}

var p = `` // Cut for space but is a 15,344 byte web page as a string
> Doing 10 runs of each and averaging the reported duration
I wish people would stop doing that. Your program does the same work in every benchmarking run; any variance in the runtime comes from factors you're probably not interested in measuring, like kernel scheduling nuances or which CPUs your benchmark happened to run on. You should take the min of the microbenchmark durations, not the average.
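A rough sketch of that advice in Go; the workload is a hypothetical placeholder:

package main

import (
    "fmt"
    "time"
)

// timeOnce runs fn once and returns the elapsed wall time.
func timeOnce(fn func()) time.Duration {
    start := time.Now()
    fn()
    return time.Since(start)
}

func main() {
    const runs = 10
    best := time.Duration(1<<63 - 1) // max Duration as the initial "min"
    for i := 0; i < runs; i++ {
        d := timeOnce(func() {
            time.Sleep(10 * time.Millisecond) // placeholder for the real workload
        })
        if d < best {
            best = d
        }
    }
    // The minimum is the run least disturbed by scheduling, CPU
    // migration, and other noise external to the code under test.
    fmt.Println("min:", best)
}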
Just install them in different paths. You then have three options:
1. Use absolute paths for the Go tools: $ /home/paul/go-tip/bin/go run myfile.go
2. Set GOROOT (and your PATH) accordingly each time, e.g. via a script
3. Rename the folder. For example, on my Windows PC I have GOROOT pointed at D:/dev/go. In the dev folder I have Go installations in go/ (currently 1.1), go1.0.3/ (guess the version), and go-tip/. If I want to use another version, I just rename its folder to go/
Go 1.1 comes with some impressive performance optimizations, but there are more to be done, and some great ideas are already being discussed, like a preemptive scheduler.
Just see what performance-related changes were already made since the Go 1.1 release last week:
• SSE-powered string/bytes compare: http://code.google.com/p/go/source/detail?r=b2f1f8cb2fcb7025...
• fewer allocations in the JSON package: http://code.google.com/p/go/source/detail?r=00d69aa6619e77d8...
• optimizations to malloc(): http://code.google.com/p/go/source/detail?r=0fe374e887455a57... and http://code.google.com/p/go/source/detail?r=931a7362e30c2139...
• optimized aeshash: http://code.google.com/p/go/source/detail?r=80c8a9f81e4816e0...
• faster x86 memmove: http://code.google.com/p/go/source/detail?r=4cb93e2900d0c748...
• fewer allocations in the buffered I/O package: http://code.google.com/p/go/source/detail?r=bce231eb0fdd4bbe...
• optimizations in the flate compression package: http://code.google.com/p/go/source/detail?r=bd653e485f1d9b5c...
• optimizations to the net/http package: http://code.google.com/p/go/source/detail?r=647f336edfe88a31... and http://code.google.com/p/go/source/detail?r=d1d76fc0ab6a33f1...
• integrated network poller for BSD: http://code.google.com/p/go/source/detail?r=9d60132d77847262...