
This is not only a nice enhancement to the Go compiler and thus for all Go users, but it also shows why Go as a language is so relevant. Single-thread performance of processors has not increased very much in recent years, while the core count has continued to grow - I just configured a 20-core/40-thread Xeon workstation and got prices between $4k and $5k.

Go is a great language which combines C-like speeds for low-level code with a good concurrency model, making your program comparatively easy to parallelize on a per-function basis. As the work on the Go compiler shows, it is not trivial for complex programs, but definitely possible.



Task parallelism is fine with the goroutine model, but not data parallelism, which is just as important if not more so. (Although note that task parallelism depends on having good locking strategies, and having more static checking than Go offers in this department is always nice...)

Why do we use GPUs, for example? Data parallelism. What have all the advances in video/audio codec implementation been the result of? Data parallelism again. What makes hashing fast? You guessed it...


I would have thought you could model data parallelism perfectly in terms of task parallelism, though - but not the other way around.


Theoretically, yes. In practice, when performance matters, no.


> Go is a great language which combines C-like speeds

I am using Go in production. It does not combine C-like speeds (I have a comparison from rewriting one of our services from C to Go), but there is a lot of room for additional optimization in the future. Go will never be as efficient as C because of additional indirection in certain cases and additional bookkeeping. But in many cases it's faster than Java.


I am also using Go in production, and I am really getting C-like speeds. Low-level code is basically the same efficiency - the Go compiler does not optimize as well as good C compilers, but that gap closes constantly, and there is some overhead due to e.g. bounds checking. But many of those checks can be optimized away, and of course safe C code would usually have equivalent explicit checks as part of the program anyway.


Go is incapable of ever achieving C-like speeds, and that's not because of runtime safety checks like bounds checking. It's because Go still has a garbage collector, and that always comes with a price tag. Go also often has to copy structs and buffers that would be zero-copy in C, C++, and Rust code. Passing slices to functions that require arrays, or converting between strings and byte slices, requires a copy, for instance. There's also the matter of poor control over stack vs. heap allocation and the weak escape analysis meant to mitigate that. All in all, Go provides very impressive performance compared to other high-level languages, but calling it C-level performance is pure hype.
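
To make the string/byte-slice point concrete, here is a minimal Go sketch; the copy is observable because mutating the byte slice leaves the original string untouched:

  package main

  import "fmt"

  func main() {
    s := "hello"
    b := []byte(s) // this conversion allocates and copies the bytes
    b[0] = 'H'
    fmt.Println(s, string(b)) // "hello Hello": s is unaffected by the mutation
  }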


> It's because Go still has a garbage collector, and that always comes with a price tag.

This is not really true, at least not as such a general claim.

1. malloc()/free() can be either faster or slower than garbage collection. It depends on implementation details and is situational. Note in particular the issues of memory locality and temporary allocations.

2. This assumes an omniscient programmer who has perfect knowledge of when to allocate or free. In reality, memory management is a traditional cross-cutting concern, and in order to deal with it manually in a modular fashion, you will typically introduce additional overhead. Examples:

- in C++, m[a] = b, where m is a map and a and b are strings, requires that both a and b be copied. This is not necessary in a GCed language.

- naive reference counting has significant overhead, significantly more than a modern tracing GC.

- even unique_ptr has overhead (though less so than reference counting).

Garbage collection allows you to have the necessary global knowledge to (mostly) eliminate memory management as a cross-cutting concern, because the GC is allowed to peek behind the scenes and ignore abstraction boundaries.

3. The things C is fast at usually involve allocation-free code anyway, i.e. iterating over arrays or matrices. The performance of such code is not going to be any different in a GCed language, assuming an optimizer of comparable caliber.
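
For illustration, a sketch of such a loop in Go - nothing in it allocates, so the GC never enters the picture:

  package main

  import "fmt"

  // sum is allocation-free: it only reads the slice, so the GC is never
  // involved and the inner loop is the same kind of code a C compiler
  // would emit (modulo optimizer quality).
  func sum(xs []float64) float64 {
    total := 0.0
    for _, x := range xs {
      total += x
    }
    return total
  }

  func main() {
    fmt.Println(sum([]float64{1, 2, 3})) // 6
  }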


I agree with almost everything in your post, but I'm surprised by your comment that unique_ptr has overhead. Since unique_ptr is a single-member struct with no vtable, I'd expect no space overhead at all. Similarly, while it does need to run code in its destructor, I'd expect a trivial, non-virtual destructor like that to be reliably inlined, and the result to be no worse than a manual call to delete. If that's not true, though, I'd definitely like to know, so please let me know if I've missed something.


It has runtime overhead whenever a move assignment occurs. You have to check if the target of the assignment is null (or otherwise call the destructor) and you have to assign null to the source. This is all overhead compared to a plain pointer assignment. An optimizing compiler will of course try to eliminate most of that overhead, but that's not always possible.


>> - in C++, m[a] = b, where m is a map and a and b are strings, requires that both a and b be copied. This is not necessary in a GCed language.

You don't need to make a copy:

  std::map<std::string, std::string> m;
  m.emplace(std::make_pair(std::string("a"), std::string("a")));


1. You are assuming here that a and b will not be used anymore and are optimizing for a special case.

2. The general problem that I was trying to illustrate is that shared ownership requires either reference counting or avoiding it through copying if you don't have automatic memory management.


Your example didn't have any assumptions. What are you assuming, then? If you need to reuse the variable in Go, then it will also be copied inside the map, so your point is even more invalid in that case.


No. In a GCed language, both the map and the original code can safely share a reference without having to worry about coordinating lifetime.


> In a GCed language, both the map and the original code can safely share a reference without having to worry about coordinating lifetime.

Go is a GCed language. Show me the code that will do what you claim (map[var1] = var2, then use var1 (type string) and var2 (type string) in later code) in Go.


I'll give you OCaml:

  let addr s = 2 * (Obj.magic s) + 1
  
  let main () =
    let table = Hashtbl.create 0 in
    let key = "foo" and value = "bar" in
    Hashtbl.add table key value;
    Hashtbl.iter (fun key' value' ->
      Printf.printf "%b %b %x %x %x %x\n"
        (key == key') (value == value')
        (addr key) (addr key') (addr value) (addr value')
    ) table
  
  let () = main ()
This prints:

  true true 10cbd2490 10cbd2490 10cbd2378 10cbd2378

The exact addresses may vary from run to run.

Note that == is reference equality in OCaml.

I'm honestly baffled why you think that there is a need to copy the variables or why you can't use them later or whatever you believe there.


I don't know OCaml, but I asked you for Go code in a thread about Go and you didn't deliver. You wrote that I am covering only a special case in my C++ code, but you are doing exactly the same. What I see in your code are pointers to a static part of memory, since you are using string literals. Show me the code+asm of a function that takes string values unknown at compile time, mutates them (with a third string supplied to the function, also not known at compile time), and returns those values later in the function.

> I'm honestly baffled why you think that there is a need to copy the variables or why you can't use them later or whatever you believe there.

What you wrote is a SPECIAL case in SOME of the GCed languages that have immutable strings. If you have a language with mutable strings, then this will not work. I know it will not work in Go either, even though Go has immutable strings (so not even all GCed languages with immutable strings optimize this) - that's why I asked for Go code in a Go thread.


> I don't know OCaml, but I asked you for Go code in a thread about Go and you didn't deliver.

I don't really use Go; I made a general comment about GC, in a subthread of fairly general observations about GC, because somebody made a general statement about GC that was not limited to Go.

The language is not relevant for the observation I made. Nor does it really matter if we're storing strings or other heap-allocated objects. It's a question of lifetime management.

> What you wrote is a SPECIAL case in SOME of the GCed languages that have immutable strings.

OCaml strings are mutable, actually. But you seem to be confusing lifetime issues with aliasing issues, anyway.

The underlying problem is that without copying C++ would not know when to free the strings. It would be whenever they went out of scope in the calling function or when they were deleted from the map, whichever is later; the alternative is reference counting (also expensive, and not used by std::string). A GCed language can avoid that because the garbage collector will only free them when they are no longer reachable from either (or any other location).


I wrote that I disagreed with what you wrote because it was a special case. Then I wrote specifically about Go in this comment:

https://news.ycombinator.com/item?id=14116435

You replied "No" to that what I wrote about Go which is not true. I never even once wrote about lifetimes. Don't you see it? Look closely again what I wrote in my comments. I think you are confused about what I was disagreeing because you didn't read carefully what I wrote. I agree with what you wrote about lifetimes 100% and I was before that discussion started because I write C++ and those things are basic knowledge. But it wasn't about lifetimes from the beginning...

> The language is not relevant for the observation I made. Nor does it really matter if we're storing strings or other heap-allocated objects.

No, because what you wrote is only true depending on the language and on where those strings are stored - in some cases they will be copied, so what you wrote at the beginning was not entirely true. And I disagreed with that only; I didn't mention lifetimes even once.

If instead you had written (big letters to show the diff):

- in C++, m[a] = b, IN SPECIAL CASE where m is a map, and a and b are strings AND A AND B OUTLIVE MAP ASSIGNMENT requires that both a and b are being copied UNLESS YOU USE SHARED OWNERSHIP. This is not necessary in SOME OF GCed languages in SPECIAL CASES.

Then I would have no reason to disagree with you in the first place.


> I agree with what you wrote about lifetimes 100%

Then you wasted a lot of time for both of us by getting sidetracked by a detail that I hadn't even mentioned in my original comment. This is about the issues with having one object being referenced from multiple locations. I gave a concrete example to illustrate this issue, and you spent an entire subthread arguing the specifics of the example without seeming to understand the issue it was meant to illustrate. This is not specific to maps or strings. It's a general issue of manual memory management whenever you're dealing with shared ownership of the same object.

> You replied "No" to that what I wrote about Go which is not true.

I think you are reading too much into an expression of disagreement. You kept fixating on the specifics of Go; I was trying to get back to the semantics of GCed languages vs. manual memory management. That's what my "no" was about.

Edit: I also did a quick test for Go, just to end that part of the argument, too. Contrary to your statement, Go doesn't seem to copy strings when adding them to maps, either. While there's no way to take the address of a string literal in Go, you can add lots of instances of a very large string and check the memory usage (compared to an empty string). It turns out to be nearly the same.


You could see exactly what I was writing about; you just didn't read it carefully enough, or chose to interpret it as you saw fit. I was right in what I wrote from the beginning. You are generalizing too much, which is what started this whole discussion in the first place, and when I argued about the specific details in my comments you didn't address them at all - you were fixated on lifetimes the whole time, ignoring what I was writing. So if you want to blame someone for wasting time, blame yourself for not being specific enough and then not replying to what I actually wrote.

> I think you are reading too much into an expression of disagreement. You kept fixating on the specifics of Go; I was trying to get back to the semantics of GCed languages vs. manual memory management. That's what my "no" was about.

There was only one sentence in that comment to which you could write "No". Next time, if you want to get back to lifetimes in a discussion with someone, write "I want to discuss lifetimes" instead of writing "No" to something they wrote. You see the difference? Words matter.

> I also did a quick test for Go, just to end that part of the argument, too. Contrary to your statement, Go doesn't seem to copy strings when adding them to maps, either. While there's no way to take the address of a string literal in Go, you can add lots of instances of a very large string and check the memory usage (compared to an empty string). It turns out to be nearly the same.

No, you are wrong. Look at the runtime hashmap implementation and the generated assembly (go tool objdump binary_name > asm.S) for a function that does what I was describing in my comments. You will see that there is a copy.


> You are generalizing too much, which is what started this whole discussion in the first place, and when I argued about the specific details in my comments you didn't address them at all - you were fixated on lifetimes the whole time, ignoring what I was writing.

Because you were derailing the discussion and I was trying to get it back on track. My whole original comment was about lifetimes and GC vs. manual memory management. You were getting sidetracked by details that weren't relevant for that point, which was specifically about GCed languages vs. manual memory management in general and didn't even mention Go [1].

> No, you are wrong. Look at the runtime hashmap implementation and the generated assembly (go tool objdump binary_name > asm.S) for a function that does what I was describing in my comments. You will see that there is a copy.

If this were true, the following program would take >65 GB of memory. In reality, it requires some 110 MB.

  package main

  import ("strings"; "fmt")

  func main() {
    m := make(map[int]string)
    value := strings.Repeat(".", 65536)
    for i := 0; i < 1000000; i++ {
      m[i] = value
    }
    fmt.Println(len(m))
  }
Note that even if it were as you said, nothing would prevent Go from changing to an implementation that doesn't copy strings.
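
For a more direct check - assuming a Go version new enough to have unsafe.StringData (1.20+) - you can compare the backing pointers before and after the map round-trip; only the small string header (pointer + length) is copied, not the bytes:

  package main

  import (
    "fmt"
    "unsafe"
  )

  func main() {
    m := make(map[string]string)
    value := "bar"
    m["foo"] = value
    got := m["foo"]
    // Both headers point at the same backing bytes; the GC tracks
    // their shared lifetime.
    fmt.Println(unsafe.StringData(value) == unsafe.StringData(got)) // true
  }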

[1] https://news.ycombinator.com/item?id=14115633


You are doing the same thing again. I wrote three times about the use case to test and you are ignoring it, living in your own bubble. This was my last message; replying to you is a waste of my time.


The garbage collector does not impact the speed of tight code - when it runs it of course takes CPU power, but it runs in parallel to the program code, so this comes pretty cheap. And the GC only consumes resources when you allocate heap space, which is generally not necessary - and which comes with a steep price in C as well. Coding efficiently in Go requires somewhat different best practices than C, which might have given you the impression of inefficiency, but once you have adapted to good Go practices, there is very little difference.


Exactly. If you write your code so that allocations are controlled, the GC becomes irrelevant to performance. Well-optimized Go without allocations is indistinguishable from C (modulo compiler differences).


Are there guides on how to do this? I've always assumed that passing by value helps but I'm wondering if there's more explicit guidance out there?


I have no guide at hand, but here are a few hints:

- in general, understand which types are values and which have to be allocated. E.g. the slice structure itself is a value type (typically 24 bytes), but it points to an array, which might be heap-allocated. Re-slicing creates a new slice structure but does not touch the heap - extending a slice might.

- to analyze your code, use go build -gcflags="-m"; this prints optimization information, in particular which allocations "escape to heap", meaning they are heap-allocated rather than stack-allocated (see the sketch after this list).

- interfaces are really great, but storing a value in an interface{} involves a heap allocation (16 bytes). Method invocation via interfaces also has some overhead; method invocation on non-interface types is as fast as a plain function call.

- profile :)
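
A minimal sketch of what the escape analysis output looks like in practice (the names are mine; exact compiler messages vary by Go version) - build it with go build -gcflags="-m":

  package main

  type point struct{ x, y int }

  // returned by value: the struct stays on the stack
  func stackPoint() point { return point{1, 2} }

  // a pointer to the local outlives the call, so the compiler
  // reports something like "moved to heap: p"
  func heapPoint() *point {
    p := point{1, 2}
    return &p
  }

  func main() {
    a := stackPoint()
    b := heapPoint()
    var i interface{} = a // boxing a struct into an interface{} also allocates
    _, _, _ = a, b, i
  }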


Please show some examples that support what you say.


Then go for it, and make those 7/10 benchmarks in which Go is at least 2x slower than C run at the same speed.

http://benchmarksgame.alioth.debian.org/u64q/compare.php?lan...


This is true. But much of the GC can be done on parallel threads - which reduces the total throughput available to the program, but keeps the threads doing real work nice and speedy.


> Single-thread performance of processors has not increased very much in recent years, while the core count has continued to grow

Frequencies have not increased much, but performance certainly has. Better single-core performance at the same frequency is the main advantage Intel CPUs currently have over AMD ones.


'Much' is a matter of perspective. Mine was formed in the 80s and 90s, when doubling performance took maybe a couple of years, not a decade.


Interesting with the Ryzen parts coming to market as well.

I'm in the market for a new developer desktop, and if it shakes out the way it's looking, an 8-core/16-thread Ryzen is hitting the price/perf sweet spot.


Yes, I was so happy to learn that AMD is finally pushing 8 cores for mainstream processors; they were stuck on 4 cores for too long. Of course, building software that fully uses those cores is the challenge, but that's where Go comes in.


> Yes, I was so happy to learn that AMD is finally pushing 8 cores for mainstream processors; they were stuck on 4 cores for too long.

What do you mean by "finally pushing 8 cores"? AMD has been selling 8 core CPUs for years with their AMD FX 8xxx line.


It's not just the building; it's the ability to run multiple virtualised environments (for example, at the moment I'm running Xubuntu, two Vagrant machines, and Microsoft's Windows 10/Edge testing VM).


As I elaborated elsewhere, if you focus on task parallelism to the exclusion of data parallelism, as Go tends to do, then you are not making good use of the hardware.


What does it mean for a programming language to "tend to focus on task vs data parallelism"? In my mind, Go gives you primitives from which to build either data or task parallel algos, and the programmer can choose the kind of algorithm that they find appropriate (perhaps data parallelism is always faster, but perhaps some developers find task parallelism easier and their time is not well spent squeezing out the performance delta?). Is there some set of primitives that would cater more naturally to data parallelism? If these primitives exist, are they really 'better' than Go's primitives, or do they simply make task parallelism harder without making data parallelism easier? These aren't rhetorical questions; I'm genuinely curious to hear your opinion.


You want SIMD and parallel for, parallel map/reduce, etc. Goroutines are too heavyweight for most of these tasks: you will be swamped in creation, destruction, and message passing overhead. What you need is something like TBB or a Cilk-style scheduler.
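
To illustrate the gap, here is a minimal chunked parallel-for sketch in Go (the names are mine): chunking amortizes the goroutine creation overhead, but unlike a TBB/Cilk work-stealing scheduler there is no load balancing between chunks.

  package main

  import (
    "runtime"
    "sync"
  )

  // parallelFor splits [0, n) into one chunk per CPU so the goroutine
  // creation cost is amortized over many iterations. There is no work
  // stealing: one slow chunk stalls the whole loop.
  func parallelFor(n int, body func(i int)) {
    workers := runtime.NumCPU()
    chunk := (n + workers - 1) / workers
    var wg sync.WaitGroup
    for start := 0; start < n; start += chunk {
      end := start + chunk
      if end > n {
        end = n
      }
      wg.Add(1)
      go func(start, end int) {
        defer wg.Done()
        for i := start; i < end; i++ {
          body(i)
        }
      }(start, end)
    }
    wg.Wait()
  }

  func main() {
    xs := make([]float64, 1000000)
    parallelFor(len(xs), func(i int) { xs[i] = 2 * float64(i) })
  }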


I see. It's worth noting that there are SIMD libraries for Go. I can't comment about them because I haven't used them.


"Go gives you primitives from which to build either data or task parallel algos"

No, Go doesn't give you SIMD/vector primitives for data parallelism.


SIMD is an implementation of data parallelism. Data parallelism doesn't require SIMD.


"with a good concurrency model, making your program comparatively easy to parallelize on a per-function basis"

Actually, Go's shared-memory multithreading concurrency model encourages program architectures that are not easily parallelizable at all, and it's harmful to praise it. We have much better share-nothing models that encourage parallelizable programs from the start.


Compilation is not a good example of this, though. Parallel compilation is something that any language that allows for separate compilation has supported for decades (make -j N at its most basic, with some language implementations providing more sophisticated support).


Go has supported that too. But this particular change enables parallelism within a single compilation unit.


Yes, but oftentimes you've been stuck with single-threaded / single-process linking.

That's also why LLVM has recently spent time creating a fast(er) linker.




