Can somebody comment on what's happening there? I don't know Go and I can't put the pieces together right now. Also, aren't optimising transformations checked (verified) for correctness? If not, how can the authors be confident in the optimisations? Simple tests? Or is it a bug?
Variable writes that look like they have no side effects may actually be intended for another thread to read (though hopefully most reasonable programs don't do this without a good explanation). However, many concurrency models say it's okay to optimize such writes out, because most threading libraries (including Go's) make no guarantees about the order in which threads are scheduled. Therefore, removing the "writing" thread entirely is equivalent to it never being scheduled.
However, this is not the case for atomic operations (which are generally specifically intended for communication via shared variables). This optimization of removing the writing thread entirely isn't allowed if atomic operations are used, because those actually do come with a guarantee that other threads will get to see the update in finite time.
Go was incorrectly "optimizing" the atomic operations out entirely. It's a fairly subtle bug, actually, but still likely a big deal for the few programs communicating via shared variables.
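For readers who don't know Go: a minimal, hypothetical reduction of the kind of program affected (the actual repro is in the linked issue, not this thread) looks roughly like this:

    package main

    import (
        "fmt"
        "sync/atomic"
        "time"
    )

    func main() {
        var n uint64

        // Writer: bump an atomic counter forever in a tight loop.
        go func() {
            for {
                atomic.AddUint64(&n, 1) // with the bug, this increment was optimized away
            }
        }()

        // Reader: give the writer some time, then check the counter.
        time.Sleep(100 * time.Millisecond)
        fmt.Println(atomic.LoadUint64(&n)) // expected: a large nonzero value; with the bug: may stay 0
    }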
> However, this is not the case for atomic operations (which are generally specifically intended for communication via shared variables). This optimization of removing the writing thread entirely isn't allowed if atomic operations are used, because those actually do come with a guarantee that other threads will get to see the update in finite time.
That sounds more like volatile than atomic. At its core, atomic just means that an observer may see either the pre-update or the post-update state but won't see intermediate states.
I agree that this sounds more like "volatile", but it still makes sense that programmers should be able to rely on atomic operations for inter-thread communication.
According to the comments on the bug report, this behavior is inspired by C++'s <atomic> library, which guarantees cross-thread visibility.
> I agree that this sounds more like "volatile", but it still makes sense that programmers should be able to rely on atomic operations for inter-thread communication.
Only in the sense that they won't see inconsistent (intermediate) states.
Atomicity shouldn't guarantee order per se - if you want both, you need a mutex.
Ordered atomic operations like C++ std::atomic or Java/C# volatile do guarantee some ordering. For example, if one thread modifies an object and then stores a pointer to it in an atomic variable, under default visibility rules another thread reading that pointer from the atomic variable is guaranteed to see the modifications to the object itself, i.e. the update, the store and the remote load are ordered.
It is possible to have a model where all atomic operations are relaxed (i.e. unordered) and use explicit barriers for ordering (which is closer to how some, but not all, CPUs work), but it is significantly harder to reason about and prove algorithms correct under it.
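To make that publication pattern concrete in Go terms (a sketch, assuming Go 1.19+ for atomic.Pointer; the thread above discusses C++/Java, but Go's sync/atomic gives the same guarantee):

    package main

    import (
        "fmt"
        "sync/atomic"
    )

    type payload struct{ value int }

    func main() {
        var p atomic.Pointer[payload]

        go func() {
            d := &payload{}
            d.value = 42 // plain write to the object...
            p.Store(d)   // ...published by the atomic store
        }()

        // A reader that observes the pointer is also guaranteed to observe
        // the writes made before the store: this prints 42, never 0.
        for {
            if d := p.Load(); d != nil {
                fmt.Println(d.value)
                return
            }
        }
    }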
No. Read the spec. There is no deadline for atomics to be visible. What is true is that some operations have happens-before ordering, and if a store happens-before a load, the load will see the result of the store.
Did you reply to the wrong post? My post was only about ordering.
Still, the C++ spec has wording about making stores visible in a reasonable time. The wording is non-normative because they ran out of time (ah!) while trying to come up with a more formal (and implementable) specification.
edit: oh, I see I wrote 'default visibility' instead of 'default ordering'.
One point of view is that the removal of the atomic increment is justified because the programmer cannot rely on the goroutine containing the increment ever being scheduled. If you add an explicit call to the scheduler, the problem is fixed. However, the justification is a bit of a stretch, and it looks like the Go developers will change it back to the old behavior.
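A sketch of that workaround applied to the reduction earlier in the thread (runtime.Gosched is the explicit call to the scheduler; the exact shape of the original repro is an assumption):

    package main

    import (
        "fmt"
        "runtime"
        "sync/atomic"
        "time"
    )

    func main() {
        var n uint64
        go func() {
            for {
                atomic.AddUint64(&n, 1)
                runtime.Gosched() // explicit yield: gives the loop a scheduling point,
                                  // which reportedly makes the increments visible again
            }
        }()
        time.Sleep(100 * time.Millisecond)
        fmt.Println(atomic.LoadUint64(&n)) // nonzero with the workaround in place
    }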
I think how to verify real compiler optimisations is still an open research question. Does any other compiler do it formally? I'm not sure it's reasonable to expect Go to do it.
In the general case, it's definitely not solved. Java and C/C++ (gcc) are known to have had compiler optimizations introduce bugs. (I assume the same is true about clang and other compilers; I just happen to remember about these ones).
For a recent Java one with Hotspot JIT, there is this FOSDEM 2017 talk.
"....Escape Analysis and Intrinsics, two commonly used HotSpot optimization techniques. I'll show how a combination of these two features can optimize away IndexOutOfBoundsExceptions in some corner cases where they are required by the standard..."
It seems that they unified the assembler used in the Go compiler to use intrinsics for amd64.
In doing so, they must have botched something, because now some atomic operations (i.e. operations that shouldn't be interruptible by another thread) are completely optimized away.
This is quite a big issue because any multi-threaded program will use these (they are the only way to write lock-free code).
This isn't correct; if it were, this would be a much bigger issue. I'd be quite surprised if this affects any code in the wild.
This bug is related to the compiler deciding to optimize the loop. All modifications of variables in the loop are being batched until the loop exits, but the loop never exits, so none of the modifications ever happen. In most cases this would be fine, as the side effects of modifying the variables in the loop wouldn't need to be visible anywhere else until the loop exited.
If you remove the loop and just modify the atomic variable once, the modification eventually happens and is seen by the second goroutine.