This is a great paper; it is very readable and well motivated, and I learned quite a bit. I'm also now looking forward to perusing the 2012 Ninja C paper.
One small change I would make to the preprint would be to better normalize the graph symbols. In particular, failing to read the legends of each graph carefully might cause readers to misattribute results in subsequent graphs (for example, my mind wanted to associate Intel's HRC with "the lighter gray ones with the boxes", which is not a stable representation across all graphs).
It is interesting to read how Haskell optimized the algorithm based on the intrinsic properties of the data structures. In contrast, the C compilers leveraged knowledge of the underlying machine. It is amazing how far Haskell compilers have come.
It's a good conclusion, I feel: this is always the issue with language benchmarks - who wrote the code, and how good were they with each of the languages?
Similarly, as the article points out, the compiler matters a lot: ICC can in certain cases be more than 200% faster than GCC with similar flags, and is generally 15-20% faster anyway, mainly due to more intelligent inlining and much faster (and more accurate with fpmath=fast) math libs.
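To make the fast-math point concrete, here is a minimal C sketch (the kernel is hypothetical; the flags are GCC's -ffast-math and ICC's default -fp-model fast):

    /* Strict IEEE semantics fix the order of the adds, so the compiler
       may only vectorise this reduction when it is allowed to
       reassociate them (gcc -O3 -ffast-math, or icc, whose default
       -fp-model fast already permits it). Illustrative sketch only. */
    #include <stddef.h>

    double sum(const double *x, size_t n) {
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += x[i];   /* one serial dependency chain without reassociation */
        return s;
    }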
As an AMD user I really hope most programmers know this by now. If you make a build for the general public as opposed to only targeting Intel machines, please don't use ICC.
..suggests nothing has changed. Search for "non-Intel".
Maybe ICC generates code which beats GCC even when run on an AMD chip in one particular benchmark, but that doesn't mean it generates better code in general.
Personally I will never trust the Intel compiler, because it's part of their business strategy to generate bad code for AMD processors.
Even if the claim in the original post about being "generally 15-20% faster" were true for Intel chips, then either it wouldn't be 15-20% faster on AMD, or Intel's documentation - which clearly states the compiler generates inferior code for non-Intel chips - is wrong.
You could look at the graphs - it's pretty obvious ICC wins in almost all benchmarks on all processors, not just one benchmark.
Intel state they do different optimisations for different chips, and I'd guess the compiler decides based on how many load/store ports there are, as this seriously affects the fp throughput.
These change per-chip - e.g. the Core i7 Sandy Bridge doubled the number of front-end float load ports over Nehalem, so more OoO execution can be done, and thus the compiler can generate code differently to take this into account.
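As an aside, this kind of per-chip dispatch isn't ICC-only; GCC can do something similar with target_clones. A sketch (mine, not ICC's mechanism), which notably dispatches on feature bits rather than vendor string:

    /* GCC emits one version of the function per listed target and
       selects among them at program load, based on the features the
       running CPU reports. */
    __attribute__((target_clones("default", "sse4.2", "avx2")))
    double dot(const double *a, const double *b, long n) {
        double s = 0.0;
        for (long i = 0; i < n; i++)
            s += a[i] * b[i];
        return s;
    }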
You can't expect Intel to optimise their compiler very thoroughly for all the processor models of their competitors.
>Intel state they do different optimisations for different chips,
The documentation says "more highly optimized for Intel® microprocessors than for non-Intel microprocessors" again and again. Not just different: inferior.
Also, the optimization notice mentioned in the Wikipedia article I linked - the one Intel was mandated to add by the courts - is still there. Maybe I should quote the current version in full:
"Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice."
>You can't expect Intel to optimise very thoroughly their compiler for all the processor models of their competitors.
As I said, I don't; I expect it to generate bad code for the products of the competition. And that's what it did and does. AMD dragged Intel to court over this and won. That's why the compiler documentation is now full of these "non-Intel" disclaimers.
ICC exists to sell Intel processors; one should always remember that. Intel isn't trying to make money selling compilers.
Because they're not going to spend time working out by trial and error (it's possible, by timing carefully chosen code) how many float ops/cycle each AMD chip can do. For their own chips they know the numbers themselves.
So they make an assumption for non Intel chips. Maybe they assume 2 when some AMD chips can do 4.
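The trial-and-error measurement itself is simple enough to sketch; something like this (illustrative only - a real measurement would pin the core, watch the clock frequency, and read the generated asm):

    #include <stdio.h>
    #include <time.h>

    #define N 100000000UL

    int main(void) {
        volatile float seed = 1.0f;      /* defeat constant folding */
        float a = seed, b = seed, c = seed, d = seed;
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (unsigned long i = 0; i < N; i++) {
            /* four independent chains, so adds can issue in parallel */
            a += 1.0f; b += 1.0f; c += 1.0f; d += 1.0f;
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec)
                    + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%.2f adds/ns (sink %f)\n",
               4.0 * N / (secs * 1e9), a + b + c + d);
        return 0;
    }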
> ICC exists to sell Intel processors
And strangely enough, if you use ICC you'll generally get better code out the other end than from the other two major compilers, regardless of what chip you run it on.
I don't know the current state of ICC, but previously it would ignore the instruction set the CPU claimed to support and not use a large fraction of SSE instructions on non-Intel CPUs that supported them.
Centaur did a study where they changed their CPUID vendor string to claim to be an Intel part, and they got a significant performance boost when running code compiled with ICC.
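For anyone who hasn't poked at this: the vendor string and the feature bits are both right there in CPUID, so a dispatcher that wants to key on actual capabilities can. A small C sketch of the two strategies being contrasted (my illustration, not ICC's actual dispatch code):

    #include <stdio.h>
    #include <string.h>
    #include <cpuid.h>   /* GCC/Clang helper; MSVC uses __cpuid instead */

    int main(void) {
        unsigned int eax, ebx, ecx, edx;
        char vendor[13] = {0};

        /* leaf 0: vendor string, e.g. "GenuineIntel" or "AuthenticAMD" */
        __get_cpuid(0, &eax, &ebx, &ecx, &edx);
        memcpy(vendor + 0, &ebx, 4);
        memcpy(vendor + 4, &edx, 4);
        memcpy(vendor + 8, &ecx, 4);

        /* leaf 1: feature bits */
        __get_cpuid(1, &eax, &ebx, &ecx, &edx);
        int sse2 = (edx >> 26) & 1;   /* CPUID.1:EDX bit 26 */
        int sse3 = ecx & 1;           /* CPUID.1:ECX bit 0  */

        printf("vendor %s, sse2 %d, sse3 %d\n", vendor, sse2, sse3);

        /* vendor-string dispatch: fast path only for "GenuineIntel";
           feature-bit dispatch: fast path whenever sse2/sse3 are set. */
        return 0;
    }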
> The documentation says "more highly optimized for Intel® microprocessors than for non-Intel microprocessors" again and again. Not just different: inferior.
No, it's "less superior".
I don't see anything wrong with this, unless somehow Intel is preventing AMD from writing their own compiler and investing more into code generation for AMD chips than Intel does.
> ICC exists to sell Intel processors, one should always remember that.
I think it's probably more reasonable to look at it as ICC exists to ensure that Intel can ship new features, optimizations, and instructions in their chips and not be totally dependent on 3rd parties to make them available to C/C++ programmers.
I would like to add that I am not actually universally opposed to using ICC for general builds, but one should carefully benchmark the resulting binary in those cases... on a non-Intel machine.
As I pointed out, it is reasonable to assume that the code will run slower on non-Intel chips. Basically I just replied to the OP because I was worried about people blindly doing release builds with ICC assuming it means "higher performance for free". That may be the case in some situations, but given the background of the compiler one should always make sure.
> who wrote the code, and how good were they with each of the languages
I'm repeatedly surprised when programmers don't emphasize how much these comparisons are really about good programs versus not-so-good programs, and instead drift into somewhat tribal language comparisons.
I kind of think we know better, but that knowledge just doesn't serve language advocacy, so it's not mentioned.
As we often get hired by programming language, language advocacy has obvious importance to us ;-)
I'm surprised by how dramatic the difference is between the speed of C and Haskell. One of my professors (at The University of Glasgow, so appropriately a Haskell fan) once claimed that it had "C-like performance".
I suppose that's the point of this paper though: "C-like performance" is a terribly vague term, meaningless without knowledge of the specific comparisons being made.
For lots of algorithms, fairly naive Haskell can get very close (within 10%) to the performance of pretty decent C or C++.
For example, [1] shows that for very demanding algorithms (such as BLAS routines), Haskell can be very performant - with the optimisation being reusable and transparent to the programmer.
That's kind of an odd comparison, setting unfused C code against fused Haskell code. Their point seems to focus on stream fusion's advantages, and possibly that optimizing C takes more effort.
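For readers who haven't seen the term: a rough picture of what fusion buys, sketched in C (hypothetical kernels, not the paper's benchmarks):

    #include <stddef.h>

    /* unfused: map, then sum, with a temporary array in between */
    double sum_scaled_unfused(const double *x, double *tmp, size_t n) {
        for (size_t i = 0; i < n; i++)
            tmp[i] = 2.0 * x[i] + 1.0;   /* first pass writes memory */
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += tmp[i];                 /* second pass reads it back */
        return s;
    }

    /* fused: same result in a single pass, no temporary traffic -
       the rewrite GHC's stream fusion performs automatically */
    double sum_scaled_fused(const double *x, size_t n) {
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += 2.0 * x[i] + 1.0;
        return s;
    }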
To quote the paper: "Clearly “properly”-written C++ can outperform Haskell. The challenge is in figuring out what “proper” means."
Summary: Intel's HRC (Haskell Research Compiler) is an optimizing compiler for GHC's (Glasgow Haskell Compiler) "Core" intermediate language.* On six common benchmarks, it improves the performance of Haskell dramatically. But Haskell is still 4 times slower than the best C implementations of these benchmarks, on average.
* Core is just desugared Haskell and should not be confused with GHC's other intermediate languages, STG and C--. And there is no relation to Intel's "Core" microarchitecture.
Isn't Google's viewer really the default PDF viewer in Chrome? The one which doesn't add a toolbar or any tooltips that pop up even though my mouse isn't hovering over them.
Scribd is for people who want to share a PDF but don't know how to do it in any other way. It has pretty much never been welcome on HN, because that's the only problem it solves, and everything else it does is inferior to just having a native PDF you can view and save with no problems. We are not the target market, so I never understood why it was pushed on HN at all.
> Isn't Google's viewer really the default PDF viewer in Chrome? The one which doesn't add a toolbar or any tooltips that pop up even though my mouse isn't hovering over them.
No, it's not - unless Google figured out a way to install their PDF viewer plugin in my Firefox without me noticing. ;) The link 6ren posted is to Google Docs's file viewing component, now sold separately: https://docs.google.com/viewer
I'm not completely sure, but I think that for PDFs what the Google Docs viewer does is (1) generate a PNG image of each page, (2) extract the text and put it on the page as invisible HTML, positioned carefully to correspond to the PNG. The point of (2) is that it makes text selectable.
Did Scribd get 'better' in an empirical performance comparison with Google's viewer?
For me to take such a comparison seriously, you need to present the evidence transparently, with the understanding that every such comparison inevitably relies on making choices, and is hence only meaningful insofar as those choices can be seen and understood by the reader.