How LLVM Optimizes a Function (regehr.org)
268 points by chmaynard on Sept 4, 2018 | 18 comments



Maybe a useless comment, but the article is extremely interesting and easy to read. It's definitely worth a read.

Amusing anecdote: at the very beginning of the '90s, I happened to write a lot of very optimized code (3D engines) in assembler.

I was writing assembler because, well, that was the de facto way to get speed. At the time I had overlooked the possibility of writing things in C (and I had no access to a C compiler) -- a bit dumb of me. So one day, tired of writing tedious assembler, and at the same time discovering C, I rewrote one of my assembler programs in C. BIG surprise: the C code was much faster than my assembler. So I disassembled the generated code and figured out what the compiler had done. First, it used very clever bit manipulations (unknown to me) to rewrite some of my comparisons. Second, it understood that because my code was just a test that didn't produce any observable result, it didn't need to generate any code for it. So a big chunk of my C code was simply not compiled (and rightly so), and the resulting program was mostly doing nothing.

It was the first time I saw what an optimizing compiler actually does, and I was totally blown away. I had been beaten at the optimization game by a program :-)

The other thing I understood that day is that an optimizer is not magic. It just carefully applies small bits of optimization knowledge to little cases here and there. It basically automates what one would have done by hand to optimize assembler code (except for algorithmic optimizations). It also means that one can sleep well knowing that 90% of the chores of optimization are done automatically and reliably. We all take that for granted, but when I was around 20 years old, it was just super exciting.
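To make the dead-code effect concrete, here is a minimal hypothetical C++ example (not the original code, which is long gone): the loop's result is never observed, so an optimizing compiler is free to delete the whole thing.

    // Hypothetical sketch of the effect described above: 'sum' is never read
    // after the loop, so at -O2 the compiler may delete the loop entirely and
    // the "benchmark" ends up measuring a program that does almost nothing.
    int main() {
        int sum = 0;
        for (int i = 0; i < 1000000; ++i)
            sum += i * i;   // dead: the result is never observed
        return 0;           // returning or printing 'sum' would keep the work alive
    }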


In a previous job, I once tried to rewrite in assembly those small - tiny, really! - functions that were called all over the place, so I kind of looked at it from the opposite direction.

The experience was both frustrating and elating. In the end, the "optimized" assembly code I could come up with was pretty much identical to the code generated by the compiler. So I missed that rush of showing the compiler who's boss. This is what led me to abandon the project very quickly.

On the other hand, I was very relieved that my hand-written code was at least not worse than what the compiler could come up with. So at least I could get out with my dignity intact.

To be fair, those small functions were so small and simple (no more than three or four instructions) that there really was only one obvious way to implement them in x86 assembly. So maybe my problem set was a bad choice. It was fun, though.
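For a sense of scale, a hypothetical example of the kind of function meant here; anything this small leaves the compiler and the human with essentially the same instruction sequence.

    // Hypothetical example: for a function this small, optimizing x86-64
    // compilers typically emit just 'lea eax, [rdi + rsi]' followed by 'ret',
    // and there is no meaningfully different way to write it by hand.
    int add(int a, int b) { return a + b; }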


No more than three or four instructions? See also https://en.wikipedia.org/wiki/Superoptimization .


LLVM is seriously one of the coolest projects going on right now. On a whim, I decided to see if I could compile LLVM IR to .NET, and it took maybe a grand total of 12 hours to get it all working. I've done a lot of compiler work in the past, but nothing is so universally nice as LLVM.


Out of curiosity, did this involve actually using the LLVM code itself much? Or do you mean you wrote your own interpreter for the IR that it emits?


I have a piece of C++ code that uses the LLVM API to walk over a bitcode file generated by clang and emit my own, more easily parsed format. Then my C# code takes over from there. So it involves quite a lot of LLVM code, though in theory it wouldn't have to.
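Not the parent's actual code, but a rough sketch of what that first C++ stage can look like, assuming the standard LLVM bitcode-reader API (parseBitcodeFile) and plain module/function/block iteration:

    // Rough sketch (not the parent's code): parse a clang-generated bitcode
    // file with the LLVM C++ API and dump a simplified, easy-to-parse listing.
    #include "llvm/Bitcode/BitcodeReader.h"
    #include "llvm/IR/LLVMContext.h"
    #include "llvm/IR/Module.h"
    #include "llvm/Support/Error.h"
    #include "llvm/Support/MemoryBuffer.h"
    #include "llvm/Support/raw_ostream.h"

    int main(int argc, char **argv) {
        if (argc < 2) return 1;
        llvm::LLVMContext ctx;
        auto buf = llvm::MemoryBuffer::getFile(argv[1]);
        if (!buf) return 1;
        auto mod = llvm::parseBitcodeFile((*buf)->getMemBufferRef(), ctx);
        if (!mod) { llvm::consumeError(mod.takeError()); return 1; }
        for (llvm::Function &f : **mod) {              // every function in the module
            llvm::outs() << "func " << f.getName() << "\n";
            for (llvm::BasicBlock &bb : f)             // every basic block
                for (llvm::Instruction &inst : bb)     // every instruction
                    llvm::outs() << "  " << inst.getOpcodeName() << "\n";
        }
        return 0;
    }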


I see, okay thanks!


I think the easiest way is just to write a backend that emits .NET IR, since the infrastructure is already there.


There are four ways I considered for implementing this: (1) write an LLVM backend that spits out .NET binaries; (2) write an LLVM backend that spits out C# or CIL assembly; (3) read the LLVM bitcode from .NET and then generate code from there; (4) use the LLVM API to generate some intermediary, then handle the compilation from .NET.

I tried #3 first, but landed on #4 in the interest of time. It's not the cleanest approach -- that'd be either using the LLVM API from .NET or implementing a proper backend (assuming there's some decent implementation of a .NET binary writer, since that's a mess) -- but it's stupid simple and effective.



Open source?


https://github.com/daeken/Nettens It's crazy early stage -- literally my project this weekend -- but it's a start. My goal is to compile Skia; maybe in a month or two.


Kids these days and using the `auto` keyword... Back in my day you had to use template typename metaprogramming! </lawn>

It's quite pretty code. Thanks for making it available.


Looks cool. You should add a license.


The similarity between optimization passes and transformations versus what one sees with mathematical algebraic simplifications and substitutions is no mere coincidence.


The tricky part is that algebraic properties of C/C++/x86/etc. are ridiculously complicated, due to things like aliasing, multiprocessing, overflow, interrupts, etc.

The "metadata" mentioned in the article is LLVM trying to figure out when these complications can be safely ignored, so that the "real" optimisation/transformation can begin.

It's interesting to compare with something like Haskell's GHC compiler, which can do a crazy amount of rearranging and optimising due to the language itself having fewer of these complications (although, of course, programs written in the language may sometimes end up more complicated by "simulating" such features/effects).

Also interesting to compare with languages like Python, which are "so dynamic" that we can infer almost nothing about a piece of code ahead of time (since there's the possibility that, e.g., some class will get its methods replaced at runtime by another thread, and other such reasoning-killers). That's why an (ahead-of-time) Python compiler can't do much optimising.


There are clever optimization tricks that GHC or MLton really do pull off, but the bigger point is how much C and C++ lowered the bar. Quoting Fran Allen:

> Seibel: When do you think was the last time that you programmed?

> Allen: Oh, it was quite a while ago. I kind of stopped when C came out. That was a big blow. We were making so much good progress on optimizations and transformations. We were getting rid of just one nice problem after another. When C came out, at one of the SIGPLAN compiler conferences, there was a debate between Steve Johnson from Bell Labs, who was supporting C, and one of our people, Bill Harrison, who was working on a project that I had at that time supporting automatic optimization...The nubbin of the debate was Steve's defense of not having to build optimizers anymore because the programmer would take care of it. That it was really a programmer's issue....

> Seibel: Do you think C is a reasonable language if they had restricted its use to operating-system kernels?

> Allen: Oh, yeah. That would have been fine. And, in fact, you need to have something like that, something where experts can really fine-tune without big bottlenecks because those are key problems to solve. By 1960, we had a long list of amazing languages: Lisp, APL, Fortran, COBOL, Algol 60. These are higher-level than C. We have seriously regressed, since C developed. C has destroyed our ability to advance the state of the art in automatic optimization, automatic parallelization, automatic mapping of a high-level language to the machine. This is one of the reasons compilers are ... basically not taught much anymore in the colleges and universities.


According to Compiler Explorer, while LLVM 6.0.0 does it, LLVM trunk is no longer doing the GVN load optimization; it has reverted to using a double load:

        mov     edx, dword ptr [rdi + 4*rcx]
        cmp     edx, dword ptr [rdi + 4*rcx + 4]
per loop iteration. Dunno if that's intentional or just a regression.
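For context, the loop being discussed is the article's is_sorted-style example (reconstructed from memory here, so treat the exact code as approximate): each iteration compares a[i] with a[i+1], and GVN can reuse the previous iteration's load of a[i+1] as the current iteration's a[i], getting down to one new load per iteration.

    // Approximate reconstruction of the article's example. Without GVN/PRE the
    // loop loads both a[i] and a[i+1] every iteration (the two loads shown
    // above); with it, a[i+1] is carried over to the next iteration as a[i].
    bool is_sorted(int *a, int n) {
        for (int i = 0; i < n - 1; i++)
            if (a[i] > a[i + 1])
                return false;
        return true;
    }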



