> But to build a compiler, you need to be able to see the output. A disassembler had to be built along with the compiler.
Until very recently the dominant paradigm, at least for ahead-of-time compilers on Unix, was to emit textual assembler code and run a separate assembler on it behind the scenes. No disassembler needed: you can ask GCC- or LLVM-based compilers to just give you that intermediate assembly text with -S.
> But running obj2asm is a separate process, and the output is filled with all the boilerplate needed to create a proper object file. The boilerplate is rarely of interest, and I’m only interested in the generated code for a function.
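Sounds like a bug in obj2asm. GNU objdump, for example, has handy -d and -D flags for disassembling only the executable sections or all sections, and newer binutils releases also accept --disassemble=SYMBOL to limit the output to a single function.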
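If the tool at hand only prints a whole listing, one function can still be cut out of it afterwards. Here is a minimal sketch in Python, assuming text in the style of `objdump -d` output. The sample listing is hand-written for illustration rather than captured from a real run, and `extract_function` is a hypothetical helper, not part of any existing tool.

```python
import re

# Hand-written sample in the style of `objdump -d` output (not from a real run).
SAMPLE_LISTING = """\
0000000000000000 <add>:
   0:\t8d 04 37             \tlea    (%rdi,%rsi,1),%eax
   3:\tc3                   \tret

0000000000000004 <sub>:
   4:\t89 f8                \tmov    %edi,%eax
   6:\t29 f0                \tsub    %esi,%eax
   8:\tc3                   \tret
"""

# A symbol header looks like "0000000000000004 <sub>:".
HEADER = re.compile(r"^[0-9a-fA-F]+ <(?P<name>[^>]+)>:$")


def extract_function(listing: str, symbol: str) -> list[str]:
    """Return the header and instruction lines for `symbol`, or [] if absent."""
    block: list[str] = []
    inside = False
    for line in listing.splitlines():
        match = HEADER.match(line)
        if match:
            # A new header either starts the wanted block or ends it.
            inside = match.group("name") == symbol
        if inside and line.strip():
            block.append(line)
    return block


if __name__ == "__main__":
    print("\n".join(extract_function(SAMPLE_LISTING, "sub")))
```

This only post-processes text, so it inherits whatever the disassembler got right or wrong; it just hides the parts of the listing you did not ask for.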
> One would think that the way to do this would be to have the compiler generate the assembler source code, which would then be run through an assembler like MASM or gas to create the object file. I figured this would be slow and too much work.
It was fast enough even for the very first C compilers... <checks calendar> 40 years ago. For actual compilations. Not to mention that when you as a human want to read the assembly code, it doesn't matter how "slow" the -S flag's output is. No matter how "Alpha" you are, the bottleneck will be you, the human.
> Instead, the disassembler logic actually intercepts the binary data being written to the object file and disassembles it [...] I am not aware of any other compiler that does this in the same way.
The HotSpot JVM has -XX:+PrintAssembly and -XX:CompileCommand=print,ClassName.methodName flags that do this in the same way.
Just like the author, I am not aware of any other, other compiler that does this in the same way. Because just like the author I haven't bothered to check. But if I had to bet, I would bet that all major just-in-time compilers do this in the same way.
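For readers who have not seen the technique being argued about, here is a rough sketch of "intercept the bytes on their way to the object file and disassemble them". It is a toy in Python, not how dmd or HotSpot implement it: the three-opcode instruction set, `TOY_OPCODES`, `disassemble`, and `ListingObjectWriter` are all invented for illustration.

```python
import io

# Toy instruction set invented for this sketch: opcode -> (mnemonic, operand bytes).
TOY_OPCODES = {
    0x01: ("push", 1),
    0x02: ("add", 0),
    0x03: ("ret", 0),
}


def disassemble(code: bytes) -> list[str]:
    """Decode toy machine code into one text line per instruction."""
    lines = []
    offset = 0
    while offset < len(code):
        mnemonic, operand_count = TOY_OPCODES[code[offset]]
        operands = code[offset + 1 : offset + 1 + operand_count]
        text = " ".join([mnemonic, *(str(b) for b in operands)])
        lines.append(f"{offset:04x}: {text}")
        offset += 1 + operand_count
    return lines


class ListingObjectWriter:
    """Writes function bodies to an object stream and, when enabled,
    disassembles the very same bytes on their way out."""

    def __init__(self, stream, show_asm=False):
        self.stream = stream
        self.show_asm = show_asm
        self.listing = []

    def write_function(self, name: str, code: bytes) -> None:
        if self.show_asm:
            # The listing is produced from the bytes about to be written,
            # so it cannot drift from what lands in the object file.
            self.listing.append(f"{name}:")
            self.listing.extend(disassemble(code))
        self.stream.write(code)


if __name__ == "__main__":
    writer = ListingObjectWriter(io.BytesIO(), show_asm=True)
    writer.write_function("add_two", bytes([0x01, 2, 0x01, 3, 0x02, 0x03]))
    print("\n".join(writer.listing))
```

The point of hooking the writer rather than running a separate tool is that the listing covers only the function bodies, with none of the object-file boilerplate around them.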
Yes, indeed, there are other ways to look at the assembler output. I've been doing that for 40 years. I heard the same things when D added built-in unit tests and a built-in documentation generator. But we discovered that when things are built in and very convenient, it changes things dramatically.
I suggest you try it before dismissing it. It's changed the way I work, and has started changing the way others do, too.
> It was fast enough even for the very first C compilers... <checks calendar> 40 years ago.
I actually did write a C compiler 40 years ago. Emitting assembler would mean another round trip to the floppy disk, and one might even have to swap floppies to do it. That would have slowed the compiler to the point of making it uncompetitive, even if one forgave MASM for being miserably slow (it had a linear symbol table and was multipass itself).
Consider also that in those days, compilers did not run the linker. You compiled the code with one command, and linked with another. Mine (Datalight C) would do both in one step.
After seeing that, every other compiler vendor did the same thing.
Thanks for the tip on what HotSpot does. That seems to have been added starting with JDK 8, after I had stopped working with Java. When I used it, one had to use the debugger.
> I suggest you try it before dismissing it. It's changed the way I work, and has started changing the way others do, too.
Try what? I know (and mentioned) two distinct ways of looking at the assembly code produced by my C compiler. It doesn't matter to me how -S is implemented because even if it's "slow" I am physically unable to count nanoseconds.
I also know (and mentioned) that I know how to dump disassembly from HotSpot's memory. Yes, it's useful.
> Consider also that in those days, compilers did not run the linker. You compiled the code with one command, and linked with another.
https://en.wikipedia.org/wiki/Datalight says that Datalight was founded in 1983; I have no way of telling whether you had started developing your compiler well before that.
"In those days" was indeed 1983. None of the many C compilers I had access to at the time could do it in one command. Everyone I showed it to was surprised and pleased that it could be done in one command. I was in the DOS world, not Unix. Very few DOS programmers had any experience with Unix, or any access to a Unix machine. This was long before Linux.
> Try what? I know (and mentioned) two distinct ways of looking at the assembly code produced by my C compiler
Do you find:
    cc -c test.c
    objdump -d test.o
faster or slower to type than:
    dmd -c test.c -vasm
especially when using command completion? For me, it's no contest, especially when doing it repeatedly.
For all but the smallest work I don't think it actually makes any difference. I always dump the disassembly to a file so I couldn't care less where it comes from, when debugging real code at least.
Testing the code generator is a different question, which I hope will be helped by -vasm, since I've actually had to move demonstrations off dmd and onto GCC because the generated code was so bad (in particular bad register allocation; I'm not that bothered about redundant cmp-s).
<shrug> At this point we have reduced your blog post's message from "Look at this amazing new feature! It's so novel and unique! All other compiler developers are complete idiots!" to "Sometimes when I read assembly code I'm not interested in debug information.". I am happy to concede this point.
There is rather a lot of garbage in those .s files for things like debuginfo, stray comments the GCC developers accidentally put in their asm output, linker commands, etc.
On the other hand it properly shows relocations (like global variable references) which disassemblers tend to get wrong.
A moisture vaporator, also known as a vapor spire, was a device used on moisture farms to capture water from the atmosphere. Though stationary, they included programming that used a binary language to communicate with other equipment. Author employs moisture vaporators as tremendously clever reference to a notorious feature film and modern mythology.[1]