The Binary Language of Moisture Vaporators

disas · on Jan 24, 2022

> But to build a compiler, you need to be able to see the output. A disassembler had to be built along with the compiler.

Until very recently the dominant paradigm, at least for ahead-of-time compilers on Unix, was to emit textual assembler code and run a separate assembler on it behind the scenes. No disassembler needed: you can ask GCC- or LLVM-based compilers to just give you the intermediate assembly with -S.

> But running obj2asm is a separate process, and the output is filled with all the boilerplate needed to create a proper object file. The boilerplate is rarely of interest, and I’m only interested in the generated code for a function.

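Sounds like a bug in obj2asm. GNU objdump, for example, has handy -d and -D flags for disassembling only the executable sections or all sections, and recent binutils releases also accept --disassemble=SYMBOL to limit the output to a single function.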

> One would think that the way to do this would be to have the compiler generate the assembler source code, which would then be run through an assembler like MASM or gas to create the object file. I figured this would be slow and too much work.

It was fast enough even for the very first C compilers... <checks calendar> 40 years ago. For actual compilations. Not to mention that when you as a human want to read the assembly code, it doesn't matter how "slow" the -S flag's output is. No matter how "Alpha" you are, the bottleneck will be you, the human.

> Instead, the disassembler logic actually intercepts the binary data being written to the object file and disassembles it [...] I am not aware of any other compiler that does this in the same way.

The HotSpot JVM has -XX:+PrintAssembly (a diagnostic option, so it also needs -XX:+UnlockDiagnosticVMOptions) and -XX:CompileCommand=print,ClassName.methodName flags that do this in the same way.

Just like the author, I am not aware of any other, other compiler that does this in the same way. Because just like the author I haven't bothered to check. But if I had to bet, I would bet that all major just-in-time compilers do this in the same way.
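To make the quoted idea concrete, here is a minimal sketch of disassembling a buffer of machine code bytes before they reach the object file. It is not DMD's or HotSpot's implementation: `toy_disasm` and its output format are hypothetical, and it understands only three x86 encodings (register-to-register `mov` and `add` with 32-bit operands, plus `ret`), which happen to be enough for a function that doubles its argument.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 32-bit register names indexed by the 3-bit register number in ModRM. */
static const char *const reg32[8] = {
    "EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"
};

/*
 * Hypothetical helper: decode code[0..len) into out, one line per
 * instruction, in Intel operand order (destination first).
 * Supported encodings: 89 /r (mov r/m32,r32), 01 /r (add r/m32,r32),
 * both only with ModRM mod == 3 (register operands), and C3 (ret).
 * Returns the number of bytes decoded; stops at the first byte it does
 * not understand or when out is full.
 */
size_t toy_disasm(const uint8_t *code, size_t len, char *out, size_t outsz)
{
    size_t pos = 0;   /* offset of the next undecoded byte */
    size_t used = 0;  /* characters already written to out */

    assert(code != NULL && out != NULL && outsz > 0);
    out[0] = '\0';

    while (pos < len) {
        unsigned op = code[pos];
        size_t room = outsz - used;
        size_t step;
        int n;

        if (op == 0xC3) {
            n = snprintf(out + used, room, "%04zX: %02X     ret\n", pos, op);
            step = 1;
        } else if ((op == 0x89 || op == 0x01) && pos + 1 < len
                   && (code[pos + 1] >> 6) == 3) {
            unsigned modrm = code[pos + 1];
            /* r/m field (low 3 bits) is the destination, reg field the source. */
            n = snprintf(out + used, room, "%04zX: %02X %02X  %s %s,%s\n",
                         pos, op, modrm, op == 0x89 ? "mov" : "add",
                         reg32[modrm & 7], reg32[(modrm >> 3) & 7]);
            step = 2;
        } else {
            break;  /* unsupported encoding: stop rather than guess */
        }
        if (n < 0 || (size_t)n >= room) {
            out[used] = '\0';  /* drop the partially written line */
            break;
        }
        used += (size_t)n;
        pos += step;
    }
    return pos;
}
```

Called on the five bytes 89 F8 01 C0 C3, it returns 5 and fills the buffer with three lines, the first being `0000: 89 F8  mov EAX,EDI`. A real disassembler has to handle prefixes, memory operands, and relocations, which is where most of the work is.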

WalterBright · on Jan 24, 2022

Yes, indeed, there are other ways to look at the assembler output. I've been doing that for 40 years. I heard the same things when D added builtin unittests and a builtin documentation generator. But we discovered that when things are builtin and very convenient, it changes things dramatically.

I suggest you try it before dismissing it. It's changed the way I work, and has started changing the way others do, too.

> It was fast enough even for the very first C compilers... <checks calendar> 40 years ago.

I actually did write a C compiler 40 years ago. Emitting assembler would have meant another round trip to the floppy disk, and one might even have had to swap floppies to do it. That would have slowed the compiler to the point of making it uncompetitive, even if one forgave MASM for being miserably slow (it had a linear symbol table and was multipass itself).

Consider also that in those days, compilers did not run the linker. You compiled the code with one command, and linked with another. Mine (Datalight C) would do both in one step.

After seeing that, every other compiler vendor did the same thing.

Thanks for the tip on what HotSpot does. That seems to have been added starting with JDK 8, after I had stopped working with Java. When I used it, one had to use the debugger.

disas · on Jan 24, 2022

> I suggest you try it before dismissing it. It's changed the way I work, and has started changing the way others do, too.

Try what? I know (and mentioned) two distinct ways of looking at the assembly code produced by my C compiler. It doesn't matter to me how -S is implemented because even if it's "slow" I am physically unable to count nanoseconds.

I also know (and mentioned) how to dump disassembly from HotSpot's memory. Yes, it's useful.

> Consider also that in those days, compilers did not run the linker. You compiled the code with one command, and linked with another.

Not sure what "those days" you mean. Running a single "cc" command and getting a single executable seems to have been well established by 1978: https://archive.org/details/TheCProgrammingLanguageFirstEdit...

https://en.wikipedia.org/wiki/Datalight says that Datalight was founded in 1983; I have no way of telling whether you had started developing your compiler well before that.

WalterBright · on Jan 24, 2022

"Those days" were indeed 1983. None of the many C compilers I had access to at the time could do it in one command. Everyone I showed it to was surprised and pleased that it could be done in one command. I was in the DOS world, not Unix. Very few DOS programmers had any experience with Unix, or any access to a Unix machine. This was long before Linux.

> Try what? I know (and mentioned) two distinct ways of looking at the assembly code produced by my C compiler

Do you find:

    cc -c test.c
    objdump -d test.o

faster or slower to type than:

    dmd -c test.c -vasm

especially when using command completion? For me, it's no contest, especially when doing it repeatedly.

Besides, I like the concise output of -vasm:

    foo:
    0000:   89 F8                    mov       EAX,EDI
    0002:   01 C0                    add       EAX,EAX
    0004:   C3                       ret

better than objdump's:

    test.o:     file format elf64-x86-64


    Disassembly of section .text:

    0000000000000000 <foo>:
       0:   55                      push   %rbp
       1:   48 89 e5                mov    %rsp,%rbp
       4:   89 7d fc                mov    %edi,-0x4(%rbp)
       7:   8b 45 fc                mov    -0x4(%rbp),%eax
       a:   01 c0                   add    %eax,%eax
       c:   5d                      pop    %rbp
       d:   c3                      retq
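For readers following along, a source file along these lines would be consistent with both listings. This is an assumption: the thread never shows test.c, and the name `foo` is taken only from the labels in the output.

```c
/* Hypothetical test.c: a guess at the source behind both listings. */
int foo(int x)
{
    /* Doubling by addition matches the "add EAX,EAX" in both outputs. */
    return x + x;
}
```

The objdump listing additionally sets up a frame pointer and spills the argument to the stack, so the instruction sequences differ even where the arithmetic is the same.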

mhh__ · on Jan 24, 2022

For all but the smallest work I don't think it actually makes any difference. I always dump the disassembly to a file so I couldn't care less where it comes from, when debugging real code at least.

Testing the code generator is a different question, which I hope -vasm will help with: I've actually had to move demonstrations off dmd and onto GCC because the generated code was so bad (bad register allocation in particular; I'm not that bothered about redundant cmps).

bombcar · on Jan 25, 2022

I've always preferred MASM output compared to whatever it is GNU does by default.

disas · on Jan 25, 2022

I find

    cc -S test.c

fastest to type.

WalterBright · on Jan 25, 2022

You forgot the:

    cat test.s

disas · on Jan 25, 2022

For actual analysis as opposed to cutesy blog post demos I'd actually use

    vi test.s

But if you insist on doing things the hard way on stdout, GCC has got you covered as well. Add -o - to the command line.

WalterBright · on Jan 25, 2022

Then I get:

        .file   "test.c"
        .text
        .globl  fn
        .type   fn, @function
    fn:
    .LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        popq    %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
    .LFE0:
        .size   fn, .-fn
        .ident  "GCC: (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4"
        .section        .note.GNU-stack,"",@progbits

No thanks.

disas · on Jan 26, 2022

<shrug> At this point we have reduced your blog post's message from "Look at this amazing new feature! It's so novel and unique! All other compiler developers are complete idiots!" to "Sometimes when I read assembly code I'm not interested in debug information." I am happy to concede this point.

astrange · on Jan 25, 2022

There is rather a lot of garbage in those .s files for things like debuginfo, stray comments the GCC developers accidentally put in their asm output, linker commands, etc.

On the other hand it properly shows relocations (like global variable references) which disassemblers tend to get wrong.

disas · on Jan 24, 2022

> the very first C compilers... <checks calendar> 40 years ago

Ah. Checking the calendar is not sufficient; one should also be capable of doing correct arithmetic. 1972 was 50 years ago, not 40.

yusefnapora · on Jan 24, 2022

One of my first jobs was programming binary load lifters, very similar to your vaporators in most respects!

StrictDabbler · on Jan 25, 2022

My job is literally speaking the binary language of cooling and dehumidification systems.

I hadn't realized/noticed that Luke bought C-3PO to do my job.

Not sure how to feel about it.

Zababa · on Jan 24, 2022

I don't understand what "moisture vaporators" in that article refers to.

Maursault · on Jan 24, 2022

A moisture vaporator, also known as a vapor spire, was a device used on moisture farms to capture water from the atmosphere. Though stationary, they included programming that used a binary language to communicate with other equipment. The author employs moisture vaporators as a tremendously clever reference to a notorious feature film and modern mythology.[1]

[1] https://www.youtube.com/watch?v=eUH2_n8jE70&t=0m30s

Zababa · on Jan 24, 2022

Thank you for the explanation.

WalterBright · on Jan 24, 2022

I always think of moisture vaporators when thinking about binary machine code.

WalterBright · on Jan 24, 2022

Walter here, AMA!