A Proper x86 Assembler in Haskell Using the Escardó-Oliva Functional (vmchale.com)
90 points by Smaug123 5 months ago | 19 comments



I know I might be barking up the "just a showcase" tree, but what's the memory complexity of this? E.g., a nice property of multipass assemblers is that they only need to keep a symbol table in memory rather than the whole program. Not really a concern for modern systems with large memories, but those neat Haskell tricks usually carry some extra O(n)s, on top of the (avoidable) extra constant factors from hidden linked lists and immutability. Cf. the infamous Haskell quicksort: much shorter and clearer, but also not really a quicksort.


One pass, but... I'm willing to bet there is some transitional internal state that has to be looped over to do this, subsuming the "pass" inside lazy evaluation. You have to know the indeterminate "where" to jump to, which depends on subsequent state. So you preserve execution state, keep calculating, and come back at the end to fill in the gaps.


Yes - it's pretty opaque to trace through, but you can see in `valid` that, for a given move history, a "good" test is performed on each instruction in the assembly listing - a full pass. This is invoked, asymptotically at least, once per jump per level of the search tree explored in `bigotimes`. Each level i of the search tree has a history of i jumps already decided to be short or near and branches n-i times; both of those average out to n/2, so the tree searches ~n^2 nodes, with each node doing n iterations over the list of instructions to check that all jumps are valid, for a total running time of O(n^3).
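
(A quick sanity check of that counting argument as a Haskell sketch - the names are mine, not from the article:)

    -- A node at depth i has i jumps already decided and branches over
    -- the n - i remaining ones; each node pays one O(n) `valid` scan.
    nodes, work :: Int -> Int
    nodes n = sum [n - i | i <- [0 .. n - 1]]  -- n(n+1)/2, i.e. ~n^2 nodes
    work  n = n * nodes n                      -- ~n^3 instruction checks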

(With some modifications to the code as shown, anyway: `lookup` is linear but used in such a way that it can be replaced with a constant-time vector index; instruction lists can be compressed to just jumps & labels with byte counts following, avoiding the duplicate invocations of `p` in `arginf`; etc.)

I'd be dubious that there's an algorithm to find the true minimum in less than cubic time, but there are plenty of "good enough" approaches: most branches will be definitely short or definitely not; backward jumps can be resolved when encountered; forward jumps can be written long and replaced with a short jump plus no-op padding when the target turns out to be within short-jump distance.
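
A minimal sketch of that last trick (my own code, not from the article): reserve the 5-byte near-jump encoding for every forward jump, and if the target lands within short range, patch in a 2-byte short jump plus NOPs so no other offset has to move:

    import Data.Bits (shiftR)
    import Data.Word (Word8)

    -- site = offset of the jump, target = offset of its destination,
    -- both measured with the jump occupying its reserved 5-byte slot.
    patchForward :: Int -> Int -> [Word8]
    patchForward site target
      | fits8 dShort = [0xEB, toW8 dShort, 0x90, 0x90, 0x90]  -- short jmp + NOP pad
      | otherwise    = 0xE9 : le32 dNear                      -- 5-byte near jmp
      where
        dShort  = target - (site + 2)  -- displacement from end of short jmp
        dNear   = target - (site + 5)  -- displacement from end of near jmp
        fits8 d = d >= -128 && d <= 127
        toW8    = fromIntegral
        le32 d  = [fromIntegral (d `shiftR` s) | s <- [0, 8, 16, 24]]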


Yes, that seems to be what the aptly named "tardis" monad does, but the article notes that it isn't sufficient because of the minutiae of how jump instructions themselves vary in length on x86.
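
(For reference, a minimal sketch of that two-way flow, assuming the tardis package's Control.Monad.Tardis API - my example, not the article's. Forwards state carries the offset so far; backwards state carries the bytes still to come:)

    import Control.Monad.Tardis

    -- Annotate each instruction size with its offset (known from the
    -- past) and the bytes remaining after it (known from the future).
    annotate :: [Int] -> [(Int, Int, Int)]
    annotate sizes = fst $ runTardis (mapM step sizes) (0, 0)
      where
        step sz = do
          offset <- getPast        -- bytes before this instruction
          sendFuture (offset + sz)
          modifyBackwards (+ sz)   -- tell earlier instructions about us
          rest <- getFuture        -- bytes after this instruction
          return (sz, offset, rest)

    -- annotate [3, 1, 2] == [(3,0,3), (1,3,2), (2,4,0)]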

In an act of brilliance that surely proves beyond all doubt that Haskell's main reason for existing is to upstage the esolang scene, the author solves the problem by trotting out a special-purpose constraint satisfaction algorithm. (You can imagine what kind of computational complexity this implies. The creators of this algorithm note that it can enumerate all tic-tac-toe games in just over a second!)



> The creators of this algorithm note that it can enumerate all tic-tac-toe games in just over a second!

This is achievable in most languages, no?


That `bigotimes` function the article talks about is doing something like tree search with backtracking - lazy evaluation isn't really relevant as far as I can see. I don't think the statement that it's "one pass" is meant to imply a closed-form solution or anything; the point is that you can write your code as plain single-pass logic and this generic functional tool does the rest.
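
(For the curious, the machinery is small. This is roughly the shape of the selection-function product from the Escardó-Oliva paper, reconstructed from memory, so treat it as a sketch rather than gospel:)

    -- A selection function picks a move that optimises an outcome function.
    type J r x = (x -> r) -> x

    overline :: J r x -> (x -> r) -> r
    overline e p = p (e p)

    -- Combine a selection function for the first move with a family of
    -- selection functions for the remaining moves; this is where the
    -- backtracking tree search comes from.
    otimes :: J r x -> (x -> J r [x]) -> J r [x]
    otimes e0 e1 p = a0 : a1
      where
        a0 = e0 (\x0 -> overline (e1 x0) (\x1 -> p (x0 : x1)))
        a1 = e1 a0 (\x1 -> p (a0 : x1))

    -- The product over a list of history-dependent selection functions.
    bigotimes :: [[x] -> J r x] -> J r [x]
    bigotimes []       = const []
    bigotimes (e : es) =
      e [] `otimes` \x -> bigotimes [\xs -> d (x : xs) | d <- es]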


It's a showcase for Haskell and well worth reading. I should be less snide.

Nowadays "multipass" means Mila Jovovich but it used to mean "another complex language with massive implications we deal with by divide-and-conquer-through-the-filesystem" and in the case of the York ada compiler that was some amazing number like 10 or more passes.

There was a fashion for running your C code instrumented and then recompiling it after runtime branch-choice evidence was collected. I think the DEC Alpha OSF/1 compiler did it optionally.


> There was a fashion for running your C code instrumented and then recompiling it after runtime branch-choice evidence was collected. I think the DEC Alpha OSF/1 compiler did it optionally.

This is called profile-guided optimization (PGO), and modern compilers (well, GCC and LLVM) have it.
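
For reference, with GCC this is the -fprofile-generate / -fprofile-use pair (Clang also has -fprofile-instr-generate / -fprofile-instr-use); a sketch of the usual dance, with app.c and training-input as placeholders:

    gcc -O2 -fprofile-generate app.c -o app   # instrumented build
    ./app < training-input                    # run to collect profile data
    gcc -O2 -fprofile-use app.c -o app        # rebuild using the profile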

I mostly think it's bad and not statistically sound: given profile data, the compiler is both overly trusting of it (no error bars) and can't find much useful to do with it (because there's no way to hint a modern out-of-order CPU).


Or you could just be pessimistic in your offset calculation and assume that all "in between" unconditional jumps are 5 bytes long. I know, I know, not proper. But I doubt that in real code this simpler algorithm would make you miss many opportunities for 2-byte jumps.
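
A sketch of why that's safe (my reasoning, not from the comment above): compute offsets once with every unconditional jump sized at 5 bytes; shrinking any jump afterwards only moves code closer together, so a displacement that already fits in a signed byte stays in range:

    -- site and target are offsets under the pessimistic 5-byte layout.
    canBeShort :: Int -> Int -> Bool
    canBeShort site target = d >= -128 && d <= 127
      where d = target - (site + 2)  -- relative to end of a 2-byte short jmp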


Yeah, this is a fun problem. This is where the majority of my work went when writing an x86 virus that lived in the NOP areas between functions. Compilers produced those blocks of NOPs after function bodies to align the following function's address to a 16-byte boundary. I chained all of those cavities together with jumps, distributing as much code as would fit and then putting a jump to the next cavity at the end. The trick is that you could fit more instructions (3 more bytes, since a short jump is 2 bytes against a near jump's 5) if you had a shorter jump.

I still want to write my own assembler some day.


This is basically irrelevant now that better ISAs like RISC-V have fixed instruction lengths (2 or 4 bytes), so the fancy algorithm here isn't necessary.


That fancy algorithm is relevant to RISC-V (and, in fact, to most fixed-length ISAs) because loading an immediate into a register takes one or two instructions depending on the immediate; you surely want to elide a redundant LUI instruction when you can. Of course such redundant instructions do no harm by themselves, but that applies equally to x86: the algorithm is an optimization either way.
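
(Concretely, this is the standard `li` expansion: a 32-bit constant needs LUI+ADDI in general, but a sign-extended 12-bit value fits in a single ADDI. A minimal sketch of the decision:)

    -- True when materialising the constant needs a LUI as well as an ADDI,
    -- i.e. when it doesn't fit in a sign-extended 12-bit immediate.
    needsLui :: Int -> Bool
    needsLui imm = imm < -2048 || imm > 2047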


As a result of RISC-V existing, all x86 processors have ceased to exist or be produced.


Accurate, if said sometime in the future rather than today.


There are still people making Z80 machines today, so no.


This same problem applies to RISC-V with the C extension, because the J and JAL instructions have a larger range (±1 MiB) than the C.J and C.JAL instructions (±2 KiB).


Having a fixed instruction length doesn't make the need to load large constants magically disappear; they just get split across multiple instructions. If anything, RISC-V might be worse. See also https://maskray.me/blog/2021-03-14-the-dark-side-of-riscv-li....


ARM would have been a better example, because the number of people who care about RISC-V is a rounding error compared to x86 or ARM.



