Is anyone still using assembly language? You betcha (2007) (eetimes.com)
83 points by agomez314 on May 20, 2022 | hide | past | favorite | 119 comments


My father still uses assembly language, on 8-bit Motorola (now NXP via Freescale) microprocessors. He's been doing it since the early 80s, for data monitoring and control devices for various applications in agriculture, industry, academia and other sectors.

I've been working with him for the past few years to modernise his products and give them a future beyond his retirement, which is finally happening now (he's 74).

The device we've been working on is an agricultural data monitor that reads from sensors in the soil and atmosphere (measuring impedance, voltage, pulse and digital signals), stores the data in local flash memory, then periodically uploads it to my web app via a U-Blox cellular/IoT modem.

It's amazing what he's been able to make it do, and how simple/cheap the componentry is and how low the power consumption is. And whilst the coding and debugging is pretty slow and laborious, once it works, it's incredibly reliable and stable.

Now he's retiring, if the business is to continue we'll basically have to find someone who can migrate it to higher level platforms and languages. There aren't many (any?) people around willing/able to work with this kind of assembly code these days.

It feels like a shame. Other players doing this kind of work use higher-level platforms, which may be faster to develop for and debug but are more costly to manufacture, use more power and are less reliable.

I'm hoping we can find a balance between the easier/faster development cycle of newer platforms, whilst retaining at least some of the simplicity and elegance of my father's approach.


> Other players doing this kind of work use higher-level platforms, which (...) are less reliable

Is my assumption right that the reliability of these programs is based on the proficiency and expertise of your father, rather than being a platform/language issue? Or do these platforms come with complexity that makes them more fragile in your opinion?


It's not specifically about my father's proficiency/expertise. (There's a quirky brilliance to the way he builds this stuff, but it's unconventional. He's an electronics engineer, not a computer scientist; he taught himself assembly language when the 6800 micros came out in the 70s, and has never felt the need to learn newer techniques).

So yeah, it's more about the newer platforms having more hardware and software abstraction layers, so there are more things that can break, and they have much higher power consumption.

In our applications, the weakest links are always the more "modern" communications modules, of which we've used Bluetooth and Cellular/IoT. In both cases, they are high-level SoC devices, with all kinds of capabilities and the ability to run BASIC or Python scripts on them. But they turn out to be much less reliable than our assembly code on the basic micros we use, and are the source of almost all device faults.

To illustrate the real-world advantage of the way we do it: our biggest competitor (a significant player that has shipped thousands of devices globally), uses a more modern/high-level platform (I don't know what exactly - it may be Arduino). One of our customers was given one to try out. Like most of our customers, they wanted data uploaded to the web app every 60 or even 30 minutes in order to have precise visibility of their soil moisture throughout the day. Ours can do that, running on just two 3.7V Li-ion cells that last for 1-2 years per charge. For the competitor product, to upload more than twice a day, it needs a solar panel, which needs to be mounted high up on a post above the crop. But that's a deal-breaker for this grower and many others, due to the use of pivot sprinklers and machinery.


I'm not convinced it's an assembly language advantage per se. Most of these very low power MCUs now come in variants with ridiculous amounts of flash that negate the space advantage of hand tuned assembly. The binary a C compiler generates will be bloated, but the tight loops that matter will run with comparable speed and instruction count, and the extra dead code and flash will not impact power draw.

So you are down to C knowledge, a datasheet, and a register and pin map, which many people in their 20s definitely knew how to handle 10 years ago, when I used to work for Freescale/NXP. To expedite that further you can go with Processor Expert, which will generate C code and abstract much of the datasheet swordsmanship. It can be finicky to set up, but the "drivers" you get (C routines) benefit from long debugging sessions and lessons we learned about the "proper" way to use a peripheral, sometimes using info from the hardware teams that was not in any datasheet.

The way to win with newer, power-hungry platforms is the so-called "race to idle". They are so powerful and have such efficient low power modes that the best battery life is obtained by cramming as much work into a wakeup as possible and then going to sleep, as opposed to having an exceptionally low power machine that is awake most of the time. So someone with an 80s paradigm might need to adjust to this reality.
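To make the pattern concrete, a minimal C sketch - every hardware call here is a hypothetical stub standing in for whatever the vendor SDK actually provides:

  #include <stdint.h>

  static void wake_on_timer(void)      { /* configure RTC wakeup (stub) */ }
  static uint16_t read_sensor(void)    { return 0; /* ADC read (stub) */ }
  static void store_sample(uint16_t s) { (void)s; /* flash write (stub) */ }
  static void deep_sleep(void)         { /* enter lowest-power mode (stub) */ }

  int main(void)
  {
      for (;;) {
          wake_on_timer();              /* sleep until there is work */
          store_sample(read_sensor());  /* cram all work into one wakeup */
          deep_sleep();                 /* then race back to idle */
      }
  }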

Thousands of units is nothing and definitely not enough to (profitably) stay on hand tuned assembly.


Sounds like they didn't optimize for low power. You can easily get 1-2 years out of a Nordic MCU.


It sounds like this is the abstraction/capability/risk tradeoff. Higher-level languages and systems let you do more things without having to understand all the components - but then you're not perceiving the quality of those components and are exposed to their limitations.


That’s a gigantic difference. Almost comically so.


If I were in a different place, I would ask for the job.

You should ask around.

Assembly language itself is not hard. At least not on those smaller 8 bit Moto chips.

What needs to happen is for your father's notes - hopefully code comments, math, etc. - and how they map to the problem space to be recorded.

Others can write the code. Frankly, you can likely do that given a little time.

8 bit assembly is both fun and a niche skill. Tons of people write it for fun these days, myself included.

There is probably someone near you who knows this stuff from the retro computing scene.


Why do you have to “migrate to higher level platforms and languages”? Assembly is perfectly valid for this application.


C is much better for this now: you are not tied to the processor architecture and thus can switch to a better/cheaper CPU easily.

The cell modem he is using already has a 32-bit ARM CPU, maybe even multi-core.

32-bit is the new 8-bit.

C is the new assembly for 99% of code. You can drop to assembly for the 1% where it matters.
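For example (a minimal sketch, assuming GCC/Clang on x86; the timed section is a placeholder):

  #include <stdint.h>
  #include <stdio.h>

  /* The 1%: one inline-asm escape hatch to read the x86 cycle counter. */
  static inline uint64_t rdtsc(void)
  {
      uint32_t lo, hi;
      __asm__ volatile ("rdtsc" : "=a"(lo), "=d"(hi));
      return ((uint64_t)hi << 32) | lo;
  }

  int main(void)
  {
      uint64_t t0 = rdtsc();
      /* ... the 99%: ordinary C hot code goes here ... */
      printf("cycles: %llu\n", (unsigned long long)(rdtsc() - t0));
      return 0;
  }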


  not tied to the processor architecture and thus can switch to a better/cheaper cpu easily.
The core processor arch, yes, but for many (most?) of these small embedded systems, a large part of the code is basically device drivers used to communicate with GPIO, a plethora of serial buses, and any number of other devices which vary from microcontroller to microcontroller. So unless you offload all that to an RTOS of some kind (in which case you're limited to devices the RTOS supports), switching devices isn't easy. And for many things, the RTOS overhead will be the largest consumer of CPU+RAM. That is sorta the beauty of the Arduino ecosystem, where much of this has been abstracted into common libraries across a number of device families.


State machines in C work great when you don’t want a preemptive RTOS.
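Something like this minimal sketch - the states and events are illustrative only, not from any real product:

  #include <stdio.h>

  typedef enum { ST_IDLE, ST_SAMPLING, ST_UPLOADING } state_t;
  typedef enum { EV_TICK, EV_SAMPLE_DONE, EV_UPLOAD_DONE } event_t;

  /* One cooperative transition per event; no scheduler, no per-task stacks. */
  static state_t step(state_t s, event_t e)
  {
      switch (s) {
      case ST_IDLE:      return e == EV_TICK        ? ST_SAMPLING  : s;
      case ST_SAMPLING:  return e == EV_SAMPLE_DONE ? ST_UPLOADING : s;
      case ST_UPLOADING: return e == EV_UPLOAD_DONE ? ST_IDLE      : s;
      }
      return s;
  }

  int main(void)
  {
      state_t s = ST_IDLE;
      const event_t evs[] = { EV_TICK, EV_SAMPLE_DONE, EV_UPLOAD_DONE };
      for (int i = 0; i < 3; i++) {
          s = step(s, evs[i]);
          printf("state=%d\n", s);
      }
      return 0;
  }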

I did an embedded project on the 8051 like that: we had the legacy code 100% in asm and we moved to C. In the end I think we had about 10-20 lines of asm and everything else in C.

You are right that switching the CPU, and thus all the peripherals, is a major headache, not something to take lightly.


It's relatively trivial. I've done it countless times since the early '80s.

In fact I find learning new high-level packages much harder.


This is pretty important right now, since there are tons of micros you literally cannot buy, some venerable 8-bitter lines are being cut off entirely, and there's a whole new RISC-V panoply coming down the tracks.

With C, you just need to rewrite the device drivers (because your business logic is portable and unit tested, right?!). In assembly, it's time to throw it all away and start again.

For a trivial programme like reading the ADC and raising an alert on a UART, that's not so bad. But some products are a lot more complex than that.
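The driver/logic split looks roughly like this (a hedged sketch; all names are hypothetical, with a host stub standing in for a real UART):

  #include <stdio.h>

  typedef struct {
      void (*putc)(char c);   /* the chip-specific part: transmit one byte */
  } uart_driver;

  /* Portable, unit-testable business logic: depends only on the interface. */
  static void send_alert(const uart_driver *u, const char *msg)
  {
      while (*msg)
          u->putc(*msg++);
  }

  /* Host "driver" standing in for a real UART register poke. */
  static void host_putc(char c) { putchar(c); }

  int main(void)
  {
      uart_driver u = { host_putc };
      send_alert(&u, "ADC limit exceeded\n");
      return 0;
  }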


Or you could simply learn the instruction set for a new micro. Once you are fluent in one, moving to another is no big deal.


Moving your skills might not be a big deal. Moving an entire mature application might be.

Or it might not be. If it's just a simple loop, probably fine. If you have to move a custom assembly RTOS and an entire stack of business logic, maybe it's going to be very painful, project-risky and error-prone. Whereas if it was C, you just need to port drivers (which may already be provided by the manufacturer) and maybe do the RTOS port (if not already done). That could be days to weeks of work. A rewrite in a new assembly language could be months, at the end of which, to a customer, you have the exact same product you had before, except without a proven track record for any single byte of code.

Like everything, it depends on the project.

This year especially, it's going to be hard enough to stay alive by just shipping anything at all while competitors go to the wall under the double whammy of customer hesitancy and component shortages that can cut an established product off at the knees. And the last thing a company needs then is a long and expensive rewrite blocking shipping.

Not only that, if the chips you rewrote for end up on 100-week lead times (and that's not even the longest I've seen this month), you're completely dead in the water.


One could emulate those 8-bit CPUs and get good performance too.

The bonus is being able to continue using well-tuned, mature routines.

An emulation done once would bring the whole works onto a new CPU.
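The core of such an emulator is just a fetch-decode-execute loop; a toy C sketch (a real 8-bit core needs a full opcode table, flags, interrupts and cycle counting):

  #include <stdint.h>
  #include <stdio.h>

  typedef struct { uint8_t a; uint16_t pc; uint8_t mem[256]; } cpu8;

  static void step(cpu8 *c)
  {
      uint8_t op = c->mem[c->pc++];               /* fetch */
      switch (op) {                               /* decode + execute */
      case 0x01: c->a  = c->mem[c->pc++]; break;  /* LDA #imm */
      case 0x02: c->a += c->mem[c->pc++]; break;  /* ADD #imm */
      default:   break;                           /* NOP / unimplemented */
      }
  }

  int main(void)
  {
      cpu8 c = { .mem = { 0x01, 40, 0x02, 2 } };  /* LDA #40; ADD #2 */
      step(&c); step(&c);
      printf("A = %d\n", c.a);                    /* prints A = 42 */
      return 0;
  }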


Maybe, but a well-tested routine running in a new emulator is no longer well-tested.

There's a careful value judgement to be made there. For example, if you have many products all in some dying 8-bit architecture, and it's easier to write the emulator once and then all the products are rescued, then good. If you spend 6 months writing the emulator to save yourself from 6 weeks of porting one simple program, less good.

This is when "hey wouldn't it be really cool if" meets the cold realities of commercial engineering.


I have seen all sorts of stuff done in small-scale commercial settings. I agree with you.

I took this particular case to mean there are lots of products.

Could also just put the CPU into an FPGA, which is a choice I would look hard at personally. Many 8-bitter cores are out there, some cycle-exact.

Then one has a hardware solution to build. Could go quicker.


Watch out for FPGAs this year: the prices are running something like 200x normal, if you can buy them at all. A $5 part is over $1000 on the grey market.


Yes, prices and availability are all over the place.

I personally would emulate.

Overall, the takeaway is that portability has more flex in it than is often recognized.


Yep, that's exactly what we've been talking about.


If we could find developers willing/able to do it, I'd happily stick with it. Such people are hard to find, but anyone reading this who is interested, feel free to get in touch (email address in profile).


I imagine it's a lot easier to find someone to hire with Python (MicroPython) or C# (.NET nanoFramework) skills.


That's what other players in the space do. I'd happily settle for C, in which case we wouldn't need to change the hardware much.


Serious question: do these run on anything outside of "toy MCUs"? Almost every time I hear them mentioned, it's in the realm of education, hobbyist stuff, etc.


That's not my area of expertise; I was commenting on the hiring side. Maybe OP could use those "toy MCUs" as bait to attract some college interns? They might get lucky.


The Arduino Portenta boards are pretty powerful. I just don't see where you would want to use one instead of a full OS.


The full OS is what blows the power budget.


> his company eventually did mobile ports of some of his games

I'd be interested in knowing how they did that, as it was all x86...!


> Now he's retiring, if the business is to continue we'll basically have to find someone who can migrate it to higher level platforms and languages. There aren't many (any?) people around willing/able to work with this kind of assembly code these days.

I think you can definitely hire someone for training and go from there. I don't see a huge problem about assembly language.

Does your father keep notes? Is it possible to share them?


Mate, what does this "migration" look like (I have no idea how to imagine it), and what platforms are you writing of?


Transport Tycoon Deluxe and RollerCoaster Tycoon were famously programmed in assembly by Chris Sawyer, essentially on his own. I recall they worked on my Pentium 166 practically without issue, even with large maps; perhaps only a tiny bit of lag when zooming out to view the whole thing. They've since been unofficially rewritten in C++ [1, 2].

[1] https://www.openttd.org/ [2] https://openrct2.org/


http://www.chrissawyergames.com/faq3.htm

His site's still up!

> What language was RollerCoaster Tycoon programmed in?

> It's 99% written in x86 assembler/machine code (yes, really!), with a small amount of C code used to interface to MS Windows and DirectX.


I am not really familiar with assembly - I only tried it once - but a game in assembly seems a little too complicated!


Everything used to be. One of the reasons for CISC instruction sets is that the complex instructions do many things at once, which is very convenient for programmers. An assembly instruction becomes more like a function call that does several low-level things. Sure, CISC assembly is more complex, but if you are an experienced programmer those complex instructions do things that you find very useful and make your program more maintainable. Part of the problem with modern assembly is that new instructions have been shoehorned into existing CPU designs instead of elegantly designed in from scratch; the fact that they had to fit those instructions in meant that compromises had to be made.

Today there is no point. Optimizing compilers have gotten very good, and while it is still possible for hand tuned assembly to beat compiled code, with compiled code you change a flag and your code is optimized for some other CPU, better than hand tuned code targeting a different but compatible CPU would be. Then change a flag again and your code runs on a completely unrelated CPU family.


I started game development around the Game Boy Color days, and everything was in assembly. Every other platform had shifted to C/C++.

It wasn't bad, actually. We had macros to do 16-bit work with 8-bit registers, etc., so the code was much more high-level than it sounds. We did have to be very careful with the stack (making sure it stayed balanced), but otherwise it was very similar to coding in C.

With modern architectures, I much prefer C and intrinsics and let the compiler deal with it (and nudge it when it gets it wrong).


Almost all console and microcomputer games were assembly until well into the 80s. Atari 2600 games, for example, had 2 KB of ROM and 128 bytes of memory. Careful handcrafting was required not only to get the game to fit in the cartridge but to "race the beam": reprogramming the sprite registers as the image was generated, line by line, to achieve more sophisticated graphics.

(Original Spacewar! was PDP-1 assembly. Original Colossal Cave appears to have been FORTRAN.)


Gotta say that was a bad choice even then.


By what measure? It's a great game that people still talk about, and it ran more smoothly than a lot of current titles. And it sounds like he knew the language. So how was this a bad choice?


He could’ve gotten essentially equivalent performance writing it mostly in C, with only the time-critical parts in assembly. Assuming equivalent knowledge of C, it would’ve been quicker to write, easier to update and port, and easier for others to work with the code.


Not to sound rude, but these mostly seem like "random" assumptions, even on the goals side.

Were there others to work on the code? Was portability a goal? Would it matter, given the interface to DirectX is C anyway?


You don’t sound rude. These assumptions are common for games in general, but who knows what the assumptions were for this game in particular? Or whether this programmer knew C well at the time? Or if Microsoft was likely to ask for ports or updates?


Maybe. Maybe not. The C compilers back then were not as good at optimising machine code as modern C compilers are.


It was possible: profile your program to find the hotspots, rewrite those in assembly, rinse and repeat.


My Prof. once said that the only difference between writing assembly and coding in a high-level language is that you have to type more.


And that opinion would probably be downvoted to grey on HN. At best, it's a not-so-funny joke.


That is technically correct. Which is the best kind of correct.


Bad choice that netted Chris a bunch of money and made iconic games. It's not stupid if it works.


Totally. The sausage factory makes delicious sausage, but you might not want to see inside!


Have you ever programmed in assembly? Code that I wrote 20 years ago in Motorola 68000 assembly is easy to read and understand for me today. While I have seen “modern” code written in modern languages (by other people) that made me want to throw up and/or throw things at those people :)


Yes, though not for a long time now. 68000 was actually some of the hardest for me to deal with, when folks would creatively use EVERY register…and in a different way in each procedure.

But of course you are right, good and bad code has been written in every language. And guess what folks, your code is NOT “self-documenting,” you’re just being lazy by not writing comments! OK, soapbox ended! :-)


> good and bad code has been written in every language

Yes indeed. There seems to be no limit to how bad code can be written in any programming language :)


There are degrees to "assembly". There are .s source files. There is inline assembly with varying degrees of compiler plumbing/spilling/automation. There are intrinsics, where you delegate register allocation to the compiler and allow it to optimize if it can.

Then, sometimes you write specific C idioms that you know LLVM is going to turn into certain instructions.

For highest performance, you will always think in assembly, and in fact in the internals of your target CPU (as far as they are known), even if you have various layers of software to produce the bitstream in your object files.
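For instance, the intrinsics tier looks like this (a minimal x86 SSE sketch; you pick the instructions, the compiler handles register allocation and scheduling):

  #include <immintrin.h>
  #include <stdio.h>

  int main(void)
  {
      float a[4] = { 1, 2, 3, 4 }, b[4] = { 10, 20, 30, 40 }, out[4];
      __m128 va = _mm_loadu_ps(a);                 /* movups */
      __m128 vb = _mm_loadu_ps(b);
      _mm_storeu_ps(out, _mm_add_ps(va, vb));      /* addps */
      printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);
      return 0;
  }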


Slightly late, but assembly is still used fairly often in the modding scene for older video games, as well as for creating homebrew titles for said systems. The resources used for your average Super Mario Bros 1/3/World hack are written directly in assembly language, as are quite a few standalone games.

It's because the systems are underpowered enough that using a higher level language wouldn't get very good results, and because access to the original source code is basically nonexistent. For those titles which have been entirely disassembled (like say, Super Mario 64), the average mod now seems to be made in a language like C instead.


Years ago I wrote a Space Invaders clone in Motorola 68000 assembly on the Commodore Amiga. That was a fun project! I recently found a printout of the assembly source code and I was surprised how easy it was to read and understand. I felt ready to jump in and continue working on it. It was much easier to read and understand than most “higher level language” code I have seen in my career. I suspect that the lack of higher level language features kinda forces you to write simple clear code.


I don't agree here. What is "simple clear code"? Space Invaders is no benchmark for me. I myself do demo coding on the Amiga 68000, and debugging consumes a lot of time with the infamous Gurus. Compare this to a breakpoint in your code on modern machines. Rebooting takes time, etc. The workflow is also tough. Corrupt diskettes - and there goes your source code.

Also on a machine code level you have to deal with everything yourself. How to represent floating point math? Fast division? Large datasets? Clipping to deal with 3d vertices?

Blitter settings sometimes feel like magic not science.

Small projects are indeed fun; larger ones, in my opinion, not really.


Yep I agree with the Blitter. I never fully understood how it worked :)


Can anyone recommend an x86 assembler to use in 2022? Is NASM still the default choice? Also, what about a modern x86 reference? Most of what I have pretty much stops at the Pentium era. Any source for an authoritative reference that covers x86-64, all the varying SIMD flavors, and any other new and interesting developments in the ISA in the last ~20 years?


fasmg: https://flatassembler.net/ - it is written in x86_64 assembly (meaning bootstrapping is sane, unlike gcc/clang/etc.), with x86/x86_64 Intel syntax. It is actually the most powerful macro processor I know of; be careful not to lose yourself in there and forget to write assembly. It has experimental support for other ISAs like ARM (maybe RISC-V in the future). The best way to see it is as a macro language specification which targets writing assembler languages, with an x86/x86_64 reference implementation.

nasm: x86/x86_64 Intel syntax, used by ffmpeg. Main issue: its SDK is horrible, with disgusting code generators and so on. It also has a powerful macro processor; be careful not to abuse it.

gas: from GNU binutils, tons of ISAs, but for x86/x86_64 you should switch it from its default, AT&T syntax, to Intel syntax.

If some definitions have to be shared with C, I would recommend combining their usage with a C preprocessor (gas is made to be friendly to a C preprocessor).


How compatible is fasmg with fasm? I still use fasm on Windows and Linux, but no luck on macOS, since fasm is still written in 32-bit assembly and macOS doesn't support 32-bit apps anymore...


I heard the compatibility should be "mostly" OK: fasmg introduces some stuff missing from fasm in order to write more comprehensive assemblers (for instance the CALM macro language, in addition to the classic macro language).


So in general, any fasm code should be compilable with fasmg? In some cases, perhaps some tiny modifications are needed?


Sorry, I don't recall the details, but you can head to their real-web (no JavaScript) forums and ask them.

I did cheat, since I started directly with fasmg. Nowadays I use two assemblers: fasmg and gas (I am only a user of nasm via ffmpeg builds, and I _really_ dislike the nasm SDK).


Here is the AMD64 reference manual https://www.amd.com/system/files/TechDocs/40332.pdf

The stuff you will need most is in Volume 3.


For a standalone assembler, I think YASM is a little more the default nowadays than NASM, although honestly the two are probably very interchangeable. Another major option is to use your C compiler's assembler (gas, clang -cc1as, or masm), or suck it up and use inline assembly directly.


There's this misconception that assembly language is old and obsolete, but nothing could be further from the truth; I consider it the ever-present programming language.

Sometimes, when you do the performance profiling of your software, you can find bottlenecks for which your programming language could be the issue. I'm particularly a fan of the ctypes Python library, libffi and similar. Android NDK is a brilliant piece of SDK to get closer to the metal in terms of performance.

Assembly language is not old; it just evolves with micro-architecture.


I love assembly language and I'm sad that I haven't used it for work since ~2005. I had just discovered SIMD... I have finished Shenzhen IO and Exapunks (and some TIS-100) in recent years, but they're not really comparable to x86/64.

Timely?

https://i.ibb.co/tZZdMH7/hn.png


TLS libraries also typically still have a fair amount of ASM. I remember when it was pretty much mandatory to have an SSL accelerator board to run a largish website. The optimised ASM in TLS libs + newer gen processors finally killed that off.

Of course, generally, lots of crypto libraries have ASM, but the TLS example is interesting because it's everywhere.


From 2007 but most of the contents of the article are still true today.


In the past 15 years ARM has continued to steamroll the embedded field, making those weird/small architectures discussed in the article more and more obscure and niche. Using C instead of asm is pretty much the norm on ARM.

For reference, STM32 series, launched in 2007, were one of the earliest Cortex-M MCUs.


There is a company named WorldSpan in Atlanta that wrote all of Delta’s flight reservation software in Assembly.


I worked at multiple airline reservation shops in the late 80's and 90's, all of which used IBM's Transaction Processing Facility (TPF). There were C compilers available towards the end of the 90's, but they were slow to be adopted, nominally because the hand-written assembly could be better optimized to meet processing time constraints. I think some of that might have been a bias on the part of the long-term programmers who had vast experience in assembly and did not care to learn C.


How come?


Those airline reservation systems were built in the 70s and 80s, and had to be extremely efficient, so, evidently they felt assembly was a better option than any of the higher level languages available around that time.


They also evolved from the original Airline Control Program (ACP), which was written in the 1960's, at which point assembly was pretty much the only way to get the level of performance required. I think there was some incidental "vendor lock-in", given that IBM was so dominant in the mainframe space for so long, but I think the more relevant principle is "conceptual lock-in", kind of like why banks still rely on COBOL. There was such a large investment in assembly that rewrites into higher level languages -- and we did consider them -- were prohibitively expensive.


C wasn't really an option in the 1970s. It wasn't far into C's existence before Bell Labs internally had a C compiler for IBM OS/360 (predecessor of MVS), but it wasn't available as a commercial product (they did agree to share it with external researchers, most significantly the group at Princeton which used it to produce the first port of Unix to IBM mainframes). C compilers for IBM mainframes running Unix were commercially available by 1980 (as part of Amdahl UTS, which was the productisation of Princeton's port), but it took a few more years before they were commercially available for non-Unix IBM mainframe operating systems.

There was one alternative to assembler which saw some use (albeit it never reached assembler's popularity) – SabreTalk (aka PL/TPF) [0]. That was jointly developed by IBM, American Airlines and Eastern Airlines. It was a custom PL/I dialect, rather than just standard PL/I, because (i) standard PL/I includes a lot of facilities which didn't make sense in the TPF environment (such as IO and floating point); (ii) standard PL/I lacked inline assembly, which made it harder to invoke the TPF API that revolved around assembler macros. IBM has a long history of secret internal dialects of PL/I, which it used to write its mainframe/midrange operating systems (PL/S and its descendants) – SabreTalk belongs to the same tradition, although it is not identical to any of those internal IBM dialects. Eventually, someone developed a SabreTalk to C converter, and SabreTalk sites used that to move to a mainstream language.

Most of these airline reservation systems also used COBOL. COBOL was too inefficient for the real-time transaction processing part of the system, but these systems also involved some background batch processing (aggregate reporting, accounting, etc), and that part of the system was generally written in COBOL running under MVS. It was normal for TPF sites to have TPF mainframes to run the transaction processing, and MVS mainframes to run the batch reporting and other business applications, and also to host the development environment – TPF has never been self-hosting, developers would edit/assemble/compile their code under MVS and then transfer it to TPF for testing. (In TPF's contemporary successor z/TPF, the development platform has shifted from MVS to Linux.)

[0] See https://en.wikipedia.org/wiki/SabreTalk and also http://teampli.net/Sabretalk_Reference_Guide.pdf


Back then it was the better choice. Not only were CPUs much slower, but optimizing compilers were not nearly as good (if they existed - often they didn't).


Without knowing anything, my guess would be it was using some IBM mainframe. AIUI the z/Architecture's assembly is fairly high-level (by assembly standards).


I’ve been kinda wanting to look into z/Arch machinery, ofc I don’t have a multimillion dollar mainframe unfortunately. Hercules can do S/390 which is close enough I suppose.


Vendor lock-in, I guess :)


Vendor lock in.


There are few things as satisfying as beating GCC through pipelining, tail call recursion, and other absurd hacks as a 2nd year student who just learned to write hello world in assembly.

I feel the desire to actually learn it again and build bigger stuff.


How about updating GCC to do your optimizations?

I mean, let's be real: a needle and a steady hand hasn't been the key to getting the little man in the box to do your bidding for a while!


The optimizations were too domain-specific to work on anything other than that particular problem, and I certainly don't know enough to contribute to projects like LLVM or GCC.


Pro: you are not dependent on grotesquely, absurdly massive and complex compilers, as long as you don't over-abuse the preprocessor.

and that alone...


Another pro is that it's often possible to beat the performance of said compilers in places where that matters if you're willing to put enough time into it.


12 years ago, we used asm for the inner loop that does peak metering in Ardour. It is essentially doing a greater_than comparison on every single sample that passes through the program, which could potentially be millions per second. Using SIMD instructions directly from asm with a not-small buffer size reduced the CPU cost by 30%.

Last year, someone was updating this code to include ARM and some newer SIMD instruction sets. We discovered that in that time period, gcc has gotten to the point where even our hand-crafted asm is barely any better (and in some cases worse) than gcc's own generated code.

Compilers improve!
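For the curious, the shape of such an inner loop - a minimal sketch in x86 SSE intrinsics rather than raw asm, and not Ardour's actual code:

  #include <immintrin.h>
  #include <stddef.h>
  #include <stdio.h>

  /* SIMD peak meter sketch: SSE assumed, n a multiple of 4. */
  static float peak_sse(const float *buf, size_t n)
  {
      __m128 vmax = _mm_setzero_ps();
      for (size_t i = 0; i < n; i += 4) {
          __m128 v = _mm_loadu_ps(buf + i);
          v = _mm_andnot_ps(_mm_set1_ps(-0.0f), v); /* clear sign bit: |v| */
          vmax = _mm_max_ps(vmax, v);               /* 4 comparisons at once */
      }
      float lanes[4];
      _mm_storeu_ps(lanes, vmax);                   /* horizontal max below */
      float peak = lanes[0];
      for (int i = 1; i < 4; i++)
          if (lanes[i] > peak) peak = lanes[i];
      return peak;
  }

  int main(void)
  {
      float samples[8] = { 0.1f, -0.9f, 0.3f, 0.2f, -0.4f, 0.8f, 0.0f, 0.5f };
      printf("peak: %g\n", peak_sse(samples, 8));   /* prints 0.9 */
      return 0;
  }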


It's true that compilers improve, but I would be willing to bet that if you took the C code and ran it on the original machine, there is a good chance that the assembly would outperform it.

Hand-optimized assembly doesn't necessarily age well, because a large part of the skillset is conforming the code to the processor micro-architecture. Compilers do a lightweight version of this with -mtune, where they utilize tables reflecting the number and type of functional units, instruction latency and throughput, etc. to make various tradeoffs in instruction selection and/or placement. The usual "simple" example is a memcpy operation. There are about a dozen different ways (int register moves, utilizing vectors of various sizes, nontemporal stores, rep movXX, etc.) to implement it, and depending on processor, memory subsystem, alignment, and transfer size there can be significant perf differences when picking one alternative over another, even before considering whether to try prefetch hints, strip mining, or any of the other things that can make a difference.
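Two of those dozen ways, side by side (illustrative only; which one wins depends on size, alignment, and microarchitecture):

  #include <stddef.h>
  #include <stdint.h>
  #include <string.h>

  /* Simple, branchy, fully portable byte copy. */
  static void copy_bytes(void *dst, const void *src, size_t n)
  {
      uint8_t *d = dst; const uint8_t *s = src;
      while (n--) *d++ = *s++;
  }

  /* Word-at-a-time copy: each fixed-size memcpy compiles to one 64-bit move. */
  static void copy_words(void *dst, const void *src, size_t n)
  {
      uint8_t *d = dst; const uint8_t *s = src;
      for (; n >= 8; n -= 8, d += 8, s += 8)
          memcpy(d, s, 8);
      while (n--) *d++ = *s++;   /* tail */
  }

  int main(void)
  {
      char src[16] = "hello, world!!!", dst[16];
      copy_bytes(dst, src, 16);
      copy_words(dst, src, 16);
      return 0;
  }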

Bottom line: just as playing with different compilers and compiler options can easily provide a 2-5x uplift, it's likely that an expert assembly programmer can gain that much or sometimes significantly more (because they can fundamentally change the algorithm) against a compiler's best output; there remain a fair number of things that can be done with hand tuning that compilers simply aren't yet capable of. Since you're using SIMD, the compiler could be beating it simply because it's utilizing one of the wider vector instruction sets vs the original code. Although, for a software package where you don't know the final machine, much of this becomes more difficult unless you're willing to have a half dozen optimization targets selected at runtime.

Bottom line: if it matters that much, there are still people who can take the best code you get out of a C/C++ compiler and generally provide an uplift. Whether it's worth the effort, that is another question.


So, I used to do this for a living. The main reasons assembler worked better were:

- The compiler had to make conservative assumptions. Less true now that you have things like restrict pointers (see the sketch after this list)

- The compiler didn't know how to do SIMD. Now it can with intrinsics.

- The compiler didn't know how to do the necessary transforms to make use of in-order cores; out-of-order makes the CPU do this itself, and is making its way down to smaller processors.
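To illustrate the first point, a sketch of what restrict buys you (illustrative, not from any particular codebase):

  /* With restrict, the compiler may assume dst and src never alias, so it
     can keep values in registers and vectorize this loop; without it, it
     must be conservative and reload on every iteration. */
  void scale(float *restrict dst, const float *restrict src, float k, int n)
  {
      for (int i = 0; i < n; i++)
          dst[i] = src[i] * k;
  }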


It depends where you draw the complexity lines: some will prefer to do almost anything other than depend on those grotesquely, absurdly massive and complex compilers: Rube Goldberg Machine Syndrome.


Sure, but by the time you do, the CPU manufacturers have come out with a different CPU with different ideal optimizations. If your code is compiled, you just flip a switch on the compiler and your code is now better optimized for the new CPU than your previously well-optimized code. If this really matters you can compile several hundred different versions of your code for the several hundred different x86 variants out there. And if management wants to try ARM, with compiled code it is just a small change to your compiler flags, again possibly with hundreds of different executables for all the ARM variants if you feel the need.


Indeed, usually it would be appropriate only in very specific places, and if it _really_ needs to be chip implementation specific, you could install specific machine code for those implementations.

This is not perfect, but in all cases it is much less bad than depending on compilers like gcc/clang.


Isn't this no longer an issue with things like Godbolt?

When you can actually plug code in and see what you get out to vet for correctness, I'd think there shouldn't be as much need to ignore the compiler, per se.


I’ve not used this but looks promising

http://flatassembler.net/


Example of modern assembly in dav1d decoder implementation https://code.videolan.org/videolan/dav1d/-/tree/master/src/x...


Today lack of portability is the main problem I see with assembly.

I absolutely love AVR8 assembly, but the AVR8 is a dead end in terms of performance (other than maybe "build a soft AVR8 inside an FPGA and delegate tasks to the gate array").

The fact that I can recompile C to ARM keeps me writing code in C, even though I hate it.


I think people overestimate the value of 'portable' code. Even with C or other higher level languages you often have to code around platform specific problems or jump through hoops for platform specific optimizations. Regardless, if you understand what the code is doing then re-writing it for a different platform is pretty trivial, and isn't rewriting code the favorite sport of programmers anyway?

Granted it is not a modern-style piece of software, but look at David Murray's Attack of the PETSCII Robots: originally hand-written in 6502 assembly for the Commodore PET and since ported to something like 20 different platforms, many of which have wildly different video and audio hardware and several of which use a completely different processor.


This exactly. In the 80s, it was common practice to write in macro assembler, which would then be translated to the target ISA.

Microsoft wrote BASIC and others significant software in such a way.

https://devblogs.microsoft.com/commandline/microsoft-open-so...

Also, compilers are hungry, wasteful, complex pieces of code fairly unsuited for 8-bit and 16-bit machines. DOS games started being written in C only when the 386 became commonplace.

Heck, even on the 32-bit machines of the time (VAX), most things were written in pure assembly. That includes VMS.


Turbo Pascal wasn't bad at all for the 8088.


Granted, Turbo Pascal was nice for small CP/M and DOS machines, at least until version 3 when the IDE was still ~30kB and could fit in 64kB of RAM.

When TP4 and Turbo C were introduced with their full TUI environment, the 8088 started to struggle.


It is still widely used in microcontroller programming. Although general purpose programming is often easier in C or similar, if timing is critical, you either have to start with C and tweak the assembly or just do assembly from the beginning.


We've got https://github.com/SnellerInc/sneller which is a query engine for JSON based on AVX-512 assembly


I think there's another point here:

To be fluent in Assembler you must understand the hardware in detail.

However once you move to a high level language, the hardware becomes relatively invisible.

I started out with calculator chips in the '70s, then moved to the 6502, the 8051, and then to the PIC series.

I absolutely love the larger PICs.

But then the Arduino came along and suddenly the hardware became invisible.

For that reason I've always disliked programming in C, Python, etc., and absolutely detest C++.

I reckon that many millennial programmers tend to avoid Assembler because they simply don't understand the hardware.


At my first job I wrote a bunch of AVR assembly because C was too slow for what I needed it to do. I believe the application was having the microcontroller do some measurements of sound propagation through a medium that was much denser than air, and thus it needed to wake up nearly immediately after the sound happened. By the time whatever initialization the C code did was done, it was too late. Assembly did the job superbly.

This was not that long ago, around 15 years ago. I really enjoyed AVR assembly. Are these microcontrollers still in use nowadays?


Pretty sure one of the biggest educational/hobbyist dev boards of all time uses an AVR:

https://en.m.wikipedia.org/wiki/Arduino


Funny thing is that you can do web development in assembly now: https://webassembly.org/

I feel like we've gone full circle.


Isn't everything just compiled to wasm?


Well, normally, but you can write it directly in the text format (.wat). It's doable.


Oh man. The 9900 architecture was awesome. Designed at a time when bipolar memory was faster than TTL logic, it put the register file in main memory. Only the PC, STATUS register and "WORKSPACE POINTER" were on the CPU. The general purpose registers were in memory, pointed to by the WP.

It was trivial to implement threads on that machine: just execute a BLWP (branch and load workspace pointer) when jumping between threads.
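In C terms, the trick is roughly this (an analogy, not TMS9900 code):

  #include <stdint.h>
  #include <stdio.h>

  /* The "registers" live in ordinary memory; a context switch is just
     repointing the workspace pointer, roughly what BLWP did in hardware. */
  typedef struct { uint16_t r[16]; } workspace;

  static workspace thread_a, thread_b;
  static workspace *wp = &thread_a;    /* the workspace pointer */

  static void blwp(workspace *next) { wp = next; }

  int main(void)
  {
      wp->r[0] = 1;                    /* runs against thread_a's registers */
      blwp(&thread_b);
      wp->r[0] = 2;                    /* now thread_b's registers */
      printf("%d %d\n", thread_a.r[0], thread_b.r[0]);   /* prints 1 2 */
      return 0;
  }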


Having done some production assembler coding in the late 70's, I expect most new assembler work is now done more for fun than by necessity. Unless the code is well commented and laid out, working with existing assembler code can be very difficult. That said, I've been experimenting using the m4 macro language to make Web Assembly (WASM) a little easier to use. WASM is a well thought out language and a number of people are doing some hand coding in it.


Yes, for writing good old NES, Atari, GameBoy etc games: https://www.assemblytutorial.com/

I mostly write mobile app code (Java/Kotlin), and there's no way I'll use assembly for work. But for fun? Heh, why not? :)


If RISC-V becomes a de-facto standard (I wish), very probably a LOT of software will go back to assembly: much less of the planned obsolescence you usually get with c11/c17/c23/c7893743984798, a new gcc builtin/extension used in the core of Linux each week (irony), etc.


Perhaps somewhat surprisingly, assembly (when it is a direct representation of the machine code) is indispensable in detailed analysis of algorithms, which is why Knuth uses it in his treatise (instead of, say, Algol or Fortran).


Does anyone here know of any good assemblers for ARM? Aside from binutils/gas, I mean. It feels like that architecture has never gotten as many assemblers written for it as other architectures.


Yes! The only language that comes in a distant second place in terms of ease of use and readability is Python. Mixing Python directly with assembly code is fucking amazing!


If anyone needs a 6502 programmer, I’m available and cheap! :-)


My first dev job used Forth (with inline assembly) on Z80-class CPUs for SCADA applications.

I'd claw my eyes out if I had to write entire applications in assembly.


ctrl+f=rust: 0/0

I am shocked.



