The "asynchronous state machine" name here is a bit strange. When I searched for the term elsewhere, I couldn't find any formal definition of what it is. Reading further in the README, it looks like the author implies that it really just means a DFA? Not entirely sure.
"Asynchronous" isn't part of the name of some really cool state machine :-) Its just an adjective and means the same as when you put it in front of any other noun.
A synchronous state machine is one where the incoming stream of events is always "in sync" with the state transitions, in the following sense:
1. When an event happens, the state machine can transition to the next state and perform any necessary actions before the next event happens.
2. After the state machine has transitioned and performed any necessary actions, the program must wait for the next event to happen. It can't do anything else until it does.
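A minimal sketch of the synchronous case described in the list above, assuming a toy two-state turnstile (the states, events, `TRANSITIONS`, `run_sync` and `slow_events` are all hypothetical, chosen only for illustration). The loop owns control: each pass blocks until the event source produces the next event, and the transition plus its action finish before the loop asks for another one.

```python
import time

# Hypothetical two-state turnstile, used only for illustration.
TRANSITIONS = {
    ("locked", "coin"): "unlocked",
    ("locked", "push"): "locked",
    ("unlocked", "coin"): "unlocked",
    ("unlocked", "push"): "locked",
}


def run_sync(events, state="locked"):
    """Pull-style driver: this loop owns the thread of control.

    Every pass asks the event source for the next event and blocks until
    one is available, so nothing else in the program runs in between.
    """
    for event in events:                                # wait for the next event
        state = TRANSITIONS.get((state, event), state)  # transition (unknown events ignored)
        print(f"{event} -> {state}")                    # action, done before the next event
    return state


def slow_events():
    """Stand-in event source: each next() blocks until an event 'arrives'."""
    for event in ["coin", "push", "push"]:
        time.sleep(0.1)  # the whole program is stuck here while waiting
        yield event


if __name__ == "__main__":
    run_sync(slow_events())
```

The point of the design is what is missing: there is no way for the program to do anything useful during those `sleep` calls, because the state machine's loop is the only thing running.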
An asynchronous state machine doesn't make the main program wait until the next event happens. The program can go on and do other things in the meantime.
When the next event does arrive, the program pauses whatever else it is doing and hands control to the state machine, which processes the event and then hands control back to the main program.
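For contrast, here is a sketch of the asynchronous version of the same hypothetical turnstile. The machine is "push-style": it has no loop of its own and never waits; whoever has an event calls `feed()`. As an assumption for illustration, asyncio's `loop.call_later` stands in for events arriving from outside. Because asyncio is cooperative rather than interrupt-driven, the "pause and hand control to the state machine" step happens at the `await` points, not at arbitrary instructions. `Turnstile`, `feed` and `main` are hypothetical names.

```python
import asyncio


class Turnstile:
    """Push-style (asynchronous) state machine: it never waits for input.

    Whoever has an event calls feed(); the machine transitions, runs its
    action, and returns control to the caller straight away.
    """

    TRANSITIONS = {
        ("locked", "coin"): "unlocked",
        ("locked", "push"): "locked",
        ("unlocked", "coin"): "unlocked",
        ("unlocked", "push"): "locked",
    }

    def __init__(self):
        self.state = "locked"
        self.log = []  # the "action": record each transition

    def feed(self, event):
        self.state = self.TRANSITIONS.get((self.state, event), self.state)
        self.log.append((event, self.state))


async def main():
    machine = Turnstile()
    loop = asyncio.get_running_loop()
    # Simulated events arriving later; the event loop delivers each one
    # by calling machine.feed, much as an interrupt handler would.
    for delay, event in [(0.01, "coin"), (0.02, "push"), (0.03, "push")]:
        loop.call_later(delay, machine.feed, event)

    # Meanwhile the main program keeps doing its own work.
    other_work = 0
    while len(machine.log) < 3:
        other_work += 1
        await asyncio.sleep(0.001)  # yield so pending callbacks can run
    return machine, other_work


if __name__ == "__main__":
    machine, other_work = asyncio.run(main())
    print(machine.log, "other work units:", other_work)
```

The transition table is identical to the synchronous case; what changes is who owns the loop. Here the main program does, and the state machine is just a function it (or the event loop on its behalf) calls whenever an event shows up.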
I was not treating "asynchronous state machine" as a noun; even taken as a generic adjective, it doesn't make sense in this context. What "other things" is wc2.c doing while the state machine is churning? There is no multithreading or multiprocessing going on here. So I find it hard to believe that this use of "asynchronous" is inside of what I would generally see it used as. As such, I thought perhaps it referred to a specific methodology for designing the code, something akin to how the "lock free" adjective implies a certain design sensibility.
AFAICT, wc2.c isn't written as an asynchronous state machine. It never seems to transfer control anywhere else.
// So I find it hard to believe that this use of "asynchronous" is inside of what I would generally see it used as
Yeah, you are legitimately confused. The post talks about asynchronous state machines, but wc2.c isn't an example of one. I'm sure this gave you a severe case of WTF?!??
// thought perhaps it referred to a specific methodology for designing the code
It does: that's exactly what it is, a programming methodology, or perhaps better put, a design pattern. But wc2.c isn't an example of code written using that methodology. Again, you are legitimately confused here, because the post talks about something and wc2.c isn't that.
Do you know Python? If you google "asynchronous programming in python", you'll get all kinds of blog posts and YouTube videos that explain the technique.
Why would the author of this repository make "wc2 - asynchronous state machine parsing" the header of his README if wc2 was not, by his own definition, an "asynchronous state machine"? I ask you to consider what is more likely: that your blanket definition of asynchronous is incorrect as applied here or the author is just fucking with us by adding random words as the description of his project.
Indeed, this is very confusing! The program implements a pretty standard state machine (OK), but there is nothing apparently async here. The author alludes to combining the state machine with async IO in this paper (https://github.com/angea/pocorgtfo/blob/master/contents/arti...), but this implementation just uses fread to (synchronously) read a chunk of bytes.
Furthermore, given disk caching and memory mapping, I'm not convinced async IO would really be that astonishingly different, as individual reads are going to be amortized over pretty much the same bulk reads that the sample program is doing.
As the author says themselves, it seems the main win is hand implementing the incremental utf8 parsing instead of calling a library/os function.
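To make the fread point concrete: what makes a byte-level state machine usable from asynchronous code is that all of its state survives between chunks, so the same machine could be fed from a synchronous fread-style loop, a socket, or an async read callback without changing the counting logic. Below is a minimal sketch under stated assumptions; it is not wc2.c's code, and `WordCounter` and `count_stream` are hypothetical names. It assumes valid UTF-8 input, counts code points by skipping continuation bytes (bit pattern `10xxxxxx`, one of the UTF-8 properties mentioned in this thread), and treats only ASCII whitespace as a word separator.

```python
import sys

# Assumption: only these ASCII bytes separate words (Unicode spaces such as
# U+00A0 are treated as word characters in this sketch).
ASCII_SPACE = frozenset(b" \t\n\r\v\f")


class WordCounter:
    """Resumable byte-level counter (hypothetical, not wc2.c's code).

    All parsing state lives on the object, so input may be split into
    chunks anywhere, even in the middle of a word or a multi-byte UTF-8
    sequence, and the totals come out the same.
    """

    def __init__(self):
        self.lines = 0
        self.words = 0
        self.chars = 0        # code points, assuming valid UTF-8
        self.in_word = False  # the only state carried between bytes

    def feed(self, chunk: bytes) -> None:
        for b in chunk:
            if (b & 0xC0) != 0x80:  # not a 10xxxxxx continuation byte,
                self.chars += 1     # so it starts a new code point
            if b == 0x0A:
                self.lines += 1
            if b in ASCII_SPACE:
                self.in_word = False
            elif not self.in_word:  # space -> non-space starts a word
                self.in_word = True
                self.words += 1


def count_stream(stream, chunk_size=1 << 16):
    """Synchronous driver, like a loop around fread(); an async driver
    could call feed() from a read-completion callback instead."""
    wc = WordCounter()
    while chunk := stream.read(chunk_size):
        wc.feed(chunk)
    return wc


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            wc = count_stream(f)
        print(wc.lines, wc.words, wc.chars, path)
```

Feeding the input in one piece, in two pieces that split a two-byte character, or one byte at a time gives the same totals; that resumability is the property an async driver would rely on, whether or not the driver is actually asynchronous.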
> I ask you to consider what is more likely: that your blanket definition of asynchronous is incorrect as applied here or the author is just [elided] with us by adding random words
LMAO!!! Well, when you put it that way, I can't blame you for not believing me. Your skeptical mindset will no doubt serve you well in this era of deep fakes and AI hallucinations.
Alas, it is also an example of how this skepticism, however necessary, is going to slow down the sharing of information :-( It's the price we're going to pay for so much lying and such a breach of the social contract.
I assure you, however, wc2.c is not asynchronous. It would be nice if the author could step in here and clarify, because it is hella confusing.
I don't believe the author is Effing with us either. Documentation and comments are not automatically synced with the code they describe, so it's easy for them to drift apart. Perhaps the author intends to implement asynchronous features in the future, or perhaps he changed his goals between writing the README and writing the code.
As I see it, state machines are particularly good for expressing logic in asynchronous systems. For instance, in the late 1980s I wrote assembly-language XMODEM implementations for the 6809 and the 80286, and since that kind of code is interrupt driven, it is efficient to make a state machine that processes one character at a time. Today, when you use async/await, the compiler converts your code, loops and all, into a state machine.
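A sketch of that last point, using a made-up framing loosely inspired by XMODEM: an SOH byte, a length byte, the payload, and an 8-bit additive checksum. This is not the real XMODEM packet format, and `frame_receiver`, `FrameReceiver` and `encode` are hypothetical names. A Python generator stands in for an async function: it reads like a blocking loop, but each `yield` suspends until the next byte is pushed in with `.send()`. The class below it is the same logic lowered by hand into an explicit state machine, roughly what C# or Rust compilers generate for async functions (CPython instead keeps the suspended frame alive, but the observable behavior is the same). Both are fed one byte per call, as an interrupt handler would.

```python
SOH = 0x01  # start-of-frame marker (hypothetical framing, not real XMODEM)


def frame_receiver(on_frame):
    """Coroutine style: reads like a loop, suspends at every yield."""
    while True:
        byte = yield
        if byte != SOH:
            continue                   # resync: skip until a start marker
        length = yield
        payload = bytearray()
        for _ in range(length):
            payload.append((yield))
        checksum = yield
        if sum(payload) % 256 == checksum:
            on_frame(bytes(payload))


class FrameReceiver:
    """Same logic lowered by hand: locals become fields and each
    suspension point becomes a named state."""

    def __init__(self, on_frame):
        self.on_frame = on_frame
        self.state = "WAIT_SOH"
        self.length = 0
        self.payload = bytearray()

    def feed(self, byte):
        if self.state == "WAIT_SOH":
            if byte == SOH:
                self.state = "LENGTH"
        elif self.state == "LENGTH":
            self.length = byte
            self.payload = bytearray()
            self.state = "PAYLOAD" if byte else "CHECKSUM"
        elif self.state == "PAYLOAD":
            self.payload.append(byte)
            if len(self.payload) == self.length:
                self.state = "CHECKSUM"
        elif self.state == "CHECKSUM":
            if sum(self.payload) % 256 == byte:
                self.on_frame(bytes(self.payload))
            self.state = "WAIT_SOH"


def encode(payload):
    """Build one frame in the hypothetical format (payload < 256 bytes)."""
    return bytes([SOH, len(payload), *payload, sum(payload) % 256])


if __name__ == "__main__":
    stream = b"\x00noise" + encode(b"hi") + encode(b"ok")
    got_a, got_b = [], []
    gen = frame_receiver(got_a.append)
    next(gen)                          # prime: run to the first yield
    fsm = FrameReceiver(got_b.append)
    for byte in stream:                # e.g. one call per UART interrupt
        gen.send(byte)
        fsm.feed(byte)
    print(got_a, got_b)                # prints [b'hi', b'ok'] [b'hi', b'ok']
```

Comparing the two makes the compiler's job visible: the `for` loop over the payload turns into the `PAYLOAD` state plus a length check, and the local variables that must survive a suspension turn into fields on the object.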
I'd also like to add the Plan 9 implementation[0], which also uses the properties of utf8 as part of its state machine and anecdotally has been quite performant when I've used it.
[0] http://git.9front.org/plan9front/plan9front/107a7ba9717429ae...