There are fewer branches, but there is now a data dependency between loop iterations, which makes each iteration slightly slower (maybe 1-2 cycles of additional latency per iteration).
Because newlines are relatively common and unpredictable, a state machine is likely better. But on a long file with no newlines and no spaces, the branching version should be slightly faster. (All reasoning from first principles; I have not done any benchmarking!)
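To make the comparison concrete, here is a minimal sketch of the two loop styles. This is not wc2's actual code: the function names, the two-state table, and the ASCII-only whitespace set are all assumptions for illustration. The branching loop has a branch whose outcome depends on the input byte; the table loop has no such branch, but each iteration has to wait for the `state` produced by the previous one.

```c
#include <stddef.h>
#include <string.h>

/* Branching version: the if/else outcome depends on each input byte,
 * so mixed text can cause mispredictions, while a run of identical
 * bytes is predicted almost perfectly. */
static size_t count_words_branchy(const unsigned char *buf, size_t n)
{
    size_t words = 0;
    int in_word = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = buf[i];
        if (c == ' ' || c == '\n' || c == '\t' ||
            c == '\r' || c == '\v' || c == '\f') {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            words++;
        }
    }
    return words;
}

/* Table-driven version (hypothetical two-state machine, not wc2's table).
 * No input-dependent branch in the loop body, but the lookup for byte i
 * needs the state computed for byte i-1: a loop-carried dependency. */
enum { SPACE = 0, WORD = 1 };

static size_t count_words_table(const unsigned char *buf, size_t n)
{
    /* Character class for every byte value: WORD unless ASCII whitespace. */
    unsigned char cls[256];
    memset(cls, WORD, sizeof cls);
    cls[' '] = cls['\n'] = cls['\t'] = SPACE;
    cls['\r'] = cls['\v'] = cls['\f'] = SPACE;

    /* Indexed as [current state][class of next byte]. */
    static const unsigned char next_state[2][2]  = { { SPACE, WORD }, { SPACE, WORD } };
    static const unsigned char starts_word[2][2] = { { 0, 1 }, { 0, 0 } };

    size_t words = 0;
    unsigned state = SPACE;
    for (size_t i = 0; i < n; i++) {
        unsigned k = cls[buf[i]];
        words += starts_word[state][k];  /* 1 only on a SPACE -> WORD transition */
        state = next_state[state][k];    /* feeds the next iteration */
    }
    return words;
}
```

Both functions should return the same count for any input, so swapping one for the other inside the same program (same IO, same compiler flags) is the apples-to-apples comparison.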
The README covers the exact case you describe - the `word.txt` benchmark is just a file with 93MB of the char `x`. The state machine is still faster in this case.
wc2 is faster than wc, but unless I am missing something, I can't find an instance where the author benchmarked wc2 with the core state machine loop replaced with branches.
wc2 and wc might be compiled with different flags or use a different strategy for IO, which makes it hard to compare speeds directly. The theoretical speedup of branches is tiny compared to the potential speedup of changing your compiler flags!