
Another article in the series beating C by moving the goal posts. My comment on the original article about the Haskell version on lobste.rs:

Keep in mind in all comparisons to GNU wc that it does extra work to correctly count the number of words: detecting multi-byte characters and decoding them if present. perf shows a significant amount of time being spent in multi-byte character handling. If you trigger a code path that does no decoding beyond the byte level, it's much faster:

    $ time wc wiki-large.txt 
    854100  17794000 105322200 wiki-large.txt
    wc wiki-large.txt  0.42s user 0.02s system 99% cpu 0.438 total
    $ time wc -l wiki-large.txt
    854100 wiki-large.txt
    wc -l wiki-large.txt  0.02s user 0.02s system 98% cpu 0.034 total
(wc -l looks at every byte, but does no decoding.)

From a quick glance, this is also where the 'Haskell beats C' article fails. It's comparing apples to oranges: the ByteString implementation does not do the same work as GNU/macOS wc and returns incorrect results in the presence of non-ASCII whitespace. The article incorrectly states that wc will handle input as ASCII. Unless you use a single-byte locale, macOS wc uses the combination of mbrtowc and iswspace.



Author here. This is not true - I included a link to the manpage (https://ss64.com/osx/wc.html) in the article to avoid this confusion. I did not use GNU wc; I used the OS X one, which, by default, counts bytes rather than multi-byte characters. From the manpage:

> The default action is equivalent to specifying the -c, -l and -w options.

> -c The number of bytes in each input file is written to the standard output.

> -m The number of characters in each input file is written to the standard output. If the current locale does not support multi-byte characters, this is equivalent to the -c option.

Moreover, I also mentioned in the article that I was using us-ascii encoded text, which means that even -m would have been treated as ASCII text.

Hope that clarifies your issue.


It is not about the character count, but the word count. wc decodes characters to find non-ASCII whitespace as word separators. If you read further in the same man page:

> White space characters are the set of characters for which the iswspace(3) function returns true.

That your text is ASCII-encoded does not matter, since ASCII is a subset of UTF-8. So at the very least, you need an extra branch to check that a byte's value is smaller than 128 (in UTF-8, any byte whose most significant bit is set is part of a multi-byte character).

However, if you look at the implementation at

https://opensource.apple.com/source/text_cmds/text_cmds-68/w...

you can see that this code path actually uses mbrtowc, so there is also the function call overhead.


It only calls mbrtowc if domulti is set (and MB_CUR_MAX > 1), i.e. only when given the option -m.


You are right! So that's a Darwin oddity. Still, a wide-char function, iswspace, is called in that code path, which adds function call overhead in a tight loop.


If domulti is not set, the wide-char function is not called, as far as I can tell. Why would it be? It's explicitly meant not to do wide-char stuff in that case.

FWIW, when this was going around for the first time I took this Darwin version of wc and experimented with setting domulti to const 0, statically removing all paths where it might do wide character stuff. I didn't measure any performance difference to just running it unmodified.


It's about iswspace as I mentioned in the parent comment. Replace the line

    if (iswspace(wch))
by

    if (wch == L' ' || wch == L'\n' || wch == L'\t' || wch == L'\v' || wch == L'\f')
And I get a ~1.7x speedup:

    $ time ./wc ../wiki-large.txt
      854100 17794000 105322200 ../wiki-large.txt
    ./wc ../wiki-large.txt  0.47s user 0.02s system 99% cpu 0.490 total
    $ time ./wc2 ../wiki-large.txt
      854100 17794000 105322200 ../wiki-large.txt
    ./wc2 ../wiki-large.txt  0.28s user 0.01s system 99% cpu 0.293 total
Remove the unnecessary branching introduced by multi-byte character handling [1]. This actually resembles the Go code pretty closely. We get a speedup of 1.8x:

    $ time ./wc3 ../wiki-large.txt
      854100 17794000 105322200 ../wiki-large.txt
    ./wc3 ../wiki-large.txt  0.25s user 0.01s system 99% cpu 0.267 total
If we take the second table from the article and divide the C result (5.56) by 1.8, the C performance would be ~3.09, which is faster than the Go version (3.72).

Edit: for comparison, the Go version from the article:

    $ time ./wcgo ../wiki-large.txt
      854100 17794000 105322200 ../wiki-large.txt
    ./wcgo ../wiki-large.txt  0.32s user 0.02s system 100% cpu 0.333 total
So, when removing the multi-byte whitespace handling, the C version is indeed faster than the (non-parallelized) Go version.

[1] https://gist.github.com/danieldk/f8cdaed4ba255fb2954ded50dd2...


Thanks, I finally understood what you are saying. Indeed, the code uses iswspace to test all characters, wide or normal. Strange design choice. For whatever it's worth, even just changing

    if (iswspace(wch))
to something like

    if (domulti && iswspace(wch))
        ...
    else if (!domulti && isspace(wch))
        ...
got something like a 10% speedup on my machine. And replacing isspace with an explicit condition like yours is much faster still. I checked: isspace macro-expands to a table lookup and a mask, but apparently that's still slower than your explicit check. I'm a bit surprised by this, but won't investigate further at the moment.


> Thanks, I finally understood what you are saying.

I am sorry for the unclear comments. I'll stop commenting on a phone ;).

> Indeed, the code uses iswspace to test all characters, wide or normal. Strange design choice.

I agree, it's really strange. This seems to be inherited from the FreeBSD version, which still does this as well:

https://github.com/freebsd/freebsd/blob/8f9d69492c3da3a8c1ea...

It has the worst of both worlds: it incorrectly counts the number of words when there is non-ASCII whitespace (since mbrtowc is not used), but it still pays the penalty of calling iswspace. It also does not conform to POSIX, which states:

> The wc utility shall consider a word to be a non-zero-length string of characters delimited by white space.

> [...]

> LC_CTYPE

> Determine the locale for the interpretation of sequences of bytes of text data as characters (for example, single-byte as opposed to multi-byte characters in arguments and input files) and which characters are defined as white space characters.


For the people downvoting: please read the source code of GNU and Apple wc before downvoting.




