Hacker News new | past | comments | ask | show | jobs | submit login

Thanks for the answer.

I was curious mostly because I never recalled any SIMD intrinsics in GNU code (ok, probably GIMP has them, so maybe I should say GNU utilities), so that would be a first.

It's interesting how much stuff leans on memchr, shame there aren't systematic wider versions taking more bytes to avoid false positives for longer literals (ignoring wmemchr): these could be nice and fast with SIMD.




Yeah I think the wider versions get a lot more complicated. memchr is a bit of a sweet spot, since its implementation is relatively simple. And things like glibc end up implementing specialized versions of it for most architectures _and_ instruction set extensions. (So e.g., there's one for SSE2 and for AVX on x86_64.)

And then of course there's PCMPESTRI (and its variants), but that has largely been a failure because of its high latency. :-( That's a shame, because that instruction does accept substrings up to 16 bytes.


Yeah I had some kind of brain fart thinking say a 4-character memchr() could be just as fast using the native method, but no of course it's 4x as slow (only an "aligned" memchr() like wmemchr() works like that). So yeah, it starts to get complicated quite if you want it to be fast.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: