
Yeah I think the wider versions get a lot more complicated. memchr is a bit of a sweet spot, since its implementation is relatively simple. And things like glibc end up implementing specialized versions of it for most architectures _and_ instruction set extensions. (So e.g., there's one for SSE2 and for AVX on x86_64.)
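For anyone curious why single-byte memchr is the "sweet spot", here is a minimal sketch of the core idea. It is a portable word-at-a-time (SWAR) version, not glibc's SSE2/AVX code: it tests 8 bytes per iteration for a matching byte and then falls back to a byte loop to find the exact position. The names `swar_memchr` and `has_zero_byte` are hypothetical, and the sketch assumes 64-bit words. The SIMD versions apply the same "compare many bytes at once, then locate the hit" idea using wider vector registers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero iff at least one byte of v is zero
   (the classic "haszero" bit trick). */
static uint64_t has_zero_byte(uint64_t v) {
    return (v - 0x0101010101010101ULL) & ~v & 0x8080808080808080ULL;
}

/* Hypothetical swar_memchr: same contract as memchr, 8 bytes per step. */
const void *swar_memchr(const void *s, int c, size_t n) {
    const unsigned char *p = s;
    unsigned char target = (unsigned char)c;
    /* Broadcast the target byte into all 8 lanes of a 64-bit word. */
    uint64_t splat = 0x0101010101010101ULL * target;

    while (n >= 8) {
        uint64_t chunk;
        memcpy(&chunk, p, 8); /* memcpy avoids unaligned-load UB */
        /* XOR turns matching bytes into zero bytes. */
        if (has_zero_byte(chunk ^ splat)) {
            break; /* a match is somewhere in this chunk */
        }
        p += 8;
        n -= 8;
    }
    /* Locate the match within the flagged chunk, or scan the tail. */
    for (; n > 0; p++, n--) {
        if (*p == target) {
            return p;
        }
    }
    return NULL;
}
```

Once a chunk is flagged, the fast loop stops and the byte loop finds the exact offset within at most 8 bytes. Real implementations usually locate the offset directly from the mask bits instead.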

And then of course there's PCMPESTRI (and its variants, all part of SSE4.2), but it has largely been a failure because of its high latency. :-( That's a shame, because that instruction does accept needles of up to 16 bytes.



Yeah, I had some kind of brain fart and thought that, say, a 4-character memchr() could be just as fast using the native method. But of course it's 4x as slow, because a match can start at any byte offset (only an "aligned" memchr() like wmemchr() works like that). So yeah, it gets complicated quite quickly if you want it to be fast.
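To make the aligned/unaligned distinction concrete, here is a scalar sketch. It compares a 4-byte needle either only at offsets 0, 4, 8, ... (the way wmemchr() compares whole 4-byte elements) or at every byte offset, as a real substring search must. Both function names are hypothetical. This is not how glibc implements memmem() or wmemchr(). It only shows that the unaligned search has 4x as many candidate positions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: compare the needle only at 4-byte-aligned offsets,
   like wmemchr() over an array of 4-byte elements. */
ptrdiff_t find4_aligned(const unsigned char *hay, size_t n,
                        const unsigned char needle[4]) {
    uint32_t want, got;
    memcpy(&want, needle, 4);
    for (size_t i = 0; i + 4 <= n; i += 4) {
        memcpy(&got, hay + i, 4);
        if (got == want) {
            return (ptrdiff_t)i;
        }
    }
    return -1;
}

/* Hypothetical: compare the needle at every byte offset, which is what
   a substring search needs (4x the candidate positions). */
ptrdiff_t find4_any(const unsigned char *hay, size_t n,
                    const unsigned char needle[4]) {
    uint32_t want, got;
    memcpy(&want, needle, 4);
    for (size_t i = 0; i + 4 <= n; i++) {
        memcpy(&got, hay + i, 4);
        if (got == want) {
            return (ptrdiff_t)i;
        }
    }
    return -1;
}
```

With the haystack `"xabcdyyy"` and the needle `"abcd"`, the aligned version only checks offsets 0 and 4 and misses the match at offset 1. The byte-offset version finds it.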



