
> in real systems a small simple loop will perform better due to icache pressure

Do you have a source for this? Seems surprising GCC would leave such low-hanging fruit. G++ makes the effort to reduce std::copy to a memmove call when it can, or at least some of the time when it can (or at least, it did so in 2011). [0]

Related to this: does GCC treat memcpy differently when it can determine at compile-time that it's just a small copy?

[0] https://stackoverflow.com/a/4707028/
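For concreteness, here is a minimal sketch of the two cases being asked about: a std::copy over a trivially copyable type, and a memcpy whose size is a compile-time constant. The helper names copy_ints and load_u32 are made up for illustration. Whether either one ends up as a memmove call or a single load depends on the compiler, its version, and the flags, so inspect the generated assembly rather than trusting this snippet.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: copies n ints with std::copy.
// int is trivially copyable, so a library implementation is permitted
// (not required) to lower this to memmove.
inline void copy_ints(const int* src, int* dst, std::size_t n) {
    static_assert(std::is_trivially_copyable<int>::value,
                  "int must be trivially copyable");
    std::copy(src, src + n, dst);
}

// Hypothetical helper: a 4-byte memcpy with a compile-time-constant size.
// Compilers commonly replace this with a plain load instead of a call,
// but that is an optimization, not a guarantee.
inline std::uint32_t load_u32(const unsigned char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}
```

Compiling this with and without optimization and comparing the output is a quick way to see what a given toolchain actually does with each call.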



The problem is that on superscalar processors the correspondence between instruction count and speed breaks down, partly because the processor does its own optimization on the fly and can execute multiple instructions in parallel.

A programmer should be careful about second guessing the compiler. And a compiler should be careful about second guessing the processor.


I'm not sure if you're implying this is premature optimisation. It isn't.

It's a performance-sensitive standard-library function, the kind of thing that deserves optimisation in assembly. It's also the kind of problem that can be accelerated with SIMD, but that necessarily means more complex code. That's why the standard library implementations aren't always dead simple.

Here's a pretty in-depth discussion [0]. They discuss CPU throttling, caches, and being memory-bound.

[0] https://news.ycombinator.com/item?id=18260154
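To make the size trade-off concrete, here is a rough sketch contrasting a one-byte-per-iteration loop with a version that moves 16-byte blocks and then handles a tail. Both function names are hypothetical, the block copy is only a stand-in for real SIMD code (no intrinsics are used), and both assume the buffers do not overlap. It makes no claim about which is faster; that depends on the CPU, the copy sizes, and what else is competing for the instruction cache.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical: the "small simple loop" version, one byte per iteration.
inline void copy_simple(unsigned char* dst, const unsigned char* src,
                        std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) dst[i] = src[i];
}

// Hypothetical: copies 16-byte blocks, then finishes the remainder
// byte by byte. More code, fewer iterations for large n.
inline void copy_chunked(unsigned char* dst, const unsigned char* src,
                         std::size_t n) {
    constexpr std::size_t kBlock = 16;  // assumed block width
    std::size_t i = 0;
    for (; i + kBlock <= n; i += kBlock) {
        unsigned char tmp[kBlock];
        std::memcpy(tmp, src + i, kBlock);  // fixed-size load
        std::memcpy(dst + i, tmp, kBlock);  // fixed-size store
    }
    for (; i < n; ++i) dst[i] = src[i];     // tail bytes
}
```

Note that an optimizing compiler may vectorize the simple loop or recognize it as a memcpy on its own, so the source-level difference does not necessarily survive into the binary.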


Only personal experience. If you look at the memcpy in LLVM's libc, it was contributed by Googlers who share my experience and perspective.



