I've looked at the (standard) libpng codebase, and it's really surprising how unoptimized it is for an ultra-popular standard library that's used everywhere. I don't understand the sociology of why everyone uses that implementation and not a far faster replacement, such as this one. There are quite a few places where PNG decoding is a noticeable, user-facing UX issue, and if you multiply small latencies by billions...
https://news.ycombinator.com/item?id=26714831